Regex - Remove specific HTML Tags

I have already found a solution in the forum that removes all HTML tags, but I need to keep some specific tags (img, a, b, i, u) and their closing tags (</a>, </b>).
The regex also needs to distinguish between img tags with and without a class attribute: it should remove img elements that have a class attribute.
So I tried to modify the solution I found:
result = Regex.Replace(result, "<[^(img|a|b|i|u)][^>]*>", " ");
It does not work well: it also removes the closing tags, does not distinguish between img tags with and without a class attribute, and does not remove the br tags. It is not necessary to do all of this in one statement.

You can use a regular expression:
<\s*/?\s*(?:body|html)\s*>
In C#:
var result = Regex.Replace(html, @"<\s*/?\s*(?:body|html)\s*>", "");
This matches tags such as:
<html>
<body>
< / html>
< / body>
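The question asks for an allow-list: keep img, a, b, i and u (opening and closing), drop every other tag, and drop img tags that carry a class attribute. A minimal sketch of that idea follows, written in Python rather than C# (the two patterns should carry over to Regex.Replace largely unchanged). The function name strip_unwanted_tags is an assumption for illustration, and a regex like this is only suitable for simple, reasonably well-formed markup.

```python
import re

# Assumption: img tags with a class attribute are removed first, in their own pass.
IMG_WITH_CLASS = re.compile(r"<img\b[^>]*\bclass\s*=[^>]*>", re.IGNORECASE)

# Any opening or closing tag whose name is NOT one of img, a, b, i, u.
# The negative lookahead rejects the allowed names; \b stops "b" from matching "br".
OTHER_TAG = re.compile(r"</?(?!(?:img|a|b|i|u)\b)[a-zA-Z][^>]*>", re.IGNORECASE)


def strip_unwanted_tags(html):
    """Replace disallowed tags with a space, as in the original Regex.Replace call."""
    html = IMG_WITH_CLASS.sub(" ", html)
    return OTHER_TAG.sub(" ", html)
```

Running the two passes separately follows the remark in the question that everything does not have to happen in one statement.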

Similar Messages

  • Need to copy Data from a specific Html Tag

    Hello,
    I am trying to use CF to access a website, capture the data from a specific tag through to the end of that tag, and store it in a CSV file or a database.
    The tag-based search of an open file is where I am not able to make any headway. Has anyone done this?

    You'll need to use a regular expression for that. CF supports regular expressions with the REFind, REFindNoCase and REReplace functions. Here's an example of using regular expressions to capture the value within an HTML tag:
    http://www.javamex.com/tutorials/regular_expressions/example_scraping_html.shtml
    It's in Java, but the syntax for regular expressions is the same in CF.
    Dave Watts, CTO, Fig Leaf Software
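As a rough illustration of the regular-expression approach described above, here is a sketch in Python rather than ColdFusion. It captures the text inside every occurrence of one tag and writes the values out as CSV rows. The helper names extract_tag_text and to_csv are hypothetical, and the pattern assumes the tag is not nested inside itself.

```python
import csv
import io
import re


def extract_tag_text(html, tag):
    """Return the text between <tag ...> and </tag> for every occurrence."""
    pattern = re.compile(
        r"<{0}\b[^>]*>(.*?)</{0}\s*>".format(re.escape(tag)),
        re.IGNORECASE | re.DOTALL,
    )
    return [value.strip() for value in pattern.findall(html)]


def to_csv(values):
    """Serialise the captured values as one-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for value in values:
        writer.writerow([value])
    return buffer.getvalue()
```

For example, extract_tag_text(page, "td") would collect the contents of each table cell, and the result of to_csv could then be written to a file or loaded into a database.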

  • RegExp for replacing specific HTML tags.

    Hi All,
    I need to replace some of the HTML tags in Flex.
    For example, in the following HTML I have to replace "span", "div" and "a":
    <h2>AAAAAAAAA<br /><span >BBBBBBBBBBB</span></h2><div><p>QQQQQQ</p><p>EE<br />FFFF<br />TTT<br /></p>
    <a href="#">Click</a>
    </div>

    var input:String = "your html string";
    // Pass the bare pattern to the RegExp constructor (no surrounding slashes).
    // .*? is lazy, and the "s" flag lets "." match line breaks.
    var urlPattern:RegExp = new RegExp("(<a\\b)(.*?)(</a)", "igs");
    var result:String = input.replace(urlPattern, "<b$2</b");
    var urlPattern2:RegExp = new RegExp("(<div\\b)(.*?)(</div)", "igs");
    var result2:String = result.replace(urlPattern2, "<table$2</table");
    And so on.
    Hope this helps.
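An alternative that avoids pairing each opening tag with its closing tag is to rename opening and closing tags independently. The sketch below shows this in Python rather than ActionScript. The function name rename_tag is hypothetical, and attributes are carried over unchanged, which may or may not be wanted.

```python
import re


def rename_tag(html, old, new):
    """Rename every <old ...> and </old> to <new ...> and </new>."""
    # Group 1 captures the optional slash, group 2 the attributes.
    pattern = re.compile(r"<(/?){}\b([^>]*)>".format(re.escape(old)), re.IGNORECASE)
    return pattern.sub(
        lambda m: "<{}{}{}>".format(m.group(1), new, m.group(2)), html
    )
```

Because each tag is matched on its own, nested or repeated elements need no special handling, and the \b keeps "a" from matching tags such as "abbr".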

  • Removing specific XML tag in XSLT mapping

    Hi there,
    I've asked before about a XML to string XSLT mapping and the answers provided here helped me to successfully do that mapping! Thanks a lot!
    I'm using the following mapping to convert a string back to XML.
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:m="http://my.namespace.com">
         <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>
         <xsl:template match="/">
              <xsl:for-each select="//m:my_tag">
                   <xsl:value-of select="." disable-output-escaping="yes"/>
              </xsl:for-each>
         </xsl:template>
    </xsl:stylesheet>
    But now, I'm having a problem converting back from string to XML. The response tag "m:my_tag" has a string like this:
    "<?xml version="1.0" encoding="UTF-8"?><tag1><tag2>Data</tag2></tag1>".
    And  when I use the XSLT mapping shown above, the output file comes like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml version="1.0" encoding="UTF-8"?>
    <tag1>
    <tag2>Data</tag2>
    </tag1>
    As you can see, the initial <?xml ...> tag is duplicated, and it generates a parsing error in XI.
    How can I eliminate one of the "<?xml version="1.0" encoding="UTF-8"?>" strings in the mapping?
    Thanks a lot.

    Wow!!!
    The output="html" actually worked on XML Spy!
    Removing the XSLT file initial tag didn't work.
    I had already resolved this problem using the replace-string method that I found here: http://aspn.activestate.com/ASPN/Cookbook/XSLT/Recipe/65426
    But your method is way more elegant and efficient. :o)
    I'll test the html method on XI, but I'm almost sure it'll work too.
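The replace-string workaround mentioned above amounts to deleting the embedded XML declaration before the string is written out. A minimal sketch of that step, in Python rather than XSLT, could look like this. The name strip_xml_declaration is an assumption, and it only removes a declaration found at the very start of the string.

```python
import re

# Matches a leading <?xml ... ?> declaration plus any whitespace around it.
XML_DECL = re.compile(r"^\s*<\?xml\b.*?\?>\s*", re.DOTALL)


def strip_xml_declaration(text):
    """Drop the leading XML declaration, if there is one."""
    return XML_DECL.sub("", text, count=1)
```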

  • How to remove HTML tags from a String ?

    Hello,
    How can I remove all HTML tags from a String?
    Would you please give me a simple example?
    Best regards,
    Eric

    Here's some code I cooked up. I have created an object that processes the String so that it can be incorporated directly into a project. There is some redundancy so that it can be used in more than one way. Depending on your situation, you might have to make the loop condition a little more sophisticated to catch stray ">" characters.
    I have also included a tester application.
    // Removes HTML tags from a String. Either pass the String to the constructor and then
    // call getProcessedString(), or simply call
    // stringWithoutTags = removeHtmlTags(stringWithTags);
    // Note: this code assumes that every "<" is followed by a matching ">" in the proper order.
    public class HtmlTagRemover {
         private String stringSubmission, processedString, stringBeingProcessed;
         private int indexOfTagStart, indexOfTagEnd;

         public HtmlTagRemover() {
         }

         public HtmlTagRemover(String s) {
              removeHtmlTags(s);
         }

         public String removeHtmlTags(String s) {
              stringSubmission = s;
              stringBeingProcessed = stringSubmission;
              removeNextTag();
              return processedString;
         }

         private void removeNextTag() {
              checkForNextTag();
              // Continue while both brackets exist and "<" comes before ">"
              while ((!(indexOfTagStart == -1 || indexOfTagEnd == -1)) && indexOfTagStart < indexOfTagEnd) {
                   removeTag();
                   checkForNextTag();
              }
              processedString = stringBeingProcessed;
         }

         private void checkForNextTag() {
              indexOfTagStart = stringBeingProcessed.indexOf("<");
              indexOfTagEnd = stringBeingProcessed.indexOf(">");
         }

         private void removeTag() {
              // Delete everything from "<" up to and including ">"
              StringBuffer sb = new StringBuffer(stringBeingProcessed);
              sb.delete(indexOfTagStart, indexOfTagEnd + 1);
              stringBeingProcessed = sb.toString();
         }

         public String getProcessedString() {
              return processedString;
         }

         public String getLastStringSubmission() {
              return stringSubmission;
         }
    }

    public class HtmlRemovalTester {
         public static void main(String[] args) {
              String output;
              HtmlTagRemover h = new HtmlTagRemover();
              output = "The processed String: " + h.removeHtmlTags("<Html tag>This is a test<another Html tag> string<yet another Html tag>.");
              output = output + "\n" + " The original string:" + h.getLastStringSubmission();
              System.out.print(output);
         }
    }
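A different technique, shown here as a sketch in Python rather than Java, is to let an HTML parser decide what counts as a tag and to collect only the text it reports. This uses html.parser.HTMLParser from the Python standard library. The class name TextOnly and the function name remove_html_tags are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the text nodes of a document and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for every run of text between tags.
        self.parts.append(data)


def remove_html_tags(text):
    parser = TextOnly()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

This trades the index arithmetic of the class above for the parser's own tokenising rules, which may be more convenient when the input is real-world HTML.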

  • Remove HTML tags in text

    Hi,
    I have to read some text from a text editor, that can be formatted for example with Bold, which means that when I execute the function to read its content, it returns something like this:
    Do you know how I can remove these HTML tags from the text?
    Thanks in advance.
    Regards,
    Sónia Gonçalves

    Hi,
    Something like this should do the trick.
    report  ztag.
    data: v_data type char30 value '<H>blablabla</H>'.
    if v_data(1) = '<' and
      v_data cs '>'.
    * Remove the HTML opening header
      shift v_data left up to '>'.
      shift v_data left.
    * Remove the HTML closing header
      shift v_data right up to '<'.
      shift v_data right.
      shift v_data left deleting leading space.
    endif.
    write: / v_data.
    Regards,
    Darren

  • Fastest way to remove html tags except url in href from string using java

    Hi All,
    Please suggest the fastest way to remove (strip) HTML tags from a string using Java, keeping only the URL in the href of each anchor tag.
    Please help me with the best solution. I currently use a parser, but it takes too long to remove the HTML tags from the string read from a file.
    I want the program to take about 1 millisecond for a 2 KB file.
    Please help me out... Thanks in advance

    Hi,
    How can I replace the anchor tag in a string with the URL in the href of that anchor tag, using jsoup?
    For example:
    <code>
    Suppose the input text is:
    test, string using, dsfg, 1:14 PM, <a target="_ablank" style="color: red" href="http://testurl.com/index.jsp?a=1234">support</a>, schedular tag, <a target="_vbblank" style="color: green" href="http://testurlgreen.com/index.jsp?a=asdfasdf4">support req</a> asdf pqr
    Then the output text should be:
    test, string using, dsfg, 1:14 PM, http://testurl.com/index.jsp?a=1234, schedular tag, http://testurlgreen.com/index.jsp?a=asdfasdf4 asdf pqr
    </code>
    Please help at the earliest.
    Thanks in advance.
    As this text editor does not support the HTML anchor tag, the example may not display correctly.
    Edited by: 976815 on Dec 17, 2012 5:17 AM
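The transformation asked for here, replacing each anchor element with the URL from its href, can be sketched with a single regular expression. The example is in Python rather than Java or jsoup, the name anchors_to_urls is hypothetical, and it assumes quoted href values and anchors that are not nested. No performance claim is implied.

```python
import re

# Group 1 is the quote character, group 2 the href value.
ANCHOR = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>.*?</a\s*>',
    re.IGNORECASE | re.DOTALL,
)


def anchors_to_urls(text):
    """Replace every <a href="URL">...</a> with URL."""
    return ANCHOR.sub(lambda m: m.group(2), text)
```

Any remaining tags could then be removed in a second pass with a generic tag pattern.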

  • Remove HTML tag

    Hi all,
    Given a string that contains HTML-encoded text, is there any standard Function Module to remove the HTML code?
    E.g., change "<HTML><B>Sample Text</B></HTML>" to "Sample Text".
    Any suggestion/comments are welcome!
    Thanks
    Best regards,
    Prakesh.

    Hi Prakesh,
    To remove the HTML tags from a string, use the following sample formula:
    whileprintingrecords;
    stringvar sample := {table.stringfield};
    numbervar counter := ubound(split(sample,"<"))-1;
    numbervar i;
    for i := 1 to counter do(
    numbervar openbracket := instr(sample,"<");
    numbervar closebracket := instr(sample,">");
    sample := left(sample,openbracket-1) & mid(sample,closebracket+1));
    sample;
    ====================
    NOTE:
    This formula removes all text between the '<' and '>' characters. Adjustments may be required if only some tags should be removed, or if the '<' or '>' characters appear by themselves in the original string.
    ====================
    Thanks & Regards,
    Sarita Singh Rathour
    Edited by: Sarita Rathour on Jul 24, 2009 6:06 AM

  • Suppressing certain HTML tags before setting text to JEditorPane

    Sir,
    I am setting the text (in HTML format) for a JEditorPane using the setText(String) method, but I need to suppress all the <IMG> tags that are present in this text before setting it. Is there any way I can write my own extended HTMLEditorKit in which I check for a specific HTML.Tag and prevent it from being added to the document?
    Can you please help me out with an example?
    Regards,
    Alex

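    Instead of trying to extend the HTMLEditorKit, why don't you add a normal method that scans the text for <IMG> tags and removes them yourself before putting the text in the JEditorPane using the setText() method? You can try something like this:
    public String removeImgTag(String text) {
       // Search a lower-cased copy so <IMG> and <img> both match, but cut from the original text
       String lower = text.toLowerCase();
       StringBuilder sb = new StringBuilder();
       int pos = 0, i;
       while ((i = lower.indexOf("<img", pos)) >= 0) {
          int j = lower.indexOf('>', i);
          if (j < 0) break;          // unterminated tag: keep the rest as-is
          sb.append(text, pos, i);   // text before the tag
          pos = j + 1;               // continue after the closing '>'
       }
       sb.append(text.substring(pos));
       return sb.toString();
    }
    ;o)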
    V.V.

  • Function to remove ALL HTML codes

    I have to extract a field from Primavera that is stored as a BLOB in the database and pump it into a view.
    We have added code to remove the HTML tags such as </> and a few other ASCII characters (a developer helped with the ASCII part, so pardon me for not being able to explain it), but we still get surprises like &nbsp and &amp appearing, depending on what is entered in Primavera.
    How do I eliminate ALL HTML codes? Currently the code looks like this:
    REPLACE(REPLACE(REGEXP_REPLACE(utl_raw.cast_to_varchar2(dbms_lob.substr(tm.task_memo)), '<[^>]+>'),CHR(13),''),CHR(10),'') AS Narratives
    Thank you.

    You're welcome, please do not forget to mark it as answered.
    If you can examine the string in the database, you can use dump to see which character is at the end of the line.
    In this sample I have placed a chr(0) at the end of the string to show it:
    Octal:
    select dump('testing'||chr(0),8) from dual;
    Output: Typ=1 Len=8: 164,145,163,164,151,156,147,0
    Hex:
    select dump('testing'||chr(0),16) from dual;
    Output: Typ=1 Len=8: 74,65,73,74,69,6e,67,0
    Edited by: specdev on 6-aug-2012 5:08
    Answered without receiving helpful or correct answer points :-( but we make somebody happy today :-)
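The leftover &nbsp and &amp in the question are character entities, which a tag-stripping regex does not touch. A sketch of handling both, in Python rather than Oracle SQL, is given below. It reuses the thread's '<[^>]+>' pattern and decodes entities with html.unescape from the standard library. The function name to_plain_text is an assumption.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")  # same pattern as in the REGEXP_REPLACE call


def to_plain_text(fragment):
    """Strip tags, decode entities, and drop CR/LF as the SQL expression does."""
    no_tags = TAG.sub("", fragment)
    text = html.unescape(no_tags)        # &amp; -> &, &nbsp; -> U+00A0
    text = text.replace("\xa0", " ")     # turn non-breaking spaces into plain spaces
    return text.replace("\r", "").replace("\n", "")
```

Decoding after the tags are removed keeps an encoded "&lt;" in the text from being mistaken for the start of a tag.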

  • Problem removing html tags from the text retrived

    Hi there,
    I am using JDBC to connect to the database and retrieve the data. In one of the columns, a few of the records contain HTML tags along with the description. Is there a way to retrieve only the text, ignoring the HTML tags in between? Or can I retrieve the data and then strip the HTML code from the text, so that only normal text is displayed?
    Here is an example of the retrieved data, which is pipe-separated; one of the columns has HTML tags in it:
    209|The euphoria |187945-2|http://www.abc/lst.jsp?mktgChannel=I86023&sku=18791-2&siteID=qpF0HYnRugA|http://www.abc.com/assets/images/product/medium/18793-2_198.jpg|Rooftop Singers: Walk Right In | abc Music proudly presents THE FOLK YEARS, an unforgettable era in music history!<BR><BR><B>Featuring:</B><BR>
    <LI>The most complete collection of folk and folk-rock songs ever put together -- 132 classics!
    <LI>Original hits by the original artists!
    Now I need to remove the tags before displaying this in the output. Is there a simple way to do this?
    Thanks...

    Did you read the documentation of the trim() method,
    where it describes which whitespace it removes?

    I believe his problem is that
    "Some text here  
    <blah> 
    More text"
    becomes
    "Some text here  
    More text"
    ... and he wants ...
    "Some text here
    More text"
    So, your problem is that your regex isn't matching whitespace as well.
    See the "Trimming Whitespace" section:
    http://www.regular-expressions.info/examples.html
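One way to get the output described above is to remove the tags first and then tidy the whitespace line by line. The sketch below is in Python rather than Java, and the name strip_tags_and_tidy is hypothetical. It assumes that lines left empty after tag removal should be dropped.

```python
import re


def strip_tags_and_tidy(text):
    """Remove tags, trim trailing whitespace, and drop lines that end up empty."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    lines = (line.rstrip() for line in without_tags.splitlines())
    return "\n".join(line for line in lines if line)
```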

  • How to remove html tags from a column

    Hi
    Problem is this: I get a column with a comma separated list of id's and I can successfully parse these id's and use them elsewhere. BUT, occasionally there are html tags within that id list like this:
    1082471,1237423<br xmlns="http://www.w3.org/1999/xhtml" />
    Is there a way to just automatically remove all tags from a column? Could do this with regex, but since there is no support, I don't know what to do.

    Hi,
    If the HTML can be detected by a starting symbol like "<", then you could use the following:
    Unfortunately the operation "ReplaceRange" is only available at text level, so you have to invoke a function (at least to my knowledge). You also need an index column in your table, so if you don't have one, you need to create it as well.
    This is your function:
    let
       fnRemoveHTML = (Value, Index) =>
    let
       Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
       IndeNo = Index,
       Value_ = Source{IndeNo-1}[Value],
       length = Text.Length(Text.From(Value_)),
       position = Text.PositionOf(Text.From(Value_), "<"),
       range = length-position,
       new= if Value_ is number then Value_ else Text.ReplaceRange(Value_, position, range, "")
    in
        new
    in
      fnRemoveHTML
    And this is how you invoke it:
    let
        Quelle = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
        Last = Table.AddColumn(Quelle, "Custom", each fn_RemoveHTML([Value], [Index])),
        ChangedType = Table.TransformColumnTypes(Last,{{"Custom", type number}})
    in
        ChangedType
    Provided your table is called “Tabelle1” & the column with your values to be replaced “Value” & your index-col “Index”
    Imke
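For comparison, where regular expressions are available the same cleanup is short. The sketch below is in Python and is not a Power Query solution. The name parse_ids is an assumption. Unlike the function above, which cuts the value off at the first "<", it removes each tag wherever it appears.

```python
import re


def parse_ids(cell):
    """Strip any tags from a comma-separated id list and return the ids as integers."""
    cleaned = re.sub(r"<[^>]*>", "", cell)
    return [int(part) for part in cleaned.split(",") if part.strip()]
```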

  • How to REMOVE H1 Tag from HTML without changing other Tags?

    Hi there!
    I'm searching for a method to remove Tags from HTML (using HTMLEditorKit, HTMLDocument ...).
    My current code is as follows:
    // first get the whole paragraph
       int iCaretPos = tpMyTextPane.getCaretPosition();
       Object oAttrib;
       HTMLDocument.BlockElement oElem = (HTMLDocument.BlockElement)oMyDocument.getParagraphElement(iCaretPos);
       AttributeSet oAttribs;
       SimpleAttributeSet oNewAttribs;
       int iParaStart = oElem.getStartOffset();
       int iParaEnd = oElem.getEndOffset();
       tpMyTextPane.select(iParaStart, iParaEnd);
       // the following only fetches the Tags that are valid for the whole paragraph!!!!!
       oAttribs = tpMyTextPane.getCharacterAttributes();
       oNewAttribs = new SimpleAttributeSet(oAttribs);
       if(iParaEnd - iParaStart > 0)
       {
          // now analyse the attributes (remove all paragraph-tags)
          for(int iIndex = 0; iIndex < oaOurFormatTags.length; iIndex++)
             oNewAttribs.removeAttribute(oaOurFormatTags[iIndex]);
          if(iParaEnd - iParaStart > 0)
          {
             oMyDocument.setCharacterAttributes(iParaStart, iParaEnd - iParaStart, oNewAttribs, true);
             tpMyTextPane.setCaretPosition(iCaretPos);
          }
          tpMyTextPane.requestFocus();
          tpMyTextPane.repaint();
       }
    This code works for me, but all Tags of the selected paragraph are removed. That means:
    <P><H1>This is a <B>test</B> text</H1></P>
    will be converted to:
    <P>This is a test text</P>
    but I want it to be converted to:
    <P>This is a <B>test</B> text</P>
    Is there any other method to remove specific Tags (<H1>, ... <H6>) without touching other tags????

    In February I wrote a feature request about this. Today it has been accepted to the bug database. Please make your vote:
    http://developer.java.sun.com/developer/bugParade/bugs/4760082.html
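If the HTML is also available as a plain string, the heading tags can be removed on their own with a regular expression before the text reaches the editor. This is only a sketch in Python, it works on the markup rather than on an HTMLDocument, and the name remove_heading_tags is hypothetical.

```python
import re

# Opening or closing h1..h6 tags only; <hr>, <b>, <p> and the rest are left alone.
HEADING_TAG = re.compile(r"</?h[1-6]\b[^>]*>", re.IGNORECASE)


def remove_heading_tags(html):
    """Delete <H1>..<H6> tags but keep their content and all other tags."""
    return HEADING_TAG.sub("", html)
```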

  • Filtering html tags, css, and javascript with regex

    hi everyone,
    I'm writing a small application where a user types in a URL, and the text of the webpage is displayed in a text area.
    I've got it to work, but it takes some time, and a lot of content I don't want is displayed too: tags, scripts and sometimes CSS.
    Initially I filtered out the HTML tags with a regular expression, but I still get a lot of unwanted content.
    I'm not a confident Java programmer, and the idea of parsing HTML, CSS and JavaScript is the scariest idea ever to me, so my next idea is to keep only what is between the <body> tags and delete everything above and below them. Hopefully that should leave me with only the visible content of the site.
    I've messed around with regular expressions but I can't get it to work. Can anyone help out?
    Thanks a lot,
    Torre

    Darryl.Burke wrote:
    I tried out the regexes I posted on the source of a forum page, which is not valid html (contains two each opening and closing body tags). With a bit of trial and error I was able to remove everything up to the first, and not the second, opening tag by using a reluctant quantifier, ^.*?, but couldn't for the life of me achieve removal of only the last closing tag, leaving the other, invalid one intact. How would you do that?

    Regexes always try to match the first occurrence of whatever they're looking for (the sentinel), and there's no way to change that behavior (but it would be handy if you could). What you have to do instead is make sure the rest of the regex can't match the sentinel. For that you need lookahead, and the simplest way to use it is to scan the rest of the text looking for the sentinel and, if it doesn't find one, go ahead and gobble up the remaining text:
    "(?is)</body>(?!.*</body).*$"
    However, if there are many occurrences of the sentinel, you could take a serious performance hit. Here's a much more efficient way:
    "(?is)</body>(?:[^<]++|<(?!/body>))*+$"
    After matching the sentinel, this regex gobbles up anything that's not the first character of the sentinel, or the first character as long as it isn't followed by the remaining characters of the sentinel. The advantages of this regex are that it never has to backtrack, and the lookahead is only applied when it's necessary, where the first regex applies it every time.
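Both ideas from this thread can be sketched in Python rather than Java: keeping only what lies between the first opening and the last closing body tag, and the first lookahead regex quoted above. The function names body_content and strip_after_last_body_close are assumptions. The possessive-quantifier variant is left out because it needs a regex engine that supports possessive quantifiers.

```python
import re

# Leftmost <body ...> combined with a greedy .* runs to the LAST </body>.
BODY = re.compile(r"<body\b[^>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)

# The thread's first regex: a </body> that is not followed by another </body.
LAST_CLOSE = re.compile(r"(?is)</body>(?!.*</body).*$")


def body_content(page):
    """Return what is inside the body element, or the page unchanged if none is found."""
    match = BODY.search(page)
    return match.group(1) if match else page


def strip_after_last_body_close(page):
    """Remove the last </body> and everything after it."""
    return LAST_CLOSE.sub("", page)
```

Script and style elements inside the body would still need their own pass, since removing tags alone leaves their contents in place.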

  • Html tags removed when #COLUMN_HEADER# is used in column template

    Hi all,
    I'm using APEX 4.0.2, theme 2 Builder Blue.
    I am trying to add html tags to dynamically generated column headings of a dynamic SQL Report.
    When using a standard report template, the headings contain the html tags. However when I want to use one of the vertical lay-outs all html tags are removed. After some research I found out that when the substitution string #COLUMN_HEADER# is used within the column headings part of the template, the html tags are being preserved. They are removed however when the #COLUMN_HEADER# substitution string is used in the column templates part of the template.
    This is easily testable by using for instance "return htf.bold ('COL01')" as dynamic column header.
    Is this a bug or am I overlooking something? Is there another solution maybe to preserve html tags in the column heading?
    Cheers, Erik

    Web Dynpro appears to use XHTML instead of HTML, so the syntax is a bit more limited and more picky.
    this link explains the difference between the two syntaxes:
    http://reference.sitepoint.com/html/html-xhtml-syntax
    you can test your tags in this validator tool
    http://validator.w3.org
    solved?  have a good week, and holidays.
