Stripping all HTML tags from a CLOB

Hi all,
Running Oracle 9.2.0.8 on AIX...
We have a table which stores HTML document fragments in a CLOB. I have a requirement to convert these to plain text (strip all HTML tags) for sending in a text/plain email body.
I have read the following solution from Tom Kyte's site:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:25695084847068
Basically creating an Oracle text index on the CLOB column and calling ctx_doc.filter with "plaintext" parameter set to true.
I noticed in Tom's example, he uses the default filter, which based on the docs, is NULL_FILTER, which applies no filtering. I have tried his example in my dev box, creating the text index on the CLOB column with no parameters.
The call to ctx_doc.filter did not filter the html at all. I re-created the index and specified the INSO_FILTER and the filtering was done. I was under the impression that INSO_FILTER was for filtering binary content to plaintext...
create table filter ( query_id number, document clob );
create table demo
  ( id            int primary key,
    theclob       clob
  );
create index demo_idx on demo(theClob) indextype is ctxsys.context;
SET DEFINE OFF;
Insert into DEMO
   (ID, THECLOB)
Values
   (1, '<html><body><p>This is a test of <strong>ctx_doc.filter</strong> and plaintext filtering.</p></body></html>');
COMMIT;
exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
The above code does not convert the HTML to plain text...
Now re-create the index with INSO_FILTER:
drop index demo_idx;
create index demo_idx on demo(theClob) indextype is ctxsys.context parameters ('filter ctxsys.inso_filter');
exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
The above scenario returns the string "This is a test of ctx_doc.filter and plaintext filtering."
The Oracle documentation doesn't specify any special filter parameter that needs to be set... just wondering if I'm missing something here... or better yet, if there is a better solution to my problem. ;-)
Thanks
Stephane

The difference between what you did and what Tom Kyte did is that you created your index on a clob column and Tom created his index on a blob column. What I don't know is why that makes a difference. I have demonstrated below with one blob column and one clob column, one index on the blob and one index on the clob, using the same code on both, with different results.
SCOTT@orcl_11gR2> create table filter
  2    (query_id  number,
  3       document  clob)
  4  /
Table created.
SCOTT@orcl_11gR2> create table demo
  2    (id       int primary key,
  3       theblob   blob,
  4       theclob   clob)
  5  /
Table created.
SCOTT@orcl_11gR2> create index demo_blob_idx
  2  on demo (theblob)
  3  indextype is ctxsys.context
  4  /
Index created.
SCOTT@orcl_11gR2> create index demo_clob_idx
  2  on demo (theclob)
  3  indextype is ctxsys.context
  4  /
Index created.
SCOTT@orcl_11gR2> insert into demo values
  2    (1,
  3       utl_raw.cast_to_raw (
  4         '<html>
  5            <body>
  6              <p>
  7             This is a test of
  8             <strong> ctx_doc.filter </strong>
  9             and plaintext filtering.
10              </p>
11            </body>
12          </html>'),
13       '<html>
14          <body>
15            <p>
16              This is a test of
17              <strong> ctx_doc.filter </strong>
18              and plaintext filtering.
19            </p>
20          </body>
21        </html>')
22  /
1 row created.
SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_blob_idx', 1, 'filter', 1, true)
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_clob_idx', 1, 'filter', 2, true)
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> select id, utl_raw.cast_to_varchar2 (theblob), theclob from demo
  2  /
        ID
UTL_RAW.CAST_TO_VARCHAR2(THEBLOB)
THECLOB
         1
<html>
        <body>
          <p>
            This is a test of
            <strong> ctx_doc.filter </strong>
            and plaintext filtering.
          </p>
        </body>
      </html>
<html>
      <body>
        <p>
          This is a test of
          <strong> ctx_doc.filter </strong>
          and plaintext filtering.
        </p>
      </body>
    </html>
1 row selected.
SCOTT@orcl_11gR2> select query_id, document from filter
  2  /
  QUERY_ID
DOCUMENT
         1
This is a test of ctx_doc.filter and plaintext filtering.
         2
<html>
      <body>
        <p>
          This is a test of
          <strong> ctx_doc.filter </strong>
          and plaintext filtering.
        </p>
      </body>
    </html>
2 rows selected.
SCOTT@orcl_11gR2>

Similar Messages

  • How to remove HTML tags from a String ?

    Hello,
    How can I remove all HTML Tags from a String ?
    Would you please give me a simple example?
    Best regards,
    Eric

    Here's some code I cooked up. I have created an object that processes the string, so that it can be incorporated directly into a project. There is some redundancy so that it can be used in more than one way. Depending on your situation you might have to make the loop condition a little more sophisticated to catch stray ">" characters.
    I have also included a Tester application.
    // Removes HTML tags from a String. Either pass the String to the constructor and then
    // call getProcessedString(), or simply call
    // stringWithoutTags = remover.removeHtmlTags(stringWithTags);
    // Note: this code assumes that every "<" is followed by a matching ">" in the proper order.
    public class HtmlTagRemover {
         private String stringSubmission, processedString, stringBeingProcessed;
         private int indexOfTagStart, indexOfTagEnd;
         public HtmlTagRemover() {
         }
         public HtmlTagRemover(String s) {
              removeHtmlTags(s);
         }
         public String removeHtmlTags(String s) {
              stringSubmission = s;
              stringBeingProcessed = stringSubmission;
              removeNextTag();
              return processedString;
         }
         // Removes tags one at a time until no well-ordered "<...>" pair remains.
         private void removeNextTag() {
              checkForNextTag();
              while ((!(indexOfTagStart == -1 || indexOfTagEnd == -1)) && indexOfTagStart < indexOfTagEnd) {
                   removeTag();
                   checkForNextTag();
              }
              processedString = stringBeingProcessed;
         }
         private void checkForNextTag() {
              indexOfTagStart = stringBeingProcessed.indexOf("<");
              indexOfTagEnd = stringBeingProcessed.indexOf(">");
         }
         private void removeTag() {
              StringBuffer sb = new StringBuffer("");
              sb.append(stringBeingProcessed);
              sb.delete(indexOfTagStart, indexOfTagEnd + 1);
              stringBeingProcessed = sb.toString();
         }
         public String getProcessedString() {
              return processedString;
         }
         public String getLastStringSubmission() {
              return stringSubmission;
         }
    }
    // Tester application (goes in its own file, HtmlRemovalTester.java)
    public class HtmlRemovalTester {
         public static void main(String[] args) {
              String output;
              HtmlTagRemover h = new HtmlTagRemover();
              output = "The processed String: " + h.removeHtmlTags("<Html tag>This is a test<another Html tag> string<yet another Html tag>.");
              output = output + "\n" + " The original string:" + h.getLastStringSubmission();
              System.out.print(output);
         }
    }

  • Stripping HTML Tags from a String

    What's the best way to remove html tags from a string (i.e. user input)?

    Can you give an example? You can use substring(); if you're passing values with spaces between pages, you can trim() the variable. Also look at indexOf() and the other methods of java.lang.String.

  • How to avoid HTML tags in report column headings when exporting to CSV

    Hi All,
    How can we avoid HTML tags in report column headings when exporting to Excel?
    We used a heading like Employee<br> Department for a column, but the problem is that the <br> tag is also exported into the CSV file.
    Also, if a column holds data formatted like 3/2009, it is exported as March 2009.
    Please help on this.
    Thanks,
    Nr
    Edited by: pnr on Jul 5, 2011 5:00 AM

    Hi Nr
    Here is how I approached this problem.
    Go to the Report Attributes tab.
    Under Column Attributes, select the PL/SQL radio button (headings type).
    Create a function in your database that returns the report headings, as shown below.
    create function get_heading return clob as
      -- sized generously: the export request value may be longer than 20 characters
      v_request     VARCHAR2(255) := V('REQUEST');
      v_col_heading CLOB;
    begin
      IF INSTR(v_request,'FLOW_EXCEL_OUTPUT',1) > 0 THEN
        -- CSV export: plain headings without markup
        v_col_heading := 'Employee Number:Employee Name';
      ELSE
        -- on-screen report: headings with the <br> line break
        v_col_heading := 'Employee<br>Number:Employee<br>Name';
      END IF;
      return v_col_heading;
    end;
    Then enter the call below under "Function returning colon delimited headings":
    return get_heading;
    Similarly, for the data, base the report on a "PL/SQL function body returning SQL query" and follow the same approach as for the headings.
    Hope this helps.
    Thanks
    Sukarna
    Edited by: user513776 on Jul 5, 2011 2:24 PM
    Edited by: user513776 on Jul 5, 2011 2:27 PM

  • ReReplace all html tags except selected

    Folks,
    I'm trying to figure out how to eliminate all html tags in a
    string except for <img> and <a> tags. Any ideas? I've
    been stumped for several days.
    thanks,
    /r

    Answer from the Regex Advise Forums at this link:
    http://regexadvice.com/forums/AddPost.aspx?PostID=40752
    ======================
    In its simplest form I would suggest that might be:
    <(?!(?:a|img)\s|/a>)[^>]*>
    if CF doesn't like (?:):
    <(?!(a|img)\s|/a>)[^>]*>
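
The first pattern above can be tried out in Java as a rough sketch. The class and method names (KeepLinksAndImages, strip) are made up for illustration and are not part of the original answer. Note that, as written, the pattern keeps only <a> and <img> tags that carry at least one attribute (the tag name must be followed by whitespace), plus the closing </a>; a bare <a> or <img> would be stripped.

```java
import java.util.regex.Pattern;

class KeepLinksAndImages {
    // Pattern quoted in the answer above: match any tag unless it starts with
    // "a" or "img" followed by whitespace, or is exactly "</a>".
    private static final Pattern OTHER_TAGS =
            Pattern.compile("<(?!(?:a|img)\\s|/a>)[^>]*>");

    // Hypothetical helper: removes every tag matched by OTHER_TAGS.
    static String strip(String html) {
        return OTHER_TAGS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"x.html\">this</a> and <img src=\"i.png\"> <b>now</b></p>";
        System.out.println(strip(html));
    }
}
```

The pattern is case-sensitive as written; adding Pattern.CASE_INSENSITIVE would be one way to also keep <A ...> and <IMG ...>.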

  • How to exclude HTML Tags from Excel Reports

    Hi Guys
    Within Project Online - OData extract to Excel
    Has anyone found a way to eliminate the HTML tags from Multi Line Text fields within Project Server? I can easily extract the text and generate nice Excel Reports, but the html tag is very annoying in the Excel Reports and it doesn't read easily.
    Any help would be appreciated.
    Marc Soester [MVP] http://marcsoester.blogspot.com

    Marc, 
    What you could do (if you can find the time and energy to write the lines)
    is replace all (!) HTML tags and entities with Power Query, using a list like the one here: http://stackoverflow.com/questions/14705605/remove-html-tags-from-cell-strings-excel-formula
    That link is one of the Excel UDF/VB-based solutions, which will not refresh in Excel Services, but it has a good list of what to replace.
    The Power Query version would at least refresh over a Power BI subscription.
    -Ville

  • Does Firefox work with all html tags/CSS properties?

    I am considering Firefox because MSIE has been and is becoming more annoying.
    I want a browser that simply implements all html tags and CSS properties.
    I want Firefox to install without screwing with any other application on my computer.
    Possible?

    Sure, no problem installing Firefox alongside other browsers.
    You only need to decide which browser to set as the default browser that is used when you click a link in other programs.
    *http://developer.mozilla.org/en/Mozilla_CSS_support_chart
    *https://developer.mozilla.org/en/HTML

  • How do I set BI Publisher to read html tags from the database?

    How do I set BI Publisher (Release 10.1.3.4) to read html tags from the database? For example if the text is quoted with a bold tag I want my output to display the text in bold. Is there a setting or something I can set?

    I took a look at Tim Dexter's blog as suggested and the sample worked, but for the elements in the xml file not for the value coming from the database, however this is good to know as well!
    I have data in the data base column which looks like this:
    'MS Applied <B(bold tag)> Mathematics</B(bold tag)>University of Southern California'
    I want the data to be rendered like this:
    'MS Applied <B>Mathematics</B> University of Southern California'.
    In Report Builder on the property sheet I would set Contains HTML Tags property to Yes and the report would render correctly.
    In BI Publisher 10.1.3.4 I cannot seem to set it to read this. I have changed the report's configuration properties, setting Character set to HTML and Make HTML output accessible to True. I just can't figure out what I'm missing.
    Thank you for any assistance you can offer.

  • Remove HTML tags from a text area

    Hi, here is my problem:
    I have a form with a text area item; this item is “Display as Editor HTML standard”, so it is possible to enter formatted text with HTML tags. I then save the text in a table, and the column keeps the HTML tags. Afterwards I can put the text in a report and see the formatted text with the HTML tags interpreted.
    But I also need to use that text for other purposes (e.g. sending it in a mail) with the HTML tags removed.
    Is there any way to remove HTML tags from a text item?
    Regards
    Dario

    From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:769425837805
       FUNCTION str_html (line IN VARCHAR2)
          RETURN VARCHAR2
       IS
          x         VARCHAR2 (32767) := NULL;
          in_html   BOOLEAN          := FALSE;
          s         VARCHAR2 (1);
       BEGIN
          IF line IS NULL
          THEN
             RETURN line;
          END IF;
          FOR i IN 1 .. LENGTH (line)
          LOOP
             s := SUBSTR (line, i, 1);
             IF in_html
             THEN
                IF s = '>'
                THEN
                   in_html := FALSE;
                END IF;
             ELSE
                IF s = '<'
                THEN
                   in_html := TRUE;
                END IF;
             END IF;
             IF NOT in_html AND s != '>'
             THEN
                x := x || s;
             END IF;
          END LOOP;
          RETURN x;
       END str_html;
    There's also a regular expression approach that I've not tried: Remove HTML Tags and parse the text out of it
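
For comparison, the regular expression approach mentioned above might look like the following in Java. This is only a sketch under the same assumption that str_html makes, namely that "<" and ">" appear only as tag delimiters. The class and method names are invented for the example. Unlike str_html, it leaves a stray ">" or an unterminated "<..." in place.

```java
import java.util.regex.Pattern;

class RegexTagStripper {
    // Anything from "<" up to the next ">" is treated as a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: returns the input with every "<...>" run removed.
    static String strip(String line) {
        if (line == null) {
            return null; // mirror str_html, which returns NULL input unchanged
        }
        return TAG.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <strong>World</strong>!</p>"));
    }
}
```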

  • Way to remove HTML tags from a page-scoped attribute using JSTL?

    Hi,
    I'm using JSTL 1.2 with Tomcat 6.0.26. Does anyone know of a way to remove HTML tags from a page attribute, "${myExpr}". I would prefer a solution that uses JSTL only, but ultimately whatever gets the job done is fine with me.
    Thanks, - Dave

    I'm sorry, I don't understand your requirement. What do you mean by "remove HTML tags from a page attribute"?
    If you are dealing with a value of an attribute, it is most likely a String, and should be treated as such. The best approach would probably be java coding.

  • How do I strip out all HTML tags except for safe ones I designate?

      // FLAG ALL INCIDENTS OF LEGITIMATE TAGS BY CONVERTING <..> TO <!..!>
      for (int i = 0; i < parseVector.size(); i++) {
       this.content =
        this.content.replaceAll("<(/?)(" + (String)parseVector.elementAt(i) + "[\\s\\t=]+[^>]*)>",
                       "\\<!$1$2!\\>");
      }
    I have a Vector of HTML string text consisting of things like:
    {"i", "b", "u", "blockquote", "font"}
    And within this.content, which contains HTML, I want to strip out all HTML except for certain "safe" tags.
    Problem is, my code fails to do just that.. while <img..> is gone, so is <i> and I want to keep the latter.
    I feel the problem is in my regular expression pattern within this.content.replaceAll() method, but maybe I'm wrong. What do you say?
    Entire code can be found in http://www.myjavaserver.com/~ppowell/HTMLParser.java
    Thanx
    Phil

    Let me give you an example of what I want:
    I understand what you want... I don't see how the code that you posted is supposed to do that.
    The best advice I can give is that a for loop is not part of the equation here. You will need to do it in one regex, I think. Because if what you are doing there is replacing all the tags that aren't <i> in one pass and all the tags that aren't <b> in another pass, guess what is happening?
    On the first pass you don't replace the <i> tags but you do replace the <b> tags. On the next pass you replace <i> tags because they don't match the <b> etc.
    You see?
    So I think you need to do this all in one regex where the starting portion of the tag is NOT one of the set that you want to keep.
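
A single-regex version of that idea could be sketched as follows. This is not the code from the linked HTMLParser.java; the class name, method name and the way the pattern is built from the list of safe tags are assumptions made for illustration. A negative lookahead rejects any tag whose name is in the safe set, so everything else is removed in one pass.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SafeTagFilter {
    // Tag names taken from the question; treat this list as an example only.
    private static final List<String> SAFE =
            Arrays.asList("i", "b", "u", "blockquote", "font");

    // Matches "<...>" unless the tag name (optionally after "/") is a safe
    // name followed by a word boundary, so "<b>" is kept but "<br>" is not.
    private static final Pattern UNSAFE_TAG = Pattern.compile(
            "<(?!/?(?:" + String.join("|", SAFE) + ")\\b)[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: strips every tag that is not in SAFE.
    static String strip(String html) {
        return UNSAFE_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<img src=\"x.gif\"><i>hi</i> <b>there</b><br>"));
    }
}
```

Attributes on the kept tags are left untouched here, so this alone should not be relied on for sanitizing untrusted input.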

  • Strip html tags from string & convert ampersand characters

    hello, i'm converting html into xml, and i need to convert html code & content into xml content, without the html tags ...
    so, for example, I strip this out of an html file:
    <A NAME="b_betreft"></A>STUDIEOPDRACHT "UITBREIDING VIPA NAAR MEERDERE SUBSECTOREN" HERVERDELING VASTLEGGINGS- EN VEREFFENINGSKREDIETEN VAN HET VIPA VOOR HET JAAR 1999 ONTWERPBESLUIT VAN DE VLAAMSE REGERING TOT HERVERDELING VAN BASISALLOCATIES VAN DE BEGROTING VAN HET VLAAMS INFRASTRUCTUURFONDS VOOR PERSOONSGEBONDEN AANGELEGENHEDEN VOOR HET BEGROTINGSJAAR 1999<A NAME="e_betreft">
    and i want to get rid of the "<A NAME="b_betreft"></A>" & "<A NAME="e_betreft">", are there classes that can do this ???
    probably there are, i know in php there are ..., how about java ???
    also i'll need to correct stuff like:
    Financi&euml;le => Financiële
    Comit&eacute; => Comité
    you see, then, i'm done, cool ...
    thanks dudessssss

    hello, i'm converting html into xml, and i need to
    convert html code & content into xml content,
    without the html tags ...
    Why didn't you continue to post in your other thread?
    http://forum.java.sun.com/thread.jspa?threadID=777660
    It's not nice to create multiple threads with the same question.
    Kaj
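
For what it's worth, the two steps asked about in the question (dropping the tags, then turning named entities such as &euml; into real characters) can be sketched in plain Java. Everything here is illustrative: the class name, the method name and the tiny entity table are assumptions, and a real conversion would need the full HTML entity list or a library that provides it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HtmlToText {
    // Deliberately tiny, hypothetical entity table (only the two from the question).
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&euml;", "\u00EB");   // e with diaeresis
        ENTITIES.put("&eacute;", "\u00E9"); // e with acute accent
    }

    // Removes anything that looks like a tag, then replaces known entities.
    static String convert(String html) {
        String text = html.replaceAll("<[^>]*>", "");
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            text = text.replace(e.getKey(), e.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(convert("<A NAME=\"b_betreft\"></A>Financi&euml;le Comit&eacute;<A NAME=\"e_betreft\">"));
    }
}
```

Entities such as &amp; and &lt; are left alone on purpose, since they are also valid in XML content.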

  • How can I eliminate HTML tags from Oracle Text Snippet?

    I perform a search on many tables and on many columns of those tables.
    Some of those columns are VARCHAR2 and some CLOB.
    Also, some of the searchable data are HTML and some are plain text.
    My problem is that ctx_doc.snippet fetches the HTML tags.
    For example I get this, as a snippet result in one of my searches: Qual Germany n1 &lt;p&gt;Test Qual Germany n1&lt;/p&gt;
    I want the result to be fetched without the HTML tags.
    In my index configuration I have used NULL_FILTER and HTML_SECTION_GROUP. With that configuration I managed to eliminate the HTML tags but not in all cases!
    For example:
    I search table CONTENTS columns TITLE(VARCHAR2) and MAIN_TEXT(CLOB)
    I created the following procedure that concatenates the two columns:
    CREATE OR REPLACE PROCEDURE CONTENTS_PROC( p_id in rowid, p_lob IN OUT clob)
    IS
    BEGIN
    FOR c1 IN (SELECT main_text||' '||title data FROM contents WHERE ROWID = p_id)
    LOOP
    dbms_lob.copy( p_lob, c1.data,
    dbms_lob.getlength( c1.data ));
    END LOOP;
    END;
    I created a user Datastore:
    BEGIN
    ctx_ddl.create_preference( 'content_trans_datastore', 'user_datastore' );
    ctx_ddl.set_attribute( 'content_trans_datastore', 'procedure', 'CONTENTS_PROC' );
    END;
    and finally I create the index:
    CREATE INDEX content_trans_ot_idx ON contents(ORACLE_TEXT_COLUMN)
    INDEXTYPE IS ctxsys.CONTEXT PARAMETERS ('datastore content_trans_datastore SYNC(ON COMMIT) STORAGE INDEX_STORAGE filter ctxsys.null_filter section group ctxsys.html_section_group');
    When I perform the search on those data: &lt;p&gt; &lt;strong&gt;Test Doc-Test &lt;/strong&gt; &lt;/p&gt; the snippet I get is: Test Doc-Test.
    That's fine, the html tags are removed!
    In another case I search table NCP columns NAME(VARCHAR2) and BODY(VARCHAR2)
    I created the following procedure that concatenates the two columns:
    CREATE OR REPLACE PROCEDURE NCP_PROC( p_id in rowid, p_lob IN OUT clob)
    IS
    BEGIN
    FOR c1 IN (SELECT name||' '||body data FROM ncp WHERE ROWID = p_id)
    LOOP
    dbms_lob.copy( p_lob, c1.data,
    dbms_lob.getlength( c1.data ));
    END LOOP;
    END;
    I created a user Datastore:
    BEGIN
    ctx_ddl.create_preference( 'ncp_trans_datastore', 'user_datastore' );
    ctx_ddl.set_attribute( 'ncp_trans_datastore', 'procedure', 'NCP_PROC' );
    END;
    and finally I create the index:
    CREATE INDEX ncp_trans_ot_idx ON ncp(ORACLE_TEXT_COLUMN)
    INDEXTYPE IS ctxsys.CONTEXT PARAMETERS('datastore ncp_trans_datastore SYNC(ON COMMIT) STORAGE INDEX_STORAGE filter ctxsys.null_filter section group ctxsys.html_section_group');
    When I perform the search on those data: test &lt;strong&gt; &lt;/strong&gt;http://deleteme.com the snippet I get is: test &lt;strong&gt; &lt;/strong&gt;http://deleteme.com!!!!!!!!!!
    How is this possible? Why in the first case the HTML tags are eliminated and in the second case they are not?
    Thanks,
    Margarita
    Edited by: user13312701 on 07-Sep-2010 08:51

    Doing various tests I found out that the problem is when I need to search in multiple columns of a table.
    That is when I create a user_datastore that uses a procedure that concatenates the columns.
    And especially when the data with the html tags is in a VARCHAR2 column.
    e.g
    --create the table
    CREATE TABLE CONTENT_TRANS (content_trans_id NUMBER,
    main_text CLOB,
    title VARCHAR2(2000),
    oracle_text_column VARCHAR2(1));
    alter table "CONTENT_TRANS" add constraint CONTENT_PK primary key("CONTENT_TRANS_ID") ;
    --Insert dummy data
    Insert into CONTENT_TRANS
    (CONTENT_TRANS_ID,MAIN_TEXT,TITLE)
    values
    (1,'lorem','lorem <p>qualification</p> 2.1 ');
    Insert into CONTENT_TRANS
    (CONTENT_TRANS_ID,MAIN_TEXT,TITLE)
    values
    (2,'lorem','lorem <br>qualification</br> 2.1 ');
    --Create the procedure that concatenates main_text (CLOB) and title (VARCHAR2)
    CREATE OR REPLACE PROCEDURE CONTENT_TRANS_PROC( p_id in rowid, p_lob IN OUT clob)
    IS
    BEGIN
    FOR c1 IN (SELECT main_text||' '||title data FROM content_trans WHERE ROWID = p_id)
    LOOP
    dbms_lob.copy( p_lob, c1.data,
    dbms_lob.getlength( c1.data ));
    END LOOP;
    END;
    --Create the user datastore
    BEGIN
    ctx_ddl.create_preference( 'content_trans_datastore', 'user_datastore' );
    ctx_ddl.set_attribute( 'content_trans_datastore', 'procedure', 'CONTENT_TRANS_PROC' );
    END;
    --Create the index
    CREATE INDEX content_trans_ot_idx ON content_trans(ORACLE_TEXT_COLUMN)
    INDEXTYPE IS ctxsys.CONTEXT PARAMETERS ('datastore content_trans_datastore SYNC(ON COMMIT) filter ctxsys.null_filter section group ctxsys.html_section_group');
    exec ctx_doc.set_key_type('PRIMARY_KEY');
    --Perform the query
    SELECT SCORE(1),ct.content_trans_id, ctx_doc.snippet('content_trans_ot_idx', ct.content_trans_id, 'lorem') as snippet
    from content_trans ct
    where contains(ct.ORACLE_TEXT_COLUMN, 'lorem', 1) > 1;
    Results WITH NOT WANTED HTML TAGS:
    6     1     <b>lorem</b> <b>lorem</b> &lt;p&gt;qualification&lt;/p&gt; 2.1
    6     2     <b>lorem</b> <b>lorem</b> &lt;br&gt;qualification&lt;/br&gt; 2.1
    Edited by: user13312701 on 13-Oct-2010 01:18

  • Need a quicker way to strip most html tag while retaining a few.

    I have to strip the majority of non-plain text (html, javascript, css) from a file whilst retaining the html tags I want.
    I have found many solutions for stripping ALL of the tags but not for stripping most whilst retaining a few.
    I first looked at regular expressions but I could not find a solution that didn't involve specifying all the tags I want removed instead of just listing all the tags I want to keep.
    I had something like this: str.replaceAll("\\<+?(^p|^th|^tr|^td|^h2|^h3|^h4|^li).+?\\>","") but it does not work.
    A regular expression would be great but I could not find one so I coded my own solution.
    The code below goes through each tag (both start and end tags) and strips it from the string if it is not a wanted tag.
    private void stripTags(int startIndex){
              int startArrow = 0;
              int endArrow = 0;
              while(true){
                   startArrow = text.indexOf("<", startIndex);
                   endArrow = text.indexOf(">", startArrow+1);
                   //reached EOF?
                   if(startArrow == -1 || endArrow == -1)   return; // return -1;
                   if(text.substring(startArrow+1, startArrow+2).equals("/")){
                        //deal with the end tag
                        if(isWantedTag(text.substring(startArrow+2, startArrow+4))) {
                             startIndex = endArrow+1;
                        } else {
                             //remove the tag
                             text = text.substring(0, startArrow).concat(text.substring(endArrow+1, text.length()));
                             startIndex = startArrow;
                        }
                   } else {
                        //deal with the start tag
                        if(isWantedTag(text.substring(startArrow+1, startArrow+3))){
                             //remove tag parameters
                             if(endArrow-(startArrow+2) > 1) text = text.substring(0, startArrow+3).concat(text.substring(endArrow, text.length()));
                             startIndex = text.indexOf(">", startArrow+1)+1;
                        } else {
                             //remove the tag
                             text = text.substring(0, startArrow).concat(text.substring(endArrow+1, text.length()));
                             startIndex = startArrow;
                        }
                   }
              }
         }
    private boolean isWantedTag(String tag){
              return (tag.equals("p>") || tag.equals("p ") || tag.equals("th") || tag.equals("tr") || tag.equals("td") || tag.equals("h2") || tag.equals("h3") || tag.equals("h4") || tag.equals("li"));
         }
    The problem is that this code is taking ages to run. It takes about 10 seconds to run this method on a very large html file. I need a solution that takes only a 10th of that time.
    Does anybody have a regExp that could do the job or a more efficient version on my tag stripper above?
    Thanks in advance

    I suggest the following template:
    int copyFrom = 0;
    StringBuilder sb = new StringBuilder();
    while (copyFrom < text.length()) {
        int copyTo = findStartTag(text, copyFrom);
        sb.append(text, copyFrom, copyTo);
        copyFrom = findEndTag(text, copyTo + 1);
    }
    text = sb.toString();
    where findStartTag(String s, int offset) returns the index of the next start tag in s, starting the search at offset. It should return s.length() rather than -1 if nothing is found. findEndTag is similar.
    Sorry, that's obviously wrong since you don't want to remove everything between the tags, just the tags themselves. The basic idea is the same, though:
    int copyFrom = 0;
    StringBuilder sb = new StringBuilder();
    while (copyFrom < text.length()) {
        int copyTo = findStartOfTagToRemove(text, copyFrom);
        sb.append(text, copyFrom, copyTo);
        copyFrom = findEndOfTagToRemove(text, copyTo) + 1;
    }
    text = sb.toString();
    Here, findStartOfTagToRemove finds the position of the '<' character of the next tag to remove,
    and findEndOfTagToRemove finds the position of the '>' character of that tag.
    Edited by: nygaard on Sep 30, 2008 10:39 AM
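
One way to flesh out that template into something runnable is sketched below. The class name, the helper tagName and the way the wanted tags are stored in a set are assumptions for the example, not part of the original posts. The point it illustrates is the single pass: each character is copied into the StringBuilder at most once, instead of rebuilding the whole string with substring/concat for every removed tag.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SelectiveTagStripper {
    // Tags to keep, taken from the question.
    private static final Set<String> WANTED = new HashSet<>(
            Arrays.asList("p", "th", "tr", "td", "h2", "h3", "h4", "li"));

    // Copies text to a StringBuilder in one pass, skipping unwanted tags.
    static String strip(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        int copyFrom = 0;
        while (copyFrom < text.length()) {
            int lt = text.indexOf('<', copyFrom);
            int gt = (lt == -1) ? -1 : text.indexOf('>', lt + 1);
            if (lt == -1 || gt == -1) {
                // No complete tag left: copy the remainder and stop.
                sb.append(text, copyFrom, text.length());
                break;
            }
            sb.append(text, copyFrom, lt);          // text before the tag
            if (WANTED.contains(tagName(text, lt + 1, gt))) {
                sb.append(text, lt, gt + 1);        // keep a wanted tag as-is
            }
            copyFrom = gt + 1;                      // continue after '>'
        }
        return sb.toString();
    }

    // Extracts the lower-cased tag name between '<' and '>', ignoring a leading '/'.
    static String tagName(String text, int from, int to) {
        int i = from;
        if (i < to && text.charAt(i) == '/') {
            i++;
        }
        int start = i;
        while (i < to && Character.isLetterOrDigit(text.charAt(i))) {
            i++;
        }
        return text.substring(start, i).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(strip("<div><p class=\"x\">Hi <b>there</b></p><h2>T</h2></div>"));
    }
}
```

Unlike the code in the question, this sketch leaves the attributes of wanted tags in place; trimming them would be a small change at the point where the wanted tag is appended.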

  • Remove HTML tags from a string

    I have a string that contains a couple of HTML or XHTML tag, for example
    lv_my_string = '<p style="something">Hello <strong>World</strong>!</p>'.
    For a special use case, I want to remove all HTML from that string and process only the plain text
    lv_my_new_string = 'Hello World!'.
    Is there any method, function module, XSLT or anything else for that already?

    Hi Daniel,
    I tried using the FM (SWA_STRING_REMOVE_SUBSTRING) but I guess it is expecting a particular pattern which is not so apparent in your case. I've written a small piece of code which you can try using in a FM or a PERFORM and that should do the trick. Please let me know if you have any questions.
    PARAMETER: P_LINE(100).
    TYPES: BEGIN OF TY_LINE,
             LINE(100),
           END OF TY_LINE.
    DATA: T_LINE TYPE STANDARD TABLE OF TY_LINE,
          WA_LINE LIKE LINE OF T_LINE.
    DATA: W_LINE(100),
          W_LEN(100),
          W_COUNT TYPE I,
          W_FLAG,
          W_FLAG1,
          W_I TYPE I.
    W_COUNT = STRLEN( P_LINE ).
    DO W_COUNT TIMES.
      IF P_LINE+W_I(1) = '<'.
        W_FLAG = 1.
        W_I = W_I + 1.
        IF NOT WA_LINE-LINE IS INITIAL.
          APPEND WA_LINE-LINE TO T_LINE.
          CLEAR WA_LINE.
        ENDIF.
        CONTINUE.
      ELSEIF P_LINE+W_I(1) = '>'.
        W_FLAG = 0.
        W_I = W_I + 1.
        CONTINUE.
      ENDIF.
      IF W_FLAG = 1.
        W_I = W_I + 1.
        CONTINUE.
      ELSE.
        CONCATENATE WA_LINE-LINE P_LINE+W_I(1) INTO WA_LINE-LINE.
        W_I = W_I + 1.
      ENDIF.
    ENDDO.
    LOOP AT T_LINE INTO WA_LINE.
      CONCATENATE W_LINE WA_LINE-LINE INTO W_LINE SEPARATED BY SPACE.
    ENDLOOP.
    SHIFT W_LINE LEFT DELETING LEADING SPACE.
    WRITE: W_LINE.
    Input:
    <p style="something">Hello <strong>World</strong>!</p>
    Output:
    HELLO WORLD !
    Regards,
    Pritam
