Fastest way to remove html tags except url in href from string using java

Hi All,
Please suggest the fastest way to remove (strip) HTML tags from a string using Java, keeping only the URL in the href of each anchor tag.
I currently use a parser, but it takes too long to remove the HTML tags from the string read from a file.
I want the program to process a 2 KB file in about 1 millisecond.
Please help me out... Thanks in advance

Hi,
How can I use jsoup to replace each anchor tag in a string with the URL in the href of that anchor tag?
e.g.
<code>
Suppose the input text is:
test, string using, dsfg, 1:14 PM, <a target="_ablank" style="color: red" href="http://testurl.com/index.jsp?a=1234">support</a>, schedular tag, <a target="_vbblank" style="color: green" href="http://testurlgreen.com/index.jsp?a=asdfasdf4">support req</a> asdf pqr
Then the output text should be:
test, string using, dsfg, 1:14 PM, http://testurl.com/index.jsp?a=1234, schedular tag, http://testurlgreen.com/index.jsp?a=asdfasdf4 asdf pqr
</code>
Please help at the earliest..
Thanks in advance
as this text editor is not supporting html anchor tag the example is not displaying correctly
Edited by: 976815 on Dec 17, 2012 5:17 AM
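
A rough sketch of one way to get the output described in the example using only `java.util.regex` from the JDK instead of jsoup. The class and method names (`HrefKeepingStripper`, `strip`) are made up for illustration, and the code assumes reasonably well-formed markup with quoted href values; regular expressions are not a full HTML parser. The patterns are compiled once and reused, but whether this reaches 1 ms for a 2 KB input has not been measured here.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of jsoup or any library.
final class HrefKeepingStripper {

    // An <a ...>...</a> element with a quoted href; group 2 captures the URL.
    private static final Pattern ANCHOR = Pattern.compile(
            "(?is)<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1[^>]*>.*?</a>");

    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String strip(String html) {
        // Replace each whole anchor element (tags and link text) with its href value.
        String withUrls = ANCHOR.matcher(html).replaceAll("$2");
        // Drop every other tag, keeping the text between tags.
        return TAG.matcher(withUrls).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "1:14 PM, <a target=\"_blank\" href=\"http://testurl.com/index.jsp?a=1234\">support</a>, <b>tag</b>";
        System.out.println(strip(in));
    }
}
```

If the input can contain anchors without an href, or unquoted attribute values, the `ANCHOR` pattern will not match them and they fall through to the plain tag removal, which keeps their link text.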

Similar Messages

  • Way to remove HTML tags from a page-scoped attribute using JSTL?

    Hi,
    I'm using JSTL 1.2 with Tomcat 6.0.26. Does anyone know of a way to remove HTML tags from a page attribute, "${myExpr}"? I would prefer a solution that uses JSTL only, but ultimately whatever gets the job done is fine with me.
    Thanks, - Dave

    I'm sorry, I don't understand your requirement. What do you mean by "remove HTML tags from a page attribute"?
    If you are dealing with a value of an attribute, it is most likely a String, and should be treated as such. The best approach would probably be java coding.

  • Is there a way to remove html tags

    I have a column which I populated using a Textarea with HTML Editor. The problem is that I now need to extract the contents of that column as plain text, without any HTML tags, to be used outside of htmldb. Is there a PL/SQL function I can call which will do this?
    Thanks,
    Ben

    That is brilliant! Didn't realize Oracle Text could be used for such beneficial side-effects like this!
    Thanks a lot

  • Removing html tags

    Hi there,
    I have two questions:
    1.) How can I check whether a string contains HTML?
    2.) Does anybody know how to remove HTML tags except some special ones (<a> ..)?
    thanx, jules

    You could do it the lazy way:
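    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    String string = "bla bla bla <a href='sdfd'> asdffd <b>";
    // [^>]* stops at the first '>', so each tag is matched separately
    Pattern pattern = Pattern.compile("<[^>]*>");
    Matcher matcher = pattern.matcher(string);
    while (matcher.find()) {
       String tag = matcher.group();   // "<a href='sdfd'>", then "<b>"
    }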

  • Remove HTML tags from a text area

    Hi, here is my problem:
    I have a form with a text area item; this item is “Display as Editor HTML standard”, so it is possible to enter text formatted with HTML tags. I then save the text in a table, where the column keeps the HTML tags. Afterwards I can put the text in a report and see the formatted text with the HTML tags interpreted.
    But I also need to use that text for other purposes (e.g. sending it in a mail) with the HTML tags removed.
    Is there any way to remove HTML tags from a text item?
    Regards
    Dario

    From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:769425837805
       FUNCTION str_html (line IN VARCHAR2)
          RETURN VARCHAR2
       IS
          x         VARCHAR2 (32767) := NULL;
          in_html   BOOLEAN          := FALSE;
          s         VARCHAR2 (1);
       BEGIN
          IF line IS NULL
          THEN
             RETURN line;
          END IF;
          FOR i IN 1 .. LENGTH (line)
          LOOP
             s := SUBSTR (line, i, 1);
             IF in_html
             THEN
                IF s = '>'
                THEN
                   in_html := FALSE;
                END IF;
             ELSE
                IF s = '<'
                THEN
                   in_html := TRUE;
                END IF;
             END IF;
             IF NOT in_html AND s != '>'
             THEN
                x := x || s;
             END IF;
          END LOOP;
          RETURN x;
       END str_html;
    There's also a regular expression approach that I've not tried: "Remove HTML Tags and parse the text out of it".
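
For readers working in Java rather than PL/SQL, the same character-by-character idea can be sketched as below. This is only an illustrative port of `str_html`; the names `StrHtml` and `strHtml` are invented here. Like the function above, it keeps a flag that is set between `<` and `>`, and it also drops a stray `>` that appears outside a tag.

```java
// Hypothetical Java port of the PL/SQL str_html function, for illustration only.
final class StrHtml {

    static String strHtml(String line) {
        if (line == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(line.length());
        boolean inHtml = false; // true while the scan is between '<' and '>'
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inHtml) {
                if (c == '>') {
                    inHtml = false;
                }
            } else if (c == '<') {
                inHtml = true;
            }
            // Keep the character only when outside a tag; '>' itself is never kept.
            if (!inHtml && c != '>') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strHtml("This is a test<br><p> and another test"));
    }
}
```

The scan touches each character once and appends to a single `StringBuilder`, so it avoids the repeated string copying of approaches that delete one tag at a time.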

  • How to remove HTML tags from a String ?

    Hello,
    How can I remove all HTML tags from a String?
    Would you please give me a simple example?
    Best regards,
    Eric

    Here's some code I cooked up. I have created an object that processes code so that it can be incorporated directly into a project. There is some redundancy so that it can be used in more than one way. Depending on your situation you might have to make the condition statement a little more sophisticated to catch stray ">" tags.
    I have also included a Tester application.
    //This removes Html tags from a String either by submitting the String during construction and then
    // calling getProcessedString() or
    // by simply calling " stringwithoutTags=removeHtmlTags(stringWithTagsSubmission); "
    //Note: This code assumes that all"<" tags are accompanied by a ">" tag in the proper order.
    public class HtmlTagRemover {
         private String stringSubmission,processedString,stringBeingProcessed;
         private int indexOfTagStart,indexOfTagEnd;
         public HtmlTagRemover() {
         }
         public HtmlTagRemover(String s) {
              removeHtmlTags(s);
         }
         public String removeHtmlTags(String s) {
              stringSubmission=s;
              stringBeingProcessed=stringSubmission;
              removeNextTag();
              return processedString;
         }
         // Removes tags one at a time until no well-ordered "<...>" pair is left.
         private void removeNextTag() {
              checkForNextTag();
              while((!(indexOfTagStart==-1||indexOfTagEnd==-1))&&(indexOfTagStart<indexOfTagEnd)) {
                   removeTag();
                   checkForNextTag();
              }
              processedString=stringBeingProcessed;
         }
         // Locates the first "<" and the first ">" in the remaining text.
         private void checkForNextTag() {
              indexOfTagStart=stringBeingProcessed.indexOf("<");
              indexOfTagEnd=stringBeingProcessed.indexOf(">");
         }
         // Deletes everything from the "<" through the ">" inclusive.
         private void removeTag() {
              StringBuilder sb=new StringBuilder(stringBeingProcessed);
              sb.delete(indexOfTagStart,indexOfTagEnd+1);
              stringBeingProcessed=sb.toString();
         }
         public String getProcessedString() {
              return processedString;
         }
         public String getLastStringSubmission() {
              return stringSubmission;
         }
    }
    // Tester application (goes in its own file, HtmlRemovalTester.java)
    public class HtmlRemovalTester {
         public static void main(String[] args) {
              String output;
              HtmlTagRemover h=new HtmlTagRemover();
              output="The processed String: "+h.removeHtmlTags("<Html tag>This is a test<another Html tag> string<yet another Html tag>.");
              output=output+"\n"+" The original string:"+h.getLastStringSubmission();
              System.out.print(output);
         }
    }

  • How to remove html-tags from a text.

    Hello!
    I have a text field from which I want to remove HTML tags.
    Example:
    "This is a test<br><p> and another test"
    The function must return the same text, but without the HTML
    tags <br> and <p> (in this case).
    Can anybody help me with this little problem?
    Thanks in advance for any help :-)
    Best regards
    Kjetil Klxve

    You can wait for some kind person to post a complete code
    solution... But if you want to fix this yourself (which is good
    for the soul) here are some hints:
    - You can use SUBSTR to get at chunks of text
    - You can use INSTR to find particular characters.
    - You can use INSTR as an argument of SUBSTR
    Hence:
    bit_of_text := SUBSTR(text, 1, INSTR(text, '<') - 1);
    chopped_text := SUBSTR(text, INSTR(text, '>') + 1);
    bit_of_text := bit_of_text||SUBSTR(chopped_text, 1,
    INSTR(chopped_text, '<') - 1);
    will give you the first bit of text that doesn't contain any
    angle brackets.
    From this you should be able to work out how to turn this into
    a function (you'll need to store the offsets and use them in a
    loop construct).
    Note that this assumes that the text only contains the '<'
    character when it's part of a HTML tag. If you can't guarantee
    this then you'll have to explicitly search for all the tags e.g.
    bit_of_text := SUBSTR(text, 1, INSTR(lower(text), '<p>') - 1);
    bit_of_text := SUBSTR(text, 1, INSTR(lower(text), '<br>') - 1);
    This will be a bit of pain. And completely rules out XML!
    rgds APC

  • I work for the Los Angeles Unified School District. I am looking for a way to remove or hide access to the settings on iPads using Apple Configurator (V1.7.1). Can anyone assist?

    I work for the Los Angeles Unified School District. I am looking for a way to remove or hide access to the settings on iPads using Apple Configurator (V1.7.1). Can anyone assist?

    A similar question came up yesterday. One of the responders posted this:
    Consider DEP, Device Enrollment Program. This will establish the company as the owner of the device.  It will lock an MDM to the device which in turn will lock profiles to the device.
    Quick overview of zero-touch MDM enrollment, DEP
    http://www.apple.com/education/it/dep/
    "This document offers guidance on some important considerations for getting the most out of your iOS deployment." Covers: Prepare your infrastructure.  Set up devices.  Configure and manage devices.  Deploy apps and content.  Plan for support.
    https://www.apple.com/ipad/business/docs/iOS_Enterprise_Deployment_Overview_EN_Feb14.pdf

  • Hello! I have a MacBookPro5,1. I recently had to erase the disk and start over. So right now I have OS X. what is the best(cheapest/fastest) way to get current on my OS to be able to use current websites and programs? thanks!

    Hello! I have a MacBookPro5,1. I recently had to erase the disk and start over. So right now I have OS X 10.5.5. what is the best(cheapest/fastest) way to get current on my OS to be able to use current websites and programs? thanks!
    Right now there is pretty much nothing on it, and I can't download basic things.

    Upgrading to Snow Leopard
    You can purchase Snow Leopard through the Apple Store: Mac OS X 10.6 Snow Leopard - Apple Store (U.S.). The price is $19.99 plus tax. You will be sent physical media by mail after placing your order.
    After you install Snow Leopard you will have to download and install the Mac OS X 10.6.8 Update Combo v1.1 to update Snow Leopard to 10.6.8 and give you access to the App Store. Access to the App Store enables you to download Mavericks if your computer meets the requirements.
         Snow Leopard General Requirements
           1. Mac computer with an Intel processor
           2. 1GB of memory
           3. 5GB of available disk space
           4. DVD drive for installation
           5. Some features require a compatible Internet service provider;
               fees may apply.
           6. Some features require Apple’s iCloud services; fees and
               terms apply.
    Upgrading to Mavericks
    You can upgrade to Mavericks from Lion or directly from Snow Leopard. Mavericks can be downloaded from the Mac App Store for FREE.
    Upgrading to Mavericks
    To upgrade to Mavericks you must have Snow Leopard 10.6.8 or Lion installed. Download Mavericks from the App Store. Sign in using your Apple ID. Mavericks is free. The file is quite large, over 5 GBs, so allow some time to download. It would be preferable to use Ethernet because it is nearly four times faster than wireless.
        OS X Mavericks- System Requirements
          Macs that can be upgraded to OS X Mavericks
             1. iMac (Mid 2007 or newer) - Model Identifier 7,1 or later
             2. MacBook (Late 2008 Aluminum, or Early 2009 or newer) - Model Identifier 5,1 or later
             3. MacBook Pro (Mid/Late 2007 or newer) - Model Identifier 3,1 or later
             4. MacBook Air (Late 2008 or newer) - Model Identifier 2,1 or later
             5. Mac mini (Early 2009 or newer) - Model Identifier 3,1 or later
             6. Mac Pro (Early 2008 or newer) - Model Identifier 3,1 or later
             7. Xserve (Early 2009) - Model Identifier 3,1 or later
    To find the model identifier open System Profiler in the Utilities folder. It's displayed in the panel on the right.
         Are my applications compatible?
             See App Compatibility Table - RoaringApps.

  • Problem removing html tags from the text retrived

    Hi there,
    I am using JDBC to connect to the database and retrieve the data. In one of the columns, a few of the records contain HTML tags along with the description. Is there a way to retrieve only the text, ignoring the HTML tags in between? Or can I retrieve the data and then strip the HTML from the text so that only plain text is displayed?
    Example of the retrieved data, which is pipe-separated; one of the columns has HTML tags in it:
    209|The euphoria |187945-2|http://www.abc/lst.jsp?mktgChannel=I86023&sku=18791-2&siteID=qpF0HYnRugA|http://www.abc.com/assets/images/product/medium/18793-2_198.jpg|Rooftop Singers: Walk Right In | abc Music proudly presents THE FOLK YEARS, an unforgettable era in music history!<BR><BR><B>Featuring:</B><BR>
    <LI>The most complete collection of folk and folk-rock songs ever put together -- 132 classics!
    <LI>Original hits by the original artists!
    Now I need to remove the tags before displaying this in the output. Is there a simple way to do this?
    Thanks...

    Did you read the documentation of the trim() method,
    where it describes which whitespace it removes?
    I believe his problem is that
    "Some text here  
    <blah> 
    More text"
    becomes
    "Some text here  
    More text"
    ... and he wants ...
    "Some text here
    More text"
    So, your problem is that your regex isn't matching whitespace as well.
    See the "Trimming Whitespace" section:
    http://www.regular-expressions.info/examples.html
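
As a minimal sketch of that idea in Java: remove the tags first, then trim trailing whitespace on every line and collapse the blank lines left behind. The class and method names (`TagAndWhitespaceStripper`, `strip`) are invented for this example, and it assumes lines are separated by `\n`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class TagAndWhitespaceStripper {

    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Spaces or tabs directly before the end of a line (MULTILINE makes $ match at each line end).
    private static final Pattern TRAILING_SPACE = Pattern.compile("(?m)[ \\t]+$");
    // Two or more consecutive newlines, i.e. one or more empty lines.
    private static final Pattern BLANK_LINES = Pattern.compile("\\n{2,}");

    static String strip(String input) {
        String noTags = TAG.matcher(input).replaceAll("");
        String noTrailing = TRAILING_SPACE.matcher(noTags).replaceAll("");
        return BLANK_LINES.matcher(noTrailing).replaceAll("\n");
    }

    public static void main(String[] args) {
        System.out.println(strip("Some text here  \n<blah> \nMore text"));
    }
}
```

Note that collapsing blank lines also removes empty lines that were intentional in the original text; skip the last step if paragraph breaks need to survive.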

  • How to remove html tags from a column

    Hi
    Problem is this: I get a column with a comma separated list of id's and I can successfully parse these id's and use them elsewhere. BUT, occasionally there are html tags within that id list like this:
    1082471,1237423<br xmlns="http://www.w3.org/1999/xhtml" />
    Is there a way to just automatically remove all tags from a column? Could do this with regex, but since there is no support, I don't know what to do.

    Hi,
    If the HTML can be detected by a starting symbol like „<“, then you could use the following:
    Unfortunately the operation “ReplaceRange” is only available at the text level, so you have to invoke a function (at least to my knowledge). You also need an index column in your table, so if you don’t have one, you need to create it as well.
    This is your function:
    let
       fnRemoveHTML = (Value, Index) =>
    let
       Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
       IndeNo = Index,
       Value_ = Source{IndeNo-1}[Value],
       length = Text.Length(Text.From(Value_)),
       position = Text.PositionOf(Text.From(Value_), "<"),
       range = length-position,
       new= if Value_ is number then Value_ else Text.ReplaceRange(Value_, position, range, "")
    in
        new
    in
      fnRemoveHTML
    And this is how you invoke it:
    let
        Quelle = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
        Last = Table.AddColumn(Quelle, "Custom", each fn_RemoveHTML([Value], [Index])),
        ChangedType = Table.TransformColumnTypes(Last,{{"Custom", type number}})
    in
        ChangedType
    Provided your table is called “Tabelle1” & the column with your values to be replaced “Value” & your index-col “Index”
    Imke

  • ReReplace all html tags except selected

    Folks,
    I'm trying to figure out how to eliminate all html tags in a
    string except for <img> and <a> tags. Any ideas? I've
    been stumped for several days.
    thanks,
    /r

    Answer from the Regex Advise Forums at this link:
    http://regexadvice.com/forums/AddPost.aspx?PostID=40752
    ======================
    In its simplest form I would suggest that might be:
    <(?!(?:a|img)\s|/a>)[^>]*>
    if CF doesn't like (?:):
    <(?!(a|img)\s|/a>)[^>]*>
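
For comparison, here is how the first pattern might be applied in Java rather than ColdFusion. This is a sketch only: the class name `KeepAnchorsAndImages` and method `strip` are invented, and the `(?i)` flag is an addition so that upper-case tags are handled too. Because the lookahead requires whitespace after the tag name, a bare `<a>` with no attributes is still removed.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class KeepAnchorsAndImages {

    // Matches any tag except "<a ...>", "<img ...>" and "</a>".
    private static final Pattern OTHER_TAGS =
            Pattern.compile("(?i)<(?!(?:a|img)\\s|/a>)[^>]*>");

    static String strip(String html) {
        return OTHER_TAGS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hi <a href=\"x\">link</a> <img src=\"y\"> <b>bold</b></p>"));
    }
}
```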

  • Remove html tags and retrieve the data on the page

    hi,
    i want some help regarding removal of all the html tags and save the text that is on that page... i am relatively new to java and dont know how to go about this problem.
    can someone plz help me out

    > hi yeah i know that there are too many posts of this
    kind....but no1 gives a solid code or idea of how to
    remove the tags.... and i being a newbie dont get wat
    they want to say...... so plz help me out here guyz
    Write in clear, grammatical, correctly-spelled language
    We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding (often enough to bet on, anyway). Answering questions for careless and sloppy thinkers is not rewarding; we'd rather spend our time elsewhere.
    So expressing your question clearly and well is important. If you can't be bothered to do that, we can't be bothered to pay attention. Spend the extra effort to polish your language. It doesn't have to be stiff or formal -- in fact, hacker culture values informal, slangy and humorous language used with precision. But it has to be precise; there has to be some indication that you're thinking and paying attention.
    Spell, punctuate, and capitalize correctly. Don't confuse "its" with "it's", "loose" with "lose", or "discrete" with "discreet". Don't TYPE IN ALL CAPS; this is read as shouting and considered rude. (All-smalls is only slightly less annoying, as it's difficult to read. Alan Cox can get away with it, but you can't.)
    More generally, if you write like a semi-literate boob you will very likely be ignored. Writing like a l33t script kiddie hax0r is the absolute kiss of death and guarantees you will receive nothing but stony silence (or, at best, a heaping helping of scorn and sarcasm) in return.
    If you are asking questions in a forum that does not use your native language, you will get a limited amount of slack for spelling and grammar errors -- but no extra slack at all for laziness (and yes, we can usually spot that difference). Also, unless you know what your respondent's languages are, write in English. Busy hackers tend to simply flush questions in languages they don't understand, and English is the working language of the Internet. By writing in English you minimize your chances that your question will be discarded unread.
    Best of luck.
    ~

  • Remove HTML tags without REPLACE

    Hello,
    I have a simple query that returns some data, but the result can contain HTML tags. I don't want to keep using REPLACE because sometimes I receive a tag that is not covered by the REPLACE calls. Is there a way to do that, maybe using some XML code?
    Thank you

    With a CTE, like this:
    ALTER FUNCTION dbo.fnHtmlStripFreeLight
    (
        @inputStringCleaned VARCHAR(MAX)
    )
    RETURNS @freeHtml TABLE
    (
        col1 VARCHAR(MAX)
    )
    AS
    BEGIN
        -- To clean several strings in one call, add further parameters, result
        -- columns, and Stream/isShow columns following the same pattern.
        DECLARE @output VARCHAR(MAX) = ''
        DECLARE @max_recursion INT
        SELECT @max_recursion = MAX(dim)
        FROM (SELECT dim
              FROM (VALUES (LEN(@inputStringCleaned))) AS L (dim)
             ) AS Q
        ;WITH Split
        AS (
            SELECT
                1 AS stpos,
                SUBSTRING(@inputStringCleaned, 1, 1) AS Stream1,
                SUBSTRING(@inputStringCleaned, 2, 1) AS StreamNext,
                CASE WHEN SUBSTRING(@inputStringCleaned, 1, 1) IN ('<','>') THEN 0 ELSE 1 END AS isShow1
            UNION ALL
            SELECT
                stpos + 1,        -- first stream
                StreamNext,
                SUBSTRING(@inputStringCleaned, stpos + 2, 1),
                CASE WHEN StreamNext IN ('<','>') THEN 0
                     WHEN Stream1 IN ('>') AND isShow1 = 0 THEN 1
                     WHEN Stream1 NOT IN ('<','>') AND isShow1 = 1 THEN 1 ELSE 0 END
            FROM Split
            WHERE stpos <= @max_recursion
        )
        SELECT @output += CASE WHEN isShow1 = 1 THEN Stream1 ELSE '' END FROM Split OPTION (MAXRECURSION 8001)
        SELECT @output = REPLACE(@output, '&nbsp;', '')
        INSERT INTO @freeHtml
        SELECT CASE WHEN @inputStringCleaned IS NULL THEN CAST(NULL AS VARCHAR) ELSE @output END
        RETURN
    END

  • Remove HTML tags in text

    Hi,
    I have to read some text from a text editor, that can be formatted for example with Bold, which means that when I execute the function to read its content, it returns something like this:
    Do you know how can I remove these HTML tags from the text?
    Thanks in advance.
    Regards,
    Sónia Gonçalves

    Hi,
    Something like this should do the trick.
    report  ztag.
    data: v_data type char30 value '<H>blablabla</H>'.
    if v_data(1) = '<' and
      v_data cs '>'.
    * Remove the HTML opening header
      shift v_data left up to '>'.
      shift v_data left.
    * Remove the HTML closing header
      shift v_data right up to '<'.
      shift v_data right.
      shift v_data left deleting leading space.
    endif.
    write: / v_data.
    Regards,
    Darren
