How do I strip out all HTML tags except for safe ones I designate?

  // FLAG ALL INCIDENTS OF LEGITIMATE TAGS BY CONVERTING <..> TO <!..!>
  for (int i = 0; i < parseVector.size(); i++) {
    this.content =
      this.content.replaceAll("<(/?)(" + (String)parseVector.elementAt(i) + "[\\s\\t=]+[^>]*)>",
                              "\\<!$1$2!\\>");
  }

I have a Vector of HTML tag names as strings, such as:
{"i", "b", "u", "blockquote", "font"}
And within this.content, which contains HTML, I want to strip out all HTML except for certain "safe" tags.
The problem is that my code doesn't do that: while <img..> is gone, so is <i>, and I want to keep the latter.
I feel the problem is in my regular expression pattern within this.content.replaceAll() method, but maybe I'm wrong. What do you say?
Entire code can be found in http://www.myjavaserver.com/~ppowell/HTMLParser.java
Thanks
Phil
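One likely cause, offered as a reading of the posted pattern rather than a confirmed diagnosis: "[\\s\\t=]+" requires at least one whitespace or '=' character directly after the tag name, so a bare <i> or </i> never matches and never gets flagged, while <font size=2> does. The sketch below runs the original pattern and an adjusted one side by side. SafeTagFlagDemo, flag() and the two pattern helpers are hypothetical names, and a List<String> stands in for the Vector. The adjusted pattern makes the attribute part optional but still requires it to begin with whitespace or '=', so "b" cannot also flag <br> or <blockquote>.

```java
import java.util.Arrays;
import java.util.List;

public class SafeTagFlagDemo {

    // Pattern from the question: "[\\s\\t=]+" demands at least one whitespace
    // or '=' after the tag name, so a bare <i> or </i> can never match.
    static String originalPattern(String tag) {
        return "<(/?)(" + tag + "[\\s\\t=]+[^>]*)>";
    }

    // Hypothetical adjustment: the attribute part is optional, but if present it
    // must start with whitespace or '=', so "b" does not match <br> or <blockquote>.
    static String fixedPattern(String tag) {
        return "<(/?)(" + tag + "(?:[\\s=][^>]*)?)>";
    }

    // Flags each safe tag as <!...!>, one replaceAll per tag as in the question.
    static String flag(String html, List<String> safeTags, boolean useFixed) {
        String result = html;
        for (String tag : safeTags) {
            String regex = useFixed ? fixedPattern(tag) : originalPattern(tag);
            // '<' and '>' need no escaping in a replacement; only '$' and '\' are special.
            result = result.replaceAll(regex, "<!$1$2!>");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> safe = Arrays.asList("i", "b", "u", "blockquote", "font");
        String html = "<i>hi</i> <font size=2>x</font> <img src=a.gif>";
        // prints "<i>hi</i> <!font size=2!>x</font> <img src=a.gif>"
        System.out.println(flag(html, safe, false));
        // prints "<!i!>hi<!/i!> <!font size=2!>x<!/font!> <img src=a.gif>"
        System.out.println(flag(html, safe, true));
    }
}
```

The \t inside the original character class is redundant, because \s already matches a tab.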

Let me give you an example of what I want:
I understand what you want... I don't see how the code that you posted is supposed to do that.
The best advice I can give is that a for loop is not part of the solution here; I think you will need to do it in one regex. If what you are doing is replacing all the tags that aren't <i> on one pass of the loop, and all the tags that aren't <b> on the next pass, guess what happens?
On the first pass you don't replace the <i> tags, but you do replace the <b> tags. On the next pass you replace the <i> tags, because they don't match <b>, and so on.
You see?
So I think you need to do this all in one regex that matches only tags whose name is NOT one of the set you want to keep.
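A minimal sketch of that single-regex approach, assuming the whitelist is a List<String> of bare tag names (SafeTagStripper and its methods are hypothetical names): a negative lookahead right after '<' skips any tag whose name, after an optional '/', is on the list, and every other tag is replaced with an empty string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SafeTagStripper {

    // One pattern matching any tag whose name (after an optional '/') is NOT
    // whitelisted. "/?" sits inside the lookahead so </i> is protected too.
    static Pattern buildStripPattern(List<String> safeTags) {
        String alternation = safeTags.stream()
                .map(Pattern::quote)               // treat tag names literally
                .collect(Collectors.joining("|"));
        // \b stops "b" from also whitelisting <br> or <body>.
        return Pattern.compile("<(?!/?(?:" + alternation + ")\\b)[^>]*>",
                Pattern.CASE_INSENSITIVE);
    }

    static String stripUnsafeTags(String html, List<String> safeTags) {
        return buildStripPattern(safeTags).matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        List<String> safe = Arrays.asList("i", "b", "u", "blockquote", "font");
        String html = "<p><i>Hi</i> <img src=\"x.gif\"><br><b>there</b></p>";
        System.out.println(stripUnsafeTags(html, safe)); // prints "<i>Hi</i> <b>there</b>"
    }
}
```

This is still a regex, not an HTML parser: attributes on kept tags survive untouched (an onmouseover handler on <font>, for example), and a '>' inside an attribute value or comment ends the match early.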

Similar Messages

  • ReReplace all html tags except selected

    Folks,
    I'm trying to figure out how to eliminate all HTML tags in a string except for <img> and <a> tags. Any ideas? I've been stumped for several days.
    thanks,
    /r

    Answer from the RegexAdvice forums at this link:
    http://regexadvice.com/forums/AddPost.aspx?PostID=40752
    ======================
    In its simplest form, I would suggest:
    <(?!(?:a|img)\s|/a>)[^>]*>
    If CF doesn't support the non-capturing group (?:...), use:
    <(?!(a|img)\s|/a>)[^>]*>
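For comparison with the Java thread above, here is how the first pattern might look in Java (a sketch; KeepLinksAndImages is a hypothetical class name). The only change is doubling the backslash in \s for a Java string literal. It keeps <a ...> and <img ...> tags that have at least one attribute, plus </a>, and strips every other tag.

```java
import java.util.regex.Pattern;

public class KeepLinksAndImages {

    // Matches any tag EXCEPT "<a" or "<img" followed by whitespace, and "</a>".
    private static final Pattern NOT_A_OR_IMG =
            Pattern.compile("<(?!(?:a|img)\\s|/a>)[^>]*>");

    static String strip(String html) {
        return NOT_A_OR_IMG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: Go <a href="x">here</a> <img src="y.png">
        System.out.println(strip("<p>Go <a href=\"x\">here</a> <img src=\"y.png\"><br></p>"));
    }
}
```

Because \s is required after "a" or "img", a bare <a> with no attributes is stripped, and so is <abbr>. Java matching is case-sensitive by default; pass Pattern.CASE_INSENSITIVE to Pattern.compile if <A HREF=...> should also be kept.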

  • Blocking all MAC addresses except for the ones you allow

    I have a Cisco Aironet 1200 Access Point. I want to block all MAC addresses from accessing the access point, except for the ones I've allowed. First I went to the Address Filters page and clicked on Allowed, then listed all the MAC addresses I want to be able to access the access point. Then I went to the Ethernet Advanced page, and set the Default Multicast Address Filter to Disallowed, and the Default Unicast Address Filter to Disallowed. Then I went to the AP Radio: Internal Advanced page, clicked on the Advanced Primary SSID Setup link, and set the Default Unicast Address Filter to Disallowed. Accept Authentication Type is set to Open with Shared and Network-EAP cleared, and the Require EAP check boxes are all cleared.
    When using a computer whose MAC address is not listed on the Address Filters page, I am still able to connect to the network through the access point. I am also able to connect to the access point from any pc on my network by entering its IP address in Internet Explorer.
    What do I need to do to block any pc without a listed MAC address from connecting to the access point?
    Thanks, Jeff

    Here are the instructions on how to create a MAC-based filter:
    Follow these steps to create a MAC address filter:
    Step 1 Follow the link path to the Address Filters page.
    Step 2 Type a destination MAC address in the New MAC Address Filter: Dest MAC Address field. You can type the address with colons separating the character pairs (00:40:96:12:34:56, for example) or without any intervening characters (004096123456, for example).
    Note: If you plan to disallow traffic to all MAC addresses except those you specify as allowed, put your own MAC address in the list of allowed MAC addresses. If you plan to disallow multicast traffic, add the broadcast MAC address (ffffffffffff) to the list of allowed addresses.
    Step 3 Click Allowed to pass traffic to the MAC address or click Disallowed to discard traffic to the MAC address.
    Step 4 Click Add. The MAC address appears in the Existing MAC Address Filters list. To remove the MAC address from the list, select it and click Remove.
    Step 5 Click OK. You return automatically to the Setup page.
    Step 6 Click Advanced in the AP Radio row of the Network Ports section at the bottom of the Setup page for the radio you want to configure. The AP Radio Advanced page appears.

  • How can I clear out all smart tags?

    I just upgraded from Photoshop Elements 6 to Elements 10.  In the Elements 6 release you can do a FIND on all untagged items, so I can easily find all pictures that I have not yet gotten around to tagging.
    On Elements 10 I find that the software has automatically added a "Smart Tag" tag to ALL of my pictures, making FIND UNTAGGED useless. I assumed this happened when I converted my catalog. I have now gone back and disabled the Smart Tag feature (now that I know about it) so that new pictures will NOT get Smart Tagged. However, my database of tens of thousands of pictures has been contaminated by Smart Tags.
    Is there any way to get all the Smart Tags cleared out of the database so I can find the pictures that I have not yet manually tagged?
    Alternatively, is there a way to find pictures that have a Smart Tag but no manual tag?

    photodrawken wrote:
    Excellent!  Glad you found a solution.
    msrecant wrote:
    However, it appears to me that as soon as one enables the Smart Tag feature the "Find untagged items" feature becomes almost meaningless. 
    Absolutely right.  Although Smart Tags are automatically generated, as far as the database is concerned, for the images they are tags just like any other tags.
    Ken
    This time, even if it's not exactly a bug, it is a disastrous default setting.
    All autoanalyzer functions should be disabled by default, and only enabled knowingly.
    The other lesson is that the Find menu should be able to search for files without a tag while ignoring the 'smart' tags.
    The fact that the option 'remove tags from selected items' cannot be used when you have too many tags (only one page of tags is displayed, with no way to scroll to the next screen) is a bug in my opinion.
    http://forums.adobe.com/thread/1035239?tstart=0

  • I just upgraded to the latest version of Pages. When I opened an old document it stripped out all the images I had in my tables. How do I get them back?

    Don't save. Close the documents.
    Pages '09 should still be in your Applications/iWork folder.
    Pages 5 will damage/alter older files, something Apple didn't tell its users.
    Pages 5 is a severely stripped-down version with 90+ features removed.
    We recommend trashing/archiving Pages 5 after exporting any files you may have saved back to Pages '09.
    Also rate/review Pages 5 in the App Store.
    Peter

  • Stripping all HTML tags from a CLOB

    Hi all,
    Running Oracle 9.2.0.8 on AIX...
    We have a table which stores HTML document fragments in a clob. I have a requirement to convert these to plain/text (strip all HTML tags) for sending in a plain/text email body.
    I have read the following solution from Tom Kyte's site:
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:25695084847068
    Basically: create an Oracle Text index on the CLOB column and call ctx_doc.filter with the "plaintext" parameter set to true.
    I noticed in Tom's example, he uses the default filter, which based on the docs, is NULL_FILTER, which applies no filtering. I have tried his example in my dev box, creating the text index on the CLOB column with no parameters.
    The call to ctx_doc.filter did not filter the html at all. I re-created the index and specified the INSO_FILTER and the filtering was done. I was under the impression that INSO_FILTER was for filtering binary content to plaintext...
    create table filter ( query_id number, document clob );
    create table demo
      ( id            int primary key,
        theclob       clob );
    create index demo_idx on demo(theClob) indextype is ctxsys.context;
    SET DEFINE OFF;
    Insert into DEMO
       (ID, THECLOB)
    Values
       (1, '<html><body><p>This is a test of <strong>ctx_doc.filter</strong> and plaintext filtering.</p></body></html>');
    COMMIT;
    exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
    The above code does not convert the HTML to plain text...
    Now re-create the index with INSO_FILTER:
    drop index demo_idx;
    create index demo_idx on demo(theClob) indextype is ctxsys.context parameters ('filter ctxsys.inso_filter');
    exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
    The above scenario returns the string "This is a test of ctx_doc.filter and plaintext filtering."
    The Oracle documentation doesn't specify any special filter parameter that needs to be set... I'm just wondering if I'm missing something here, or better yet, if there is a better solution to my problem. ;-)
    Thanks
    Stephane

    The difference between what you did and what Tom Kyte did is that you created your index on a CLOB column, whereas Tom created his index on a BLOB column. What I don't know is why that makes a difference. Below I demonstrate this with one BLOB column and one CLOB column, one index on each, and the same code run against both, with different results.
    SCOTT@orcl_11gR2> create table filter
      (query_id  number,
       document  clob)
    /
    Table created.
    SCOTT@orcl_11gR2> create table demo
      (id        int primary key,
       theblob   blob,
       theclob   clob)
    /
    Table created.
    SCOTT@orcl_11gR2> create index demo_blob_idx
      on demo (theblob)
      indextype is ctxsys.context
    /
    Index created.
    SCOTT@orcl_11gR2> create index demo_clob_idx
      on demo (theclob)
      indextype is ctxsys.context
    /
    Index created.
    SCOTT@orcl_11gR2> insert into demo values
      (1,
       utl_raw.cast_to_raw (
         '<html>
            <body>
              <p>
             This is a test of
             <strong> ctx_doc.filter </strong>
             and plaintext filtering.
              </p>
            </body>
          </html>'),
       '<html>
          <body>
            <p>
              This is a test of
              <strong> ctx_doc.filter </strong>
              and plaintext filtering.
            </p>
          </body>
        </html>')
    /
    1 row created.
    SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_blob_idx', 1, 'filter', 1, true)
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_clob_idx', 1, 'filter', 2, true)
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11gR2> select id, utl_raw.cast_to_varchar2 (theblob), theclob from demo
    /
            ID
    UTL_RAW.CAST_TO_VARCHAR2(THEBLOB)
    THECLOB
             1
    <html>
            <body>
              <p>
                This is a test of
                <strong> ctx_doc.filter </strong>
                and plaintext filtering.
              </p>
            </body>
          </html>
    <html>
          <body>
            <p>
              This is a test of
              <strong> ctx_doc.filter </strong>
              and plaintext filtering.
            </p>
          </body>
        </html>
    1 row selected.
    SCOTT@orcl_11gR2> select query_id, document from filter
    /
      QUERY_ID
    DOCUMENT
             1
    This is a test of ctx_doc.filter and plaintext filtering.
             2
    <html>
          <body>
            <p>
              This is a test of
              <strong> ctx_doc.filter </strong>
              and plaintext filtering.
            </p>
          </body>
        </html>
    2 rows selected.
    SCOTT@orcl_11gR2>

  • How does one strip out all Live Cycle data from a PDF and rebuild the form fields in Acrobat?

    Someone in a different department built a bunch of forms in Live Cycle. We now need to make minor edits to these forms but we all have Macs and can't use Live Cycle. Currently our only option to change a date and a name on each form  is to buy a new Windows workstation, buy a copy of Live Cycle and train someone for it.
    I understand the Live Cycle technology and Acrobat technology for forms are somehow different but there must be a way to just strip out all the Live Cycle form programming so that I just have the bare PDF with the text and layout.  Then make the text edits and rebuild the form fields in Acrobat.

    It depends on your PDF. Is the PDF a static XFA or a dynamic XFA?
    You can check to see if the PDF is static/dynamic by clicking File=>Save As, and it should say static or dynamic PDF as file type.
    iText will work with static XFA forms created in LiveCycle; dynamic XFA forms are not supported.
    You can also submit XML data to a server-side script and parse the XML data using C# System.Xml.XmlReader.
    Another tool that may speed the development of the project is:
    http://www.fdftoolkit.net/
    Note: FDFToolkit.net utilizes iText Technologies.

  • How can I take out all my HealthVault data?

    As titled, how can I take out all my HealthVault data?

    Hi,
    I'm assuming by "take out" you mean you want to save copies of your data to keep outside of HealthVault.
    In general, you want to use the Export feature. Find it by going to the Health Information page, then in the "More actions" menu, choose "Export information." If you're trying to send the data to another system, then you'll probably want to choose CCD or CCR format. If you are keeping records for yourself, HTML would be the human-readable format. The different formats behave differently, so I'd recommend checking that the exported file contains the data you want.
    After you export, then you'll want to go into the Files section of the Health Information page and download the files you want to keep. This will be especially important for files that came from a doctor, hospital, pharmacy, or lab.
    If you want to close your HealthVault account, make sure you've got all the data you want before you close the account. By the way, if by "take out" the data you meant "delete" the data, when you close your account the data will be deleted from HealthVault as per the Privacy Statement.
    Hope this helps,
    Kathy

  • How do you strip out certain groups of characters from a String variable

    For example...
    String date = "11-Feb-2005";
    String day;
    String month;
    String year;
    How would you strip out '11' from date to assign it to 'day', 'Feb' to assign it to 'month', and '2005' to assign it to 'year'?
    In my program the variable 'date' will always be in the format of:
    first two characters are digits, followed by '-'
    then three letters, followed by '-'
    then four digits.
    I think it has something to do with charAt, but I'm not sure how to do it.
    Any ideas?
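Two common ways to pull the pieces apart, sketched under the assumption that the input is always exactly in the dd-MMM-yyyy form described above (DateParts, splitDate and substringDate are hypothetical names): String.split on '-', or substring at fixed positions.

```java
public class DateParts {

    // Splits "11-Feb-2005" into {day, month, year} on the '-' separators.
    static String[] splitDate(String date) {
        return date.split("-");
    }

    // Same result via fixed character positions; this works only because the
    // format never varies: 2 digits, '-', 3 letters, '-', 4 digits.
    static String[] substringDate(String date) {
        return new String[] {
            date.substring(0, 2),  // day
            date.substring(3, 6),  // month
            date.substring(7, 11)  // year
        };
    }

    public static void main(String[] args) {
        String date = "11-Feb-2005";
        String[] parts = splitDate(date);
        String day = parts[0];
        String month = parts[1];
        String year = parts[2];
        System.out.println(day + " " + month + " " + year); // prints "11 Feb 2005"
    }
}
```

split is more forgiving if the day is ever written with one digit; the substring version would then silently pick up the wrong characters.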

    Yeah, I tried the first method and it works fine.
    Thanks very much.
    Also... I tried the other one and it outputs... 11 1 2005
    which means it works, but you see, I wanted to put the date in the format of...
    Calendar date = new GregorianCalendar(2005, Calendar.FEBRUARY, 11); so I can compare it to another Calendar object to see which one is earlier.
    That is why I am doing it like this...
    Calendar date = new GregorianCalendar(+ year + ", Calendar." + month + ", " + day);
    For example...
    Calendar xmas = new GregorianCalendar(1998, Calendar.DECEMBER, 25);
    Calendar newyears = new GregorianCalendar(1999, Calendar.JANUARY, 1);
    // Determine which is earlier
    boolean b = xmas.after(newyears); // false
    b = xmas.before(newyears); // true
    Anyway, I am just curious.
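GregorianCalendar's constructor takes ints, so the concatenated expression above will not compile. Two hedged options follow (DateCompare, parseCalendar and buildCalendar are hypothetical names): let SimpleDateFormat parse the whole string with the pattern dd-MMM-yyyy, or map the month abbreviation to its zero-based index yourself. Since Calendar.FEBRUARY is the int 1, a month value of 1 (as in the "11 1 2005" output above, if it came from Calendar.get(Calendar.MONTH)) can be passed straight to the constructor.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;

public class DateCompare {

    private static final List<String> MONTHS = Arrays.asList(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Option 1: parse the whole "dd-MMM-yyyy" string. Locale.ENGLISH makes
    // "Feb" parse even if the default locale uses other month names.
    static Calendar parseCalendar(String date) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
        format.setLenient(false); // reject impossible dates such as 31-Feb-2005
        Calendar cal = new GregorianCalendar();
        cal.setTime(format.parse(date));
        return cal;
    }

    // Option 2: convert the split-out parts to ints. The list index of the
    // abbreviation equals the Calendar month constant (Calendar.JANUARY == 0).
    static Calendar buildCalendar(String day, String month, String year) {
        int monthIndex = MONTHS.indexOf(month);
        if (monthIndex < 0) {
            throw new IllegalArgumentException("Unknown month: " + month);
        }
        return new GregorianCalendar(Integer.parseInt(year), monthIndex,
                Integer.parseInt(day));
    }

    public static void main(String[] args) throws ParseException {
        Calendar date = parseCalendar("11-Feb-2005");
        Calendar xmas = new GregorianCalendar(2004, Calendar.DECEMBER, 25);
        System.out.println(date.after(xmas)); // prints "true"
    }
}
```

Both options produce midnight in the default time zone, so the resulting Calendar objects can be compared directly with before() and after().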

  • Does Firefox work with all html tags/CSS properties?

    I am considering Firefox because MSIE has been and is becoming more annoying.
    I want a browser that simply implements all html tags and CSS properties.
    I want Firefox to install without screwing with any other application on my computer.
    Possible?

    Sure, there's no problem installing Firefox alongside other browsers.
    You only need to decide which browser to set as the default browser that is used when you click a link in other programs.
    *http://developer.mozilla.org/en/Mozilla_CSS_support_chart
    *https://developer.mozilla.org/en/HTML

  • How can I find out all the different meanings of User ID Parameters (USR05)?

    Dear Gurus
    How can I find out all the different meanings of User ID Parameters (table USR05)?
    Thanks
    Nuno Natividade

    Hi,
    The parameter IDs are stored in table TPARA. In the code below they are used to determine the system landscape and whether user logging is switched on.
      FORM DETERMINE_LANDSCAPE .
        SELECT SINGLE * FROM USR05
                        WHERE BNAME = SY-UNAME
                            AND PARID = 'ZLANDSCAPE'.
        IF SY-SUBRC = 0.
          IF NOT USR05-PARVA IS INITIAL.
       REPLACE
          ELSE.
            PERFORM DETERMINE_FROM_CENTRAL_SYSTEM.
          ENDIF.
        ELSE.
          PERFORM DETERMINE_FROM_CENTRAL_SYSTEM.
        ENDIF.
      ENDFORM.                    " determine_landscape
      FORM DETERMINE_LOG_PARAMETER.
        SELECT SINGLE * FROM USR05 WHERE BNAME = SY-UNAME
                            AND PARID = 'ZLOG_USER'
                            AND PARVA = 'X'.
        IF SY-SUBRC = 0.
          WITH_LOG = 'X'.
        ENDIF.
    <REMOVED BY MODERATOR>
    venkat.
    Edited by: Alvaro Tejada Galindo on Mar 4, 2008 12:47 PM

  • HT5625 I cannot send a text thru iMessage.  I followed the directions over and over again but can't make it work.  Also how can I find out all the apple ids I may have.

    A wet phone is out of warranty. This is considered user damage. Even if you were able to get it to start now, the chances of it working for long are slim. I suggest going to Apple and see about an OOW replacement. One for the iPhone 4 is only $149USD and it would come with a short warranty. It is a refurbished device and you would not be worried about encountering additional problems.

  • How do I find out which battery is right for my mac pro 13 inch Processor  2.4 GHz Intel Core i5 late 2011 model

    You could try ringing Apple and seeing if they will give you a model number for the battery, and then find one on Google. You may even be able to order one from Apple directly.
    Apple has support numbers for countries all over the world. In case you don't have the number you need, this support page lists them all:
    http://support.apple.com/kb/HE57?viewlocale=en_US&locale=en_US

  • I have installed a new hard drive on my laptop but cannot find my Photoshop Elements product box with the serial number to install Elements 10 or Premiere Elements 10. How can I find out the serial numbers / installation codes for these?

    I have installed a new hard drive on my laptop but cannot find my Photoshop Elements product box with the serial number to install Elements 10 or Premiere Elements 10. How can I find out the serial numbers / installation codes for these? I saw a reply; however, it was not addressed to me, so I assumed it was for someone else, and now the thread has timed out. Can I still get help on this matter, please?

    If you registered PSE before, go to the main page of adobe.com, sign in and go to your account and you should find the serial number there in your purchases.

  • Table Name or Function Module to find out all the Screens & Subscreens for

    Hello Experts,
    Table name or function module to find out all the screens & subscreens for all T-codes?
    Helpful answers will be rewarded.
    Arif Shaik

    Hi Balaji,
    But TSTC only gives the program name, T-code, and screen, not the subscreen details.
    Do you know of any others?

Maybe you are looking for

  • Installing Adobe Reader on Drive D

    How do you install on Drive D.  Adobe Reader automatically installs on drive C.

  • HT203171 I use the Apple adapter. The problem still persists.

    The article referred to is incorrect in my case. I have used my MacBook Pro for more than 3 years and this issue just came up lately, within the last month or so. Are you sure this issue is not caused by a faulty update from Apple? I use 10.6.8 snow leo

  • Very Big Cell when export in Excel

    Dear Tech guys, I use VS 2008 with VB and CR 2008. The Crystal report and the PDF export are OK, but when I export the report to Excel, I have the problems below. The report is a delivery note with 7 columns and many rows. 1. In all pages, the page numb

  • Can I develop a data acquisition app without the DAQ card installed?

    I'm new to LabView. I would like to build a simple data acquisition app (graphical display and logging). I would prefer to build this on my desktop, and then install/run it on a laptop. I'm not sure how to work this as the DAQ card and driver are not

  • Website asking for "java(TM) platform se 7 u". What is it?

    Hi there. The Website: "familytreemaker.genealogy.com" generates the following message: "Allow familytreemaker.genealogy.com to run java(TM) platform se 7 u?" I've never seen this before. It usually just asks to run "Java" in general. Thanks in advan