Strip HTML by regexp?

Hi,
 
In short: what's the regexp for stripping HTML tags?
 
I want to give users the option of some basic formatting when entering note fields. Most of the time a plain Text Area will suit their purposes, and displaying a Text Area with HTML Editor will just scare them.
 
(Sadly Text Area with HTML Editor items are not configurable - I really want to be able to switch off options like Justification, Foreground and Background. It's very irritating to be typing this post into the sort of item I want to provide!)
 
My prototype solution is a radio button which allows users to toggle between a plain Text Area and one with an Editor, preserving any text so far entered. Fine, I've got that far.
 
When the user switches from plain text to Editor, I just replace chr(10) with ' ' and all is well.
 
When the user switches from Editor to plain text, I need to strip the HTML tags out.
 
I've googled far and wide for a regular expression to do it, but all the solutions I've found are in VB, Perl, Python or whatever.
 
Clearly the HTMLDB team have cracked it - I imagine that the Strip HTML option on reports is a regexp_replace - so would anyone please care to share?
 
Many thanks,
 
John D
PS the commonly-cited expression '<(.|\n)*?>' doesn't work:
 
SQL> select regexp_replace('abc','<(.|\n)*?>',null) str from dual;
S
- 
PPS when I previewed this post, my linebreaks were lost, so I had to put them in manually. Odd.

SQL> select regexp_replace('abc','<[^>]*>',null) str from dual;
STR
abc

seems to work for simple markup
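For anyone who wants to see the toggle logic outside the database, here is a minimal Java sketch of the round trip described in the original post. It is not the HTMLDB implementation. The class and method names are made up for illustration, and it assumes the line-break tag in question is `<br>`. The tag-stripping pattern is the same `<[^>]*>` used in the query above.

```java
public class NoteFormatToggle {

    // Plain text -> editor: turn each line break into a <br> tag
    // (assumption: <br> is the tag the editor expects).
    public static String plainToHtml(String plain) {
        return plain.replace("\r\n", "\n").replace("\n", "<br>");
    }

    // Editor -> plain text: turn <br>, <br/> and <BR /> back into newlines,
    // then drop every remaining tag with the <[^>]*> pattern.
    public static String htmlToPlain(String html) {
        String withBreaks = html.replaceAll("(?i)<br\\s*/?>", "\n");
        return withBreaks.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(htmlToPlain("<b>abc</b><br/>def"));
    }
}
```

As noted above, this only holds for simple markup: a `>` inside an attribute value, or script and style content, will not be handled.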

Similar Messages

Bug report: Strip HTML in reports

Create a report region using the following SQL
select * from apex_application_pages
where application_id=:APP_ID
The default for Strip HTML is Yes in Report Attributes. Yet, if some rows contain CSS or Javascript snippets in the PAGE_HTML_HEADER column, they are NOT escaped, i.e. the report column is empty and the code is dumped onto the page.
Needless to say, this causes havoc with the presentation because the CSS is actually applied to the page.
Thanks.

Can you elaborate on that?
The Strip HTML option (Yes/No for the whole report) allows html tags in column values to be removed (not escaped) when they are emitted, provided the column's HTML Expression attribute references the column in #COL# notation.
And I agree, the Display As attribute of the Tabular Form Element section is a bit misleading as situated. Whereas most of the attributes in that section pertain to tabular form elements, the Display As LOV provides two additional variations on the Standard Report Column type: Display as Text (based on LOV, does not save state), and Display as Text (escape special characters, does not save state). Changing the selection among these three changes how non-updatable report columns are rendered, regardless of whether they are part of an updatable report.
Scott

Interactive report - strip HTML

Hi,
Where is the option to strip HTML for an Interactive report? Is it possible to set this option for each column?
N.B. This setting can be found under Layout and Pagination for classic report.
Thanks,
Louis-Guillaume
Homepage: http://www.insum.ca
Blog: http://insum-apex.blogspot.com

Louis-Guillaume,
Because we allow filters on the actual column values, strip HTML is not possible as an option for interactive reports. Any changes to the column values should be done inside the query.
- Marco

APEX strips HTML in LOV

Hi,
I have a hierarchical query to generate LOV values with a delimiter separating each level in the hierarchy. I use a regular expression to surround the last value with 'value'.
However, when I test the results, either the opening bold tag or both the opening and closed bold tags are stripped from the LOV (depending on what's between them).
Does anyone know how to tell APEX to not strip the tags? I looked at earlier posts and saw 'Strip HTML ' as a setting but could not find that in the scope of an LOV.
Here is a simple example of the LOV query that will be affected by this:
select 'TEST' display_value, dummy return_value from dual;
Thanks,
Corey

Hi Corey,
There's no simple way to highlight just a portion of the text unless you use uppercase or asterisks or something else that stands out. Instead of using &nbsp;, you should use character 160 - in your sql statement, where you want this to appear, hold down the alt key, then type 0160 on the number keypad and then release the alt key. On screen this is a space but for html, this is just a character like any other.
One other method, which I have seen used somewhere in this forum, is to insert a separator option and then disable that using the following in the region footer:
<script type="text/javascript">
  // Disable the separator entry in the P1_LANGUAGE select list.
  var x = document.getElementById("P1_LANGUAGE");
  var o = x.options;
  for (var k = 0; k < o.length; k++) {
    if (o[k].value == '- - - - - - - - - - - - -') {
      o[k].disabled = true;
    }
  }
</script>
The '- - - - - - - - - - - - -' shown is whatever value you specify for the separator.
I'm not sure how easy it would be for you to do this in your sql statement?
Otherwise, as Tyler suggests, try things in static files - this is how I tend to work out this sort of thing. If you can get something to work there, it should be doable in Apex.
Andy

Need some help - function that strips html tags

Hey peeps,
Need some guidance.
I need help to write a function that will strip html tags from a string. So, for example, if I have the following string:
myString = "This is a Paragraph"; after running the function it should return the string: "This is a Paragraph".
Could anyone plz point me in the right direction on how to do this.
Thanks,
Zub

System.out.println(myString.replaceAll("<[^>]++>\\s*", ""));

Oracle function to strip HTML tags in data

Hi, may I know if there is any Oracle function that can strip HTML tags from the data? I am currently using Oracle 9i, so I am unable to use the RegExp_Replace functionality provided by Oracle 10g.
Any help will be appreciated.
Thanks a lot!

I found this function:
function str_html ( line in varchar2 ) return
varchar2 is
x varchar2(32767) := null;
Let's hope whoever uses this doesn't want to deal with large XML's.
This question would have been better asked over in the XDB forum. All depends whether there is a schema available to describe the XML, what version of the database etc. With a schema, the schema could be registered into the Oracle 10g database and that used to create relational tables automatically, from where you can load in the XML files. Loads of information available on the XDB forum, most likely in the FAQ's on there.

How to strip HTML out of the form field but leave the basic user formatting?

What would you recommend to automatically strip out HTML that a user has entered into a form field? At the same time we need to preserve the basic formatting that was submitted by the users i.e. replace tags with CRLF, etc. StripHTML function is perfect but it removes all HTML and therefore, all formatting. Is there anything more flexible?
Thanks!

Do you need to strip it, or just render it inoperable?
The latter can be done with the htmlEditFormat() and htmlCodeFormat() functions.
If you want the stripping, take a look at related functions at the http://www.cflib.org site. I know I have seen HTML replace functions that could be configured to strip and|or not strip a select list of tags.
StripHTML() may actually have this feature (I believe it is hosted at cflib.org). You may just need to see the documentation on how to configure it thus.
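To make the "select list of tags" idea concrete, here is a rough sketch in Java rather than ColdFusion. It is not the cflib.org StripHTML() code. `SelectiveTagStripper` and `stripExcept` are hypothetical names, and the approach (a regex over tag names plus an allowlist) is just one way such a function might be configured.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectiveTagStripper {

    // Matches an opening or closing tag and captures its name in group 1.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Removes every tag whose (lower-cased) name is not in the keep set.
    // Only the tags themselves are removed; text between them stays.
    public static String stripExcept(String html, Set<String> keep) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase();
            String replacement = keep.contains(name) ? m.group() : "";
            // quoteReplacement stops '$' and '\' in a kept tag being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling `stripExcept(input, Set.of("b", "i", "br"))` would leave bold, italic and line-break tags alone and drop the rest. Kept tags keep their attributes too, so this is a formatting filter, not a security filter.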

Strip html tags from string & convert ampersand characters

hello, i'm converting html into xml, and i need to convert html code & content into xml content, without the html tags ...
so, for example, I strip this out of an html file:
<A NAME="b_betreft"></A>STUDIEOPDRACHT "UITBREIDING VIPA NAAR MEERDERE SUBSECTOREN" HERVERDELING VASTLEGGINGS- EN VEREFFENINGSKREDIETEN VAN HET VIPA VOOR HET JAAR 1999 ONTWERPBESLUIT VAN DE VLAAMSE REGERING TOT HERVERDELING VAN BASISALLOCATIES VAN DE BEGROTING VAN HET VLAAMS INFRASTRUCTUURFONDS VOOR PERSOONSGEBONDEN AANGELEGENHEDEN VOOR HET BEGROTINGSJAAR 1999<A NAME="e_betreft">
and i want to get rid of the "<A NAME="b_betreft"></A>" & "<A NAME="e_betreft">", are there classes that can do this ???
probably there are, i know in php there are ..., how about java ???
also i'll need to correct stuff like:
Financi&euml;le => Financiële
Comit&eacute; => Comité
you see, then, i'm done, cool ...
thanks dudessssss
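Since the question asks how this might look in Java, here is a small sketch under some loud assumptions: `EntityDecoder` is a made-up class, the entity table is deliberately tiny (a real converter would need the full HTML entity list or a library), and tags are removed with a simple `<[^>]*>` pattern. It strips the anchors, decodes entities such as `&eacute;`, then re-escapes the characters XML requires.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {

    // Illustrative subset only; not the complete HTML entity table.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"",
            "eacute", "\u00e9", "euml", "\u00eb");

    // Named entities (&eacute;) or decimal references (&#235;).
    private static final Pattern ENTITY = Pattern.compile("&(#\\d{1,7}|[A-Za-z]+);");

    public static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#")) {
                int cp = Integer.parseInt(body.substring(1));
                // Leave invalid code points untouched instead of throwing.
                replacement = Character.isValidCodePoint(cp)
                        ? new String(Character.toChars(cp)) : m.group();
            } else {
                // Unknown names are left exactly as they were.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Strip tags, decode HTML entities, then escape what XML text requires.
    public static String toXmlText(String html) {
        String decoded = decode(html.replaceAll("<[^>]*>", ""));
        return decoded.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The accented characters then travel as real characters, so the XML has to be written out in an encoding that can represent them (UTF-8, for example).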

hello, i'm converting html into xml, and i need to
convert html code & content into xml content,
without the html tags ...
Why didn't you continue to post in your other thread?
http://forum.java.sun.com/thread.jspa?threadID=777660
It's not nice to create multiple threads with the same question.
Kaj

ASP - strip html from loaded page

Hi,
I have a link to an external site (although owned by the same
company)... which when clicked opens a new window and displays some
results.
The link in question actually submits a form and passes some
values to the new window which loads the external site - takes the
form values and displays the results.
pretty easy so far..
problem is.. the results page shows a load of info that I
don't want...
so I'm looking for a way of stripping the html coding out
before the display...
now normally that would be easy if the html was on the site
side.. but it's being generated on the other side.
so basically I need to have an ASP page.. which loads the
outside source.. gets the info out of it.. and then displays that.
what I don't know how to do is stuff the generated html into
an ASP variable first.. before displaying it..
Any ideas?
James

You need to use the server-side XMLHTTP object to request the
page in order
to do that.
http://www.4guysfromrolla.com/webtech/110100-1.shtml
You'll almost certainly have version 3 of the component, so
read the code
comments as a line or two differs for version 3.
"jamesy" <[email protected]> wrote in message
news:e41t86$bgr$[email protected]..
> what I don't know how to do is stuff the generated html
into an ASP
> variable
> first.. before displaying it..
>
> Any ideas?
>
> James
>

Stripping HTML thru regular expression(pls help)

Hi all..
I've been trying to use the regular OROMatcher-1.1 expression package downloaded from apache.org.
it works well with my program but i m having problems building correct regular expression to strip off HTML tags.
can any of u help me build an expression that strips off ALL html tags including those with funny spaces such as:
<a href = "www.here.com">click me</a>
do help pls. i've tried for ages and its driving me mad

Hi,
Won't go into much detail, but the simplest way to do that would be using XML technology. Try using SAX or DOM, whichever you feel comfortable with. I think SAX would be a better choice. For details visit
http://java.sun.com/xml/?frontpage-spotlight
/khurram
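If a regular expression is still wanted, here is a hedged sketch. It uses java.util.regex rather than the OROMatcher package from the question (that is a swap on my part), and `QuoteAwareTagStripper` is a made-up name. The pattern skips over quoted attribute values, so spaces around `=` and a `>` inside quotes do not end the tag early.

```java
import java.util.regex.Pattern;

public class QuoteAwareTagStripper {

    // '<', then any mix of: a double-quoted value, a single-quoted value,
    // or a single character that is not a quote or '>', then the closing '>'.
    private static final Pattern TAG =
            Pattern.compile("<(?:\"[^\"]*\"|'[^']*'|[^'\">])*>");

    public static String strip(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<a href = \"www.here.com\">click me</a>"));
    }
}
```

It is still a pattern match, not a parser: plain text such as `1 < 2 and 3 > 1` would be mangled, and comments, scripts and malformed markup are not handled, which is the usual argument for the parser route suggested above.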

List of stripped HTML-Tags in Publication Mode

Hi,
I'm doing some research about the printing abilities of MDM PCM.
For this reason I'm searching for the HTML tags which are stripped out in the different modes (especially Publication Mode).
I got a hint in the "MDM Data Manager Reference Guide" on page 521 that HTML tags are stripped out, but I can't find a list of the tags.
Regards
Christoph Kohler

Hi Christoph,
In terms of properly viewing HTMLs in the publication modes, there is actually a small piece of intermediary software that is available to our customers who use MDM to publish catalogs. It is called the HTML filter. Basically what it does is translate your HTMLs to tags that may be read by the publication modes. Tags that are supported by this filter are:
attributesStr << " border-top-style:solid; border-top-color:black; border-top-width:" << borderWidth << "pt;";
attributesStr << " border-left-style:solid; border-left-color:black; border-left-width:" << borderWidth << "pt;";
attributesStr << " border-bottom-style:solid; border-bottom-color:black; border-bottom-width:" << borderWidth << "pt;";
attributesStr << " border-right-style:solid; border-right-color:black; border-right-width:" << borderWidth << "pt;";
attributesStr << " font-style:italic;";
attributesStr << " font-weight:bold;";
attributesStr << " font-size:" << dFontSize << "pt;";
attributesStr << " font-family:" << fontFamily << ";";
attributesStr << " text-align:" << textAlign << ";";
attributesStr << " vertical-align:top;";
attributesStr << " vertical-align:middle;";
attributesStr << " vertical-align:bottom;";
attributesStr << " background-color:" << bgColor << ";";
attributesStr << " vertical-align:sub;";
attributesStr << " vertical-align:super;";
Hard coded table attributes are
cleansedHtmlStr += StringNL("<table cellpadding=\"0\" cellspacing=\"0\" style=\"border-collapse:collapse\"");
The table cells support rowspan and colspan attributes.
 is used for hard returns.
tags that are picked out of the source HTML are:
mainTags.push_back(StringNL("table"));
mainTags.push_back(StringNL("#text")); //meaning any text outside of the table
mainTags.push_back(StringNL("br"));
These three types of objects are represented in the filtered HTML with style attribute markup.
For example, #text is filtered to my text
The use of this filter is a temporary workaround. Development is planned to incorporate this process into the Data Manager so that the user will eventually have the option of creating HTML "variants" similar to the variants that can currently be created for images. This way, you will be able to store a version of the HTML code that is appropriate for both web and print.
Please contact me directly via email [email protected] for further information regarding this issue.
Hope this is helpful,
Neta

Stripping HTML Tags from a String

What's the best way to remove html tags from a string (i.e. user input)?

Can you give an example? You can use substring(); if you're passing spaces between pages you can trim the variable. Also look at indexOf() and the other methods of java.lang.String.
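A minimal sketch of the indexOf() route, assuming "tag" simply means anything between a `<` and the next `>`. `IndexOfTagStripper` is a hypothetical name, and this is one possible reading of the suggestion above, not a complete sanitiser for user input.

```java
public class IndexOfTagStripper {

    // Copies everything outside <...> pairs; an unterminated '<' is kept as text.
    public static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf('<', pos);
            int close = (open < 0) ? -1 : input.indexOf('>', open + 1);
            if (open < 0 || close < 0) {
                // No further complete tag: keep the rest unchanged.
                out.append(input, pos, input.length());
                break;
            }
            out.append(input, pos, open); // text before the tag
            pos = close + 1;              // continue after the tag
        }
        return out.toString();
    }
}
```

If the goal is to display user input safely, escaping `<`, `>` and `&` on output is usually the more robust choice than stripping.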

Parsing HTML with RegExp

I have loaded a .html file and want to parse some variables. I need to search the html for a table with the ID of "formRedirNormalRadios", then parse that table again. In short, this html:
<more html>
<td id="formRedirNormalRadios">
<input type="radio" name="choice1" value="1" onclick="controlRedirNormal()" checked="checked" />
Option 1 
<input type="radio" name="choice1" value="2" onclick="controlRedirNormal()" checked="checked" />
Option 2 
<input type="radio" name="choice1" value="3" onclick="controlRedirNormal()" checked="checked" />
Option3
</td>
<also some more html>
would have to result in this variable:
Array = [formRedirNormalRadios, 1, checked, 2, checked, 3, checked]
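One way this could be sketched with regular expressions, shown here in Java as an assumption since the post does not say which language is in use. `RadioExtractor` is a made-up name, and the patterns assume double-quoted attributes and an `id` that comes first in the `<td>` tag, as in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RadioExtractor {

    private static final Pattern INPUT =
            Pattern.compile("<input\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE = Pattern.compile("\\bvalue=\"([^\"]*)\"");
    private static final Pattern CHECKED = Pattern.compile("\\bchecked\\b");

    // Returns [cellId, value, "checked"?, value, "checked"?, ...],
    // or an empty list when the cell is not found.
    public static List<String> extract(String html, String cellId) {
        List<String> result = new ArrayList<>();
        // First pass: isolate the contents of the cell with the given id.
        Pattern cell = Pattern.compile(
                "<td\\s+id=\"" + Pattern.quote(cellId) + "\"[^>]*>(.*?)</td>",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher c = cell.matcher(html);
        if (!c.find()) {
            return result;
        }
        result.add(cellId);
        // Second pass: walk the <input> tags inside that cell.
        Matcher in = INPUT.matcher(c.group(1));
        while (in.find()) {
            String attrs = in.group(1);
            Matcher v = VALUE.matcher(attrs);
            if (v.find()) {
                result.add(v.group(1));
                if (CHECKED.matcher(attrs).find()) {
                    result.add("checked");
                }
            }
        }
        return result;
    }
}
```

This is brittle against reordered attributes, single quotes or nested cells; a DOM parser is the sturdier option if the page layout is not under your control.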

Anyway I don't know if in the future the org.w3c.dom
package will be implemented as standard in java. Does
anyone know?
Not only in the future, in the past as well. It has been part of Java since the 1.3 release.
I've got the org.w3c.dom package from rt.jar in the
jre directory.
Is the package downloadable elsewhere?

Strip Tags From Interactive Report Download.

Hi,
Is it possible to strip html tags from a column during a CSV download on an interactive report?
Basically i have a report where one of the columns is as follows
453 | 0
but in the download i want to strip out the tags so in the csv the column would show
453 | 0
Any ideas how i can do this using the standard csv download from the drop down menu?
Thanks Andy

Just wondering if any one had any thoughts on how I may be able to do this

Html tags in pdf printing

Hi ,
I have a pl/sql region which I use to display a report .
Now I want a Pdf output of this pl/sql region. For this I used the Bi publisher and was able to design the report query and report layout as required.
However, my problem is that some of the columns in my table have html tags (since the input for these columns is through a text area with html editor). When the PDF is generated the html tags are not converted.
Please advise me on this.
Thanks,
Deepa
Edited by: Deepa J on Sep 25, 2008 6:01 AM

Hi Deepa.
You won't get the HTML to be interpreted in the BI Publisher output I'm afraid.
You'll need to wait for APEX 4.0 for that functionality.
If you're seeing the HTML tags in your PDF and you use a Classic report region you can try selecting the "Strip HTML" option in the Layout and Pagination settings.
Regards
Simon Gadd
