Oracle function to strip HTML tags in data

Hi, may I know if there is any Oracle function that can strip HTML tags from the data? I am currently using Oracle 9i, so I am unable to use the REGEXP_REPLACE functionality provided by Oracle 10g.
Any help will be appreciated.
Thanks a lot!

I found this function:
function str_html ( line in varchar2 ) return varchar2 is
  x varchar2(32767) := null;

Let's hope whoever uses this doesn't want to deal with large XMLs.
This question would have been better asked in the XDB forum. It all depends on whether there is a schema available to describe the XML, which version of the database you have, etc. With a schema, you can register it in an Oracle 10g database and use it to create relational tables automatically, into which you can then load the XML files. There is plenty of information on the XDB forum, most likely in the FAQs there.
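For anyone stuck on 9i without REGEXP_REPLACE, the usual workaround is a loop that walks the string character by character and skips everything between a '<' and the next '>'. A minimal sketch of that loop, written in Java rather than PL/SQL and assuming tags never contain a literal '>' inside attribute values; the class and method names are made up for illustration and are not the truncated str_html function above:

```java
// Sketch: strip HTML tags by scanning characters, with no regular expressions.
// Hypothetical names; assumes '>' never appears inside an attribute value.
final class TagScanner {
    private TagScanner() {}

    // Removes everything from '<' up to and including the next '>'.
    static String stripTags(String line) {
        if (line == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(line.length());
        boolean inTag = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '<') {
                inTag = true;            // start of a tag: stop copying
            } else if (c == '>' && inTag) {
                inTag = false;           // end of the tag: resume copying after it
            } else if (!inTag) {
                out.append(c);           // ordinary text outside any tag (a stray '>' included)
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>This is a <b>Paragraph</b></p>"));
    }
}
```

In PL/SQL the same loop would read one character at a time with SUBSTR and append to a VARCHAR2 buffer. An unclosed '<' drops the rest of the string, so treat this as a rough filter, not an HTML parser.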

Similar Messages

Need some help - function that strips html tags

Hey peeps,
Need some guidance.
I need help writing a function that will strip HTML tags from a string. For example, if myString holds "This is a Paragraph" wrapped in HTML tags, then after running the function it should return the plain string "This is a Paragraph".
Could anyone please point me in the right direction on how to do this?
Thanks,
Zub

System.out.println(myString.replaceAll("<[^>]++>\\s*", ""));
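A self-contained version of the one-liner above. One thing to watch: the trailing \\s* in the pattern also deletes whitespace that follows a tag, so text such as "<b>bold</b> text" comes out as "boldtext". The second method is a variant (not from the reply) that removes only the tags:

```java
// Compares the forum's replaceAll pattern with a variant that keeps whitespace.
final class RegexStrip {
    private RegexStrip() {}

    // Pattern from the reply: removes each tag plus any whitespace right after it.
    static String stripTagsAndTrailingSpace(String s) {
        return s.replaceAll("<[^>]++>\\s*", "");
    }

    // Variant: removes only the tags themselves.
    static String stripTagsOnly(String s) {
        return s.replaceAll("<[^>]++>", "");
    }

    public static void main(String[] args) {
        String myString = "<p>This is a <b>Paragraph</b></p>";
        System.out.println(stripTagsAndTrailingSpace(myString));          // This is a Paragraph
        System.out.println(stripTagsAndTrailingSpace("<b>bold</b> text")); // boldtext
        System.out.println(stripTagsOnly("<b>bold</b> text"));             // bold text
    }
}
```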

HTML Tags in data field

Does Documaker support reading and applying HTML tags in data fields? For example, we are publishing a list of questions that are passed to me from the database. These questions contain embedded HTML formatting tags:
Are you a member or do you intend to become a member of the armed forces <.i>including reserves</.i>?
** Please note: I had to add the . into the tag because the forum was rendering the italics tag.
I want Documaker to read the string and apply the formats that the tags specify.
Any help would be appreciated.
Dave

Dave,
Have a look at the rule MessageFromExtr, it may help you.
Gaétan

List of stripped HTML-Tags in Publication Mode

Hi,
I'm doing some research about the printing abilities of MDM PCM.
For this reason I'm searching for the HTML tags that are stripped out in the different modes (especially Publication Mode).
I found a hint in the "MDM Data Manager Reference Guide" on page 521 that HTML tags are stripped out, but I can't find a list of the tags.
Regards
Christoph Kohler

Hi Christoph,
In terms of properly viewing HTML in the publication modes, there is a small piece of intermediary software available to our customers who use MDM to publish catalogs. It is called the HTML filter. Basically, it translates your HTML into tags that can be read by the publication modes. The style attributes supported by this filter are:
attributesStr << " border-top-style:solid; border-top-color:black; border-top-width:" << borderWidth << "pt;";
attributesStr << " border-left-style:solid; border-left-color:black; border-left-width:" << borderWidth << "pt;";
attributesStr << " border-bottom-style:solid; border-bottom-color:black; border-bottom-width:" << borderWidth << "pt;";
attributesStr << " border-right-style:solid; border-right-color:black; border-right-width:" << borderWidth << "pt;";
attributesStr << " font-style:italic;";
attributesStr << " font-weight:bold;";
attributesStr << " font-size:" << dFontSize << "pt;";
attributesStr << " font-family:" << fontFamily << ";";
attributesStr << " text-align:" << textAlign << ";";
attributesStr << " vertical-align:top;";
attributesStr << " vertical-align:middle;";
attributesStr << " vertical-align:bottom;";
attributesStr << " background-color:" << bgColor << ";";
attributesStr << " vertical-align:sub;";
attributesStr << " vertical-align:super;";
Hard-coded table attributes are:
cleansedHtmlStr += StringNL("<table cellpadding=\"0\" cellspacing=\"0\" style=\"border-collapse:collapse\"");
The table cells support rowspan and colspan attributes.
<br> is used for hard returns.
The tags that are picked out of the source HTML are:
mainTags.push_back(StringNL("table"));
mainTags.push_back(StringNL("#text")); //meaning any text outside of the table
mainTags.push_back(StringNL("br"));
These three types of objects are represented in the filtered HTML with style attribute markup.
For example, #text is filtered to my text
The use of this filter is a temporary workaround. Development is planned to incorporate this process into the Data Manager so that the user will eventually have the option of creating HTML "variants" similar to the variants that can currently be created for images. This way, you will be able to store a version of the HTML code that is appropriate for both web and print.
Please contact me directly by email for further information regarding this issue.
Hope this is helpful,
Neta
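Outside MDM, the general idea of keeping a few supported tags and stripping the rest can be sketched as a whitelist filter. This is not the actual HTML filter described above; the allowed tag set (table markup plus br) is a hypothetical example, and attributes on kept tags pass through unchanged:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a tag whitelist: listed tags survive, all other tags are removed
// while their text content is kept. The tag set is a hypothetical example.
final class TagWhitelist {
    private TagWhitelist() {}

    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("table", "tr", "td", "br"));

    // Matches an opening, closing or self-closing tag and captures its name.
    private static final Pattern TAG = Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String filter(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String keep = ALLOWED.contains(name) ? m.group() : "";
            // quoteReplacement stops '$' or '\' in attributes being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(keep));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```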

Query to extract HTML tag with data

Hi All,
I have a string.
'<HTML><HEAD>THIS IS HEAD.</HEAD><BODY>THIS IS BODY.THIS IS P1.NIMISHTHIS IS P2.</BODY></HTML>'
I want to extract an HTML tag, including its opening and closing tags, together with its data. For example,
if I say P1,
then the output should be
'THIS IS P1.'
and for P2,
the output should be
'THIS IS P2.'
Please help me write this query with a regular expression.
I have tried the following, but it is not giving the desired result:
WITH T AS (
SELECT
 '<HTML><HEAD>THIS IS HEAD.</HEAD><BODY>THIS IS BODY.THIS IS P1.NIMISHTHIS IS P2.</BODY></HTML>' STR
FROM
 DUAL
)
SELECT REGEXP_SUBSTR(STR, '.+P2.+') FROM T
Thanks & Regards
Nimish Garg
Edited by: Nimish Garg on May 7, 2012 5:49 PM

Nimish Garg wrote:
My requirement is to extract a <tag>data</tag> from a HTML/XML string
where data contains any specified value.

HTML is not XML.
And that is a critical distinction to make. HTML parsing is horribly complex. XML is quite easy. For HTML you have to code your own parser in PL/SQL. XML can be parsed using the XMLTYPE class/data type in PL/SQL.
So if you need to find a single specific tag in HTML, I would not try to treat it as XML. I might not even try to use regular expressions.
I would do a basic substring search for the start of the tag, then read the data following it until the end tag is reached, making sure there are no nested or embedded tags in that data. This care is needed because HTML is so widely abused, and that abuse is an accepted norm because the parsers used by browsers deal with it without complaining.
Proper HTML is mostly a myth in my experience of "screen scraping" web servers for data extraction when they do not have web services supplying the data.
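A sketch of that substring search in Java: find the start tag, read up to the end tag, and skip any element whose data contains another tag. The method name, the case-insensitive matching, and returning null when nothing matches are assumptions; start tags with attributes (e.g. <p class="x">) are not handled:

```java
// Hypothetical helper sketching the substring-search approach; not a full HTML parser.
final class TagExtractor {
    private TagExtractor() {}

    // Case-insensitive indexOf via regionMatches, so indexes stay aligned with the original string.
    private static int indexOfIgnoreCase(String s, String target, int from) {
        for (int i = Math.max(from, 0); i + target.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns the first <tag>data</tag> (tags included) whose data contains needle, or null.
    static String extract(String html, String tag, String needle) {
        String open = "<" + tag + ">";       // attributes such as <p class="x"> are not handled
        String close = "</" + tag + ">";
        int from = 0;
        while (true) {
            int start = indexOfIgnoreCase(html, open, from);
            if (start < 0) {
                return null;                             // no further start tag
            }
            int bodyStart = start + open.length();
            int end = indexOfIgnoreCase(html, close, bodyStart);
            if (end < 0) {
                return null;                             // start tag never closed
            }
            String body = html.substring(bodyStart, end);
            // Refuse data with nested or embedded tags instead of guessing what they mean.
            if (body.indexOf('<') < 0 && body.contains(needle)) {
                return html.substring(start, end + close.length());
            }
            from = bodyStart;                            // keep scanning after this start tag
        }
    }
}
```

Given a string such as "<BODY><P>THIS IS P1.</P><P>THIS IS P2.</P></BODY>", extract(html, "p", "P2") returns "<P>THIS IS P2.</P>".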

Strip HTML tags from string & convert ampersand characters

hello, i'm converting html into xml, and i need to convert html code & content into xml content, without the html tags ...
so, for example, I strip this out of an html file:
<A NAME="b_betreft"></A>STUDIEOPDRACHT "UITBREIDING VIPA NAAR MEERDERE SUBSECTOREN" HERVERDELING VASTLEGGINGS- EN VEREFFENINGSKREDIETEN VAN HET VIPA VOOR HET JAAR 1999 ONTWERPBESLUIT VAN DE VLAAMSE REGERING TOT HERVERDELING VAN BASISALLOCATIES VAN DE BEGROTING VAN HET VLAAMS INFRASTRUCTUURFONDS VOOR PERSOONSGEBONDEN AANGELEGENHEDEN VOOR HET BEGROTINGSJAAR 1999<A NAME="e_betreft">
and i want to get rid of the "<A NAME="b_betreft"></A>" & "<A NAME="e_betreft">", are there classes that can do this ???
probably there are, i know in php there are ..., how about java ???
also i'll need to correct stuff like:
Financi&euml;le => Financiële
Comit&eacute; => Comité
you see, then, i'm done, cool ...
thanks dudessssss

hello, i'm converting html into xml, and i need to
convert html code & content into xml content,
without the html tags ...

Why didn't you continue to post in your other thread?
http://forum.java.sun.com/thread.jspa?threadID=777660
It's not nice to create multiple threads with the same question.
Kaj
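For the Java side, a minimal sketch that drops the tags with a regular expression and then decodes character references. The entity table is deliberately tiny (an assumption for illustration; HTML defines a few hundred named entities), and the trailing ';' is treated as optional because real-world HTML often omits it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: strip tags, then decode a few named and all numeric character references.
final class HtmlToText {
    private HtmlToText() {}

    // Tiny illustrative table, not the full HTML entity list.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("nbsp", "\u00A0");
        ENTITIES.put("eacute", "\u00E9");
        ENTITIES.put("euml", "\u00EB");
    }

    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // &name; or &#233; or &#xE9;, with the ';' optional
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z]+);?");

    static String convert(String html) {
        String noTags = TAG.matcher(html).replaceAll("");   // tags first, so &lt;b&gt; stays text
        Matcher m = ENTITY.matcher(noTags);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ref = m.group(1);
            String replacement = m.group();                  // default: leave unknown references as-is
            try {
                if (ref.startsWith("#x") || ref.startsWith("#X")) {
                    replacement = new String(Character.toChars(Integer.parseInt(ref.substring(2), 16)));
                } else if (ref.startsWith("#")) {
                    replacement = new String(Character.toChars(Integer.parseInt(ref.substring(1))));
                } else if (ENTITIES.containsKey(ref)) {
                    replacement = ENTITIES.get(ref);
                }
            } catch (IllegalArgumentException e) {
                // out-of-range numeric reference: keep the original text
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

If the result is going into XML, escape &, < and > again when writing it out; javax.xml.stream.XMLStreamWriter.writeCharacters does this for text content.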

Stripping HTML Tags from a String

What's the best way to remove html tags from a string (i.e. user input)?

Can you give an example? You can use substring(), and if you're passing spaces between pages you can trim() the variable. Also look at indexOf() and the other methods of java.lang.String.
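The String-method approach suggested above can be sketched as follows: indexOf finds each '<' and the next '>', the text in between tags is appended, and trim() tidies the user input at the end. It assumes a '<' always starts a tag; a lone '<' with no closing '>' is kept as text:

```java
// Sketch of tag stripping with indexOf/substring only (hypothetical class name).
final class IndexOfStripper {
    private IndexOfStripper() {}

    static String strip(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf('<', pos);
            if (open < 0) {                      // no more tags: copy the rest
                out.append(input.substring(pos));
                break;
            }
            out.append(input, pos, open);        // text before the tag
            int close = input.indexOf('>', open);
            if (close < 0) {                     // unterminated '<': keep it as text
                out.append(input.substring(open));
                break;
            }
            pos = close + 1;                     // skip the tag itself
        }
        return out.toString().trim();            // trim surrounding spaces from user input
    }
}
```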

Function result into html tag

Hi guys,
I need some help:
I have a function (text_color) that returns a color value that I want to insert into a font tag,
and this is what I want to do:

How can I do it?
tks

hi davide--
htmldb applications are online by their very nature. they're all web-based; accessed via a url. the links to which you are referring are called the Developer Toolbar. as explained in the HTML DB documentation, that toolbar only appears when a developer has logged into both the application and the associated htmldb workspace from the same browser. if you download the new 1.6 version of htmldb...
http://www.oracle.com/technology/products/database/htmldb/download.html
...there's a new chapter 12 called "Deploying an Application". you should probably check there to get the full scoop on making your app more widely available.
regards,
raj

HTML tags not displayed when using Data Template

Hi All...
I'm developing a BI Publisher report in which one of the columns is a clob data type. I'm using an xsl stylesheet to format the data present in the clob column.
I've developed the report using a data template as the data set. The problem is that the HTML tags in the CLOB column were not displayed properly. For example,
the tag starting with < is replaced with &lt;
I did a couple of searches in this forum and in Tim's blog but couldn't find a proper solution...
http://blogs.oracle.com/xmlpublisher/2007/01/formatting_html_with_templates.html
API and HTML Formated Content
Re: Problem with text data elements containing escaped HTML codes
HTML Output from CDATA
Re: HTML formatted output
Re: Special characters in CLOB are making report fail
Re: Formatting of HTML tag problem
I'm using BI Publisher standalone:Release 10.1.3.2. In one of the threads..
Re: Special characters in CLOB are making report fail
I came to know that data template cannot generate proper HTML tags for release 10.1.3.2. Is there any work around way to get the proper HTML tags when data template is used as a data set?
Thanks in Advance...
Edited by: user10280715 on Dec 9, 2008 3:13 PM

The issue could be with the data that is selected in the other environment. It often happens that the ALV does not give the same results in the other systems as in DEV.
Possible causes are the control break statements in the LOOP...ENDLOOP block; validate the correctness of any control break statements.

JSP/Servlets functions: clear HTML tags, clear SQL code, validate E-mail

Hello!
I am looking for java functions, which:
- clear HTML tags
- clear SQL injection code
- validate an e-mail address
Probably there are built-in Java functions for doing that.
Could anyone give me their names?
I would be also grateful for any help, links or something.
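As far as I know the JDK has no single built-in method for clearing HTML or validating an e-mail address, so here is a hedged sketch of all three items. For HTML it escapes rather than strips, which is usually safer for user input; the e-mail pattern is a simplified assumption that accepts common addresses, not everything RFC 5322 allows; and for SQL injection it binds the value with java.sql.PreparedStatement instead of trying to clean it. The users table and email column are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.regex.Pattern;

final class InputHelpers {
    private InputHelpers() {}

    // Simplified pattern (an assumption): common addresses only, not full RFC 5322.
    private static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$");

    static boolean isValidEmail(String s) {
        return s != null && EMAIL.matcher(s).matches();
    }

    // Escape rather than strip: markup is displayed as text instead of being interpreted.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // SQL injection: bind user input as a parameter instead of "cleaning" it.
    // Table and column names are hypothetical.
    static PreparedStatement findUserByEmail(Connection conn, String email) throws SQLException {
        PreparedStatement ps = conn.prepareStatement("SELECT id FROM users WHERE email = ?");
        ps.setString(1, email);
        return ps;
    }
}
```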

Hi,
You could try using
DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
instead of
DriverManager.registerDriver(new oracle.jdbc.OracleDriver());
-Amol

The find function (Ctrl+F) doesn't expand the XML file to search for the given text. If the XML file is expanded, the find function finds the tag and data. How do I fix this?

The find function doesn't expand the XML nodes to search. If the XML is expanded, the find function highlights both the matching tag and data. How do I fix this?
== This happened ==
Every time Firefox opened

<xsl:value-of select="x"/> produces a string that consists of all text nodes in x.
<xsl:copy-of select="x"/> produces an exact copy of x.
Go to http://www.zvon.org/ for more information like this.
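To see the difference between the two instructions, here is a small sketch using the XSLT processor bundled with the JDK (javax.xml.transform). The input document and element names are made up for the example:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ValueOfVsCopyOf {
    private ValueOfVsCopyOf() {}

    private static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/\"><out>"
            + "<v><xsl:value-of select=\"root/x\"/></v>"  // string value: text nodes only
            + "<c><xsl:copy-of select=\"root/x\"/></c>"   // exact copy, markup included
            + "</out></xsl:template>"
            + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<root><x>a<b>c</b></x></root>"));
    }
}
```

For <root><x>a<b>c</b></x></root>, value-of yields the text "ac", while copy-of reproduces <x>a<b>c</b></x> with its markup.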

Need to Open a new Window through Oracle Function

Hi all ,
I have created an Oracle Function, say Func1, and tagged it to a menu of HTML TAB type.
The function is working fine. However, I have a requirement: whenever I click on the Func1 tab, the function should open in a new window.
The Web_Html Call of the function is a call to a HTML Page.
I want the HTML page to open in a new window
Kindly advise how to do the same
Thanks
Chirag

The server cannot affect windows on the client machine.
All a server sees is a request, to which it sends a response.
What the client does with that response is up to it.
The simplest way is to set the target of the form in viewresult.jsp to "_blank".
This tells it to open the results of the request into a new window.
You can't decide at the server side whether to remain on the same page, or open a new window. You have to make that decision when you submit the page - not on the results of that submission.

Strip HTML by regexp?

Hi,
 
In short: what's the regexp for stripping HTML tags?
 
I want to give users the option of some basic formatting when entering note fields. Most of the time a plain Text Area will suit their purposes, and displaying a Text Area with HTML Editor will just scare them.
 
(Sadly Text Area with HTML Editor items are not configurable - I really want to be able to switch off options like Justification, Foreground and Background. It's very irritating to be typing this post into the sort of item I want to provide!)
 
My prototype solution is a radio button which allows users to toggle between a plain Text Area and one with an Editor, preserving any text so far entered. Fine, I've got that far.
 
When the user switches from plain text to Editor, I just replace chr(10) with ' ' and all is well.
 
When the user switches from Editor to plain text, I need to strip the HTML tags out.
 
I've googled far and wide for a regular expression to do it, but all the solutions I've found are in VB, Perl, Python or whatever.
 
Clearly the HTMLDB team have cracked it - I imagine that the Strip HTML option on reports is a regexp_replace - so would anyone please care to share?
 
Many thanks,
 
John D
PS the commonly-cited expression '<(.|\n)*?>' doesn't work:
 
SQL> select regexp_replace('abc','<(.|\n)*?>',null) str from dual;
S
- 
PPS when I previewed this post, my line breaks were lost, so I had to put them in manually. Odd.

SQL> select regexp_replace('abc','<[^>]*>',null) str from dual;
STR
abc
Seems to work for simple markup.
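For the toggle itself, a sketch of both directions, in Java rather than SQL. It assumes line breaks are represented as <br> tags in the editor's HTML (an assumption; the editor may emit <p> or other markup instead) and uses the same negated-class idea as '<[^>]*>' above, which also matches tags that span line breaks because [^>] matches a newline:

```java
import java.util.regex.Pattern;

// Hypothetical helpers for switching a note between plain text and HTML.
final class NoteToggle {
    private NoteToggle() {}

    private static final Pattern BR = Pattern.compile("<br\\s*/?>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Plain text -> HTML: escape markup characters, then turn newlines into <br>.
    static String toHtml(String plain) {
        String escaped = plain.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return escaped.replace("\n", "<br>");
    }

    // HTML -> plain text: <br> back to newlines, drop all other tags, then unescape
    // (&amp; last, so "&amp;lt;" correctly becomes "&lt;" rather than "<").
    static String toPlain(String html) {
        String withBreaks = BR.matcher(html).replaceAll("\n");
        String noTags = TAG.matcher(withBreaks).replaceAll("");
        return noTags.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }
}
```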

Need to copy Data from a specific Html Tag

Hello,
I am trying to use CF to access a website, capture the data from a specific tag to the end of that tag, and store it in a CSV file or database.
The tag-based search of an open file is where I am not able to make any headway. Has anyone done this?

You'll need to use a regular expression for that. CF supports regular expressions with the REFind, REFindNoCase and REReplace functions. Here's an example of using regular expressions to capture the value within an HTML tag:
http://www.javamex.com/tutorials/regular_expressions/example_scraping_html.shtml
It's in Java, but the syntax for regular expressions is the same in CF.
Dave Watts, CTO, Fig Leaf Software
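Along the lines of the Java tutorial linked above, here is a sketch that captures the contents of every occurrence of a given tag and joins them into one CSV line. The tag name and the quoting rule (wrap each field in double quotes, double any embedded quotes) are assumptions for illustration, and whether each construct here (the non-greedy .*?, the (?:...) group) behaves identically in CF's REFind should be checked before porting:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagCapture {
    private TagCapture() {}

    // Contents of every <tag ...>...</tag> pair; non-greedy so adjacent pairs do not merge.
    static List<String> capture(String html, String tag) {
        String t = Pattern.quote(tag);
        Pattern p = Pattern.compile("<" + t + "(?:\\s[^>]*)?>(.*?)</" + t + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    // Joins values into one CSV line, quoting each field and doubling embedded quotes.
    static String toCsvLine(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(values.get(i).replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }
}
```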

Stripping all HTML tags from a CLOB

Hi all,
Running Oracle 9.2.0.8 on AIX...
We have a table which stores HTML document fragments in a clob. I have a requirement to convert these to plain/text (strip all HTML tags) for sending in a plain/text email body.
I have read the following solution from Tom Kyte's site:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:25695084847068
Basically creating an Oracle text index on the CLOB column and calling ctx_doc.filter with "plaintext" parameter set to true.
I noticed in Tom's example, he uses the default filter, which based on the docs, is NULL_FILTER, which applies no filtering. I have tried his example in my dev box, creating the text index on the CLOB column with no parameters.
The call to ctx_doc.filter did not filter the html at all. I re-created the index and specified the INSO_FILTER and the filtering was done. I was under the impression that INSO_FILTER was for filtering binary content to plaintext...
create table filter ( query_id number, document clob );
create table demo
( id int primary key,
 theclob clob
);
create index demo_idx on demo(theClob) indextype is ctxsys.context;
SET DEFINE OFF;
Insert into DEMO
 (ID, THECLOB)
Values
 (1, '<html><body>This is a test of ctx_doc.filter and plaintext filtering.</body></html>');
COMMIT;
exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
The above code does not convert the HTML to plain text...
Now re-create the index with INSO_FILTER:
drop index demo_idx;
create index demo_idx on demo(theClob) indextype is ctxsys.context parameters ('filter ctxsys.inso_filter');
exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
The above scenario returns the string "This is a test of ctx_doc.filter and plaintext filtering."
The Oracle documentation doesn't specify any special filter parameter that needs to be set... just wondering if I'm missing something here... or better yet, if there is a better solution to my problem. ;-)
Thanks
Stephane

The difference between what you did and what Tom Kyte did is that you created your index on a clob column and Tom created his index on a blob column. What I don't know is why that makes a difference. I have demonstrated below with one blob column and one clob column, one index on the blob and one index on the clob, using the same code on both, with different results.
SCOTT@orcl_11gR2> create table filter
2 (query_id number,
3 document clob)
4 /
Table created.
SCOTT@orcl_11gR2> create table demo
2 (id int primary key,
3 theblob blob,
4 theclob clob)
5 /
Table created.
SCOTT@orcl_11gR2> create index demo_blob_idx
2 on demo (theblob)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@orcl_11gR2> create index demo_clob_idx
2 on demo (theclob)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@orcl_11gR2> insert into demo values
2 (1,
3 utl_raw.cast_to_raw (
4 '<html>
5 <body>
6 
7 This is a test of
8 ctx_doc.filter 
9 and plaintext filtering.
10 
11 </body>
12 </html>'),
13 '<html>
14 <body>
15 
16 This is a test of
17 ctx_doc.filter 
18 and plaintext filtering.
19 
20 </body>
21 </html>')
22 /
1 row created.
SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_blob_idx', 1, 'filter', 1, true)
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_clob_idx', 1, 'filter', 2, true)
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> select id, utl_raw.cast_to_varchar2 (theblob), theclob from demo
2 /
 ID
UTL_RAW.CAST_TO_VARCHAR2(THEBLOB)
THECLOB
 1
<html>
 <body>
 
 This is a test of
 ctx_doc.filter 
 and plaintext filtering.
 
 </body>
 </html>
<html>
 <body>
 
 This is a test of
 ctx_doc.filter 
 and plaintext filtering.
 
 </body>
 </html>
1 row selected.
SCOTT@orcl_11gR2> select query_id, document from filter
2 /
QUERY_ID
DOCUMENT
 1
This is a test of ctx_doc.filter and plaintext filtering.
 2
<html>
 <body>
 
 This is a test of
 ctx_doc.filter 
 and plaintext filtering.
 
 </body>
 </html>
2 rows selected.
SCOTT@orcl_11gR2>
