Verity Search - exclude html tags when indexing?

I have a table I want to index, but some data stored in the
table contains HTML.
I want to index the content, but I want all HTML tags to be
excluded.
This is a problem. Say you had a table storing all retail
stores, and there's some HTML in the data. If you do a Verity search
for "Target" (as in the Target retail store), you will get a lot
of irrelevant results wherever there's a
target="blank" attribute in an A HREF tag, for example. Can
you strip these out during the <CFINDEX>?
Any idea on how to accomplish this?

Hi user494326,
I'm actually having trouble getting HTML into a CLOB. It sounds like you were able to do this successfully. What did you do to get it in, as far as escaping characters, etc.? I'd love to see how you handled that; it would be greatly appreciated!
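
On the original question of keeping tags out of the index: one common approach is to strip the markup from each field before the text is handed to the indexer, so that attribute values such as target="blank" never become searchable tokens. The following is a minimal sketch in Python rather than CFML, using only the standard library; the names `TextOnly` and `strip_tags` and the sample row are made up for illustration, and this is not a description of anything Verity or <CFINDEX> does on its own.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment; tags and attributes are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment: str) -> str:
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flushes any text still buffered by the parser
    # Join with spaces so words on either side of a tag do not run together,
    # then collapse the extra whitespace. A word split by an inline tag
    # (te<b>st</b>) would come out as two words with this choice.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical row content of the kind described in the question.
row = 'Shop at <a href="http://example.com" target="blank">Acme</a> today'
print(strip_tags(row))
```

The cleaned string, not the raw column value, would then be what gets indexed; a search for "Target" would no longer match rows only because of a link attribute.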

Similar Messages

  • Why is there a difference in the "class" attribute value of the html tag when viewed in "Page Source" and using "Inspector"? I am referring to the new Microsoft site.

    While inspecting the new Microsoft site source, I observed that the "class" attribute value of the "html" tag seen in Page Source differs from the value given by the Tools/Web Developer/Inspect tool. The tool shows class="en-in js no-flexbox canvas no-touch backgroundsize cssanimations csstransforms csstransforms3d csstransitions fontface video audio svg inlinesvg", while Page Source shows class="en-us no-js".
    The question is: why are different values shown?

    Inspector is showing you the source after it's been modified by Javascript and such.
    To see the same thing in the source viewer, press Ctrl+A to select everything on the page, then right-click the selection and choose View Selection Source.

  • Text Search skipping HTML tags

    I have a table containing a CLOB column.
    select code, details from search order by code;
    CODE DETAILS
    4 just a <b>test </b>insert
    5 just a <b>test</b> insert
    9 <HTML>just a <i>test</i> insert</HTML>
    10 checking test insert
    I have created a context index and added the HTML tags to the stoplist.
    exec ctx_ddl.create_stoplist('mystop', 'BASIC_STOPLIST');
    exec ctx_ddl.add_stopword('mystop', '<b>');
    exec ctx_ddl.add_stopword('mystop', '</b>');
    CREATE INDEX searchi ON search(details)
    INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
    ('FILTER CTXSYS.AUTO_FILTER SECTION GROUP CTXSYS.AUTO_SECTION_GROUP STOPLIST MYSTOP');
    But when I search for 'test insert', it only shows the following rows:
    SQL> SELECT score(1), code, details FROM search WHERE CONTAINS(details, 'test insert', 1) > 0 ORDER BY score(1);
    SCORE(1) CODE DETAILS
    5 10 checking test insert
    5 9 <HTML>just a <i>test</i> insert</HTML>
    I would like to define a text index that skips the HTML tags and returns all the rows that contain the search phrase.

    Since you did not use code tags in your post, most of your HTML does not show, so it is difficult to tell what HTML is in your data or what values you set for your stopwords. One problem with stopwords is that, although the word is not indexed, the index still expects some word where the stopword was, so searching for "word1 word2" will not find "word1 removed_stopword word2". How about using a procedure_filter as demonstrated below? I only removed a few tags, so you would need to either expand it to include others or search for starting and ending tags and remove what is in between.
    SCOTT@orcl_11g> CREATE TABLE search
      2    (code      NUMBER,
      3       details  CLOB)
      4  /
    Table created.
    SCOTT@orcl_11g> INSERT ALL
      2  INTO search VALUES (4, 'just a <b>test</b> insert')
      3  INTO search VALUES (5, 'just a <i>test</i> insert')
      4  INTO search VALUES (9, '<HTML>just a test insert</HTML>')
      5  INTO search VALUES (10, 'checking test insert')
      6  SELECT * FROM DUAL
      7  /
    4 rows created.
    SCOTT@orcl_11g> CREATE OR REPLACE PROCEDURE myproc
      2    (p_rowid    IN ROWID,
      3       p_in_clob  IN CLOB,
      4       p_out_clob IN OUT NOCOPY CLOB)
      5  AS
      6  BEGIN
      7    p_out_clob := REPLACE (p_in_clob, '<html>', '');
      8    p_out_clob := REPLACE (p_out_clob, '</html>', '');
      9    p_out_clob := REPLACE (p_out_clob, '<HTML>', '');
    10    p_out_clob := REPLACE (p_out_clob, '</HTML>', '');
    11    p_out_clob := REPLACE (p_out_clob, '<b>', '');
    12    p_out_clob := REPLACE (p_out_clob, '</b>', '');
    13    p_out_clob := REPLACE (p_out_clob, '<B>', '');
    14    p_out_clob := REPLACE (p_out_clob, '</B>', '');
    15    p_out_clob := REPLACE (p_out_clob, '<i>', '');
    16    p_out_clob := REPLACE (p_out_clob, '</i>', '');
    17    p_out_clob := REPLACE (p_out_clob, '<I>', '');
    18    p_out_clob := REPLACE (p_out_clob, '</I>', '');
    19  END myproc;
    20  /
    Procedure created.
    SCOTT@orcl_11g> SHOW ERRORS
    No errors.
    SCOTT@orcl_11g> BEGIN
      2    CTX_DDL.CREATE_PREFERENCE ('myfilter', 'PROCEDURE_FILTER');
      3    CTX_DDL.SET_ATTRIBUTE ('myfilter', 'PROCEDURE', 'myproc');
      4    CTX_DDL.SET_ATTRIBUTE ('myfilter', 'ROWID_PARAMETER', 'TRUE');
      5    CTX_DDL.SET_ATTRIBUTE ('myfilter', 'INPUT_TYPE', 'CLOB');
      6    CTX_DDL.SET_ATTRIBUTE ('myfilter', 'OUTPUT_TYPE', 'CLOB');
      7  END;
      8  /
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11g> CREATE INDEX searchi
      2  ON search (details)
      3  INDEXTYPE IS CTXSYS.CONTEXT
      4  PARAMETERS ('FILTER myfilter')
      5  /
    Index created.
    SCOTT@orcl_11g> SELECT token_text FROM dr$searchi$i
      2  /
    TOKEN_TEXT
    CHECKING
    INSERT
    TEST
    3 rows selected.
    SCOTT@orcl_11g> COLUMN details FORMAT A35
    SCOTT@orcl_11g> SELECT score (1), code, details
      2  FROM   search
      3  WHERE  CONTAINS (details, 'test insert', 1) > 0
      4  ORDER  BY score (1)
      5  /
      SCORE(1)       CODE DETAILS
             3          4 just a <b>test</b> insert
             3          5 just a <i>test</i> insert
             3          9 <HTML>just a test insert</HTML>
             3         10 checking test insert
    4 rows selected.
    SCOTT@orcl_11g>
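
The chain of REPLACE calls above only handles the tags it lists. The alternative mentioned in the reply, finding where each tag starts and ends and removing what is in between, can be sketched with a regular expression. This is Python for illustration only, not the PL/SQL a filter procedure would need; the name `remove_tags` is made up, the rows are the ones from the question, and the pattern is a deliberate simplification that would mishandle a literal > inside an attribute value, comments, or script blocks.

```python
import re

# Anything from "<" up to the next ">" is treated as a tag.
TAG = re.compile(r"<[^>]*>")


def remove_tags(text: str) -> str:
    """Delete every <...> span, whatever the tag name or case (hypothetical helper)."""
    return TAG.sub("", text)


# Rows as posted in the question.
rows = {
    4: "just a <b>test </b>insert",
    5: "just a <b>test</b> insert",
    9: "<HTML>just a <i>test</i> insert</HTML>",
    10: "checking test insert",
}

for code, details in rows.items():
    cleaned = remove_tags(details)
    # After cleaning, the phrase is contiguous in every row.
    print(code, cleaned, "test insert" in cleaned)
```

Because the tags are removed rather than treated as stopwords, no placeholder position is left between "test" and "insert", which is why the phrase search can match all four rows.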

  • HTML tags not displayed when using Data Template

    Hi All...
    I'm developing a BI Publisher report in which one of the columns is a clob data type. I'm using an xsl stylesheet to format the data present in the clob column.
    I've developed the report using a data template as the data set. The problem is that the CLOB column, which contains HTML tags, is not displayed properly. For example, a tag starting with < is displayed with &lt; in its place.
    I did a couple of searches in this forum and in Tim's blog but couldn't find a proper solution...
    http://blogs.oracle.com/xmlpublisher/2007/01/formatting_html_with_templates.html
    API and HTML Formated Content
    Re: Problem with text data elements containing escaped HTML codes
    HTML Output from CDATA
    Re: HTML formatted output
    Re: Special characters in CLOB are making report fail
    Re: Formatting of HTML tag problem
    I'm using BI Publisher standalone, Release 10.1.3.2. In one of the threads,
    Re: Special characters in CLOB are making report fail
    I learned that a data template cannot generate proper HTML tags in release 10.1.3.2. Is there any workaround to get proper HTML tags when a data template is used as the data set?
    Thanks in Advance...
    Edited by: user10280715 on Dec 9, 2008 3:13 PM

    The issue could be with the data that is selected in the other environment. It often happens that the ALV does not give the same results in the other systems as in DEV.
    Possible errors could be the control break statements in the loop...endloop block. Validate the correctness of the control break statements, if any.

  • How to add an image in the HTML tag

    Hello friends,
    I'm working on a text chat application in Adobe AIR, using the <mx:html/> tag for displaying text and images (smileys). But the font size in Flex is different from the one in HTML. I mean, I'm using font size 10, but it looks too large in the <mx:html/> component. Is there any way to make them the same size?
    The second and major problem is that I cannot add images to the html tag. When I give images like private var txt:String="<img src='src\smily\tongue.gif'/>",
    it shows nothing, but when I give images through an http path, it shows the images in the html component.
    Can anybody explain the problem to me?
    Thanks And regards
      Vineet Osho

    Thanks a lot, René Bühling, for your quick reply, but the link you mentioned in the reply is not working for me.
    OK, can you tell me how I can match my Flex font size 10 to the <mx:html> tag's font size 10?
    I think there is a large difference between the Flex font and the HTML font, so please guide me on that.
    Thanks a lot
    Vineet osho

  • HTML Tags search using TREX

    Hi All,
    Using the following 2 documents, I tried to configure TREX to search HTML tags. After following all the steps, when I tried to search I didn't get any results.
    1) How to set up web repository and crawling it for indexing.pdf
    2) How-to-guide for searchable HTML tags.pdf
    Any buddies please help?
    Thanks in advance.
    Mahesh

    Hi Suman,
    I am Manu's colleague. We have the hierarchy of objects like this:
    Cubes --> Multiprovider --> Aggregation level.
    We had two transport requests, one for the planning objects including aggregation levels and the other for data model objects including cubes, DSOs and multiproviders. All were deletion requests.
    We moved the first transport request to production, checked using the normal Find Objects, and found no results for the aggregation levels. We assumed all the objects were deleted.
    Then we moved the data model transport request to Quality and it failed, stating that the multiproviders are used in an aggregation level (this happened in Q).
    Then, when we checked the aggregation level in Planning Modeler, we found it there (this in both Q and P), but not in the RSA1 transaction until we used TREX to retrieve the result (this in P, as we don't have TREX in the D and Q systems).
    This is the issue, and because of this we are not able to delete the data models in the system.
    Thanks for all your previous replies; it will be helpful if you have any idea on this.
    Regards.
    Shafi.

  • Verity search not indexing

    I have recently upgraded to ColdFusion 8. I reworked my
    Verity searches, but the main one for the website times out. I have
    gotten it to complete a couple of times, but this time it will not
    complete. I have no clue what to change to make it index. I would
    appreciate any help.
    Thanks

    This is the error: "The request has exceeded the allowable
    time limit Tag: cfoutput". It is a Verity search of my website,
    "www.archives.alabama.gov", which has over 26,000 documents. It has
    always worked fine until the upgrade to v.8, and it worked the first
    several times, but now it is timing out with the above message
    every time. I do not know where that tag is or how to change it.
    That is what I need to know.

  • Html tags removed when #COLUMN_HEADER# is used in column template

    Hi all,
    I'm using APEX 4.0.2, theme 2 Builder Blue.
    I am trying to add html tags to dynamically generated column headings of a dynamic SQL Report.
    When using a standard report template, the headings contain the HTML tags. However, when I want to use one of the vertical layouts, all HTML tags are removed. After some research I found out that when the substitution string #COLUMN_HEADER# is used within the column headings part of the template, the HTML tags are preserved. They are removed, however, when the #COLUMN_HEADER# substitution string is used in the column templates part of the template.
    This is easily testable by using for instance "return htf.bold ('COL01')" as dynamic column header.
    Is this a bug or am I overlooking something? Is there another solution maybe to preserve html tags in the column heading?
    Cheers, Erik

    Web Dynpro appears to use XHTML instead of HTML, so the syntax is a bit more limited and more picky.
    This link explains the difference between the two syntaxes:
    http://reference.sitepoint.com/html/html-xhtml-syntax
    You can test your tags in this validator tool:
    http://validator.w3.org
    solved?  have a good week, and holidays.

  • Embedding html in xml tags, when rendering text as html

    Quick question,
    I have a site that reads all content from an external xml.
    The text box that reads this info renders the content as html; does
    anyone know how to go about putting an html tag in an xml tag so
    that flash can read it?
    So would it be possible to do:
    <content>
    " Welcome to the site<br>we are happy to have you
    here<br><img src="logo.jpg"> "
    </content>

    I completely deleted the old way, so I have to recreate this
    from scratch... but here is how I used to be able to do it (which,
    looking at how I do it now since HTML wasn't parsing, was so
    stupid).
    <chair id="1" price_point="High-end">
    <image>&lt;a
    href='/dsn/catalog/viewproductpage.asp?OwnerID=1&amp;PageID=%7B7A0FB858%2D2184%2D4033%2DB474%2D2B22D89BBD96%7D'&gt;&lt;img
    src='/images/s/dshe/mini/7/79219.jpg border='0'
    /&gt;&lt;/a&gt;</image>
    <description>&lt;a
    href='/dsn/catalog/viewproductpage.asp?OwnerID=1&amp;PageID=%7B7A0FB858%2D2184%2D4033%2DB474%2D2B22D89BBD96%7D'&gt;Toni&lt;/a&gt;</description>
    <brand>Kwalu</brand>
    <composition>Kwalu</composition>
    <leg_style>Chippendale</leg_style>
    <overall_style>Transitional</overall_style>
    </chair>
    And now in 1.1, having that, it prints this on screen:
    <a
    href='/dsn/catalog/viewproductpage.asp?OwnerID=1&PageID=%7B7A0FB858%2D2184%2D4033%2DB474%2D2B22D89BBD96%7D'><img
    src='/images/s/dshe/mini/7/79219.jpg border='0' /></a>
    whereas before, it would be that image and it would link to
    the page specified in the href.
    I'm about to take off from work for the night, but I'll check
    back when I get home if you need anything else from me.
    Thanks for your help, Kin. :)
    Kyle
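
For the question of putting HTML inside an XML element: raw HTML such as an unclosed <br> is usually not well-formed XML, so the two usual options are to entity-escape the markup (as in the reply above) or to wrap it in a CDATA section. Either way, the XML parser hands the original HTML back as plain text, and it is then up to the text field to render it as HTML. The sketch below shows this with Python's standard library, not with Flash or ActionScript; the `content_text` helper is a made-up name and the fragment is adapted from the question.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# HTML fragment adapted from the question; note the unclosed <br> and <img>.
html_fragment = 'Welcome to the site<br>we are happy to have you here<br><img src="logo.jpg">'

# Option 1: entity-escape the markup (&, < and > become &amp;, &lt;, &gt;).
escaped_doc = "<content>" + escape(html_fragment) + "</content>"

# Option 2: wrap the markup in a CDATA section, leaving it readable.
cdata_doc = "<content><![CDATA[" + html_fragment + "]]></content>"

# Unprotected HTML inside the element, for comparison.
raw_doc = "<content>" + html_fragment + "</content>"


def content_text(doc: str) -> str:
    """Parse the document and return the text of its root element (hypothetical helper)."""
    return ET.fromstring(doc).text


print(content_text(escaped_doc) == html_fragment)
print(content_text(cdata_doc) == html_fragment)

try:
    content_text(raw_doc)
except ET.ParseError as err:
    print("raw HTML is not well-formed XML:", err)
```

CDATA tends to be easier to maintain by hand, since the HTML stays legible in the XML file; escaping is what most serializers produce automatically.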

  • When Exported in PDF reports displays HTML Tags

    Hi All,
    Business Object XI R2
    WebI Report
    I am reporting on Oracle CLOBs. Each CLOB contains data with HTML tags; that is, the CLOB is data formatted with HTML tags.
    When I create a WebI report on this CLOB, WebI displays it correctly, in the sense that it shows the data only.
    When I export this report to PDF, the PDF displays both the data and the HTML tags.
    Is there any solution, workaround, or setting in the BO server to solve this issue?
    I would appreciate your prompt help.
    Thanks,
    Ashok

    Christine,
    My question is whether Adobe supports CLOBs or not.
    I am reporting on Oracle CLOBs (these CLOBs contain data wrapped in HTML tags) using Business Objects Web Intelligence. WebI displays the reports as I desire; there is no problem in WebI, which displays the data with no HTML tags.
    My problem is that when I export this WebI report, the PDF shows the HTML tags. Why? Since WebI does not display the HTML tags, Adobe should not know whether there are HTML tags or not.
    I would appreciate your prompt reply.
    Thanks,
    Ash

  • Search for strings inside html tags ?

    Is there any way to get Spotlight to find search strings inside html documents? One example is I want to find any file that includes a certain alt=" " string inside an img src tag.
    Spotlight does not seem to include anything inside html tags in its search, as far as I can tell.
    Am I missing something? Is there something I need to do?
    Thanks for any ideas,
    KarenD
    G5   Mac OS X (10.4.4)  
    G5   Mac OS X (10.3.5)  

    Very strange: it does find text inside the html files, but indeed does not seem to find text that appears only within a tag. The solution is EasyFind by Christian Grunenberg:
    http://www.grunenberg.com
    You can do a content search on files in a particular folder, which I recommend, since it does a brute-force search and can take a while if it has to search everything.
    Francine Schwieder

  • Arabic characters appear as empty squares when using certain HTML tags or font styles

    Only when HW acceleration is on. Arabic characters appear as empty squares when using "italic" or "oblique" font styles or when using <i> or <em> HTML tags.
    Try this code to replicate the problem:
    <pre>
    <p>مشكلة ظهور المربعات الخالية بدل الحروف</p>
    <p style="font-style: italic;">Italic مشكلة ظهور المربعات الخالية بدل الحروف</p>
    <p style="font-style: oblique;">Oblique مشكلة ظهور المربعات الخالية بدل الحروف</p>
    <i>i tag مشكلة ظهور المربعات الخالية بدل الحروف</i> <br> <br>
    <em>em tag مشكلة ظهور المربعات الخالية بدل الحروف </em>
    </pre>

    After lots of research, I found the problem. The boxes (squares) show up whenever there is a font in the webpage that does not have Arabic within its Unicode range such as Times New Roman Italic or Oblique. Normally, Firefox will pick another font to display the characters but now, a newly introduced feature is interfering.
    To fix the problem without turning off hardware acceleration:
    Go to about:config
    Locate gfx.font_rendering.directwrite.use_gdi_table_loading,
    which is True by default in FF4.0 Beta 10, and change it to False.
    This is a bug that has to be fixed.

  • Html tags executed when retrieved from database

    Exit
    what happens is the html executes ...........

    It is fairly difficult to answer this question, as:
    HTML is not "executed".
    A database definitely does not do anything special to HTML.
    "it is multi processing using code" Huh?
    Please .... dont .... write ... like ... this
    "i am also working on forum ... but i get these html tags executed ... when they are retrived and shown..."
    OK, so you are writing some forum software, and in the text area your users are putting HTML tags (<b>I'm bold</b>), and you don't want those to be rendered?
    A simple solution would be, BEFORE you insert this data into your varchar column, to replace every & with &amp; first, then every < with &lt; and every > with &gt;. The ampersand has to go first; otherwise the & in the entities you just produced gets escaped a second time.
    Message was edited by:
    mlk
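
The escaping suggested in the reply can be sketched as follows. This is Python for illustration (the thread does not say which language the forum software uses), and the function names and the sample post are made up. The point the sketch tries to make is that the ampersand has to be replaced before the angle brackets; doing it last re-escapes the & of the entities that were just produced.

```python
import html


def escape_for_display(text: str) -> str:
    """Escape &, < and > so a browser shows tags as text instead of rendering them."""
    # The ampersand is handled first so the entities produced by the next
    # two replacements are not escaped a second time.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_wrong_order(text: str) -> str:
    """Same replacements with & last, shown only to illustrate the double-escaping bug."""
    return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")


# Hypothetical user post containing markup and a literal ampersand.
post = "<b>I'm bold</b> & proud"

print(escape_for_display(post))
print(escape_wrong_order(post))

# The standard library helper performs the same escaping when quote=False.
print(html.escape(post, quote=False) == escape_for_display(post))
```

Whether to escape before storing, as the reply suggests, or only when the text is written into the page is a design choice; escaping at output time keeps the stored data unmodified.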

  • Enterprise search: error when indexing (CRM 7.0)

    Hello,
    I need your help. We want to use enterprise search in our demo system (CRM 7.0). I use the Web dynpro application ESH_ADMIN_UI_COMPONENT. However, when indexing we get the following errors:
    Data of NW authorization objects could not be indexed
    The implementing class does not support the iterator by time interface
    Indexing ended with error
    Indexing of complete objects from type USER_AUTHORITY has returned 5.508 NameServer error: no servers found IndexId:esh:cr7
    Multi-index call of index ESH:CR7100CR7100USER_AUTHORITY~USER_AUTHORITY has returned 5.508: NameServer error: no servers found IndexId:esh:cr7
    Indexing ended with error
    best regards,
    Wim Olieman

    Hi Pieter,
    Thanks. But my rule policy is simple as stated below.
    If
    E-Mail original recipient contains "contactusatabc.com"
    Then
    Route EMail ( Organizational Object = Assistant Manager/Executive )  and
    Create Service Request ( Process type = ZR )
    Service manager profile ZSRQMROUTING is created and assigned with below services.
    1     SVC_PARAMS
    5     FG_WEBFORM
    7     FG_EMAIL
    10     UT_WORKITEMTEXT
    50     RE_RULE_EXEC
    70     AH_DEF_ROUTING
    800     UT_ERMS_REPLICAT
    And RE_RULE_EXEC assigned with
    DEF_ROUTING     O:50000008
    LOG_LEVEL     0
    POLICY     ZSTC
    Whenever a mail is sent from SBWP, the recipient is "contactusatabc.com" and the receiver type is Business Object.
    After the mail has been sent, the ERMS processing log in tcode CRM_ERMS_LOGGING shows the sender ID as blank (even though my user ID has an email address maintained in SU01):
    compiled Rule: <or><not_contain case="" multivalue=""> <xpath provider="CL_CRM_ERMS_ADD2FB_DOCUMENT" accessor=""><constant value="/parts/SENDER_ADDRESS/text()"/></xpath> <constant value="abcatannon.com"/></not_contain></or>
    path address:/parts/SENDER_ADDRESS/text()
    Kindly advise me in case any configuration is missing.
    Thanks
    Shan

  • Prob when there is any HTML tag inside netui:label tag

    Hi,
    When I display content in a script directly and the content contains an HTML tag such as <b> or <i>, that formatting is reflected in the displayed content.
    Eg: In code: <%=headline %>
    Headline=This is a <b><i>beautiful </i></b>flower
    Output: This is a beautiful flower (with "beautiful" rendered in bold italics)
    When I display the same headline using <netui:label>:
    Eg: In code: <netui:label value="<%=headline %>">
    Headline=This is a <b><i>beautiful </i></b>flower
    Output: This is a <b><i>beautiful </i></b>flower
    So is there any workaround by which I can get the same output (as for the normal script) with the <netui:label> tag?
    Thanks
