OTI index with documents

Hi,
Could you please advise whether the following approach is the right way to implement my requirement? I am working on a fuzzy search requirement. My table contains 100 million rows with a VARCHAR2 column. I implemented the following steps:
1. Convert the table data into XML documents with the help of the following PL/SQL package:
CREATE OR REPLACE PACKAGE Chrdscr IS
p_dir_name varchar2(13) := '/opt/app/log/';
procedure OpenXmlFile(p_dir_name varchar2);
END Chrdscr;
create or replace package body Chrdscr is
  v_FILENAME varchar2(30);
  f_XML_FILE UTL_FILE.file_type;
  procedure OpenXmlFile(p_dir_name varchar2) is
    v_record_data varchar2(4000) := null;
    v_DSCR        varchar2(4000) := null;
    cursor orders_cursor is
      select t.dscr
      from   CHR_DSCR_T t;
  begin
    --v_FILENAME := TO_CHAR(SYSDATE, 'DDMMYYYYHH24MI') || '.xml';
    v_FILENAME := 'chrdscr.xml';
    -- Note: p_dir_name is not actually used; the file is written via
    -- the DATA_PUMP_DIR directory object.
    f_XML_FILE := UTL_FILE.fopen('DATA_PUMP_DIR', v_FILENAME, 'W');
    v_record_data := '<?xml version="1.0" encoding="UTF-8"?>';
    UTL_FILE.put_line(f_XML_FILE, v_record_data);
    -- A well-formed XML document needs a single root element.
    UTL_FILE.put_line(f_XML_FILE, '<DSCRS>');
    open orders_cursor;
    loop
      fetch orders_cursor into v_DSCR;
      exit when orders_cursor%NOTFOUND;
      -- Caution: DSCR values containing &, < or > would need escaping
      -- to keep the document well-formed.
      UTL_FILE.put_line(f_XML_FILE, '  <DSCR>' || v_DSCR || '</DSCR>');
    end loop;
    close orders_cursor;
    UTL_FILE.put_line(f_XML_FILE, '</DSCRS>');
    UTL_FILE.fclose(f_XML_FILE);
  exception
    when UTL_FILE.INTERNAL_ERROR then
      raise_application_error(-20500,
        'Cannot open file: ' || v_FILENAME || ', internal error; code: ' ||
        sqlcode || ', message: ' || sqlerrm);
    when UTL_FILE.INVALID_OPERATION then
      raise_application_error(-20501,
        'Cannot open file: ' || v_FILENAME || ', invalid operation; code: ' ||
        sqlcode || ', message: ' || sqlerrm);
    when UTL_FILE.INVALID_PATH then
      raise_application_error(-20502,
        'Cannot open file: ' || v_FILENAME || ', invalid path; code: ' ||
        sqlcode || ', message: ' || sqlerrm);
    when UTL_FILE.WRITE_ERROR then
      raise_application_error(-20503,
        'Cannot write to file: ' || v_FILENAME || ', write error; code: ' ||
        sqlcode || ', message: ' || sqlerrm);
  end;
end Chrdscr;
Package body created.
2. The XML document was created in the directory. After that, I created a table to hold the document name:
CREATE TABLE testtab
(id NUMBER,
     docs VARCHAR2 (30))
Table created.
INSERT INTO testtab     VALUES (1, 'chrdscr.xml')
1 row created.
3. A CONTEXT index was also created on the XML document:
CREATE INDEX otiind ON testtab (docs)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('DATASTORE otipref')
Index created.
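For reference, the otipref datastore preference named in the PARAMETERS clause is never shown being created anywhere in this post; it would have to exist before the CREATE INDEX. A minimal sketch of what it might look like, assuming a FILE_DATASTORE whose PATH attribute points at the directory the XML file was written to (the preference name and path are taken from this thread, not verified):

```sql
-- Hypothetical sketch: the 'otipref' preference assumed by the
-- CREATE INDEX above, reading the documents named in the DOCS
-- column from the file system.
BEGIN
  CTX_DDL.CREATE_PREFERENCE ('otipref', 'FILE_DATASTORE');
  CTX_DDL.SET_ATTRIBUTE ('otipref', 'PATH', '/opt/app/log');
END;
/
```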
SQL> SELECT token_text FROM dr$otiind$i
2 ;
TOKEN_TEXT
1.0
1888
13654
125678.6
888
999999
DATA
DIRECT
ENCODING
UTF
VERSION
XML
12 rows selected.
4. Then I executed the following step:
begin
chrdscr.OpenXmlFile('/opt/app/log/');
end;
end;
SELECT id FROM testtab
2 WHERE CONTAINS (docs, 'data') > 0
3 /
ID
1
Could you please advise whether this approach will satisfy the fuzzy search requirement? Your help would be very much appreciated.

I don't know why you insist on adding the extra, unnecessary step of XML. What do you expect to gain by using XML? There are various ways to create and index your XML; the example below is just one of many, and you can find other methods and examples in the online documentation. As you can see below, nothing is gained by adding the XML. If you want sections, you can create a MULTI_COLUMN_DATASTORE without the XML, but still have your separate columns to select from.
SCOTT@orcl_11gR2> CREATE TABLE chr_dscr_t
  2    (CHR_DSCR_ID  NUMBER           NOT NULL,
  3       CHR_ID          NUMBER           NOT NULL,
  4       LANG_ID      NUMBER           NOT NULL,
  5       DSCR_ID      NUMBER           NOT NULL,
  6       DSCR          VARCHAR2(4000 CHAR),
  7       TRANS_ST     VARCHAR2(1 CHAR)      NOT NULL,
  8       CRTD_BY      VARCHAR2(50 CHAR)      NOT NULL,
  9       CRTD_DTTM    DATE           NOT NULL,
10       UPD_BY          VARCHAR2(50 CHAR)      NOT NULL,
11       UPD_DTTM     DATE           NOT NULL,
12       LCK_NUM      NUMBER           NOT NULL)
13  /
Table created.
SCOTT@orcl_11gR2> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('chr_dscr_wordlist', 'BASIC_WORDLIST');
  3    CTX_DDL.SET_ATTRIBUTE ('chr_dscr_wordlist', 'FUZZY_MATCH', 'AUTO');
  4    CTX_DDL.SET_ATTRIBUTE ('chr_dscr_wordlist', 'FUZZY_SCORE', 0);
  5    CTX_DDL.SET_ATTRIBUTE ('chr_dscr_wordlist', 'FUZZY_NUMRESULTS', 5000);
  6  END;
  7  /
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> INSERT INTO chr_dscr_t VALUES
  2    (1, 2, 3, 4, 'This is a test record.', 'a', 'b', SYSDATE, 'c', SYSDATE, 5)
  3  /
1 row created.
SCOTT@orcl_11gR2> INSERT INTO chr_dscr_t VALUES
  2    (2, 3, 4, 5, 'tests tests tests', 'c', 'd', SYSDATE, 'e', SYSDATE, 6)
  3  /
1 row created.
SCOTT@orcl_11gR2> COMMIT
  2  /
Commit complete.
SCOTT@orcl_11gR2> CREATE TABLE testtab
  2    (id   NUMBER,
  3       docs CLOB)
  4  /
Table created.
SCOTT@orcl_11gR2> INSERT INTO testtab (id, docs)
  2  SELECT chr_dscr_id,
  3           DBMS_XMLGEN.GETXML
  4             ('SELECT *
  5            FROM   chr_dscr_t
  6            WHERE  chr_dscr_id = ' || chr_dscr_id)
  7  FROM   chr_dscr_t
  8  /
2 rows created.
SCOTT@orcl_11gR2> SELECT * FROM testtab
  2  /
        ID
DOCS
         1
<?xml version="1.0"?>
<ROWSET>
<ROW>
  <CHR_DSCR_ID>1</CHR_DSCR_ID>
  <CHR_ID>2</CHR_ID>
  <LANG_ID>3</LANG_ID>
  <DSCR_ID>4</DSCR_ID>
  <DSCR>This is a test record.</DSCR>
  <TRANS_ST>a</TRANS_ST>
  <CRTD_BY>b</CRTD_BY>
  <CRTD_DTTM>21-DEC-10</CRTD_DTTM>
  <UPD_BY>c</UPD_BY>
  <UPD_DTTM>21-DEC-10</UPD_DTTM>
  <LCK_NUM>5</LCK_NUM>
</ROW>
</ROWSET>
         2
<?xml version="1.0"?>
<ROWSET>
<ROW>
  <CHR_DSCR_ID>2</CHR_DSCR_ID>
  <CHR_ID>3</CHR_ID>
  <LANG_ID>4</LANG_ID>
  <DSCR_ID>5</DSCR_ID>
  <DSCR>tests tests tests</DSCR>
  <TRANS_ST>c</TRANS_ST>
  <CRTD_BY>d</CRTD_BY>
  <CRTD_DTTM>21-DEC-10</CRTD_DTTM>
  <UPD_BY>e</UPD_BY>
  <UPD_DTTM>21-DEC-10</UPD_DTTM>
  <LCK_NUM>6</LCK_NUM>
</ROW>
</ROWSET>
2 rows selected.
SCOTT@orcl_11gR2> CREATE INDEX chr_dscr_t_dscr_idx
  2  ON testtab (docs)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS
  5    ('WORDLIST     chr_dscr_wordlist
  6        SYNC          (ON COMMIT)
  7        SECTION GROUP     CTXSYS.AUTO_SECTION_GROUP')
  8  /
Index created.
SCOTT@orcl_11gR2> VARIABLE criteria VARCHAR2(100)
SCOTT@orcl_11gR2> VARIABLE page NUMBER
SCOTT@orcl_11gR2> EXEC :criteria := 'tests'
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> EXEC :page := 1
PL/SQL procedure successfully completed.
SCOTT@orcl_11gR2> COLUMN criteria_in_context FORMAT A45 WORD_WRAPPED
SCOTT@orcl_11gR2> SELECT rank, criteria_in_context
  2  FROM   (SELECT SCORE (1) rank,
  3                CTX_DOC.SNIPPET
  4                  ('chr_dscr_t_dscr_idx',
  5                   ROWID,
  6                   'FUZZY (' || :criteria || ') WITHIN dscr')
  7                  AS criteria_in_context,
  8                ROW_NUMBER () OVER (ORDER BY SCORE (1) DESC) rn
  9            FROM   testtab
10            WHERE  CONTAINS (docs, 'FUZZY (' || :criteria || ') WITHIN dscr', 1) > 0
11            ORDER  BY SCORE (1) DESC)
12  WHERE  rn BETWEEN ((:page - 1) * 20) + 1 AND :page * 20
13  /
      RANK CRITERIA_IN_CONTEXT
        12 2
           3
           4
           5
           <b>tests</b> <b>tests</b> <b>tests</b>
           c
           d
           21-DEC-10
           e
           21-DEC
         4 1
           2
           3
           4
           This is a <b>test</b> record.
           a
           b
           21-DEC-10
           c
           21
2 rows selected.
SCOTT@orcl_11gR2>
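As a footnote to the MULTI_COLUMN_DATASTORE suggestion above, here is a minimal sketch of that approach, indexing the DSCR column of chr_dscr_t directly with no XML staging table (the preference and index names are illustrative):

```sql
-- Hypothetical sketch: a MULTI_COLUMN_DATASTORE provides a <dscr>
-- section to search WITHIN, without generating or storing any XML.
BEGIN
  CTX_DDL.CREATE_PREFERENCE ('chr_dscr_mcds', 'MULTI_COLUMN_DATASTORE');
  CTX_DDL.SET_ATTRIBUTE ('chr_dscr_mcds', 'COLUMNS', 'dscr');
END;
/

CREATE INDEX chr_dscr_mcds_idx ON chr_dscr_t (dscr)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS
  ('DATASTORE     chr_dscr_mcds
    WORDLIST      chr_dscr_wordlist
    SECTION GROUP CTXSYS.AUTO_SECTION_GROUP')
/
```

A FUZZY (...) WITHIN dscr query like the one shown earlier would then run against chr_dscr_t itself.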

Similar Messages

  • Indexing pdf documents with indextype ctxsys.context

    I have an application that stores the contents of uploaded documents in BLOB data fields. We provide web pages which search through the uploaded documents based on text entered by the user. We currently upload both MS Word .doc and HTML documents. For the HTML documents, which are made available to the public, we index the table with the following procedure:
    CREATE OR REPLACE procedure WEBADMIN.index_redacted_docs is
    begin
      declare
        cur      PLS_INTEGER;
        exec_int PLS_INTEGER;
        counter  NUMBER;
      begin
        select count(*)
        into   counter
        from   user_indexes
        where  index_name = 'DOCS_CTX_REDACTED_IDX';
        if (counter = 1) then
          ctx_ddl.sync_index (idx_name => 'docs_ctx_redacted_idx');
        else
          cur := DBMS_SQL.OPEN_CURSOR;
          DBMS_SQL.PARSE (cur,
            'create index docs_ctx_redacted_idx on documents_ctx_redacted (blob_content) ' ||
            'indextype is ctxsys.context parameters (''filter ctxsys.null_filter'')',
            DBMS_SQL.NATIVE);
          exec_int := DBMS_SQL.EXECUTE (cur);
          DBMS_SQL.CLOSE_CURSOR (cur);
        end if;
      exception
        when others then
          -- Only close the cursor if it was actually opened.
          if DBMS_SQL.IS_OPEN (cur) then
            DBMS_SQL.CLOSE_CURSOR (cur);
          end if;
          raise;
      end;
    end;
    We run this process after every uploaded HTML file and are able to locate documents which contain any text entered by the user. The portion of the command we use to query the documents_ctx_redacted table (blob_content is the BLOB field in this table) is (using "corn" as a sample query text):
    WHERE (contains (BLOB_CONTENT, 'corn', 10) > 0)
    Our customer is now asking that PDF files be uploaded as well and searched in the same manner. After the PDF files are uploaded (into the same table as the HTML files) and the index updated, with the above command ctx_ddl.sync_index (idx_name => 'docs_ctx_redacted_idx'), since the index already exists, we cannot get any rows returned with the above WHERE (contains .... ) clause. We know the text we're looking for (such as "corn") is contained in the PDF files, but the search does not find them, although it finds the HTML documents just fine. I've also tried dropping the index entirely and recreating it, but that also only finds the HTML documents but not the PDF's.
    What are we doing incorrectly with the PDF files? Thanks.

    We are using Oracle version 10.2. I looked at the relevant Oracle Text documentation for that version, and the best I could glean was that PDF files are supported by the ctxsys.auto_filter filter (rather than null_filter) when creating the index. I dropped the existing null_filter index and created a new index with the auto_filter parameter, but the end result was the same. I still get no PDF records found when issuing the command (using "corn" as the text query)
    WHERE (contains (BLOB_CONTENT, 'corn', 10) > 0)
    although the HTML records show up fine again.
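    If it helps, the AUTO_FILTER variant described would normally be created along these lines (a sketch using the names from this post; it does not by itself explain why the PDFs are still not found):

    ```sql
    -- Hypothetical sketch: recreate the index with AUTO_FILTER so binary
    -- formats such as PDF are filtered to indexable text (NULL_FILTER
    -- performs no filtering and only suits plain text/HTML).
    DROP INDEX docs_ctx_redacted_idx
    /
    CREATE INDEX docs_ctx_redacted_idx ON documents_ctx_redacted (blob_content)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('FILTER CTXSYS.AUTO_FILTER')
    /
    ```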

  • TREX does not index all documents for document class SOLARGNSRC

    Hi all,
    I've set up a connection between a TREX server (which is also used by a portal system) and Solution Manager. I've gone through the settings in SAP Note 750623 and I was able to create an index, the queue, and even bypass the basic authentication in the preprocessing for document class SOLARGNSRC.
    According to everything I've read, this should be enough to index all documents in Solution Manager and find them with Full Text Search. But this doesn't work. Of the 50,000 documents in the class (35,000 English and 15,000 German), only 6,300 documents are passed to TREX. Most of them are German HTML links to help.sap.com. I can see in the trace files that the URLs to some documents in the content server are passed to TREX. I can open the documents using the links, but I cannot find them using the full text search, which I think means that nothing was really indexed.
    Questions:
    1. Has anyone been able to successfully index Solution Manager documents for full text search purposes?
    2. Why are only 10% of the documents passed to TREX? Is there a specific setting for this?
    3. Why does TREX use the Content Server HTTP links to index the documents and not the RFC connection?
    Cheers
    Marcel Rabe

    Yep, all lights green
    SSR maintained. DRFUZZY as Search Engine (I have not tried Verity)
    Return code? Interestingly, I don't get a return code when I trigger the index/deindex. I just see the hourglass for about 5 minutes (when I run it in the foreground) and after that it's back to the way it was. No messages. Nothing appears in the application logs as far as I can see.
    The program RSTIRIDX is scheduled in the background and runs every hour for about 10 seconds without an error.
    TREX Version 7.00.39.00
    Five languages activated, including German and English. In SKPR07, under Indexed Documents, I can see that 300 German and 6000 English documents have status indexed.
    No proxy server. Systems sit within the same network segment
    Thanks for your help. I posted a message with SAP as well, as this seems strange to me.
    Marcel

  • How to link a full text index with catalog in a PDF file ?

    Good morning and thank you for your help.
    I have already created some PDF files in a folder (with hypertext links between them) and I used the command "Tools\Document Processing\Full Text Index with Catalog" to create an index; at this point everything works properly.
    Now I want to link this index to my first PDF file in order to automatically use this index in an advanced search in this file.
    I hope that someone may answer me!
    Thank you.

    Now I want to link this index to my first PDF file in order to automatically use this index in an advanced search in this file.
    In the properties of the document:

  • Crawler not indexing the documents from repository

    Hi All,
    I've installed TREX 6.1.09 with EP6.(version=6.0.9.0.0 and
    KnowledgeManagementCollaboration 6.0.9.0.0 (NW04 SPS09) on my Windows 2003 platform.
    NameServer port is configured in J2EE visual Administrator and also all lights are green in TREX Monitor in KM -> Monitoring.
    Now, I created an index and assigned my newly uploaded Portal.par as a data source to it. One more thing: I logged into the portal as Administrator, and all of this was done successfully.
    But when I check in the TREX Monitor, I see both the index and its associated queue are idle, and indexing of documents is not taking place.
    When I checked the application log, it gives me an XCrawler error like the one below:
    "Error  4/25/06 7:33:03 PM  XCrawlerService  Failed to create crawler task index2_FileNet - com.sapportals.wcm.service.xcrawler.XCrawlerException: The SQL statement "SELECT "XCRW_TASK_INDEX" FROM "KMC_XCRW_TASKS" WHERE "XCRW_TASK_ID" = ?" contains the semantics error[s]: type check error "
    Sir, I've already done all the workarounds given in the Discussion Forum but could not get my problem solved.
    Please guide me where I'm wrong in the configuration, ASAP.
    Lot of Thanks in advance

    hi Mridul,
    This is quite a common exception in SP9. I experienced the same exception, and so have most others. The solution is to upgrade to SP14.
    This has been suggested by Mr. Noufal in his weblog. Read it:
    An installation that kept me searching
    Also the thread he refers in his blog
    Re: XCrawlerException with index
    So the thing you need to do is apply the patch and upgrade straight away, as it will work afterwards.
    Regards,
    Ganesh N
    Was the answer useful to you ?
    Message was edited by: Ganesh Natarajan

  • Indexing Word document in UTF8 database

    Hello,
    Does anybody have experience with a database created with character set UTF8 (Unicode) and indexing formatted documents like MS Word, PowerPoint, Adobe Acrobat, etc.?
    When I'm indexing a Word document in a non-UTF8 database it's OK; in a UTF8 database the indexing runs without error, but searching does not work (the 'DR$<index_name>$I' table contains unreadable strings as tokens).
    Is there any possibility in indexing to specify filter preferences
    INSO_FILTER and CHARSET_FILTER together?
    Thanks!
    Best Regards
    Jiri Salvet
    [email protected]

    I'm using Oracle 8.1.6 on Windows 2000. If you have any information about problems with the INSO filter on that platform, please send it to me.

  • Getting SES to index unknown document types in UCM?

    Hi
    We have a SES set up to crawl/index a Oracle UCM, and when configuring the crawler source we can define which document types it should crawl/index. It makes sense that SES only knows how to index the content of some known document types, but why can SES not index any document type without looking inside the actual document? I mean, it is possible to upload any document type in UCM and give it UCM specific metadata like title and so on, and it should be easy for SES to index these UCM metadata for unknown document types also.
    How can I get SES to crawl/index all unknown document types?
    Thank you
    Søren

    I've checked with our UCM connector expert, and he says all document types are passed from UCM to SES.
    So hopefully it should be just a matter of editing the crawler.dat file (found in $ORACLE_HOME/search/data/config)
    You would need to add a MIMEINCLUDE line for the mimetype of the documents you want included - if you're not sure what the mimetype is for any document, you can usually see it in the crawler log file.
    You'd also need to check that the document suffix is not in the list:
    # default file name suffix exclusion list
    RX_BOUNDARY (?i:(?:\.jar)|(?:\.bmp)|(?:\.war)|(?:\.ear)|(?:\.mpg)|(?:\.wmv)|(?:\.mpeg)|(?:\.scm)|(?:\.iso)|(?:\.dmp)|(?:\.dll)|(?:\.cab)|(?:\.so)|(?:\.avi)|(?:\.wav)|(?:\.mp3)|(?:\.wma)|(?:\.bin)|(?:\.exe)|(?:\.iso)|(?:\.tar)|(?:\.png))$
    You can edit this list to remove any suffixes that you do want included.
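    As a concrete illustration of the MIMEINCLUDE advice above (the exact crawler.dat syntax should be checked against the SES documentation; application/pdf is only an example mimetype):

    ```
    # Hypothetical crawler.dat addition: crawl/index PDF documents.
    MIMEINCLUDE application/pdf
    ```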

  • Can Azure Search support indexing of documents (pdf, doc etc)

    Hi,
    I want to implement Azure Search service indexing of documents (PDF, DOC, etc.). Is it possible? If yes, then how?
    Thanks

    Hi Shib,
    I wanted to respond with a few ideas based on your question. As of today, we do not have a document cracker for Azure Search to allow you to index the content from files such as the ones you suggested. This is a pretty highly requested feature
    (http://feedback.azure.com/forums/263029-azure-search/suggestions/6328662-blob-indexing-w-file-crackers), but we have not to date been able to prioritize this.
    In the meantime, some things you might consider looking at are IFilter or Apache Tika. These are both great options that would allow you to programmatically extract the text from these files. Based on the extracted text, you could then post the content to Azure Search. I personally think this example on CodeProject is a pretty good starting point if you were to consider using IFilter:
    http://www.codeproject.com/Articles/13391/Using-IFilter-in-C
    I hope that helps.
    Liam
    Sr. Program Manager, SQL Azure Strategy -
    Blog

  • Change the Index from documents to All

    Hi all
    I created an index in the index administration only for documents (Items to Index = Documents).
    This was a long time ago...
    Now we have the problem that our search engine only shows documents for this index.
    OK, that's how it works.
    But my question is: how can I change the "Items to Index" from Documents to All?
    Is there a way? Because it's nearly impossible to delete the index and create a new one, since we have a lot of documents indexed.
    Thanks in advance
    Steve

    Hi Steve,
    Can you try the following -
    1. Create a new Index with the required properties (items to index set to "All") and select the same data source as done in the old index.
    2. Provide schedule for the index.
    3. Re-index it one time.
    4. When everything is done then you can remove the old index and use the new one.
    5. Modify your Search Options Set accordingly.
    Note: There should be sufficient space in the TREX Server to accommodate both the indexes for some time.
    Regards,
    Sudip

  • Trex: Indexing a document link.

    Hi everybody,
    I am trying to index some document links in my portal, but TREX does not seem to index them. I have a folder with some documents and another folder with links to these documents. The documents have some properties, and the links have a different type of properties. What I need is to create a taxonomy with the links classified using some of their properties (not the document properties). When I index the link folder, the crawler returns all the files, but when I watch the index queue everything is at 0; no file is being processed.
    Thanks in advance.
    Gregori Coll Ingles.

    Gregori,
    what is the start condition of your indexes?
    http://help.sap.com/saphelp_nw2004s/helpdata/en/71/090e41bb0ff023e10000000a155106/frameset.htm
    James

  • Content has been indexed with Info only. Resubmit should only be performed

    Hi All,
    I'm using the Oracle Content Server (OCS). When I try to check in a new document, I get the error message below. Can anyone please tell me what the problem is?
    Error Message:
    Text conversion of the file failed.
    Content has been indexed with Info only. Resubmit should only be performed if the problem has been resolved.
    Text conversion of the file '//awusrp04/PortalStg/oracle/inetucmstg/weblayout/groups/public/@enterprise/@hr/documents/document/s_013020.pdf' failed.
    Content has been indexed with Info only. Resubmit should only be performed if the problem has been resolved.

    Hello Experts,
    I am Facing the Same Issue, anybody know the solution for the same?
    Thanks in Advance.

  • What causes the "Can not index this document as its modified version is open in Acrobat" index error

    (when building a full-text index with Acrobat Catalog, and when the PDF document referenced after each of these error messages in the catalog-building log file is open in neither Acrobat Standard nor Acrobat Reader)

    If you print to the Adobe PDF printer, you are creating a PS file that is then automatically loaded into Distiller for conversion to PDF. Printing to a different PS file should not make a difference, unless the PS driver is substantially different. You can also print to file with the Adobe PDF printer and then convert with Distiller (eliminating the automatic Distiller step). If you are printing to a network location, that may be the issue. Acrobat has long had problems with some network connections. To check this out, you might print locally and then move the file to the network drive.
    In terms of posting a file, you have to put it on a separate server and then post the link.

  • Problems indexing 30M documents

    Platform: Sun 4800, 12 CPU, Solaris 9, 48 Gb RAM
    Oracle Version: 10.1.04
    Database Character Set: UTF-8
    SGA MAX SIZE: 24 Gb
    hi,
    Our database contains a mix of image files and plain text documents in 30 different languages (approximately 30 million rows). When we try to index the documents (using IGNORE in the format column to skip the rows containing images), the indexing either bombs out or hangs indefinitely.
    When I first started working on the problem, there were rows in the ctx_user_index_errors table which didn't really give any good indication of what the problem was. I created a new table containing just these rows and was able to index them with no problems using the same set of preferences and the same indexing script. At that time, they were using just the BASIC_LEXER.
    We created a MULTI_LEXER preference and added sub-lexers when lexers existed for the specified language, using the BASIC_LEXER as the default. When we tried to create the index using a parallel setting of 6, the indexing failed after 2 hours, and we got the following error codes: ORA-29855, ORA-20000, DRG-50853, DRG-50857, ORA-01002, and ORA-06512. We then tried to create the index without parallel slaves, and it failed after 3 hours with an end of file on communication channel error.
    Thinking perhaps that it was the MULTI_LEXER that was causing the problem (because the data is converted to UTF-8 by an external program, and the character set and language ID is not always 100% accurate), we tried to create the index using just the BASIC_LEXER (knowing that we wouldn't get good query results on our CJK data). We set the parallel slaves to 6, and it ran for more than 24 hours, with each slave indexing about 4 million documents (according to the logs) before just hanging. The index state in ctx_user_indexes is POPULATE, and in user_indexes is INPROGRESS. There were three sessions active, 2 locked, and 1 blocking. When we were finally able to ctl-C out of the create index command, SQL*Plus core dumped. It takes hours to drop the index as well.
    We're at a loss to figure out what to try next. This database has been offline for about a week now, and this is becoming critical. In my experience, once the index gets hung in POPULATE, there's no way to get it out other than dropping and recreating the index. I know that Text should be able to handle this volume of data, and the machine is certainly capable of handling the load. It could be that the MULTI_LEXER is choking on improperly identified languages, or that there are errors in the UTF-8 conversion, but it also has problems when we use BASIC_LEXER. It could be a problem indexing in parallel, but it also dies when we don't use parallel. We did get errors early on that the parallel query server died unexpectedly, but we increased the PARALLEL_EXECUTION_MESSAGE_SIZE to 65536, and that stopped the parallel errors (and got us to the point of failure quicker).
    Any help you can provide would be greatly appreciated.
    thanks,
    Tarisa.
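    For anyone debugging a similar build, the errors mentioned above are recorded in CTX_USER_INDEX_ERRORS and can be inspected with a query along these lines (a generic sketch, not specific to this system):

    ```sql
    -- Inspect Oracle Text indexing errors recorded during the build.
    SELECT err_index_name, err_timestamp, err_textkey, err_text
    FROM   ctx_user_index_errors
    ORDER  BY err_timestamp DESC;
    ```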

    I'm working with the OP on this. Here is the table definition and the index creation with all the multi_lexer prefs. The table is hash partitioned, and we know the index cannot be local because of this, so it is a global domain index. Perhaps of interest, we have changed PARALLEL_EXECUTION_MESSAGE_SIZE from the default up to 32K. This made a huge difference in indexing speed, but so far has just helped us get to the point of failure faster.
    CREATE TABLE m (
    DOC_ID NUMBER,
    CID NUMBER,
    DATA CLOB,
    TYPE_ID NUMBER(10),
    FMT VARCHAR2(10),
    ISO_LANG CHAR(3))
    LOB (data) store as meta_lob_segment
    ( ENABLE STORAGE IN ROW
    PCTVERSION 0
    NOCACHE
    NOLOGGING
    STORAGE (INITIAL 32K NEXT 32K)
    CHUNK 16K )
    PARTITION BY HASH ( doc_id )
    PARTITIONS 6
    STORE IN (ts1, ts2, ts3, ts4, ts5, ts6),
    pctfree 20
    initrans 12
    maxtrans 255
    tablespace ts
    ALTER TABLE m
    ADD (CONSTRAINT pk_m_c PRIMARY KEY (doc_id, cid)
    USING index
    pctfree 20
    initrans 12
    maxtrans 255
    tablespace ts
    nologging )
    BEGIN
    ctx_ddl.create_preference('english_lexer', 'basic_lexer');
    ctx_ddl.set_attribute('english_lexer','index_themes','false');
    ctx_ddl.set_attribute('english_lexer','index_text','true');
    ctx_ddl.create_preference('japanese_lexer','japanese_lexer');
    ctx_ddl.create_preference('chinese_lexer','chinese_lexer');
    ctx_ddl.create_preference('korean_lexer','korean_morph_lexer');
    ctx_ddl.create_preference('german_lexer','basic_lexer');
    ctx_ddl.set_attribute('german_lexer','index_themes','false');
    ctx_ddl.set_attribute('german_lexer','index_text','true');
    ctx_ddl.set_attribute('german_lexer','composite','german');
    ctx_ddl.set_attribute('german_lexer','mixed_case','yes');
    ctx_ddl.set_attribute('german_lexer','alternate_spelling','german');
    ctx_ddl.create_preference('french_lexer','basic_lexer');
    ctx_ddl.set_attribute('french_lexer','index_text','true');
    ctx_ddl.set_attribute('french_lexer','index_themes','false');
    ctx_ddl.set_attribute('french_lexer','base_letter','yes');
    ctx_ddl.create_preference('spanish_lexer','basic_lexer');
    ctx_ddl.set_attribute('spanish_lexer','index_text','true');
    ctx_ddl.set_attribute('spanish_lexer','index_themes','false');
    ctx_ddl.set_attribute('spanish_lexer','base_letter','yes');
    ctx_ddl.create_preference('global_lexer','multi_lexer');
    ctx_ddl.add_sub_lexer('global_lexer','default','english_lexer');
    ctx_ddl.add_sub_lexer('global_lexer','english','english_lexer','eng');
    ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');
    ctx_ddl.add_sub_lexer('global_lexer','french','french_lexer','fra');
    ctx_ddl.add_sub_lexer('global_lexer','spanish','spanish_lexer','spa');
    ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
    ctx_ddl.add_sub_lexer('global_lexer','korean','korean_lexer','kor');
    ctx_ddl.add_sub_lexer('global_lexer','simplified chinese','chinese_lexer','zho');
    ctx_ddl.add_sub_lexer('global_lexer','traditional chinese','chinese_lexer');
    END;
    BEGIN
    ctx_output.start_log('m_ctx_data.log');
    END;
    CREATE INDEX m_ctx_data ON m (data)
    INDEXTYPE IS ctxsys.context
    PARAMETERS ('memory 1G
    lexer global_lexer
    format column fmt
    language column iso_lang
    sync (every "sysdate+1")' )
    PARALLEL 6
    BEGIN
    ctx_output.end_log();
    END;
    /

  • Error when creating index with parallel option on very large table

    I am getting a
    "7:15:52 AM ORA-00600: internal error code, arguments: [kxfqupp_bad_cvl], [7940], [6], [0], [], [], [], []"
    error when creating an index with parallel option. Which is strange because this has not been a problem until now. We just hit 60 million rows in a 45 column table, and I wonder if we've hit a bug.
    Version 10.2.0.4
    O/S Linux
    As a test I removed the parallel option and several of the indexes were created with no problem, but many still threw the same error... Strange. Do I need a patch update of some kind?

    This is most certainly a bug.
    From metalink it looks like bug 4695511 - fixed in 10.2.0.4.1

  • Can we associate index with foreign key?

    hello
    i have searched and could not find the answer to the above;
    i have a foreign key constraint on the tables; i added an index on that column as well;
    however when i query all_constraints, under index_name for this foreign key constraint there is nothing; only when i have a PK/UK can i see indexes associated with them;
    will oracle still associate my index with the FK-constrained column? or do i need to explicitly associate it with the foreign key column? if so, how to do that?
    thx
    rgds

    Hi,
    UserMB wrote:
    "i have a foreign key constraint on the tables; i added an index on that column as well;"
    It helps if you give a specific example, such as:
    "I have a foreign key constraint, where emp.deptno references dept.deptno. (Deptno is the primary key of dept.) I created an index called emp_deptno_idx on emp.deptno as well."
    "however when i query all_constraints, under index_name for this foreign key constraint there is nothing; only when i have a PK/UK can i see indexes associated with them;"
    Not all indexes are associated with a constraint. In the example above, you wouldn't expect to see anything about the index emp_deptno_idx in all_constraints or in all_cons_columns.
    "will oracle still associate my index with the FK-constrained column? or do i need to explicitly associate it with the foreign key column? if so, how to do that?"
    In the situation above, Oracle will still use the index when the optimizer thinks it will help. You don't have to do anything else.
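    The emp/dept example above can be sketched as follows (a minimal illustration; table, constraint, and index names as in the reply):

    ```sql
    -- Hypothetical sketch: a foreign key plus a manually created index on
    -- the FK column. The index never appears under INDEX_NAME in
    -- ALL_CONSTRAINTS, but the optimizer can still use it.
    ALTER TABLE emp
      ADD CONSTRAINT emp_dept_fk
      FOREIGN KEY (deptno) REFERENCES dept (deptno)
    /
    CREATE INDEX emp_deptno_idx ON emp (deptno)
    /
    ```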
