Problem indexing with TREX

Hi guys,
We have some problems with our TREX system. This is our environment:
- SAP Netweaver 7 - SP17 (SUN Solaris)
- SAP TREX 7 (Windows 2003 server)
- IIS 6 with SSO to File Server (SSO22Kerbmap configured)
The problem is that when the indexing process starts, a lot of documents fail with preprocessing errors. The TREX log shows the following for each failed file:
2009-06-14 17:46:01.589 e preprocessor Preprocessor.cpp(00897) : HTTPHEAD failed for URL http://naspro01.xxxxxx.es:50000/irj/go/km/docs/departamentos/gc-idp/laboratorio/fichas%20de%20seguridad%20reactivos%20lab/merck_es_fichas%20de%20seguridad%20reactivos/1090/109033.pdf with Httpstatus 401
2009-06-14 17:46:01.589 e preprocessor Preprocessor.cpp(03553) : HANDLE: DISPATCH - Processing Document with key '/Departamentos/GC-IDP/Laboratorio/FICHAS DE SEGURIDAD REACTIVOS LAB/Merck_es_Fichas de Seguridad Reactivos/1090/109033.pdf' failed, returning PREPROCESSOR_ACTIVITY_ERROR (Code 6401)
More information:
- The index_service user has the super administrator role in our system.
- The Windows event log shows some errors about the SSO22KerbMap library. The text is 'Application failure: w3wp.exe, version:6.0.3790.3959, module: SSO22KerbMap.dll,version 1.1.0.8 ...'
Can somebody help us solve this problem?
Best regards.

Hi Julio,
Did you check the following SAP Note?
SAP Note 735639 - SSO22KerbMap: Known issues
Did you also create an Active Directory user for the user index_service?
This is described on page 7 of my white paper "Integration of Windows File Servers into the SAP KM platform using SSO and the WebDAV repository manager" (http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/e1f93f5c-0301-0010-5c83-9681791f78ec).
Best regards,
André
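
For anyone triaging a similar log: a minimal Python sketch, not part of TREX, that counts HTTPHEAD failures per HTTP status, so you can see whether every preprocessing error really is a 401 (authentication) rather than a mix of causes. The regular expression is modeled only on the two log lines quoted in the question, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the quoted log line; other TREX versions may log differently.
HEAD_FAILURE = re.compile(
    r"HTTPHEAD failed for URL (?P<url>\S+) with Httpstatus (?P<status>\d{3})"
)

def summarize_head_failures(lines):
    """Return a Counter mapping HTTP status code -> number of failed HEAD requests."""
    counts = Counter()
    for line in lines:
        match = HEAD_FAILURE.search(line)
        if match:
            counts[int(match.group("status"))] += 1
    return counts

# Shortened sample in the same shape as the log excerpt above (URL is a placeholder).
sample_log = [
    "2009-06-14 17:46:01.589 e preprocessor Preprocessor.cpp(00897) : "
    "HTTPHEAD failed for URL http://host.example:50000/irj/go/km/docs/a%20b.pdf "
    "with Httpstatus 401",
    "2009-06-14 17:46:01.589 e preprocessor Preprocessor.cpp(03553) : "
    "HANDLE: DISPATCH - Processing Document with key '/a b.pdf' failed",
]
failures = summarize_head_failures(sample_log)
```

If everything lands under 401, the crawler's requests are being rejected at authentication, which fits the SSO22KerbMap and index_service questions raised in the reply.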

Problem generating index with 2nd level in CS4

Windows XP SP3 - InDesign CS4 601
When generating an index with a 1st and a 2nd level, the 1st-level entry is always repeated in front of each 2nd-level entry.
E.g. the index should look like this:
text     75
     capitalized     76
     import     78
word     105
     language     108
     meaning     109
But actually the index looks like this:
text     75
     textcapitalized     76
     textimport     78
word     105
     wordlanguage     108
     wordmeaning     109
Is this a bug, or is there a solution for it?
Thanks for your help!
Luc Van de Cruys
phaedra creative communications

Oops!
Just tested the script.
I run it, and I get the script alert 'All Done' at the end.
But nothing happens.
This is what I do:
- I generate the index.
- I select the index with the text tool (select all or just put the textcursor somewhere in the index makes no difference).
- I run the script
- I get the message 'all done', but there is no difference. The problem persists.
Any other ideas?
Thanks anyway.
L.L.

Problem creating text index with PREFIX_INDEX option

I am trying to create a text index with prefixes option for use in wildcard search scenarios.
Here is the code I use:
connect CTXSYS/*******
BEGIN
ctx_ddl.create_preference('wildcard_pref', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('wildcard_pref','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('wildcard_pref','PREFIX_MIN_LENGTH',3);
ctx_ddl.set_attribute('wildcard_pref','PREFIX_MAX_LENGTH',8);
ctx_ddl.set_attribute('wildcard_pref','SUBSTRING_INDEX','YES');
END;
And preference is created
SELECT PRE_OWNER, PRE_NAME FROM CTXSYS.CTX_PREFERENCES;
PRE_OWNER PRE_NAME
CTXSYS WILDCARD_PREF
CTXSYS DEFAULT_STORAGE
CTXSYS DEFAULT_CLASSIFIER
Now when I log in as one of the database users and try to create the index,
I get this:
create index wildcard_idx on MY_Table(Name)
indextype is ctxsys.context
parameters ('WORDLIST wildcard_pref') ;
create index wildcard_idx on MY_Table(Name)
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-10700: preference does not exist: wildcard_pref
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 364
What am I doing wrong? Keep in mind that I was able to create a text index without the prefixes, but a lot of the searches will be based on partial word search.
Eventually I would also like to make those indexes transactional and have them work with a datastore (multiple column search).
Thanks.
Stefan

Problem solved.
Attributes and preferences had to be created by the same user that creates the index.
Log in as sysdba and run
GRANT EXECUTE ON CTX_DDL TO <user_that_creates_index>;
Then create the preference as that user, and it works.

Error while indexing after Trex Configuration

Hi ,
After the TREX configuration, only my .txt files are getting indexed.
All files with other extensions, like .doc and .pdf, fail at the preprocessing stage.
Can somebody explain this?

Hi,
this problem is caused by a missing host name.
Try this:
Log on to the portal as administrator.
Navigate to System Administration -> System Configuration -> Knowledge management -> Content Management -> Global Service -> Show Advanced option -> URL Generator Service
Then check the host name.
It should match your portal.
For example, if your portal URL is http://GHIKD.sde.rfd:53200/irj/portal then you need to give a host name like http://GHIKD:53200.
If the problem still exists, please feel free to ask.
Regards
Krishna.
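
To double-check the value before typing it in, the scheme, host and port can be pulled out of the portal URL mechanically. This is only a Python sketch; the function name and the short_host switch are made up here, and whether your landscape wants the short host name (as in the example above) or the fully qualified one is an assumption you need to verify yourself.

```python
from urllib.parse import urlsplit

def host_prefix(portal_url, short_host=False):
    """Return scheme://host:port of a portal URL.

    Assumes a plain host:port authority (no user info, no IPv6 literal).
    With short_host=True the domain part is dropped, e.g. GHIKD.sde.rfd -> GHIKD.
    """
    parts = urlsplit(portal_url)
    host, sep, port = parts.netloc.partition(":")
    if short_host:
        host = host.split(".")[0]
    return f"{parts.scheme}://{host}{sep}{port}"

full_prefix = host_prefix("http://GHIKD.sde.rfd:53200/irj/portal")
short_prefix = host_prefix("http://GHIKD.sde.rfd:53200/irj/portal", short_host=True)
```

Whichever form you choose, the TREX host must be able to resolve and reach it, since the preprocessor fetches the documents through these generated URLs.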

Error when creating index with parallel option on very large table

I am getting the error
"7:15:52 AM ORA-00600: internal error code, arguments: [kxfqupp_bad_cvl], [7940], [6], [0], [], [], [], []"
when creating an index with the parallel option. This is strange, because it has not been a problem until now. We just hit 60 million rows in a 45-column table, and I wonder if we've hit a bug.
Version 10.2.0.4
O/S Linux
As a test I removed the parallel option and several of the indexes were created with no problem, but many still threw the same error... Strange. Do I need a patch update of some kind?

This is most certainly a bug.
From metalink it looks like bug 4695511 - fixed in 10.2.0.4.1

Performance - composite index with 'OR' in 'WHERE' clause

I have a problem with the performance of the following query:
select /*+ index_asc(omschact oma_index1) */ knr, projnr, actnr from omschact where ((knr = 100 and actnr > 30) or knr > 100)
and rownum = 1;
(rownum used only for test purpose)
index:
create index oma_index1 on omschact (knr, projnr);
Execution plan:
Id Operation
0 SELECT STATEMENT
1 COUNT STOPKEY
2 TABLE ACCESS BY INDEX ROWID
3 INDEX FULL SCAN
If I'm correct, the 'OR' in the 'WHERE' clause is responsible for the INDEX FULL SCAN, which makes the query slow.
A solution would then be to split the 'WHERE' clause into 2 separate selects (1 with 'knr = 100 and actnr > 30' and 1 with 'knr > 100') and combine the results with a UNION ALL.
Since it's necessary to have all rows in ascending order (oma_index1), I still have to use an ORDER BY to make sure the order of the rows is correct. This again results in performance that is too low.
Another solution that does the trick is to create an index with the 2 fields (knr, projnr) concatenated and to use the same in the 'WHERE' clause:
create index oma_index2 on omschact (knr || projnr);
select /*+ index_asc(omschact oma_index2) */ knr, projnr, actnr from omschact where (knr || projnr) > 10030;
I just can't believe this work-around is the only solution, so I was hoping that someone here knows of a better way to solve this.

padders,
I'll give the real data instead of the example. The index I really use consists of 4 fields. In this table the fields are just numbers, but in other tables I need to use char fields in indexes, so that's why I concatenate instead of using formulas (although I would prefer the latter).
SQL> desc omschact
Name Null? Type
KNR NOT NULL NUMBER(8)
PROJNR NOT NULL NUMBER(8)
ACTNR NOT NULL NUMBER(8)
REGELNR NOT NULL NUMBER(3)
REGEL CHAR(60)
first method:
SQL> create index oma_key_001 on omschact(knr,projnr,actnr,regelnr);
Index created.
SQL> select /*+ index_asc(omschact oma_key_001) */ * from omschact where
2 (knr > 100 or
3 (knr = 100 and projnr > 30) or
4 (knr = 100 and projnr = 30 and actnr > 100000) or
5 (knr = 100 and projnr = 30 and actnr = 100000 and regelnr >= 0));
Execution Plan
Plan hash value: 1117430516
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 11M| 822M| 192K (1)| 00:38:26 |
| 1 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 11M| 822M| 192K (1)| 00:38:26 |
|* 2 | INDEX FULL SCAN | OMA_KEY_001 | 11M| | 34030 (1)| 00:06:49 |
Predicate Information (identified by operation id):
2 - filter("KNR">100 OR "KNR"=100 AND "PROJNR">30 OR "KNR"=100 AND "PROJNR"=30
AND "ACTNR">100000 OR "ACTNR"=100000 AND "KNR"=100 AND "PROJNR"=30 AND
"REGELNR">=0)
second method (same index):
SQL> select * from (
2 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr > 100
3 union all
4 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr = 100 and projnr > 30
5 union all
6 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr = 100 and projnr = 30 and actnr > 100000
7 union all
8 select /*+ index_asc(omschact oma_key_001) */ * from omschact where knr = 100 and projnr = 30 and actnr = 100000 and regelnr > 0)
9 order by knr, projnr, actnr, regelnr;
Execution Plan
Plan hash value: 292918786
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 11M| 1203M| | 477K (1)| 01:35:31 |
| 1 | SORT ORDER BY | | 11M| 1203M| 2745M| 477K (1)| 01:35:31 |
| 2 | VIEW | | 11M| 1203M| | 192K (1)| 00:38:29 |
| 3 | UNION-ALL | | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 11M| 822M| | 192K (1)| 00:38:26 |
|* 5 | INDEX RANGE SCAN | OMA_KEY_001 | 11M| | | 33966 (1)| 00:06:48 |
| 6 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 16705 | 1272K| | 294 (1)| 00:00:04 |
|* 7 | INDEX RANGE SCAN | OMA_KEY_001 | 16705 | | | 54 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 47 | 3666 | | 4 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | OMA_KEY_001 | 47 | | | 3 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 1 | 78 | | 4 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | OMA_KEY_001 | 1 | | | 3 (0)| 00:00:01 |
Predicate Information (identified by operation id):
5 - access("KNR">100)
7 - access("KNR"=100 AND "PROJNR">30)
9 - access("KNR"=100 AND "PROJNR"=30 AND "ACTNR">100000)
11 - access("KNR"=100 AND "PROJNR"=30 AND "ACTNR"=100000 AND "REGELNR">0)
third method:
SQL> create index oma_test on omschact(to_char(knr,'00000000')||to_char(projnr,'00000000')||to_char(actnr,'00000000')||to_char(regelnr,'000'));
Index created.
SQL> select /*+ index_asc(omschact oma_test) */ * from omschact where
2 (to_char(knr,'00000000')||to_char(projnr,'00000000')||
3 to_char(actnr,'00000000')||to_char(regelnr,'000')) >=
4 (to_char(100,'00000000')||to_char(30,'00000000')||
5* to_char(100000,'00000000')||to_char(0,'000'))
Execution Plan
Plan hash value: 424961364
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 553K| 55M| 1712 (1)| 00:00:21 |
| 1 | TABLE ACCESS BY INDEX ROWID| OMSCHACT | 553K| 55M| 1712 (1)| 00:00:21 |
|* 2 | INDEX RANGE SCAN | OMA_TEST | 99543 | | 605 (1)| 00:00:08 |
Predicate Information (identified by operation id):
2 - access(TO_CHAR("KNR",'00000000')||TO_CHAR("PROJNR",'00000000')||TO_CHAR("
ACTNR",'00000000')||TO_CHAR("REGELNR",'000')>=TO_CHAR(100,'00000000')||TO_CHAR(3
0,'00000000')||TO_CHAR(100000,'00000000')||TO_CHAR(0,'000'))
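
A quick way to convince yourself that the OR-expanded predicate of the first method is just a lexicographic "greater than or equal" on (knr, projnr, actnr, regelnr), and that the zero-padded key of the third method preserves that order while a plain knr || projnr does not. This is a Python sketch for checking the logic only, not Oracle code; the helper names are invented here, and the padding mirrors the to_char masks only for non-negative numbers that fit the mask width.

```python
from itertools import product

START = (100, 30, 100000, 0)  # same start key as in the queries above

def or_expanded(knr, projnr, actnr, regelnr):
    # The predicate of the first method, written out with ORs.
    return (knr > 100
            or (knr == 100 and projnr > 30)
            or (knr == 100 and projnr == 30 and actnr > 100000)
            or (knr == 100 and projnr == 30 and actnr == 100000 and regelnr >= 0))

def tuple_compare(knr, projnr, actnr, regelnr):
    # Python compares tuples lexicographically, like an index on the four columns.
    return (knr, projnr, actnr, regelnr) >= START

def padded_key(knr, projnr, actnr, regelnr):
    # Fixed-width, zero-padded concatenation: string order equals numeric order.
    return f"{knr:08d}{projnr:08d}{actnr:08d}{regelnr:03d}"

def unpadded_key(knr, projnr):
    # Plain concatenation read back as a number, like (knr || projnr) > 10030.
    return int(f"{knr}{projnr}")

# Check all three formulations around the boundary values.
samples = list(product([99, 100, 101], [29, 30, 31], [99999, 100000, 100001], [0, 1]))
all_agree = all(
    or_expanded(*row) == tuple_compare(*row) == (padded_key(*row) >= padded_key(*START))
    for row in samples
)

# Without padding the order breaks in both directions:
false_hit = unpadded_key(1, 2000) > 10030    # knr = 1 is below 100, yet it matches
missed_row = unpadded_key(101, 1) > 10030    # knr = 101 is above 100, yet it is missed
```

So the unpadded workaround from the first post can return wrong rows, which is presumably why the third method pads every component to a fixed width.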

Performance of context index with sorting

Dear All,
I've got a problem and don't know how to solve it.
There is a table that has an XMLTYPE field storing unstructured XML, with a context index created on it.
When I select records from it using contains(res, '[searchingfield]')>0, the response time is quick, but when I order by another field in the same table, the response slows down (e.g. select id, path, res, update_date from testingtbl where contains(res, 'shopper')>0 order by update_date desc).
There is a context index on the field 'res' and another index on the field 'update_date'. Without 'order by update_date' the context index is used, but the update_date index is not used even when the query has the ordering criterion.
Can any expert tell me how to solve this? How can I keep the performance while sorting?
Thanks and Regards
Raymond

Thanks for your quick reply.
I will provide the requested information once I am back in the office. Actually, I just want to know whether there is any method that can use the context index (with the contains keyword) and sort without slowing down performance.
Thanks and Regards
Raymond

My website looks fine in other browsers but on firefox it comes up like an index with the title first and on the second line it says skip to content. Can this be fixed?

When I look at my website it loads up as an index with the heading on the top line and then a link saying "skip to content" and then the navigation is in bullet points. I have downloaded the latest version of firefox and this has not improved the situation. The website looks fine in other browsers.

Make sure that you do not block the CSS files on that page.
Reload web page(s) and bypass the cache.
* Press and hold Shift and left-click the Reload button.
* Press "Ctrl + F5" or press "Ctrl + Shift + R" (Windows,Linux)
* Press "Cmd + Shift + R" (MAC)
Start Firefox in Safe Mode to check if one of the extensions is causing the problem (switch to the DEFAULT theme: Firefox (Tools) > Add-ons > Appearance/Themes).
* Don't make any changes on the Safe Mode start window.
* https://support.mozilla.com/kb/Safe+Mode

DML operations performance on table indexed with CTXCAT

Hi,
I have a table with 2M records. The table is batch updated once a day, and the number of record movements (update/delete/insert) should be around 100K.
The table is indexed with CTXCAT.
If I create the index from scratch, it takes 5 minutes.
If I perform delete/insert/update operations involving 40K records, it takes a lot longer (especially for delete and update operations, something like 30 minutes).
In this particular case I can drop the index and recreate it from scratch every night. The problem is that the 2M-record table is only the first step in the adoption of Oracle Text. The next step will be a 40M-record table, on which the initial index creation takes something like 2 hours (so I can't rebuild it every night).
Do you have any suggestions?
Thanks.
-- table DDL
CREATE TABLE TAHZVCON_TEXT
(
CONSUMER_ID NUMBER(10) NOT NULL,
COMPANY_NAME VARCHAR2(4000 CHAR),
CITY VARCHAR2(30 BYTE),
PROVINCE VARCHAR2(3 CHAR),
POST_CODE VARCHAR2(10 BYTE)
);
CREATE UNIQUE INDEX TAHZVCON_TEXT_PK ON TAHZVCON_TEXT (CONSUMER_ID);
begin
ctx_ddl.drop_preference('mylex');
ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
ctx_ddl.set_attribute('mylex', 'printjoins', '.#');
ctx_ddl.set_attribute('mylex', 'base_letter', 'YES');
ctx_ddl.set_attribute('mylex', 'index_themes','NO');
ctx_ddl.set_attribute('mylex', 'index_text','YES');
ctx_ddl.set_attribute('mylex', 'prove_themes','NO');
ctx_ddl.drop_preference('mywordlist');
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','stemmer','NULL');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'NO');
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','NO');
ctx_ddl.drop_index_set('tahzvcon_iset');
ctx_ddl.create_index_set('tahzvcon_iset');
ctx_ddl.add_index('tahzvcon_iset','city');
ctx_ddl.add_index('tahzvcon_iset','province');
ctx_ddl.add_index('tahzvcon_iset','post_code');
end;
CREATE INDEX TAHZVCON_TEXT_TI01 ON TAHZVCON_TEXT(COMPANY_NAME)
INDEXTYPE IS CTXSYS.CTXCAT
PARAMETERS ('lexer mylex wordlist mywordlist index set tahzvcon_iset')
PARALLEL 8;
Andrea

Hi kevinUCB,
I've decided to use CTXCAT indexes because I had to perform queries involving different columns (company name, city, region, etc.). So I thought CTXCAT was the right index for me.
Now I've discovered that if I index an XML with CONTEXT, I can perform a search on single XML fields, so CONTEXT is suitable for my needs.
Preliminary test on the 2M record table looks very good.
Bye,
Andrea

Problems searching with INPATH

hi,
i want to search in a table by INPATH where i saved my xml files as blobs.
searching with the contains query operator works fine, but the problem is, that i don't get any results if i try to search using INPATH in the contains query.
do i need special privileges for searching with INPATH, HASPATH and WITHIN?
is there a problem with saving xml files in a BLOB instead of a CLOB or an XMLType?
here what i did:
i created the following table and inserted some xml files using java and the oracle-jdbc-driver:
create table xmldocs (docid number not null, title varchar2 (30), format varchar2(10), docblob blob);
i create my context index:
create index xmldocs_idx on xmldocs(docblob) indextype is ctxsys.context parameters
('datastore ctxsys.default_datastore filter ctxsys.null_filter section group ctxsys.path_section_group');
here an example of a xml file (with the docid=4) that i inserted:
<A><B><C>dog</C></B></A>
by searching for "dog" i receive a result:
select docid from xmldocs where contains(docblob, 'dog')>0;
DOCID
4
but if i search with INPATH i receive nothing:
select docid from xmldocs where contains(docblob, 'HASPATH(/A)')>0;
any ideas???

nevertheless, i know the difference between INPATH and HASPATH, that is not my problem.
so if i execute the statement:
select docid from xmldocs where contains(docblob,'dog INPATH (/A)')>0;
or
select docid from xmldocs where contains(docblob,'HASPATH (//A = "dog")')>0;
i receive "no rows selected", and that's my problem.
it seems that the index that should be used for searching with INPATH or HASPATH is not correct or not available. if i compare the token_texts of the index that generated by create index and the parameter ctxsys.path_section_group to an index generated without any parameters it's the same. should it be like this? is that regular?
i checked the indexes with:
select token_text from dr$xmldocs2_idx$i;
if i check the available indexes with:
select index_name, index_type, ityp_owner, ityp_name, domidx_opstatus
from user_indexes
where ityp_owner = 'CTXSYS';
i receive:
INDEX_NAME INDEX_TYPE ITYP_OWNER ITYP_NAME DOMIDX
XMLDOCS_IDX DOMAIN CTXSYS CONTEXT VALID
do i need any other database privileges than to ctxapp?
thanks a lot if somebody can help me!!!
randy

How to create new INDEX for TREX

Hello Experts,
I am new to KM and TREX, so I need your guidance on this.
I need to create an index for TREX. After going through various messages I found that it can be created from the Index Administration iView in the portal: System Administration -> System Configuration -> Knowledge Management -> Index Administration -> New Index
But I cannot find this path in SPRO. How do I run the Index Administration iView, and from which portal do I need to create the index? Can anyone provide the steps to get to System Administration -> System Configuration -> Knowledge Management -> Index Administration -> New Index?
Regards.

I'm not very familiar with CRM business partners, but the common way is to include the repository where the documents are stored in KM using CM Repository managers (http://help.sap.com/saphelp_nw04/helpdata/en/e3/923227b24e11d5993800508b6b8b11/frameset.htm).
You first need to create a repository manager for your repository, and only then will you be able to create an index.
Regards, Mikhail.

How to initialize the index with "at new" command ??

Hi All,
I am facing a problem with initializing the index.
I want to initialize the index with the "AT NEW"
command within a loop, but it's not working.
Otherwise, please tell me how to put a flag on the first record with the "AT NEW" command.
Please provide a solution ASAP.

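loop at itab.
* sy-tabix holds the current row number inside LOOP AT (sy-index only counts DO/WHILE passes)
if sy-tabix = 1.
* set your flag here
endif.
endloop.
Check the code below: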
LOOP AT lt_citm_b INTO wa_citm_b.
    AT NEW vbeln .
      CLEAR: wa_temp_output , lt_temp_output[].
    ENDAT.
    AT LAST.
      lv_t = 'X'.
    ENDAT.
    LOOP AT lt_char INTO wa_char WHERE instance = wa_citm_b-instance.
      MOVE: c_d TO wa_temp_output-type,
      wa_citm_b-vbeln TO wa_temp_output-vbeln,
      wa_citm_b-posnr TO wa_temp_output-posnr,
      wa_citm_b-matnr TO wa_temp_output-matnr.
      MOVE-CORRESPONDING wa_char TO wa_temp_output.
* Special requirement
     IF wa_temp_output-atnam = c_itr AND wa_temp_output-atwrt = c_itr03
                                                    OR
       wa_temp_output-atnam = c_cosr AND wa_temp_output-atwrt = c_osr03.
        CLEAR: wa_temp_output-atnam, wa_temp_output-atwrt.
        MOVE: c_iorp TO wa_temp_output-atnam,
              c_in2-os_nd TO wa_temp_output-atwrt,
              'X' TO wa_temp_output-flag.
      ENDIF.
*If the characteristic value for PMFREQUENCIE is initial need to pull
*the floating value.
      PERFORM fr_get_float TABLES lt_char USING wa_char-atwrt
                                                wa_char-instance
                                                wa_char-atnam
                                       CHANGING wa_temp_output-atwrt.
*To recognize if contract item have extended and standard
*characteristics.
      PERFORM fr_ext_std USING wa_temp_output-atnam
                      CHANGING wa_temp_output-ext
                               wa_temp_output-std.
      APPEND wa_temp_output TO lt_temp_output.
      CLEAR: wa_temp_output, wa_char.
    ENDLOOP.
    SORT lt_temp_output BY vbeln posnr std ext.
    AT END OF vbeln.
* To translate the data according to the mapping rules
      PERFORM fr_translate.
      IF lv_t = 'X'.
        DESCRIBE TABLE lt_output LINES gv_count.
        READ TABLE lt_output INTO wa_output INDEX gv_count.
        IF sy-subrc EQ 0.
          CLEAR wa_output.
          MOVE: c_t TO wa_output-type.
          APPEND wa_output TO lt_output.
        ENDIF.
      ENDIF.
    ENDAT.
  ENDLOOP.
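
For readers less familiar with control-level processing: AT NEW fires on the first row of each run of equal key values, so it only marks "the first record per key" when the table is sorted by that key. The idea can be sketched outside ABAP like this (Python, illustrative only; the function name and the sample rows are made up):

```python
def flag_first_of_group(rows, key):
    """Pair every row with True if it starts a new run of equal key values.

    Mirrors the AT NEW idea: compare the key of the current row with the
    key of the previous row; the very first row always starts a group.
    """
    previous = object()  # sentinel that is never equal to a real key value
    flagged = []
    for row in rows:
        current = row[key]
        flagged.append((row, current != previous))
        previous = current
    return flagged

# Rows must already be sorted by the key, just like the internal table.
items = [
    {"vbeln": "4711", "posnr": 10},
    {"vbeln": "4711", "posnr": 20},
    {"vbeln": "4712", "posnr": 10},
]
first_flags = [is_first for _, is_first in flag_first_of_group(items, "vbeln")]
```

If the rows are not sorted by the key, the same key value can start several runs and the flag is set more than once for it, which is a common reason why AT NEW seems "not to work".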

Content has been indexed with Info only. Resubmit should only be performed

Hi All,
I'm using the Oracle Content Server (OCS). When I try to check in a new document, I get the error message below. Can anyone please tell me what the problem is?
Error Message:
Text conversion of the file failed.
Content has been indexed with Info only. Resubmit should only be performed if the problem has been resolved.
Text conversion of the file '//awusrp04/PortalStg/oracle/inetucmstg/weblayout/groups/public/@enterprise/@hr/documents/document/s_013020.pdf' failed.
Content has been indexed with Info only. Resubmit should only be performed if the problem has been resolved.

Hello Experts,
I am facing the same issue. Does anybody know the solution?
Thanks in Advance.

How to search/filter HR documents with TREX

Hi,
We have installed TREX 7.0
I need to use it with HR docs
We need to be able to do the following, for example:
how to get the list of HR contracts signed since last week (is it possible to filter by document type?)
how to get all the documents linked to a personnel ID
how to search and filter text in scanned HR documents stored in a content server
I did not find any documentation on how to manage HR documents with TREX.
I have only tried SES, but it is not useful for what I need.
Could you please tell me if TREX can be used for this task? Can you give me some links on HR document management with TREX?
Best regards
Franck

Hi Franck,
Yes, TREX can be used for this task. I have no specific knowledge of HR document management on TREX,
but I can help you with using TREX for classification, indexing etc. based on your requirements.
Please reply with your issues.
Best Regards,
Atul

Problems indexing 30M documents

Platform: Sun 4800, 12 CPU, Solaris 9, 48 Gb RAM
Oracle Version: 10.1.04
Database Character Set: UTF-8
SGA MAX SIZE: 24 Gb
hi,
Our database contains a mix of image files and plain text documents in 30 different languages (approximately 30 million rows). When we try to index the documents (using IGNORE in the format column to skip the rows containing images), the indexing either bombs out or hangs indefinitely.
When I first started working on the problem, there were rows in the ctx_user_index_errors table which didn't really give any good indication of what the problem was. I created a new table containing just these rows and was able to index them with no problems using the same set of preferences and the same indexing script. At that time, they were using just the BASIC_LEXER.
We created a MULTI_LEXER preference and added sub-lexers when lexers existed for the specified language, using the BASIC_LEXER as the default. When we tried to create the index using a parallel setting of 6, the indexing failed after 2 hours, and we got the following error codes: ORA-29855, ORA-20000, DRG-50853, DRG-50857, ORA-01002, and ORA-06512. We then tried to create the index without parallel slaves, and it failed after 3 hours with an end of file on communication channel error.
Thinking perhaps that it was the MULTI_LEXER that was causing the problem (because the data is converted to UTF-8 by an external program, and the character set and language ID is not always 100% accurate), we tried to create the index using just the BASIC_LEXER (knowing that we wouldn't get good query results on our CJK data). We set the parallel slaves to 6, and it ran for more than 24 hours, with each slave indexing about 4 million documents (according to the logs) before just hanging. The index state in ctx_user_indexes is POPULATE, and in user_indexes is INPROGRESS. There were three sessions active, 2 locked, and 1 blocking. When we were finally able to ctl-C out of the create index command, SQL*Plus core dumped. It takes hours to drop the index as well.
We're at a loss to figure out what to try next. This database has been offline for about a week now, and this is becoming critical. In my experience, once the index gets hung in POPULATE, there's no way to get it out other than dropping and recreating the index. I know that Text should be able to handle this volume of data, and the machine is certainly capable of handling the load. It could be that the MULTI_LEXER is choking on improperly identified languages, or that there are errors in the UTF-8 conversion, but it also has problems when we use BASIC_LEXER. It could be a problem indexing in parallel, but it also dies when we don't use parallel. We did get errors early on that the parallel query server died unexpectedly, but we increased the PARALLEL_EXECUTION_MESSAGE_SIZE to 65536, and that stopped the parallel errors (and got us to the point of failure quicker).
Any help you can provide would be greatly appreciated.
thanks,
Tarisa.

I'm working with the OP on this. Here is the table definition and
the index creation with all the multi_lexer prefs. The table
is hash partitioned, and we know the index cannot be
local because of this, so it is a global domain index.
Perhaps of interest, we have changed PARALLEL_EXECUTION_MESSAGE_SIZE
from the default up to 32K. This made a huge difference in indexing speed, but
so far has just helped us get to the point of failure faster.
CREATE TABLE m (
DOC_ID NUMBER,
CID NUMBER,
DATA CLOB,
TYPE_ID NUMBER(10),
FMT VARCHAR2(10),
ISO_LANG CHAR(3)
)
LOB (data) store as meta_lob_segment
( ENABLE STORAGE IN ROW
PCTVERSION 0
NOCACHE
NOLOGGING
STORAGE (INITIAL 32K NEXT 32K)
CHUNK 16K )
PARTITION BY HASH ( doc_id )
PARTITIONS 6
STORE IN (ts1, ts2, ts3, ts4, ts5, ts6),
pctfree 20
initrans 12
maxtrans 255
tablespace ts
ALTER TABLE m
ADD (CONSTRAINT pk_m_c PRIMARY KEY (doc_id, cid)
USING index
pctfree 20
initrans 12
maxtrans 255
tablespace ts
nologging )
BEGIN
ctx_ddl.create_preference('english_lexer', 'basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','false');
ctx_ddl.set_attribute('english_lexer','index_text','true');
ctx_ddl.create_preference('japanese_lexer','japanese_lexer');
ctx_ddl.create_preference('chinese_lexer','chinese_lexer');
ctx_ddl.create_preference('korean_lexer','korean_morph_lexer');
ctx_ddl.create_preference('german_lexer','basic_lexer');
ctx_ddl.set_attribute('german_lexer','index_themes','false');
ctx_ddl.set_attribute('german_lexer','index_text','true');
ctx_ddl.set_attribute('german_lexer','composite','german');
ctx_ddl.set_attribute('german_lexer','mixed_case','yes');
ctx_ddl.set_attribute('german_lexer','alternate_spelling','german');
ctx_ddl.create_preference('french_lexer','basic_lexer');
ctx_ddl.set_attribute('french_lexer','index_text','true');
ctx_ddl.set_attribute('french_lexer','index_themes','false');
ctx_ddl.set_attribute('french_lexer','base_letter','yes');
ctx_ddl.create_preference('spanish_lexer','basic_lexer');
ctx_ddl.set_attribute('spanish_lexer','index_text','true');
ctx_ddl.set_attribute('spanish_lexer','index_themes','false');
ctx_ddl.set_attribute('spanish_lexer','base_letter','yes');
ctx_ddl.create_preference('global_lexer','multi_lexer');
ctx_ddl.add_sub_lexer('global_lexer','default','english_lexer');
ctx_ddl.add_sub_lexer('global_lexer','english','english_lexer','eng');
ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');
ctx_ddl.add_sub_lexer('global_lexer','french','french_lexer','fra');
ctx_ddl.add_sub_lexer('global_lexer','spanish','spanish_lexer','spa');
ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
ctx_ddl.add_sub_lexer('global_lexer','korean','korean_lexer','kor');
ctx_ddl.add_sub_lexer('global_lexer','simplified chinese','chinese_lexer','zho');
ctx_ddl.add_sub_lexer('global_lexer','traditional chinese','chinese_lexer');
END;
BEGIN
ctx_output.start_log('m_ctx_data.log');
END;
CREATE INDEX m_ctx_data ON m (data)
INDEXTYPE IS ctxsys.context
PARAMETERS ('memory 1G
lexer global_lexer
format column fmt
language column iso_lang
sync (every "sysdate+1")' )
PARALLEL 6
BEGIN
ctx_output.end_log();
END;
/
