Indexing Speed

Recently I have been using Oracle Text and have noticed that indexing is rather slow. I have changed the max index memory to 2GB and the default index memory to 2GB as well, created a partitioned table with 8 partitions, and use the 'local' and 'parallel 8' keywords when indexing.
I am running this on a server with 32GB of ram and 8 processors.
It takes about 3.3 hours to index 3GB (360k documents).
I am using a FILE_DATASTORE and the documents are stored on a RAID.
I have used other search engines on the same machine and this is considerably slower. Any ideas what I can do to speed this up?

You might try increasing sort_area_size. Also, if your memory settings are high enough to cause paging, they can make indexing slower instead of faster.
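For reference, here is a minimal sketch of the kind of settings under discussion (table, column and index names are made up, and the values are examples rather than recommendations). Note that each parallel slave can allocate up to the per-index memory setting, so PARALLEL 8 combined with very large memory settings can itself push the machine into paging.
-- raise the ceiling for per-build index memory (run as CTXSYS or a DBA)
exec ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY', '2G')
-- per-session sort area, as suggested above (100MB here, example value only)
alter session set sort_area_size = 104857600;
-- local partitioned CONTEXT index built in parallel
create index docs_ctx on docs(text)
  indextype is ctxsys.context
  local
  parameters ('memory 500M')
  parallel 8;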

Similar Messages

  • What is the command to get network interface index, speed and duplex?

    I need the following info
    interface index
    interface speed
    interface duplex
    interface negotiation
    IPv4 dhcp server
    IPv6 dhcp server
    Can anyone let me know the commands to get the above info?

    Hi,
    on Solaris 10 you can use the dladm command. Have a look at the man page, e.g.:
    dladm show-dev
    e1000g0 link: up speed: 100 Mbps duplex: full
    e1000g1 link: down speed: 0 Mbps duplex: half
    e1000g2 link: unknown speed: 0 Mbps duplex: unknown
    e1000g3 link: unknown speed: 0 Mbps duplex: unknown
    dladm show-link
    e1000g0 type: non-vlan mtu: 1500 device: e1000g0
    e1000g1 type: non-vlan mtu: 1500 device: e1000g1
    e1000g2 type: non-vlan mtu: 1500 device: e1000g2
    e1000g3 type: non-vlan mtu: 1500 device: e1000g3
    Also have a look at ifconfig, e.g. ifconfig -a
    Andreas

  • Does an index speed up MAX()

    Does an index on a numeric column speed up MAX() queries, or does Oracle hold that value ready within its column statistics?
    Thank you,
    Felix

    An index can speed up queries that use that index.
    If you need to select max(id), an index on the ID column will help, provided you do not have a complicated WHERE clause that prevents the index on the ID column from being used.
    So:
    select max(id)
    from my_table
    where date_col between :low_date and :high_date
    will not use the index on the ID column, but an index on date_col will help.
    Check your execution plan to determine whether an index will help or not.
    Moreover, one other thing to pay attention to is whether the tables you are querying are subject to intense OLTP activity. If they are, additional indexes will slow down transaction processing, which may not be acceptable. If you are querying some kind of data warehouse, feel free to use indexes, and bitmap indexes if the columns do not have high cardinality, for example a bitmap index on department_id or on currency_id.
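    A quick way to check is to look at the execution plan: with a plain B-tree index on the column, Oracle can usually answer MAX() from the index alone. A minimal sketch (table and index names are illustrative):
    create index my_table_id_ix on my_table (id);
    explain plan for select max(id) from my_table;
    select * from table(dbms_xplan.display);
    -- look for "INDEX FULL SCAN (MIN/MAX)" in the plan: the table data itself is not read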

  • Difference b/w index and unique

    Hi,
    Difference b/w index and unique?

    hi,
    The optional additions UNIQUE or NON-UNIQUE determine whether the key is to be unique or non-unique, that is, whether the table can accept duplicate entries. If you do not specify UNIQUE or NON-UNIQUE for the key, the table type is generic in this respect. As such, it can only be used for specifying types. When you specify the table type simultaneously, you must note the following restrictions:
    You cannot use the UNIQUE addition for standard tables. The system always generates the NON-UNIQUE addition automatically.
    You must always specify the UNIQUE option when you create a hashed table.
    INDEX:
    An index can be considered a copy of a database table that has been reduced to certain fields. This copy is always in sorted form. Sorting provides faster access to the data records of the table, for example using a binary search. The index also contains a pointer to the corresponding record of the actual table so that the fields not contained in the index can also be read.
    The primary index is distinguished from the secondary indexes of a table. The primary index contains the key fields of the table and a pointer to the non-key fields of the table. The primary index is created automatically when the table is created in the database
    You can also create further indexes on a table in the ABAP Dictionary. These are called secondary indexes. This is necessary if the table is frequently accessed in a way that does not take advantage of the sorting of the primary index for the access.
    Indexes speed up data selection from the database. They consist of selected fields of a table, of which a copy is then made in sorted order. If you specify the index fields correctly in a condition in the WHERE or HAVING clause, the system only searches part of the index (index range scan).
    The system automatically creates the primary index. It consists of the primary key fields of the database table. This means that for each combination of fields in the index, there is a maximum of one line in the table. This kind of index is also known as UNIQUE.
    If you cannot use the primary index to determine the result set because, for example, none of the primary index fields occur in the WHERE or HAVING clauses, the system searches through the entire table (full table scan). For this case, you can create secondary indexes, which can restrict the number of table entries searched to form the result set.
    You create secondary indexes using the ABAP Dictionary. There you can create its columns and define it as UNIQUE. However, you should not create secondary indexes to cover all possible combinations of fields.
    Only create one if you select data by fields that are not contained in another index, and the performance is very poor. Furthermore, you should only create secondary indexes for database tables from which you mainly read, since indexes have to be updated each time the database table is changed. As a rule, secondary indexes should not contain more than four fields, and you should not have more than five indexes for a single database table.
    If a table has more than five indexes, you run the risk of the optimizer choosing the wrong one for a particular operation. For this reason, you should avoid indexes with overlapping contents.
    Secondary indexes should contain columns that you use frequently in a selection, and that are as highly selective as possible. The fewer table entries that can be selected by a certain column, the higher that column’s selectivity. Place the most selective fields at the beginning of the index. Your secondary index should be so selective that each index entry corresponds to, at most, five percent of the table entries. If this is not the case, it is not worth creating the index. You should also avoid creating indexes for fields that are not always filled, where their value is initial for most entries in the table.
    If all of the columns in the SELECT clause are contained in the index, the system does not have to search the actual table data after reading from the index. If you have a SELECT clause with very few columns, you can improve performance dramatically by including these columns in a secondary index.
    What is the difference between primary index and secondary index?
    http://help.sap.com/saphelp_47x200/helpdata/en/cf/21eb2d446011d189700000e8322d00/frameset.htm
    A difference is made between primary and secondary indexes on a table. The primary index consists of the key fields of the table and a pointer to the non-key fields of the table. The primary index is generated automatically when a table is created and is created in the database at the same time as the table. It is also possible to define further indexes on a table in the ABAP/4 Dictionary, which are then referred to as secondary indexes.
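    At the database level, the distinction comes down to whether the index enforces uniqueness. A minimal SQL sketch (table and index names are made up purely for illustration):
    CREATE TABLE zemp (client CHAR(3), emp_id NUMBER, dept_id NUMBER, name VARCHAR2(40));
    -- unique index: the database rejects a second row with the same (client, emp_id)
    CREATE UNIQUE INDEX zemp_u1 ON zemp (client, emp_id);
    -- non-unique secondary index: duplicates allowed, used only to speed up lookups by department
    CREATE INDEX zemp_n1 ON zemp (client, dept_id);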

  • Primary index does not exist in database but shows in SE14?

    I added a couple of key fields to a Z table and activated it. Everything went well - I adjusted it with SE14 as well.
    When I checked the Runtime object - it is OK.
    When I checked the Database Object, it shows that the primary index does not exist in the database, but SE14 says that the primary index exists in the database.
    Please advise how to correct this. I found another posting when I searched for this error message: "Indexes: Inconsistent with DDIC source". It says to create the index in the database, but I need to know the steps.
    Please advise.
    (reposting here)

    hi,
    To create a index for a table,
    go to se11->click on Indexes tab and create  a primary index.
    Indexes speed up data selection from the database. They consist of selected fields of a table, of which a copy is then made in sorted order. If you specify the index fields correctly in a condition in the WHERE or HAVING clause, the system only searches part of the index (index range scan).
    The system automatically creates the primary index. It consists of the primary key fields of the database table. This means that for each combination of fields in the index, there is a maximum of one line in the table. This kind of index is also known as UNIQUE.
    If you cannot use the primary index to determine the result set because, for example, none of the primary index fields occur in the WHERE or HAVING clauses, the system searches through the entire table (full table scan). For this case, you can create secondary indexes, which can restrict the number of table entries searched to form the result set.
    You create secondary indexes using the ABAP Dictionary. There you can create its columns and define it as UNIQUE.
    reward points if useful.
    regards
    sandhya
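    If you want to double-check outside SE14, and assuming an Oracle database underneath, a quick query against the data dictionary shows whether the primary index really exists at the database level (ZTABLE is a placeholder for your table name, and the query must be run as the table owner):
    SELECT index_name, uniqueness, status
    FROM   user_indexes
    WHERE  table_name = 'ZTABLE';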

  • Indices configuration for XML document analysis (indexing time problems)

    Hi all,
    I'm currently developing a tool for XML document analysis using XQuery. We need to analyse the content of a large CMS dump, so I am adding all documents to a Berkeley DB XML container to be able to run XQueries against it.
    In my last run I ran into indexing speed problems, with single documents (typically 10-20K in size) taking around 20 sec to be added to the database once 6000 documents were loaded (I've got around 20000 in total). The time needed to add a document grows with the number of documents already in the database.
    I suspect my index configuration to be the reason for this performance drop. Indeed, I've been very generous with indexes, as we have to analyse the data and don't know the structure in advance.
    Currently my index configuration includes:
    - 2 default indices: edge-element-presence-none and edge-attribute-presence-none, to be able to speed up every possible XQuery used to analyse data patterns, e.g. collection()//table//p[contains(.,'help')]
    - 8 edge-attribute-substring-string indices on attributes we use often (id, value, name, ...)
    - 1 edge-element-substring-string index on the root element of the xml documents to be able to speed up document searches: ex. collection()//page[contains(.,'help')]
    So here my questions:
    - Are there any possible performance optimisations in Database config (not index config)? I only set the following:
    setTransactional(false);
    envConf.setCacheSize(1024*64);
    envConf.setCacheMax(1024*256);
    - How can I test various index configurations on the fly? Are there any db tools that allow me to set/remove indexes?
    - Is my index config suspect? ;-)
    Greetings,
    Nils

    Hi Nils,
    The edge-element-substring-string index on the document element is almost certainly the cause of the slow document inserts - that's really not a good idea. Substring indexes are used to optimize "=", contains(), starts-with() and ends-with() when they are applied to the named element that has the substring index, so I don't think that index will do what you want it to.
    John

  • Force Outlook to index faster

    Outlook 2010 - fully patched.
    Is there a way to force Windows/Outlook to use more resources so it indexes more quickly? It's painful when you've got multi-GB mailboxes and PSTs and need to perform searches in Outlook.
    I've got 16GB of RAM, a fast SSD, and 8 cores - I want indexing to use them!

    Hi,
    We can't control how many resources Windows uses to index. However, one tip to speed up the Windows indexing process is to disable indexer backoff, as mentioned in the article below:
    http://www.question-defense.com/2010/12/03/windows-7-indexing-speed-up-windows-indexing-process
    Please Note: Since the web site is not hosted by Microsoft, the link may change without notice. Microsoft does not guarantee the accuracy of this information.
    Regards,
    Melon Chen
    Forum Support
    Come back and mark the replies as answers if they help and unmark them if they provide no help.

  • Secondary index on vbfa table

    I have a very expensive statement running against the VBFA table. It comes from a customer report and issues this SQL:
    SELECT /*+ FIRST_ROWS */ "VBELV"
    FROM "VBFA"
    WHERE "MANDT" = :A0 AND "VBELN" = :A1 AND ROWNUM <= :A2
    It has execution plan:
    SELECT STATEMENT ( Estimated Costs = 96.009 , Estimated #Rows = 41 )
    |
    --- COUNT STOPKEY
    |
    INDEX RANGE SCAN VBFA~0
    As you can see, it is a very expensive statement because VBFA is a huge table and because I only have the VBFA~0 index, with these columns:
    UNIQUE Index VBFA~0
    COLUMN DISTINCT VALUES
    MANDT      1
    VBELV       1.589.207
    POSNV      4.184
    VBELN       3.202.114
    POSNN       58.173
    VBTYP_N    18
    In order to improve the performance of this report, would you recommend creating a secondary index, and should it be on the columns MANDT, VBELN, VBELV?
    I have not seen this type of secondary index in the SAP community (most of the time I see secondary indexes on the MANDT, VBELN and POSNN columns), so I want to double-check before I deploy it.
    Regards,
    Andrija

    Hi,
    Indexes speed up access to rows in a table. They can be created for a single column or for a series of columns.
    If MANDT and VBELN are not covered by an existing index, create an index on these columns.
    The EXPLAIN statement can be used to check the effect of creating or deleting indexes on the choice of search strategy for the specified SQL statement. You can also estimate the time needed by the database system to process the specified SQL statement. The specified QUERY or SINGLE SELECT statement is not executed while the EXPLAIN statement is being executed.
    To be frank, to analyze this you should generate a trace file and examine it.
    Oracle claims that first_rows_n optimization results in faster response time for certain queries, but we must remember that the performance is achieved via a change to the costing.
    Use the FIRST_ROWS hint when you need only the first few hits of a query. When you need the entire result set, do not use this hint as it might result in poorer performance.
    So collect statistics, analyze the table and create the index; your query will then execute faster.
    To know whether the indexes are used, analyze the generated trace file and have a look at the explain plan.
    Regards
    Vinod
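    At the database level (Oracle here, judging by the execution plan), the secondary index being discussed would correspond to one of the statements below. In an SAP system you would create it via SE11 so that the ABAP Dictionary knows about it; the index names are only examples.
    CREATE INDEX "VBFA~Z01" ON vbfa (mandt, vbeln);
    -- adding VBELV as well would let this particular query be answered from the index alone
    CREATE INDEX "VBFA~Z02" ON vbfa (mandt, vbeln, vbelv);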

  • How an INDEX of a Table got selected when a SELECT query hits the Database

    Hi All,
    How does an index get selected when a SELECT query hits a database table?
    My SELECT query is as shown below.
        SELECT ebeln ebelp matnr FROM ekpo
                       APPENDING TABLE i_ebeln
                       FOR ALL ENTRIES IN i_mara_01
                       WHERE werks = p_werks      AND
                             matnr = i_mara_01-matnr AND
                             bstyp EQ 'F'         AND
                             loekz IN (' ' , 'S') AND
                             elikz = ' '          AND
                             ebeln IN s_ebeln     AND
                             pstyp IN ('0' , '3') AND
                             knttp = ' '          AND
                             ko_prctr IN r_prctr  AND
                             retpo = ''.
    Do the fields in the index of table EKPO have to be in the same sequence as in the WHERE clause?
    Regards,
    Viji

    Hi,
    You minimize the size of the result set by using the WHERE and HAVING clauses. To increase the efficiency of these clauses, you should formulate them to fit with the database table indexes.
    Database Indexes
    Indexes speed up data selection from the database. They consist of selected fields of a table, of which a copy is then made in sorted order. If you specify the index fields correctly in a condition in the WHERE or HAVING clause, the system only searches part of the index (index range scan).
    The primary index is always created automatically in the R/3 System. It consists of the primary key fields of the database table. This means that for each combination of fields in the index, there is a maximum of one line in the table. This kind of index is also known as UNIQUE. If you cannot use the primary index to determine the result set because, for example, none of the primary index fields occur in the WHERE or HAVING clause, the system searches through the entire table (full table scan). For this case, you can create secondary indexes, which can restrict the number of table entries searched to form the result set.
    reference : help.sap.com
    thanx.
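    To see which index the database actually chose, check the execution plan on the database side; the order of the conditions in the WHERE clause does not matter, only which index columns they cover. A generic Oracle sketch with a reduced WHERE clause and illustrative bind names:
    EXPLAIN PLAN FOR
      SELECT ebeln, ebelp, matnr
      FROM   ekpo
      WHERE  mandt = :client AND werks = :werks AND matnr = :matnr;
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);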

  • Creating Index to a master table

    Hello All,
    I have a database table with the following fields
    CLIENT 
    MATNR
    KORDX
    KCATV
    VARIANT_NBR
    The first four fields (CLIENT, MATNR, KORDX, KCATV) make up the primary key.
    I would like to add an index to the table on the fields (CLIENT, VARIANT_NBR), because the program reads frequently by the field VARIANT_NBR. Could you please share your views on this? What factors should I consider, from a performance perspective, when creating this index?
    Thanks in advance
    Sudha

    Hi,
    Regarding indexes information check this link...
    http://help.sap.com/saphelp_nw04/helpdata/en/cc/7c58b369022e46b629bdd93d705c8c/content.htm
    and
    http://www.ncsu.edu/it/mirror/mysql/doc/maxdb/en/6a/c943401a306f13e10000000a1550b0/content.htm
    And also go through the below information...
    They may help you in optimizing your program.
    The Optimizer
    Each database system uses an optimizer whose task is to create the execution plan for SQL statements (for example, to determine whether to use an index or table scan). There are two kinds of optimizers:
    1) Rule based
    Rule based optimizers analyze the structure of an SQL statement (mainly the SELECT and WHERE clauses without their values) and the table index or indexes. They then use an algorithm to work out which method to use to execute the statement.
    2) Cost based
    Cost based optimizers use the above procedure, but also analyze some of the values in the WHERE clause and the table statistics. The statistics contain low and high values of the fields, or a histogram containing the distribution of data in the table. Since the cost based optimizer uses more information about the table, it usually leads to faster database access. Its disadvantage is that the statistics have to be periodically updated.
    Minimize the Search Overhead
    You minimize the size of the result set by using the WHERE and HAVING clauses. To increase the efficiency of these clauses, you should formulate them to fit with the database table indexes.
    Database Indexes
    Indexes speed up data selection from the database. They consist of selected fields of a table, of which a copy is then made in sorted order. If you specify the index fields correctly in a condition in the WHERE or HAVING clause, the system only searches part of the index (index range scan). The primary index is always created automatically in the R/3 System. It consists of the primary key fields of the database table. This means that for each combination of fields in the index, there is a maximum of one line in the table. This kind of index is also known as UNIQUE. If you cannot use the primary index to determine the result set because, for example, none of the primary index fields occur in the WHERE or HAVING clause, the system searches through the entire table (full table scan). For this case, you can create secondary indexes, which can restrict the number of table entries searched to form the result set.
    You specify the fields of secondary indexes using the Abap Dictionary. You can also determine whether the index is unique or not. However, you should not create secondary indexes to cover all possible combinations of fields. Only create one if you select data by fields that are not contained in another index, and the performance is very poor. Furthermore, you should only create secondary indexes for database tables from which you mainly read, since indexes have to be updated each time the database table is changed. As a rule, secondary indexes should not contain more than four fields, and you should not have more than five indexes for a single database table. If a table has more than five indexes, you run the risk of the optimizer choosing the wrong one for a particular operation. For this reason, you should avoid indexes with overlapping contents.
    Secondary indexes should contain columns that you use frequently in a selection, and that are as highly selective as possible. The fewer table entries that can be selected by a certain column, the higher that column's selectivity. Place the most selective fields at the beginning of the index. Your secondary index should be so selective that each index entry corresponds to at most five percent of the table entries. If this is not the case, it is not worth creating the index. You should also avoid creating indexes for fields that are not always filled, where their value is initial for most entries in the table.
    If all of the columns in the SELECT clause are contained in the index, the system does not have to search the actual table data after reading from the index. If you have a SELECT clause with very few columns, you can improve performance dramatically by including these columns in a secondary index.
    Formulating Conditions for Indexes
    You should bear in mind the following when formulating conditions for the WHERE and HAVING clauses so that the system can use a database index and does not have to use a full table scan.
    Check for Equality and Link Using AND
    The database index search is particularly efficient if you check all index fields for equality (= or EQ) and link the expressions using AND.
    Use Positive conditions
    The database system only supports queries that describe the result in positive terms, for example, EQ or LIKE. It does not support negative expressions like NE or NOT LIKE. If possible, avoid using the NOT operator in the WHERE clause, because it is not supported by database indexes; invert the logical expression instead.
    Using OR
    The optimizer usually stops working when an OR expression occurs in the condition. This means that the columns checked using OR are not included in the index search. An exception to this are OR expressions at the outside of conditions. You should try to reformulate conditions that apply OR expressions to columns relevant to the index, for example, into an IN condition.
    Using Part of the Index
    If you construct an index from several columns, the system can still use it even if you only specify a few of the columns in a condition. However, in this case, the sequence of the columns in the index is important. A column can only be used in the index search if all of the columns before it in the index definition have also been specified in the condition.
    Checking for Null Values
    The IS NULL condition can cause problems with indexes. Some database systems do not store null values in the index structure. Consequently, this field cannot be used in the index.
    Avoid Complex Conditions
    Avoid complex conditions, since the statements have to be broken down into their individual components by the database system.
    Hope this information had helped you.
    Regards
    Narin Nandivada.
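    Assuming an Oracle database underneath and a placeholder table name ZMASTER (the real table name is not given in the post), the proposed index would correspond at the database level to something like the statement below; in an SAP system it would be created via SE11 so it is known to the ABAP Dictionary.
    CREATE INDEX "ZMASTER~Z01" ON zmaster (client, variant_nbr);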

  • Steps for creating a database index

    Do we just create it from SE11? Does Basis need to be involved for any further steps?

    Hi Amrutha,
    Indexes speed up data selection from the database. They consist of selected fields of a table, of which a copy is then made in sorted order. If you specify the index fields correctly in a condition in the WHERE or HAVING clause, the system only searches part of the index (index range scan). The primary index is always created automatically in the R/3 System. It consists of the primary key fields of the database table. This means that for each combination of fields in the index, there is a maximum of one line in the table. This kind of index is also known as UNIQUE. If you cannot use the primary index to determine the result set because, for example, none of the primary index fields occur in the WHERE or HAVING clause, the system searches through the entire table (full table scan). For this case, you can create secondary indexes, which can restrict the number of table entries searched to form the result set.
    You specify the fields of secondary indexes using the Abap Dictionary. You can also determine whether the index is unique or not. However, you should not create secondary indexes to cover all possible combinations of fields. Only create one if you select data by fields that are not contained in another index, and the performance is very poor. Furthermore, you should only create secondary indexes for database tables from which you mainly read, since indexes have to be updated each time the database table is changed. As a rule, secondary indexes should not contain more than four fields, and you should not have more than five indexes for a single database table. If a table has more than five indexes, you run the risk of the optimizer choosing the wrong one for a particular operation. For this reason, you should avoid indexes with overlapping contents.
    Secondary indexes should contain columns that you use frequently in a selection, and that are as highly selective as possible. The fewer table entries that can be selected by a certain column, the higher that column's selectivity. Place the most selective fields at the beginning of the index. Your secondary index should be so selective that each index entry corresponds to at most five percent of the table entries. If this is not the case, it is not worth creating the index. You should also avoid creating indexes for fields that are not always filled, where their value is initial for most entries in the table.
    If all of the columns in the SELECT clause are contained in the index, the system does not have to search the actual table data after reading from the index. If you have a SELECT clause with very few columns, you can improve performance dramatically by including these columns in a secondary index.
    Index:
    http://help.sap.com/saphelp_nw04/helpdata/en/cf/21eb20446011d189700000e8322d00/content.htm
    Creating Secondary Index
    http://help.sap.com/saphelp_nw04/helpdata/en/cf/21eb47446011d189700000e8322d00/content.htm
    regards,
    keerthi.

  • How index Works.

    Hi all,
    Could someone explain how indexes work internally in the Oracle database? When we create an index on some fields it slows the query down, while creating other indexes speeds it up.
    So please help me determine which fields I should create an index on, and on what basis.
    Thanks and Regards,
    Manu.

    An index is a reference to the data in the table columns on which it is based. An index works much like a book's index, which describes the chapters and the page on which each chapter can be found.
    Indexes only contain references to the data. There are two types of index:
    B-tree indexes and bitmap indexes.
    Whenever a new index is created on a column, the column value and the rowid of each row are stored in the index. The index maintains all the values in a tree structure; for example, if you create an index on a salary column (suppose the salary values range from 1 to 1000), one side of the tree contains the values up to 1 to 500 and the other side 501 to 1000. The 1-to-500 branch is again subdivided into 1 to 250 and 251 to 500, and so on. The blocks that contain the actual values are called leaves.
    You can get more details at the link given below.
    http://www.orafaq.com/node/1403
    Creating a new index can hamper DML performance, so there should be a balance between the performance of DML and SELECT statements. Consider creating the index if it will be used by many SELECT queries.
    Consider for index creation the table columns that appear most often in the WHERE clauses of your SELECT statements.
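    A minimal sketch of the two index types mentioned above (table and column names are made up):
    CREATE TABLE employees (emp_id NUMBER, salary NUMBER, department_id NUMBER);
    -- B-tree index: the default type, suited to selective columns used in WHERE clauses
    CREATE INDEX emp_salary_ix ON employees (salary);
    -- bitmap index: suited to low-cardinality columns, mainly in data warehouses
    CREATE BITMAP INDEX emp_dept_bix ON employees (department_id);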

  • CF8 Verity Indexing

    I've been using Verity to build collections of data from my
    database systems for several years now using CF5 and CF6.1. I
    recently upgraded my system to CF8. While searching these
    collections is 5 to 6 times faster, indexing these collections is
    SIGNIFICANTLY slower.
    My system has some 50+ screens that I use to collect and
    store data into either MS Access or SQL Server databases. There are
    26 different collections that I use to store this data in. After
    the data from each screen is stored in the database, I then update
    the pertinent collection based upon the screen and database
    table(s) where the data is being stored. The collections are a
    compromise between number of screens and tables. This enables users
    to always be searching the most current data. Every night we run a
    scheduled task to rebuild all of the collections because this
    continual add/dropping of data during the day chops up the Verity
    collections.
    I noticed two things after creating the new CF8 collections:
    indexing is 2 to 3 times slower than it was in CF6.1, and
    searching is 5 to 6 times faster than it was. I'm wondering two
    things at this point:
    1. Is this a common experience among CF6 to CF8 users?
    2. Is there some different technique that you have to use to
    index collections using CF8 that will improve the indexing speed?
    Thank you in advance for your insight in this matter.
    Len

    No work can be performed by our development group until a complete analysis, project scope, project plan and project budget have been developed. I wish I could simply pull CF9 into the shop, or the Solr download, but that is not possible unless all the technical aspects are provided ahead of time and submitted for approval in the form of a project plan.
    I presume you have a computer at home? Why don't you use that? That's what I'm doing to test your code!
    Or do you stop being a developer at 5pm?
    Thank you for offering to test Solr for me, it is appreciated.
    Attached is a form we use provided by the IRS. When running CFINDEX and then CFSEARCH the document does not appear on the search results.
    Thanks for your help.
    NP, but there was nothing attached.
    Oh, and sorry to take so long to reply... Xmas and all that bullsh!t got in the way.
    Adam

  • About index memory parameter for Oracle text indexes

    Hi Experts,
    I am on Oracle 11.2.0.3 on Linux and have implemented Oracle Text. I am not an expert in this subject and need help with one issue. I created my Oracle Text indexes with the default settings. However, in an Oracle white paper I read that the default setting may not be right. Here is the excerpt from the white paper by Roger Ford:
    URL:http://www.oracle.com/technetwork/database/enterprise-edition/index-maintenance-089308.html
    "(Part of this white paper below....)
    Index Memory
    As mentioned above, cached $I entries are flushed to disk each time the indexing memory is exhausted. The default index memory at installation is a mere 12MB, which is very low. Users can specify up to 50MB at index creation time, but this is still pretty low.
    This would be done by a CREATE INDEX statement something like:
    CREATE INDEX myindex ON mytable(mycol) INDEXTYPE IS ctxsys.context PARAMETERS ('index memory 50M'); 
    To allow index memory settings above 50MB, the CTXSYS user must first increase the value of the MAX_INDEX_MEMORY parameter, like this:
    begin ctx_adm.set_parameter('max_index_memory', '500M'); end; 
    The setting for index memory should never be so high as to cause paging, as this will have a serious effect on indexing speed. On smaller dedicated systems, it is sometimes advantageous to temporarily decrease the amount of memory consumed by the Oracle SGA (for example by decreasing DB_CACHE_SIZE and/or SHARED_POOL_SIZE) during the index creation process. Once the index has been created, the SGA size can be increased again to improve query performance."
    (End here from the white paper excerpt)
    My question is:
    1) Applying this procedure (ctx_adm.set_parameter) required me to log in as the CTXSYS user. Is that right, or can it be avoided and done from the application schema? The CTXSYS user is locked by default and I had to unlock it. Is that OK to do in production?
    2) What value should I use for max_index_memory? Should it be 500MB? My SGA is 2GB in Dev/QA and 3GB in production. Also, what value should I set for the index memory parameter at index creation? I left it at the default, but how should I change it now? Should it be 50MB as shown in the example above?
    3) The white paper also refers to rebuilding an index at some interval, like once a month: ALTER INDEX DR$index_name$X REBUILD ONLINE;
    Is this correct advice? I would like to ask the experts before doing that. We are on Oracle 11g and the white paper was written in 2003.
    Basically while I read the paper, I am still not very clear on several aspects and need help to understand this.
    Thanks,
    OrauserN

    Perhaps it's time I updated that paper
    1.  To change max_index_memory you must be a DBA user OR ctxsys. As you say, the ctxsys account is locked by default. It's usually easiest to log in as a DBA and run something like
    exec ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY', '10G')
    2.  Index memory is allocated from PGA memory, not SGA memory. So the size of SGA is not relevant. If you use too high a setting your index build may fail with an error saying you have exceeded PGA_AGGREGATE_LIMIT. Of course, you can increase that parameter if necessary. Also be aware that when indexing in parallel, each parallel process will allocate up to the index memory setting.
    What should it be set to?  It's really a "safety" setting to prevent users grabbing too much machine memory when creating indexes. If you don't have ad-hoc users, then just set it as high as you need. In 10.1 it was limited to just under 500M, in 10.2 you can set it to any value.
    The actual amount of memory used is not governed by this parameter, but by the MEMORY setting in the parameters clause of the CREATE INDEX statement. eg:
    create index fooindex on foo(bar) indextype is ctxsys.context parameters ('memory 1G')
    What's a good number to use for memory?  Somewhere in the region of 100M to 200M is usually good.
    3.  No - that's out of date.  To optimize your index use CTX_DDL.OPTIMIZE_INDEX.  You can do that in FULL mode daily or weekly, and REBUILD mode perhaps once a month.
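    Putting the reply together, a minimal sketch (object names are illustrative; run the first call as a DBA or as CTXSYS):
    exec ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY', '10G')
    create index myindex on mytable(mycol)
      indextype is ctxsys.context
      parameters ('memory 200M');
    exec ctx_ddl.optimize_index('MYINDEX', 'FULL')      -- daily or weekly
    exec ctx_ddl.optimize_index('MYINDEX', 'REBUILD')   -- perhaps once a month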

  • Problems indexing 30M documents

    Platform: Sun 4800, 12 CPU, Solaris 9, 48 Gb RAM
    Oracle Version: 10.1.04
    Database Character Set: UTF-8
    SGA MAX SIZE: 24 Gb
    hi,
    Our database contains a mix of image files and plain text documents in 30 different languages (approximately 30 million rows). When we try to index the documents (using IGNORE in the format column to skip the rows containing images), the indexing either bombs out or hangs indefinitely.
    When I first started working on the problem, there were rows in the ctx_user_index_errors table which didn't really give any good indication of what the problem was. I created a new table containing just these rows and was able to index them with no problems using the same set of preferences and the same indexing script. At that time, they were using just the BASIC_LEXER.
    We created a MULTI_LEXER preference and added sub-lexers when lexers existed for the specified language, using the BASIC_LEXER as the default. When we tried to create the index using a parallel setting of 6, the indexing failed after 2 hours, and we got the following error codes: ORA-29855, ORA-20000, DRG-50853, DRG-50857, ORA-01002, and ORA-06512. We then tried to create the index without parallel slaves, and it failed after 3 hours with an end of file on communication channel error.
    Thinking perhaps that it was the MULTI_LEXER that was causing the problem (because the data is converted to UTF-8 by an external program, and the character set and language ID is not always 100% accurate), we tried to create the index using just the BASIC_LEXER (knowing that we wouldn't get good query results on our CJK data). We set the parallel slaves to 6, and it ran for more than 24 hours, with each slave indexing about 4 million documents (according to the logs) before just hanging. The index state in ctx_user_indexes is POPULATE, and in user_indexes is INPROGRESS. There were three sessions active, 2 locked, and 1 blocking. When we were finally able to ctl-C out of the create index command, SQL*Plus core dumped. It takes hours to drop the index as well.
    We're at a loss to figure out what to try next. This database has been offline for about a week now, and this is becoming critical. In my experience, once the index gets hung in POPULATE, there's no way to get it out other than dropping and recreating the index. I know that Text should be able to handle this volume of data, and the machine is certainly capable of handling the load. It could be that the MULTI_LEXER is choking on improperly identified languages, or that there are errors in the UTF-8 conversion, but it also has problems when we use BASIC_LEXER. It could be a problem indexing in parallel, but it also dies when we don't use parallel. We did get errors early on that the parallel query server died unexpectedly, but we increased the PARALLEL_EXECUTION_MESSAGE_SIZE to 65536, and that stopped the parallel errors (and got us to the point of failure quicker).
    Any help you can provide would be greatly appreciated.
    thanks,
    Tarisa.

    I'm working with the OP on this. Here is the table definition and
    the index creation with all the multi_lexer prefs. The table
    is hash partitioned, and we know the index cannot be
    local because of this, so it is a global domain index.
    Perhaps of interest, we have changed PARALLEL_EXECUTION_MESSAGE_SIZE
    from the default up to 32K. This made a huge difference in indexing speed, but
    so far has just helped us get to the point of failure faster.
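    -- Hash-partitioned documents table (CLOB in DATA); because of the hash partitioning, the CONTEXT index below has to be a global domain index.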
    CREATE TABLE m (
    DOC_ID NUMBER,
    CID NUMBER,
    DATA CLOB,
    TYPE_ID NUMBER(10),
    FMT VARCHAR2(10),
    ISO_LANG CHAR(3)
    )
    pctfree 20
    initrans 12
    maxtrans 255
    tablespace ts
    LOB (data) store as meta_lob_segment
    ( ENABLE STORAGE IN ROW
    PCTVERSION 0
    NOCACHE
    NOLOGGING
    STORAGE (INITIAL 32K NEXT 32K)
    CHUNK 16K )
    PARTITION BY HASH ( doc_id )
    PARTITIONS 6
    STORE IN (ts1, ts2, ts3, ts4, ts5, ts6);
    ALTER TABLE m
    ADD (CONSTRAINT pk_m_c PRIMARY KEY (doc_id, cid)
    USING index
    pctfree 20
    initrans 12
    maxtrans 255
    tablespace ts
    nologging );
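    -- Multi-lexer preferences: one sub-lexer per language, with BASIC_LEXER ('english_lexer') as the default.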
    BEGIN
    ctx_ddl.create_preference('english_lexer', 'basic_lexer');
    ctx_ddl.set_attribute('english_lexer','index_themes','false');
    ctx_ddl.set_attribute('english_lexer','index_text','true');
    ctx_ddl.create_preference('japanese_lexer','japanese_lexer');
    ctx_ddl.create_preference('chinese_lexer','chinese_lexer');
    ctx_ddl.create_preference('korean_lexer','korean_morph_lexer');
    ctx_ddl.create_preference('german_lexer','basic_lexer');
    ctx_ddl.set_attribute('german_lexer','index_themes','false');
    ctx_ddl.set_attribute('german_lexer','index_text','true');
    ctx_ddl.set_attribute('german_lexer','composite','german');
    ctx_ddl.set_attribute('german_lexer','mixed_case','yes');
    ctx_ddl.set_attribute('german_lexer','alternate_spelling','german');
    ctx_ddl.create_preference('french_lexer','basic_lexer');
    ctx_ddl.set_attribute('french_lexer','index_text','true');
    ctx_ddl.set_attribute('french_lexer','index_themes','false');
    ctx_ddl.set_attribute('french_lexer','base_letter','yes');
    ctx_ddl.create_preference('spanish_lexer','basic_lexer');
    ctx_ddl.set_attribute('spanish_lexer','index_text','true');
    ctx_ddl.set_attribute('spanish_lexer','index_themes','false');
    ctx_ddl.set_attribute('spanish_lexer','base_letter','yes');
    ctx_ddl.create_preference('global_lexer','multi_lexer');
    ctx_ddl.add_sub_lexer('global_lexer','default','english_lexer');
    ctx_ddl.add_sub_lexer('global_lexer','english','english_lexer','eng');
    ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');
    ctx_ddl.add_sub_lexer('global_lexer','french','french_lexer','fra');
    ctx_ddl.add_sub_lexer('global_lexer','spanish','spanish_lexer','spa');
    ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
    ctx_ddl.add_sub_lexer('global_lexer','korean','korean_lexer','kor');
    ctx_ddl.add_sub_lexer('global_lexer','simplified chinese','chinese_lexer','zho');
    ctx_ddl.add_sub_lexer('global_lexer','traditional chinese','chinese_lexer');
    END;
    /
    BEGIN
    ctx_output.start_log('m_ctx_data.log');
    END;
    /
    CREATE INDEX m_ctx_data ON m (data)
    INDEXTYPE IS ctxsys.context
    PARAMETERS ('memory 1G
    lexer global_lexer
    format column fmt
    language column iso_lang
    sync (every "sysdate+1")' )
    PARALLEL 6;
    BEGIN
    ctx_output.end_log();
    END;
    /
