Efficient Indexing

Hi,
I have written an application where users can search for keywords within a set of documents in a directory. Users can also upload documents. In my current implementation, whenever a user uploads a document I reindex the collection (<cfindex action="update" ... >), and this takes a long time: the user has to wait quite a while before the action completes.
Can I make this re-indexing happen in the background, or is there some other way to keep the user from waiting so long after submitting the form?
Thanks
Nikhil

nikhil20101 wrote:
 Can I make this re-indexing happen in the background, or is there some other way to keep the user from waiting so long after submitting the form?
Both!
You could use a scheduled task to do the indexing on a, ahem, schedule: once a day, once an hour, or whatever might work.  This, of course, means that new data will not be indexed until the next scheduled task runs.
Or you could fire off some type of asynchronous process that does the indexing without tying up the thread that is processing the form request.  That request can then complete, and the user can go on their way while the indexing occurs.  This can be done in ColdFusion with either Gateways or the <cfthread...> tag, whichever meets your needs better.  On old-school servers it could also be done with the <cfhttp...> tag using a timeout of zero, but that trick has largely been supplanted by <cfthread...>.  Just be careful that multiple users firing off multiple indexing tasks at the same time does not cause a problem with this solution.
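For example, a minimal sketch of the <cfthread...> approach, assuming a collection named "docs" and an upload directory of C:\docs (both names are illustrative); the named lock keeps simultaneous uploads from re-indexing concurrently:

<cfthread action="run" name="reindex_#createUUID()#">
    <!--- Serialize re-index runs triggered by overlapping uploads --->
    <cflock name="docIndexLock" type="exclusive" timeout="600">
        <cfindex action="update"
                 collection="docs"
                 type="path"
                 key="C:\docs\"
                 extensions=".pdf,.doc,.txt"
                 recurse="true">
    </cflock>
</cfthread>
<!--- The request thread continues here at once; the user's form
      submission completes while indexing runs in the background. --->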

Similar Messages

  • Rebuild index vs Analyze index

    Hi All,
    I am really confused about rebuilding an index versus analyzing an index.
    Could anyone please help me understand the difference between them,
    and how to perform an analyze and a rebuild of indexes on both Oracle 9i and 10g databases?
    Thanks a lot

    CKPT wrote:
    You can see the posts of experts like Jonathan on this.
    > I am really confused about rebuilding an index versus analyzing an index.
    Tell us why you are confused. Is it about why we need to analyze before rebuilding an index? When an index is analyzed, the whole statistics of the index are gathered; then you can check the height of the index and, according to that height, decide whether the index really needs to be rebuilt or not. Let's see further posts from experts if this is not clear. Thanks.
    OK, so you determine the height of an index is (say) 4. What then? If you decide to rebuild the index and the index remains at a height of 4, what now? Was it really worth doing, and do you rebuild it again, as the index height is still 4 and still within your index rebuild criteria? At what point do you decide that rebuilding the index just because it has a height of 4 is a total waste of time in this case?
    OK, so you determine the index only has a height of (say) 3. Does that mean you don't rebuild it? But what if, by rebuilding, the index reduces to a height of just 1? Perhaps not rebuilding the index, even though it has a height of just 3 and doesn't currently meet your index rebuild criteria, is totally the wrong thing to do, and a rebuild would result in a significantly leaner and more efficient index structure.
    So what if it's pointless rebuilding an index with a height of 4, but another index with a height of 3 is a perfect candidate to be rebuilt?
    Perhaps knowing just the height of an index leaves one totally clueless after all as to whether the index might benefit from a rebuild ...
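    For reference, the height check and rebuild being debated look like this (the index name is illustrative); ANALYZE ... VALIDATE STRUCTURE populates the one-row INDEX_STATS view for the index just validated:

    -- Gather structural statistics for one index (note: this locks the table while it runs)
    ANALYZE INDEX emp_name_idx VALIDATE STRUCTURE;

    -- HEIGHT is the figure discussed above; DEL_LF_ROWS shows deleted leaf entries
    SELECT height, lf_rows, del_lf_rows
      FROM index_stats;

    -- The rebuild itself
    ALTER INDEX emp_name_idx REBUILD;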
    Cheers
    Richard Foote
    http://richardfoote.wordpress.com/

  • Slow Indexing

    Hi all,
    I have to index a repository of 20,000 documents. It took 5 hours to index 1,000 documents; if it continues like this, it will take 100 hours to complete the indexing.
    What could be the reason for this poor performance? We have the portal and TREX on different servers.
    In the portal server's trace file I can see the following message:
    #1.5 #001321CCE41F00840000004F000022F80004595C881A43B2#1224153862312#com.sap.portal.prt.runtime#sap.com/irj#com.sap.portal.prt.runtime#index_service#189##n/a##7eaeefb09b6c11ddbc3d001321cce41f#SAPEngine_Application_Threadimpl:3_37##0#0#Error##Java###04:14_16/10/08_0315_21412850
    EXCEPTION
    #1#com.sap.engine.services.servlets_jsp.server.exceptions.WebIllegalStateException: The stream has already been taken by method getOutputStream().
    In the TREX admin's trace I can see several messages like:
    5752 2008-10-16 19:23:47.312 e preprocessor Preprocessor.cpp(00941) : HTTP-GET failed for URL http:// <file name>
    with Errorcode -5 , but HTTP-HEAD worked, trying again
    5752 2008-10-16 19:23:47.421 e HTTPData Preprocessor.cpp(04944) : HTTPGET: Stop retries after 5 rounds, skipping
    5752 2008-10-16 19:23:47.421 e preprocessor Preprocessor.cpp(00951) : HTTPHEAD failed for URL http:// : <file name>
    Errorcode -5 , Message Reader::readHeaderSkip100 failed, url=http://<file name>
    The TREX server has 16GB RAM.
    What can be done to improve the performance?
    Thanks and Regards,
    Shyam.

    Hi Shyam,
    Not sure if it helps, but the guide below makes recommendations for configuring Search and Classification (using TREX 6.1) for efficient indexing. It covers the following topics: fast initial indexing of large data sets, fast updating of indexes, and fast index replication in distributed TREX systems.
    1) How to Configure TREX 6.1 for Efficient Indexing  
    https://www.sdn.sap.com/irj/sdn/howtoguides?rid=/library/uuid/1545e1bf-0d01-0010-a5ab-f80e574423bf
    Hope that helps.
    Ray

  • Question for Database EXPERTS about implementing an Embedded Database

    One of my current projects is to implement an embedded pure Java database with a small subset of SQL. Since I estimate it will take me at least 400 man-hours (based on my initial, unfinished attempt) to build something robust and usable, I thought I'd ask my fellow Java folks for suggestions. The SQL and parsing are no problem, but the best way to implement the record structure is. My first approach would be to represent each table as a separate file (file defined as a standard OS file). Each record would have a fixed byte length, say 256.
    A separate definition file would contain the column names, types, and maximum byte length for each column. So the data file would be a repetitive list of 256-byte rows; some arbitrary byte value could be used as a 'filler' for the part of a row not being used. When a row is deleted, the entire row is merely overwritten with the filler. The next inserted record would then take its place.
    Numeric types and small strings would be written directly to the data file at their appropriate byte offset from the start of their respective byte row position.
    Complex or large data types like multimedia and memos would merely store a filename string pointing to another file. Things like security and encryption could (hopefully) be added on later. My main concerns are how to implement an efficient indexing algorithm and how to minimize the space required for each table, while maintaining simplicity.
    Furthermore, I currently plan to use only the most basic and common classes to implement this (the Java I/O API, and of course some of the SQL API to implement JDBC). Any advice on other APIs that may benefit this? Any red flags come up yet?

    > One of my current projects is to implement an embedded pure Java database with a small subset of SQL. [ ... ] fellow Java folks for suggestions. The SQL and parsing are no problem, but the best way to implement the record structure is.
    I once (1981) did it this way: a database file consists of consecutive pages, all of equal size; say, 4KB per page. Every page consists of consecutive cells; every cell can store a short, so 2K cells can be stored in a page.
    Cells in a page serve two purposes. From the bottom to the top, the cells represent offsets in the page where the records are stored and from the top to the bottom the cells are used for the actual data. The first cell of the page contains the offset to the first free cell in the page; the second cell contains the offset to the first occupied (data) cell in the page. Any record offset cell containing zero (0) is a free record offset cell.
    This scenario implies that the maximum record length is 4KB - 4*3 bytes.
    A record takes up one offset cell plus as many cells as are needed for the actual data, preceded by a cell containing the length of the record. The offset cell contains the cell number where the record starts (the length cell).
    Records are addressed by their page number and their offset cell number [PR]. The R number points to the cell containing the actual offset of the record. The [PR] number doesn't change during the lifetime of the record (this is important).
    Inserting a record is easy: find a page with enough room and store the record in that page. If no page is available, add another page to the database file. Deleting a record is simple too: delete the record, zero out the offset cell, and compact the data part of the page. Updating a record is a bit more complicated. If the record size shrinks, stuff is easy. If the record size grows and it doesn't fit in its page anymore, another page has to be found. The old data is removed and a small 'indirection record' is inserted instead. This new record contains the actual [PR] number where the new (larger) record is stored. The original R cell has a few more bits available: 2K different numbers take up 11 bits, and the cell itself can store 16 bits, so one bit can be used to indicate that it's not pointing to an actual data record, but that its offset points to an 'indirection' record instead. Some careful fiddling takes care that subsequent updates of a record keep this indirection stuff limited to just one indirection (this is true, believe me).
    Pages are loaded in core, and swapped out whenever necessary. Every loaded page has a page number and a 'dirty bit' attached to it. Whenever this bit is set, the page has to be written back to disk if it needs to be flushed from memory (due to memory exhaustion etc.) otherwise, the memory can simply be released. A simple LRU (Least Recently Used) list does wonders here.
    B-tree records can be stored as ordinary records too; this implies that the indexes of the tables (a bunch of records) are ordinary records too. Note that these spare 5 bits (one taken indicating an 'indirect' record) can do wonders here. As a matter of fact, a bit indicating that its cell is pointing to a B-tree node came in very handy. And still three bits free for other miracles! ;-)
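    A minimal Java sketch of the cell addressing described above (the class and the length-in-bytes convention are my assumptions, not the actual 1981 code):

    import java.nio.ByteBuffer;

    // One 4KB page of 2K two-byte cells. Cell 0 holds the offset of the first
    // free cell, cell 1 the offset of the first occupied data cell; cells 2..
    // are record offset cells, where 0 means "free offset cell".
    final class Page {
        static final int PAGE_SIZE = 4096;
        private final ByteBuffer buf = ByteBuffer.allocate(PAGE_SIZE);
        boolean dirty;                                 // set on writes; checked on eviction

        short getCell(int n)          { return buf.getShort(n * 2); }
        void  putCell(int n, short v) { buf.putShort(n * 2, v); dirty = true; }

        // Resolve a record by its stable offset-cell number R within this page.
        byte[] read(int r) {
            short start = getCell(r);                  // cell number of the length cell
            if (start == 0) return null;               // zeroed offset cell = deleted
            short len = getCell(start);                // record length (here: in bytes)
            byte[] data = new byte[len];
            buf.position((start + 1) * 2);             // data cells follow the length cell
            buf.get(data, 0, len);
            return data;
        }
    }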
    I better quit before I turn another message into a novel again.
    kind regards,
    Jos

  • Looking for the latest Performance documents

    Hello,
    Lately we installed SP14.
    I am looking for SAP's latest documents on performance
    issues such as tuning, fine tuning, troubleshooting guides, etc.
    My current documents are relevant to SP2 and I would like to read the latest ones on this subject.
    Can someone please post the relevant links here? Is there a central place where I can download these documents?
    Roy.

    Hi Roy,
    > Lately we installed SP14
    Congrats
    > Can someone please attach here the relevant links
    The tuning guides are to be found under http://service.sap.com/nw04 -- How-to Guides -- Portal, KM and Collaboration; for example:
    TREX: How To Configure Efficient Indexing
    EP: Finetuning Performance of Portal Platform
    KM: Tuning the Performance of Knowledge Management
    Hope it helps
    Detlev

  • Poor performance of Report Writer reports (Special Ledger Library)

    Greetings - We are running into problems with poor performance of reports that are written with the SAP Report Writer. The problem appears to be caused when SAP is using the primary-key index in our Special Purpose ledger (where the reports are generated). The index contains object fields that cannot be added to the report library (COBJNR, SOBJNR, ROBJNR). We have created alternate indices, but they are not being picked up with the Report Writer reports.
    Are there any configurable or technical settings that we can work with in order to force the use of a specific index for a report? It seems logical that SAP would find the most efficient index to use, but with the reports that we are looking at, this does not appear to be the case.
    Any help that can be offered will be greatly appreciated... We are currently using version 4.6C but are planning an upgrade to ECC 6.0 later this year.
    Thanks in advance -

    Arjun,
    Where, i.e. in which files, are these parameters set? We cannot find them all.
    Set them in the Tomcat Java properties and try again (you can tune the values below to suit your system memory):
    -XX:PermSize=256m
    -XX:MaxPermSize=256m
    -XX:NewSize=171m
    -XX:MaxNewSize=171m
    -XX:SurvivorRatio=2
    -XX:TargetSurvivorRatio=90
    -XX:+DisableExplicitGC
    -XX:+UseTLAB
    As a general update: it looks like we need to use the monitoring tools that are installed by default; we are now in the process of installing the database, etc.
    Cheers

  • Search Function

    Dear Sir,
    I need to implement a search function restricted to specific folders in EP6 SP14.
    How should we do this?
    Thanks
    Vimol

    Dear Vimol,
    Please refer to http://help.sap.com/saphelp_nw2004s/helpdata/en/40/83505303bd5616e10000000a114cbd/frameset.htm. This talks about the configuration of TREX and anything and everything you would like to know about TREX.
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/1545e1bf-0d01-0010-a5ab-f80e574423bf - How To… Configure TREX 6.1 for Efficient Indexing
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/f0d3be0e-0401-0010-b780-ff7e4e103ea0 - How To… enable semantic search/search for synonyms
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/d17d18d0-0a01-0010-0db1-f96a947e38a0 - How-to Guide: Searchable HTML Tags
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/77f6aa90-0201-0010-b681-e013540efb3b - How to… set up Web Repository and Crawling it for Indexing
    You may also refer to "TREX and its application" on SDN to learn about the different kinds of search.
    If you want to create your own search using the KM APIs, then refer to: How to write a Search Application using the KM Indexmanagement API for TREX.
    Regards,
    Sunil

  • TREX Documentation

    Hey people!
    Does anyone here have some TREX material? I really need a configuration/administration guide.
    I have searched for several days now, and all I can find are install guides and the efficient indexing and search docs.
    What I'm looking for is HOW to do it: making indexes in CRM, reindexing, setting schedules, what to index, and other useful docs.
    Or, if you have done this before, please help me out.
    I have installed TREX 7.0, Content Server, and CRM 4.0, and the communication works fine. I can search for docs on CRM, from TREX to CRM, and from the Portal to CRM. But I'm experiencing problems with inconsistent search results, and the index isn't updated when new documents are uploaded.
    Best Regards
    Kristoffer Engh

    Thank you Amit, that would be great!
    My email is: [email protected]
    I'm looking for how I can administer indexes in TREX (create, edit, reindex in a non-portal environment).
    But I'm glad for all the documentation I can get.
    I have the following already, but it doesn't contain what I'm looking for:
    How To Configure an alternative document access URL for TREX.pdf
    How To Configure Efficient Indexing.pdf
    How to enable semantic search.pdf
    TREX60SP1_InstNonPort.pdf
    TREX61_DistributedSystems.pdf
    TREX61_SP17_Install_Guide.pdf
    TREX Recommendations.pdf
    TREX Cluster Configuration and Monitoring.pdf
    Best Regards
    Kristoffer Engh

  • IRM LDAP Read timeout

    Good day everyone!
    We are facing a problem where the LDAP read procedure takes too long.
    Environment: IRM + OVD as the user repository (many AD domains behind it).
    When we search for users in IRM we get an exception:
    <05.07.2011 14:08:36 MSD> <Warning> <oracle.irm.web> <IRM-03012> <Exception while retrieving user details.
    oracle.irm.jps.exception.JpsIdentityStoreFailureException: oracle.security.idm.OperationFailureException: javax.naming.NamingException: [LDAP: error code 1 - LDAP Error 1 : LDAP response read timed out, timeout used:15000ms.]; remaining name 'dc=ovd'
         at oracle.irm.jps.util.Store.searchIdentities(Store.java:404)
         at oracle.irm.web.services.IrmCommonUtil.searchByUserOrGroup(IrmCommonUtil.java:131)
         at oracle.irm.web.backing.common.CommonUserSearch.searchAction(CommonUserSearch.java:169)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at com.sun.el.parser.AstValue.invoke(Unknown Source)
         at com.sun.el.MethodExpressionImpl.invoke(Unknown Source)
         at org.apache.myfaces.trinidad.component.UIXComponentBase.broadcastToMethodExpression(UIXComponentBase.java:1300)
         at oracle.adf.view.rich.component.UIXQuery.broadcast(UIXQuery.java:116)
         at oracle.adf.view.rich.component.fragment.UIXRegion.broadcast(UIXRegion.java:148)
         at oracle.adf.view.rich.component.fragment.UIXRegion.broadcast(UIXRegion.java:148)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent$1.run(ContextSwitchingComponent.java:92)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent._processPhase(ContextSwitchingComponent.java:361)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent.broadcast(ContextSwitchingComponent.java:96)
         at oracle.adf.view.rich.component.fragment.UIXInclude.broadcast(UIXInclude.java:102)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent$1.run(ContextSwitchingComponent.java:92)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent._processPhase(ContextSwitchingComponent.java:361)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent.broadcast(ContextSwitchingComponent.java:96)
         at oracle.adf.view.rich.component.fragment.UIXInclude.broadcast(UIXInclude.java:96)
         at oracle.adfinternal.view.faces.lifecycle.LifecycleImpl.broadcastEvents(LifecycleImpl.java:902)
         at oracle.adfinternal.view.faces.lifecycle.LifecycleImpl._executePhase(LifecycleImpl.java:313)
         at oracle.adfinternal.view.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:186)
         at javax.faces.webapp.FacesServlet.service(FacesServlet.java:265)
         at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
         at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
         at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:300)
         at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:26)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.adfinternal.view.faces.webapp.rich.RegistrationFilter.doFilter(RegistrationFilter.java:106)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl$FilterListChain.doFilter(TrinidadFilterImpl.java:446)
         at oracle.adfinternal.view.faces.activedata.AdsFilter.doFilter(AdsFilter.java:60)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl$FilterListChain.doFilter(TrinidadFilterImpl.java:446)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl._doFilterImpl(TrinidadFilterImpl.java:271)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl.doFilter(TrinidadFilterImpl.java:177)
         at org.apache.myfaces.trinidad.webapp.TrinidadFilter.doFilter(TrinidadFilter.java:92)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.help.web.rich.OHWFilter.doFilter(Unknown Source)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.adf.model.servlet.ADFBindingFilter.doFilter(ADFBindingFilter.java:205)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.security.jps.ee.http.JpsAbsFilter$1.run(JpsAbsFilter.java:111)
         at java.security.AccessController.doPrivileged(Native Method)
         at oracle.security.jps.util.JpsSubject.doAsPrivileged(JpsSubject.java:313)
         at oracle.security.jps.ee.util.JpsPlatformUtil.runJaasMode(JpsPlatformUtil.java:413)
         at oracle.security.jps.ee.http.JpsAbsFilter.runJaasMode(JpsAbsFilter.java:94)
         at oracle.security.jps.ee.http.JpsAbsFilter.doFilter(JpsAbsFilter.java:161)
         at oracle.security.jps.ee.http.JpsFilter.doFilter(JpsFilter.java:71)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.dms.servlet.DMSServletFilter.doFilter(DMSServletFilter.java:136)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at weblogic.servlet.internal.RequestEventsFilter.doFilter(RequestEventsFilter.java:27)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3715)
         at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3681)
         at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
         at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
         at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2277)
         at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2183)
         at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1454)
         at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
         at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
    Is there any way to increase the LDAP read timeout? We tried changing the OVD Authenticator properties, but it seems that IRM doesn't look at them. =(

    Hi,
    This problem is caused by the type of search IRM makes when searching for users. By default, if you specify 'ran' as a search parameter, we search for *ran*, and so Frank would be returned. The problem is that very large LDAP repositories can take a long time to return when performing this kind of search unless they're set up appropriately. So you have two options:
    1. Implement medial (also known as tuple) indexing in your LDAP. This builds a larger, less efficient index but gives much faster responses on substring searches.
    2. As of 11.1.1.3 we have a patch (patch number 10354979) which allows you to restrict the type of search to 'begins', 'ends', or 'exact match'. For example, you can do prefix searching, so 'ran' would not find 'Frank' but would find 'Randolph'. If you are running 11.1.1.3, you need to apply the patch. If you are running a later version, you just need to switch on the parameter defined in the patch readme. You cannot apply the patch to 11.1.1.2.1; in that case you have to upgrade to the latest version.
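    To illustrate the two search styles, here is a hypothetical JNDI sketch (the host, base DN, and attribute are made up):

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.NamingException;
    import javax.naming.directory.*;

    public class SearchStyles {
        public static void main(String[] args) throws NamingException {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://ovd.example.com:389");
            DirContext ctx = new InitialDirContext(env);
            SearchControls sc = new SearchControls();
            sc.setSearchScope(SearchControls.SUBTREE_SCOPE);

            // Substring (medial) search: matches Frank AND Randolph, but is
            // slow on large directories without tuple indexing.
            NamingEnumeration<SearchResult> medial = ctx.search("dc=ovd", "(cn=*ran*)", sc);

            // Prefix search: matches Randolph only; cheap with ordinary indexes.
            NamingEnumeration<SearchResult> prefix = ctx.search("dc=ovd", "(cn=ran*)", sc);

            ctx.close();
        }
    }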
    You should refer to KM note 1272153.1 which tells you more about this.
    Regards,
    Frank.

  • Vectors VS Arrays VS LinkedLists

    Hi
    I am basically looking for some info on the advantages or disadvantages of using any of the above. I would like to know how one justifies the use of one over the other.
    I would be grateful for any pointers or resources you could point me to.
    Cheers

    Use arrays if you know the size you will need, you won't be inserting or removing elements and you need highly efficient indexed access to the elements.
    Use ArrayList if you don't know how large to make the container, you need efficient indexed access to the elements, you don't need to have efficient insertion or removal of elements from the interior of the container and you don't need thread safety on the individual operations (the most common case).
    Use Vector or a synchronized ArrayList if your needs are like those for ArrayList but you need thread safety on the individual operations.
    Use LinkedList if you don't need efficient indexed access but you do need efficient insertion and removal of elements from the interior of the container and you don't need thread safety of individual operations (the usual case).
    Use the synchronized wrapper of LinkedList if your needs are like those that lead you to LinkedList but you need thread safe individual operations.
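    In code, those choices look like this (a quick sketch; the element type and sizes are arbitrary):

    import java.util.*;

    public class ContainerChoices {
        public static void main(String[] args) {
            // Known, fixed size + highly efficient indexed access: array
            String[] names = new String[100];
            // Unknown size, efficient indexed access, no per-operation thread safety: ArrayList
            List<String> list = new ArrayList<>();
            // Efficient interior insertion/removal, indexed access unimportant: LinkedList
            List<String> linked = new LinkedList<>();
            // Same needs, but with thread-safe individual operations: synchronized wrappers
            List<String> syncList   = Collections.synchronizedList(new ArrayList<>());
            List<String> syncLinked = Collections.synchronizedList(new LinkedList<>());
            System.out.println(names.length + " " + list.size() + " " + linked.size()
                    + " " + syncList.size() + " " + syncLinked.size());
        }
    }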
    I am sure this is documented better someplace else, but I haven't taken the time to check.
    Chuck

  • Problems with Searching documents!

    I am having a problem with the ability to search documents in the portal; every time I run a search I get the message "no results found".
    I have done the following:
    1. Created a file system repository.
    2. Created a crawler profile.
    3. Created an index and assigned the crawler to it.
    4. Waited until it had finished and said there were 13 items.
    However, when I search, there are none!
    What am I doing wrong?

    Hi Phil,
    1. Check the TREX server status in Sys Admin --> Monitoring --> KM --> TREX Monitor.
    2. Check for the documents in the folder on which you are creating the index.
    3. After you create an index, modify the parameters in Sys Admin --> Monitoring --> KM --> TREX Monitor --> Edit Queue Parameters.
    4. Check the queue status in Sys Admin --> Monitoring --> KM --> TREX Monitor --> Display Queues. Sometimes it helps to flush the index.
    5. If it is still not working, re-index the index, or delete and recreate it.
    For efficient indexing see the doc
    https://websmp102.sap-ag.de/~sapidb/011000358700000378692005E.PDF
    Hope this helps,
    thanks,
    Praveen
    PS: Don't forget to reward points.

  • Indexing for Load efficiency or Query Efficiency

    Hello,
    I've just started querying our warehouse, which is currently being developed. Unfortunately, although the tables and indexes were spec'd at design time for query performance, the ETL team loading the tables has set up indexes to improve load efficiency.
    Is there any way of placing two sets of indexes on a table, one for the load process and one for querying? Or should the queries be done against materialized views?
    Any help would be much appreciated.
    Regards
    James

    Just to clarify: there may be harm in having multiple sets of indexes, since index maintenance may consume a significant fraction of the ETL time. If the goal is to optimize the load, you may want to drop and rebuild the indexes that exist for query performance (assuming that queries and loads run during separate windows). Depending on volumes and load characteristics, I could envision scenarios where you would drop the indexes used by consumers, build the indexes for ETL, run the ETL, drop the ETL indexes, and rebuild the consumer indexes, but that would be rather rare.
    Materialized views can certainly come into play to satisfy warehouse queries, particularly for aggregate data. It wouldn't make much sense to drop indexes on base tables and create them on materialized views that were exact copies of those base tables, since that wouldn't save you any processing time. If your MVs have significantly fewer rows than the base tables, though, indexes on the aggregates may be significantly cheaper to maintain.
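    A sketch of that drop/rebuild staging around a load window (the object names are made up, and this assumes the loads run in an exclusive window):

    -- Before the load: take the consumer (query-time) index out of play
    ALTER INDEX sales_by_cust_idx UNUSABLE;
    -- Sessions touching the table in the meantime may need:
    --   ALTER SESSION SET skip_unusable_indexes = TRUE;

    -- ... run the ETL load here, with its own load-friendly indexes in place ...

    -- After the load: restore the consumer index for query time
    ALTER INDEX sales_by_cust_idx REBUILD;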
    Justin
    Distributed Database Consulting, Inc.
    http://www.ddbcinc.com/askDDBC

  • Can I refactor this query to use an index more efficiently?

    I have a members table with fields such as id, last name, first name, address, join date, etc.
    I have a unique index defined on (last_name, join_date, id).
    This query will use the index for a range scan, with no sort required, since the index entries are already in order for that range ('Smith'):
    SELECT members.*
      FROM members
     WHERE last_name = 'Smith'
     ORDER BY join_date, id;
    Is there any way I can get something like the following to use the index (with no sort) as well?
    SELECT members.*
      FROM members
     WHERE last_name LIKE 'S%'
     ORDER BY join_date, id;
    I understand the difficulty: even if it does a range scan on every last name 'S%' (assuming it can?), the rows are not necessarily in order. Case in point:
    Last_Name    JoinDate
    Smith        2/5/2010
    Smuckers     1/10/2010
    An index range scan of 'S%' would return them in the above order, which is not ordered by join_date.
    So is there any way I can refactor this (query or index) such that the index can be range scanned (using LIKE 'x%') and return rows in the correct order without performing a sort? Or is that simply not possible?

    xaeryan wrote:
    > So is there any way I can refactor this (query or index) such that the index can be range scanned (using LIKE 'x%') and return rows in the correct order without performing a sort? Or is that simply not possible?
    Come on. Index column order does matter. "LIKE 'x%'" actually means a full scan of that index range: the db engine accesses the contiguous index entries and then uses the ROWID values in the index to retrieve the table rows.
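    One possible workaround, offered as a sketch rather than a guarantee (the index name is made up, and the optimizer may still prefer a sort depending on statistics): lead a second index with the ORDER BY columns, so an in-order index scan returns rows pre-sorted while the last_name filter is applied along the way:

    CREATE INDEX members_jd_id_ln ON members (join_date, id, last_name);

    SELECT /*+ INDEX(members members_jd_id_ln) */ members.*
      FROM members
     WHERE last_name LIKE 'S%'
     ORDER BY join_date, id;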

  • Index efficiency

    Hi,
    I have 3 tables A, B and C.
    Table A has A1 as 1 column.
    Table B has B1 and A1 (referenced from Table A).
    Table C has C1 and A1 (referenced from Table A).
    All 3 tables are Indexed properly for Primary Key and Foreign Key.
    If I want to join B and C, which of the queries below will perform better?
    Query 1:
    SELECT C.C1
      FROM B, A, C
     WHERE B.A1 = A.A1
       AND A.A1 = C.A1;
    Query 2:
    SELECT C.C1
      FROM B, C
     WHERE B.A1 = C.A1;

    Write the query in whichever way makes most sense.
    In later versions, the optimizer is able to do table and join elimination.
    For example:
    SQL> create table a
      2  (a1 number not null primary key);
    Table created.
    SQL>
    SQL> create table b
      2  (b1 number not null primary key
      3  ,a1 number references a (a1));
    Table created.
    SQL>
    SQL> create table c
      2  (c1 number not null primary key
      3  ,a1 number references a (a1));
    Table created.
    SQL> explain plan for
      2  select c.c1
      3  from   a,b,c
      4  where  b.a1 = a.a1
      5  and    c.a1 = b.a1
      6  and    c.a1 = a.a1;
    Explained.
    SQL> select * from table(dbms_xplan.display);
    PLAN_TABLE_OUTPUT
    Plan hash value: 3136813453
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT   |      |     1 |    39 |     5  (20)| 00:00:01 |
    |*  1 |  HASH JOIN         |      |     1 |    39 |     5  (20)| 00:00:01 |
    |*  2 |   TABLE ACCESS FULL| B    |     1 |    13 |     2   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS FULL| C    |     1 |    26 |     2   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
       1 - access("C"."A1"="B"."A1")
       2 - filter("B"."A1" IS NOT NULL)
    Note
       - dynamic sampling used for this statement (level=4)
    20 rows selected.
    SQL>

  • How does Oracle decide whether to use an index or a full scan? (statistics)

    Hi guys,
    Let's say I have an index on a column.
    The table and index statistics have been gathered (without histograms).
    Now let's say I run: select * from table where a = 5;
    Oracle will perform a full scan.
    But from which statistics can it tell that most of the column's values are indeed 5 (histograms not being used)?
    After analyzing, we get the below:
    Table Statistics :
    (NUM_ROWS)
    (BLOCKS)
    (EMPTY_BLOCKS)
    (AVG_SPACE)
    (CHAIN_COUNT)
    (AVG_ROW_LEN)
    Index Statistics :
    (BLEVEL)
    (LEAF_BLOCKS)
    (DISTINCT_KEYS)
    (AVG_LEAF_BLOCKS_PER_KEY)
    (AVG_DATA_BLOCKS_PER_KEY)
    (CLUSTERING_FACTOR)
    thanks
    Index Column (A)
    ======
    1
    1
    2
    2
    5
    5
    5
    5
    5
    5

    I had prepared some explanation and had not noticed that the topic had been marked as answered.
    That sentence of mine is not completely true.
    A column "without histograms" means that the column has only one bucket. More correctly: even without histograms there are data in dba_tab_histograms which we can consider as one bucket for the whole column. In fact these data are retrieved from hist_head$, not from histgrm$ as with usual buckets.
    Technically, there are no buckets at all without gathered histograms.
    Let's create a table with skewed data distribution.
    SQL> create table t as
      2  select least(rownum,3) as val, '*' as pad
      3    from dual
      4  connect by level <= 1000000;
    Table created
    SQL> create index idx on t(val);
    Index created
    SQL> select val, count(*)
      2    from t
      3   group by val;
           VAL   COUNT(*)
             1          1
             2          1
             3     999998
    So, we have a table with a very skewed data distribution.
    Let's gather statistics without histograms.
    SQL> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100, method_opt => 'for all columns size 1', cascade => true);
    PL/SQL procedure successfully completed
    SQL> select blocks, num_rows  from dba_tab_statistics
      2   where table_name = 'T';
        BLOCKS   NUM_ROWS
          3106    1000000
    SQL> select blevel, leaf_blocks, clustering_factor
      2    from dba_ind_statistics t
      3   where table_name = 'T'
      4     and index_name = 'IDX';
        BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
             2        4017              3107
    SQL> select column_name,
      2         num_distinct,
      3         density,
      4         num_nulls,
      5         low_value,
      6         high_value
      7    from dba_tab_col_statistics
      8   where table_name = 'T'
      9     and column_name = 'VAL';
    COLUMN_NAME  NUM_DISTINCT    DENSITY  NUM_NULLS      LOW_VALUE      HIGH_VALUE
    VAL                     3 0,33333333          0           C102            C104
    So, Oracle assumes that values between 1 and 3 (raw C102 and C104) are distributed uniformly and that the density of the distribution is 0.33.
    Let's try to explain plan
    SQL> explain plan for
      2  select --+ no_cpu_costing
      3         *
      4    from t
      5   where val = 1
      6  ;
    Explained
    SQL> @plan
    | Id  | Operation         | Name | Rows  | Cost  |
    |   0 | SELECT STATEMENT  |      |   333K|   300 |
    |*  1 |  TABLE ACCESS FULL| T    |   333K|   300 |
    Predicate Information (identified by operation id):
       1 - filter("VAL"=1)
    Note
       - cpu costing is off (consider enabling it)
    Below is an excerpt from the 10053 trace:
    BASE STATISTICAL INFORMATION
    Table Stats::
      Table:  T  Alias:  T
        #Rows: 1000000  #Blks:  3106  AvgRowLen:  5.00
    Index Stats::
      Index: IDX  Col#: 1
        LVLS: 2  #LB: 4017  #DK: 3  LB/K: 1339.00  DB/K: 1035.00  CLUF: 3107.00
    SINGLE TABLE ACCESS PATH
      BEGIN Single Table Cardinality Estimation
      Column (#1): VAL(NUMBER)
        AvgLen: 3.00 NDV: 3 Nulls: 0 Density: 0.33333 Min: 1 Max: 3
      Table:  T  Alias: T
        Card: Original: 1000000  Rounded: 333333  Computed: 333333.33  Non Adjusted: 333333.33
      END   Single Table Cardinality Estimation
      Access Path: TableScan
        Cost:  300.00  Resp: 300.00  Degree: 0
          Cost_io: 300.00  Cost_cpu: 0
          Resp_io: 300.00  Resp_cpu: 0
      Access Path: index (AllEqRange)
        Index: IDX
        resc_io: 2377.00  resc_cpu: 0
        ix_sel: 0.33333  ix_sel_with_filters: 0.33333
        Cost: 2377.00  Resp: 2377.00  Degree: 1
      Best:: AccessPath: TableScan
             Cost: 300.00  Degree: 1  Resp: 300.00  Card: 333333.33  Bytes: 0
    The cost of the FTS here is 300 and the cost of the Index Range Scan is 2377.
    I have disabled cpu costing, so selectivity does not affect the cost of FTS.
    The cost of the Index Range Scan is calculated as
    blevel + (leaf_blocks * selectivity + clustering_factor * selectivity) = 2 + (4017*0.33333 + 3107*0.33333) = 2377.
    Oracle considers that it has to read 2 root/branch blocks of the index, 1339 leaf blocks of the index and 1036 blocks of the table.
    Pay attention that selectivity is the major component of the cost of the Index Range Scan.
    Let's try to gather histograms:
    SQL> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100, method_opt => 'for columns val size 3', cascade => true);
    PL/SQL procedure successfully completed
    If you look at dba_tab_histograms you will see the following:
    SQL> select endpoint_value,
      2         endpoint_number
      3    from dba_tab_histograms
      4   where table_name = 'T'
      5     and column_name = 'VAL'
      6  ;
    ENDPOINT_VALUE ENDPOINT_NUMBER
                 1               1
                 2               2
                 3         1000000
    ENDPOINT_VALUE is the column value (as a number, for any data type) and ENDPOINT_NUMBER is the cumulative number of rows.
    The number of rows for any ENDPOINT_VALUE = its ENDPOINT_NUMBER minus the ENDPOINT_NUMBER of the previous ENDPOINT_VALUE.
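    For example, the per-value row counts can be derived from the cumulative ENDPOINT_NUMBER with LAG():

    SELECT endpoint_value,
           endpoint_number
             - LAG(endpoint_number, 1, 0)
                 OVER (ORDER BY endpoint_number) AS num_rows
      FROM dba_tab_histograms
     WHERE table_name  = 'T'
       AND column_name = 'VAL'
     ORDER BY endpoint_value;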
    explain plan and 10053 trace of the same query:
    | Id  | Operation                   | Name | Rows  | Cost  |
    |   0 | SELECT STATEMENT            |      |     1 |     4 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |     4 |
    |*  2 |   INDEX RANGE SCAN          | IDX  |     1 |     3 |
    Predicate Information (identified by operation id):
       2 - access("VAL"=1)
    Note
       - cpu costing is off (consider enabling it)
    BASE STATISTICAL INFORMATION
    Table Stats::
      Table:  T  Alias:  T
        #Rows: 1000000  #Blks:  3106  AvgRowLen:  5.00
    Index Stats::
      Index: IDX  Col#: 1
        LVLS: 2  #LB: 4017  #DK: 3  LB/K: 1339.00  DB/K: 1035.00  CLUF: 3107.00
    SINGLE TABLE ACCESS PATH
      BEGIN Single Table Cardinality Estimation
      Column (#1): VAL(NUMBER)
        AvgLen: 3.00 NDV: 3 Nulls: 0 Density: 5.0000e-07 Min: 1 Max: 3
        Histogram: Freq  #Bkts: 3  UncompBkts: 1000000  EndPtVals: 3
      Table:  T  Alias: T
        Card: Original: 1000000  Rounded: 1  Computed: 1.00  Non Adjusted: 1.00
      END   Single Table Cardinality Estimation
      Access Path: TableScan
        Cost:  300.00  Resp: 300.00  Degree: 0
          Cost_io: 300.00  Cost_cpu: 0
          Resp_io: 300.00  Resp_cpu: 0
      Access Path: index (AllEqRange)
        Index: IDX
        resc_io: 4.00  resc_cpu: 0
        ix_sel: 1.0000e-06  ix_sel_with_filters: 1.0000e-06
        Cost: 4.00  Resp: 4.00  Degree: 1
      Best:: AccessPath: IndexRange  Index: IDX
             Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.00  Bytes: 0
    Pay attention to the selectivity: ix_sel = 1.0000e-06.
    Cost of the FTS is still the same = 300,
    but cost of the Index Range Scan is 4 now: 2 root/branch blocks + 1 leaf block + 1 table block.
    Thus, the conclusion: histograms allow the optimizer to calculate selectivity more accurately. The aim is to arrive at more efficient execution plans.
    Alexander Anokhin
    http://alexanderanokhin.wordpress.com/

Maybe you are looking for

  • Error 4002 every time I open iTunes
  • Applet socket communication
  • Mac has a bluish tint to it, just happened last week
  • Uninstalling Quicktime Preview for Windows
  • Ampersand problem in numbers