Efficient Indexing

Hi,
I have written an application where users can search for keywords within a set of documents in a directory. Users can also upload documents. In my current implementation, whenever a user uploads a document I reindex the collection (<cfindex action="update" ... >), and this takes a long time: the user has to wait quite a while before the action completes.
Can I make this re-indexing happen in the background, or is there some other way to keep the user from waiting so long after submitting the form?
Thanks
Nikhil

nikhil20101 wrote:
 Can I make this re-indexing happen in the background, or is there some other way to keep the user from waiting so long after submitting the form?
Both!
You could use a scheduled task to do the indexing on a, ahem, schedule: once a day, once an hour, or whatever might work.  This, of course, means that new data will not be indexed until the next scheduled task runs.
Or you could fire off some type of asynchronous process that does the indexing without tying up the thread that is processing the form request.  That request can then complete, and the user can go on their way while the indexing occurs.  This can be done in ColdFusion with either Gateways or the <cfthread...> tag, whichever meets your needs better.  On old-school servers it could also be done with the <cfhttp...> tag using a timeout of zero, but that trick has largely been supplanted by <cfthread...>.  Just be careful that multiple users firing off multiple indexing tasks at the same time does not cause a problem with this solution.
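For example, a minimal sketch of the <cfthread...> approach, assuming a collection named "docs" and an upload directory of C:\docs (both names are illustrative); the named lock keeps simultaneous uploads from re-indexing concurrently:

<cfthread action="run" name="reindex_#createUUID()#">
    <!--- Serialize re-index runs triggered by overlapping uploads --->
    <cflock name="docIndexLock" type="exclusive" timeout="600">
        <cfindex action="update"
                 collection="docs"
                 type="path"
                 key="C:\docs\"
                 extensions=".pdf,.doc,.txt"
                 recurse="true">
    </cflock>
</cfthread>
<!--- The request thread continues here at once; the user's form
      submission completes while indexing runs in the background. --->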

Similar Messages

  • Rebuild index vs Analyze index

    Hi All,
    I am really confused about rebuilding an index versus analyzing an index.
    Could anyone please help me understand the difference between them,
    and how to perform an analyze and a rebuild of indexes on both Oracle 9i and 10g databases?
    Thanks a lot

    CKPT wrote:
    You can see the posts of experts like Jonathan on this.
    > I am really confused about rebuilding an index versus analyzing an index.
    Tell us why you are confused. Is it about why we need to analyze before rebuilding an index? When an index is analyzed, the whole statistics of the index are gathered; then you can check the height of the index and, according to that height, decide whether the index really needs to be rebuilt or not. Let's see further posts from experts if this is not clear. Thanks.
    OK, so you determine the height of an index is (say) 4. What then? If you decide to rebuild the index and the index remains at a height of 4, what now? Was it really worth doing, and do you rebuild it again, as the index height is still 4 and still within your index rebuild criteria? At what point do you decide that rebuilding the index just because it has a height of 4 is a total waste of time in this case?
    OK, so you determine the index only has a height of (say) 3. Does that mean you don't rebuild it? But what if, by rebuilding, the index reduces to a height of just 1? Perhaps not rebuilding the index, even though it has a height of just 3 and doesn't currently meet your index rebuild criteria, is totally the wrong thing to do, and a rebuild would result in a significantly leaner and more efficient index structure.
    So what if it's pointless rebuilding an index with a height of 4, but another index with a height of 3 is a perfect candidate to be rebuilt?
    Perhaps knowing just the height of an index leaves one totally clueless after all as to whether the index might benefit from a rebuild ...
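    For reference, the height check and rebuild being debated look like this (the index name is illustrative); ANALYZE ... VALIDATE STRUCTURE populates the one-row INDEX_STATS view for the index just validated:

    -- Gather structural statistics for one index (note: this locks the table while it runs)
    ANALYZE INDEX emp_name_idx VALIDATE STRUCTURE;

    -- HEIGHT is the figure discussed above; DEL_LF_ROWS shows deleted leaf entries
    SELECT height, lf_rows, del_lf_rows
      FROM index_stats;

    -- The rebuild itself
    ALTER INDEX emp_name_idx REBUILD;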
    Cheers
    Richard Foote
    http://richardfoote.wordpress.com/

  • Slow Indexing

    Hi all,
    I have to index a repository of 20,000 documents. It took 5 hours to index 1,000 documents; if it continues like this, it will take 100 hours to complete the indexing.
    What could be the reason for this poor performance? We have the portal and TREX on different servers.
    In the portal server's trace file I can see the following message:
    #1.5 #001321CCE41F00840000004F000022F80004595C881A43B2#1224153862312#com.sap.portal.prt.runtime#sap.com/irj#com.sap.portal.prt.runtime#index_service#189##n/a##7eaeefb09b6c11ddbc3d001321cce41f#SAPEngine_Application_Threadimpl:3_37##0#0#Error##Java###04:14_16/10/08_0315_21412850
    EXCEPTION
    #1#com.sap.engine.services.servlets_jsp.server.exceptions.WebIllegalStateException: The stream has already been taken by method getOutputStream().
    In the TREX admin's trace I can see several messages like:
    5752 2008-10-16 19:23:47.312 e preprocessor Preprocessor.cpp(00941) : HTTP-GET failed for URL http:// <file name>
    with Errorcode -5 , but HTTP-HEAD worked, trying again
    5752 2008-10-16 19:23:47.421 e HTTPData Preprocessor.cpp(04944) : HTTPGET: Stop retries after 5 rounds, skipping
    5752 2008-10-16 19:23:47.421 e preprocessor Preprocessor.cpp(00951) : HTTPHEAD failed for URL http:// : <file name>
    Errorcode -5 , Message Reader::readHeaderSkip100 failed, url=http://<file name>
    The TREX server has 16GB RAM.
    What can be done to improve the performance?
    Thanks and Regards,
    Shyam.

    Hi Shyam,
    Not sure if it helps, but the guide below makes recommendations for configuring Search and Classification (using TREX 6.1) for efficient indexing. It covers the following topics: fast initial indexing of large data sets, fast updating of indexes, and fast index replication in distributed TREX systems.
    1) How to Configure TREX 6.1 for Efficient Indexing  
    https://www.sdn.sap.com/irj/sdn/howtoguides?rid=/library/uuid/1545e1bf-0d01-0010-a5ab-f80e574423bf
    Hope that helps.
    Ray

  • Question for Database EXPERTS about implementing an Embedded Database

    One of my current projects is to implement an embedded pure Java database with a small subset of SQL. Since I estimate it will take me at least 400 man-hours (based on my initial, unfinished attempt) to build something robust and usable, I thought I'd ask my fellow Java folks for suggestions. The SQL and parsing are no problem, but the best way to implement the record structure is. My first approach would be to represent each table as a separate file (file defined as a standard OS file). Each record would have a fixed byte length, say 256.
    A separate definition file would contain the column names, types, and maximum byte length for each column. So the data file would be a repetitive list of 256-byte rows; some arbitrary byte value could be used as a 'filler' for the part of a row not being used. When a row is deleted, the entire row is merely overwritten with the filler. The next inserted record would then take its place.
    Numeric types and small strings would be written directly to the data file at their appropriate byte offset from the start of their respective byte row position.
    Complex or large data types like multimedia and memos would merely store a filename string pointing to another file. Things like security and encryption could (hopefully) be added on later. My main concerns are how to implement an efficient indexing algorithm and how to minimize the space required for each table, while maintaining simplicity.
    Furthermore, I currently plan to use only the most basic and common classes to implement this (the Java I/O API, and of course some of the SQL API to implement JDBC). Any advice on other APIs that may benefit this? Any red flags come up yet?

    > One of my current projects is to implement an embedded pure Java database with a small subset of SQL. [ ... ] fellow Java folks for suggestions. The SQL and parsing are no problem, but the best way to implement the record structure is.
    I once (1981) did it this way: a database file consists of consecutive pages, all of equal size; say, 4KB per page. Every page consists of consecutive cells; every cell can store a short, so 2K cells can be stored in a page.
    Cells in a page serve two purposes. From the bottom to the top, the cells represent offsets in the page where the records are stored and from the top to the bottom the cells are used for the actual data. The first cell of the page contains the offset to the first free cell in the page; the second cell contains the offset to the first occupied (data) cell in the page. Any record offset cell containing zero (0) is a free record offset cell.
    This scenario implies that the maximum record length is 4KB - 4*3 bytes.
    A record takes up one offset cell plus as many cells as are needed for the actual data, preceded by a cell containing the length of the record. The offset cell contains the cell number where the record starts (the length cell).
    Records are addressed by their page number and their offset cell number [PR]. The R number points to the cell containing the actual offset of the record. The [PR] number doesn't change during the lifetime of the record (this is important).
    Inserting a record is easy: find a page with enough room and store the record in that page. If no page is available, add another page to the database file. Deleting a record is simple too: delete the record, zero out the offset cell, and compact the data part of the page. Updating a record is a bit more complicated. If the record size shrinks, stuff is easy. If the record size grows and it doesn't fit in its page anymore, another page has to be found. The old data is removed and a small 'indirection record' is inserted instead. This new record contains the actual [PR] number where the new (larger) record is stored. The original R cell has a few more bits available: 2K different numbers take up 11 bits, and the cell itself can store 16 bits, so one bit can be used to indicate that it's not pointing to an actual data record, but that its offset points to an 'indirection' record instead. Some careful fiddling takes care that subsequent updates of a record keep this indirection stuff limited to just one indirection (this is true, believe me).
    Pages are loaded in core, and swapped out whenever necessary. Every loaded page has a page number and a 'dirty bit' attached to it. Whenever this bit is set, the page has to be written back to disk if it needs to be flushed from memory (due to memory exhaustion etc.) otherwise, the memory can simply be released. A simple LRU (Least Recently Used) list does wonders here.
    B-tree records can be stored as ordinary records too; this implies that the indexes of the tables (a bunch of records) are ordinary records too. Note that these spare 5 bits (one taken indicating an 'indirect' record) can do wonders here. As a matter of fact, a bit indicating that its cell is pointing to a B-tree node came in very handy. And still three bits free for other miracles! ;-)
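    A minimal Java sketch of the cell addressing described above (the class and the length-in-bytes convention are my assumptions, not the actual 1981 code):

    import java.nio.ByteBuffer;

    // One 4KB page of 2K two-byte cells. Cell 0 holds the offset of the first
    // free cell, cell 1 the offset of the first occupied data cell; cells 2..
    // are record offset cells, where 0 means "free offset cell".
    final class Page {
        static final int PAGE_SIZE = 4096;
        private final ByteBuffer buf = ByteBuffer.allocate(PAGE_SIZE);
        boolean dirty;                                 // set on writes; checked on eviction

        short getCell(int n)          { return buf.getShort(n * 2); }
        void  putCell(int n, short v) { buf.putShort(n * 2, v); dirty = true; }

        // Resolve a record by its stable offset-cell number R within this page.
        byte[] read(int r) {
            short start = getCell(r);                  // cell number of the length cell
            if (start == 0) return null;               // zeroed offset cell = deleted
            short len = getCell(start);                // record length (here: in bytes)
            byte[] data = new byte[len];
            buf.position((start + 1) * 2);             // data cells follow the length cell
            buf.get(data, 0, len);
            return data;
        }
    }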
    I better quit before I turn another message into a novel again.
    kind regards,
    Jos

  • Looking for the latest Performance documents

    Hello,
    Lately we installed SP14.
    I am looking for SAP's latest documents on performance
    issues such as tuning, fine tuning, troubleshooting guides, etc.
    My current documents are relevant to SP2 and I would like to read the latest ones on this subject.
    Can someone please post the relevant links here? Is there a central place where I can download these documents?
    Roy.

    Hi Roy,
    > Lately we installed SP14
    Congrats
    > Can someone please attach here the relevant links
    The tuning guides are to be found under http://service.sap.com/nw04 -- How-to Guides -- Portal, KM and Collaboration; for example:
    TREX: How To Configure Efficient Indexing
    EP: Finetuning Performance of Portal Platform
    KM: Tuning the Performance of Knowledge Management
    Hope it helps
    Detlev

  • Poor performance of Report Writer reports (Special Ledger Library)

    Greetings - We are running into problems with poor performance of reports that are written with the SAP Report Writer. The problem appears to be caused when SAP is using the primary-key index in our Special Purpose ledger (where the reports are generated). The index contains object fields that cannot be added to the report library (COBJNR, SOBJNR, ROBJNR). We have created alternate indices, but they are not being picked up with the Report Writer reports.
    Are there any configurable or technical settings that we can work with in order to force the use of a specific index for a report? It seems logical that SAP would find the most efficient index to use, but with the reports that we are looking at, this does not appear to be the case.
    Any help that can be offered will be greatly appreciated... We are currently using version 4.6C but are planning an upgrade to ECC 6.0 later this year.
    Thanks in advance -

    Arjun,
    Where, i.e. in which files, are these parameters set? We cannot find them all.
    Set them in the Tomcat Java properties and try again (you can tune the values below to suit your system memory):
    -XX:PermSize=256m
    -XX:MaxPermSize=256m
    -XX:NewSize=171m
    -XX:MaxNewSize=171m
    -XX:SurvivorRatio=2
    -XX:TargetSurvivorRatio=90
    -XX:+DisableExplicitGC
    -XX:+UseTLAB
    As a general update: it looks like we need to use the monitoring tools that are installed by default; we are now in the process of installing the database, etc.
    Cheers

  • Search Function

    Dear Sir,
    I need to implement a search function restricted to specific folders in EP6 SP14.
    How should we do this?
    Thanks
    Vimol

    Dear Vimol,
    Please refer to http://help.sap.com/saphelp_nw2004s/helpdata/en/40/83505303bd5616e10000000a114cbd/frameset.htm. This talks about the configuration of TREX and anything and everything you would like to know about TREX.
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/1545e1bf-0d01-0010-a5ab-f80e574423bf - How To… Configure TREX 6.1 for Efficient Indexing
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/f0d3be0e-0401-0010-b780-ff7e4e103ea0 - How To… enable semantic search/search for synonyms
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/d17d18d0-0a01-0010-0db1-f96a947e38a0 - How-to Guide: Searchable HTML Tags
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/77f6aa90-0201-0010-b681-e013540efb3b - How to… set up Web Repository and Crawling it for Indexing
    You may also refer to "TREX and its application" on SDN to learn about the different kinds of search.
    If you want to create your own search using the KM APIs, then refer to: How to write a Search Application using the KM Indexmanagement API for TREX.
    Regards,
    Sunil

  • TREX Documentation

    Hey people!
    Does anyone here have some TREX material? I really need a configuration/administration guide.
    I have searched for several days now, and all I can find are install guides and the efficient indexing and search docs.
    What I'm looking for is HOW to do it: making indexes in CRM, reindexing, setting schedules, what to index, and other useful docs.
    Or, if you have done this before, please help me out.
    I have installed TREX 7.0, Content Server, and CRM 4.0, and the communication works fine. I can search for docs on CRM, from TREX to CRM, and from the Portal to CRM. But I'm experiencing problems with inconsistent search results, and the index isn't updated when new documents are uploaded.
    Best Regards
    Kristoffer Engh

    Thank you Amit, that would be great!
    My email is: [email protected]
    I'm looking for how I can administer indexes in TREX (create, edit, reindex in a non-portal environment).
    But I'm glad for all the documentation I can get.
    I have the following already, but it doesn't contain what I'm looking for:
    How To Configure an alternative document access URL for TREX.pdf
    How To Configure Efficient Indexing.pdf
    How to enable semantic search.pdf
    TREX60SP1_InstNonPort.pdf
    TREX61_DistributedSystems.pdf
    TREX61_SP17_Install_Guide.pdf
    TREX Recommendations.pdf
    TREX Cluster Configuration and Monitoring.pdf
    Best Regards
    Kristoffer Engh

  • IRM LDAP Read timeout

    Good day everyone!
    We are facing a problem where the LDAP read procedure takes too long.
    Environment: IRM + OVD as the user repository (many AD domains behind it).
    When we search for users in IRM we get an exception:
    <05.07.2011 14:08:36 MSD> <Warning> <oracle.irm.web> <IRM-03012> <Exception while retrieving user details.
    oracle.irm.jps.exception.JpsIdentityStoreFailureException: oracle.security.idm.OperationFailureException: javax.naming.NamingException: [LDAP: error code 1 - LDAP Error 1 : LDAP response read timed out, timeout used:15000ms.]; remaining name 'dc=ovd'
         at oracle.irm.jps.util.Store.searchIdentities(Store.java:404)
         at oracle.irm.web.services.IrmCommonUtil.searchByUserOrGroup(IrmCommonUtil.java:131)
         at oracle.irm.web.backing.common.CommonUserSearch.searchAction(CommonUserSearch.java:169)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at com.sun.el.parser.AstValue.invoke(Unknown Source)
         at com.sun.el.MethodExpressionImpl.invoke(Unknown Source)
         at org.apache.myfaces.trinidad.component.UIXComponentBase.broadcastToMethodExpression(UIXComponentBase.java:1300)
         at oracle.adf.view.rich.component.UIXQuery.broadcast(UIXQuery.java:116)
         at oracle.adf.view.rich.component.fragment.UIXRegion.broadcast(UIXRegion.java:148)
         at oracle.adf.view.rich.component.fragment.UIXRegion.broadcast(UIXRegion.java:148)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent$1.run(ContextSwitchingComponent.java:92)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent._processPhase(ContextSwitchingComponent.java:361)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent.broadcast(ContextSwitchingComponent.java:96)
         at oracle.adf.view.rich.component.fragment.UIXInclude.broadcast(UIXInclude.java:102)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent$1.run(ContextSwitchingComponent.java:92)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent._processPhase(ContextSwitchingComponent.java:361)
         at oracle.adf.view.rich.component.fragment.ContextSwitchingComponent.broadcast(ContextSwitchingComponent.java:96)
         at oracle.adf.view.rich.component.fragment.UIXInclude.broadcast(UIXInclude.java:96)
         at oracle.adfinternal.view.faces.lifecycle.LifecycleImpl.broadcastEvents(LifecycleImpl.java:902)
         at oracle.adfinternal.view.faces.lifecycle.LifecycleImpl._executePhase(LifecycleImpl.java:313)
         at oracle.adfinternal.view.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:186)
         at javax.faces.webapp.FacesServlet.service(FacesServlet.java:265)
         at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
         at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
         at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:300)
         at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:26)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.adfinternal.view.faces.webapp.rich.RegistrationFilter.doFilter(RegistrationFilter.java:106)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl$FilterListChain.doFilter(TrinidadFilterImpl.java:446)
         at oracle.adfinternal.view.faces.activedata.AdsFilter.doFilter(AdsFilter.java:60)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl$FilterListChain.doFilter(TrinidadFilterImpl.java:446)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl._doFilterImpl(TrinidadFilterImpl.java:271)
         at org.apache.myfaces.trinidadinternal.webapp.TrinidadFilterImpl.doFilter(TrinidadFilterImpl.java:177)
         at org.apache.myfaces.trinidad.webapp.TrinidadFilter.doFilter(TrinidadFilter.java:92)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.help.web.rich.OHWFilter.doFilter(Unknown Source)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.adf.model.servlet.ADFBindingFilter.doFilter(ADFBindingFilter.java:205)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.security.jps.ee.http.JpsAbsFilter$1.run(JpsAbsFilter.java:111)
         at java.security.AccessController.doPrivileged(Native Method)
         at oracle.security.jps.util.JpsSubject.doAsPrivileged(JpsSubject.java:313)
         at oracle.security.jps.ee.util.JpsPlatformUtil.runJaasMode(JpsPlatformUtil.java:413)
         at oracle.security.jps.ee.http.JpsAbsFilter.runJaasMode(JpsAbsFilter.java:94)
         at oracle.security.jps.ee.http.JpsAbsFilter.doFilter(JpsAbsFilter.java:161)
         at oracle.security.jps.ee.http.JpsFilter.doFilter(JpsFilter.java:71)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at oracle.dms.servlet.DMSServletFilter.doFilter(DMSServletFilter.java:136)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at weblogic.servlet.internal.RequestEventsFilter.doFilter(RequestEventsFilter.java:27)
         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)
         at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3715)
         at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3681)
         at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
         at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
         at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2277)
         at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2183)
         at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1454)
         at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
         at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
    Is there any way to increase the LDAP read timeout? We tried changing the OVD Authenticator properties, but it seems that IRM doesn't look at them. =(

    Hi,
    This problem is caused by the type of search IRM makes when searching for users. By default, if you specify 'ran' as a search parameter, we search for *ran*, and so Frank would be returned. The problem is that very large LDAP repositories can take a long time to return when performing this kind of search unless they're set up appropriately. So you have two options:
    1. Implement medial (also known as tuple) indexing in your LDAP. This builds a larger, less efficient index but gives much faster responses on substring searches.
    2. As of 11.1.1.3 we have a patch (patch number 10354979) which allows you to restrict the type of search to 'begins', 'ends', or 'exact match'. For example, you can do prefix searching, so 'ran' would not find 'Frank' but would find 'Randolph'. If you are running 11.1.1.3, you need to apply the patch. If you are running a later version, you just need to switch on the parameter defined in the patch readme. You cannot apply the patch to 11.1.1.2.1; in that case you have to upgrade to the latest version.
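    To illustrate the two search styles, here is a hypothetical JNDI sketch (the host, base DN, and attribute are made up):

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.NamingException;
    import javax.naming.directory.*;

    public class SearchStyles {
        public static void main(String[] args) throws NamingException {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://ovd.example.com:389");
            DirContext ctx = new InitialDirContext(env);
            SearchControls sc = new SearchControls();
            sc.setSearchScope(SearchControls.SUBTREE_SCOPE);

            // Substring (medial) search: matches Frank AND Randolph, but is
            // slow on large directories without tuple indexing.
            NamingEnumeration<SearchResult> medial = ctx.search("dc=ovd", "(cn=*ran*)", sc);

            // Prefix search: matches Randolph only; cheap with ordinary indexes.
            NamingEnumeration<SearchResult> prefix = ctx.search("dc=ovd", "(cn=ran*)", sc);

            ctx.close();
        }
    }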
    You should refer to KM note 1272153.1 which tells you more about this.
    Regards,
    Frank.

  • Vectors VS Arrays VS LinkedLists

    Hi
    I am basically looking for some info on the advantages or disadvantages of using any of the above. I would like to know how one justifies the use of one over the other.
    I would be grateful for any pointers or resources you could point me to.
    Cheers

    Use arrays if you know the size you will need, you won't be inserting or removing elements and you need highly efficient indexed access to the elements.
    Use ArrayList if you don't know how large to make the container, you need efficient indexed access to the elements, you don't need to have efficient insertion or removal of elements from the interior of the container and you don't need thread safety on the individual operations (the most common case).
    Use Vector or a synchronized ArrayList if your needs are like those for ArrayList but you need thread safety on the individual operations.
    Use LinkedList if you don't need efficient indexed access but you do need efficient insertion and removal of elements from the interior of the container and you don't need thread safety of individual operations (the usual case).
    Use the synchronized wrapper of LinkedList if your needs are like those that lead you to LinkedList but you need thread safe individual operations.
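    In code, those choices look like this (a quick sketch; the element type and sizes are arbitrary):

    import java.util.*;

    public class ContainerChoices {
        public static void main(String[] args) {
            // Known, fixed size + highly efficient indexed access: array
            String[] names = new String[100];
            // Unknown size, efficient indexed access, no per-operation thread safety: ArrayList
            List<String> list = new ArrayList<>();
            // Efficient interior insertion/removal, indexed access unimportant: LinkedList
            List<String> linked = new LinkedList<>();
            // Same needs, but with thread-safe individual operations: synchronized wrappers
            List<String> syncList   = Collections.synchronizedList(new ArrayList<>());
            List<String> syncLinked = Collections.synchronizedList(new LinkedList<>());
            System.out.println(names.length + " " + list.size() + " " + linked.size()
                    + " " + syncList.size() + " " + syncLinked.size());
        }
    }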
    I am sure this is documented better someplace else, but I haven't taken the time to check.
    Chuck

  • Problems with Searching documents!

    I am having a problem with the ability to search documents in the portal; every time I run a search I get the message "no results found".
    I have done the following:
    1. Created a file system repository.
    2. Created a crawler profile.
    3. Created an index and assigned the crawler to it.
    4. Waited until it had finished and said there were 13 items.
    However, when I search, there are none!
    What am I doing wrong?

    Hi Phil,
    1. Check the TREX server status in Sys Admin --> Monitoring --> KM --> TREX Monitor.
    2. Check for the documents in the folder on which you are creating the index.
    3. After you create an index, modify the parameters in Sys Admin --> Monitoring --> KM --> TREX Monitor --> Edit Queue Parameters.
    4. Check the queue status in Sys Admin --> Monitoring --> KM --> TREX Monitor --> Display Queues. Sometimes it helps to flush the index.
    5. If it is still not working, re-index the index, or delete and recreate it.
    For efficient indexing see the doc
    https://websmp102.sap-ag.de/~sapidb/011000358700000378692005E.PDF
    Hope this helps,
    thanks,
    Praveen
    PS: Don't forget to reward points.

  • Indexing for Load efficiency or Query Efficiency

    Hello,
    I've just started querying our warehouse, which is currently being developed. Unfortunately, although the tables and indexes were spec'd at design time for query performance, the ETL team loading the tables has set up indexes to improve load efficiency.
    Is there any way of placing two sets of indexes on a table, one for the load process and one for querying? Or should the queries be done against materialized views?
    Any help would be much appreciated.
    Regards
    James

    Just to clarify: there may be harm in having multiple sets of indexes, since index maintenance may consume a significant fraction of the ETL time. If the goal is to optimize the load, you may want to drop and rebuild the indexes that exist for query performance (assuming that queries and loads run during separate windows). Depending on volumes and load characteristics, I could envision scenarios where you would drop the indexes used by consumers, build the indexes for ETL, run the ETL, drop the ETL indexes, and rebuild the consumer indexes, but that would be rather rare.
    Materialized views can certainly come into play to satisfy warehouse queries, particularly for aggregate data. It wouldn't make much sense to drop indexes on base tables and create them on materialized views that were exact copies of those base tables, since that wouldn't save you any processing time. If your MVs have significantly fewer rows than the base tables, though, indexes on the aggregates may be significantly cheaper to maintain.
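    A sketch of that drop/rebuild staging around a load window (the object names are made up, and this assumes the loads run in an exclusive window):

    -- Before the load: take the consumer (query-time) index out of play
    ALTER INDEX sales_by_cust_idx UNUSABLE;
    -- Sessions touching the table in the meantime may need:
    --   ALTER SESSION SET skip_unusable_indexes = TRUE;

    -- ... run the ETL load here, with its own load-friendly indexes in place ...

    -- After the load: restore the consumer index for query time
    ALTER INDEX sales_by_cust_idx REBUILD;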
    Justin
    Distributed Database Consulting, Inc.
    http://www.ddbcinc.com/askDDBC

  • Can I refactor this query to use an index more efficiently?

    I have a members table with fields such as id, last name, first name, address, join date, etc.
    I have a unique index defined on (last_name, join_date, id).
    This query will use the index for a range scan, with no sort required, since the index entries are already in order for that range ('Smith'):
    SELECT members.*
      FROM members
     WHERE last_name = 'Smith'
     ORDER BY join_date, id;
    Is there any way I can get something like the following to use the index (with no sort) as well?
    SELECT members.*
      FROM members
     WHERE last_name LIKE 'S%'
     ORDER BY join_date, id;
    I understand the difficulty: even if it does a range scan on every last name 'S%' (assuming it can?), the rows are not necessarily in order. Case in point:
    Last_Name    JoinDate
    Smith        2/5/2010
    Smuckers     1/10/2010
    An index range scan of 'S%' would return them in the above order, which is not ordered by join_date.
    So is there any way I can refactor this (query or index) such that the index can be range scanned (using LIKE 'x%') and return rows in the correct order without performing a sort? Or is that simply not possible?

    xaeryan wrote:
    > So is there any way I can refactor this (query or index) such that the index can be range scanned (using LIKE 'x%') and return rows in the correct order without performing a sort? Or is that simply not possible?
    Come on. Index column order does matter. "LIKE 'x%'" actually means a full scan of that index range: the db engine accesses the contiguous index entries and then uses the ROWID values in the index to retrieve the table rows.
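    One possible workaround, offered as a sketch rather than a guarantee (the index name is made up, and the optimizer may still prefer a sort depending on statistics): lead a second index with the ORDER BY columns, so an in-order index scan returns rows pre-sorted while the last_name filter is applied along the way:

    CREATE INDEX members_jd_id_ln ON members (join_date, id, last_name);

    SELECT /*+ INDEX(members members_jd_id_ln) */ members.*
      FROM members
     WHERE last_name LIKE 'S%'
     ORDER BY join_date, id;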

  • Index efficiency

    Hi,
    I have 3 tables A, B and C.
    Table A has A1 as 1 column.
    Table B has B1 and A1 (referenced from Table A).
    Table C has C1 and A1 (referenced from Table A).
    All 3 tables are Indexed properly for Primary Key and Foreign Key.
    If I want to join B and C, which of the queries below will perform better?
    Query 1:
    SELECT C.C1
      FROM B, A, C
     WHERE B.A1 = A.A1
       AND A.A1 = C.A1;
    Query 2:
    SELECT C.C1
      FROM B, C
     WHERE B.A1 = C.A1;

    Write the query in whichever way makes most sense.
    In later versions, the optimizer is able to do table and join elimination.
    For example:
    SQL> create table a
      2  (a1 number not null primary key);
    Table created.
    SQL>
    SQL> create table b
      2  (b1 number not null primary key
      3  ,a1 number references a (a1));
    Table created.
    SQL>
    SQL> create table c
      2  (c1 number not null primary key
      3  ,a1 number references a (a1));
    Table created.
    SQL> explain plan for
      2  select c.c1
      3  from   a,b,c
      4  where  b.a1 = a.a1
      5  and    c.a1 = b.a1
      6  and    c.a1 = a.a1;
    Explained.
    SQL> select * from table(dbms_xplan.display);
    PLAN_TABLE_OUTPUT
    Plan hash value: 3136813453
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT   |      |     1 |    39 |     5  (20)| 00:00:01 |
    |*  1 |  HASH JOIN         |      |     1 |    39 |     5  (20)| 00:00:01 |
    |*  2 |   TABLE ACCESS FULL| B    |     1 |    13 |     2   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS FULL| C    |     1 |    26 |     2   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
       1 - access("C"."A1"="B"."A1")
       2 - filter("B"."A1" IS NOT NULL)
    Note
       - dynamic sampling used for this statement (level=4)
    20 rows selected.
    SQL>

  • How does Oracle decide whether to use an index or a full scan? (statistics)

    Hi guys,
    Let's say I have an index on a column.
    The table and index statistics have been gathered (without histograms).
    Now let's say I run: select * from table where a = 5;
    Oracle will perform a full scan.
    But from which statistics can it tell that most of the column's values are indeed 5 (histograms not being used)?
    After analyzing, we get the below:
    Table Statistics :
    (NUM_ROWS)
    (BLOCKS)
    (EMPTY_BLOCKS)
    (AVG_SPACE)
    (CHAIN_COUNT)
    (AVG_ROW_LEN)
    Index Statistics :
    (BLEVEL)
    (LEAF_BLOCKS)
    (DISTINCT_KEYS)
    (AVG_LEAF_BLOCKS_PER_KEY)
    (AVG_DATA_BLOCKS_PER_KEY)
    (CLUSTERING_FACTOR)
    thanks
    Index Column (A)
    ======
    1
    1
    2
    2
    5
    5
    5
    5
    5
    5

    I had prepared some explanation and had not noticed that the topic had been marked as answered.
    That sentence of mine is not completely true.
    A column "without histograms" means that the column has only one bucket. More correctly: even without histograms there are data in dba_tab_histograms which we can consider as one bucket for the whole column. In fact these data are retrieved from hist_head$, not from histgrm$ as with usual buckets.
    Technically, there are no buckets at all without gathered histograms.
    Let's create a table with skewed data distribution.
    SQL> create table t as
      2  select least(rownum,3) as val, '*' as pad
      3    from dual
      4  connect by level <= 1000000;
    Table created
    SQL> create index idx on t(val);
    Index created
    SQL> select val, count(*)
      2    from t
      3   group by val;
           VAL   COUNT(*)
             1          1
             2          1
             3     999998
    So, we have a table with a very skewed data distribution.
    Let's gather statistics without histograms.
    SQL> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100, method_opt => 'for all columns size 1', cascade => true);
    PL/SQL procedure successfully completed
    SQL> select blocks, num_rows  from dba_tab_statistics
      2   where table_name = 'T';
        BLOCKS   NUM_ROWS
          3106    1000000
    SQL> select blevel, leaf_blocks, clustering_factor
      2    from dba_ind_statistics t
      3   where table_name = 'T'
      4     and index_name = 'IDX';
        BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
             2        4017              3107
    SQL> select column_name,
      2         num_distinct,
      3         density,
      4         num_nulls,
      5         low_value,
      6         high_value
      7    from dba_tab_col_statistics
      8   where table_name = 'T'
      9     and column_name = 'VAL';
    COLUMN_NAME  NUM_DISTINCT    DENSITY  NUM_NULLS      LOW_VALUE      HIGH_VALUE
    VAL                     3 0,33333333          0           C102            C104
    So, Oracle assumes that values between 1 and 3 (raw C102 and C104) are distributed uniformly and that the density of the distribution is 0.33.
    Let's try to explain plan
    SQL> explain plan for
      2  select --+ no_cpu_costing
      3         *
      4    from t
      5   where val = 1
      6  ;
    Explained
    SQL> @plan
    | Id  | Operation         | Name | Rows  | Cost  |
    |   0 | SELECT STATEMENT  |      |   333K|   300 |
    |*  1 |  TABLE ACCESS FULL| T    |   333K|   300 |
    Predicate Information (identified by operation id):
       1 - filter("VAL"=1)
    Note
       - cpu costing is off (consider enabling it)
    Below is an excerpt from the 10053 trace:
    BASE STATISTICAL INFORMATION
    Table Stats::
      Table:  T  Alias:  T
        #Rows: 1000000  #Blks:  3106  AvgRowLen:  5.00
    Index Stats::
      Index: IDX  Col#: 1
        LVLS: 2  #LB: 4017  #DK: 3  LB/K: 1339.00  DB/K: 1035.00  CLUF: 3107.00
    SINGLE TABLE ACCESS PATH
      BEGIN Single Table Cardinality Estimation
      Column (#1): VAL(NUMBER)
        AvgLen: 3.00 NDV: 3 Nulls: 0 Density: 0.33333 Min: 1 Max: 3
      Table:  T  Alias: T
        Card: Original: 1000000  Rounded: 333333  Computed: 333333.33  Non Adjusted: 333333.33
      END   Single Table Cardinality Estimation
      Access Path: TableScan
        Cost:  300.00  Resp: 300.00  Degree: 0
          Cost_io: 300.00  Cost_cpu: 0
          Resp_io: 300.00  Resp_cpu: 0
      Access Path: index (AllEqRange)
        Index: IDX
        resc_io: 2377.00  resc_cpu: 0
        ix_sel: 0.33333  ix_sel_with_filters: 0.33333
        Cost: 2377.00  Resp: 2377.00  Degree: 1
      Best:: AccessPath: TableScan
             Cost: 300.00  Degree: 1  Resp: 300.00  Card: 333333.33  Bytes: 0
    The cost of the FTS here is 300 and the cost of the Index Range Scan is 2377.
    I have disabled cpu costing, so selectivity does not affect the cost of FTS.
    The cost of the Index Range Scan is calculated as
    blevel + (leaf_blocks * selectivity + clustering_factor * selectivity) = 2 + (4017*0.33333 + 3107*0.33333) = 2377.
    Oracle considers that it has to read 2 root/branch blocks of the index, 1339 leaf blocks of the index and 1036 blocks of the table.
    Pay attention that selectivity is the major component of the cost of the Index Range Scan.
    Let's try to gather histograms:
    SQL> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100, method_opt => 'for columns val size 3', cascade => true);
    PL/SQL procedure successfully completed
    If you look at dba_tab_histograms you will see the following:
    SQL> select endpoint_value,
      2         endpoint_number
      3    from dba_tab_histograms
      4   where table_name = 'T'
      5     and column_name = 'VAL'
      6  ;
    ENDPOINT_VALUE ENDPOINT_NUMBER
                 1               1
                 2               2
                 3         1000000
    ENDPOINT_VALUE is the column value (as a number, for any data type) and ENDPOINT_NUMBER is the cumulative number of rows.
    The number of rows for any ENDPOINT_VALUE = its ENDPOINT_NUMBER minus the ENDPOINT_NUMBER of the previous ENDPOINT_VALUE.
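    For example, the per-value row counts can be derived from the cumulative ENDPOINT_NUMBER with LAG():

    SELECT endpoint_value,
           endpoint_number
             - LAG(endpoint_number, 1, 0)
                 OVER (ORDER BY endpoint_number) AS num_rows
      FROM dba_tab_histograms
     WHERE table_name  = 'T'
       AND column_name = 'VAL'
     ORDER BY endpoint_value;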
    explain plan and 10053 trace of the same query:
    | Id  | Operation                   | Name | Rows  | Cost  |
    |   0 | SELECT STATEMENT            |      |     1 |     4 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |     4 |
    |*  2 |   INDEX RANGE SCAN          | IDX  |     1 |     3 |
    Predicate Information (identified by operation id):
       2 - access("VAL"=1)
    Note
       - cpu costing is off (consider enabling it)
    BASE STATISTICAL INFORMATION
    Table Stats::
      Table:  T  Alias:  T
        #Rows: 1000000  #Blks:  3106  AvgRowLen:  5.00
    Index Stats::
      Index: IDX  Col#: 1
        LVLS: 2  #LB: 4017  #DK: 3  LB/K: 1339.00  DB/K: 1035.00  CLUF: 3107.00
    SINGLE TABLE ACCESS PATH
      BEGIN Single Table Cardinality Estimation
      Column (#1): VAL(NUMBER)
        AvgLen: 3.00 NDV: 3 Nulls: 0 Density: 5.0000e-07 Min: 1 Max: 3
        Histogram: Freq  #Bkts: 3  UncompBkts: 1000000  EndPtVals: 3
      Table:  T  Alias: T
        Card: Original: 1000000  Rounded: 1  Computed: 1.00  Non Adjusted: 1.00
      END   Single Table Cardinality Estimation
      Access Path: TableScan
        Cost:  300.00  Resp: 300.00  Degree: 0
          Cost_io: 300.00  Cost_cpu: 0
          Resp_io: 300.00  Resp_cpu: 0
      Access Path: index (AllEqRange)
        Index: IDX
        resc_io: 4.00  resc_cpu: 0
        ix_sel: 1.0000e-06  ix_sel_with_filters: 1.0000e-06
        Cost: 4.00  Resp: 4.00  Degree: 1
      Best:: AccessPath: IndexRange  Index: IDX
             Cost: 4.00  Degree: 1  Resp: 4.00  Card: 1.00  Bytes: 0
    Pay attention to the selectivity: ix_sel = 1.0000e-06.
    Cost of the FTS is still the same = 300,
    but cost of the Index Range Scan is 4 now: 2 root/branch blocks + 1 leaf block + 1 table block.
    Thus, the conclusion: histograms allow the optimizer to calculate selectivity more accurately. The aim is to arrive at more efficient execution plans.
    Alexander Anokhin
    http://alexanderanokhin.wordpress.com/

Maybe you are looking for

  • Error 4002 every time I open iTunes
  • Applet socket communication
  • Mac has a bluish tint to it, just happened last week
  • Uninstalling Quicktime Preview for Windows
  • Ampersand problem in numbers