4 lookups against single large table

I need to do a 4-column lookup against a large table (1 million rows) that contains 4 different record types.  The first lookup will match on columns A, B, C, and D.  If no match is found, I try again with columns A, B, C, and '99' in column D.  If there is still no match, I try again with columns A, B, D, and '99' in column C.  Finally, if none of the above match, I use column A with '99' in B, C, and D.  I will retrieve 2 columns from the lookup table.
My thought is that breaking this sequence out into 4 different tables/lookups would be most efficient. The other option would be to write a script that handles this logic in a single transform with an in-memory table.  My concern is that the table would be too large to load into memory.
Any ideas/suggestions would be appreciated.  
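One way to fold all four fallback levels into a single set-based pass, rather than four separate lookups, is to rank the allowed match patterns and keep the best hit per source row. This is only a hedged sketch - the object names (source_rows, lookup_table, src_key, ret_col1, ret_col2) are hypothetical stand-ins and the SQL would need adapting to the actual schema and dialect:
-- Hypothetical names; ranks the four allowed patterns in the order described above
with ranked as (
    select s.src_key, l.ret_col1, l.ret_col2,
           row_number() over (
               partition by s.src_key
               order by case
                            when l.b = s.b  and l.c = s.c  and l.d = s.d  then 1
                            when l.b = s.b  and l.c = s.c  and l.d = '99' then 2
                            when l.b = s.b  and l.c = '99' and l.d = s.d  then 3
                            when l.b = '99' and l.c = '99' and l.d = '99' then 4
                        end) as match_rank
      from source_rows s
      join lookup_table l
        on l.a = s.a
       and (   (l.b = s.b  and l.c = s.c  and l.d = s.d)
            or (l.b = s.b  and l.c = s.c  and l.d = '99')
            or (l.b = s.b  and l.c = '99' and l.d = s.d)
            or (l.b = '99' and l.c = '99' and l.d = '99'))
)
select src_key, ret_col1, ret_col2
  from ranked
 where match_rank = 1;
Pushing the work to the database this way avoids having to hold the million-row table in memory inside the transform.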

Hi all, Jamie, glad you're looking at this post. I've got a spreadsheet containing about 26K rows and a warehouse fact table containing 10 million rows (after being filtered). I want to pull back only the fact table rows that match the ID field in the spreadsheet,
and thus extend the 26K rows in the spreadsheet with the fields from the fact table. Pretty basic join: ID = ID. To make it interesting/challenging, I'm using Excel 2010 32-bit; no issue with 26K rows, but there's no way I can even browse 10 million records without a timeout.
It seems like PQ can use SQL functions, so I was wondering if that might be an option to do the work on the server, i.e. fn_ReturnRow(@ID). I haven't seen an example of this being used anywhere, so perhaps you, Chris Webb, or one of the other PQ gurus could lend a hand?
This may also help the original poster with some more functions to return rows that would normally make PQ choke, hopefully with minimal M. And by the way, is there a way to use SQL stored procedures in PQ? Cheers
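If the fact table lives in SQL Server, one option is to wrap the filter in an inline table-valued function so the server only ever returns the matching rows. A hedged sketch, assuming T-SQL and hypothetical object names (dbo.WarehouseFact and its columns); whether Power Query will fold to it is a separate question:
-- Hypothetical sketch: returns only the fact rows for one ID
CREATE FUNCTION dbo.fn_ReturnRow (@ID int)
RETURNS TABLE
AS
RETURN
(
    SELECT f.ID, f.Measure1, f.Measure2
      FROM dbo.WarehouseFact AS f
     WHERE f.ID = @ID
);
Alternatively, the 26K IDs can be loaded into a staging table on the server and joined there, so only the matched rows ever travel back to Excel.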
Geoff Fane

Similar Messages

  • Lookup from a large table?

    Hi, All
    Very new to APEX. If I have a very large table of securities and want the user to be able to look up a security in a number of different ways (name, CUSIP or ticker), is there a way to do this in APEX? When they are faced with entering a "security id", they need a way to retrieve the correct one.
    Thanks

    Hi
    Yes, this is very simple.
    Essentially, you create a number of items where they can enter the details.
    In your case, let's call them P1_SEC_ID, P1_NAME, P1_CUSIP and P1_TICKER.
    Next create a button called GO (or whatever you want) that branches back to the same page.
    Now create a report region with a source something like this...
    SELECT *
    FROM my_big_table
    WHERE INSTR(UPPER(sec_id),UPPER(:P1_SEC_ID)) > 0
    OR    INSTR(UPPER(name),UPPER(:P1_NAME)) > 0
    OR    INSTR(UPPER(cusip),UPPER(:P1_CUSIP)) > 0
    OR    INSTR(UPPER(ticker),UPPER(:P1_TICKER)) > 0
    Next, make the report region conditional on the request value being 'GO' and this should work for you.
    Cheers
    Ben

  • Is there a limit on the number of Key Lookups against a table in a mapping

    I'm using OWB 11.1 and have a mapping with 15 Key Lookups against the one table. When I validate the mapping it objects to a Key Lookup not being connected, even though in the mapping all the Key Lookups have been renamed to their relevant fields.
    Is there a limit to the number of key lookups against the one table?

    Thanks for the replies.
    I'm getting a validation error, so can't run the mapping. Error VLD-1108: Operator Key_LOOKUP is not properly connected.
    The issue is solved: after checking each of the Key_Lookups, one was not connected to an output. The error occurs if an output field isn't connected to a table or other operator.

  • Partitioning - query on large table v. query accessing several partitions

    Hi,
    We are using partitioning on a large fact table. However, in deciding on a partitioning strategy, we are looking for advice regarding queries which have to access several partitions versus queries against a large unpartitioned table.
    Which is quicker - a query which accesses a large table, or a query which accesses several partitions to return its results?
    We need to partition due to size/admin etc., but want to make sure queries which need to access more than one partition are not significantly slower than ones which access a large table by comparison.
    Queries which access just one partition are fine, but some queries have to access several partitions.
    Many Thanks

    Here are your choices stated another way. Is it better to:
    1. Get one week's data by reading one month's data and throwing away 75% of it (assumes partitioning by month)
    2. Get one week's data by reading three weeks of it and throwing away parts of two weeks (assumes partitioning by week)
    3. Get one week's data by reading seven daily partitions and not having to throw away any of it (assumes daily partitioning)
    I have partitioned as frequently as every 5-15 minutes (banking and telecom) and have yet to find a situation where partitions larger than the minimum date-range for the majority of queries makes sense.
    Anyone can insert data into a table ... an extra millisecond per insert is generally irrelevant. What you want to do is optimize reading the data where that extra millisecond per row, over millions of rows, adds up to measurable time.
    But this is Oracle, so the best answer to your questions is to recommend you not take anyone's advice on this, but rather run some tests with real data, in real-world volumes, with real-world DML and queries.
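    To illustrate the date-range point, a hedged sketch of daily partitioning on Oracle 11g or later (table and column names are hypothetical), where interval partitioning creates one partition per day automatically:
    -- Hypothetical names; one partition per day, created automatically as data arrives
    create table sales_fact (
        sale_date  date not null,
        store_id   number,
        amount     number
    )
    partition by range (sale_date)
    interval (numtodsinterval(1, 'DAY'))
    (partition p_initial values less than (date '2024-01-01'));
    -- A one-week query then prunes to exactly seven daily partitions
    select sum(amount)
      from sales_fact
     where sale_date >= date '2024-03-04'
       and sale_date <  date '2024-03-11';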

  • How to efficiently select random rows from a large table ?

    Hello,
    The following code will select 5 rows out of a random set of rows from the emp (employee) table
    select *
      from (select ename, job
              from emp
             order by dbms_random.value())
     where rownum <= 5;
    My concern is that the inner select will cause a table scan in order to assign a random value to each row. This code, when used against a large table, can be a performance problem.
    Is there an efficient way of selecting random rows from a table without having to do a table scan ? (I am new to Oracle, therefore it is possible that I am missing a very simple way to perform this task.)
    thank you for your help,
    John.

    Have a look at the SAMPLE clause of the select statement. The number in parentheses is a percentage of the table.
    SQL> create table t as select * from dba_objects;
    Table created.
    SQL> explain plan for select * from t sample (1);
    Explained.
    SQL> @xp
    PLAN_TABLE_OUTPUT
    Plan hash value: 2767392432
    ---------------------------------------------------------------------------
    | Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |      |   725 | 70325 |   289   (1)| 00:00:04 |
    |   1 |  TABLE ACCESS SAMPLE| T    |   725 | 70325 |   289   (1)| 00:00:04 |
    ---------------------------------------------------------------------------
    8 rows selected.
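    If an exact number of rows is wanted, SAMPLE can be combined with a ROWNUM filter - a hedged sketch, bearing in mind that SAMPLE is probabilistic and may return fewer rows than asked for on small tables:
    select *
      from t sample (1)
     where rownum <= 5;
    This avoids sorting the whole table by dbms_random.value(), at the cost of the sample not being perfectly uniform.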

  • Block creation of multiple prod. order against single sales order in CO08

    Hi All,
    We have a requirement to block the creation of multiple production orders against a single sales order in CO08 if a production order has already been created for that item.
    How to achieve this?
    SmanS

    Hi,
    you need to check whether there is a user exit available which checks the order header table for the said sales order, verifies whether any production order already exists for it, and raises a custom error when the production order is saved.
    Pls revert if you need any further information.

  • Is it ok if we have 42 million records in a single fact table?

    Hello,
    We have three outstanding fact tables, and we need to add one more fact type. We were wondering whether we should create two different fact tables, or put the values into one of the existing fact tables which is similar - but the record count goes up to 42 million if we add them. So my question is: should we have a single fact table with all the records, or break it down into two different ones? Thanks!

    I am not sure what an "outstanding fact" or a "fact type" is. A 42m-row fact table doesn't necessarily indicate you are doing something wrong, although it does sound odd. I would expect most facts to be small, as they should hold aggregated measures to speed up reports. In some cases you may want to drill down to the detailed transaction level, in which case you may find these large facts. But care should be taken not to allow users to query this fact without using the "transaction ID", which obviously should be indexed and should guarantee that queries will be quick.
    Guessing from your post (as it is not clear or descriptive enough), it would seem that you are adding a new dimension to your fact, and that will cause the fact to increase its row count to 42m. That probably means that you are changing the granularity of the fact. That may or may not be correct, depending on your model.
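    As an aside on the "aggregated measures" point, a pre-aggregated fact is usually just a summarised copy of the detailed one - a hedged sketch with hypothetical table and column names:
    -- Hypothetical names: monthly summary built from a detailed transaction-level fact
    create table sales_fact_monthly as
    select d.year_month,
           f.store_id,
           sum(f.amount) as total_amount,
           count(*)      as txn_count
      from sales_fact_detail f
      join date_dim d
        on d.date_key = f.date_key
     group by d.year_month, f.store_id;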

  • OutOfMemory error when trying to display large tables

    We use JDeveloper 10.1.3. Our project uses ADF Faces + EJB3 Session Facade + TopLink.
    We have a large table (over 100K rows) which we try to show to the user via an ADF Read-only Table. We build the page by dragging the facade findAllXXX method's result onto the page and choosing "ADF Read-only Table".
    The problem is that during execution we get an OutOfMemory error. The Facade method attempts to extract the whole result set and to transfer it to a List. But the result set is simply too large. There's not enough memory.
    Initially, I was under the impression that the table iterator would be running queries that automatically fetch just a chunk of the db table data at a time. Sadly, this is not the case. Apparently, all the data gets fetched. And then the iterator simply iterates through a List in memory. This is not what we needed.
    So, I'd like to ask: is there a way for us to show a very large database table inside an ADF Table? And when the user clicks on "Next", to have the iterator automatically execute queries against the database and fetch the next chunk of data, if necessary?
    If that is not possible with ADF components, it looks like we'll have to either write our own component or simply use the old code that we have which supports paging for huge tables by simply running new queries whenever necessary. Alternatively, each time the user clicks on "Next" or "Previous", we might have to intercept the event and manually send range information to a facade method which would then fetch the appropriate data from the database. I don't know how easy or difficult that would be to implement.
    Naturally, I'd prefer to have that functionality available in ADF Faces. I hope there's a way to do this. But I'm still a novice and I would appreciate any advice.

    Hi Shay,
    We do use search pages and we do give the users the opportunity to specify search criteria.
    The trouble comes when the search criteria are not specific enough and the result set is huge. Transferring the whole result set into memory will be disastrous, especially for servers used by hundreds of users simultaneously. So, we'll have to limit the number of rows fetched at a time. We should do this either by setting the Maximum Rows option for the TopLink query (or using rownum<=XXX inside the SQL), or through using a data provider that supports paging.
    I don't like the first approach very much because I don't have a good recipe for calculating the optimum number of Maximum Rows for each query. By specifying some average number of, say, 500 rows, I risk fetching too many rows at once and I also risk filling the TopLink cache with objects that are not necessary. I can use methods like query.dontMaintainCache() but in my case this is a workaround, not a solution.
    I would prefer fetching relatively small chunks of data at a time and not limiting the user to a certain number of maximum rows. Furthermore, this way I won't fetch large amounts of data at the very beginning and I won't be forced to turn off the caching for the query.
    Regarding the "ADF Developer's Guide", I read there that "To create a table using a data control, you must bind to a method on the data control that returns a collection. JDeveloper allows you to do this declaratively by dragging and dropping a collection from the Data Control Palette."
    So, it looks like I'll have to implement a collection which, in turn, implements the paging functionality that I need. Is the TopLink object you are referring to some type of collection? I know that I can specify a collection class that TopLink should use for queries through the query.useCollectionClass(...) method. But if TopLink doesn't provide the collection I need, I will have to write that collection myself. I still haven't found the section in the TopLink documentation that says what types of Collections are natively provided by TopLink. I can see other collections like oracle.toplink.indirection.IndirectList, for example. But I have not found a specific discussion on large result sets with the exception of Streams and Cursors and I feel uneasy about maintaining cursors between client requests.
    And I completely agree with you about reading the docs first and doing the programming afterwards. Whenever time permits, I always do that. I have already read the "ADF Developer's Guide" with the exception of chapters 20 and 21. And I switched to the "TopLink Developer's Guide" because it seems that we must focus on the model. Unfortunately, because of the circumstances, I've spent a lot of time reading and not enough time practicing what I read. So, my knowledge is kind of shaky at the moment and perhaps I'm not seeing things that are obvious to you. That's why I tried using this forum -- to ask the experts for advice on the best method for implementing paging. And I'm thankful to everyone who replied to my post so far.
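    For reference, the ROWNUM-based windowing mentioned above usually looks something like the following - a hedged sketch with a hypothetical table name and bind variables (:first_row, :last_row), regardless of whether the SQL is issued by TopLink or written by hand:
    select *
      from (select t.*, rownum rn
              from (select * from big_table order by id) t
             where rownum <= :last_row)
     where rn > :first_row;
    Each "Next" or "Previous" click then maps to a new pair of bind values, so only one page of rows is ever held in memory.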

  • Insert Value(s) in Lookup Flat Multi-valued Table with Java API

    I've been looking in MDM Java API Library Reference Guide, MDM SP4 API JavaDoc, and SDN Forums for information on how to Insert/Update different values in a field in the Main Table of a given repository that belongs to a Lookup Flat Multi-valued Table using the Java API with no success.
    I also haven't been successful in adding these values the same way that I would add a single value to a single-valued Lookup Table using the MDM Java API (a2iFields.Add(new A2iField(FIELD_CODE, FIELD_VALUE)) for each value I want to add), for example:
    a2iFields.Add(new A2iField("Country","USA"));
    a2iFields.Add(new A2iField("Country","Mexico"));
    a2iFields.Add(new A2iField("Country","Germany"));
    Can anybody point me to the correct documentation that I need to read to fulfill this task? Or, even better, if someone can post a piece of code, I'll be more thankful.
    Thanks for your help.

    Hi,
    Here is a little code example where you add existing lookup values based on their record IDs:
         // record IDs of the existing lookup records to attach (example values)
         int USA = 1;
         int GERMANY = 2;
         // build a value array holding the lookup record IDs
         A2iValueArray countryArray = new A2iValueArray();
         countryArray.Add(new Value(USA));
         countryArray.Add(new Value(GERMANY));
         // assign the array to the multi-valued "Country" field of the record
         A2iFields record = new A2iFields();
         record.Add(new A2iField("Country", new Value(countryArray)));
    Please reward points if helpful.
    Regards,
    Robert

  • Split a large table into multiple packages - R3load/MIGMON

    Hello,
    We are in the process of reducing the export and import downtime for the Unicode migration/conversion.
    In this process, we have identified a couple of large tables which were taking a long time to export and import with a single R3load process.
    Step 1:> We ran the System Copy --> Export Preparation
    Step 2:> System Copy --> Table Splitting Preparation
    We have created a file with the large tables which need to be split into multiple packages, and were able to create a total of 3 WHR files for the following tables under the DATA directory of the main EXPORT directory.
    SplitTables.txt (Name of the file used in the SAPINST)
    CATF%2
    E071%2
    Which means, we would like each of the above large tables to be exported using 2 R3load processes.
    Step 3:> System Copy --> Database and Central Instance Export
    During the SAPinst process, at the Split STR files screen, we selected the option 'Split Predefined Tables' and selected the file which contains the predefined tables.
    Filename: SplitTable.txt
    CATF
    E071
    When we started the export process, we didn't see the above tables being processed by multiple R3load processes.
    They were exported by a single R3load process.
    In the order_by.txt file, we have found the following entries...
    order_by.txt----
    # generated by SAPinst at: Sat Feb 24 08:33:39 GMT-0700 (Mountain
    Standard Time) 2007
    default package order: by name
    CATF
    D010TAB
    DD03L
    DOKCLU
    E071
    GLOSSARY
    REPOSRC
    SAP0000
    SAPAPPL0_1
    SAPAPPL0_2
    We have selected a total of 20 parallel jobs.
    Here my questions are:
    a> what are we doing wrong here?
    b> Is there a different way to specify/define splitting a large table into multiple packages, so that they get exported by multiple R3load processes?
    I really appreciate your response.
    Thank you,
    Nikee

    Hi Haleem,
    As far as your queries are concerned -
    1. With R3ta, you will split large tables using a WHERE clause, and WHR files get generated. If you have mentioned CDCLS%2 in the input file for table splitting, then it generates 2~3 WHR files CDCLS-1, CDCLS-2 & CDCLS-3 (depending upon the WHERE conditions)
    2. While using MIGMON (for the sequential / parallel export-import process), you have the choice of Package Order in the properties file.
      E.g : For Import - In the import_monitor_cmd.properties, specify
    Package order: name | size | file with package names
        orderBy=/upgexp/SOURCE/pkg_imp_order.txt
       And in pkg_imp_order.txt, I have specified the import package order as
      BSIS-7
      CDCLS-3
      SAPAPPL1_184
      SAPAPPL1_72
      CDCLS-2
      SAPAPPL2_2
      CDCLS-1
    Similarly , you can specify the Export package order as well in the export properties file ...
    I hope this clarifies your doubt
    Warm Regards,
    SANUP.V

  • LookUp to the same table with multiple conditions

    Hi,
    I need to do a lookup to the same table in the flow but with different queries; each query contains its own 'where' clause.
    Can I do it somehow in one lookup or do I have to use a few?
    select a from table where a=1
    select b from table where c=3
    Thanks

    Hi,
    Using multiple lookups will be a cleaner approach. If you are using multiple lookups on the same table, consider using the Cache transform. Refer to the below link for details on the Cache transform
    Lookup and Cache Transforms in SQL Server Integration Services
    Alternatively, if you want to go ahead with a single lookup, you may have to modify the SQL statement in the Lookup accordingly to return the proper values. In your case it may be
    select a,b from table where a=1 or c=3
    Note: Consider the above as pseudo code. This needs to be tested and adapted based on your requirement.
    Best Regards Sorna
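    If the two original queries must keep their separate filters, the single-lookup SQL can also use conditional expressions so each output column only comes back when its own condition holds - again just a hedged sketch against a hypothetical table:
    select case when a = 1 then a end as a_value,
           case when c = 3 then b end as b_value
      from lookup_table
     where a = 1
        or c = 3;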

  • Updating Large Tables

    Hi,
    I was asked the following during an interview
    You have a large table with millions of rows and want to add a column. What is the best way to do it without affecting the performance of the DB?
    Also, you have a large table with a million rows; how do you organise the indexes?
    My answer was to coalesce the indexes.
    I was wondering what the best answers to these questions are.
    Thanks

    Adding a column to a table, even a really big one is trivial and will have no impact on the performance of the database. Just do:
    ALTER TABLE t ADD (new_column DATATYPE);
    This is simply an update to the data dictionary. Aside from the few milliseconds that Oracle will lock some dictionary tables (no different from the locks held if you update a column in a user table), there will be no impact on the performance of the database. Now, populating that column would be a different kettle of fish, and would depend on how the column needs to be populated (i.e. a single value for all rows, or calculated based on other columns).
    I would have asked for clarification on what they meant by "organise the indexes". If they meant which tablespaces they should go in, I would say in the same tablespace as other objects of similar size (you are using locally managed tablespaces, aren't you?). If they meant which indexes you would create, I would say that I would create the indexes necessary to answer the queries that you run.
    HTH
    John
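    A hedged aside on the populate step: on Oracle 11g and later, adding a column together with a NOT NULL default is itself a metadata-only operation (the default is stored in the dictionary rather than written to every row), so a bulk populate can sometimes be avoided entirely. A minimal sketch with a hypothetical column:
    ALTER TABLE t ADD (status_flag VARCHAR2(1) DEFAULT 'N' NOT NULL);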

  • Updating a large table

    Hello,
    We need to update 2 columns in a very large table (20,000,000 records). Every row in the table is to be updated, and the client wants to be able to update the records by year. Below is the procedure that has been developed:
    DECLARE
       l_year   VARCHAR2 (4) := '2008';

       CURSOR c_1 (l_year1 VARCHAR2)
       IS
          SELECT ROWID l_rowid,
                 (SELECT tmp.new_code_x
                    FROM new_mapping_code_x tmp
                   WHERE tmp.old_code_x = l.code_x) code_x,
                 (SELECT tmp.new_code_x
                    FROM new_mapping_code_x tmp
                   WHERE tmp.old_code_x = l.code_x_ori) code_x_ori
            FROM tableX l
           WHERE TO_CHAR (created_date, 'YYYY') = l_year1;

       TYPE typec1 IS TABLE OF c_1%ROWTYPE
          INDEX BY PLS_INTEGER;

       l_c1     typec1;
    BEGIN
       DBMS_OUTPUT.put_line ('Update start - ' || TO_CHAR (SYSDATE, 'DD/MM/YYYY HH24:MI:SS'));
       OPEN c_1 (l_year);
       LOOP
          FETCH c_1 BULK COLLECT INTO l_c1 LIMIT 100000;
          EXIT WHEN l_c1.COUNT = 0;
          FOR indx IN 1 .. l_c1.COUNT
          LOOP
             UPDATE tableX
                SET code_x = NVL (l_c1 (indx).code_x, code_x),
                    code_x_ori = NVL (l_c1 (indx).code_x_ori, code_x_ori)
              WHERE ROWID = l_c1 (indx).l_rowid;
          END LOOP;
          COMMIT;
       END LOOP;
       CLOSE c_1;
       DBMS_OUTPUT.put_line ('Update end - ' || TO_CHAR (SYSDATE, 'DD/MM/YYYY HH24:MI:SS'));
    END;
    We do not want to do a single UPDATE statement per year, as we fear the update might fail with, for example, a rollback segment error.
    It seems to me the above is not the most efficient approach. Any comments, or does anyone have a better solution?
    Thanks

    Everything is wrong with the sample code and the approach used. This is not how one uses Oracle. This is not how one designs performant and scalable code.
    Transactions must be consistent and logical. A commit in the middle of "doing something" is wrong. Period. (And no, the reasons for committing often and frequently in something like SQL Server do not, and never have, applied to Oracle.)
    Also, as I/O is the slowest and most expensive operation that one can perform in a database, it simply makes sense to reduce I/O as far as possible. This means not doing this:
    WHERE TO_CHAR (created_date, 'YYYY') = l_year1;
    Why? Because an index on created_date is now rendered utterly useless... and in this specific case will result in a full table scan.
    It means using the columns in their native data types. If the column is a date then use it as a date! E.g.
    where created_date between :startDate and :endDate
    The proper approach to this problem is to determine the most effective logical transaction that can be done, given the available resources (redo/undo/etc.).
    This could very likely be daily - dealing with and updating a single day's data at a time. So then one would write a procedure that updates a single day as a single transaction.
    One can also create a process log table - and have this procedure update this table with the day being updated, the time started, the time completed, and the number of rows updated.
    One now has a discrete business process that can be run. This allows one to run 10 or 30 or more of these processes at the same time using DBMS_JOB - thus doing the updates for a month using parallel processing.
    The process log table can be used to manage the entire update. It will also provide basic execution time details allowing one to estimate the average time for updating a day and the total time it will take for all the data in the large table to be updated.
    This is a structured approach. An approach that ensures the integrity of the data (all rows for a single day is treated as a single transaction). One that also provides management data that gives a clear picture of the state of the data in the large table.
    I'm a firm believer that if something is worth doing, it is worth doing well. Using a hacked approach of blindly updating data and committing ad hoc without any management and process controls... that is simply doing something very badly. Why? It may be interesting running into a brick wall the first time around. However, subsequent encounters with the wall should be avoided.
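    To make the suggested structure concrete, here is a hedged sketch of the per-day transaction and process log described above, reusing the poster's tableX and new_mapping_code_x and otherwise using hypothetical names; each call updates one day and logs it:
    create table upd_process_log (
        run_day       date,
        started_at    date,
        completed_at  date,
        rows_updated  number
    );
    create or replace procedure update_one_day (p_day in date) as
        l_rows  pls_integer;
    begin
        -- record the start of this day's run
        insert into upd_process_log (run_day, started_at)
        values (trunc(p_day), sysdate);
        -- one day's worth of rows, updated as a single transaction
        update tableX t
           set t.code_x     = nvl ((select m.new_code_x
                                      from new_mapping_code_x m
                                     where m.old_code_x = t.code_x), t.code_x),
               t.code_x_ori = nvl ((select m.new_code_x
                                      from new_mapping_code_x m
                                     where m.old_code_x = t.code_x_ori), t.code_x_ori)
         where t.created_date >= trunc(p_day)
           and t.created_date <  trunc(p_day) + 1;
        l_rows := sql%rowcount;
        -- record completion and row count
        update upd_process_log
           set completed_at = sysdate,
               rows_updated = l_rows
         where run_day = trunc(p_day);
        commit;
    end;
    /
    Individual days can then be submitted in parallel with DBMS_JOB or DBMS_SCHEDULER, and the log table shows progress and the average run time per day.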

  • Error in sync group with large tables

    Since a few days ago the automated sync process configured in windows azure portal is failing. The following message appears
    SqlException Error Code: -2146232060 - SqlError Number:40550, 
    Message: The session has been terminated because it has acquired too many locks. 
    Looking for the error on the internet, I found the following post:
    http://blogs.msdn.com/b/sync/archive/2010/09/24/how-to-sync-large-sql-server-databases-to-sql-azure.aspx
    Basically it says that in order to increase the application transaction size it's necessary to include some parameters in the remote and local provider. There's an example script for that.
    But how can I apply this change if my data sync process was created through the Azure web portal? Is there a way to access the sync scripts? How can I increase the transaction size from the Azure portal?
    Please, any help is welcome
    Alvaro

    Hi Alvaro,
    I’m afraid that there’s no method to access the sync scripts and increase the transaction size from azure portal when using SQL Azure data sync group.
    The error 40550 occurs when a session acquires more than one million locks. You can use the following DMVs to monitor your transactions in SQL Azure (see the example query after the list). Usually, the solution to this error is to read or modify fewer rows in a single transaction.
    sys.dm_tran_active_transactions
    sys.dm_tran_database_transactions
    sys.dm_tran_locks
    sys.dm_tran_session_transactions
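    For example, a quick way to see how many locks each session is currently holding - a hedged sketch using the sys.dm_tran_locks DMV listed above:
    select request_session_id,
           count(*) as lock_count
      from sys.dm_tran_locks
     group by request_session_id
     order by count(*) desc;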
    In your scenario, to overcome error 40550, I recommend you use the bcp utility or SQL Server Integration Services (SSIS) to move data from the large table to SQL Azure.
    With the bcp utility, you can divide your data into multiple sections and upload each section by executing multiple bcp commands simultaneously. With SSIS, you can divide your data into multiple files on the file system and upload each file by executing multiple streams simultaneously.
    Reference:
    Optimizing Data Access and Messaging - SQL Azure Connection Management
    Thanks,
    Lydia Zhang
    TechNet Community Support

  • Large tables truncated or withheld from webhelp

    I'm running into a major issue trying to include a large table in my WebHelp build.  I'm using RoboHelp 8 in Word.  When I include a large table (6 columns x 180 rows) the table is either truncated or withheld from the compiled WebHelp.
    I've tried several things to resolve it, but they all end in the same results.  I've tried importing the table from its original Word file.  I've tried breaking it up into many smaller tables.  I've tried building a new table in Word, then copying the data.  Oddly enough if I build the table blank and compile--the table appears.  But once I copy data into the table, it disappears.
    RoboHelp seems unable to process the table; when I've broken the single table into several smaller tables, it chokes and doesn't include the table or even put the topic in the TOC, even though it is in the source file.
    Any ideas?  I've not been able to find anything in the forums or anywhere else online. 
    Many thanks!

    Can you tell us what you mean by "using RoboHelp in Word"? Do you mean you are using it as your editor, or that you are using the RoboHelp for Word application? If the latter, is there a reason why you can't use the RoboHelp HTML application? That is much more suited to producing WebHelp. Personally I wouldn't touch the HTML that Word creates with a bargepole.
