Maintaining a huge volume of data (around 60 million records)

I have a requirement to load data from an ODS to a cube via a full load. This ODS will accumulate around 50 million records over the next 6 months, all of which we have to keep in BW.
Can you please advise on the following?
     Can we accommodate 50 million records in the ODS?
If so, can we run the load of 50 million records from the ODS to the cube? Each record also has to look up another ODS to derive the value of an additional InfoObject. Will the load for 50 million records succeed, or will we run into a timeout error? I'm not sure.

Harsha,
The data load should go through. A few things to do / check:
Delete the indexes on the cube before loading and rebuild them after the load completes.
Regarding the lookup: if you are looking up specific values in another DSO, build a suitable secondary index (preferably a unique index) on that DSO.
A DSO or cube can definitely hold 50 million records; we have had cases with 50 million records per month, the DSO holding data for 6 to 10 months, and the same for the cube. Just note that reporting on the cube may be slow at a very detailed level.
Also please state your version - 3.x or 7.0...
Also, if you are on Oracle, plan for providing / backing up archive logs, since loading generates a lot of archive logs...
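For reference, a rough SQL sketch of what the index handling looks like at the database level, assuming Oracle and purely hypothetical BW object names (/BIC/FZSALES for the cube's F fact table, /BIC/AZLOOKUP00 for the active table of the lookup DSO). In practice you would let BW drop and rebuild the cube indexes via process chain steps or the cube's Performance tab rather than issuing the DDL yourself:

-- Drop the cube's secondary bitmap index before the load, rebuild it afterwards
-- (index, table and column names here are hypothetical; BW generates its own names).
DROP INDEX "/BIC/FZSALES~010";

-- ... run the full load from the ODS to the cube here ...

CREATE BITMAP INDEX "/BIC/FZSALES~010"
  ON "/BIC/FZSALES" ("KEY_ZSALES1") NOLOGGING;

-- Secondary (unique) index on the lookup DSO's active table, covering the lookup key,
-- so each of the 50 million lookups becomes a cheap unique index scan.
CREATE UNIQUE INDEX "/BIC/AZLOOKUP00~Z01"
  ON "/BIC/AZLOOKUP00" ("/BIC/ZDOCNO");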
Edited by: Arun Varadarajan on Apr 21, 2009 2:30 AM

Similar Messages

  • Error while extracting huge volumes of data from BW

    Hi,
    We see this error while extracting huge volumes of data (approximately 3.4 million records, with a large number of columns):
    R3C-151001: |Dataflow DF_SAPSI_SAPSI3131_SAPBW_To_Teradata
    Error calling R/3 to get table data: <RFC Error:
    Key: TSV_TNEW_PAGE_ALLOC_FAILED
    Status: EXCEPTION SYSTEM_FAILURE RAISED
    No more storage space available for extending an internal table.
    >.
    We are not sure whether DoP (degree of parallelism) works with SAP BW as the source, but when we tried with DoP we got the same error.
    Will this issue be resolved with an R/3 or ABAP dataflow? Can anyone suggest some possible solutions for this scenario?
    Sri

    The problem is that you've reached the maximum memory configured for your system.
    If this is a batch job, reconfigure the profile parameter
    abap/heap_area_nondia
    Markus

  • Loading a huge volume of data with DTP

    Hi Experts,
    I have a problem uploading a huge volume of data with DTPs.
    My initialisation is already done since I am doing reloads. Now I have data from fiscal year/period 000.2010 to 016.9999.
    The data volume is huge.
    I have tried uploading the data in chunks by splitting it into 3 months per DTP and running full loads.
    But when I process the DTP, the data packages are determined at the source and I end up with about 2,000 data packages.
    The request turns red after about 1,000 data packages have been processed, and the batch processes allocated to it also stop.
    I have tried splitting the DTP by single month and processing it, but I have the same problem. I have deleted the indexes before uploading to the cube and changed the batch processing setting from 3 to 5.
    Can anyone please advise what the problem could be? I am running these reloads in the quality system.
    How can I upload this data, which runs into millions of records?
    Thanks,
    Tati

    Hi Galban,
    I have already changed the parallel processing from 3 to 5.
    Can you please advise how I can increase the data package size? For my upload, the package size corresponds to the package size in the source and is determined dynamically at runtime.
    Please advise.
    Thanks
    Tati

  • In BDC I have a huge volume of data to upload for a given transaction

    Hi gurus,
    In BDC I have a huge volume of data to upload for a given transaction. I am using the session method, and it takes a lot of execution time to complete the whole transaction. Is there any other method to process this huge volume in less time?
    reward awaiting
    with regards
    Thambe

    The selection of the BDC method depends on the type of requirement you have, and you can decide which one suits your requirement based on the differences between the two methods. The following are the differences between Session and Call Transaction.
    Session method.
    1) synchronous processing.
    2) can transfer large amounts of data.
    3) processing is slower.
    4) error log is created
    5) data is not updated until session is processed.
    Call transaction.
    1) asynchronous processing
    2) can transfer small amount of data
    3) processing is faster.
    4) errors need to be handled explicitly
    5) data is updated automatically
    Batch Data Communication (BDC) is the oldest batch interfacing technique that SAP has provided since the early versions of R/3. BDC is not a typical integration tool in the sense that it can only be used for uploading data into R/3, so it is not bi-directional.
    BDC works on the principle of simulating user input for transactional screen, via an ABAP program.
    Typically the input comes in the form of a flat file. The ABAP program reads this file and formats the input data screen by screen into an internal table (BDCDATA). The transaction is then started using this internal table as the input and executed in the background.
    In ‘Call Transaction’, the transactions are triggered at the time of processing itself and so the ABAP program must do the error handling. It can also be used for real-time interfaces and custom error handling & logging features. Whereas in
    Batch Input Sessions, the ABAP program creates a session with all the transactional data, and this session can be viewed, scheduled and processed (using Transaction SM35) at a later time. The latter technique has a built-in error processing mechanism too.
    Batch Input (BI) programs still use the classical BDC approach but doesn’t require an ABAP program to be written to format the BDCDATA. The user has to format the data using predefined structures and store it in a flat file. The BI program then reads this and invokes the transaction mentioned in the header record of the file.
    Direct Input (DI) programs work much like BI programs; the only difference is that, instead of processing screens, they validate fields and load the data directly into tables using standard function modules. For this reason, DI programs are much faster (RMDATIND, the Material Master DI program, works at least 5 times faster) than their BDC counterparts and so are ideally suited for loading large data volumes. DI programs are not available for all application areas.
    synchronous & Asynchronous updating:
    http://www.icesoft.com/developer_guides/icefaces/htmlguide/devguide/keyConcepts4.html
    synchronous & Asynchronous processings
    Asynchronous refers to processes that do not depend on each other's outcome, and can therefore occur on different threads simultaneously. The opposite is synchronous. Synchronous processes wait for one to complete before the next begins. For those Group Policy settings for which both types of processes are available as options, you choose between the faster asynchronous or the safer, more predictable synchronous processing.
    By default, the processing of Group Policy is synchronous. Computer policy is completed before the CTRL+ALT+DEL dialog box is presented, and user policy is completed before the shell is active and available for the user to interact with it.
    Note
    You can change this default behavior by using a policy setting for each so that processing is asynchronous. This is not recommended unless there are compelling performance reasons. To provide the most reliable operation, leave the processing as synchronous.

  • Most efficient method of storing configuration data for huge volume of data

    The scenario I am stuck with is as follows:
    I have a huge volume of raw data (as CSV files).
    This data needs to be rated based on the configuration tables.
    The output is again CSV data with some new fields appended to the original records.
    These new fields are derived from original data based on the configuration tables.
    There are around 15 configuration tables.
    Out of these 15 tables 4 tables have huge configurations.
    One table has 15 million configuration rows of 10 columns.
    The other three tables each have around 1-1.5 million configuration rows of 10-20 columns.
    Now in order to carry forward my rating process, i'm left with the following methods:
    1) Leave the configurations in database table. Query the table for each configuration required.
    Disadvantage: Even if the indexes are created on the table, it takes a lot of time to query 15 configuration tables for each record in the file.
    2) Load the configurations as key value pairs in RAM using a suitable collection (Eg HashMap)
    Advantage: Processing is fast
    Disadvantage: Takes around 2 GB of RAM per instance.
    Also, when the CPU context switches (I'm using an 8-CPU server), the process hangs for about 10 seconds.
    This happens very frequently, so the net speed I get is again low.
    3) Store the configurations as CSV sorted files and then perform a binary search on it.
    Advantages: No RAM usage, Same configuration shared by multiple instances
    Disadvantages: Only one configuration table has an integer key, so this concept can't be used for the other tables
    (please correct me if I'm wrong about that).
    4) Store the configurations as an XML file
    Dont know the advantages/disadvantages for it.
    Please suggest which methodology I should follow.
    Edited by: Vishal_Vinayak on Jul 6, 2009 11:56 PM

    Vishal_Vinayak wrote:
    2) Load the configurations as key value pairs in RAM using a suitable collection (Eg HashMap)
    Advantage: Processing is fast
    Disadvantage: Takes around 2 GB of RAM per instance.
    Also, when the CPU context switches (I'm using an 8-CPU server), the process hangs for about 10 seconds.
    This happens very frequently, so the net speed I get is again low.
    Sounds like you don't have enough physical memory. Your application shouldn't be hanging at all.
    How much memory is attached to each CPU? e.g. numactl --show
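    As an aside on option 1 above (keeping the configuration in database tables): one set-based alternative that is not discussed in this thread is to expose the CSV to Oracle as an external table and derive the new fields with a single join, instead of issuing one lookup query per input record. A rough sketch; every table, column and directory name here is hypothetical:

    -- Expose the raw CSV as an external table (the DIRECTORY object must exist already).
    CREATE TABLE raw_usage_ext (
      record_id   VARCHAR2(50),
      usage_qty   NUMBER,
      usage_date  VARCHAR2(10)
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY rating_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
      )
      LOCATION ('raw_usage.csv')
    );

    -- Derive the appended fields in one pass with a join against the big configuration table,
    -- letting the database do the hashing instead of one lookup per record in application code.
    CREATE TABLE rated_usage AS
    SELECT r.record_id,
           TO_DATE(r.usage_date, 'YYYY-MM-DD') AS usage_date,
           r.usage_qty,
           c.rate_plan,
           c.rate_per_unit * r.usage_qty AS charge
    FROM   raw_usage_ext r
    JOIN   rate_config   c ON c.record_id = r.record_id;  -- the 15-million-row configuration table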

  • Load Huge volume of data

    Hi,
    We are having a go-live soon, and I need to initialize data from a huge table that is almost 1.5 TB; it is a billing condition table. Any tips on how to load the data effectively, without any data loss and in an optimal time frame?
    Thanks

    Hi,
    If you have source downtime, and if you have selection conditions on company code or fiscal year/period (or anything else specific to your requirements):
    If it is fiscal year/period, you can pull repair full loads for all the periods, e.g. 001.2009 to 006.2009, and immediately create one more InfoPackage to load 007.2009 to 012.2009. Like this you can keep intervals for all the fiscal years, depending on how much history you have, e.g. 2000 to 2009 or 2001 to 2009.
    Before doing this, make sure you have enough background processes and tablespace available; also keep SM12 authorisations handy, and the ability to kill a job in SM50 if something has gone wrong.
    After pulling everything, if there is still downtime left, do an init without data transfer with all the fiscal year/periods in the selections, e.g. 001.2000 to 012.2009.
    Please start the InfoPackage with 'Start later in background'.
    Cheers,
    Vikram

  • Huge volume of data not getting processed

    Hello Everyone,
    It is a single-file-to-multiple-IDoc scenario with no mapping involved. The problem is that the file is 50 MB in size and contains 50,000 IDocs. These IDocs get divided and are sent to BW for reporting based on the message ID. Since the message ID should remain the same for all 50,000 IDocs, I cannot split the file.
    But when I process this, I get a Lock_Table_Overflow error. The function module IDOC_INBOUND_ASYNCHRONOUS shows an error in SM58. I have checked enque/table_size and it is 64000, which I thought would be enough to process a 50 MB file.
    Please let me know how to proceed further with this.
    Regards,
    Ravi

    Hi Ravi,
    I don't really think this is a problem with PI itself, especially as you get the error in IDOC_INBOUND_ASYNCHRONOUS. An enque/table_size of 64,000 might not be enough for 50,000 IDocs; just think what happens if each IDoc requires two locks.
    Hopefully you should be able to solve the issue by setting the Queue Processing checkbox in your receiver IDoc adapter in PI. This forces the IDocs to be processed one by one, so that fewer locks are created simultaneously. The only thing I cannot foresee is how big the overall increase in processing time will be.
    But you will not know until you try; please let us know the results, as others may follow your path.
    Hope this helps,
    Greg

  • How to update a table that has around 1 Million Records

    Hi,
    Let's take the basic EMP table for our reference and assume that it contains around 60,000 records, with every DEPTNO in the table initially set to 10. Please provide an UPDATE statement that updates the DEPTNO column of the EMP table (ordered by EMPNO) in groups of 120 records, incrementing the department number by 1 per group (DEPTNO becoming 10, 11, 12, and so on).
    First 120 Records deptno should be 10,
    Next 120 Records deptno should be 11, and so on.
    For Last 120 records deptno should be updated with 500.
    Please advise.
    Regards,

    Ok, I've done it on a smaller set, incrementing every 5 records...
    SQL> ed
    Wrote file afiedt.buf
      1  update emp2 set deptno = (select newdeptno
      2                            from (select empno, 10+floor((row_number() over (order by empno)-1)/5) as newdeptno
      3                                  from emp2
      4                                 ) e2
      5                            where e2.empno = emp2.empno
      6*                          )
    SQL> /
    14 rows updated.
    SQL> select * from emp2;
         EMPNO ENAME      JOB              MGR HIREDATE                    SAL       COMM     DEPTNO
          7369 SMITH      CLERK           7902 17-DEC-1980 00:00:00        800                    10
          7499 ALLEN      SALESMAN        7698 20-FEB-1981 00:00:00       1600        300         10
          7521 WARD       SALESMAN        7698 22-FEB-1981 00:00:00       1250        500         10
          7566 JONES      MANAGER         7839 02-APR-1981 00:00:00       2975                    10
          7654 MARTIN     SALESMAN        7698 28-SEP-1981 00:00:00       1250       1400         10
          7698 BLAKE      MANAGER         7839 01-MAY-1981 00:00:00       2850                    11
          7782 CLARK      MANAGER         7839 09-JUN-1981 00:00:00       2450                    11
          7788 SCOTT      ANALYST         7566 19-APR-1987 00:00:00       3000                    11
          7839 KING       PRESIDENT            17-NOV-1981 00:00:00       5000                    11
          7844 TURNER     SALESMAN        7698 08-SEP-1981 00:00:00       1500          0         11
          7876 ADAMS      CLERK           7788 23-MAY-1987 00:00:00       1100                    12
          7900 JAMES      CLERK           7698 03-DEC-1981 00:00:00        950                    12
          7902 FORD       ANALYST         7566 03-DEC-1981 00:00:00       3000                    12
          7934 MILLER     CLERK           7782 23-JAN-1982 00:00:00       1300                    12
     14 rows selected.
    But the principle would be the same.
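    Scaled up to the original requirement (groups of 120 records, ordered by EMPNO, with the last group explicitly set to 500 as asked), the same analytic pattern might look like the sketch below. EMP and the group size come from the question; everything else is just the approach above rewritten, so treat it as an untested sketch:

    UPDATE emp e
    SET deptno = (SELECT CASE
                           -- the question asks for the last group of 120 to get DEPTNO 500,
                           -- so override the computed value for that group
                           WHEN e2.grp = e2.last_grp THEN 500
                           ELSE 10 + e2.grp
                         END
                  FROM  (SELECT empno,
                                FLOOR((ROW_NUMBER() OVER (ORDER BY empno) - 1) / 120) AS grp,
                                FLOOR((COUNT(*) OVER () - 1) / 120)                   AS last_grp
                         FROM   emp) e2
                  WHERE  e2.empno = e.empno);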

  • Data Load for 20M records from PSA

    Hi Team,
    We need to reload a huge volume of Billing data (2LIS_13_VDITM), around 20 million records, from the PSA to the first-level DSO and then on to the higher-level targets.
    If we run the entire load as one full request from PSA to DSO for 20 million records, will there be any performance issue?
    Would it be a good approach to split the load based on 'Billing Document Number'?
    If we do split the load by 'Billing Document Number', will it create any performance issue from a reporting perspective, given that the data would arrive in multiple requests? Most reports are run by date and not by 'Billing Document Number'.
    Thanks
    San

    Hi,
    A better solution is to put a filter based on the calendar year or fiscal year.
    Check how many years of data you have and set the filters accordingly.
    Thanks,
    Phani.

  • Deleting large volumes of data in a table

    We have a table that holds a huge amount of data. When we execute the DELETE statement, it gives a database error because it is not able to delete such a huge volume of data.
    Approach tried so far :
    Keep a counter of, say, 10,000 and execute DELETE statements every 10,000 records. But since the data volume is huge, it still takes a long time.
    Can anybody suggest some other approaches?

    user12944938 wrote:
    Oracle Version : 10g
    SQL : DELETE FROM STUDENT WHERE YEAR = 2000.
    The requirement is more of a yearly basis but on different tables[STUDENT, ADMIN, MANAGERS,........].
    We have tried to run in off-hours.
    STUDENT TABLE [ID, FIRST NAME, MIDDLE NAME, LAST NAME, ...] AROUND 200 COLUMNS.
    Well, you left out a lot of the information I asked for, so I don't have much to suggest.
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2345591157689
    Is a good read and may be of help to you.
    Partitioning could also be an option (at which point you could drop the old partitions).
    It is really hard to say more, given what we do (and, more importantly, do not) know about your situation.
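    To illustrate the partitioning suggestion: if the tables were range-partitioned by YEAR, the yearly purge becomes a quick metadata operation instead of a massive DELETE. A hedged sketch using the STUDENT table from this thread (column list shortened, partition names invented); note this needs Enterprise Edition plus the Partitioning option:

    -- Hypothetical range-partitioned version of STUDENT (only a few of the ~200 columns shown).
    CREATE TABLE student_part (
      id          NUMBER,
      first_name  VARCHAR2(50),
      last_name   VARCHAR2(50),
      year        NUMBER(4)
    )
    PARTITION BY RANGE (year) (
      PARTITION p1999 VALUES LESS THAN (2000),
      PARTITION p2000 VALUES LESS THAN (2001),
      PARTITION p2001 VALUES LESS THAN (2002),
      PARTITION pmax  VALUES LESS THAN (MAXVALUE)
    );

    -- The yearly cleanup then drops (or truncates) one partition instead of deleting rows one by one.
    ALTER TABLE student_part DROP PARTITION p2000 UPDATE GLOBAL INDEXES;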

  • What is the best way to extract large volume of data from a BW InfoCube?

    Hello experts,
    I'm wondering if someone can suggest the best method available in SAP BI 7.0 to extract a large amount of data (approximately 70 million records) from an InfoCube. I've tried Open Hub and APD, but neither is working; I always need to split the extracts into small datasets. Any advice is greatly appreciated.
    Thanks,
    David

    Hi David,
    We had the same issue, though in our case it was loading from an ODS to a cube with over 50 million records. I don't think there is an option for parallel loading using DTPs. As suggested earlier in the forum, the best option is to split the load by calendar year or fiscal year.
    But remember that even with the above criteria, some calendar years might still contain a lot of data, and that again becomes a problem.
    What I can suggest is that, apart from just the calendar/fiscal year, you also include other selection criteria such as company code or sales organisation.
    Yes, you will end up loading more requests, but the data loads will run smoothly with smaller volumes.
    Regards
    BN

  • Delta data load for huge table in Data Services XI R3

    Hi,
    We have a project requirement to pull delta data once a week from a table that has around 44 million records but no last-modified column.
    In such a case we have to use the Table Comparison transform, but it will be very time-consuming.
    Please provide some suggestions on how to meet this requirement.
    The source of the DS job is a snapshot (the source table is in a remote database).
    Thanks!

    Because SAP Business Objects Data Services XI 3.x doesn't have any built-in delta-enablement mechanism with R3/ECC, the only possibility right now is to do a table compare. Given that this is only a weekly activity, while maybe time-consuming, it may be sufficient.
    In the future, you may want to consider activating the related SAP BW standard content DataSource(s) to extract into Data Services prior to loading to the target repositories. This may provide delta-enablement for the data that you're extracting.
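    For a feel of what the table comparison costs, this is roughly the delta detection expressed directly in SQL rather than in the Table_Comparison transform, assuming both tables have the same column list (all names hypothetical). With about 44 million rows on each side it amounts to a full scan plus sort/hash of both tables, which is why the weekly run is slow when there is no last-modified column:

    -- Rows that are new or changed in this week's snapshot compared with the target.
    -- (Deleted rows would need the reverse MINUS.)
    SELECT s.*
    FROM   source_snapshot s
    MINUS
    SELECT t.*
    FROM   target_table t;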

  • How to update a table that has 20 Million Records

    Hi,
    Let's consider the basic EMP table and assume that it has around 20 million records. We need an UPDATE statement; a normal UPDATE may hang the system or take a very long time.
    The basic or normal UPDATE statement goes like this, and I suspect it may not work well:
    update emp set hiredate = sysdate where comm is null and hiredate is null;
    The basic statement may not work. Suggestions needed.
    Regards,
    Vinesh

    sri wrote:
    I heard Bulk Collect will resolve these types of issues, and I am really poor at Bulk Collect concepts.
    Exactly what type of issue are you concerned with? The business requirements here are pretty important: what problem is the UPDATE causing, specifically, that you are trying to work around?
    So I am looking for a solution to the problem using Bulk Collect.
    Without knowing the problem, it's very tough to suggest a solution. If you process data in batches using BULK COLLECT, your UPDATE statement will take longer to run and will consume more resources on the database. If the problem you are trying to solve is that your UPDATE is not fast enough, this is a poor approach.
    On the other hand, if you process data in batches, and do interim commits, you can probably hold locks on individual rows for a shorter amount of time. That would only be a concern, though, if you have some other process that is trying to update the same rows that you are updating at the same time that you're updating them, which is pretty rare. And breaking your update into multiple transactions introduces a whole bunch of complexity. You now have to write a bunch of code to ensure that your process is restartable should the update fail mid-way through leaving some number of updates committed and some number rolled back. You have to have a very detailed understanding of the data and data consistency to ensure that breaking up the transaction isn't going to negatively impact any process, report, etc. To do it correctly is a pile of work and then it's something that is constantly at risk of creating problems in the future when requirements change.
    In the vast majority of cases, you're better off issuing a simple SQL statement during a time when the system isn't particularly busy.
    Justin
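    If the single UPDATE really is too slow and you have a quiet window, one commonly used alternative to BULK COLLECT batching (not raised in this thread, so treat it as an assumption about your edition and hardware) is to let Oracle parallelise the one statement with parallel DML:

    -- Requires Enterprise Edition and spare CPU/IO capacity; test on a copy first.
    ALTER SESSION ENABLE PARALLEL DML;

    UPDATE /*+ PARALLEL(emp 4) */ emp
    SET    hiredate = SYSDATE
    WHERE  comm IS NULL
    AND    hiredate IS NULL;

    COMMIT;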

  • Table has 80 million records - Performance impact if we stop archiving

    HI All,
    I have a table (Oracle 11g) with around 80 million records. Until now we have done weekly archiving to keep the size down, but one of the architects at my firm suggested that Oracle has no problem maintaining even billions of records with just a little performance tuning.
    I was just wondering whether that is true, and what kind of effect there would be on querying and insertion if the table holds 80 million rows and keeps growing every day.
    Any comments welcomed.

    It is true that the Oracle database can manage tables with billions of rows, but when talking about data size you should quote the table size rather than the number of rows: the table will not be the same size if the average row is 50 bytes as it would be if the average row is 5 KB.
    Regarding the performance impact: it depends on the queries that access the table. The more data a query needs to process and/or return as a result set, the bigger the impact on that query's performance.
    You don't give enough input to give a good answer. Ideally you should give DDL statements to create this table and its indexes and SQL queries that are using these tables.
    In some cases table partitioning can really help, but this is not always true (and you can only use partitioning with Enterprise Edition and additional licensing).
    Please read http://docs.oracle.com/cd/E11882_01/server.112/e25789/schemaob.htm#CNCPT112 .
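    As a quick way to get the table size figure mentioned above (rather than just the row count), a dictionary query along these lines is usually enough; the segment names are placeholders for your own table and index names:

    -- Approximate on-disk size of the table and its indexes, in GB (needs access to DBA_SEGMENTS;
    -- USER_SEGMENTS works the same way for objects in your own schema).
    SELECT segment_name,
           segment_type,
           ROUND(SUM(bytes) / 1024 / 1024 / 1024, 2) AS size_gb
    FROM   dba_segments
    WHERE  segment_name IN ('BIG_TABLE', 'BIG_TABLE_PK')
    GROUP  BY segment_name, segment_type
    ORDER  BY size_gb DESC;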

  • Update performance on a 38 million records table

    Hi all,
    I'm trying to create a script to update a table that has around 38 million records. The table isn't partitioned, and I just have to update one CHAR(1 byte) field and set it to 'N'.
    The database is 10g R2 running on Tru64 UNIX.
    The script I created loops over a cursor, bulk-collecting 200,000 ROWIDs per pass, and uses FORALL to update the table by ROWID.
    The problem is that in performance tests this method took about 20 minutes to update 1 million rows, so it would take about 13 hours to update the whole table.
    My question is: is there any way to improve the performance?
    The Script:
    DECLARE
      CURSOR C1 IS
        SELECT ROWID
        FROM   RTG.TCLIENTE_RTG;
      TYPE rowidtab IS TABLE OF ROWID;
      d_rowid rowidtab;
      v_char  CHAR(1) := 'N';
    BEGIN
      OPEN C1;
      LOOP
        FETCH C1 BULK COLLECT INTO d_rowid LIMIT 200000;
        FORALL i IN d_rowid.FIRST..d_rowid.LAST
          UPDATE RTG.TCLIENTE_RTG
          SET    CLI_VALID_IND = v_char
          WHERE  ROWID = d_rowid(i);
        COMMIT;
        EXIT WHEN C1%NOTFOUND;
      END LOOP;
      CLOSE C1;
    END;
    Kind Regards,
    Fabio

    I'm just curious... Is this a new varchar2(1) column that has been added to that table? If so will the value for this column remain 'N' for the future for the majority of the rows in that table?
    Has this column specifically been introduced to support one of the business functions in your application / will it not be used everywhere where the table is currently in use?
    If your answers to the above questions contain many yeses, then why did you choose to add a column for this that needs to be initialised to 'N' for all existing rows?
    Why not add a new single-column table for this requirement: the single column being the pk-column(s) of the existing table. And the meaning being if a pk is present in this new table, then the "CLI_VALID_IND" for this client is 'yes'. And if a pk is not present, then the "CLI_VALID_IND" for this client is 'no'.
    That way you only have to add the new table. And do nothing more. Of course your SQL statements in support for the business logic of this new business function will have to use, and maybe join, this new table. But is that really a huge disadvantage?
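    A minimal sketch of that suggestion, assuming (the thread doesn't say) that the primary key of RTG.TCLIENTE_RTG is a single NUMBER column called CLI_ID; adjust the key columns to whatever the real primary key is:

    -- New one-column table: a client is 'valid' if and only if its key is present here.
    -- It starts empty, which matches initialising CLI_VALID_IND to 'N' for every existing row.
    CREATE TABLE RTG.TCLIENTE_VALID (
      CLI_ID  NUMBER NOT NULL,
      CONSTRAINT TCLIENTE_VALID_PK PRIMARY KEY (CLI_ID),
      CONSTRAINT TCLIENTE_VALID_FK FOREIGN KEY (CLI_ID)
        REFERENCES RTG.TCLIENTE_RTG (CLI_ID)
    );

    -- Reading the indicator becomes an outer join / EXISTS check instead of a stored column.
    SELECT c.CLI_ID,
           CASE WHEN v.CLI_ID IS NOT NULL THEN 'Y' ELSE 'N' END AS CLI_VALID_IND
    FROM   RTG.TCLIENTE_RTG c
           LEFT JOIN RTG.TCLIENTE_VALID v ON v.CLI_ID = c.CLI_ID;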
