Performance on huge tables

Hi,
I have an application that should contain 100 million records.
Each record has a primary key.
The application fetches a row using the record primary key.
Can anyone tell me what is the problem when using such a big table?
What is the performance of the index on 100 million records?
What is the performance of updates?
Thanks
dyahav

user10952094 wrote:
Can anyone tell me what is the problem when using such a big table?
What is the performance of the index on 100 million records?
What is the performance of updates?
It is not about the size of the table.
It is about the size of the I/O.
In other words, how efficient the I/O paths are for getting to the required rows. A small table can cause worse performance problems than a table 10x its size due to the way the smaller table has been defined and is used.
Simple (real world) example:
SQL> select count(*) from daily_xxxxx;
  COUNT(*)
2255362806
Elapsed: 00:00:12.03
SQL>
Same database, a select against a data dictionary view (containing only a tiny fraction of the rows in comparison):
SQL> select count(*) from all_objects;
  COUNT(*)
     50908
Elapsed: 00:00:49.17
The difference is caused by the amount and nature of I/O that was done - not by the sizes of the tables.
There are however certain features in Oracle that can be used to effectively scale large tables for performance... and make data management significantly easier. The Partitioning Option is an Oracle Enterprise Edition feature that can be considered an essential, if not mandatory, feature for effectively dealing with and scaling very large tables (VLTs).
However, such a feature aside - the same rules for effective performance for a small table apply to effective performance on large tables. So do not treat a VLT differently. The fundamentals for performance and scalability do not change.
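To illustrate the partitioning option (a sketch only - the table name, columns and partition count below are assumptions, not taken from the thread), a hash-partitioned design keeps a single-row primary key fetch down to one partition plus one index probe, even at 100 million rows:

-- Sketch: a large table hash partitioned on its primary key.
CREATE TABLE app_records (
  record_id    NUMBER        NOT NULL,
  payload      VARCHAR2(200),
  created_dts  DATE,
  CONSTRAINT app_records_pk PRIMARY KEY (record_id)
)
PARTITION BY HASH (record_id) PARTITIONS 16;

-- A primary-key fetch remains a simple unique index probe:
SELECT * FROM app_records WHERE record_id = :id;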

Similar Messages

  • SELECT query performance : One big table Vs many small tables

    Hello,
    We are using BDB 11g with SQLITE support. I have a query about 'select' query performance when we have one huge table vs. multiple small tables.
    Basically, in our application we need to run the select query multiple times, and today we have one huge table. Do you think breaking it into multiple small tables will help?
    For test purposes we tried creating multiple tables, but the performance of the 'select' query was more or less the same. Would that be because all tables map to a single database in the backend as key/value pairs, so a lookup (select query) on a small table or a big table makes no difference?
    Thanks.

    Hello,
    There is some information on this topic in the FAQ at:
    http://www.oracle.com/technology/products/berkeley-db/faq/db_faq.html#9-63
    If this does not address your question, please just let me know.
    Thanks,
    Sandra

  • Huge Group by operation on Huge Table takes lot of time

    Hi,
    Please find below a process that takes a long time to execute (approx. 5-6 hrs).
    The main reasons for this are:
    1) It fetches data from a huge table partition (about 18 GB of data per day)
    2) It performs GROUP BY operations
    3) There is no index on destination_number, which is used in the WHERE clause, so it performs a full table scan
    I think I may need to change some parameters to make the process faster.
    Can you please help with this?
    create table tmp_kumar nologging as
    SELECT c.series_num , subscriber_id , COUNT(1) cnt , SUM(NVL(total_currency_charge,0))total_currency_charge ,
    TRUNC(disconnect_date) FROM
    (select * from prepcdr.PREPCDR_MAR_P3_10 partition(disconnect_date_11) union all
    select * from prepcdr.PREPCDR_MAR_P3_10 partition(disconnect_date_11_new)) b,
    (SELECT series_num, des, created_dt, LENGTH (series_num) len
    FROM PREPCDR.HSS_SERIES_MAST where home_ind ='Y'
    UNION
    SELECT cimd_number, des, created_dt, LENGTH (cimd_number)
    FROM PREPCDR.HSS_CIMD_MASTER) c
    WHERE b.cdr_call_type = '86'
    AND SUBSTR (b.destination_number, 1, c.len) = c.series_num
    AND c.len = (SELECT MAX(x.len) FROM (SELECT series_num, des, created_dt, LENGTH (series_num) len
    FROM PREPCDR.HSS_SERIES_MAST where home_ind ='Y'
    UNION
    SELECT cimd_number, des, created_dt, LENGTH (cimd_number) len
    FROM PREPCDR.HSS_CIMD_MASTER) x WHERE x.series_num = SUBSTR (b.destination_number, 1, x.len))
    AND disconnect_date >= '11-MAR-2010'
    AND disconnect_date < '12-MAR-2010'
    GROUP BY c.series_num , TRUNC(disconnect_date) , subscriber_id

    This, most likely, will be more efficient:
    SELECT  c.series_num,
            subscriber_id,
            COUNT(1) cnt,
            SUM(NVL(total_currency_charge,0)) total_currency_charge,
            TRUNC(disconnect_date)
      FROM  (
              select  *
                from  prepcdr.PREPCDR_MAR_P3_10 partition(disconnect_date_11)
             union all
              select  *
                from  prepcdr.PREPCDR_MAR_P3_10 partition(disconnect_date_11_new)
            ) b,
            (
             SELECT  DISTINCT series_num,
                              des,
                              created_dt,
                              len
               FROM  (
                      SELECT  series_num,
                              des,
                              created_dt,
                              len,
                              RANK() OVER(ORDER BY len) rnk
                        FROM  (
                                SELECT  series_num,
                                        des,
                                        created_dt,
                                        LENGTH(series_num) len
                                  FROM  PREPCDR.HSS_SERIES_MAST
                                  where home_ind = 'Y'
                               UNION ALL
                                SELECT  cimd_number,
                                        des,
                                        created_dt,
                                        LENGTH(cimd_number)
                                  FROM  PREPCDR.HSS_CIMD_MASTER
                              )
                     )
               WHERE rnk = 1
            ) c
      WHERE b.cdr_call_type = '86'
        AND SUBSTR(b.destination_number,1,c.len) = c.series_num
       AND disconnect_date >= DATE '2010-03-11'
       AND disconnect_date < DATE '2010-03-12'
      GROUP BY  c.series_num,
                TRUNC(disconnect_date),
                subscriber_id
    /
    SY.

  • Selecting Max Value from Huge Table

    Dear Professionals
    I have a huge table (20,000,000+ records) with the following columns:
    [Time], [User], [Value]
    The values in [Value] column can recur for a single User at a Time e.g.
    2015-01-01, Me, X
    2015-01-01, Me, Y
    2015-01-01, Me, X
    2015-01-02, Me, Z
    2015-01-02, Me, X
    2015-01-02, Me, Z
    For each day and for every user, I want to get the most frequently recurring value:
    2015-01-01, Me, X
    2015-01-02, Me, Z
    to be inserted into another table.
    PS: I want the MOST optimized way of achieving this functionality, because I am expecting the raw table to grow over time, so PERFORMANCE is a major consideration.
    I would really appreciate it, if somebody can help me.
    Regards

    I can think of two techniques, based on the data selectivity:
    1) using the CROSS APPLY operator
    2) using the ROW_NUMBER function
    USE Northwind;
    -- Solution 1
    SELECT S.SupplierID, S.CompanyName, CA.ProductID, CA.UnitPrice
    FROM dbo.Suppliers AS S
      CROSS APPLY
        (SELECT TOP (10) *
         FROM dbo.Products AS P
         WHERE P.SupplierID = S.SupplierID
         ORDER BY UnitPrice DESC, ProductID DESC) AS CA
    ORDER BY S.SupplierID, CA.UnitPrice DESC, CA.ProductID DESC;
    -- Solution 2
    WITH C AS
    (
      SELECT S.SupplierID, S.CompanyName, P.ProductID, P.UnitPrice,
        ROW_NUMBER() OVER(
          PARTITION BY P.SupplierID
          ORDER BY P.UnitPrice DESC, P.ProductID DESC) AS RowNum
      FROM dbo.Suppliers AS S
        JOIN dbo.Products AS P
          ON P.SupplierID = S.SupplierID
    )
    SELECT SupplierID, CompanyName, ProductID, UnitPrice
    FROM C
    WHERE RowNum <= 10
    ORDER BY SupplierID, ProductID DESC, UnitPrice DESC;
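    For the question actually asked (the most frequently recurring value per user per day), a sketch along the same ROW_NUMBER lines might look like this; the table names dbo.RawData and dbo.DailyTopValue are placeholders, not taken from the post:

    -- Count occurrences per day/user/value, then keep the most frequent value.
    WITH Counted AS
    (
      SELECT [Time], [User], [Value], COUNT(*) AS Cnt,
             ROW_NUMBER() OVER(PARTITION BY [Time], [User]
                               ORDER BY COUNT(*) DESC) AS RowNum
      FROM dbo.RawData
      GROUP BY [Time], [User], [Value]
    )
    INSERT INTO dbo.DailyTopValue ([Time], [User], [Value])
    SELECT [Time], [User], [Value]
    FROM Counted
    WHERE RowNum = 1;   -- ties are broken arbitrarily in this sketch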
    Best Regards, Uri Dimant, SQL Server MVP
    http://sqlblog.com/blogs/uri_dimant/

  • Update records in huge table

    Hi,
    I need to update two fields in a huge table (> 200,000,000 records). I've created two basic update scripts with a where clause. The problem is that there isn't an index on the fields in the where clause. How can I solve this? Creating a new index is not an option.
    Another solution is to update the whole table (without a where clause), but I don't know whether that would take a long time, lock records, etc.
    Any suggestions?
    Thanks.
    Ken

    Ken,
    You may be better off reading the Metalink (My Oracle Support) documents. PDML stands for Parallel DML. You can use parallel slaves to get the update done quickly; obviously this depends on the number of parallel slaves you have and the degree you set.
    Search for PDML on Metalink.
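    A minimal sketch of that parallel DML approach (the table name, columns and degree of 8 here are assumptions, not from the post):

    -- Parallel DML must be enabled at session level before the statement runs.
    ALTER SESSION ENABLE PARALLEL DML;

    UPDATE /*+ PARALLEL(t, 8) */ big_table t
       SET t.field1 = 'new_value',
           t.field2 = 'other_value'
     WHERE t.some_flag = 'X';

    COMMIT;  -- the transaction must be committed (or rolled back) before the table is queried again in this session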
    G

  • How to improve Query performance on large table in MS SQL Server 2008 R2

    I have a table with 20 million records. What is the best option to improve query performance on this table? Is partitioning the table across filegroups the best option, or splitting it into multiple smaller tables?

    Hi bala197164,
    First, I want to point out that both partitioning the table across filegroups and splitting the table into multiple smaller tables can improve query performance; they fit different situations. For example, suppose our table has one hundred columns and some of them are not directly related to the table's subject (say a userinfo table that stores address_street, address_zip and address_province columns; we could create a new Address table and add a foreign key in userinfo referencing it). In that situation, splitting the large table into smaller, individual tables lets queries that access only a fraction of the data run faster because there is less data to scan. Another situation is when the table's records can be grouped easily; for example, if a column named year stores the product release date, we can partition the table across filegroups to improve query performance. Usually we use both methods together. Additionally, we can add indexes to the table to improve query performance. For more details, please refer to the following documents:
    Partitioning:
    http://msdn.microsoft.com/en-us/library/ms178148.aspx
    CREATE INDEX (Transact-SQL):
    http://msdn.microsoft.com/en-us/library/ms188783.aspx
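    As a rough illustration of the filegroup-partitioning route (all object names, filegroups and year boundaries below are assumptions, not from the post):

    -- Partition a hypothetical Sales table by year across filegroups.
    CREATE PARTITION FUNCTION pfSalesYear (int)
        AS RANGE RIGHT FOR VALUES (2012, 2013, 2014);

    CREATE PARTITION SCHEME psSalesYear
        AS PARTITION pfSalesYear
        TO (fgSales2011, fgSales2012, fgSales2013, fgSales2014);

    CREATE TABLE dbo.Sales
    (
        SaleID    int    NOT NULL,
        SaleYear  int    NOT NULL,
        Amount    money  NOT NULL,
        CONSTRAINT PK_Sales PRIMARY KEY (SaleYear, SaleID)  -- key includes the partitioning column so the index is aligned
    )
    ON psSalesYear (SaleYear);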
    Allen Li
    TechNet Community Support

  • Bitmap index or Composite index better on a huge table

    Hi All,
    I got a question regarding the Bitmap index and Composite Index.
    I have a table which has only two columns: CUSTOMER(group_no NUMBER, order_no NUMBER).
    This is a 100-million+ record table, with 100K group_nos and 100 million unique order numbers, i.e. each group has about 1,000 order numbers.
    I tested by creating a GLOBAL Bitmap index on this huge table (more than 1.5 GB in size). The GLOBAL Bitmap index that got created is under 50 MB, and when I query for a group number, say SELECT * FROM CUSTOMER WHERE group_no=67677; --> 0.5 seconds to retrieve all the 1000 rows. I checked for different groups and it is the same.
    Then I dropped the Bitmap index and re-created a composite index on (group_no, order_no). The index is larger than the table, around 2 GB in size, and when I query using the same select statement SELECT * FROM CUSTOMER WHERE group_no=67677; --> 0.5 seconds to retrieve all the 1000 rows.
    My question is which one is BETTER, BTree or BITMAP index, and WHY?
    Appreciate your valuable inputs on this one.
    Regards,
    Madhu K.

    Dear,
    First of all, bitmap indexes are not recommended for write-intensive OLTP applications due to the locking threat they can produce in such applications.
    You told us that this table is never updated; I suppose rows are not deleted from it either.
    Second, bitmap indexes are suitable for columns having low cardinality. The question is how we define "low cardinality": you said that you have 100,000 distinct group_no values in a table of 100,000,000 rows.
    That is a cardinality of 100,000/100,000,000 = 0.001. The group_no column might be a good candidate for a bitmap index.
    You said that order_no is unique, so you have a very high cardinality on that column and it might not be a candidate for your bitmap index.
    Third, your query's where clause involves only the group_no column, so why are you including both columns when testing the bitmap and the b-tree index?
    Are you designing such an index in order to avoid visiting the table? In your case the table is made up of only those two columns, so why not follow Hemant's advice and use an index-organized table?
    Finally, you can find more details about bitmap indexes in the following Richard Foote blog article:
    http://richardfoote.wordpress.com/2008/02/01/bitmap-indexes-with-many-distinct-column-values-wotsuh-the-deal/
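    Following that index-organized table suggestion, a sketch for this two-column case might look like the following (names assumed from the post, not tested against the poster's data):

    -- Index-organized table: the table is stored as a B-tree on the primary key,
    -- so a lookup by group_no needs no separate index-to-table visit.
    CREATE TABLE customer_iot (
      group_no  NUMBER NOT NULL,
      order_no  NUMBER NOT NULL,
      CONSTRAINT customer_iot_pk PRIMARY KEY (group_no, order_no)
    ) ORGANIZATION INDEX;

    -- All ~1,000 orders of a group are then read from a single index range scan:
    SELECT * FROM customer_iot WHERE group_no = 67677;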
    Best Regards
    Mohamed Houri

  • Deletion from huge table

    hi,
    we need to delete from a huge table (~11 million records) based on a column lookup against another table. Other than a general DELETE statement, is there a better way to do a fast delete?
    thanks.

    SHMYG@rex> create table test (f1 varchar2(10));
    SHMYG@rex> create table test1 (f1 varchar2(10));
    SHMYG@rex> insert into test values ('a');
    SHMYG@rex> insert into test values ('b');
    SHMYG@rex> insert into test1 values ('a');
    SHMYG@rex> select * from test;
    F1
    a
    b
    SHMYG@rex> select * from test1;
    F1
    a
    SHMYG@rex> delete from test where exists (select * from test1 where test.f1 = test1.f1);
    SHMYG@rex> select * from test;
    F1
    b
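    For very large tables, another option often suggested alongside the plain EXISTS-based DELETE above is to keep the surviving rows instead of deleting (a sketch only; big_table, lookup_table and key_col are placeholder names):

    -- Copy only the rows that should survive, without full redo logging...
    CREATE TABLE big_table_keep NOLOGGING AS
      SELECT bt.*
        FROM big_table bt
       WHERE NOT EXISTS (SELECT NULL FROM lookup_table lt WHERE lt.key_col = bt.key_col);
    -- ...then drop big_table, rename big_table_keep, and recreate the indexes,
    -- constraints and grants on the new table.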

  • Query for the huge table is not working.

    Hi,
    I have a database link between an Oracle server and Microsoft SQL Server, let's say 'SQLWEB'. This link works perfectly fine when I query a table having a few hundred thousand records, but it does not work for one of the tables which has more than 3 million records on the SQL Server side. Does anyone have any idea why this peculiar behaviour occurs? Are there any limitations for this heterogeneous link, and is there any workaround? Below you can see that the first query returns the count from a table, but the second query gets disconnected because it is against a very huge table with millions of records.
    shams at oracleserver> select count(*) from investors@sqlweb ;
    COUNT(*)
    15096
    shams at oracleserver> select count(*) from transactions@sqlweb;
    select count(*) from transactions@sqlweb
    ERROR at line 1:
    ORA-02068: following severe error from SQLWEB
    ORA-28511: lost RPC connection to heterogeneous remote agent using SID=%s
    ORA-28509: unable to establish a connection to non-Oracle system
    Regards
    Shamsheer
    Message was edited by:
    Shamsheer

    In general you want to minimize the traffic going over the dblink. This is best handled with a view on the SQL Server side. You might try creating a view on SQL Server like:
    create view all_investors as
    select * from investors
    Then from SQL*Plus:
    select count(*) from all_investors@sqlweb;

  • Partitioning huge tables.

    Hi all,
    I am looking at a database for a customer where they have huge tables.
    I have just executed:
    SELECT SEGMENT_NAME, SEGMENT_TYPE, OWNER, (BYTES/1024/1024) MEGAS
    FROM DBA_SEGMENTS
    ORDER BY MEGAS DESC;
    Some of them are displayed below:
    GL_JE_LINES     TABLE     GL     42,272
    SYS_IOT_TOP_789022     INDEX     APPLSYS     19,670
    WIP_PERIOD_BALANCES     TABLE     WIP     11,157
    SYS_IOT_TOP_789028     INDEX     APPLSYS     10,923
    MTL_TRANSACTION_ACCOUNTS     TABLE     INV     10,796
    WIP_TRANSACTION_ACCOUNTS     TABLE     WIP     10,763
    RLM_SCHEDULE_LINES_ALL     TABLE     RLM     10,482
    What kind of partition has anybody used for GL_JE_LINES table for example?
    Any advice or comment will be really appreciated.
    Thanks in advance.
    Kind regards,
    Francisco

    Francisco,
    Please see old threads for similar discussion -- http://forums.oracle.com/forums/search.jspa?threadID=&q=Partitioning&objID=c3&dateRange=all&userID=&numResults=15&rankBy=10001
    Thanks,
    Hussein

  • Create Index on a huge table

    Hi,
    We have a huge table with no index on it, and I want to create an index on this table; we are working on Oracle 11g/Linux.
    Our business users are frequently accessing this table, and select statements are taking a very long time.
    Please let me know which index type would be best suited for this, and what the command to create the index would be.
    Would appreciate your assistance.
    Regards.

    >
    We have a huge table with no index on it, and I want to create an index on this table; we are working on Oracle 11g/Linux.
    Our business users are frequently accessing this table, and select statements are taking a very long time.
    Please let me know which index type would be best suited for this, and what the command to create the index would be.
    >
    We need loads of information to suggest anything useful
    a) What database version are you on? (if 11g, you have several more options)
    b) What type of environment is it? OLTP or Warehouse?
    c) How big is the table?
    d) How many distinct values are there in that column that you want to index?
    e) Would this index undergo lots of inserts/deletes/updates? That is, is this table used mainly for querying or will it undergo continous inserts/updates/deletes?
    In case your environment is a warehouse type, where you load once and the table is then mainly used for queries, and more importantly, if that column has very few distinct values (a typical example is a GENDER column with only two distinct values), you will benefit greatly from a BITMAP index. If it's an OLTP environment where multiple processes will be inserting into the table, never go near a BITMAP index; use only a B-tree index.
    Finally, have you arrived at a concrete reason why you want to build that index now rather than when you designed the table? If you don't need an index for sure, better not to have it. If you are on 11g, you can create an INVISIBLE index.
    Also, if it's a very large table, you may create the index NOLOGGING to avoid loads of redo generation (not recommended in a production environment though). But you have to be aware that in the event of disaster recovery you will have to recreate the index after you restore the database. Also, if you are in a Data Guard environment, you have to take the necessary precautions when doing NOLOGGING operations.
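    A sketch of the 11g options mentioned above (the table and column names are assumptions):

    -- Build the index without full redo logging (note the recovery and Data Guard
    -- caveats above) and keep it invisible to the optimizer until it is verified.
    CREATE INDEX big_table_col_idx ON big_table (lookup_col)
      NOLOGGING INVISIBLE;

    -- Make it visible once testing shows it actually helps:
    ALTER INDEX big_table_col_idx VISIBLE;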
    Edited by: user12035575 on Sep 11, 2011 12:37 PM

  • Performance difference between tables and materialized views

    hi ,
    I created a materialized view on a query that involves a partitioned table.
    When I used the same query to create a table out of it (create table xyz as select * from (the query)), the table got created quickly.
    So does that mean that, performance-wise, creating a table is faster than creating/refreshing the materialized view? Or is that due to the refresh method I use? Currently I use a complete refresh.

    Well, for starters, if you created the materialized view first and then the standard table, the data for the second one has already been fetched recently, which reduces its I/O due to caching, so it will be quicker. There are also other factors, such as the materialized view creating other internal structures that are required to allow refreshes to be done quickly, such as the primary key etc., which you haven't created on your second run.
    What you have shown is that two completely different statements, running at different times, appear to operate at different speeds. It is not a comparison of whether the materialized view is slower or quicker than the create table statement.
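    For a fairer side-by-side comparison, one could time both forms from a cold start against the same query (a sketch; xyz_tab, xyz_mv and some_part_tab are placeholders):

    -- Plain CTAS: a one-off copy of the query result, no refresh machinery.
    CREATE TABLE xyz_tab AS
      SELECT * FROM some_part_tab;

    -- Materialized view over the same query: extra work up front (metadata,
    -- refresh support) in exchange for being refreshable later.
    CREATE MATERIALIZED VIEW xyz_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
      AS SELECT * FROM some_part_tab;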

  • Accessing huge tables like bseg,  bkpf

    1) What are the precautions we should consider while accessing huge tables like bseg, bkpf or mseg tables.

    Hi,
    Some tips may be:
    1)
    Write the SELECT statements so that the WHERE clause covers all (or almost all) of the primary key fields, in the same order as they are defined in the DB table.
    2)
    If you are using fields that are not in the primary key of the DB table, create secondary indexes on those fields.
    3)
    Always try to use an array fetch of the records from the table (SELECT ... INTO TABLE) instead of SELECT ... ENDSELECT.
    Thanks,
    Vishnu.

  • Need help with performance for very very huge tables...

    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production.
    My DB has many tables and out of which I am interested in getting data from product and sales.
    select /*parallel 32*/count(1) from (
    select /*parallel 32*/distinct prod_code from product pd, sales s
    where pd.prod_opt_cd is NULL
    and s.sales_id = pd.sales_id
    and s.creation_dts between to_date ('2012-07-01','YYYY-MM-DD') and
    to_date ('2012-07-31','YYYY-MM-DD'))
    More information -
    Total Rows in sales table - 18001217
    Total rows in product table - 411800392
    creation_dts doesn't have an index on it.
    I started the query in the background, but after 30 hours I saw this error:
    ORA-01555: snapshot too old: rollback segment number 153 with name
    Is there any other way to get above data in optimized way?

    Formatting your query a bit (and removing the hints), it evaluates to:
    SELECT COUNT(1)
    FROM  (SELECT DISTINCT prod_code
           FROM   product pd
                  INNER JOIN sales s
                  ON s.sales_id = pd.sales_id 
           WHERE  pd.prod_opt_cd is NULL
           AND    s.creation_dts BETWEEN TO_DATE('2012-07-01','YYYY-MM-DD')
                                     AND TO_DATE('2012-07-31','YYYY-MM-DD')
          );
    This should be equivalent to:
    SELECT COUNT(DISTINCT prod_code)
    FROM   product pd
           INNER JOIN sales s
           ON s.sales_id = pd.sales_id 
    WHERE  pd.prod_opt_cd is NULL
    AND    s.creation_dts BETWEEN TO_DATE('2012-07-01','YYYY-MM-DD')
                              AND TO_DATE('2012-07-31','YYYY-MM-DD');
    On the face of it, that's a ridiculously simple query. If s.sales_id and pd.sales_id are both indexed, then I don't see why it would take a huge amount of time. Even having to perform a FTS on the sales table because creation_dts isn't indexed shouldn't make it a 30-hour query. If either of those two is not indexed, then it's a much uglier prospect to join the two tables. However, if you often join the product and sales tables (which seems likely), then not having those fields indexed would be contraindicated.
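    If those columns really are unindexed, indexes along these lines could be considered (index names are hypothetical; check the existing schema first):

    -- Support the join on sales_id from both sides, and optionally the date filter.
    CREATE INDEX sales_sales_id_ix     ON sales (sales_id);
    CREATE INDEX product_sales_id_ix   ON product (sales_id);
    CREATE INDEX sales_creation_dts_ix ON sales (creation_dts);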

  • Performance updating a extra huge table

    Hi guys, just looking for some advice. I'm handling tables with more than 300 million rows, sometimes even 800 million, and so far I have come up with some good solutions, but now I really need to be concerned about performance. I have a table with:
    FlyID int, FlyNumber int, SettlDate datetime2, SettlPeriod double, Consumpt dec, Ixl dec, Aunit int
    and 300 million rows. SettlDate is a date and SettlPeriod is a half hour (so 48 periods each day).
    The other table is:
    BMUnit int,  SettlDate datetime2, SettlPeriod double, Chargefact dec
    I'm going to join the two tables on bmunit = bmunit, settldate = settldate, settlperiod = settlperiod and, with an insert, fill a new table.
    Fingers crossed, I hope it works within a reasonable time (3 hours... more?).
    The real concern is:
    I got another table with
    FlyID int, Company varchar, CompanyID int, FromDate datetime, ToDate datetime
    The logic should be something like this:
    Update table1 set companyid = c.companyid, company = c.company
    where table1.flyid = c.flyid
    and settlementdate >= c.fromdate and settlementdate <= c.todate
    but just yesterday I tried something without the date conditions and the query ran for more than seven hours, so I had to kill it. I'm wondering if there is a better way. All of this is because I'm going to build several cubes taking one big table as the source, which is going to make retrieval really fast. So far I have practically cut out entire hours, but now I need this one more element, and before I start writing code I'd like to hear your advice.
    Thanks

    Tables that large are always a problem for major maintenance.
    I would do your update in batches (the names below follow your pseudocode; adjust them to the real schema):
    DECLARE @cnt int;
    SET @cnt = 1;
    WHILE @cnt > 0
    BEGIN
        UPDATE TOP (1000000) t1
           SET t1.companyid = c.companyid,
               t1.company   = c.company
          FROM table1 AS t1
          JOIN company AS c
            ON t1.flyid = c.flyid
         WHERE t1.settlementdate >= c.fromdate
           AND t1.settlementdate <= c.todate
           AND (t1.companyid IS NULL OR t1.companyid <> c.companyid);  -- skip rows already updated so the loop terminates
        SET @cnt = @@ROWCOUNT;
    END
