Performance of a large table join returning top results
Hi all,
We're running a POC for a public sector customer.
In their environment, they have a SQL statement like this:
select top 100 * from table A join table B on A.party_id = B.party_id (no filter here)
Table A has about 40 million records and table B about 1.2 billion; it's a 6-node cluster, and each table is partitioned by HASH 40 across all nodes.
It takes 11 minutes to finish this SQL, and the query nodes' memory is almost used up.
Other databases can do better: Oracle, for example, can use the first_rows hint, which returns the top 100 results without querying the whole table. It takes only several seconds on other DBs.
My question: is there any statement optimization in HANA that can return the top 100 results as quickly as possible?
Sam, I suggest you take advice from the HANA COE if you have a PoC in play. With this setup, you need advice from a HANA expert.
I can't see any realistic business scenario where that query would be used, so whoever is asking for it is just asking it as a science experiment. There are no hints like first_rows on the HANA DB because HANA is optimized for real-world scenarios.
You could always model it as:
select top 100 * from table A join (select top 10000000 * from table B) B on A.party_id=b.party_id
But that's just as fake as your first question (though it will respond much faster...). Best is to instead find the real queries that the business will ask.
However, if you have a 6-node cluster then your design is all off. HASH 40 is far too many partitions; you need 6, one for each node. Also, for your master data, don't use a partition; instead use the REPLICA functionality (see ALTER TABLE - SAP HANA SQL and System Views Reference - SAP Library). A sketch of both changes follows.
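A minimal sketch of those two changes, assuming the table names from the question; the replica clause in particular varies by HANA revision, so treat it as an assumption to verify against the reference above:
-- one partition per node for the big transaction table
ALTER TABLE B PARTITION BY HASH (party_id) PARTITIONS 6;
-- replicate the smaller master-data table to every node instead of partitioning it
ALTER TABLE A ADD REPLICA AT ALL LOCATIONS;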
For the scenario you describe, all queries should be sub-second. Check Abani's blog for some interesting advice.
Advanced Modelling: Retail Use Case
Similar Messages
-
How to improve Query performance on large table in MS SQL Server 2008 R2
I have a table with 20 million records. What is the best option to improve query performance on this table? Is partitioning the table into filegroups the best option, or splitting the table into multiple smaller tables?
Hi bala197164,
First, I want to point out that both partitioning the table into filegroups and splitting the table into multiple smaller tables can improve query performance, and they fit different situations. For example, suppose our table has one hundred columns and some columns are not directly related to the table's subject (say, a table named userinfo that stores user information has address_street, address_zip, and address_province columns; we can create a new table named Address and add a foreign key in the userinfo table referencing the Address table). In this situation, by splitting the large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan. Another situation is when the table's records can be grouped easily; for example, if there is a column named year that stores the product release date, we can partition the table into filegroups to improve query performance. Usually we use both methods together. Additionally, we can add indexes to the table to improve query performance. For more detailed information, please refer to the following documents:
Partitioning:
http://msdn.microsoft.com/en-us/library/ms178148.aspx
CREATE INDEX (Transact-SQL):
http://msdn.microsoft.com/en-us/library/ms188783.aspx
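As a minimal T-SQL sketch of the year-based partitioning described above (table and column names are hypothetical):
CREATE PARTITION FUNCTION pfReleaseYear (int)
AS RANGE RIGHT FOR VALUES (2011, 2012, 2013);
CREATE PARTITION SCHEME psReleaseYear
AS PARTITION pfReleaseYear ALL TO ([PRIMARY]);
CREATE TABLE dbo.Product
(
ProductID int NOT NULL,
ReleaseYear int NOT NULL,
CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductID, ReleaseYear)
) ON psReleaseYear (ReleaseYear);
In practice each year would map to its own filegroup rather than ALL TO ([PRIMARY]).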
Allen Li
TechNet Community Support -
Performance of large tables with ADT columns
Hello,
We are planning to build a large table (1 billion+ rows) with one of the columns being an Advanced Data Type (ADT) column. The ADT column will be based on a TYPE that will have approximately 250 attributes.
We are using Oracle 10g R2
Can you please tell me the following:
1. How will Oracle store the data in the ADT column?
2. Will the entire ADT record fit in one block?
3. Is it still possible to partition a table on an attribute that is part of the ADT?
4. How will the performance be affected if Oracle does a full table scan of such a table?
5. How much space will Oracle take, if any, for storing a NULL in an ADT?
I think we can create indexes on the attribute of the ADT column. Please let me know if this is not true.
Thanks for your help.
I agree with D.Morgan that an object type with 250 attributes is doubtful.
I don't like object tables (tables with "row objects") too.
But, your table is a relational table with object column ("column object").
C.J. Date, in An Introduction to Database Systems (2004, page 885), says:
"... object/relational systems ... are, or should be, basically just relational systems
that support the relational domain concept (i.e., types) properly - in other words, true relational systems,
meaning in particular systems that allow users to define their own types."
1. How will Oracle store the data in the ADT column?...
For some answers see:
“OR(DBMS) or R(DBMS), That is the Question”
http://www.quest-pipelines.com/pipelines/plsql/tips.htm#OCTOBER
and (of course):
"Oracle® Database Application Developer's Guide - Object-Relational Features" 10g Release 2 (10.2)
http://download-uk.oracle.com/docs/cd/B19306_01/appdev.102/b14260/adobjadv.htm#i1006903
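To make the indexing point concrete, a minimal Oracle sketch with hypothetical names (a three-attribute type stands in for the 250-attribute one):
CREATE TYPE address_t AS OBJECT (
address_street VARCHAR2(80),
address_zip VARCHAR2(10),
address_province VARCHAR2(40)
);
/
CREATE TABLE userinfo (
user_id NUMBER PRIMARY KEY,
home_address address_t
);
-- an attribute of a column object can be indexed via a table alias
CREATE INDEX userinfo_zip_ix ON userinfo u (u.home_address.address_zip);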
Regards,
Zlatko -
Refresh using alias table not returning expected results
I am getting some unexpected results in SmartView and can't figure out the cause. I have an Essbase cube that has two alias tables: the "default" and one that I call "Short Descr". I can consistently bring back data using "none" and "default", but not "Short Descr". When I use "Short Descr" data will come back for some of the members, but not for others.
Here is an example of a query using "none" as the alias table:
Net Income
RD10078
Y-T-D(MAR)
PRJG939306 20,372.05
PRJLG921508 (26.42)
Project (1,179,752.36)
If I change the alias table to "Short Descr", I get the following results:
Net Income
ITA (RD10078)
Y-T-D(MAR)
PRJG939306 (PRJG939306) 20,372.05
PRJLG921508 (PRJLG921508) (26.42)
Project (1,179,752.36)
But then if I just clear the amounts, and do a refresh, I get the following:
Net Income
ITA (RD10078)
Y-T-D(MAR)
PRJG939306 (PRJG939306)
PRJLG921508 (PRJLG921508) (26.42)
Project (1,179,752.36)
And if I pivot Net Income, I get a message that "Any comments/ functions/ formulas on the sheet will be lost.", and the PRJG939306 member is lost.
ITA (RD10078)
Y-T-D(MAR)
Net Income PRJLG921508 (PRJLG921508) (26.42)
Net Income Project (1,179,752.36)
I can't figure out why it would start treating it as a comment. It looks to be a valid member name and I don't see anything strange if I look at the member properties in EAS. And what I really don't understand is why it would bring back the member name (with alias) and a data value when I change the alias table from "none" to "Short Descr", but not bring back a value when I just clear the amounts and do a Refresh. And why is it only doing it for some of the members and not all of them?
What could cause this type of behavior?
Let's see if I can say this clearly. SmartView will not convert from one alias to another. Before switching alias tables, try changing back to no alias and do a retrieve, then switch to the new alias table and retrieve. I think the problem is that SmartView is not interpreting what is on that one line, so it leaves it blank as an unknown member.
-
Hi,
I have an application that should contain 100 million records.
Each record has a primary key.
The application fetches a row using the record primary key.
Can anyone tell me what is the problem when using such a big table?
What is the performance of the index on 100 million records?
What is the performance of updates?
Thanks
dyahav
user10952094 wrote:
Can anyone tell me what is the problem when using such a big table?
What is the performance of the index on 100 million records?
What is the performance of updates?
It is not about the size of the table.
It is about the size of the I/O.
In other words, how efficient the I/O paths are for getting to the required rows. A small table can cause worse performance problems than a table 10x its size due to the way the smaller table has been defined and is used.
Simple (real world) example:
SQL> select count(*) from daily_xxxxx;
COUNT(*)
2255362806
Elapsed: 00:00:12.03
SQL>
Same database, a select against the data dictionary (containing only a couple of rows in comparison):
SQL> select count(*) from all_objects;
COUNT(*)
50908
Elapsed: 00:00:49.17
The difference is caused by the amount and nature of I/O that was done - not by the sizes of the tables.
There are however certain features in Oracle that can be used to effectively scale large tables for performance... and make data management significantly easier. The Partitioning Option is an Oracle Enterprise Edition feature that can be considered as an essential, if not a mandatory feature, for effectively dealing and scaling with very large tables (VLT).
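A minimal sketch of the Partitioning Option applied to a call-detail style table (hypothetical names; the INTERVAL clause assumes 11g or later):
CREATE TABLE daily_calls (
call_id NUMBER NOT NULL,
call_date DATE NOT NULL,
duration NUMBER
)
PARTITION BY RANGE (call_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
PARTITION p_first VALUES LESS THAN (DATE '2010-01-01')
);
A query that filters on call_date then reads only the relevant partitions, keeping the I/O proportional to the rows actually needed.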
However, such a feature aside - the same rules for effective performance for a small table apply to effective performance on large tables. So do not treat a VLT differently. The fundamentals for performance and scalability do not change. -
Performance during joining large tables
Hi,
I have to maintain a report which gets data from many large tables, as below. Currently it uses a join statement to join all 8 tables, causing very slow performance.
SELECT
into corresponding fields of table equip
FROM caufv
join afih on afih~aufnr = caufv~aufnr
join iloa on iloa~iloan = afih~iloan
join iflos on iflos~tplnr = iloa~tplnr
join iflotx on iflos~tplnr = iflotx~tplnr
join vbak on vbak~aufnr = caufv~aufnr
join equz on equz~equnr = afih~equnr
join equi on equi~equnr = equz~equnr
join vbap on vbak~vbeln = vbap~vbeln
WHERE
Please suggest another way; I'm a newbie in ABAP. I tried using FOR ALL ENTRIES IN but it did not work. I would really appreciate it if you could leave me some sample lines of code.
Thanks,
Hi Dear,
I suggest you not use an inner join across that many tables (8), especially such huge ones. Instead, use FOR ALL ENTRIES wherever possible. Before using FOR ALL ENTRIES, check that the base table is not empty; and if it's not possible to avoid inner joins entirely, try to minimise them - use an inner join only between header and item tables.
Hope this will help you solve your problem. Feel free to ask if you have any doubt.
Regards,
Vijay -
Why oh why, weird performance on joining large tables
Hello.
I have a large table containing dates and customer data, organised as:
DATE CUSTOMER_ID INFOCOLUMN1 INFOCOLUMN2 etc...
Rows per date are a couple of million.
What I'm trying to do is to make a comparison between date a and date b and track changes in the database.
When I do a:
SELECT stuff
FROM table t1
INNER JOIN table t2
ON t1.CUSTOMER_ID = t2.CUSTOMER_ID
WHERE t1.date = TO_DATE(SOME_DATE)
AND t2.date = TO_DATE(SOME_OTHER_DATE)
I get a result in about 40 seconds, which is acceptable.
Then I try doing:
SELECT stuff
FROM (SELECT TO_DATE(LAST_DAY(ADD_MONTHS(SYSDATE, 0 - r.l))) AS DATE FROM dual INNER JOIN (SELECT level l FROM dual CONNECT BY LEVEL <= 1) r ON 1 = 1) time
INNER JOIN table t1
ON t1.date = time.date
INNER JOIN table t2
ON t1.CUSTOMER_ID = t2.CUSTOMER_ID
WHERE t2.date = ADD_MONTHS(time.date, -1)
I.e. I generate a date field in a subselect which I then use to join the tables.
When I try that the query takes an hour or two to complete with the same resultset as the first example.
The only difference is that in the first case I give the dates literally, but in the other case I generate them in the subselect. It's the same dates and they are formatted as dates in both cases.
Any ideas?
Thanks
Edited by: user1970293 on 2010-apr-29 00:52
Edited by: user1970293 on 2010-apr-29 00:59
When I try that the query takes an hour or two to complete with the same resultset as the first example.
If you get the same results, then why change the query to the second one?
The only difference is that in the first case I give the dates literally but in the other case I generate them in the subselect. It's the same dates and they are formatted as dates in both cases.
Dates are dates... the formatting is just "pretty".
This
select to_date(last_day(add_months(sysdate
,0 - r.l)))
from dual
inner join (select level l from dual connect by level <= 1) r on 1 = 1
doesn't make much sense... what is it supposed to do?
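(For reference, the usual CONNECT BY row-generator idiom produces the same month-end dates without the join against dual - a sketch:
select last_day(add_months(trunc(sysdate), -level)) as month_end
from dual
connect by level <= 3;
With level <= 1 this reduces to a single row: the last day of the previous month.)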
(by the way: you are doing a TO_DATE on a DATE...) -
Table-Valued Function not returning any results
ALTER FUNCTION [dbo].[fGetVendorInfo]
@VendorAddr char(30),
@RemitAddr char(100),
@PmntAddr char(100)
RETURNS
@VendorInfo TABLE
(
vengroup char(25),
vendnum char(9),
remit char(10),
payment char(10)
)
AS
BEGIN
insert into @VendorInfo (vengroup,vendnum)
select ks183, ks178
from hsi.keysetdata115
where ks184 like ltrim(@VendorAddr) + '%'
update @VendorInfo
set remit = r.remit
from
@VendorInfo ven
INNER JOIN
(Select ksd.ks188 as remit, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join @VendorInfo ven
on ven.vengroup = ksd.ks183 and ven.vendnum = ksd.ks178
where ksd.ks192 like ltrim(@RemitAddr) + '%'
and ks189 = 'R') r
on ven.vengroup = r.vengroup and ven.vendnum = r.vendnum
update @VendorInfo
set payment = p.payment
from
@VendorInfo ven
INNER JOIN
(Select ksd.ks188 as payment, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join @VendorInfo ven
on ven.vengroup = ksd.ks183 and ven.vendnum = ksd.ks178
where ksd.ks192 like ltrim(@PmntAddr) + '%'
and ks189 = 'P') p
on ven.vengroup = p.vengroup and ven.vendnum = p.vendnum
RETURN
END
GO
Hi all,
I'm having an issue where my Table-Valued Function is not returning any results.
When I break it out into a select statement (creating a table, and replacing the passed-in parameters with the actual values) it works fine, but when passing in the same exact values (copied and pasted) it just returns an empty table.
The odd thing is I could have SWORN this worked on Friday, but not 100% sure.
The attached code is my function.
Here is how I'm calling it:
SELECT * from dbo.fGetVendorInfo('AUDIO DIGEST', '123 SESAME ST', 'TOP OF OAK MOUNTAIN')
I tried removing the "+ '%'" and passing it in, but it doesn't work.
Like I said if I break it out and run it as T-SQL, it works just fine.
Any assistance would be appreciated.
Why did you use a proprietary user function instead of a VIEW? I know the answer is that your mindset does not use sets. You want procedural code. In fact, I see you use an "f-" prefix to mimic the old FORTRAN II convention for in-line functions!
Did you know that the old Sybase UPDATE.. FROM.. syntax does not work? It gives the wrong answers! Google it.
Your data element names make no sense. What is “KSD.ks188”?? Well, it is a “payment_<something>”, “KSD.ks183” is “vendor_group” and “KSD.ks178” is “vendor_nbr” in your magical world where names mean different things from table to table!
An SQL programmer might have a VIEW with the information, something like:
CREATE VIEW Vendor_Addresses
AS
SELECT vendor_group, vendor_nbr, vendor_addr, remit_addr, pmnt_addr
FROM ..
WHERE ..;
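One more thing worth checking, as an assumption rather than something confirmed in this thread: the parameters are fixed-length char, so a value like 'AUDIO DIGEST' arrives blank-padded to the declared length, and LTRIM strips only leading spaces - the concatenated pattern then requires trailing spaces before the '%'. A minimal sketch of that fix:
where ks184 like rtrim(ltrim(@VendorAddr)) + '%'
Declaring the parameters as varchar instead of char would avoid the padding entirely.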
--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice / Data Measurements and Standards in SQL / SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL -
JOIN ON 2 different sets of table depending on the result of first set
I have a query that returns results. I want to join this query to
one of 2 different sets of tables, depending on whether the first set has a result or not:
if the first set didn't return any records, then check the second set.
SELECT
peo.email_address,
r.segment1 requistion_num,
to_char(l.line_num) line_num,
v.vendor_name supplier,
p.CONCATENATED_SEGMENTS category,
to_char(round((nvl(l.quantity, 0) * nvl(l.unit_price, 0))),'99,999,999,999.99'),
TO_CHAR(l.need_by_date,'MM/DD/YYYY') need_by_date,
pe.full_name requestor,
l.item_description,
pr.segment1 project_num,
t.task_number,
c.segment1,
c.segment2
FROM po_requisition_headers_all r,
po_requisition_lines_all l,
(SELECT project_id,task_id,code_combination_id, distribution_id,requisition_line_id,creation_date FROM
(SELECT project_id,task_id,code_combination_id,distribution_id,creation_date,requisition_line_id,ROW_NUMBER ()
OVER (PARTITION BY requisition_line_id ORDER BY requisition_line_id,distribution_id ) rn
FROM po_req_distributions_all pod) WHERE rn = 1) d,
gl_code_combinations c,
POR_CATEGORY_LOV_V p,
per_people_v7 pe,
PA_PROJECTS_ALL pr,
PA_TASKS_ALL_V t,
ap_vendors_v v,
WHERE d.creation_date >= nvl(to_date(:DATE_LAST_CHECKED,
'DD-MON-YYYY HH24:MI:SS'),SYSDATE-1)
AND
l.requisition_header_id = r.requisition_header_id
AND l.requisition_line_id = d.requisition_line_id
AND d.code_combination_id = c.code_combination_id
AND r.APPS_SOURCE_CODE = 'POR'
AND l.category_id = p.category_id
AND r.authorization_status IN ('IN PROCESS','PRE-APPROVED','APPROVED')
AND l.to_person_id = pe.person_id
AND pr.project_id(+) = d.project_id
AND t.project_id(+) = d.project_id
AND t.task_id(+) = d.task_id
AND v.vendor_id(+) = l.vendor_id
and r.requisition_header_id in(
SELECT requisition_header_id FROM po_requisition_lines_all pl
GROUP BY requisition_header_id HAVING SUM(nvl(pl.quantity,0) * nvl(pl.unit_price, 0)) >=100000)
group by
peo.email_address,
r.REQUISITION_HEADER_ID,
r.segment1 ,
to_char(l.line_num) ,
v.vendor_name,
p.CONCATENATED_SEGMENTS ,
to_char(round((nvl(l.quantity, 0) * nvl(l.unit_price, 0))),'99,999,999,999.99'),
TO_CHAR(l.need_by_date,'MM/DD/YYYY') ,
pe.full_name ,
l.item_description,
c.segment1,
c.segment2,
pr.segment1 ,
t.task_number
I want to join this query with this first set:
SELECT b.NAME, c.segment1 CO, c.segment2 CC,
a.org_information2 Commodity_mgr,
b.organization_id, p.email_address
FROM hr_organization_information a, hr_all_organization_units b, pay_cost_allocation_keyflex c, per_people_v7 p
WHERE a.org_information_context = 'Financial Approver Information'
AND a.organization_id = b.organization_id
AND b.COST_ALLOCATION_KEYFLEX_ID = c.COST_ALLOCATION_KEYFLEX_ID
and a.ORG_INFORMATION2 = p.person_id
AND NVL (b.date_to, SYSDATE + 1) >= SYSDATE
AND b.date_from <= SYSDATE;
If this doesn't return any result then I need to join the query with the 2nd set:
select lookup_code, meaning, v.attribute1 company, v.attribute2 cc,
decode(v.attribute3,null,null,p1.employee_number || '-' || p1.full_name) sbu_controller,
decode(v.attribute4,null,null,p2.employee_number || '-' || p2.full_name) commodity_mgr
from fnd_lookup_values_vl v,
per_people_v7 p1, per_people_v7 p2
where lookup_type = 'BIO_FIN_APPROVER_INFO'
and v.attribute3 = p1.person_id(+)
and v.attribute4 = p2.person_id(+)
order by lookup_code
How do I do it?
I have hard-coded the 2 join sets into one using UNION ALL, but if one record exists in both sets how would I differentiate between the 2 sets?
COUNT(*) will only give the total records.
Suppose there are 14 in total:
the first set gives 12 records,
the second set gives 4 records.
But I want only 14 records, which could be 12 from set 1 and 2 from set 2, since set 1 and set 2 can have common records.
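One common way to get that "set 1 wins on overlap" behavior is to guard the second branch of the UNION ALL with NOT EXISTS on the first set's key - a minimal sketch with placeholder names, since the real queries are long:
select key_col, payload from set1_query
union all
select key_col, payload
from set2_query s2
where not exists (select 1 from set1_query s1 where s1.key_col = s2.key_col);
The hard-coded UNION ALL version I have so far is below.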
SELECT
peo.email_address,
r.segment1 requistion_num,
to_char(l.line_num) line_num,
v.vendor_name supplier,
p.CONCATENATED_SEGMENTS category,
to_char(round((nvl(l.quantity, 0) * nvl(l.unit_price, 0))),'99,999,999,999.99'),
TO_CHAR(l.need_by_date,'MM/DD/YYYY') need_by_date,
pe.full_name requestor,
l.item_description,
pr.segment1 project_num,
t.task_number,
c.segment1,
c.segment2
FROM po_requisition_headers_all r,
po_requisition_lines_all l,
(SELECT project_id,task_id,code_combination_id, distribution_id,requisition_line_id,creation_date FROM
(SELECT project_id,task_id,code_combination_id,distribution_id,creation_date,requisition_line_id,ROW_NUMBER ()
OVER (PARTITION BY requisition_line_id ORDER BY requisition_line_id,distribution_id ) rn
FROM po_req_distributions_all pod) WHERE rn = 1) d,
gl_code_combinations c,
POR_CATEGORY_LOV_V p,
per_people_v7 pe,
PA_PROJECTS_ALL pr,
PA_TASKS_ALL_V t,
ap_vendors_v v,
WHERE d.creation_date >= nvl(to_date(:DATE_LAST_CHECKED,
'DD-MON-YYYY HH24:MI:SS'),SYSDATE-1)
AND
l.requisition_header_id = r.requisition_header_id
AND l.requisition_line_id = d.requisition_line_id
AND d.code_combination_id = c.code_combination_id
AND r.APPS_SOURCE_CODE = 'POR'
AND l.category_id = p.category_id
AND r.authorization_status IN ('IN PROCESS','PRE-APPROVED','APPROVED')
AND l.to_person_id = pe.person_id
AND pr.project_id(+) = d.project_id
AND t.project_id(+) = d.project_id
AND t.task_id(+) = d.task_id
AND v.vendor_id(+) = l.vendor_id
and r.requisition_header_id in(
SELECT requisition_header_id FROM po_requisition_lines_all pl
GROUP BY requisition_header_id HAVING SUM(nvl(pl.quantity,0) * nvl(pl.unit_price, 0)) >=100000)
group by
peo.email_address,
r.REQUISITION_HEADER_ID,
r.segment1 ,
to_char(l.line_num) ,
v.vendor_name,
p.CONCATENATED_SEGMENTS ,
to_char(round((nvl(l.quantity, 0) * nvl(l.unit_price, 0))),'99,999,999,999.99'),
TO_CHAR(l.need_by_date,'MM/DD/YYYY') ,
pe.full_name ,
l.item_description,
c.segment1,
c.segment2,
pr.segment1 ,
t.task_number
UNION ALL
SELECT
r.segment1 requistion_num,
to_char(l.line_num) line_num,
v.vendor_name supplier,
p.CONCATENATED_SEGMENTS category,
to_char(round((nvl(l.quantity, 0) * nvl(l.unit_price, 0))),'99,999,999,999.99'),
TO_CHAR(l.need_by_date,'MM/DD/YYYY') need_by_date,
pe.full_name requestor,
l.item_description,
pr.segment1 project_num,
t.task_number,
c.segment1,
c.segment2
FROM po_requisition_headers_all r,
po_requisition_lines_all l,
(SELECT project_id,task_id,code_combination_id, distribution_id,requisition_line_id,creation_date FROM
(SELECT project_id,task_id,code_combination_id,distribution_id,creation_date,requisition_line_id,ROW_NUMBER ()
OVER (PARTITION BY requisition_line_id ORDER BY requisition_line_id,distribution_id ) rn
FROM po_req_distributions_all pod) WHERE rn = 1) d,
gl_code_combinations c,
POR_CATEGORY_LOV_V p,
per_people_v7 pe,
PA_PROJECTS_ALL pr,
PA_TASKS_ALL_V t,
ap_vendors_v v,
fnd_lookup_values_vl flv,
per_people_v7 p1,
per_people_v7 p2
WHERE d.creation_date >= nvl(to_date('11-APR-2008',
'DD-MON-YYYY HH24:MI:SS'),SYSDATE-1)
AND
l.requisition_header_id = r.requisition_header_id
AND l.requisition_line_id = d.requisition_line_id
AND d.code_combination_id = c.code_combination_id
AND r.APPS_SOURCE_CODE = 'POR'
AND l.org_id = 141
AND l.category_id = p.category_id
AND r.authorization_status IN ('IN PROCESS','PRE-APPROVED','APPROVED')
AND l.to_person_id = pe.person_id
AND pr.project_id(+) = d.project_id
AND t.project_id(+) = d.project_id
AND t.task_id(+) = d.task_id
AND v.vendor_id(+) = l.vendor_id
AND flv.attribute1=c.segment1
AND flv.attribute2=c.segment2
AND flv.lookup_type = 'BIO_FIN_APPROVER_INFO'
and flv.attribute3 = p1.person_id(+)
and flv.attribute4 = p2.person_id(+)
and r.requisition_header_id in(
SELECT requisition_header_id FROM po_requisition_lines_all pl
GROUP BY requisition_header_id HAVING SUM(nvl(pl.quantity,0) * nvl(pl.unit_price, 0)) >=100000)
group by
r.REQUISITION_HEADER_ID,
r.segment1 ,
to_char(l.line_num) ,
v.vendor_name,
p.CONCATENATED_SEGMENTS ,
to_char(round((nvl(l.quantity, 0) * nvl(l.unit_price, 0))),'99,999,999,999.99'),
TO_CHAR(l.need_by_date,'MM/DD/YYYY') ,
pe.full_name ,
l.item_description,
c.segment1,
c.segment2,
pr.segment1 ,
t.task_number -
Performance Tuning Query on Large Tables
Hi All,
I am new to the forums and have a very specific use case which requires performance tuning, but there are some limitations on what changes I am actually able to make to the underlying data. Essentially I have two tables which contain what should be identical data, but for reasons of a less than optimal operational nature, the datasets differ in a number of ways.
Essentially I am querying call record detail data. Table 1 (referred to in my test code as TIME_TEST) is what I want to consider the master data, or the "ultimate truth" if you will. Table 1 contains the CALLED_NUMBER, which is always in a consistent format. It also contains the CALLED_DATE_TIME and DURATION (in seconds).
Table 2 (TIME_TEST_COMPARE) is a reconciliation table taken from a different source, but there are no consistent unique identifiers or PK-FK relations. This table contains a wide array of differing CALLED_NUMBER formats, hugely different from those in the master table. There is also scope that the time stamp may be out by up to 30 seconds - crazy, I know, but that's just the way it is and I have no control over the source of this data. Finally, the duration (in seconds) can be out by up to 5 seconds +/-.
I want to create a join returning all of the master data, matching the master table to the reconciliation table on CALLED_NUMBER / CALL_DATE_TIME / DURATION. I have written the query, which works from a logic perspective but performs very badly (master table = 200,000 records, rec table = 6,000,000+ records). I am able to add partitions (currently the tables are partitioned by month of CALL_DATE_TIME) and can also apply indexes. I cannot make any changes at this time to the ETL process loading the data into these tables.
I paste below the create table and insert scripts to recreate my scenario & the query that I am using. Any practical suggestions for query / table optimisation would be greatly appreciated.
Kind regards
Mike
-------------- NOTE: ALL DATA HAS BEEN DE-SENSITISED
/* --- CODE TO CREATE AND POPULATE TEST TABLES ---- */
--CREATE MAIN "TIME_TEST" TABLE: THIS TABLE HOLDS CALLED NUMBERS IN A SPECIFIED/PRE-DEFINED FORMAT
CREATE TABLE TIME_TEST ( CALLED_NUMBER VARCHAR2(50 BYTE),
CALLED_DATE_TIME DATE, DURATION NUMBER );
COMMIT;
-- CREATE THE COMPARISON TABLE "TIME_TEST_COMPARE": THIS TABLE HOLDS WHAT SHOULD BE (BUT ISN'T) IDENTICAL CALL DATA.
-- THE DATA CONTAINS DIFFERING NUMBER FORMATS, SLIGHTLY DIFFERENT CALL TIMES (ALLOW +/-60 SECONDS - THIS IS FOR A GOOD, ALBEIT UNHELPFUL, REASON)
-- AND DURATIONS (ALLOW +/- 5 SECS)
CREATE TABLE TIME_TEST_COMPARE ( CALLED_NUMBER VARCHAR2(50 BYTE),
CALLED_DATE_TIME DATE, DURATION NUMBER );
COMMIT;
--CREATE INSERT DATA FOR THE MAIN TEST TIME TABLE
INSERT INTO TIME_TEST ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'7721345675', TO_DATE( '11/09/2011 06:10:21 AM', 'MM/DD/YYYY HH:MI:SS AM'), 202);
INSERT INTO TIME_TEST ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'7721345675', TO_DATE( '11/09/2011 08:10:21 AM', 'MM/DD/YYYY HH:MI:SS AM'), 19);
INSERT INTO TIME_TEST ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'7721345675', TO_DATE( '11/09/2011 07:10:21 AM', 'MM/DD/YYYY HH:MI:SS AM'), 35);
INSERT INTO TIME_TEST ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'7721345675', TO_DATE( '11/09/2011 09:10:21 AM', 'MM/DD/YYYY HH:MI:SS AM'), 30);
INSERT INTO TIME_TEST ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'7721345675', TO_DATE( '11/09/2011 06:18:47 AM', 'MM/DD/YYYY HH:MI:SS AM'), 6);
INSERT INTO TIME_TEST ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'7721345675', TO_DATE( '11/09/2011 06:20:21 AM', 'MM/DD/YYYY HH:MI:SS AM'), 20);
COMMIT;
-- CREATE INSERT DATA FOR THE TABLE WHICH NEEDS TO BE COMPARED:
INSERT INTO TIME_TEST_COMPARE ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'7721345675', TO_DATE( '11/09/2011 06:10:51 AM', 'MM/DD/YYYY HH:MI:SS AM'), 200);
INSERT INTO TIME_TEST_COMPARE ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'00447721345675', TO_DATE( '11/09/2011 08:10:59 AM', 'MM/DD/YYYY HH:MI:SS AM'), 21);
INSERT INTO TIME_TEST_COMPARE ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'07721345675', TO_DATE( '11/09/2011 07:11:20 AM', 'MM/DD/YYYY HH:MI:SS AM'), 33);
INSERT INTO TIME_TEST_COMPARE ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'+447721345675', TO_DATE( '11/09/2011 09:10:01 AM', 'MM/DD/YYYY HH:MI:SS AM'), 33);
INSERT INTO TIME_TEST_COMPARE ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'+447721345675#181345', TO_DATE( '11/09/2011 06:18:35 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 6);
INSERT INTO TIME_TEST_COMPARE ( CALLED_NUMBER, CALLED_DATE_TIME,
DURATION ) VALUES (
'004477213456759777799', TO_DATE( '11/09/2011 06:19:58 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 17);
COMMIT;
/* --- QUERY TO UNDERTAKE MATCHING WHICH REQUIRES OPTIMISATION --------- */
SELECT MAIN.CALLED_NUMBER AS MAIN_CALLED_NUMBER, MAIN.CALLED_DATE_TIME AS MAIN_CALL_DATE_TIME, MAIN.DURATION AS MAIN_DURATION,
COMPARE.CALLED_NUMBER AS COMPARE_CALLED_NUMBER,COMPARE.CALLED_DATE_TIME AS COMPARE_CALLED_DATE_TIME,
COMPARE.DURATION COMPARE_DURATION
FROM
( SELECT CALLED_NUMBER, CALLED_DATE_TIME, DURATION
FROM TIME_TEST
) MAIN
LEFT JOIN
( SELECT CALLED_NUMBER, CALLED_DATE_TIME, DURATION
FROM TIME_TEST_COMPARE
) COMPARE
ON INSTR(COMPARE.CALLED_NUMBER, MAIN.CALLED_NUMBER) <> 0
AND MAIN.CALLED_DATE_TIME BETWEEN COMPARE.CALLED_DATE_TIME-(60/86400) AND COMPARE.CALLED_DATE_TIME+(60/86400)
-- durations are plain seconds, so the +/-5s tolerance should compare against COMPARE (the posted predicate compared MAIN.DURATION to itself, in day fractions)
AND MAIN.DURATION BETWEEN COMPARE.DURATION-5 AND COMPARE.DURATION+5;
What does your execution plan look like?
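For reference, a minimal way to capture that plan, simplified here to the INSTR condition alone:
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM TIME_TEST MAIN
LEFT JOIN TIME_TEST_COMPARE COMPARE
ON INSTR(COMPARE.CALLED_NUMBER, MAIN.CALLED_NUMBER) <> 0;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);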
-
I have some queries that use an inner join between a table with a few hundred rows and a table that will eventually have many millions of rows. The join is on an integer value that is part of the primary key on the larger table. The primary key
on said table consists of the integer and another field which is a BigInt (representing date/time to the millisecond). The query also has a predicate (where clause) with an exact match for the BigInt.
The query take about a second to execute at the moment but I was wondering whether I should expect a large increase in execution time as the years go by.
Is an inner join on the large table advisable?
By the way, the first field in the primary key is the integer followed by the BigInt, so any thought of selecting on the BigInt into temp table before attempting the join probably won't help.
R Campbell
Just in case anyone wants to see the full picture (which I am not actually expecting), this is a script for all the SQL objects involved.
The numbers of rows in the tables are:
Tags 5,000
NumericSamples millions (over time)
TagGroups 50
GroupTags 500
CREATE TABLE [dbo].[Tags](
[ID] [int] NOT NULL,
[TagName] [nvarchar](110) NOT NULL,
[Address] [nvarchar](80) NULL,
[DataTypeID] [smallint] NOT NULL,
[DatasourceID] [smallint] NOT NULL,
[Location] [nvarchar](4000) NULL,
[Properties] [nvarchar](4000) NULL,
[LastReadSampleTime] [bigint] NOT NULL,
[Archived] [bit] NOT NULL,
CONSTRAINT [Tags_ID_PK] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Tags] WITH NOCHECK ADD CONSTRAINT [Tags_DatasourceID_Datasources_ID_FK] FOREIGN KEY([DatasourceID])
REFERENCES [dbo].[Datasources] ([ID])
GO
ALTER TABLE [dbo].[Tags] CHECK CONSTRAINT [Tags_DatasourceID_Datasources_ID_FK]
GO
ALTER TABLE [dbo].[Tags] WITH NOCHECK ADD CONSTRAINT [Tags_DataTypeID_DataTypes_ID_FK] FOREIGN KEY([DataTypeID])
REFERENCES [dbo].[DataTypes] ([ID])
GO
ALTER TABLE [dbo].[Tags] CHECK CONSTRAINT [Tags_DataTypeID_DataTypes_ID_FK]
GO
ALTER TABLE [dbo].[Tags] ADD CONSTRAINT [DF_Tags_LastReadSampleTime] DEFAULT ((552877956000000000.)) FOR [LastReadSampleTime]
GO
ALTER TABLE [dbo].[Tags] ADD DEFAULT ((0)) FOR [Archived]
GO
CREATE TABLE [dbo].[NumericSamples](
[TagID] [int] NOT NULL,
[SampleDateTime] [bigint] NOT NULL,
[SampleValue] [float] NULL,
[QualityID] [smallint] NOT NULL,
CONSTRAINT [NumericSamples_TagIDSampleDateTime_PK] PRIMARY KEY CLUSTERED
(
[TagID] ASC,
[SampleDateTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[NumericSamples] WITH NOCHECK ADD CONSTRAINT [NumericSamples_QualityID_Qualities_ID_FK] FOREIGN KEY([QualityID])
REFERENCES [dbo].[Qualities] ([ID])
GO
ALTER TABLE [dbo].[NumericSamples] CHECK CONSTRAINT [NumericSamples_QualityID_Qualities_ID_FK]
GO
ALTER TABLE [dbo].[NumericSamples] WITH NOCHECK ADD CONSTRAINT [NumericSamples_TagID_Tags_ID_FK] FOREIGN KEY([TagID])
REFERENCES [dbo].[Tags] ([ID])
GO
ALTER TABLE [dbo].[NumericSamples] CHECK CONSTRAINT [NumericSamples_TagID_Tags_ID_FK]
GO
CREATE TABLE [dbo].[TagGroups](
[ID] [int] IDENTITY(1,1) NOT NULL,
[TagGroup] [varchar](50) NULL,
[Aggregates] [varchar](250) NULL,
[NumericData] [bit] NULL,
CONSTRAINT [PK_TagGroups] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TagGroups] ADD CONSTRAINT [DF_Tag_Groups_Aggregates] DEFAULT ('First') FOR [Aggregates]
GO
ALTER TABLE [dbo].[TagGroups] ADD CONSTRAINT [DF_TagGroups_NumericData] DEFAULT ((1)) FOR [NumericData]
GO
CREATE TABLE [dbo].[GroupTags](
[ID] [int] IDENTITY(1,1) NOT NULL,
[TagGroupID] [int] NULL,
[TagName] [varchar](150) NULL,
[ColumnName] [varchar](50) NULL,
[SortOrder] [int] NULL,
[TotalFactor] [float] NULL,
CONSTRAINT [PK_GroupTags] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[GroupTags] WITH CHECK ADD CONSTRAINT [FK_GroupTags_TagGroups] FOREIGN KEY([TagGroupID])
REFERENCES [dbo].[TagGroups] ([ID])
ON UPDATE CASCADE
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[GroupTags] CHECK CONSTRAINT [FK_GroupTags_TagGroups]
GO
ALTER TABLE [dbo].[GroupTags] ADD CONSTRAINT [DF_GroupTags_TotalFactor] DEFAULT ((1)) FOR [TotalFactor]
GO
CREATE VIEW [dbo].[vw_GroupTags]
AS
SELECT TOP (10000) dbo.TagGroups.TagGroup AS TableName, dbo.TagGroups.Aggregates AS SortOrder, dbo.GroupTags.SortOrder AS TagIndex, dbo.GroupTags.TagName,
dbo.Tags.ID AS TagId, dbo.TagGroups.NumericData, dbo.GroupTags.TotalFactor, dbo.GroupTags.ColumnName
FROM dbo.TagGroups INNER JOIN
dbo.GroupTags ON dbo.TagGroups.ID = dbo.GroupTags.TagGroupID INNER JOIN
dbo.Tags ON dbo.GroupTags.TagName = dbo.Tags.TagName
ORDER BY SortOrder, TagIndex
CREATE procedure [dbo].[GetTagTableValues]
@SampleDateTime bigint,
@TableName varchar(50),
@PadRows int = 0
as
BEGIN
DECLARE @i int
DECLARE @ResultSet table(TagName varchar(150), SampleValue float, ColumnName varchar(50), SortOrder int, TagIndex int)
set @i = 0
INSERT INTO @ResultSet
SELECT vw_GroupTags.TagName, NumericSamples.SampleValue, vw_GroupTags.ColumnName, vw_GroupTags.SortOrder, vw_GroupTags.TagIndex
FROM vw_GroupTags INNER JOIN NumericSamples ON vw_GroupTags.TagId = NumericSamples.TagID
WHERE (vw_GroupTags.TableName = @TableName) AND (NumericSamples.SampleDateTime = @SampleDateTime)
set @i = @@ROWCOUNT
if @i < @PadRows
BEGIN
WHILE @i < @PadRows
BEGIN
INSERT @ResultSet (TagName, SampleValue, ColumnName, SortOrder, TagIndex) VALUES ('', NULL, '', 0, 0)
set @i = @i + 1
END
END
select TagName, SampleValue, ColumnName, SortOrder, TagIndex
from @ResultSet
END
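As a side note on the original question: with the clustered key on (TagID, SampleDateTime), the join in GetTagTableValues does a seek per tag, so growth over the years should mostly just deepen the B-tree rather than slow the query dramatically. If it ever does become a problem, one option to consider (an untested assumption, not part of the schema above) is a nonclustered index that leads with the timestamp:
CREATE NONCLUSTERED INDEX IX_NumericSamples_SampleDateTime
ON dbo.NumericSamples (SampleDateTime, TagID)
INCLUDE (SampleValue);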
R Campbell -
Hi,
I built a query with 4 tables inside (loaded from an Oracle DB; two of them are quite big, more than millions of rows). After filtering, I tried to build relationships between the tables using the Table.Join formula. However, the process took an extremely long time to bring out results (I ended the process after 15 minutes). There was a status bar that kept updating while the query was processing (screenshot omitted). I suppose this is because query folding wasn't working, so PQ had to load all the data into local memory first and then do the operation, instead of doing all the work on the source system side. Am I right? If yes, is there any way to solve this issue?
Thanks.
Regards,
Qilong
Hi Curt,
Here's the query that I'm referring to:
let
Source = Oracle.Database("reporting"),
AOLOT_HISTS = Source{[Schema="GEN",Item="MVIEW$_AOLOT_HISTS"]}[Data],
WORK_WEEK = Source{[Schema="GEN",Item="WORK_WEEK"]}[Data],
DEVICES = Source{[Schema="GEN",Item="MVIEW$_DEVICES"]}[Data],
AO_LOTS = Source{[Schema="GEN",Item="MVIEW$_AO_LOTS"]}[Data],
Filter_WorkWeek = Table.SelectRows(WORK_WEEK, each ([WRWK_YEAR] = 2015) and (([WORK_WEEK] = 1) or ([WORK_WEEK] = 2) or ([WORK_WEEK] = 3))),
Filter_AlotHists = Table.SelectRows(AOLOT_HISTS, each ([STEP_NAME] = "BAKE" or [STEP_NAME] = "COLD TEST-IFLEX" or [STEP_NAME] = "COLD TEST-MFLEX") and ([OUT_QUANTITY] <> 0)),
#"Added Custom" = Table.AddColumn(Filter_AlotHists, "Custom", each Table.SelectRows(Filter_WorkWeek, (table2Row) => [PROCESS_END_TIME] >= table2Row[WRWK_START_DATE] and [PROCESS_END_TIME] <= table2Row[WRWK_END_DATE])),
#"Expand Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"WRWK_YEAR", "WORK_WEEK", "WRWK_START_DATE", "WRWK_END_DATE"}, {"WRWK_YEAR", "WORK_WEEK",
"WRWK_START_DATE", "WRWK_END_DATE"}),
Filter_AolotHists_byWeek = Table.SelectRows(#"Expand Custom", each ([WORK_WEEK] <> null)),
SelectColumns_AolotHists = Table.SelectColumns(Filter_AolotHists_byWeek,{"ALOT_NUMBER", "STEP_NAME", "PROCESS_START_TIME", "PROCESS_END_TIME", "START_QUANTITY", "OUT_QUANTITY", "REJECT_QUANTITY",
"WRWK_FISCAL_YEAR", "WRWK_WORK_WEEK_NO"}),
Filter_Devices= Table.SelectRows(DEVICES, each ([DEPARTMENT] = "TEST1")),
SelectColumns_Devices = Table.SelectColumns(Filter_Devices,{"DEVC_NUMBER", "PCKG_CODE"}),
Filter_AoLots = Table.SelectRows(AO_LOTS, each Text.Contains([DEVC_NUMBER], "MC09XS3400AFK") or Text.Contains([DEVC_NUMBER], "MC09XS3400AFKR2") or Text.Contains([DEVC_NUMBER], "MC10XS3412CHFK") or Text.Contains([DEVC_NUMBER],
"MC10XS3412CHFKR2")),
SelectColumns_AoLots = Table.SelectColumns(Filter_AoLots,{"ALOT_NUMBER", "DEVC_NUMBER", "TRACECODE", "WAFERLOTNUMBER"}),
TableJoin = Table.Join(SelectColumns_AolotHists, "ALOT_NUMBER", Table.PrefixColumns(SelectColumns_AoLots, "AoLots"), "AoLots.ALOT_NUMBER"),
TableJoin1 = Table.Join(TableJoin, "AoLots.DEVC_NUMBER", Table.PrefixColumns(SelectColumns_Devices, "Devices"), "Devices.DEVC_NUMBER")
in
TableJoin1
Could you please give me some hints why it needs so long to process?
Thanks. -
Joining two large tables breaks connections?
I am doing an inner join on two large tables (172,818 and 146,215 rows) and it breaks the connection. Using Oracle 8.1.7.0.0.
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Originally I was trying to do an ALTER TABLE to add constraints, and it gave the error as well.
ALTER TABLE a ADD (CONSTRAINT a_FK
FOREIGN KEY (a_ID, a_VERSION)
REFERENCES b(b_ID, b_VERSION)
DEFERRABLE INITIALLY IMMEDIATE)
Also gives the same error. The trace file does not make sense to me.
Thanks for the reply, no luck yet.
SQL> show parameter optimizer_max_permutations ;
NAME TYPE VALUE
optimizer_max_permutations integer 80000
SQL> show parameter resource_limit ;
NAME TYPE VALUE
resource_limit boolean FALSE
SQL> -
How to NOT return MSQuery results in a table in Excel 2010
In Excel 2010, I only have the options to return my query results in a table, pivot table, or both. If I convert the resulting table to a range, I lose the query. Why am I restricted to using a table?
I want to return the results into a spreadsheet and retain the query like I could in 2003. I want to be able to layout and format as I want. I want to be able to enter criteria in a cell and have the query return the proper results.
I am querying via ODBC sources against SQL and Redbrick databases. I was able to do this in 2003. Is the option still available? Am I not finding it?
Hi Bet_T,
What do you have against getting the results in a table as opposed to a non-formatted range? There are quite a few advantages a table has over a regular range of cells, like automatic expansion when used as the source of a chart or a pivot table.
You should still be able to add parameters to cells.
Regards, Jan Karel Pieterse|Excel MVP|http://www.jkp-ads.com -
Filter item limits - search not returning any results with a large number of elements, otherwise OK
Hi,
We are working through a problem we've encountered with Azure Search. We are building a filter string of the form "id eq 'xxx' or id eq 'ccc' or id eq 'vvv'", and so on. The ids are provided in a collection and we loop through, building the string until it's ready to apply.
We are using 2015-02-28 preview at the moment.
We are encountering a situation where, after approximately 20 ids, Azure Search doesn't return any results, nor does it return any error code. I'm pretty sure that the URL length is less than 8K.
Is there any limit on the number of filter elements in a query?
We followed up offline.
The symptom in this case was a 200 response with no body. The underlying cause is a URL parsing bug that tries to interpret colons in the query string as the delimiter of a URL scheme (like https:), but with a hard length limit of 1KB. We will work
on a fix for both the underlying URL parsing issue and the issue that caused it to surface as a body-less 200.
In the meantime, the workaround is to put colons as close to the beginning of the URL query string as possible. Specifically, putting $filter and facets first, and putting expressions with colons within those first, will mitigate this in most cases.
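For illustration, a hypothetical request shaped the way this workaround suggests, with $filter first in the query string (service and index names are made up, and the api-version is assumed to match the preview mentioned above):
GET https://myservice.search.windows.net/indexes/myindex/docs?$filter=id eq 'xxx' or id eq 'ccc'&search=*&api-version=2015-02-28-Preview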
Note that the .NET SDK puts $filter and facets near the beginning of the query string by default, so if you're consuming Azure Search you might want to give it a try:
http://www.nuget.org/packages/Microsoft.Azure.Search/