Improving Performance of Group By clause
I'm a developer who needs to create an aggregate, or roll-up, of a large table (tens of millions of rows) using a GROUP BY clause. One of the several columns I am grouping by is a numeric column called YEAR. My DBA recommended I create an index on YEAR to improve the performance of the GROUP BY. I have read that indexes are only used when referenced in the WHERE clause, which I do not need. Will my DBA's recommendation help? Can you recommend a technique? Thank you.
When you select millions of rows, grouped or not, the database has to fetch
each of them, so an index on the grouping column isn't useful.
If you have a performance problem that cannot be solved through an index on
columns used in your WHERE clause, a materialized view built on the
dimension(s) of your GROUP BY clause may help.
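For example, a materialized view along these lines can precompute the roll-up once, so the original GROUP BY query no longer scans tens of millions of rows. This is only a sketch: the table SALES_FACT and the columns REGION and AMOUNT are hypothetical stand-ins for your own schema; only YEAR comes from the question.

```sql
-- Sketch: precompute the roll-up once; refresh on your own schedule.
-- sales_fact, region and amount are hypothetical names.
CREATE MATERIALIZED VIEW sales_rollup_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT year,
       region,
       SUM(amount) AS total_amount,
       COUNT(*)    AS row_cnt
FROM   sales_fact
GROUP  BY year, region;
```

With ENABLE QUERY REWRITE (and QUERY_REWRITE_ENABLED set in the session or instance), the optimizer can transparently answer the original GROUP BY query from the small precomputed view instead of the base table.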
Similar Messages
-
Improving performance while adding groups
Hello,
I've been monitoring my Crystal reports for a week or so and report performance has deteriorated badly. Let me explain in a little detail. I have created 3 groups to handle dynamic parameters, and each group has a formula of its own. In my parameters I have added one parameter with 7 entities (hard coded); a user can select any 3 of those seven when initially refreshing the document. Each parameter entity is bundled in a conditional formula (listed under formula fields), so the user may select any entity and get the respective data for that entity.
For all this I have created 3 groups, and the same formula is pasted under all 3 groups. I have then made the formula group the one selected under Group Expert. The report works fine and yields correct data. However, while grouping on the formulas, Crystal selects all the database tables from the database fields, as these tables are mentioned under the group formula. Agreed, all fine so far.
But when I run the report, "Show SQL Query" lists all the database tables under the SELECT clause, which should not be the case. Because of this, even if I have selected an entity with only 48 to 50 records, Crystal tends to select all 16,56,053 records from the database, which is hampering performance big time. When I run the same query in SQL it retrieves the data in just 8 seconds, but with Crystal selecting all the records it returns data after 90 seconds, which is frustrating for the user.
Please suggest me a workaround for this. Please help.
Thank you.
Hi,
I suspect the problem isn't necessarily just your grouping but your Record Selection Formula as well. If you do not see a complete WHERE clause, it is because your Record Selection Formula is too complicated for Crystal to translate to SQL.
The same would be said for your grouping. There are two suggestions I can offer:
1) Instead of linking the tables in Crystal, use a SQL Command and generate your query in SQL directly. You can use parameters and at the very least, get a working WHERE clause.
2) Create a Stored Procedure or view that can use the logic you need to retrieve the records.
At the very least you want to streamline the query to improve performance. Pushing the grouping down may not be possible, but my guess is the issue lies more with the selection formula than with the grouping.
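As a sketch of option 1, a Crystal SQL Command with parameter placeholders pushes the entity filter to the database server, so only the matching rows come back to Crystal. All table and column names here are hypothetical; {?Entity1} etc. are Crystal parameter placeholders you would define in the Command editor.

```sql
-- Sketch of a Crystal "SQL Command": filter server-side
-- instead of in report formulas. transactions, entity_code,
-- txn_date and amount are hypothetical names.
SELECT t.entity_code,
       t.txn_date,
       t.amount
FROM   transactions t
WHERE  t.entity_code IN ({?Entity1}, {?Entity2}, {?Entity3});
```

Because the WHERE clause now lives in the generated SQL, selecting an entity with 48-50 records fetches only those rows rather than the whole table.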
Good luck,
Brian -
Multiple log groups per thread to improve performance with high redo writes
I am reading Pro Oracle 10g RAC on Linux (good book). On p.35 the authors state that they recommend 3-5 redo log groups per thread if there is a "large" amount of redo.
How does having more redo log groups improve performance? Does Oracle parallelize the writes?
Redo logs are configured per instance; from experience you need at least 3 redo log groups per thread to allow switchover and sufficient time for archiving to complete before the first redo log group is reused. When there is heavy redo log activity there is a potential that redo log groups will switch more often, and it is important that archiving has completed before an existing redo log group can be reused, else the database/instance may hang.
I think that is what the author is referencing here: have sufficient redo log groups (based on the activity of your environment) to allow switching while leaving enough time for archiving to complete. -
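As a sketch, extra groups per thread are added with ALTER DATABASE; the file paths, group numbers and the 512M size below are hypothetical and should match your own environment.

```sql
-- Sketch: add a 4th and 5th multiplexed redo log group to thread 1
-- (instance 1 in a RAC database). Paths and sizes are hypothetical.
ALTER DATABASE ADD LOGFILE THREAD 1
  GROUP 4 ('/u01/oradata/orcl/redo04a.log',
           '/u02/oradata/orcl/redo04b.log') SIZE 512M;

ALTER DATABASE ADD LOGFILE THREAD 1
  GROUP 5 ('/u01/oradata/orcl/redo05a.log',
           '/u02/oradata/orcl/redo05b.log') SIZE 512M;
```

More groups do not parallelize writes; they simply give the archiver more time to finish with a group before LGWR needs to reuse it, avoiding "checkpoint not complete" / archiver waits.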
Improving performance with IN clause
We use a lot of those IN clauses, for good or bad, and I am trying to improve their performance.
I have looked at the documentation several times and can't seem to find a way to bind the values in an IN clause. Is there anything else that can be done to improve IN clause performance in OCI?
Thanks a lot
Hi,
You can refer to the following URL on asktom website for detailed explanation about IN & Exists
http://asktom.oracle.com/pls/ask/f?p=4950:8:3465613697817080707::NO::F4950_P8_DISPLAYID,F4950_P8_B:953229842074,Y
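One commonly cited workaround (discussed at length on asktom) is to bind a single collection instead of a variable-length IN list, then unnest it with TABLE(). This sketch assumes Oracle's built-in ODCI collection type; the orders table and customer_id column are hypothetical.

```sql
-- Sketch: one bind variable (:ids) carries the whole list, so the
-- statement text is stable and the cursor is shared regardless of
-- how many values the caller supplies.
-- sys.odcinumberlist is a built-in VARRAY OF NUMBER.
SELECT o.*
FROM   orders o
WHERE  o.customer_id IN
       (SELECT t.column_value
        FROM   TABLE(CAST(:ids AS sys.odcinumberlist)) t);
```

From OCI you bind :ids once as a collection, instead of generating a new statement (and a new hard parse) for every distinct list length.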
HTH
Cheers,
Giridhar Kodakalla -
How to include a CASE statement in the GROUP BY clause
Hi, I have a question:
How do I include a case statement in the GROUP BY clause?
For example:
Select
(case when x.ctry is null then y.ctry else x.ctry end) as coo,
sum (x.in_amt)
from
tbl1 x,
tbl2 y
where
x.id = y.id
group by
(case when x.ctry is null then y.ctry else x.ctry end)
Assume I have millions of records in both tables; my guess is the above query might take a huge amount of time to complete.
Any alternative method to do this?
cd/ wrote:
To remove the expression from the GROUP BY clause. I didn't advocate any performance improvements, did I?
No you didn't. And your advice can indeed remove the expression from the GROUP BY clause. But I'm still puzzled as to why that would be a goal in itself.
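For completeness, the rewrite under discussion can be sketched as an inline view, so the expression is computed once and the outer query groups by a plain alias. This is equivalent to the original query (COALESCE is interchangeable with the CASE expression here); it mainly improves readability rather than speed, since the optimizer typically merges the view anyway.

```sql
-- Sketch: evaluate the CASE/COALESCE once, then group by the alias.
SELECT coo,
       SUM(in_amt)
FROM (
  SELECT COALESCE(x.ctry, y.ctry) AS coo,
         x.in_amt
  FROM   tbl1 x,
         tbl2 y
  WHERE  x.id = y.id
)
GROUP BY coo;
```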
Regards,
Rob. -
Can I use an analytic function instead of a GROUP BY clause?
Can I use an analytic function instead of a GROUP BY clause? Will this give any performance improvement?
Analytics can sometimes avoid scanning the table more than once:
SQL> select ename, sal, (select sum(sal) from emp where deptno=e.deptno) sum from emp e;
ENAME SAL SUM
SMITH 800 10875
ALLEN 1600 9400
WARD 1250 9400
JONES 2975 10875
MARTIN 1250 9400
BLAKE 2850 9400
CLARK 2450 8750
SCOTT 3000 10875
KING 5000 8750
TURNER 1500 9400
ADAMS 1100 10875
JAMES 950 9400
FORD 3000 10875
MILLER 1300 8750
14 rows selected.
Execution Plan
Plan hash value: 3189885365
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14 | 182 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| EMP | 5 | 35 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 14 | 182 | 3 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - filter("DEPTNO"=:B1)
which could be rewritten as
SQL> select ename, sal, sum(sal) over (partition by deptno) sum from emp e;
ENAME SAL SUM
CLARK 2450 8750
KING 5000 8750
MILLER 1300 8750
JONES 2975 10875
FORD 3000 10875
ADAMS 1100 10875
SMITH 800 10875
SCOTT 3000 10875
WARD 1250 9400
TURNER 1500 9400
ALLEN 1600 9400
JAMES 950 9400
BLAKE 2850 9400
MARTIN 1250 9400
14 rows selected.
Execution Plan
Plan hash value: 1776581816
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14 | 182 | 4 (25)| 00:00:01 |
| 1 | WINDOW SORT | | 14 | 182 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 182 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Well, there is no GROUP BY and no visible performance enhancement in my example; but in Oracle7, you would have had to write the query as:
SQL> select ename, sal, sum from emp e,(select deptno,sum(sal) sum from emp group by deptno) s where e.deptno=s.deptno;
ENAME SAL SUM
SMITH 800 10875
ALLEN 1600 9400
WARD 1250 9400
JONES 2975 10875
MARTIN 1250 9400
BLAKE 2850 9400
CLARK 2450 8750
SCOTT 3000 10875
KING 5000 8750
TURNER 1500 9400
ADAMS 1100 10875
JAMES 950 9400
FORD 3000 10875
MILLER 1300 8750
14 rows selected.
Execution Plan
Plan hash value: 2661063502
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14 | 546 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 14 | 546 | 8 (25)| 00:00:01 |
| 2 | VIEW | | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 14 | 182 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
So maybe it helps. -
In evaluating several query plans, I've discovered several sqlstop messages that read:
<groupBy preclustered="true" sqlstop="Cannot generate SQL for the 'group-by' clause because it is not equivalent to the relational GROUP BY clause" stable="true">
I've examined the XQuery code and find no group clause, and I can't find a reference to this message in the documentation or by googling the text. I assume the grouping is being done implicitly by ODSI, since it is not in the XQuery source. I have tried restructuring the code to no avail.
My question is -- is this causing me a performance problem since the SQL is not generated, and if so, what steps must I take in the XQuery structure to avoid this issue?
Many thanks to anyone who can provide some insight into this.
Regards,
PB
"is this causing me a performance problem since the sql is not generated"
My question is - why are you asking this question? :) Do you have a performance problem? The "Best Practices" posted as an announcement in this forum might help. Otherwise engage customer support.
To answer your question - your xquery likely generates nested/hierarchical xml - and looks something like below - which can be implemented with a sql left-outer-join ordered by CUSTOMER_ID of the customer-table, and taking the rows returned and ... grouping by CUSTOMER_ID to eliminate the duplicate customer information. So there's your group-by. But you cannot write sql that has a left-outer join, and then a group-by on the left-hand side. So the group by is done in the engine. Since the results are already sorted, the group-by in the engine simply skips over the duplicates (i.e. it's basically free).
for $c in CUSTOMER()
where $c/LAST_NAME = $lastname
return
<CUSTOMER>
... $c/CUSTOMER_ID ... (: in a left-outer-join, the CUSTOMER_ID is duplicated for every ORDER :)
{ for $o in ORDER()
where $o/CUSTOMER_ID eq $c/CUSTOMER_ID
return ... }
</CUSTOMER> -
How to improve performance of the attached query
Hi,
How can I improve the performance of the query below? Please help; the explain plan is also attached. -
SELECT Camp.Id,
rCam.AccountKey,
Camp.Id,
CamBilling.Cpm,
CamBilling.Cpc,
CamBilling.FlatRate,
Camp.CampaignKey,
Camp.AccountKey,
CamBilling.billoncontractedamount,
(SUM(rCam.Impressions) * 0.001 + SUM(rCam.Clickthrus)) AS GR,
rCam.AccountKey as AccountKey
FROM Campaign Camp, rCamSit rCam, CamBilling, Site xSite
WHERE Camp.AccountKey = rCam.AccountKey
AND Camp.AvCampaignKey = rCam.AvCampaignKey
AND Camp.AccountKey = CamBilling.AccountKey
AND Camp.CampaignKey = CamBilling.CampaignKey
AND rCam.AccountKey = xSite.AccountKey
AND rCam.AvSiteKey = xSite.AvSiteKey
AND rCam.RmWhen BETWEEN to_date('01-01-2009', 'DD-MM-YYYY') and
to_date('01-01-2011', 'DD-MM-YYYY')
GROUP By rCam.AccountKey,
Camp.Id,
CamBilling.Cpm,
CamBilling.Cpc,
CamBilling.FlatRate,
Camp.CampaignKey,
Camp.AccountKey,
CamBilling.billoncontractedamount
Explain Plan :-
Description Object_owner Object_name Cost Cardinality Bytes
SELECT STATEMENT, GOAL = ALL_ROWS 14 1 13
SORT AGGREGATE 1 13
VIEW GEMINI_REPORTING 14 1 13
HASH GROUP BY 14 1 103
NESTED LOOPS 13 1 103
HASH JOIN 12 1 85
TABLE ACCESS BY INDEX ROWID GEMINI_REPORTING RCAMSIT 2 4 100
NESTED LOOPS 9 5 325
HASH JOIN 7 1 40
SORT UNIQUE 2 1 18
TABLE ACCESS BY INDEX ROWID GEMINI_PRIMARY SITE 2 1 18
INDEX RANGE SCAN GEMINI_PRIMARY SITE_I0 1 1
TABLE ACCESS FULL GEMINI_PRIMARY SITE 3 27 594
INDEX RANGE SCAN GEMINI_REPORTING RCAMSIT_I 1 1 5
TABLE ACCESS FULL GEMINI_PRIMARY CAMPAIGN 3 127 2540
TABLE ACCESS BY INDEX ROWID GEMINI_PRIMARY CAMBILLING 1 1 18
INDEX UNIQUE SCAN GEMINI_PRIMARY CAMBILLING_U1 0 1
duplicate thread..
How to improve performance of attached query -
Alternate for inner join to improve performance
Hi all,
I have used an inner join query to fetch data from five different tables into an internal table with where clause conditions.
The execution time is almost 5-6 minutes for this particular query (I have a lot of data in all five DB tables - more than 10 million records in each).
Is there any alternative to the inner join that would improve performance?
TIA.
Regards,
Karthik
Hi All,
Thanks for all your interest.
SELECT a~object_id a~description a~descr_language
a~guid AS object_guid a~process_type
a~changed_at
a~created_at AS created_timestamp
a~zzorderadm_h0207 AS cpid
a~zzorderadm_h0208 AS submitter
a~zzorderadm_h0303 AS cust_ref
a~zzorderadm_h1001 AS summary
a~zzorderadm_h1005 AS summary_uc
a~zzclose_date AS clsd_date
d~stat AS status
f~priority
FROM crmd_orderadm_h AS a INNER JOIN crmd_link AS b ON a~guid = b~guid_hi
INNER JOIN crmd_partner AS c ON b~guid_set = c~guid
INNER JOIN crm_jest AS d ON objnr = a~guid
INNER JOIN crmd_activity_h AS f ON f~guid = a~guid
INTO CORRESPONDING FIELDS OF TABLE et_service_request_list
WHERE process_type IN lt_processtyperange
AND a~created_at IN lt_daterange
AND partner_no IN lr_partner_no
AND stat IN lt_statusrange
AND object_id IN lt_requestnumberrange
AND zzorderadm_h0207 IN r_cpid
AND zzorderadm_h0208 IN r_submitter
AND zzorderadm_h0303 IN r_cust_ref
AND zzorderadm_h1005 IN r_trans_desc
AND d~inact = ' '
AND b~objtype_hi = '05'
AND b~objtype_set = '07'.
How to tune this query to improve performance?
Hi All,
How can I tune this query to improve performance?
select a.claim_number,a.pay_cd,a.claim_occurrence_number,
case
when sum(case
when a.payment_status_cd ='0'
then a.payment_est_amt
else 0
end
)=0
then 0
else (sum(case
when a.payment_status_cd='0'and a.payment_est_amt > 0
then a.payment_est_amt
else 0
end)
- sum(case
when a.payment_status_cd<>'0'
then a.payment_amt
else 0
end))
end as estimate
from ins_claim_payment a
where a.as_of_date between '31-jan-03' and '30-aug-06'
and ( a.data_source = '25' or (a.data_source between '27' and '29'))
and substr(a.pay_cd,1,1) IN ('2','3','4','8','9')
group by a.claim_number, a.pay_cd, a.claim_occurrence_number
Thank you,
Mcka
Mcka,
As well as the EXPLAIN PLAN, let us know what proportion of rows is visited by this query. It may be that it is not using a full table scan when it should (or vice versa).
And of course we'd need to know what indexes are available, and how selective they are for the predicates you have in this query ...
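If the substr predicate turns out to be selective, one option to consider (a sketch, assuming Oracle and the table from the question) is a function-based index on the exact expression, so the filter can be resolved without a full scan. The index name is hypothetical.

```sql
-- Sketch: index the exact expression used in the predicate
--   substr(a.pay_cd,1,1) IN ('2','3','4','8','9')
CREATE INDEX ins_claim_pay_cd_fbi
  ON ins_claim_payment (SUBSTR(pay_cd, 1, 1));
```

Whether the optimizer actually uses it depends on selectivity: matching 5 of roughly 10 possible leading characters, it may still prefer a full scan, which is why the questions about row proportions above matter.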
Regards Nigel -
Improve performance of a SELECT in a parameterized cursor.
DECLARE
CURSOR cur_inv_bal_ship_from(
l_num_qty_multiplier gfstmr4_eop_transaction_type.
inventory_multiplier_num%TYPE,
l_str_inventory_type gfstmr9_eop_txn_rule.inventory_type_code%TYPE,
l_str_type_code gfstmr9_eop_txn_rule.txn_type_code%TYPE)IS
SELECT /*+ USE_NL(EPP PI EPC) */ epc.currency_code,
SUM(ROUND(l_num_qty_multiplier * pi.inventory_qty * epc.cost_amt,2)) cost_amt
FROM gfstm62_eop_plant_part epp,
gfstm64_plant_inventory pi,
gfstm60_eop_part_cost epc
WHERE epp.gsdb_site_code = i_str_gsdb_site_code
AND epp.end_of_period_date = i_dt_end_of_period_date
AND pi.inventory_type_code = l_str_inventory_type
AND pi.txn_type_code = l_str_type_code
AND pi.gsdb_shipped_from_code = i_str_gsdb_site_code
AND epc.rate_set_code = i_str_rate_set_code
AND epc.financial_element_type_code = i_str_financial_element_code
AND pi.plant_eop_part_sakey = epp.eop_plant_part_sakey
AND pi.plant_inventory_sakey = epc.plant_inventory_sakey
GROUP BY currency_code;
BEGIN
FOR l_num_index IN i_tab_inv_txn_rule.FIRST .. i_tab_inv_txn_rule.LAST
LOOP
--Checking for ship from flag equal to 'Y'
IF i_tab_inv_txn_rule(l_num_index).ship_from_flag = g_con_y THEN
--Looping through ship from cursor
FOR l_rec_inv_bal_from IN cur_inv_bal_ship_from(
i_tab_inv_txn_rule(l_num_index).qty_multiplier_num,
i_tab_inv_txn_rule(l_num_index).inventory_type_code,
i_tab_inv_txn_rule(l_num_index).txn_type_code)
LOOP
--Incrementing index value
l_num_index1 := (l_num_index1 + 1);
--Assigning cursor values to PLSQL table
l_tab_inv_bal(l_num_index1).currency_code :=
l_rec_inv_bal_from.currency_code;
l_tab_inv_bal(l_num_index1).cost_amt :=
l_rec_inv_bal_from.cost_amt;
--Loop closing for ship from cursor
END LOOP;
END IF;
END LOOP;
END;
The SELECT query in the parameterized cursor is taking a long time. Below is a link where I have shown the trace. Please let me know how to improve performance.
http://performancetuning1978.blogspot.com/p/performance-tuning.html
thanks,
Vinodh
Hello,
your performance-tuning picture doesn't say much. What do your tables look like, how many rows, and which Oracle version?
Why do you use the nested-loops hint? -
Hi everyone,
Does TopLink support the GROUP BY clause? If yes, how?
TopLink supports group by when performing projections using ReportQuery.
See JavaDocs for ReportQuery.addGrouping(...).
Doug -
Improve performance of an inline view query
All,
I have a unique situation where I have to limit the number of rows based on a group, so I created an inline view and applied a limit on it.
ex:
SELECT col_A, col_B FROM ( SELECT col_a, count(*) FROM tab_a WHERE col_a = 'XXX' GROUP BY col_a) ROWNUM <10.
But this design was rejected, because the inline view is thought to have a great impact on performance.
Also, I can't set a row limit directly on the query because a GROUP BY clause is used in the select.
When ROWNUM is applied directly in the WHERE, it first limits the rows and then does the GROUP, so when the user asks for 10 records it may show fewer than 10 rows because of the grouping.
Please help find an alternative solution that yields the expected result set without losing performance.
Hi,
The SQL you gave us is not valid. There is no "col_b" in your inline view, there is no "where" before "rownum<10", and the inline view returns only one row.
Try to produce a reproducible scenario with scott.emp and scott.dept generated by $ORACLE_HOME/rdbms/admin/utlsampl.sql
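For reference, the usual pattern is to apply ROWNUM outside the aggregation, after an ORDER BY in the inline view, so the grouping happens first and exactly N groups are returned. A sketch against the hypothetical tab_a from the question (with the WHERE dropped so more than one group exists):

```sql
-- Sketch: group first, order the groups, then cut to 10 output rows.
SELECT col_a,
       cnt
FROM (
  SELECT   col_a,
           COUNT(*) AS cnt
  FROM     tab_a
  GROUP BY col_a
  ORDER BY col_a
)
WHERE ROWNUM <= 10;
```

The inline view here is generally not a performance problem in itself: Oracle stops fetching from it as soon as 10 rows have passed the ROWNUM filter.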
Regards
Laurent -
Improving Performance of Dynamic SQL in Pro*C
I am using dynamic SQL format 3 in Pro*C, i.e.
prepare sqlstmt from :sql_stmt
declare cursor c1 for sqlstmt
But the query is slow. I am also using ORDER BY with ltrim(rtrim(colname)) and sometimes GROUP BY.
Can anyone tell me how to improve the performance of cursors in dynamic SQL, or how I can do the same in PL/SQL while simultaneously writing the rows returned to a file?
Manoj,
Typically, the slow performance is due to the SQL statement itself rather than the dynamic SQL execution code in Pro*C.
Check your explain plan to see if the query performs well. I suspect that you will find your problem here.