Improving Performance of Group By clause
I'm a developer who needs to create an aggregate, or roll-up, of a large table (tens of millions of rows) using a GROUP BY clause. One of the several columns I am grouping by is a numeric column called YEAR. My DBA recommended I create an index on YEAR to improve the performance of the GROUP BY. I have read that indexes are only used when referenced in the WHERE clause, which I do not need. Will my DBA's recommendation help? Can you recommend a technique? Thank you.
When you select millions of rows, grouped or not, the database has to fetch
each of them, so an index on the grouping column isn't useful.
If you have a performance problem that cannot be solved through an index on
columns used in your WHERE clause, a materialized view built on the
dimension(s) of your GROUP BY clause may help.
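For example, a materialized view along these lines can precompute the roll-up once, so the original GROUP BY query no longer scans tens of millions of rows. This is only a sketch: the table SALES_FACT and the columns REGION and AMOUNT are hypothetical stand-ins for your own schema; only YEAR comes from the question.

```sql
-- Sketch: precompute the roll-up once; refresh on your own schedule.
-- sales_fact, region and amount are hypothetical names.
CREATE MATERIALIZED VIEW sales_rollup_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT year,
       region,
       SUM(amount) AS total_amount,
       COUNT(*)    AS row_cnt
FROM   sales_fact
GROUP  BY year, region;
```

With ENABLE QUERY REWRITE (and QUERY_REWRITE_ENABLED set in the session or instance), the optimizer can transparently answer the original GROUP BY query from the small precomputed view instead of the base table.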
Similar Messages
-
Improving performance while adding groups
Hello,
I've been monitoring my Crystal reports for a week or so and report performance has deteriorated badly. Let me explain in a little detail. I have created 3 groups to handle dynamic parameters, and each group has a formula of its own. In my parameters I have added one parameter with 7 entities (hard coded); a user can select any 3 of those seven when initially refreshing the document. Each parameter entity is bundled in a conditional formula (listed under formula fields), so the user may select any entity and get the respective data for that entity.
For all this I have created 3 groups, and the same formula is pasted under all 3 groups. I have then made the formula group the one selected under Group Expert. The report works fine and yields correct data. However, while grouping on the formulas, Crystal selects all the database tables from the database fields, as these tables are mentioned under the group formula. Agreed, all fine so far.
But when I run the report, "Show SQL Query" lists all the database tables under the SELECT clause, which should not be the case. Because of this, even if I have selected an entity with only 48 to 50 records, Crystal tends to select all 16,56,053 records from the database, which is hampering performance big time. When I run the same query in SQL it retrieves the data in just 8 seconds, but with Crystal selecting all the records it returns data after 90 seconds, which is frustrating for the user.
Please suggest me a workaround for this. Please help.
Thank you.
Hi,
I suspect the problem isn't necessarily just your grouping but your Record Selection Formula as well. If you do not see a complete WHERE clause, it is because your Record Selection Formula is too complicated for Crystal to translate to SQL.
The same would be said for your grouping. There are two suggestions I can offer:
1) Instead of linking the tables in Crystal, use a SQL Command and generate your query in SQL directly. You can use parameters and at the very least, get a working WHERE clause.
2) Create a Stored Procedure or view that can use the logic you need to retrieve the records.
At the very least you want to streamline the query to improve performance. Pushing the grouping down may not be possible, but my guess is the issue lies more with the selection formula than with the grouping.
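As a sketch of option 1, a Crystal SQL Command with parameter placeholders pushes the entity filter to the database server, so only the matching rows come back to Crystal. All table and column names here are hypothetical; {?Entity1} etc. are Crystal parameter placeholders you would define in the Command editor.

```sql
-- Sketch of a Crystal "SQL Command": filter server-side
-- instead of in report formulas. transactions, entity_code,
-- txn_date and amount are hypothetical names.
SELECT t.entity_code,
       t.txn_date,
       t.amount
FROM   transactions t
WHERE  t.entity_code IN ({?Entity1}, {?Entity2}, {?Entity3});
```

Because the WHERE clause now lives in the generated SQL, selecting an entity with 48-50 records fetches only those rows rather than the whole table.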
Good luck,
Brian -
Multiple log groups per thread to improve performance with high redo writes
I am reading Pro Oracle 10g RAC on Linux (good book). On p.35 the authors state that they recommend 3-5 redo log groups per thread if there is a "large" amount of redo.
How does having more redo log groups improve performance? Does Oracle parallelize the writes?
Redo logs are configured per instance; from experience you need at least 3 redo log groups per thread to allow switchover and sufficient time for archiving to complete before the first redo log group is reused. When there is heavy redo log activity there is a potential that redo log groups will switch more often, and it is important that archiving has completed before an existing redo log group can be reused, else the database/instance may hang.
I think that is what the author is referencing here: have sufficient redo log groups (based on the activity of your environment) to allow switching while leaving enough time for archiving to complete. -
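As a sketch, extra groups per thread are added with ALTER DATABASE; the file paths, group numbers and the 512M size below are hypothetical and should match your own environment.

```sql
-- Sketch: add a 4th and 5th multiplexed redo log group to thread 1
-- (instance 1 in a RAC database). Paths and sizes are hypothetical.
ALTER DATABASE ADD LOGFILE THREAD 1
  GROUP 4 ('/u01/oradata/orcl/redo04a.log',
           '/u02/oradata/orcl/redo04b.log') SIZE 512M;

ALTER DATABASE ADD LOGFILE THREAD 1
  GROUP 5 ('/u01/oradata/orcl/redo05a.log',
           '/u02/oradata/orcl/redo05b.log') SIZE 512M;
```

More groups do not parallelize writes; they simply give the archiver more time to finish with a group before LGWR needs to reuse it, avoiding "checkpoint not complete" / archiver waits.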
Improving performance with IN clause
We use a lot of those IN clauses, for good or bad, and I am trying to improve their performance.
I have looked at the documentation several times and can't seem to find a way to bind the values in an IN clause. Is there anything else that can be done to improve IN clause performance in OCI?
Thanks a lot
Hi,
You can refer to the following URL on asktom website for detailed explanation about IN & Exists
http://asktom.oracle.com/pls/ask/f?p=4950:8:3465613697817080707::NO::F4950_P8_DISPLAYID,F4950_P8_B:953229842074,Y
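One commonly cited workaround (discussed at length on asktom) is to bind a single collection instead of a variable-length IN list, then unnest it with TABLE(). This sketch assumes Oracle's built-in ODCI collection type; the orders table and customer_id column are hypothetical.

```sql
-- Sketch: one bind variable (:ids) carries the whole list, so the
-- statement text is stable and the cursor is shared regardless of
-- how many values the caller supplies.
-- sys.odcinumberlist is a built-in VARRAY OF NUMBER.
SELECT o.*
FROM   orders o
WHERE  o.customer_id IN
       (SELECT t.column_value
        FROM   TABLE(CAST(:ids AS sys.odcinumberlist)) t);
```

From OCI you bind :ids once as a collection, instead of generating a new statement (and a new hard parse) for every distinct list length.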
HTH
Cheers,
Giridhar Kodakalla -
How to include a CASE statement in the GROUP BY clause
Hi, I have a question:
How do I include a case statement in the GROUP BY clause?
For example:
Select
(case when x.ctry is null then y.ctry else x.ctry end) as coo,
sum (x.in_amt)
from
tbl1 x,
tbl2 y
where
x.id = y.id
group by
(case when x.ctry is null then y.ctry else x.ctry end)
Assume I have millions of records in both tables; my guess is the above query might take a huge amount of time to complete.
Any alternative method to do this?
cd/ wrote:
To remove the expression from the GROUP BY clause. I didn't advocate any performance improvements, did I?
No you didn't. And your advice can indeed remove the expression from the GROUP BY clause. But I'm still puzzled as to why that would be a goal in itself.
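For completeness, the rewrite under discussion can be sketched as an inline view, so the expression is computed once and the outer query groups by a plain alias. This is equivalent to the original query (COALESCE is interchangeable with the CASE expression here); it mainly improves readability rather than speed, since the optimizer typically merges the view anyway.

```sql
-- Sketch: evaluate the CASE/COALESCE once, then group by the alias.
SELECT coo,
       SUM(in_amt)
FROM (
  SELECT COALESCE(x.ctry, y.ctry) AS coo,
         x.in_amt
  FROM   tbl1 x,
         tbl2 y
  WHERE  x.id = y.id
)
GROUP BY coo;
```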
Regards,
Rob. -
Can I use an analytic function instead of a GROUP BY clause?
Can I use an analytic function instead of a GROUP BY clause? Will this give any performance improvement?
Analytics can sometimes avoid scanning the table more than once:
SQL> select ename, sal, (select sum(sal) from emp where deptno=e.deptno) sum from emp e;
ENAME SAL SUM
SMITH 800 10875
ALLEN 1600 9400
WARD 1250 9400
JONES 2975 10875
MARTIN 1250 9400
BLAKE 2850 9400
CLARK 2450 8750
SCOTT 3000 10875
KING 5000 8750
TURNER 1500 9400
ADAMS 1100 10875
JAMES 950 9400
FORD 3000 10875
MILLER 1300 8750
14 rows selected.
Execution Plan
Plan hash value: 3189885365
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14 | 182 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | TABLE ACCESS FULL| EMP | 5 | 35 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 14 | 182 | 3 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - filter("DEPTNO"=:B1)
which could be rewritten as
SQL> select ename, sal, sum(sal) over (partition by deptno) sum from emp e;
ENAME SAL SUM
CLARK 2450 8750
KING 5000 8750
MILLER 1300 8750
JONES 2975 10875
FORD 3000 10875
ADAMS 1100 10875
SMITH 800 10875
SCOTT 3000 10875
WARD 1250 9400
TURNER 1500 9400
ALLEN 1600 9400
JAMES 950 9400
BLAKE 2850 9400
MARTIN 1250 9400
14 rows selected.
Execution Plan
Plan hash value: 1776581816
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14 | 182 | 4 (25)| 00:00:01 |
| 1 | WINDOW SORT | | 14 | 182 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 182 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Well, there is no GROUP BY and no visible performance enhancement in my example; but in Oracle7, you would have had to write the query as:
SQL> select ename, sal, sum from emp e,(select deptno,sum(sal) sum from emp group by deptno) s where e.deptno=s.deptno;
ENAME SAL SUM
SMITH 800 10875
ALLEN 1600 9400
WARD 1250 9400
JONES 2975 10875
MARTIN 1250 9400
BLAKE 2850 9400
CLARK 2450 8750
SCOTT 3000 10875
KING 5000 8750
TURNER 1500 9400
ADAMS 1100 10875
JAMES 950 9400
FORD 3000 10875
MILLER 1300 8750
14 rows selected.
Execution Plan
Plan hash value: 2661063502
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14 | 546 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 14 | 546 | 8 (25)| 00:00:01 |
| 2 | VIEW | | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 14 | 182 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
So maybe it helps. -
In evaluating several query plans, I've discovered several sqlstop messages that read:
<groupBy preclustered="true" sqlstop="Cannot generate SQL for the 'group-by' clause because it is not equivalent to the relational GROUP BY clause" stable="true">
I've examined the XQuery code and find no group clause, and I can't find a reference to this message in the documentation or by googling the text. I assume the grouping is being done implicitly by ODSI, since it is not in the XQuery source. I have tried restructuring the code to no avail.
My question is -- is this causing me a performance problem since the SQL is not generated, and if so, what steps must I take in the XQuery structure to avoid this issue?
Many thanks to anyone who can provide some insight into this.
Regards,
PB
"is this causing me a performance problem since the sql is not generated"
My question is - why are you asking this question? :) Do you have a performance problem? The "Best Practices" posted as an announcement in this forum might help. Otherwise engage customer support.
To answer your question - your xquery likely generates nested/hierarchical xml - and looks something like below - which can be implemented with a sql left-outer-join ordered by CUSTOMER_ID of the customer-table, and taking the rows returned and ... grouping by CUSTOMER_ID to eliminate the duplicate customer information. So there's your group-by. But you cannot write sql that has a left-outer join, and then a group-by on the left-hand side. So the group by is done in the engine. Since the results are already sorted, the group-by in the engine simply skips over the duplicates (i.e. it's basically free).
for $c in CUSTOMER()
where $c/LAST_NAME = $lastname
return
<CUSTOMER>
... $c/CUSTOMER_ID ... (: in a left-outer-join, the CUSTOMER_ID is duplicated for every ORDER :)
{ for $o in ORDER()
where $o/CUSTOMER_ID eq $c/CUSTOMER_ID
return ... }
</CUSTOMER> -
How to improve performance of the attached query
Hi,
How can I improve the performance of the query below? Please help; the explain plan is also attached. -
SELECT Camp.Id,
rCam.AccountKey,
Camp.Id,
CamBilling.Cpm,
CamBilling.Cpc,
CamBilling.FlatRate,
Camp.CampaignKey,
Camp.AccountKey,
CamBilling.billoncontractedamount,
(SUM(rCam.Impressions) * 0.001 + SUM(rCam.Clickthrus)) AS GR,
rCam.AccountKey as AccountKey
FROM Campaign Camp, rCamSit rCam, CamBilling, Site xSite
WHERE Camp.AccountKey = rCam.AccountKey
AND Camp.AvCampaignKey = rCam.AvCampaignKey
AND Camp.AccountKey = CamBilling.AccountKey
AND Camp.CampaignKey = CamBilling.CampaignKey
AND rCam.AccountKey = xSite.AccountKey
AND rCam.AvSiteKey = xSite.AvSiteKey
AND rCam.RmWhen BETWEEN to_date('01-01-2009', 'DD-MM-YYYY') and
to_date('01-01-2011', 'DD-MM-YYYY')
GROUP By rCam.AccountKey,
Camp.Id,
CamBilling.Cpm,
CamBilling.Cpc,
CamBilling.FlatRate,
Camp.CampaignKey,
Camp.AccountKey,
CamBilling.billoncontractedamount
Explain Plan :-
Description Object_owner Object_name Cost Cardinality Bytes
SELECT STATEMENT, GOAL = ALL_ROWS 14 1 13
SORT AGGREGATE 1 13
VIEW GEMINI_REPORTING 14 1 13
HASH GROUP BY 14 1 103
NESTED LOOPS 13 1 103
HASH JOIN 12 1 85
TABLE ACCESS BY INDEX ROWID GEMINI_REPORTING RCAMSIT 2 4 100
NESTED LOOPS 9 5 325
HASH JOIN 7 1 40
SORT UNIQUE 2 1 18
TABLE ACCESS BY INDEX ROWID GEMINI_PRIMARY SITE 2 1 18
INDEX RANGE SCAN GEMINI_PRIMARY SITE_I0 1 1
TABLE ACCESS FULL GEMINI_PRIMARY SITE 3 27 594
INDEX RANGE SCAN GEMINI_REPORTING RCAMSIT_I 1 1 5
TABLE ACCESS FULL GEMINI_PRIMARY CAMPAIGN 3 127 2540
TABLE ACCESS BY INDEX ROWID GEMINI_PRIMARY CAMBILLING 1 1 18
INDEX UNIQUE SCAN GEMINI_PRIMARY CAMBILLING_U1 0 1
duplicate thread..
How to improve performance of attached query -
Alternate for inner join to improve performance
Hi all,
I have used an inner join query to fetch data from five different tables into an internal table with where clause conditions.
The execution time is almost 5-6 minutes for this particular query (I have a lot of data in all five DB tables - more than 10 million records in each).
Is there any alternative to the inner join that would improve performance?
TIA.
Regards,
Karthik
Hi All,
Thanks for all your interest.
SELECT a~object_id a~description a~descr_language
a~guid AS object_guid a~process_type
a~changed_at
a~created_at AS created_timestamp
a~zzorderadm_h0207 AS cpid
a~zzorderadm_h0208 AS submitter
a~zzorderadm_h0303 AS cust_ref
a~zzorderadm_h1001 AS summary
a~zzorderadm_h1005 AS summary_uc
a~zzclose_date AS clsd_date
d~stat AS status
f~priority
FROM crmd_orderadm_h AS a INNER JOIN crmd_link AS b ON a~guid = b~guid_hi
INNER JOIN crmd_partner AS c ON b~guid_set = c~guid
INNER JOIN crm_jest AS d ON objnr = a~guid
INNER JOIN crmd_activity_h AS f ON f~guid = a~guid
INTO CORRESPONDING FIELDS OF TABLE et_service_request_list
WHERE process_type IN lt_processtyperange
AND a~created_at IN lt_daterange
AND partner_no IN lr_partner_no
AND stat IN lt_statusrange
AND object_id IN lt_requestnumberrange
AND zzorderadm_h0207 IN r_cpid
AND zzorderadm_h0208 IN r_submitter
AND zzorderadm_h0303 IN r_cust_ref
AND zzorderadm_h1005 IN r_trans_desc
AND d~inact = ' '
AND b~objtype_hi = '05'
AND b~objtype_set = '07'.
How to tune this query to improve performance?
Hi All,
How can I tune this query to improve performance?
select a.claim_number,a.pay_cd,a.claim_occurrence_number,
case
when sum(case
when a.payment_status_cd ='0'
then a.payment_est_amt
else 0
end
)=0
then 0
else (sum(case
when a.payment_status_cd='0'and a.payment_est_amt > 0
then a.payment_est_amt
else 0
end)
- sum(case
when a.payment_status_cd<>'0'
then a.payment_amt
else 0
end))
end as estimate
from ins_claim_payment a
where a.as_of_date between '31-jan-03' and '30-aug-06'
and ( a.data_source = '25' or (a.data_source between '27' and '29'))
and substr(a.pay_cd,1,1) IN ('2','3','4','8','9')
group by a.claim_number, a.pay_cd, a.claim_occurrence_number
Thank you,
Mcka
Mcka,
As well as the EXPLAIN PLAN, let us know what proportion of rows is visited by this query. It may be that it is not using a full table scan when it should (or vice versa).
And of course we'd need to know what indexes are available, and how selective they are for the predicates you have in this query ...
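If the substr predicate turns out to be selective, one option to consider (a sketch, assuming Oracle and the table from the question) is a function-based index on the exact expression, so the filter can be resolved without a full scan. The index name is hypothetical.

```sql
-- Sketch: index the exact expression used in the predicate
--   substr(a.pay_cd,1,1) IN ('2','3','4','8','9')
CREATE INDEX ins_claim_pay_cd_fbi
  ON ins_claim_payment (SUBSTR(pay_cd, 1, 1));
```

Whether the optimizer actually uses it depends on selectivity: matching 5 of roughly 10 possible leading characters, it may still prefer a full scan, which is why the questions about row proportions above matter.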
Regards Nigel -
Improve performance of a SELECT in a parameterized cursor.
DECLARE
CURSOR cur_inv_bal_ship_from(
l_num_qty_multiplier gfstmr4_eop_transaction_type.
inventory_multiplier_num%TYPE,
l_str_inventory_type gfstmr9_eop_txn_rule.inventory_type_code%TYPE,
l_str_type_code gfstmr9_eop_txn_rule.txn_type_code%TYPE)IS
SELECT /*+ USE_NL(EPP PI EPC) */ epc.currency_code,
SUM(ROUND(l_num_qty_multiplier * pi.inventory_qty * epc.cost_amt,2)) cost_amt
FROM gfstm62_eop_plant_part epp,
gfstm64_plant_inventory pi,
gfstm60_eop_part_cost epc
WHERE epp.gsdb_site_code = i_str_gsdb_site_code
AND epp.end_of_period_date = i_dt_end_of_period_date
AND pi.inventory_type_code = l_str_inventory_type
AND pi.txn_type_code = l_str_type_code
AND pi.gsdb_shipped_from_code = i_str_gsdb_site_code
AND epc.rate_set_code = i_str_rate_set_code
AND epc.financial_element_type_code = i_str_financial_element_code
AND pi.plant_eop_part_sakey = epp.eop_plant_part_sakey
AND pi.plant_inventory_sakey = epc.plant_inventory_sakey
GROUP BY currency_code;
BEGIN
FOR l_num_index IN i_tab_inv_txn_rule.FIRST .. i_tab_inv_txn_rule.LAST
LOOP
--Checking for ship from flag equal to 'Y'
IF i_tab_inv_txn_rule(l_num_index).ship_from_flag = g_con_y THEN
--Looping through ship from cursor
FOR l_rec_inv_bal_from IN cur_inv_bal_ship_from(
i_tab_inv_txn_rule(l_num_index).qty_multiplier_num,
i_tab_inv_txn_rule(l_num_index).inventory_type_code,
i_tab_inv_txn_rule(l_num_index).txn_type_code)
LOOP
--Incrementing index value
l_num_index1 := (l_num_index1 + 1);
--Assigning cursor values to PLSQL table
l_tab_inv_bal(l_num_index1).currency_code :=
l_rec_inv_bal_from.currency_code;
l_tab_inv_bal(l_num_index1).cost_amt :=
l_rec_inv_bal_from.cost_amt;
--Loop closing for ship from cursor
END LOOP;
END IF;
END LOOP;
END;
The SELECT query in the parameterized cursor is taking a long time. Below is a link where I have shown the trace. Please let me know how to improve performance.
http://performancetuning1978.blogspot.com/p/performance-tuning.html
thanks,
Vinodh
Hello,
your performance-tuning picture doesn't say much. What do your tables look like, how many rows, and which Oracle version?
Why do you use the nested-loops hint? -
Hi everyone,
Does TopLink support the GROUP BY clause? If yes, how?
TopLink supports group by when performing projections using ReportQuery.
See JavaDocs for ReportQuery.addGrouping(...).
Doug -
Improve performance of an inline view query
All,
I have a unique situation where I have to limit the number of rows based on a group, so I created an inline view and applied a limit on it.
ex:
SELECT col_A, col_B FROM ( SELECT col_a, count(*) FROM tab_a WHERE col_a = 'XXX' GROUP BY col_a) ROWNUM <10.
But this design was rejected, because the inline view is thought to have a great impact on performance.
Also, I can't set a row limit directly on the query because a GROUP BY clause is used in the select.
When ROWNUM is applied directly in the WHERE, it first limits the rows and then does the GROUP, so when the user asks for 10 records it may show fewer than 10 rows because of the grouping.
Please help find an alternative solution that yields the expected result set without losing performance.
Hi,
The SQL you gave us is not valid. There is no "col_b" in your inline view, there is no "where" before "rownum<10", and the inline view returns only one row.
Try to produce a reproducible scenario with scott.emp and scott.dept generated by $ORACLE_HOME/rdbms/admin/utlsampl.sql
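For reference, the usual pattern is to apply ROWNUM outside the aggregation, after an ORDER BY in the inline view, so the grouping happens first and exactly N groups are returned. A sketch against the hypothetical tab_a from the question (with the WHERE dropped so more than one group exists):

```sql
-- Sketch: group first, order the groups, then cut to 10 output rows.
SELECT col_a,
       cnt
FROM (
  SELECT   col_a,
           COUNT(*) AS cnt
  FROM     tab_a
  GROUP BY col_a
  ORDER BY col_a
)
WHERE ROWNUM <= 10;
```

The inline view here is generally not a performance problem in itself: Oracle stops fetching from it as soon as 10 rows have passed the ROWNUM filter.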
Regards
Laurent -
Improving Performance of Dynamic SQL in Pro*C
I am using dynamic SQL format 3 in Pro*C, i.e.
prepare sqlstmt from :sql_stmt
declare cursor c1 for sqlstmt
But the query is slow. I am also using ORDER BY with ltrim(rtrim(colname)) and sometimes GROUP BY.
Can anyone tell me how to improve the performance of cursors in dynamic SQL, or how I can do the same in PL/SQL while simultaneously writing the rows returned to a file?
Manoj,
Typically, the slow performance is due to the SQL statement itself rather than the dynamic SQL execution code in Pro*C.
Check your explain plan to see if the query performs well. I suspect that you will find your problem here.