Count Distinct
Hi @all,
the question might be answered already but I can't think of what to search for.
I've got a dimension with Active Directory attributes. And another dimension with groupnames.
One AD-account can be in many groups.
The fact table (snowflake schema) contains the ID of the AD dimension and the group name.
It could look like this:
ID GroupName GroupAlias
1 Test1 Test
1 Test2 Test
1 Test3 Test
1 hello1 Hello
1 hello2 Hello
I am actually talking about the GroupAlias, which should be counted distinctly.
The ID 1 is in 3 different "Test" groups, but the alias is always "Test". So the count should be 1.
What should the MDX look like?
Thanks!
Here is something I grabbed from TechNet. It returns the distinct count of dimension members that have Internet sales. If you cannot get your MDX to work, post it.
WITH SET MySet AS
{[Customer].[Customer Geography].[Country].&[Australia],[Customer].[Customer Geography].[Country].&[Australia],
[Customer].[Customer Geography].[Country].&[Canada],[Customer].[Customer Geography].[Country].&[France],
[Customer].[Customer Geography].[Country].&[United Kingdom],[Customer].[Customer Geography].[Country].&[United Kingdom]}
*
{[Measures].[Internet Sales Amount] }
MEMBER MEASURES.SETDISTINCTCOUNT AS
DISTINCTCOUNT(MySet)
SELECT {MEASURES.SETDISTINCTCOUNT} ON 0
FROM [Adventure Works]
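To make the expected result from the question concrete, here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in for the fact table; the table name fact_groups and the use of SQLite are assumptions for illustration only, not the poster's cube and not MDX. It only shows that a distinct count of GroupAlias collapses the three "Test" groups into one.

```python
import sqlite3

# Hypothetical stand-in for the fact table shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_groups (id INTEGER, group_name TEXT, group_alias TEXT)"
)
conn.executemany(
    "INSERT INTO fact_groups VALUES (?, ?, ?)",
    [
        (1, "Test1", "Test"),
        (1, "Test2", "Test"),
        (1, "Test3", "Test"),
        (1, "hello1", "Hello"),
        (1, "hello2", "Hello"),
    ],
)

# Distinct aliases per AD account: five group rows, but only two aliases.
per_id = conn.execute(
    "SELECT id, COUNT(DISTINCT group_alias) FROM fact_groups GROUP BY id"
).fetchall()

# Restricted to the "Test" alias, the count is the 1 the question expects.
test_only = conn.execute(
    "SELECT COUNT(DISTINCT group_alias) FROM fact_groups "
    "WHERE id = 1 AND group_alias = 'Test'"
).fetchone()[0]

print(per_id)     # [(1, 2)]
print(test_only)  # 1
```

In a cube, the same idea is usually expressed as a distinct count over the alias attribute rather than over the group name.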
Similar Messages
-
How to display the count distinct in a report
hi,
I have a report with multiple columns. For one column, say A, I need a calculated column B that shows how many distinct values A has across the entire report. How can I do that?
Hi.
For example:
CALENDAR_YEAR
CALENDAR_MONTH_DESC
count(distinct TIMES.CALENDAR_MONTH_DESC by TIMES.CALENDAR_YEAR)
This count gives the number of distinct months in each year.
Regards
Goran
http://108obiee.blogspot.com -
Performance problem with more than one COUNT(DISTINCT ...) in a query
Hi,
(I hope this is the right forum.)
In the following query, I have two COUNT(DISTINCT ...) expressions on two different fields of the same table. Execution time is acceptable (2 s) with either COUNT(DISTINCT ...) alone in the SELECT clause, but it is not tolerable (12 s) with both together in the query! I have a similar case with three counts: 4 s each, 36 s when together!
I've looked at the execution plan, and it seems that with two COUNT(DISTINCT ...) expressions, SQL Server sorts the table twice before joining the results.
I do not have much experience with SQL Server optimization, and I don't know what to improve or how. The SQL is generated by Business Objects, so I have few possibilities to tune it. The most direct way would be to execute two different queries, but I'd like to avoid that.
Any advice?
SELECT
DIM_MOIS.DATE_DEBUT_MOIS,
DIM_MOIS.NUM_ANNEE_MOIS,
DIM_DEMANDE_SCD.CAT_DEMANDE,
DIM_APPLICATION.LIB_APPLICATION,
DIM_DEMANDE_SCD.CAT_DEMANDE ,
count(distinct FAITS_DEMANDE.NB_DEMANDE_FLUX),
count(distinct FAITS_DEMANDE.NB_DEMANDE_RESOL_NIV1)
FROM
ALIM_SID.DIM_MOIS INNER JOIN ALIM_SID.DIM_JOUR ON (DIM_JOUR.SEQ_MOIS=DIM_MOIS.SEQ_MOIS)
INNER JOIN ALIM_SID.FAITS_DEMANDE ON (FAITS_DEMANDE.SEQ_JOUR=DIM_JOUR.SEQ_JOUR)
INNER JOIN ALIM_SID.DIM_APPLICATION ON (FAITS_DEMANDE.SEQ_APPLICATION=DIM_APPLICATION.SEQ_APPLICATION)
INNER JOIN ALIM_SID.DIM_DEMANDE_SCD ON (FAITS_DEMANDE.SEQ_DEMANDE_SCD=DIM_DEMANDE_SCD.SEQ_DEMANDE_SCD)
WHERE
( DIM_MOIS.NUM_ANNEE_MOIS ) > 201301
GROUP BY
DIM_MOIS.DATE_DEBUT_MOIS,
DIM_MOIS.NUM_ANNEE_MOIS,
DIM_DEMANDE_SCD.CAT_DEMANDE,
DIM_APPLICATION.LIB_APPLICATION
Here is the script, nothing original. Hope this helps.
-- Fact table:
-- foreign keys begin with FK_,
-- measures to be counted (COUNT DISTINCT) begin with NB_
CREATE TABLE [ALIM_SID].[FAITS_DEMANDE](
[SEQ_JOUR] [int] NOT NULL,
[SEQ_DEMANDE] [int] NOT NULL,
[SEQ_DEMANDE_SCD] [int] NOT NULL,
[SEQ_APPLICATION] [int] NOT NULL,
[SEQ_INTERVENANT] [int] NOT NULL,
[SEQ_SERVICE_RESPONSABLE] [int] NOT NULL,
[NB_DEMANDE_FLUX] [int] NULL,
[NB_DEMANDE_STOCK] [int] NULL,
[NB_DEMANDE_RESOLUE] [int] NULL,
[NB_DEMANDE_LIVREE] [int] NULL,
[NB_DEMANDE_MEP] [int] NULL,
[NB_DEMANDE_RESOL_NIV1] [int] NULL,
CONSTRAINT [PK_FAITS_DEMANDE] PRIMARY KEY CLUSTERED (
[SEQ_JOUR] ASC,
[SEQ_DEMANDE] ASC,
[SEQ_DEMANDE_SCD] ASC,
[SEQ_APPLICATION] ASC,
[SEQ_INTERVENANT] ASC,
[SEQ_SERVICE_RESPONSABLE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [AK_AK_FAITS_DEMANDE_FAITS_DE] UNIQUE NONCLUSTERED (
[SEQ_JOUR] ASC,
[SEQ_DEMANDE] ASC,
[SEQ_DEMANDE_SCD] ASC,
[SEQ_APPLICATION] ASC,
[SEQ_INTERVENANT] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] WITH CHECK ADD CONSTRAINT [FK_FAITS_DEMANDE_DIM_APPLICATION] FOREIGN KEY([SEQ_APPLICATION])
REFERENCES [ALIM_SID].[DIM_APPLICATION] ([SEQ_APPLICATION])
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] CHECK CONSTRAINT [FK_FAITS_DEMANDE_DIM_APPLICATION]
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] WITH CHECK ADD CONSTRAINT [FK_FAITS_DEMANDE_DIM_DEMANDE] FOREIGN KEY([SEQ_DEMANDE])
REFERENCES [ALIM_SID].[DIM_DEMANDE] ([SEQ_DEMANDE])
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] CHECK CONSTRAINT [FK_FAITS_DEMANDE_DIM_DEMANDE]
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] WITH CHECK ADD CONSTRAINT [FK_FAITS_DEMANDE_DIM_DEMANDE_SCD] FOREIGN KEY([SEQ_DEMANDE_SCD])
REFERENCES [ALIM_SID].[DIM_DEMANDE_SCD] ([SEQ_DEMANDE_SCD])
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] CHECK CONSTRAINT [FK_FAITS_DEMANDE_DIM_DEMANDE_SCD]
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] WITH CHECK ADD CONSTRAINT [FK_FAITS_DEMANDE_DIM_INTERVENANT] FOREIGN KEY([SEQ_INTERVENANT])
REFERENCES [ALIM_SID].[DIM_INTERVENANT] ([SEQ_INTERVENANT])
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] CHECK CONSTRAINT [FK_FAITS_DEMANDE_DIM_INTERVENANT]
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] WITH CHECK ADD CONSTRAINT [FK_FAITS_DEMANDE_DIM_JOUR] FOREIGN KEY([SEQ_JOUR])
REFERENCES [ALIM_SID].[DIM_JOUR] ([SEQ_JOUR])
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] CHECK CONSTRAINT [FK_FAITS_DEMANDE_DIM_JOUR]
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] WITH CHECK ADD CONSTRAINT [FK_FAITS_DEMANDE_DIM_SERVICE_RESPONSABLE] FOREIGN KEY([SEQ_SERVICE_RESPONSABLE])
REFERENCES [ALIM_SID].[DIM_SERVICE] ([SEQ_SERVICE])
GO
ALTER TABLE [ALIM_SID].[FAITS_DEMANDE] CHECK CONSTRAINT [FK_FAITS_DEMANDE_DIM_SERVICE_RESPONSABLE]
GO
-- not shown : extended properties
-- One of the dimension tables (they all have a primary key named SEQ_)
CREATE TABLE [ALIM_SID].[DIM_JOUR](
[SEQ_JOUR] [int] IDENTITY(1,1) NOT NULL,
[SEQ_ANNEE] [int] NOT NULL,
[SEQ_MOIS] [int] NOT NULL,
[DATE_JOUR] [date] NULL,
[CODE_ANNEE] [varchar](25) NULL,
[CODE_MOIS] [varchar](25) NULL,
[CODE_SEMAINE_ISO] [varchar](25) NULL,
[CODE_JOUR_ANNEE] [varchar](25) NULL,
[CODE_ANNEE_JOUR] [varchar](25) NULL,
[LIB_JOUR] [varchar](25) NULL,
[LIB_JOUR_COURT] [varchar](25) NULL,
[JOUR_OUVRE] [tinyint] NULL,
[JOUR_CHOME] [tinyint] NULL,
CONSTRAINT [PK_DIM_JOUR] PRIMARY KEY CLUSTERED (
[SEQ_JOUR] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [ALIM_SID].[DIM_JOUR] WITH CHECK ADD CONSTRAINT [FK_DIM_JOUR_DIM_ANNEE] FOREIGN KEY([SEQ_ANNEE])
REFERENCES [ALIM_SID].[DIM_ANNEE] ([SEQ_ANNEE])
GO
ALTER TABLE [ALIM_SID].[DIM_JOUR] CHECK CONSTRAINT [FK_DIM_JOUR_DIM_ANNEE]
GO
ALTER TABLE [ALIM_SID].[DIM_JOUR] WITH CHECK ADD CONSTRAINT [FK_DIM_JOUR_DIM_MOIS] FOREIGN KEY([SEQ_MOIS])
REFERENCES [ALIM_SID].[DIM_MOIS] ([SEQ_MOIS])
GO
ALTER TABLE [ALIM_SID].[DIM_JOUR] CHECK CONSTRAINT [FK_DIM_JOUR_DIM_MOIS]
GO -
Count distinct in case statement
SELECT A.P_ID,
B.P_NAME,
C.P_DESC,
SUM(CASE
WHEN A.DATE BETWEEN TRUNC(ADD_MONTHS(LAST_DAY(SYSDATE),-4) + 1) AND ADD_MONTHS(LAST_DAY(TO_DATE(SYSDATE)),-1)
AND A.M_ID IS NOT NULL
THEN 1
ELSE 0
END) AS COUNT,
SUM(CASE
WHEN A.DATE BETWEEN TRUNC(ADD_MONTHS(LAST_DAY(SYSDATE),-4) + 1) AND ADD_MONTHS(LAST_DAY(TO_DATE(SYSDATE)),-1)
AND A.M_ID IS NOT NULL
THEN COUNT(DISTINCT A.M_ID)
ELSE 0
END) AS UNIQUE_COUNT, /* Not possible */
SUM(CASE
WHEN A.DATE BETWEEN TRUNC(SYSDATE,'YEAR') AND ADD_MONTHS(LAST_DAY(TO_DATE(SYSDATE)),-1)
THEN A.AMT_1
ELSE 0
END) AS TOTAL_AMT_1,
SUM(CASE
WHEN A.DATE BETWEEN TRUNC(SYSDATE,'YEAR') AND ADD_MONTHS(LAST_DAY(TO_DATE(SYSDATE)),-1)
THEN A.AMT_2
ELSE 0
END) AS TOTAL_AMT_2
FROM TABLE_A A,
TABLE_B B,
TABLE_C C
WHERE A.P_ID = B.P_ID
AND B.PT_ID = C.PT_ID
GROUP BY A.P_ID,
B.P_NAME,
C.P_DESC
Hi,
This is a simplified version of my query.
I am trying to do four things here:
1. count A.M_ID
2. count distinct A.M_ID; this is where I have a problem.
3. and 4. These are just sums of two different columns.
Note that the date ranges for the count and the amounts are different, and I can't hard-code them.
Can anyone help me with the distinct count step?
This query is also running rather slowly, so any suggestions or comments are very welcome.
Note: TABLE_A has 700 million records, TABLE_B has 4 million, and TABLE_C has just 500.
Thanks!
Taking advantage of the fact that most aggregate functions ignore nulls, you could do something like:
SELECT a.p_id, b.p_name, c.p_desc,
COUNT(CASE WHEN a.date BETWEEN TRUNC(ADD_MONTHS(LAST_DAY(sysdate),-4) + 1) AND
ADD_MONTHS(LAST_DAY(TO_DATE(sysdate)),-1) AND
a.m_id IS NOT NULL THEN m_id END) AS countall,
COUNT(DISTINCT CASE WHEN a.date BETWEEN TRUNC(ADD_MONTHS(LAST_DAY(sysdate),-4) + 1) AND
ADD_MONTHS(LAST_DAY(TO_DATE(sysdate)),-1) AND
a.m_id IS NOT NULL THEN a.m_id END) AS unique_count, /* entirely possible */
SUM(CASE WHEN a.date BETWEEN TRUNC(sysdate,'YEAR') AND
ADD_MONTHS(LAST_DAY(TO_DATE(sysdate)),-1) THEN a.amt_1
ELSE 0 END) AS total_amt_1,
SUM(CASE WHEN A.DATE BETWEEN TRUNC(sysdate,'YEAR') AND
ADD_MONTHS(LAST_DAY(TO_DATE(sysdate)),-1) THEN A.AMT_2
ELSE 0 END) AS TOTAL_AMT_2
FROM table_a a, table_b b, table_c c
WHERE a.p_id = b.p_id and
b.pt_id = c.pt_id
GROUP BY a.p_id, b.p_name, c.p_desc
The two CASE expressions inside the COUNT calls return either a.m_id or NULL. A simplified test case is:
SQL> WITH t as (
SELECT 1 m_id, 9 dt FROM dual UNION ALL
SELECT 1 m_id, 6 dt FROM dual UNION ALL
SELECT 2 m_id, 9 dt FROM dual UNION ALL
SELECT 2 m_id, 6 dt FROM dual UNION ALL
SELECT 1 m_id, 5 dt FROM dual UNION ALL
SELECT 2 m_id, 5 dt FROM dual UNION ALL
SELECT null m_id, 9 dt FROM dual)
SELECT count(CASE WHEN dt BETWEEN 6 and 9 THEN m_id end) cid,
count(distinct CASE WHEN dt BETWEEN 6 and 9 THEN m_id end) cdid
FROM t;
CID CDID
4 2
I'm not entirely sure that you actually need the a.m_id IS NOT NULL predicate in the CASE expressions, but I left it in to be safe.
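The same test case can be replayed outside Oracle. The sketch below uses Python with an in-memory SQLite database as a stand-in (an assumption made so the example runs anywhere; SQLite follows the same rule that COUNT ignores NULLs). It also checks the open point about the IS NOT NULL predicate: with this data, adding it changes nothing, because a NULL m_id is never counted anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (m_id INTEGER, dt INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 9), (1, 6), (2, 9), (2, 6), (1, 5), (2, 5), (None, 9)],
)

# CASE without ELSE yields NULL for rows outside the range; COUNT skips NULLs.
cid, cdid = conn.execute(
    "SELECT COUNT(CASE WHEN dt BETWEEN 6 AND 9 THEN m_id END), "
    "       COUNT(DISTINCT CASE WHEN dt BETWEEN 6 AND 9 THEN m_id END) "
    "FROM t"
).fetchone()

# Same query with the explicit IS NOT NULL predicate kept "to be safe".
cid_nn, cdid_nn = conn.execute(
    "SELECT COUNT(CASE WHEN dt BETWEEN 6 AND 9 AND m_id IS NOT NULL "
    "                  THEN m_id END), "
    "       COUNT(DISTINCT CASE WHEN dt BETWEEN 6 AND 9 AND m_id IS NOT NULL "
    "                           THEN m_id END) "
    "FROM t"
).fetchone()

print(cid, cdid)        # 4 2
print(cid_nn, cdid_nn)  # 4 2
```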
John -
Hi, can anybody help?
I have a problem with SELECT COUNT(DISTINCT ...).
Example:
select distinct custid from order_h
total result : 141 rows selected.
but :
select count(distinct custid) from order_h
result :
COUNT(DISTINCTCUSTID)
140
Why do the totals differ: 141 for the listing but 140 for the count?
Is my statement wrong? How should I use COUNT and DISTINCT?
Thanks
Look here:
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14200/functions032.htm#i82697
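A likely explanation, assuming order_h contains at least one row with a NULL custid (the thread does not confirm this): SELECT DISTINCT returns NULL as one of its rows, while COUNT ignores NULLs, so the listing is one row longer than the count. The sketch below reproduces the effect in Python with an in-memory SQLite table; the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_h (custid INTEGER)")
# Hypothetical data: two real customers plus one order without a customer.
conn.executemany(
    "INSERT INTO order_h VALUES (?)", [(10,), (10,), (20,), (None,)]
)

# DISTINCT keeps NULL as a value of its own: rows for 10, 20 and NULL.
listed = conn.execute("SELECT DISTINCT custid FROM order_h").fetchall()

# COUNT(DISTINCT ...) skips NULL: only 10 and 20 are counted.
counted = conn.execute(
    "SELECT COUNT(DISTINCT custid) FROM order_h"
).fetchone()[0]

print(len(listed), counted)  # 3 2
```

Checking for rows where custid IS NULL in the real table would confirm or rule out this cause.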
Bye
Acr -
Grand Total on Count Distinct - Crosstab
Hello
I use Discoverer 9.0.2.39.01.
On a crosstab layout, the data point is a count distinct item.
I use a grand total at the bottom and also a grand total at the right.
Both totals are displayed correctly, but the cell where they intersect is blank.
The join between the two tables is one-to-one, and I use NVL on the count distinct item. How can I overcome the problem and cause the blank cell to display the result of both totals?
I'd just reiterate a couple of things to try.
1. Reverse the join. I realize you mentioned it seems to work for a simple total, but this has to be one of the most common errors.
2. Check for NULLs in the data (i.e. to_number(NVL(item, '0')), etc.).
If it is still not working, then logically, what would cause a count of distinct items not to display? A NULL would explain it, as I would think the COUNT would go wrong if it did not know how to handle a NULL.
But why count(item) would work and count_distinct(item) wouldn't is an interesting problem.
Russ -
COUNT(DISTINCT) WITH ORDER BY in an analytic function
-- I create a table with three fields: Name, Amount, and a Trans_Date.
CREATE TABLE TEST (
NAME VARCHAR2(19) NULL,
AMOUNT VARCHAR2(8) NULL,
TRANS_DATE DATE NULL
);
-- I insert a few rows into my table:
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Anna', '110', TO_DATE('06/01/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Anna', '20', TO_DATE('06/01/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Anna', '110', TO_DATE('06/02/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Anna', '21', TO_DATE('06/03/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Anna', '68', TO_DATE('06/04/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Anna', '110', TO_DATE('06/05/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Anna', '20', TO_DATE('06/06/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Bill', '43', TO_DATE('06/01/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Bill', '77', TO_DATE('06/02/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Bill', '221', TO_DATE('06/03/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Bill', '43', TO_DATE('06/04/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
INSERT INTO TEST ( TEST.NAME, TEST.AMOUNT, TEST.TRANS_DATE ) VALUES ( 'Bill', '73', TO_DATE('06/05/2005 08:00:00 PM', 'MM/DD/YYYY HH12:MI:SS PM') );
commit;
/* I want the distinct count of AMOUNT for every row, using COUNT(DISTINCT AMOUNT) as an analytic function partitioned by NAME and ordered by TRANS_DATE, looking only at the last four transaction dates for each row. For example, for the row "Anna 110 6/5/2005 8:00:00.000 PM" I only want to look at the dates from 6/2/2005 to 6/5/2005 and count how many different amounts Anna has. Note that I cannot use the DISTINCT keyword in this query because it does not work with the ORDER BY. */
select NAME, AMOUNT, TRANS_DATE, COUNT(/*DISTINCT*/ AMOUNT) over ( partition by NAME
order by TRANS_DATE range between numtodsinterval(3,'day') preceding and current row ) as COUNT_AMOUNT
from TEST t;
These are the results I get if I simply count every AMOUNT without DISTINCT:
NAME AMOUNT TRANS_DATE COUNT_AMOUNT
Anna 110 6/1/2005 8:00:00.000 PM 2
Anna 20 6/1/2005 8:00:00.000 PM 2
Anna 110 6/2/2005 8:00:00.000 PM 3
Anna 21 6/3/2005 8:00:00.000 PM 4
Anna 68 6/4/2005 8:00:00.000 PM 5
Anna 110 6/5/2005 8:00:00.000 PM 4
Anna 20 6/6/2005 8:00:00.000 PM 4
Bill 43 6/1/2005 8:00:00.000 PM 1
Bill 77 6/2/2005 8:00:00.000 PM 2
Bill 221 6/3/2005 8:00:00.000 PM 3
Bill 43 6/4/2005 8:00:00.000 PM 4
Bill 73 6/5/2005 8:00:00.000 PM 4
The COUNT_DISTINCT_AMOUNT is the desired output:
NAME AMOUNT TRANS_DATE COUNT_DISTINCT_AMOUNT
Anna 110 6/1/2005 8:00:00.000 PM 1
Anna 20 6/1/2005 8:00:00.000 PM 2
Anna 110 6/2/2005 8:00:00.000 PM 2
Anna 21 6/3/2005 8:00:00.000 PM 3
Anna 68 6/4/2005 8:00:00.000 PM 4
Anna 110 6/5/2005 8:00:00.000 PM 3
Anna 20 6/6/2005 8:00:00.000 PM 4
Bill 43 6/1/2005 8:00:00.000 PM 1
Bill 77 6/2/2005 8:00:00.000 PM 2
Bill 221 6/3/2005 8:00:00.000 PM 3
Bill 43 6/4/2005 8:00:00.000 PM 3
Bill 73 6/5/2005 8:00:00.000 PM 4
Thanks in advance.
You can try to write your own user-defined aggregate (UDAG).
Here is a fake example, just to show how it "could" work. It only handles 1, 2, 4, 8, 16, and 32 as potential values.
create or replace type CountDistinctType as object (
bitor_number number,
static function ODCIAggregateInitialize(sctx IN OUT CountDistinctType)
return number,
member function ODCIAggregateIterate(self IN OUT CountDistinctType,
value IN number) return number,
member function ODCIAggregateTerminate(self IN CountDistinctType,
returnValue OUT number, flags IN number) return number,
member function ODCIAggregateMerge(self IN OUT CountDistinctType,
ctx2 IN CountDistinctType) return number
);
/
create or replace type body CountDistinctType is
static function ODCIAggregateInitialize(sctx IN OUT CountDistinctType)
return number is
begin
sctx := CountDistinctType('');
return ODCIConst.Success;
end;
member function ODCIAggregateIterate(self IN OUT CountDistinctType, value IN number)
return number is
begin
if (self.bitor_number is null) then
self.bitor_number := value;
else
self.bitor_number := self.bitor_number+value-bitand(self.bitor_number,value);
end if;
return ODCIConst.Success;
end;
member function ODCIAggregateTerminate(self IN CountDistinctType, returnValue OUT
number, flags IN number) return number is
begin
returnValue := 0;
for i in 0..log(2,self.bitor_number) loop
if (bitand(power(2,i),self.bitor_number)!=0) then
returnValue := returnValue+1;
end if;
end loop;
return ODCIConst.Success;
end;
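member function ODCIAggregateMerge(self IN OUT CountDistinctType, ctx2 IN
CountDistinctType) return number is
begin
-- combine both bit sets (bitwise OR) so that parallel execution stays correct
if (self.bitor_number is null) then
self.bitor_number := ctx2.bitor_number;
elsif (ctx2.bitor_number is not null) then
self.bitor_number := self.bitor_number+ctx2.bitor_number-bitand(self.bitor_number,ctx2.bitor_number);
end if;
return ODCIConst.Success;
end;
end;
/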
CREATE or REPLACE FUNCTION CountDistinct (n number) RETURN number
PARALLEL_ENABLE AGGREGATE USING CountDistinctType;
drop table t;
create table t as select rownum r, power(2,trunc(dbms_random.value(0,6))) p from all_objects;
SQL> select r,p,countdistinct(p) over (order by r) d from t where rownum<10 order by r;
R P D
1 4 1
2 1 2
3 8 3
4 32 4
5 1 4
6 16 5
7 16 5
8 4 5
9 4 5
Buy a good book if you want to start writing your own "distinct" algorithm.
Message was edited by:
Laurent Schneider
A simpler but memory-hungry algorithm would use a PL/SQL table in a UDAG and do the count(distinct) over that table to return the value. -
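For the rolling COUNT(DISTINCT) question in this thread, a common workaround that avoids a custom aggregate is a correlated scalar subquery over the same date range. The sketch below is an assumption-laden illustration, not the thread's solution: it runs in Python against an in-memory SQLite copy of the sample data, stores the dates as ISO text (all sample rows share the same 8 PM time, so the time part is dropped), and uses SQLite's date() function where Oracle would use date arithmetic.

```python
import sqlite3

rows = [
    ("Anna", "110", "2005-06-01"), ("Anna", "20", "2005-06-01"),
    ("Anna", "110", "2005-06-02"), ("Anna", "21", "2005-06-03"),
    ("Anna", "68", "2005-06-04"), ("Anna", "110", "2005-06-05"),
    ("Anna", "20", "2005-06-06"),
    ("Bill", "43", "2005-06-01"), ("Bill", "77", "2005-06-02"),
    ("Bill", "221", "2005-06-03"), ("Bill", "43", "2005-06-04"),
    ("Bill", "73", "2005-06-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, amount TEXT, trans_date TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# For each row, count the distinct amounts of the same name within the
# window [trans_date - 3 days, trans_date], mirroring the RANGE clause.
result = conn.execute(
    """
    SELECT t1.name, t1.amount, t1.trans_date,
           (SELECT COUNT(DISTINCT t2.amount)
              FROM test t2
             WHERE t2.name = t1.name
               AND t2.trans_date BETWEEN date(t1.trans_date, '-3 days')
                                     AND t1.trans_date) AS count_distinct_amount
      FROM test t1
     ORDER BY t1.name, t1.trans_date, t1.rowid
    """
).fetchall()

counts = [r[3] for r in result]
print(counts)  # [2, 2, 2, 3, 4, 3, 4, 1, 2, 3, 3, 4]
```

The output matches the desired COUNT_DISTINCT_AMOUNT column except for the first Anna row: with a range window, both 6/1 rows see the same set of rows and therefore both report 2. On large tables the subquery runs once per row, so it should be tested for performance before relying on it.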
"group by" slow for using "count(distinct some_column)" - a better way?
Hi all,
I have a query of this form:
select
count(distinct some_column),
from [...]
group by [...];
It is slowed down by the count(distinct some_column).
The "group by" aggregates base records.
But the base records are 1:n: each event has n records.
Some of the n records fall into group-by result record (A), others into group-by result record (B).
But each event shall only count +1 per result record, regardless of how many of its n records have fallen into that category.
Is there another (faster) way to count this?
- thanks!
best regards,
Frank
Edited by: user8704911 on Jun 29, 2011 1:30 AM
Hi Dom,
incidentally, I went in the direction you proposed:
I replaced the PL/SQL collection with a global temporary table.
But the reason for doing this was a different one:
I recognized that the group by is much faster when applied to a table or global temporary table.
At first I just moved the data from the PL/SQL collection to a global temporary table in order to apply the group by there.
The group by was then much faster, but moving the data from the PL/SQL collection to the global temporary table used up the time saved.
So the problem was not the group by but, in general, the read access to the PL/SQL collection (around 65,000 records, by the way).
Now that I have completely replaced the PL/SQL collection with the global temporary table, everything is fine.
cheers,
Frank -
Hi everyone,
An analyst on my team heard of a new metric called a "Stickiness" metric. It basically measures how often users come to your website over time.
The definition is as follows:
# Unique Users Today/#Unique users Over Last 7 days
and also
# Unique Users Today/#Unique users Over Last 30 days
We have visit information stored in a table W_WEB_VISIT_F. For the sake of simplicity say it has columns VISIT_ID, VISIT_DATE and USER_ID (there are several more dimensional columns it has but I want to keep this exercise simple).
I want to create an aggregate table called W_WEB_VISIT_A that pre-aggregates the three values I need per day: # Unique Users Today, #Unique users Over Last 7 days and #Unique users Over Last 30 days. The only way I can think of building the aggregate table is as follows
WITH AGG AS (
SELECT
VISIT_DATE,
USER_ID
FROM W_WEB_VISIT_F
GROUP BY
VISIT_DATE,
USER_ID
)
select
src.VISIT_DATE,
COUNT(DISTINCT src.USER_ID) UNIQUE_TODAY,
(select count(distinct hist.USER_ID) from agg hist where hist.VISIT_DATE between src.VISIT_DATE - 6 and src.VISIT_DATE) SEVEN_DAYS,
(select count(distinct hist.USER_ID) from agg hist where hist.VISIT_DATE between src.VISIT_DATE - 29 and src.VISIT_DATE) THIRTY_DAYS
from agg src
group by src.visit_date
The problem I am having is that W_WEB_VISIT_F has several million records in it and I can't get the above query to complete. It ran overnight and didn't finish.
Is there a fancy 11g function I can use to do this for me? Is there a more efficient method?
Thanks everyone for the help!
-Joe
Edited by: user9208525 on Jan 13, 2011 6:24 AM
You guys are right. I missed the group by I had in the WITH clause.
Hi,
I haven't used the windowing clause a lot, so I wanted to give it a try.
I made up some data with this query:
create table t as select sysdate-dbms_random.value(0,10) visit_date, mod(level,5)+1 user_id
from dual
connect by level <= 20;
Which gave me the following rows:
Scott@my10g SQL>select * from t order by visit_date;
VISIT_DATE USER_ID
03/01/2011 13:17:10 1
04/01/2011 05:30:30 4
04/01/2011 08:08:13 5
04/01/2011 14:42:24 3
04/01/2011 20:20:58 3
05/01/2011 17:29:24 2
05/01/2011 17:40:20 4
05/01/2011 18:32:56 2
06/01/2011 04:12:53 5
06/01/2011 08:59:18 2
06/01/2011 09:04:26 3
06/01/2011 10:14:20 1
06/01/2011 14:22:54 1
06/01/2011 19:39:04 1
08/01/2011 14:44:18 5
08/01/2011 21:38:04 5
11/01/2011 04:56:05 4
11/01/2011 18:52:29 2
11/01/2011 23:57:30 4
13/01/2011 07:24:22 3
20 rows selected.
I came up with this query:
select
v.*,
case
when unq_l3d is null then -1
else trunc(unq_today/unq_l3d,2)
end ratio
from (
select distinct trcdt, unq_today, unq_l3d
from (
select
trcdt,
count(user_id)
over (
order by trcdt
range between numtodsinterval(1,'DAY') preceding and current row
) unq_today,
count(user_id)
over (
order by trcdt
range between numtodsinterval(3,'DAY') preceding and current row
) unq_l3d
from (
select distinct trunc(visit_date) trcdt, user_id from t
)
)
) v
order by trcdt
With my sample data, it gives me:
TRCDT UNQ_TODAY UNQ_L3D RATIO
03/01/2011 00:00:00 1 1 1.00
04/01/2011 00:00:00 4 4 1.00
05/01/2011 00:00:00 5 6 0.83
06/01/2011 00:00:00 6 10 0.60
08/01/2011 00:00:00 1 7 0.14
11/01/2011 00:00:00 2 3 0.66
13/01/2011 00:00:00 1 3 0.33
7 rows selected.
where:
- UNQ_TODAY is the number of distinct user_id in the day
- UNQ_L3D is the number of distinct user_id in the last 3 days
- RATIO is UNQ_TODAY divided by UNQ_L3D (when UNQ_L3D is not zero)
It seems quite correct, but you would have to modify the query to fit your needs and double-check the results!
Just noticed that my query is all wrong... I must have been missing caffeine, or sleep... but I'm still trying!
Edited by: Nicosa on Jan 13, 2011 5:29 PM -
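For the stickiness metric in this thread, one way to get true distinct counts per window is to join each reporting day to the deduplicated (day, user) pairs that fall inside its window and count distinct users per day. The sketch below is only an illustration under assumptions: it uses Python with an in-memory SQLite table named visits (a hypothetical stand-in for W_WEB_VISIT_F), ISO date strings, SQLite's date() function in place of Oracle date arithmetic, and a handful of made-up rows. Only the 7-day window is shown; the 30-day one would need a second join or query with '-29 days'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visit_date TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("2011-01-01", 1), ("2011-01-01", 1), ("2011-01-01", 2),
        ("2011-01-02", 1),
        ("2011-01-05", 3),
        ("2011-01-08", 2),
        ("2011-01-09", 2),
    ],
)

rows = conn.execute(
    """
    WITH agg AS (SELECT DISTINCT visit_date, user_id FROM visits),
         days AS (SELECT DISTINCT visit_date FROM visits)
    SELECT d.visit_date,
           -- users seen on the reporting day itself
           COUNT(DISTINCT CASE WHEN a.visit_date = d.visit_date
                               THEN a.user_id END) AS unique_today,
           -- users seen anywhere in the 7-day window ending on that day
           COUNT(DISTINCT a.user_id) AS unique_7_days
      FROM days d
      JOIN agg a
        ON a.visit_date BETWEEN date(d.visit_date, '-6 days') AND d.visit_date
     GROUP BY d.visit_date
     ORDER BY d.visit_date
    """
).fetchall()

for day, today, week in rows:
    # stickiness = unique users today / unique users over the last 7 days
    print(day, today, week, round(today / week, 2))
```

Whether a join like this completes in reasonable time on several million rows depends on indexing and the optimizer, so it should be checked against a small date range first.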
Set Aggregation type of Count Distinct to use correct table aggregation in
Hi there,
Currently I use OBIEE 10.1.3.4.1, and I have a case where a fact table consists of two logical table sources, a detail table and an aggregate table, with some measures that use count distinct as the aggregation type. The problem is that every time I browse the measure with no dimension at all, it always uses the detail table, not the aggregate one.
I'd really appreciate any suggestions.
Thanks a lot
Hi,
I don't think it's the same case as mine. Let's say I have two tables: detail and aggregate.
Detail Table consists 4 fields:
*) Period
*) Market
*) Region
*) Measure : Customer ID, Sales
Aggregate Table consists 3 fields :
*) Period
*) Region
*) Measure : Customer ID, Sales
In the measures I set the aggregation type for each field:
*) Sales >> set as Sum
*) Customer ID >> copied as "Number of Customer" and set as Count Distinct
In each LTS's content I set the aggregation level using the "Get Levels" feature.
Then I browse via Presentation and run the queries below:
a) Choose only the single measure Sales: the session shows that the value is taken from the aggregate table, just as I expected.
b) Choose period and Sales: the session shows that the values are taken from the aggregate table, still just as I expected.
c) Choose period, Sales, and market: the session shows that the values are taken from the detail table, just as I expected.
d) Choose only the single measure "Number of Customer": the session shows that the value is taken from the detail table. This is NOT what I expected; it is supposed to take the value from the aggregate table.
e) Choose period and "Number of Customer": the session shows that the value is taken from the detail table. This is also NOT what I expected; it is supposed to take the value from the aggregate table.
I've tried to override the aggregation, but I am still confused about how to apply it to the measure "Number of Customer", and it did not work at all.
Any idea?
thanks a lot -
OBIEE 10G count distinct problem
Hi,
I am really new to OBI and have now run into this problem.
I have a fact and three dimension tables as follows:
fact:
1. sales:
sold_value (sum)
transactions (count distinct receipt_id)
branch_id (foreign key)
daykey (foreign key)
receipt_id (foreign key)
product_key (foreign key)
dimensions
1. branch
branch_id (key)
2. time
daykey (key)
3. product
product_key (key)
These tables are joined as a star schema by the keys mentioned above. sales.sold_value is aggregated by sum, and transactions by count distinct receipt_id. I don't have a dimension for receipt_id since it is only used to calculate transactions.
So how can I set this up so that transactions is correct (count distinct receipt_id)?
I tried to set transactions to count distinct in the Default aggregation rule, but the result is wrong (all 1).
All right. I figured it out.
The fact table should be modelled as:
1. sales:
physical layer:
sold_value
branch_id (foreign key)
daykey (foreign key)
receipt_id (foreign key)
product_key (foreign key)
The underlying query is:
select
branch_id, daykey, receipt_id, product_key
, sum(sold_value)
from table
group by
branch_id, daykey, receipt_id, product_key
BMM layer:
sold_value (sum)
transactions (count distinct receipt_id)
branch_id (foreign key)
daykey (foreign key)
receipt_id (foreign key) (removed)
product_key (foreign key) -
Logical Aggregate Column (count(distinct)) Does Not Group for SQL Server DB
When utilizing the count(distinct column_name) aggregate function within a Logical Fact source in the Business Model and Mapping layer in the RPD file the output in BI Answers is not grouping correctly for SQL Server 2008 database sources only. All Oracle database sources represent the same aggregate column correctly within BI Answers.
I am using OBIEE version 10.1.3.3.3
Does anyone know how to resolve this issue?
Thanks in advance,
Kyle
I thought I would post an update on my findings for this issue. If you display the report in BI Answers as a Pivot Table view, the aggregate column displays properly; for some reason it does not in a Table or Compound Layout view. I am still working with Oracle Support on this.
-
Count distinct values in report builder
I have a situation where I have to count the distinct number of customers.
I have a query that returns the list of bill_to_customer_id values from the ra_customer_trx_all table, and I have to display only the number of distinct customers. I can't do this in the query because it has to be grouped, and I am doing it in an aging report. I have to list the number of distinct customers in each aging period. Can anybody please help me achieve this in Reports 6i?
Thanks
How can I count distinct values in Reports?
The situation is like this:
I have a query that lists customer_id, invoice number, and amount due.
What I want is to count the distinct customer_id values and display the number of distinct customers. One customer_id can be repeated any number of times, but I should count it only once. -
OBIEE 10G Total by in answers not correct for count distinct fields. Is this a bug?
For example:
Sales fact has receipt no and line no as key. It has data like:
receipt no, line no, value
1, 1, 30
1, 2, 40
2, 1, 10
2, 2, 10
There is also a transaction field defined as count distinct of receipt no (in BMM)
In answers, I set to show Total.
without any filters:
receipt no, value, transactions
1, 70, 1
2, 20, 1
total: 90, 2
Transactions is 2, which is correct.
If I apply a filter of transaction value greater than 50,
the transactions total still shows 2:
1, 70, 1
total: 70, 2
Is this a bug? It looks like only SUM works without problems in the total.
I did look at the physical query and saw how it calculated the total transactions, and it did not take the filter of transaction value greater than 50 into account. I don't know why, though. I also don't know why you would want to count line no; the result would still be 2.
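To pin down what the total should be, here is a small sketch of the same data in Python with an in-memory SQLite table (the table name sales and the use of SQLite are assumptions for illustration; this says nothing about how OBIEE builds its physical SQL). It computes the grand total once without the filter and once with the filter applied per receipt before the distinct count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (receipt_no INTEGER, line_no INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 1, 30), (1, 2, 40), (2, 1, 10), (2, 2, 10)],
)

# Grand total without any filter: value 90, two distinct receipts.
unfiltered = conn.execute(
    "SELECT SUM(value), COUNT(DISTINCT receipt_no) FROM sales"
).fetchone()

# Filter on the per-receipt value first, then total what is left.
filtered = conn.execute(
    """
    SELECT SUM(receipt_value), COUNT(DISTINCT receipt_no)
      FROM (SELECT receipt_no, SUM(value) AS receipt_value
              FROM sales
             GROUP BY receipt_no
            HAVING SUM(value) > 50)
    """
).fetchone()

print(unfiltered)  # (90, 2)
print(filtered)    # (70, 1)
```

With the filter applied before the distinct count, the transactions total is 1 rather than 2, which suggests the filter on the aggregated value is not reaching the query that computes the total.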
-
OLAP Analysis Count Distinct?
If this query is better suited to the OLAP forum, please let me know.
I am creating an Enrollment cube that has a Student dimension with a Student_ID attribute. The fact table contains a measure column called Students, with each record having a value of 1. This gives a total SUM of students for a specific semester in an analysis in BI. However, this SUM aggregation does not identify students distinctly, so a student who attends 4 semesters is counted as 4 students for the entire academic year. Adding COUNT(DISTINCT Student.Student_ID) to the analysis worked with an earlier test cube that I had created, but when I try it on my updated cube it only gives me a COUNT(DISTINCT) for All Time, even when looking at the Semester or Academic Year levels. The only appreciable difference in my updated cube is that it has more dimensions.
Yes, you can post your query on the OLAP forum, because this forum is about Oracle BI Applications (prepackaged applications using OBIEE + DAC + Informatica).
Regards,
Benoit -
Count distinct derived measure on SCD type 2 dimension
Hi,
I have 2 dimension tables with SCD type 2 and one fact table :
DIM1 :
DIM1_SURR_KEY
DIM1_NAT_KEY
DIM1_PROPERTY1
DIM1_PROPERTY2
EFFECTIVE_DATE
EXPIRATION_DATE
DIM2 :
DIM2_SURR_KEY
DIM2_NAT_KEY
DIM2_PROPERTY1
DIM2_PROPERTY2
EFFECTIVE_DATE
EXPIRATION_DATE
FACT :
DIM1_SURR_KEY
DIM2_SURR_KEY
MEA1
MEA2
Dimension and fact tables are joined with : DIM1_SURR_KEY and DIM2_SURR_KEY.
In my business layer fact table, I would like to define this derived measure : count distinct of DIM1_NAT_KEY.
I tried to add new source for the fact table. I also tried an alias of DIM1 in physical layer.
Nothing works as I want. In Answers, if I select the fact and the count distinct, it works, even if I select a property of DIM1. But if I select a property of DIM2, my count distinct returns 0 (in the SQL sent to the Oracle DB, the formula is replaced with NULL).
Is it possible (and how) to count the number of Nat_Key values with a derived measure defined in the business layer?
If not, I'll define a materialized view on the fact table with the natural key and dimension ID.
My first goal is to spare end users from redefining the derived column in Answers for each report.
Thanks for your help
Hi,
my advice is to map DIM1_NAT_KEY inside the fact table of the Business Model, so that you have a new Logical Table Source inside the Logical Fact Table that maps DIM1_NAT_KEY as a measure. Define the level for this Logical Table Source and set the COUNT DISTINCT aggregation. This way OBIEE knows that the measure is inside a fact and treats it as such.
I hope it helps.
Regards,
Gianluca
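At the SQL level, the measure discussed in this thread amounts to joining the fact to DIM1 on the surrogate key and counting the natural key distinctly. The sketch below shows that shape in Python with in-memory SQLite tables; the sample rows are invented, and the query is only an assumption about what the generated SQL should be equivalent to, not what OBIEE actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim1 (dim1_surr_key INTEGER, dim1_nat_key TEXT)")
conn.execute(
    "CREATE TABLE fact (dim1_surr_key INTEGER, dim2_surr_key INTEGER, mea1 INTEGER)"
)
# Natural key 'A' has two SCD type 2 versions (surrogate keys 1 and 2).
conn.executemany("INSERT INTO dim1 VALUES (?, ?)", [(1, "A"), (2, "A"), (3, "B")])
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [(1, 10, 5), (2, 10, 7), (3, 20, 1), (2, 20, 2)],
)

# Per DIM2 member: distinct natural keys versus distinct surrogate keys.
rows = conn.execute(
    """
    SELECT f.dim2_surr_key,
           COUNT(DISTINCT d.dim1_nat_key)  AS nat_keys,
           COUNT(DISTINCT f.dim1_surr_key) AS surr_keys
      FROM fact f
      JOIN dim1 d ON d.dim1_surr_key = f.dim1_surr_key
     GROUP BY f.dim2_surr_key
     ORDER BY f.dim2_surr_key
    """
).fetchall()

print(rows)  # [(10, 1, 2), (20, 2, 2)]
```

For DIM2 key 10 the two fact rows point at two versions of the same natural key, so the natural-key count is 1 while the surrogate-key count is 2; that difference is the reason for counting DIM1_NAT_KEY rather than the surrogate key.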