COUNT DISTINCT for measure

Hello!
I know that I am repeating this question, but I did not find any clear answer on Metalink, in the newsletter, or on this forum.
There is an article on Metalink: How To Determine The Number Of Members Per Hierarchy Level Of a Dimension: Note:429614.1. This is not exactly what I need.
Also I found some replies from Keith to use SUM and MAX aggregations - this also does not help.
I will give my example:
Assume we have
Two dimensions ITEMS and STORES each having two levels
ITEMS: CATEGORY-PRODUCT
STORES: REGION-STORE
one measure REFERENCE
Assume that we loaded REFERENCE with 1 on each ITEM/STORE where we had sales.
Now if we aggregate ITEMS using SUM, we get the number of unique references at the CATEGORY level, but how can we get unique references per REGION/CATEGORY?
If we apply MAX across STORES, then we just get the references of the best store for each CATEGORY. That is not what I want.
Is there any solution? I have seen that P_S asked Keith for the magic file that proposes 3 approaches. Could I get a copy of this file?
BIG THANKS in advance!
Regards,
Kirill Boyko
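To make the SUM/MAX problem in the question concrete, here is a minimal Python sketch. It is not taken from Keith's file; the sample data and function names are made up for illustration. It shows that the order of aggregation matters: SUM over products followed by MAX over stores returns the best single store, while MAX over stores (per product) followed by SUM over products returns the distinct products per REGION/CATEGORY.

```python
from collections import defaultdict

# Hypothetical cells where REFERENCE = 1: (category, product, region, store).
reference_cells = [
    ("Dairy", "Milk",   "North", "Store1"),
    ("Dairy", "Butter", "North", "Store1"),
    ("Dairy", "Milk",   "North", "Store2"),
    ("Dairy", "Cheese", "North", "Store2"),
    ("Dairy", "Milk",   "South", "Store3"),
]


def sum_items_then_max_stores(cells):
    """SUM over PRODUCT first, then MAX over STORE: the best single store."""
    per_store = defaultdict(int)
    for category, _product, region, store in cells:
        per_store[(region, category, store)] += 1
    best = defaultdict(int)
    for (region, category, _store), n in per_store.items():
        best[(region, category)] = max(best[(region, category)], n)
    return dict(best)


def max_stores_then_sum_items(cells):
    """MAX over STORE first (per product), then SUM over PRODUCT: distinct products."""
    # MAX of a 0/1 flag across stores is 1 if any store in the region sold the product.
    sold = {(region, category, product) for category, product, region, _store in cells}
    distinct = defaultdict(int)
    for region, category, _product in sold:
        distinct[(region, category)] += 1
    return dict(distinct)


print(sum_items_then_max_stores(reference_cells))
print(max_stores_then_sum_items(reference_cells))
```

With this sample data the first function reports 2 for North/Dairy (the best store), while the second reports 3 (Milk, Butter and Cheese), which is the figure the question asks for.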

Hi Kirill
I will send you the file and hopefully it will help you create a solution. If not, email me directly and I will work with you to see if we can resolve this specific requirement.
Keith
Keith Laker
Oracle EMEA Consulting
OLAP Blog: http://oracleOLAP.blogspot.com/
OLAP Wiki: http://wiki.oracle.com/page/Oracle+OLAP+Option
DM Blog: http://oracledmt.blogspot.com/
OWB Blog : http://blogs.oracle.com/warehousebuilder/
OWB Wiki : http://wiki.oracle.com/page/Oracle+Warehouse+Builder
DW on OTN : http://www.oracle.com/technology/products/bi/db/11g/index.html

Similar Messages

Count distinct derived measure on SCD type 2 dimension

Hi,
I have 2 dimension tables with SCD type 2 and one fact table :
DIM1 :
DIM1_SURR_KEY
DIM1_NAT_KEY
DIM1_PROPERTY1
DIM1_PROPERTY2
EFFECTIVE_DATE
EXPIRATION_DATE
DIM2 :
DIM2_SURR_KEY
DIM2_NAT_KEY
DIM2_PROPERTY1
DIM2_PROPERTY2
EFFECTIVE_DATE
EXPIRATION_DATE
FACT :
DIM1_SURR_KEY
DIM2_SURR_KEY
MEA1
MEA2
Dimension and fact tables are joined with : DIM1_SURR_KEY and DIM2_SURR_KEY.
In my business layer fact table, I would like to define this derived measure : count distinct of DIM1_NAT_KEY.
I tried to add new source for the fact table. I also tried an alias of DIM1 in physical layer.
Nothing works as I want. In Answers, if I select the fact and the count distinct, it works, even if I also select a property of DIM1. But if I select a property of DIM2, my count distinct returns 0 (in the SQL sent to the Oracle DB, the formula is replaced with NULL).
Is it possible (and if so, how) to count the number of Nat_Key values with a derived measure defined in the business layer?
If not, I’ll define a materialized view on the fact table with the natural key and the dimension ID.
My main goal is to spare end users from redefining the derived column in Answers for each report.
Thanks for your help

Hi,
my advice is to map DIM1_NAT_KEY inside the fact table of the Business Model: add a new Logical Table Source to the Logical Fact Table that maps DIM1_NAT_KEY as a measure. Define the level for this Logical Table Source and set the COUNT DISTINCT aggregation. This way OBIEE knows that the measure belongs to a fact and treats it as one.
I hope it helps.
Regards,
Gianluca

OLAP Analysis Count Distinct?

If this query is better suited to the OLAP forum, please let me know.
I am creating an Enrollment cube that has a dimension of Student with a Student_ID attribute. The fact table contains a measure column called Students, with each record having a value of 1. This results in getting a total SUM of students for a specific semester in an analysis in BI. However, this SUM aggregation does not distinctly identify students, resulting in a student that attends 4 semesters being counted as 4 students for the entire academic year. Adding COUNT(DISTINCT Student.Student_ID) to the analysis worked with an earlier test cube that I had created, but when I try to perform it on my updated cube it will only give me a COUNT(DISTINCT) for All Time, even when looking at the Semester or Academic Year levels. The only appreciable difference in my updated cube is that it has more dimensions.

Yes, you can post your query on the OLAP forum, because this forum is for Oracle BI Applications (prepackaged applications using OBIEE + DAC + Informatica).
Regards,
Benoit

Distinct count inside a measure group with other measures

Hello,
I have 1 distinct count inside a measure group with other measures, sum, count etc. I know this is not recommended due to poor processing performance and query response time.
Processing performance I can live with if it means not having another measure group, which increases processing time anyway.
I have used the recommended approach before and it generated many questions about what this second measure group is for (visible via excel), even though I made the distinct count appear in the main measure group via a calculated measure.
(it would be nice if you could hide measure groups)
However, my question is: is query response time affected only when the distinct count is used in the query, or is it affected regardless of whether the distinct count is used?
Below is an extract from the 2005 distinct count optimizer white paper. It’s not completely clear, but I assume it affects queries regardless of whether the distinct count is used?
"By adding other measures to the measure group holding a distinct count measure, all of the other measures will be at the same granularity as the distinct count measure, resulting in inefficient data structures and suboptimal queries."

You might also be interested in reading this blog post, which deals with a similar scenario, to get a feeling for some of the things that might be going on behind the scenes:
http://cwebbbi.wordpress.com/2012/11/27/storage-engine-caching-measures-and-measure-groups/
Chris

Totals for Count Distinct

I need to display totals for Count Distinct measures. I want to display these above a table view.
We have done this before by creating hidden columns with level-based measures for totals and then displaying the first row of these hidden columns in a narrative view above the table. We have also used MAX(RSUM()) within requests, sometimes.
These solutions won't work, because I need Count Distinct() measures (so simple sums and counts will give inaccurate results) and I may navigate to the request with filters at different levels (so LBMs won't work, either).
The only solution I can think of is to have LBMs for each level and have duplicate dashboards that differ only in which variation of this request with which level's LBMs are displayed for the totals. That seems like too much of a kluge. There should be a simpler, better way to do this.

I was trying to reproduce your issue with "Sample Sales" - but can't figure out which columns you'd like to see. Can you please post couple columns - and which count distinct you need? That would make it easier to reproduce the issue.
I was thinking that it might be difficult to pull it in 1 report (since you can't completely exclude columns in table view). I have two suggestions:
a) did you try to create a separate report and combine it with existing one (same Dashboard page)?
b) did you try Pivot Table and its calculated column feature? I've had some success with it when I needed to combine measures at different levels on the same report (i needed to see daily totals for 3 specific days, monthly values for specific months, and couple annual totals). This way you could have it on the same report.
I just tried A. And it worked (again, not sure if this is applicable to your situation). I used "Server Complex Aggregate" in column options. The formula is showing: SELECT "D5 Employee"."E01 Employee Name" saw_0, COUNT(DISTINCT "D1 Customer"."C1 Cust Name") saw_1 FROM "Sample Sales" ORDER BY saw_0
Edited by: wildmight on Oct 30, 2009 9:35 AM

"group by" slow for using "count(distinct some_column)" - a better way?

Hi all,
I have a query of this form:
select
count(distinct some_column)
from [...]
group by [...];
It is slowed down by the count(distinct some_column).
The "group by" aggregates base records.
But the base records are in a 1:n relationship: each event has n records.
Some of an event's n records fall into group by result record (A), others into group by result record (B).
But each event shall only count +1 per result record, regardless of how many of its n records have fallen into that category.
Is there another (faster) way to count this?
- thanks!
best regards,
Frank
Edited by: user8704911 on Jun 29, 2011 1:30 AM
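A minimal sketch of the counting rule described above, in Python with made-up data: deduplicate the (event, category) pairs first, then count one per pair. In SQL the same idea would be a plain COUNT over a SELECT DISTINCT subquery; whether that is actually faster than COUNT(DISTINCT ...) depends on the data and the optimizer, so treat it as something to measure rather than a guaranteed win.

```python
from collections import defaultdict

# Hypothetical base records: (event_id, category). One event can have several records.
records = [
    (1, "A"), (1, "A"), (1, "B"),
    (2, "A"),
    (3, "B"), (3, "B"),
]


def events_per_category(rows):
    """Count each event once per category, however many records it has there."""
    unique_pairs = set(rows)  # collapse duplicate (event_id, category) pairs
    counts = defaultdict(int)
    for _event_id, category in unique_pairs:
        counts[category] += 1
    return dict(counts)


print(events_per_category(records))
```

Event 1 has two records in category A and one in B, but it contributes exactly 1 to each.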

Hi Dom,
incidentally I went in the direction you proposed:
I replaced the PL/SQL collection with a global temporary table.
But the reason for doing this was a different one:
I noticed that the group by is much faster when applied to a table or a global temporary table.
At first I just moved the data from the PL/SQL collection to a global temporary table in order to apply the group by there.
The group by was then much faster, but moving the data from the PL/SQL collection to the global temporary table ate up the time saved.
So the problem was not the group by but read access to the PL/SQL collection in general (around 65,000 records, by the way).
Now that I have completely replaced the PL/SQL collection with a global temporary table, everything is fine.
cheers,
Frank

Logical Aggregate Column (count(distinct)) Does Not Group for SQL Server DB

When utilizing the count(distinct column_name) aggregate function within a Logical Fact source in the Business Model and Mapping layer in the RPD file the output in BI Answers is not grouping correctly for SQL Server 2008 database sources only. All Oracle database sources represent the same aggregate column correctly within BI Answers.
I am using OBIEE version 10.1.3.3.3
Does anyone know how to resolve this issue?
Thanks in advance,
Kyle

I thought that I would update my current findings with this issue. If you display the report in BI Answers as a Pivot Table view the aggregate column displays properly, it does not in a Table or Compound Layout view for some reason. I am still working with Oracle Support on this.

OBIEE 10G Total by in answers not correct for count distinct fields. Is this a bug?

For example:
Sales fact has receipt no and line no as key. It has data like:
receipt no, line no, value
1, 1, 30
1, 2, 40
2, 1, 10
2, 2, 10
There is also a transaction field defined as count distinct of receipt no (in BMM)
In answers, I set to show Total.
without any filters:
receipt no, value, transactions
1, 70, 1
2, 20, 1
total: 90, 2
Transactions is 2, which is correct.
If I apply a filter of transaction value greater than 50,
then transactions in the total still shows 2:
1, 70, 1
total: 70, 2
Is this a bug? It looks like only SUM works without problems in the total.

I looked at the physical query to see how it calculated the total transactions, and it didn't take the filter of transaction value greater than 50 into account. I don't know why, though. I also don't know why you would want to count line no; the result would still be 2.
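For reference, here is a small Python sketch of the result the poster expects, using the rows from the post. It does not reproduce what OBIEE does internally; it only shows a total computed over the rows that survive the filter, so that the transaction count follows the filter. The function name and parameter are illustrative.

```python
# Rows from the post: (receipt_no, line_no, value).
sales = [(1, 1, 30), (1, 2, 40), (2, 1, 10), (2, 2, 10)]


def report(rows, min_receipt_value=None):
    """Return (value per receipt, total value, distinct receipts) after filtering."""
    per_receipt = {}
    for receipt_no, _line_no, value in rows:
        per_receipt[receipt_no] = per_receipt.get(receipt_no, 0) + value
    if min_receipt_value is not None:
        # Filter on the aggregated receipt value before computing the totals.
        per_receipt = {r: v for r, v in per_receipt.items() if v > min_receipt_value}
    return per_receipt, sum(per_receipt.values()), len(per_receipt)


print(report(sales))      # unfiltered
print(report(sales, 50))  # only receipts with value greater than 50
```

Unfiltered, the totals are 90 and 2 transactions; with the filter applied first, they are 70 and 1 transaction.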

Sum Distinct for a measure object

Hi,
Can anybody let me know how to go ahead and get a sum distinct for a particular measure object in a Universe?

If your database supports the syntax, you can simply use SUM(DISTINCT table.measure_column) and it should apply the distinct function. Note, however, that this can result in under-reporting certain values. For example, assume you have two sales orders with the same amount: $100. As long as you include the order number (or primary key order_id or otherwise) you will get a total of $200. But if you remove order number, the sum(distinct) will see two equal values (both $100) and your total will only be $100 instead of the correct answer $200.
What is the real problem you are trying to solve?
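The under-reporting described above is easy to reproduce. The sketch below uses Python's built-in sqlite3 module as a stand-in database (an assumption for the sake of a runnable example; the table and column names are made up), with two orders of the same amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER)")
# Two different orders that happen to have the same amount.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100), (2, 100)])

plain_sum = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
# DISTINCT collapses the two equal amounts into one before summing.
distinct_sum = conn.execute("SELECT SUM(DISTINCT amount) FROM orders").fetchone()[0]

print(plain_sum, distinct_sum)
```

The plain SUM returns 200, while SUM(DISTINCT amount) returns 100, because the DISTINCT is applied to the amounts and knows nothing about the order IDs.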

Query rewrite for COUNT(DISTINCT)

Hi,
I have a fact table with different dimension keys.
CREATE TABLE FACT (
TIME_SKEY NUMBER,
REGION_SKEY NUMBER,
AC_SKEY NUMBER
);
I need to take COUNT(DISTINCT AC_SKEY) for TIME_SKEY and REGION_SKEY. There are Oracle dimensions defined for time and region, which use TIME_SKEY and REGION_SKEY. I have created an MV with query rewrite using COUNT(DISTINCT), but it does not use the dimension if I query at any other level, and the MV can't be fast refreshed because it was built using COUNT(DISTINCT).
CREATE MATERIALIZED VIEW AC_MV
NOCACHE
NOLOGGING
NOCOMPRESS
NOPARALLEL
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
WITH PRIMARY KEY
ENABLE QUERY REWRITE
AS
SELECT
TIME_SKEY ,
REGION_SKEY,
COUNT(DISTINCT AC_SKEY)
FROM FACT
GROUP BY TIME_SKEY, REGION_SKEY;
Query used to retrieve data is as below
SELECT TIME_SKEY, COUNT(DISTINCT AC_SKEY) OVER (PARTITION BY TIME_SKEY) UNIQ_AC, COUNT(DISTINCT AC_SKEY) OVER () UNIQ_AC1
FROM FACT;
There can be other queries based on time / region dimension.
Can you please provide help in solving above issue?
Thanks,
Pritesh

What version of the Oracle database?
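One reason a summary built on COUNT(DISTINCT) cannot serve other levels is that distinct counts are not additive. The sketch below illustrates this with Python's sqlite3 module as a stand-in database and made-up rows; it says nothing about whether Oracle's query rewrite would pick up such a summary. It compares three things: the true distinct count per TIME_SKEY, the sum of per-(time, region) distinct counts, and the count taken from a summary that keeps the distinct (time, region, account) combinations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (time_skey INTEGER, region_skey INTEGER, ac_skey INTEGER)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 10, 100), (1, 20, 100), (1, 20, 200), (2, 10, 300)],
)

# Hypothetical summary table: one row per distinct combination, no counts stored.
conn.execute(
    "CREATE TABLE ac_distinct AS "
    "SELECT DISTINCT time_skey, region_skey, ac_skey FROM fact"
)

PER_TIME = "SELECT time_skey, COUNT(DISTINCT ac_skey) FROM {} GROUP BY time_skey ORDER BY time_skey"
from_fact = conn.execute(PER_TIME.format("fact")).fetchall()
from_summary = conn.execute(PER_TIME.format("ac_distinct")).fetchall()

# Rolling up stored per-(time, region) distinct counts double-counts account 100.
summed_counts = conn.execute(
    "SELECT time_skey, SUM(n) FROM ("
    "  SELECT time_skey, region_skey, COUNT(DISTINCT ac_skey) AS n"
    "  FROM fact GROUP BY time_skey, region_skey"
    ") GROUP BY time_skey ORDER BY time_skey"
).fetchall()

print(from_fact, from_summary, summed_counts)
```

The summary of distinct combinations gives the same answer as the fact table at the time level, whereas summing the stored per-region counts overstates time 1 because account 100 appears in both regions.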

Count Distinct over a Window

Hi everyone,
An analyst on my team heard of a new metric called a "Stickiness" metric. It basically measures how often users come to your website over time.
The definition is as follows:
# Unique Users Today/#Unique users Over Last 7 days
and also
# Unique Users Today/#Unique users Over Last 30 days
We have visit information stored in a table W_WEB_VISIT_F. For the sake of simplicity say it has columns VISIT_ID, VISIT_DATE and USER_ID (there are several more dimensional columns it has but I want to keep this exercise simple).
I want to create an aggregate table called W_WEB_VISIT_A that pre-aggregates the three values I need per day: # Unique Users Today, #Unique users Over Last 7 days and #Unique users Over Last 30 days. The only way I can think of building the aggregate table is as follows
WITH AGG AS (
SELECT
VISIT_DATE,
USER_ID
FROM W_WEB_VISIT_F
GROUP BY
VISIT_DATE,
USER_ID
)
select
src.VISIT_DATE,
COUNT(DISTINCT src.USER_ID) UNIQUE_TODAY,
(select count(distinct hist.USER_ID) from agg hist where hist.VISIT_DATE between src.VISIT_DATE - 6 and src.VISIT_DATE) SEVEN_DAYS,
(select count(distinct hist.USER_ID) from agg hist where hist.VISIT_DATE between src.VISIT_DATE - 29 and src.VISIT_DATE) THIRTY_DAYS
from agg src
group by src.VISIT_DATE
The problem I am having is that W_WEB_VISIT_F has several million records in it and I can't get the above query to complete. It ran overnight and didn't finish.
Is there a fancy 11g function I can use to do this for me? Is there a more efficient method?
Thanks everyone for the help!
-Joe
Edited by: user9208525 on Jan 13, 2011 6:24 AM
You guys are right. I missed the group by I had in the WITH Clause.

Hi,
I haven't used the windowing clause a lot, so I wanted to give it a try.
I made up some data with this query:
create table t as select sysdate-dbms_random.value(0,10) visit_date, mod(level,5)+1 user_id
from dual
connect by level <= 20;
Which gave me the following rows:
Scott@my10g SQL>select * from t order by visit_date;
VISIT_DATE             USER_ID
03/01/2011 13:17:10          1
04/01/2011 05:30:30          4
04/01/2011 08:08:13          5
04/01/2011 14:42:24          3
04/01/2011 20:20:58          3
05/01/2011 17:29:24          2
05/01/2011 17:40:20          4
05/01/2011 18:32:56          2
06/01/2011 04:12:53          5
06/01/2011 08:59:18          2
06/01/2011 09:04:26          3
06/01/2011 10:14:20          1
06/01/2011 14:22:54          1
06/01/2011 19:39:04          1
08/01/2011 14:44:18          5
08/01/2011 21:38:04          5
11/01/2011 04:56:05          4
11/01/2011 18:52:29          2
11/01/2011 23:57:30          4
13/01/2011 07:24:22          3
20 rows selected.
I came up with this query:
select
        v.*,
        case
                when unq_l3d is null then -1
                else trunc(unq_today/unq_l3d,2)
        end ratio
from (
        select distinct trcdt, unq_today, unq_l3d
        from (
                select
                trcdt,
                count(user_id)
                over (
                        order by trcdt
                        range between numtodsinterval(1,'DAY') preceding and current row
                ) unq_today,
                count(user_id)
                over (
                        order by trcdt
                        range between numtodsinterval(3,'DAY') preceding and current row
                ) unq_l3d
                from (
                        select distinct trunc(visit_date) trcdt, user_id from t
                )
        )
) v
order by trcdt;
With my sample data, it gives me:
TRCDT                UNQ_TODAY    UNQ_L3D RATIO
03/01/2011 00:00:00          1          1 1.00
04/01/2011 00:00:00          4          4 1.00
05/01/2011 00:00:00          5          6 0.83
06/01/2011 00:00:00          6         10 0.60
08/01/2011 00:00:00          1          7 0.14
11/01/2011 00:00:00          2          3 0.66
13/01/2011 00:00:00          1          3 0.33
7 rows selected.
where:
- UNQ_TODAY is the number of distinct user_id in the day
- UNQ_L3D is the number of distinct user_id in the last 3 days
- RATIO is UNQ_TODAY divided by UNQ_L3D (when UNQ_L3D is not zero)
It seems quite correct, but you would have to modify the query to fit your needs and double-check the results!
Just noticed that my query is all wrong... must have been missing caffeine, or sleep... but I'm still trying!
Edited by: Nicosa on Jan 13, 2011 5:29 PM
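One way the windowed query above goes wrong is that COUNT(user_id) over a date range counts (day, user) rows, so a user who visits on two days inside the window is counted twice. The Python sketch below shows the intended definition on made-up data: the number of distinct users today and the number of distinct users over a trailing window that includes today. The data and function names are illustrative, and the loop is not tuned for millions of rows; it is only meant as a reference result to check a SQL solution against.

```python
from datetime import date, timedelta

# Hypothetical visits: (visit_date, user_id). User 2 visits twice on 5 January.
visits = [
    (date(2011, 1, 3), 1),
    (date(2011, 1, 4), 1),
    (date(2011, 1, 4), 2),
    (date(2011, 1, 5), 2),
    (date(2011, 1, 5), 2),
]


def unique_users(rows, end_day, window_days):
    """Distinct users seen in the window_days days ending on end_day (inclusive)."""
    start_day = end_day - timedelta(days=window_days - 1)
    return len({user for day, user in rows if start_day <= day <= end_day})


def stickiness(rows, window_days):
    """Map each visit day to (unique users today, unique users in the window)."""
    days = sorted({day for day, _user in rows})
    return {
        day: (unique_users(rows, day, 1), unique_users(rows, day, window_days))
        for day in days
    }


for day, (today, window) in stickiness(visits, 3).items():
    print(day, today, window, round(today / window, 2))
```

On 5 January the 3-day window contains four distinct (day, user) rows but only two distinct users, which is the difference between counting rows and counting users.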

Set Aggregation type of Count Distinct to use correct table aggregation in

Hi there,
Currently I use OBIEE 10.1.3.4.1, and I have a case where a fact table consists of 2 logical table sources, a detail table and an aggregate table, with some measures using count distinct as the aggregation type. The problem is that every time I browse the measure with no dimension at all, it always uses the detail table, not the aggregate one.
Really appreciate for any suggestion ..
thanks a lot

Hi,
I don't think it's the same case as mine. Let's say I have 2 tables: detail and aggregate.
Detail Table consists 4 fields:
*) Period
*) Market
*) Region
*) Measure : Customer ID, Sales
Aggregate Table consists 3 fields :
*) Period
*) Region
*) Measure : Customer ID, Sales
in the measure I set aggregation type for each field:
*) Sales >> set as Sum
*) Customer ID >> copy as "Number of Customer" and set as Count Distinct
In each LTS' contents I set the level of aggregation using "Get Levels" feature..
Then I browse via Presentation and run the queries below:
a) choose only a single measure field, Sales: the session shows that the value is taken from the aggregate table, just as I expected.
b) choose period and sales: the session shows that the values are taken from the aggregate table, again just as I expected.
c) choose period, sales, and market: the session shows that the values are taken from the detail table, just as I expected.
d) choose only a single measure field, "Number of Customer": the session shows that the value is taken from the detail table, which is NOT what I expected. It is supposed to take the value from the aggregate table.
e) choose period and "Number of Customer": the session shows that the value is taken from the detail table, which is also NOT what I expected. It is supposed to take the value from the aggregate table.
I've tried to override the aggregation, but I am still confused about how to apply it to the measure "Number of Customer", and it did not work at all.
any idea ?
thanks a lot

Count Distinct With CASE Statement - Does not follow aggregation path

All,
I have a fact table, a day aggregate and a month aggregate. I have a time hierarchy and the month aggregate is set to the month level, the day aggregate is set to the day level within the time hierarchy.
When using any measures and a field from my time dimension .. the appropriate aggregate is chosen, ie month & activity count .. month aggregate is used. Day & activity count .. day aggregate is used.
However - when I use the count distinct aggregate rule .. the request always uses the lowest common denominator. The way I have found to get this to work is to use a logical table source override in the aggregation tab. Once I do this .. it does use the aggregates correctly.
A few questions
1. Is this the correct way to use aggregate navigation for the count distinct aggregation rule (using the source override option)? If yes, why is this necessary for count distinct .. what is special about it?
2. The main problem I have now is that I need to create a simple count measure that has a CASE statement in it. The only way I see to do this is to select the Based on Dimensions checkbox which then allows me to add a CASE statement into my count distinct clause. But now the aggregation issue comes back into play and I can't do the logical table source override when the based on dimensions checkbox is checked .. so I am now stuck .. any help is appreciated.
K

Ok - I found a workaround (and maybe the preferred solution for my particular issue), which is: using a CASE statement with a COUNT DISTINCT aggregation while still having AGGREGATE AWARENESS.
To get all three of the requirements above to work I had to do the following:
- Create the COUNT DISTINCT as normal (counting on a USERID physically mapped column in my case)
- Now I need to map my fact and aggregates to this column. This is where I got the CASE statement to work. Instead of trying to put the CASE statement inside the aggregate definition by using the 'Based on Dimensions' checkbox (which didn't allow for aggregate awareness for some reason), I specified the CASE statement in the Column Mapping section of the fact and aggregate tables.
- Once all the LTSs (facts and aggregates) are mapped, you still have to define the Logical Table Source overrides in the aggregation tab of the count distinct definition. Add in all the facts and aggregates.
Now the measure will use my month aggregate when I specify month, the day aggregate when I specify day, etc.
If you are just trying to use a Count Distinct (no CASE statement needed) with aggregate awareness, you just need to use the Logical Table Source override on the aggregation tab.
There is still a funky issue when using the COUNT aggregate type. As long as you don't map multiple logical table sources to the COUNT column, it works fine and as expected. But if you try to add in multiple sources and aggregate awareness, it randomly starts SUMMING everything, which is very weird. The blog in this thread says to check the 'Based on Dimensions' checkbox to fix the problem, but that did not work for me. I am still not sure what to do on this one, but it's not currently causing me a problem, so I will ignore it for now ;)
Thanks for all the help
K

Incorrect GRAND TOTAL (with COUNT DISTINCT)

Hi,
I'm getting wrong results in the GRAND TOTAL of a COUNT DISTINCT measure column.
I have 5 distinct customers in Paris and 10 distinct customers in NYC, I want the grand total to retrieve the sum of both, that's 15.
But OBIEE is calculating the distinct customers for all cities, so if there are customers in both Paris and NYC the result is wrong.
This is the result I'm getting:
City Number_Distinct_Customers
Paris 5
NYC 10
GRAND TOTAL 12
12 is the number of all the distinct customers.
The correct GRAND TOTAL should be 5+10=15
Thanks
Regards

So just as a weird question...from a business standpoint, what does the "15" mean?
It's not a count of distinct customers, which is what this measure is supposed to be.
If I were to come up with a business description for what's being described, it's "a count of distinct customers by city....summed up so that it is no longer a count of distinct customers". What and/or how exactly would anyone use such a number?
Not trying to be a pain, just trying to figure out how this would be used.
Thanks,
Scott
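The two numbers in this thread answer different questions, which a few lines of Python make visible. The customer IDs below are made up to match the figures in the post (5 in Paris, 10 in NYC, 12 overall, so 3 customers appear in both cities).

```python
# Hypothetical customer IDs chosen to match the counts in the post.
paris = {f"cust{i}" for i in range(1, 6)}   # cust1 .. cust5
nyc = {f"cust{i}" for i in range(3, 13)}    # cust3 .. cust12

sum_of_city_counts = len(paris) + len(nyc)  # adds the per-city distinct counts
distinct_overall = len(paris | nyc)         # distinct customers across all cities
in_both_cities = len(paris & nyc)

print(sum_of_city_counts, distinct_overall, in_both_cities)
```

The sum of the city rows (15) counts the three shared customers twice; the distinct count across all cities (12) counts each customer once. Which one belongs in the grand total depends on what the total is supposed to mean.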

Count Distinct

Hi @all,
the question might be answered already but I can't think of what to search for.
I've got a dimension with Active Directory attributes. And another dimension with groupnames.
One AD-account can be in many groups.
The facttable (snowflake Schema) contains the ID of the AD-Dimension and the groupname.
It could look like this:
ID    GroupName     GroupAlias
1      Test1              Test
1      Test2              Test
1      Test3              Test
1      hello1             Hello
1      hello2             Hello
I am actually talking about the GroupAlias which should be counted distinct.
The ID 1 is in 3 different "Test-Groups", but the alias is always "Test". So the Count should be 1.
What should the MDX look like?
Thanks!

Something I grabbed from TechNet. This gives the distinct count of dimension members with Internet sales. If you are not able to get your MDX working, post it.
WITH SET MySet AS
{[Customer].[Customer Geography].[Country].&[Australia],[Customer].[Customer Geography].[Country].&[Australia],
[Customer].[Customer Geography].[Country].&[Canada],[Customer].[Customer Geography].[Country].&[France],
[Customer].[Customer Geography].[Country].&[United Kingdom],[Customer].[Customer Geography].[Country].&[United Kingdom]}
*
{[Measures].[Internet Sales Amount] }
MEMBER MEASURES.SETDISTINCTCOUNT AS
DISTINCTCOUNT(MySet)
SELECT {MEASURES.SETDISTINCTCOUNT} ON 0
FROM [Adventure Works]
