Data Warehouse: general info

Hi all! I'm new to data warehousing, so please bear with me. I'm reading some general guides on data warehousing, but I haven't understood what the method is for exporting data from an OLTP database into a data warehouse. Can someone help me?
Thank you very much!
Stefano.

Hi,
A data warehouse is not a different thing from an OLTP environment; it is the same database technology. It's the way the database is used that differs, so the configuration of a data warehouse database needs to be different from that of an OLTP database.
In an OLTP environment, users perform huge numbers of small inserts/updates/deletes on the tables every hour or minute (heavy transactional activity).
In a data warehouse, you will instead have a few BIG transactions, and the DW is typically refreshed (with data from production, for example) each night, each weekend, and so on.
So the configuration of the database will not be the same.
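For illustration, the configuration difference typically shows up in memory and parallel-query parameters. A hedged sketch of typical init.ora contrasts; all values below are made up for the sketch, not recommendations:

```sql
-- Illustrative only: typical init.ora differences between an OLTP
-- database and a data warehouse (values are placeholders).

-- OLTP: many small concurrent transactions
-- sga_target           = 2G
-- pga_aggregate_target = 512M
-- parallel_max_servers = 0     -- little or no parallel query

-- DW: few but very large queries and batch loads
-- sga_target           = 4G
-- pga_aggregate_target = 4G    -- big sorts and hash joins
-- parallel_max_servers = 64    -- heavy parallel query
```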
Fred

Similar Messages

  • Difference between general DB and Data Warehouse DB

    Hi,
We have a server on which an Oracle database is already installed. We want to use it as a data warehouse. My question is whether this database would be sufficient for a data warehouse, or whether I would have to create a new database for it. Is it possible to find out if the installation was General Purpose or Data Warehouse?
Also, if I go ahead, would it cause any problems if I install the new database directly without uninstalling the previous Oracle database?
Appreciate your help.
    regards,
    Edited by: user10243788 on Mar 23, 2010 2:09 AM

While installing, you can select either 'General Purpose' or 'Data Warehouse'. The only difference between the two at installation time is that the init.ora parameters get higher values for a data warehouse database, and these can also be updated manually later. So you can go ahead and install a general-purpose database as well, but you will later need to modify the init.ora parameters to specify higher memory values for parameters like SHARED_POOL_SIZE, JAVA_POOL_SIZE, DB_CACHE_SIZE, etc.
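If you start from the general-purpose install, the memory parameters can be raised later. A hedged sketch; the values are placeholders to be sized for your own workload:

```sql
-- Hypothetical example of raising memory parameters after a
-- general-purpose install; size the values for your own system.
ALTER SYSTEM SET shared_pool_size = 512M SCOPE = SPFILE;
ALTER SYSTEM SET java_pool_size   = 128M SCOPE = SPFILE;
ALTER SYSTEM SET db_cache_size    = 2G   SCOPE = SPFILE;
-- Restart the instance for SPFILE-scoped changes to take effect.
```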

  • Data warehouse modeling

I am stuck on some points and have no clue what I should do. Please help; if you know someone already working in this field, they should know something about it.
1. What do you do with flag indicators and the various code attributes in your entity tables? Do you include them in your dimensions?
2. How do you handle dependent (weak) entities when transferring from an ERD to a dimensional star schema? For example, my Account table has dependents (Agreement, Suitability, Qualification, Name/Address) with one-to-many relationships. How do I handle them? Should I include these entities inside the Account dimension, or attach them directly to the fact?
3. My dimensions are User, Account, Account Activity and Time. How do I identify which are slowly changing dimensions?
An account goes through steps, and changes are made to it along the way. After being saved it can be put into the cycle again for more changes; I mean very frequent changes. Where do I put the start date and end date if it is indeed a slowly changing dimension?
I would really appreciate your help.
Please reply at [email protected]

My 2 cents: I think Gopi is about right on the OLTP side, but I have to disagree with Gopi on some of the data warehouse points.
A data warehouse is generally a broader concept than just an OLAP/multi-dimensional model; that would be regarded as just one component of most DWs.
Data warehouses run SQL queries all the time. I would bet the overwhelming majority of BW queries are SQL queries, even those against OLAP cubes, although MDX is starting to be used more. Operationally, SAP uses SQL to perform many of the processes in BW: loading data, rollup, compression, etc.
The majority of data warehouses are perhaps in the hundreds of GBs, although large enterprises can easily have TBs of data.
BW can incorporate real-time data from R/3 with remote cubes.
BW has transactional InfoCubes where users enter data for budgeting, forecasting, etc.
You can google these topics and find lots of information on data warehouse design.

  • Data Warehouse in Virtual Machine

    Hi all!
I'm looking for information about the pros and cons of running an Oracle data warehouse in a virtual machine.
I've searched Google but can't find anything substantial.
Do you know where I can find a document or some tips?
    Thanks!

I am curious to know why you would want to run a data warehouse within a VM image. It would seem to me the benefits would be limited for most data warehouses, unless you are using this as a way to quickly spin up QA, training, and unit/integration testing environments using limited resources.
There is nothing specific you need to do for data warehousing when running a database inside a VM, but obviously you will need to set up your VM environment carefully. You can find general information on virtualization here: http://www.oracle.com/technology/tech/virtualization/index.html
I would be interested in getting more information from you about this requirement.
    Regards
    Keith Laker
    Senior Principal Product Manager, Data Warehousing

  • Best practice of metadata table in data warehouse environment ?

    Hi guru's,
In our data warehouse we have two schemas: 1. a staging schema and 2. the DWH (data warehouse reporting) schema. In staging we have about 300 source tables. In the DWH schema we create only the tables required from a reporting perspective. Some of the tables in the staging schema have also been created in the DWH schema with different table and column names; the naming convention for these tables and columns in the DWH schema is based on business names.
In order to keep track of these tables we are creating a metadata table in the DWH schema, for example:

Stage      DWH schema
Table_1    Table_A
Table_2    Table_B
Table_3    Table_C
Table_4    Table_D

My question is how we handle the column names for each of these tables. The stage_1, stage_2 and stage_3 column names have been renamed in the DWH schema, in Table_A, Table_B and Table_C.
As said earlier, we have about 300 tables in staging and maybe around 200 tables in the DWH schema. Many of the column names have been renamed in the DWH schema, and some tables have 200 columns.
So my concern is: how do we handle the column names in the metadata table? Should we keep only table names in the metadata table, and not column names?
Any ideas will be greatly appreciated.
Thanks!

Hi,
This seems like quite a buzzing question.
In our project we designed a hub-and-spoke-like architecture, so we have three layers. L0 is the one closest to the source, and L0 table names are linked to the corresponding source names by a naming standard (like tabA, EXT_tabA, tabA_OK1 and so on, based on how the load procedures are implemented).
At L1 we have the ODS, a normalized model; we use business names for the tables there, and standard names for temporary structures and artifacts.
Both L0 and L1 keep the source column names as a general rule; new columns, such as calculated ones, are business-driven, and metadata is standard-driven.
Data Modeler fits the L1 modelling purpose perfectly.
L2 is the dimensional schema; business names take over for tables and columns, eventually rewritten at the presentation layer (front-end tool).
Hope this helps, D.
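One way to track the renames at column level, sketched with hypothetical names; whether column granularity is worth maintaining at all is exactly the judgement call raised in the question:

```sql
-- Hypothetical metadata table mapping staging columns to their
-- business-named counterparts in the DWH schema; a table-level-only
-- mapping would simply drop the two column fields.
CREATE TABLE meta_column_map (
  stage_table  VARCHAR2(30) NOT NULL,
  stage_column VARCHAR2(30) NOT NULL,
  dwh_table    VARCHAR2(30) NOT NULL,
  dwh_column   VARCHAR2(30) NOT NULL,
  CONSTRAINT pk_meta_column_map
    PRIMARY KEY (stage_table, stage_column)
);

INSERT INTO meta_column_map
VALUES ('TABLE_1', 'STAGE_1', 'TABLE_A', 'CUSTOMER_NAME');
```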

  • Tablespaces and block size in Data Warehouse

We are preparing to implement a data warehouse on Oracle 11g R2, and currently I am trying to work out a storage strategy; unfortunately I have very little experience with that. The question is: what is the general advice regarding tablespaces and block size? I did some research and it is hard to find a clear answer. Some resources advise that block size is not important and can be left small (8 KB); others state that it is crucial and should be as big as possible (64 KB).
The other question is which data should be placed where. Many resources state that keeping indexes apart from their data is a myth and a bad practice that may even decrease performance; others say that although there is no performance benefit, index tablespaces do not need to be backed up and that is why they should be split out. Another idea is to have separate tablespaces for big tables, small tables, and tables accessed frequently versus infrequently. How should I organize partitions in terms of tablespaces? Is it a good idea to have "old" (read-only) data partitions in separate tablespaces?
    Any help highly appreciated and thank you in advance.

Wojtus-J wrote:
>
We are preparing to implement Data Warehouse on Oracle 11g R2 and currently I am trying to set up some storage strategy - unfortunately I have very little experience with that.
>
With little experience, the key feature is to avoid big mistakes: don't try to get too clever.
>
The question is what are general advices in such considerations according table spaces and block size?
>
If you need to ask about block sizes, use the default (i.e. 8KB).
>
I made some research and it is hard to find some clear answer
>
But if you get contradictory advice from this forum, how would you decide which bits to follow?
A couple of sensible guidelines when researching on the internet: look for material that is datestamped with recent dates (the last couple of years), or that references recent, or at least relevant, versions of Oracle. Give preference to material that explains WHY an idea might be relevant, and greater preference to material that DEMONSTRATES why an idea might be relevant. Check that any explanations and demonstrations apply to your planned setup.
>
The other thing is what part of data should be placed where? Many resources state that keeping indexes apart from its data is a myth and a bad practice as it may lead to decrease of performance, others say that although there is no performance benefit, index table spaces do not need to be backed up and thats why it should be split. The next idea is to have separate table spaces for big tables, small tables, tables accessed frequently and infrequently. How should I organize partitions in terms of table spaces? Is it a good idea to have "old" data (read only) partitions on separate table spaces?
>
It is often convenient, and sometimes very important, to separate data into different tablespaces based on some aspect of functionality. The performance argument was mooted (badly) in an era when discs were small and (disk) partitions were hard; but all your other examples of why to split are potentially valid for administrative reasons: big/small, table/index, old/new, read-only/read-write, fact/dimension, etc.
For data warehouses a fairly common practice is to identify some sort of aging pattern for the data, and try to pick a boundary that allows you to partition the data so that a large fraction of it can eventually be made read-only. Using tablespaces to mark time boundaries can be a great convenience; note that the tablespace boundary need not match the partition boundary, e.g. daily partitions in a monthly tablespace. If you take this type of approach, you might have a "working" tablespace for recent data, and then copy the older data to a "time-specific" tablespace, packing it and making it read-only as you do so.
Tablespaces are (broadly speaking) about strategy, not performance. (Temporary tablespaces / tablespace groups are probably the exception to this thought.)
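The aging pattern described above might be sketched like this; the table, partition and tablespace names are hypothetical:

```sql
-- Hedged sketch: daily partitions collected in a monthly tablespace.
-- Once the month is closed, the old partition is moved (and optionally
-- packed with compression), then the tablespace is made read-only so
-- it only ever needs to be backed up once.
ALTER TABLE sales MOVE PARTITION p_2011_01_31
  TABLESPACE ts_sales_2011_01 COMPRESS;

ALTER TABLESPACE ts_sales_2011_01 READ ONLY;
```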
    Regards
    Jonathan Lewis

  • Table and Index compression in data warehouse - thoughts?

    Hi,
We have a data warehouse with large fact tables and materialized views over this data.
Approx. 3 million inserts per day; at weekends, about 12 million.
The fact tables are expected to have around 200 million rows, and a couple will have 1-3 billion.
Tables are partitioned and have bitmap indexes.
I just wondered what your thoughts are on compressing large fact tables and mviews, both from the point of view of ETL into them and of reporting from them afterwards.
I take it we can compress/uncompress accordingly without any problems?
Many thanks

After compression, most SELECT statements will not get slower; in fact, many can get faster due to reduced I/O and buffer needs.
The situation with DML is more complex. It depends on the exact compression options (basic or advanced) and the DML type (INSERT, UPDATE, direct load, ...), but generally DML is negatively affected by compression.
In a data warehouse (DW), it is usually quite beneficial to compress partitions or tables that contain data that is not supposed to be modified (read-only or read-mostly). Please note that in many cases you do not have to compress while you are loading the data; you can do that later.
You can also consider compressing some of your B-tree indexes (if you use them in your DW system).
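For example, a closed partition might be compressed after the load rather than during it. A hedged sketch with hypothetical object names:

```sql
-- Basic compression applied to an old, read-mostly partition; the
-- local bitmap index partition must be rebuilt after the MOVE.
ALTER TABLE fact_sales MOVE PARTITION p_2010_q4 COMPRESS;
ALTER INDEX ix_fact_sales_cust REBUILD PARTITION p_2010_q4;

-- B-tree index compression, if B-tree indexes are used in the DW:
ALTER INDEX ix_fact_sales_date REBUILD COMPRESS 1;
```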
    Iordan Iotzov
    http://iiotzov.wordpress.com/

  • Only Alert Data is not being inserted in SCOM 2012 Data Warehouse database

    Hi All,
Alert data has not been inserted into the SCOM Data Warehouse database for 10 days, though I can see the latest Performance data in the DW DB. No changes were made, as far as I know, on the SCOM servers or DBs. I had this issue a few months back,
and it was resolved by executing a query to create an entry for the Data Warehouse Synchronisation server.
Now I have checked the discovered inventory and can see an entry present and healthy. Still, the latest Alert data is not being inserted into the DW DB. Please help me out.
    http://social.technet.microsoft.com/Forums/en-US/2dac4f45-4911-40dc-a220-702993188832/alert-data-is-not-present-in-scom-2012-data-warehouse-database-since-two-weeks?forum=operationsmanagergeneral
    Regards, Suresh

    Hi,
Generally, the data warehouse stores long-term data; by default, it keeps 400 days of data. I suggest you check your configuration:
    How to Configure Grooming Settings for the Reporting Data Warehouse Database
    http://technet.microsoft.com/en-us/library/hh212806.aspx
    Alex Zhao
    TechNet Community Support

  • Alert data is not present in SCOM 2012 Data Warehouse database since two weeks

Alert data has not been present in the SCOM 2012 Data Warehouse database for a week, though I can see Performance data for the latest dates. Old Alert data is present, but I think the latest Alert data is not being inserted into the Data Warehouse. No changes were made
on the day from which data started missing.
I can see 31554 events on all my Management Servers, which proves that data insertion is happening. I am not sure why only Alert data is missing (or not getting inserted) in the DW database. I am trying to use SQL queries to fetch the data, as I don't have Reporting
currently. The same query works for other dates, so there is no issue with the query itself.
I have noticed that the Alert data is present in the SCOM OperationsManager DB but NOT in the OperationsManagerDW database.
In SCOM 2007, data is inserted into both the Ops DB and the DW simultaneously; I believe the methodology is the same in 2012.
Please help me fetch the Alert data from the DW. Any suggestions?
    Regards, Suresh

    Hi,
Generally, the data warehouse stores long-term data; by default, it keeps 400 days of data. I suggest you check your configuration:
    How to Configure Grooming Settings for the Reporting Data Warehouse Database
    http://technet.microsoft.com/en-us/library/hh212806.aspx
    Alex Zhao
    TechNet Community Support

  • Could you please recommend a book or two about data warehouse designing?

I want to read some books about data warehousing: how to build one, and how to deal with the problems that arise during the data warehousing process.
Can anyone recommend a book or two on this?
I would like the books to focus on common scenarios in the data warehousing area and the general solutions to those scenarios.
I am quite new to this area, so any recommendation would be highly appreciated.
    Thanks.

    Perhaps also these resources, if you've not already seen them
    DW Best Practices Whitepaper
    http://www.oracle.com/technetwork/database/features/bi-datawarehousing/twp-dw-best-practies-11g11-2008-09-132076.pdf
    Greg Rahn on the core performance fundamentals of Oracle data warehousing
    http://structureddata.org/2009/12/14/the-core-performance-fundamentals-of-oracle-data-warehousing-introduction/

  • MSS- Team- EmpInfo- General Info- EMPSearch is not displaying Search Result

    Hello All,
    Currently we are using MSS BP 1.3.1 & SAP_MSS 600 SP16...
I am going to the navigation MSS -> Team -> EmpInfo -> General Info -> Employee Search (under the Employee Selection dropdown I select the Employee Search option). Here I get the view with Last Name, First Name and Personnel No. But even when I use any of the search criteria (search by last name, first name or personnel number), no employee search result is displayed; only a "No Data Found" message appears.
To activate this employee search in MSS, do I need to configure any steps in SPRO, or any steps on the portal side?
To display the Team Calendar I configured IGS; what else do I need to do to get the Team Calendar iView under MSS -> Team?
Please list the steps and provide some help to fix the issue.
    Thanks & Regards
    Adapag

    Bala,
Thanks for your response. Regarding debugging: do we need to maintain any parameters for the employee search, with respect to MSS, in that particular method, or something else?
I checked SPRO -> Integration with Other mySAP Components -> Business Packages -> Manager Self-Services (mySAP ERP) -> Object and Data Provider -> Define Object Selection / Group Parameters for Object Search / Define Rules for Object Selection.
Do we need to create a rule here and pass those group parameters into it?
I told my ABAPers to debug those two methods; apart from that, is any HR-side SPRO configuration pending?
Please reply and list the steps: where and what exactly needs to be configured in SPRO, apart from the debugging.
    Thanks in Advance
    Adapag

  • Unable to register my data warehouse in Service Manager

I have been trying to register my data warehouse but keep getting the same error message each time: "Invalid URI: the hostname could not be parsed." I know the issue is on the ServiceManager database side of things, but there is not a lot of information related to this error message, and the info I do find is unrelated to ServiceManager. I have gone as far as building a
brand new data warehouse server and reinstalling all of the data warehouse databases from scratch. It doesn't matter whether I try to register with the original data warehouse environment or the new one; I get the same error message. I also
received a PowerShell command from Microsoft Support to help clean up any residual entries in the ServiceManager database that might be left over. At this point I'm at a loss on how to proceed. Has anyone ever run into this issue?
    Here's the event log message:
    Unable to register Service Manager installation with Data Warehouse installation.
     Data Warehouse Server: DW_SCSM
     Service Manager Management Server: FlowSMRC
     Exception: Microsoft.EnterpriseManagement.UI.Core.Shared.PowerShell.PSServiceException: Invalid URI: The hostname could not be parsed.
       at Microsoft.EnterpriseManagement.UI.Core.Shared.PowerShell.PSHostService.executeHelper(String cmd, Object[] input)
       at Microsoft.EnterpriseManagement.UI.Core.Shared.PowerShell.PSHostService.Invoke(String cmd, Object[] input)
       at Microsoft.EnterpriseManagement.ServiceManager.UI.Administration.DWRegistration.Registration.DWRegistrationHelper.AcceptChanges(WizardMode wizardMode)
Also, I tried registering the data warehouse using the PowerShell cmdlet Add-SCDWMgmtGroup and got an error saying it cannot associate the Service Manager installation on scsm_server with the Service Manager Data Warehouse installation on dw_server.

    Just got additional information about this error. If you run the following query in SQL pointing to the ServiceManager database you should see the name of the Management Server.  The query is:
    select * from MT_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore
There is a column named Server_<some GUID> that should contain the Management Server NetBIOS name. If it has some other name, run an update query to ensure the column has the NetBIOS name of the Management Server. Restart the SCSM Console
    and retry the Database Registration again and it should succeed.
    Microsoft has said that this is normally caused by either moving the database or database server to another server and the value is never updated.
    Try registering the DW Server again and see if you have better luck.
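A hedged sketch of the check and the fix. The GUID-suffixed column name below is a placeholder that must be copied from the SELECT output, and MGMTSRV01 stands in for your management server's actual NetBIOS name:

```sql
-- Inspect the stored management server name first:
SELECT *
FROM MT_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore;

-- Hypothetical fix: replace [Server_<GUID>] with the real column name
-- seen in the SELECT output, and MGMTSRV01 with your server name.
UPDATE MT_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore
SET [Server_<GUID>] = 'MGMTSRV01';
```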

  • Foreign keys in SCD2 dimensions and fact tables in data warehouse

    Hello.
I have a data warehouse in a snowflake schema. All dimensions are SCD2; the columns look like this:

ID (PK)  SID  NAME  START_DATE  END_DATE    IS_ACTUAL
1        1    XXX   01.01.2000  01.01.2002  0
2        1    YYX   02.01.2002  01.01.2004  1
3        2    SYX   02.01.2002              1
4        3    AYX   02.01.2002  01.01.2004  0
5        3    YYZ   02.01.2004              1

Other dimension tables and the fact table have relations to this table.
Do I need to create foreign keys for these relations?
And if I do, on which columns? SID (the serial ID) is not unique. If I create them on ID, I have to fetch the SID and the actual row in every query.

    >
I have datawarehouse in snowflake schema. All dimensions are SCD2, the columns are like that:

ID (PK)  SID  NAME  START_DATE  END_DATE    IS_ACTUAL
1        1    XXX   01.01.2000  01.01.2002  0
2        1    YYX   02.01.2002  01.01.2004  1
3        2    SYX   02.01.2002              1
4        3    AYX   02.01.2002  01.01.2004  0
5        3    YYZ   02.01.2004              1
    On this table there are relations from other dimension and fact table.
    Need I create foreign keys for relation?
    >
Are you still designing your system? Why did you choose NOT to use a star schema? Star schemas are simpler and have some performance benefits over snowflakes. Although there may be some data redundancy, that is usually not an issue for data warehouse systems, since any DML is usually well managed and normalization is often sacrificed for better performance.
Only YOU can determine what foreign keys you need. Generally you will create foreign keys between any child table and its parent table, and they need to be created on a primary key or unique key value.
    >
    And if I do, on what columns? SID (serial ID) is not unique. If I create on ID, I have to get SID and actual row in any query.
    >
    I have no idea what that means. There isn't any way to tell from just the DDL for one dimension table that you provided.
    It is not clear if you are saying that your fact table will have a direct relationship to the star-flake dimension tables or only link to them through the top-level dimensions.
    Some types of snowflakes do nothing more than normalize a dimension table to eliminate redundancy. For those types the dimension table is, in a sense, a 'mini' fact table and the other normalized tables become its children. The fact table only has a relation to the main dimension table; any data needed from the dimensions 'child' tables is obtained by joining them to their 'parent'.
    Other snowflake types have the main fact table having relations to one or more of the dimensions 'child' tables. That complicates the maintenance of the fact table since any change to the dimension 'child' table impacts the fact table also. It is not recommended to use that type of snowflake.
    See the 'Snowflake Schemas' section of the Data Warehousing Guide
    http://docs.oracle.com/cd/B28359_01/server.111/b28313/schemas.htm
    >
    Snowflake Schemas
    The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.
    Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. Figure 19-3 presents a graphical representation of a snowflake schema.
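To make the foreign-key advice concrete, here is a minimal sketch with hypothetical names. The foreign key must reference the surrogate key (ID), which is unique per version row; SID repeats across versions, so it cannot carry a foreign key, and queries that want only the current version filter on IS_ACTUAL:

```sql
-- Hypothetical SCD2 dimension and fact table; names are illustrative.
CREATE TABLE dim_customer (
  id         NUMBER        PRIMARY KEY,  -- surrogate key, one per version
  sid        NUMBER        NOT NULL,     -- natural/serial key, repeats
  name       VARCHAR2(100),
  start_date DATE,
  end_date   DATE,
  is_actual  NUMBER(1)
);

CREATE TABLE fact_sales (
  customer_id NUMBER NOT NULL REFERENCES dim_customer (id),
  amount      NUMBER
);

-- Current-version reporting query: join on the surrogate key,
-- restrict to the active version rows with IS_ACTUAL.
SELECT d.sid, d.name, SUM(f.amount)
FROM   fact_sales f
JOIN   dim_customer d ON d.id = f.customer_id
WHERE  d.is_actual = 1
GROUP BY d.sid, d.name;
```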

  • How do you link attributes in data warehouse

I am new to data warehousing and I am trying to get some direction on how tables and data are linked in a data warehouse architecture. For example, in a relational database, in order to get data from several tables onto a report, I would have to join tables, or perhaps have a unique field in each table that retrieves the information for that record. How does this work in data warehousing? I would appreciate some insight on this subject.
    thanks

    11616,
Welcome to the exciting world of data warehousing! The relational concepts you are referring to still apply in data warehousing. There are also dimensional concepts, for data that has an inherent hierarchy. And there is much more than can be mentioned here.
    The following resources are great places to start. They are widely recognized in the industry and are vendor independent:
    1. Bill Inmon
    http://www.billinmon.com//library/articles/article2.asp#General
    2. Ralph Kimball
    http://www.rkimball.com/html/articlesfolder/DWfundamental.html
    Almost vendor independent:
    3. "Oracle9i Data Warehousing Guide Release 2", Chapter 1: "Data Warehousing Concepts".
    http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96520/concept.htm#43555
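As an illustration of how those relational concepts carry over: a typical star-schema report query simply joins the fact table to each dimension on surrogate keys, exactly as you would join tables in any relational database. All table and column names below are hypothetical:

```sql
-- Illustrative star-schema join: the fact table carries a foreign key
-- to each dimension, and the report query joins and aggregates.
SELECT d.calendar_month,
       p.product_name,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales  f
JOIN   dim_date    d ON d.date_key    = f.date_key
JOIN   dim_product p ON p.product_key = f.product_key
GROUP BY d.calendar_month, p.product_name;
```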
    Nikolai

  • Data Warehouse Infrastructure

I have a requirement to build a data warehouse and analytics/reporting capability with the following requirements:
Maximum of 1 TB for production data + DR + test/dev environments.
SSIS (up to 25 sources), SSAS (cubes, 5 concurrent users) and SSRS (2 concurrent users, max 500 reports).
I need Production, DR and Test/Dev environments.
I have been told that I will require 12 servers, each having 4 cores and 12 GB of storage (4 for Prod, 4 for DR and 4 for Test/Dev).
To give you an idea of load, we plan to have 1 full-time ETL developer, 5 data analysts and 2 reporting analysts. We are quite a small business and don't have a particularly large amount of data.
The model has SQL Server, SSIS, SSAS and SSRS on different servers in each environment.
Any idea if this is overkill? I also have an estimate of 110 days for setting up the servers, installing the SQL Server software and general infrastructure design activity.

    Agree. Overkill. Big overkill.
I would recommend production/DR/dev each have 2 servers. I'd put SSAS, SSRS and SSIS on one and the DB on the other.
In production, SSAS/SSRS will be active during the daytime; SSIS will likely be active off hours, so putting all of that on one box should be fine for sharing the load. The DB on a second box is good since it will likely be busy during both daytime and night time. Four processors may be heavy depending on the types of queries and usage patterns. I suspect you can get by with 2-processor servers, but I would recommend buying the 4-processor boxes for dev and production, getting them configured, and running some performance baselines before putting in the DR environment. Then, if you find the CPUs idling, you can always cut the DR environment to 2-processor boxes. I'm not sure it's worth the minor cost savings to save 2 processors on 2 boxes with that effort, but if you're looking to cut corners, you may find that a 2-processor-per-server DR environment is within your performance comfort zone.
For the dev environment, one box may well handle it all, but I'd go for 2. On average, a dev environment isn't all that busy, but when you need the horsepower, you need it. And since it's development AND test, you help yourself by having realistic production-level performance on what you're testing. Four processors is fine, but max it out on memory.
As for hard drives, be careful about configuration. You need the space on your DW server and maybe on the SSAS server, depending on how the cubes are built (ROLAP/MOLAP). When you estimate amounts of data, be careful, since you'll want a lot of indexes, and that can double the DB size for a DW. Your DW will also run faster if you have different filegroups for data/indexes/tempdb, but only if those filegroups are on different physical media that work well in parallel. You can always get fancier with more filegroups: different ones for staging tables, for segregating fact and dimension tables, etc. But for this size of DB, that's overkill as well.
Mainly, I'd look at spending hardware dollars on memory for the servers, but get fewer of them.
    Now... two questions...
1) Can you clarify the disk space needs? How much total data space in one environment, without indexes? Based on that, add the same again for indexes, add half as much (?) for tempdb, and you have the core disk needs. Depending on how much it is, you can decide on RAID, filegroup configuration, etc. And if the disk space with indexes is small enough that it all fits in memory, then disk and filegroup configuration becomes inconsequential except for ETL loads.
2) The 25 sources: can you clarify that? 25 source systems? A total of 25 source applications? A total of 25 tables? I'm curious because I'm wondering how long you'd keep 1 full-time ETL developer busy.
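The filegroup suggestion above can be sketched as follows; the database name, logical file names, paths and sizes are all placeholders:

```sql
-- Hypothetical SQL Server sketch: separate filegroups for data and
-- indexes, worthwhile only if they sit on different physical media.
ALTER DATABASE DW ADD FILEGROUP fg_data;
ALTER DATABASE DW ADD FILEGROUP fg_index;

ALTER DATABASE DW ADD FILE
  (NAME = dw_data1, FILENAME = 'D:\data\dw_data1.ndf', SIZE = 100GB)
  TO FILEGROUP fg_data;
ALTER DATABASE DW ADD FILE
  (NAME = dw_idx1, FILENAME = 'E:\index\dw_idx1.ndf', SIZE = 100GB)
  TO FILEGROUP fg_index;

-- Place new objects on the filegroups explicitly, for example:
-- CREATE TABLE fact_sales (...) ON fg_data;
-- CREATE INDEX ix_fact_sales_date ON fact_sales (date_key) ON fg_index;
```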
