Data Warehouse Question

Couldn't find the right topic to post this under, but if anyone could answer, I'd be really grateful.
We are working on a new project that is a data-warehousing type system for transactional information. I spent several days poring over the Data Warehousing Guide and some of the other documentation, but I couldn't find the answers...
The project will use Sun Servers and an EMC SAN.
We are considering using partitioning to segregate the data month-to-month, but once the data passes a year of age we'd like it migrated to a different storage facility, possibly a NAS solution. This facility will have a 7-10 year retention period, after which the data can be legally destroyed. I'm calling this facility near-line storage.
Can partitioning support this type of migration (just copy the files and switch the pointers in the database)?
Can the partitions be taken offline and brought back online when requested? (I'm thinking of a scenario similar to offline read-only tablespaces.)
Are there any whitepapers or solutions that describe how to handle schema changes and database upgrades through this 10-year retention without loss of, or impact to, the data? Basically, if we add a new column to a table, do we have to add it to all partitions, or can we just leave them be? I was looking at the Workspace option (for schema versioning) to help with this, but I'm not sure if it's a good fit.
Should we use a media management product (e.g. Legato) to help with this? I've seen it used with RMAN and tape backups, but is it used for the above application?

There are two parts that you need to manage in order to handle the above.
Partitioning is how you separate the data logically.
Tablespaces are how you handle the data storage physically.
Partitions are built on tablespaces.
If a table is partitioned, altering the table (e.g. adding a column) changes all partitions.
In principle, a query against a partitioned table is routed, based on its predicates, to the right partitions, so users will not know that you store the first 12 months on the EMC SAN and the rest on NAS - except through query performance, as retrieving data from NAS will likely take longer.
You can move a partition from TableSpace1 (SAN) to TableSpace2 (NAS) online using the following command (I/O intensive):
ALTER TABLE TableName MOVE PARTITION PartitionName TABLESPACE TableSpaceName NOLOGGING;
Transportable Tablespaces, as the name suggests, let you take a tablespace offline, copy its datafiles from SAN to NAS, and re-attach it.
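A minimal sketch of the whole aging flow, assuming a range-partitioned table SALES with a monthly partition SALES_2004_01, a local index SALES_IDX, and a NAS-backed tablespace TS_NAS (all names hypothetical):

-- move the aged partition to the NAS tablespace (I/O intensive)
ALTER TABLE sales MOVE PARTITION sales_2004_01 TABLESPACE ts_nas NOLOGGING;
-- the move invalidates local index partitions, so rebuild them on NAS too
ALTER INDEX sales_idx REBUILD PARTITION sales_2004_01 TABLESPACE ts_nas;
-- once a tablespace holds only aged partitions, freeze it;
-- a read-only tablespace needs to be backed up only once
ALTER TABLESPACE ts_nas READ ONLY;

When the 7-10 year retention expires, ALTER TABLE sales DROP PARTITION sales_2004_01 releases the data for legal destruction.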

Similar Messages

  • Data warehouse questions plz

    Thank you so much. My question is: I have an account table and an account history table in the operational database. Whenever a change is made to the account table, a trigger fires and a new row with the old value and the new value is inserted into the account history table in the operational database; the account history table holds only the old and new values, not the full detail of an account. I have designed an account dimension - from which table would I start loading it, and if I want to also include the changes, is that possible? I am new and working hard, so I need help. When we load the dimension for the first time, do we load all the previous years' data first and then apply slowly changing dimension theory, or do we apply it only to the new data that arrives after the dimension is loaded?
    Thank you; reply also if you could: [email protected]

    If there is an auditing column in your account_history table to capture the timestamp of the changes, then you can implement the SCD2 very easily using the strategy described below. Otherwise, there is no way to load the SCD2 data for the past 4 years.
    SCD2 Implementation Strategy:
    From the description of your problem, we can make the following assumptions:
    (1) The account table and history table have a unique ID for each account record, say account_ID
    (2) The value_col is the only trigger column that fires the SCD2 changes. All changes to other columns are treated as SCD1.
    Step 1: Join the account_table with the account_history table on the account_ID column to get the records:
    Account_ID, Old_value, New_value, Change_DateTime, Other_current_Columns
    Step 2 (optional): Sort the records above on Change_DateTime
    Step 3: Load the sorted records into DIM_Account_SCD2 using the account_id as the natural key, and set the Start_date and End_date of DIM_Account_SCD2 with the appropriate values of the Change_DateTime.
    For example, you have the following records:
    Account_ID, Old_value, New_value, Change_DateTime, Other_current_Columns
    1 100 200 01/01/2005 ................
    1 200 150 02/10/2005 ...............
    1 150 300 03/20/2005 .............
    2 ................................
    Then the DIM_Account_SCD2 should have the following records
    Account_WHID, Account_ID, Value_Col, Start_date, End_date, Other_current_Columns
    100, 1, 100, 01/01/1900, 12/31/2004, ....
    101, 1, 200, 01/01/2005, 02/09/2005, ....
    102, 1, 150, 02/10/2005, 03/19/2005, ....
    103, 1, 300, 03/20/2005, 12/31/9999, ....
    It is not difficult to implement this strategy using OWB or manually. You can work out the details on the first load and refresh load.
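    A sketch of Steps 1-3 as a single statement, assuming the joined history records have the columns named above; the LEAD analytic function closes each version one day before the next change, and 12/31/9999 marks the current row:
    -- the pre-first-change row (the earliest Old_value, Start_date 01/01/1900)
    -- can be added with a similar query over each account's first change
    INSERT INTO dim_account_scd2 (account_id, value_col, start_date, end_date)
    SELECT account_id,
           new_value,
           change_datetime,
           NVL(LEAD(change_datetime)
                 OVER (PARTITION BY account_id ORDER BY change_datetime) - 1,
               DATE '9999-12-31')
    FROM   account_history;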
    Hope this will be helpful.
    Lushu

  • Question for integration star and snow flake schema in data warehouse

    Dear Reader,
    I am facing the following problem:
    I have two data warehouses: one uses a star schema, the other a snowflake schema. I would like to integrate both of them into one data warehouse. What strategy should these two data warehouses adopt in order to be integrated into one?
    Should I scrap both data warehouse and build a new one instead, or scrap one of them and use the other?
    What factors should be considered in order for me to more easily resolve the differences between the two data warehouses?
    Please advise. Thank you very much.

    Hi Mallis,
    This is a very broad question and the answer depends on many factors. Please go through the following articles to get an understanding of what the differences are and when to use which.
    When do you use a star schema and when to use a snowflake schema -
    http://www.information-management.com/news/10000243-1.html
    Star vs Snowflake Schemas – what’s your belief? –
    http://www.networkworld.com/community/blog/star-vs-snowflake-schemas-%E2%80%93-what%E2%80%99s-your-belie
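    As a concrete illustration of the structural difference (hypothetical names): a star schema keeps each dimension denormalized in one table, while a snowflake normalizes it into a hierarchy of lookup tables.
    -- star: category attributes repeated on every product row
    CREATE TABLE dim_product_star (
      product_key   NUMBER PRIMARY KEY,
      product_name  VARCHAR2(100),
      category_name VARCHAR2(100)
    );
    -- snowflake: the same dimension split into a normalized hierarchy
    CREATE TABLE dim_category (
      category_key  NUMBER PRIMARY KEY,
      category_name VARCHAR2(100)
    );
    CREATE TABLE dim_product_snow (
      product_key   NUMBER PRIMARY KEY,
      product_name  VARCHAR2(100),
      category_key  NUMBER REFERENCES dim_category (category_key)
    );
    Much of the integration work is deciding which shape each conformed dimension should take, then migrating the other warehouse's dimensions to match.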
    Hope this helps!

  • Data Warehouse Archive logging questions

    Hi all,
    I'd like some opinions/advice on archive logging and OWB 10.2 with a 10.2 database.
    Do you use archive logging on your non-production OWB instances? I have a development system that only has "on demand" backups done and the archive logs fill frequently. In this scenario, should I disable archive logging? I realize that this limits my recovery options to cold backups but on a development environment, this seems sufficient for me. Would I be messing up any OWB features by turning off archive logging?
    For production instances, how large do you make your archive log (as a percentage of your total DW size perhaps)?
    How do you manage them? With Flash recovery areas? Manually? RMAN or other tools?
    Thanks in Advance,
    Mike

    Usually, I don't set DW tables to log. Since it's a data warehouse, I believe it's better to make cold backups. In some cases, ETL mappings may work like backup procedures themselves.
    In OWB, select the object (table or index) you need to configure. Right-click it, then select Configuration -> Performance Parameters -> Logging Mode -> NOLOGGING.
    Flash Recovery: Don't think it's going to help you, since most of your data manipulation is based on batch jobs.
    RMAN: If you want to make hot backups, this is something that can really help you manage backup procedures.
    Manually: Maybe... Why not?
    I don't take hot backups of DW databases. I prefer cold backups. In a recovery scenario, I restore the cold backup and, if it's 3 days old, execute the ETL mappings for the last 3 days.
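    As a concrete sketch of the "no archive logging on development" option (SQL*Plus, as SYSDBA; only for an instance where cold backups plus ETL re-runs are an acceptable recovery strategy):
    SHUTDOWN IMMEDIATE
    STARTUP MOUNT
    ALTER DATABASE NOARCHIVELOG;
    ALTER DATABASE OPEN;
    ARCHIVE LOG LIST  -- should now report "No Archive Mode"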
    Regards,
    Marcos

  • Best practice of metadata table in data warehouse environment ?

    Hi gurus,
    In our data warehouse we have: 1. a Stage schema, and 2. a DWH (data warehouse reporting) schema. In staging we have about 300 source tables. In the DWH schema, we create only the tables required from a reporting perspective. Some of the tables in the staging schema have been created in the DWH schema as well, with different table and column names; the naming convention for these tables and columns in the DWH schema is based more on business names.
    In order to keep track of these tables we are creating a metadata table in the DWH schema, for example:
    Stage        DWH_schema
    Table_1      Table_A
    Table_2      Table_B
    Table_3      Table_C
    Table_4      Table_D
    My question is: how do we handle the column names in each of these tables? The stage_1, stage_2 and stage_3 column names have been renamed in the DWH schema as part of Table_A, Table_B and Table_C.
    As said earlier, we have about 300 tables in stage and maybe around 200 tables in the DWH schema. A lot of the column names have been renamed in the DWH schema from the stage tables, and some of the tables have 200 columns.
    So my concern is: how do we handle the column names in the metadata table? Do we need to keep only table names in the metadata table, not column names?
    Any ideas will be greatly appreciated.
    Thanks!

    Hi,
    this seems to be quite a common question.
    In our project we designed a hub-and-spoke-like architecture.
    Thus we have 3 layers. L0 is the one closest to the source, and L0 table names are linked to the corresponding source names by means of a naming standard (like tabA, EXT_tabA, tabA_OK1 and so on, based on the implementation of the load procedures).
    At L1 we have the ODS, a normalized model; we use business names for tables there and standard names for temporary structures and artifacts.
    Both L0 and L1 keep the source's column names as a general rule; new columns, like calculated ones, are business driven, and metadata are standard driven.
    Data Modeler fits perfectly for modelling the L1 layer.
    L2 is the dimensional schema; business names take over for tables and columns, eventually rewritten at the presentation layer (front-end tool).
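    For the original question about renamed columns: a column-level mapping table is a common companion to the table-level one. A minimal sketch (hypothetical names); the table-level mapping then falls out as a SELECT DISTINCT over it, and populating it from ALL_TAB_COLUMNS keeps it honest against the actual schemas:
    CREATE TABLE meta_column_map (
      stage_table  VARCHAR2(30) NOT NULL,
      stage_column VARCHAR2(30) NOT NULL,
      dwh_table    VARCHAR2(30) NOT NULL,
      dwh_column   VARCHAR2(30) NOT NULL,
      CONSTRAINT meta_column_map_pk PRIMARY KEY (stage_table, stage_column)
    );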
    Hope this helps. D.

  • Manual data corrections in data-Warehouse/OLAP

    Hi to all,
    is it possible anywhere in Essbase (Hyperion) to make manual data corrections in the data warehouse/OLAP?
    Thank you
    G.

    Hi NareshV,
    thanks for your reply. In fact, I also have some difficulty understanding this question (I am not the author) ;-), but OK - let's say you have some records of data retrieved from Essbase (or Hyperion), for instance in an open Excel sheet. In Excel it is possible to delete or override any value in any column; is it possible to do this via the Essbase OLAP server?
    Thank, G.

  • Tablespaces and block size in Data Warehouse

    We are preparing to implement a Data Warehouse on Oracle 11g R2 and currently I am trying to work out a storage strategy - unfortunately I have very little experience with that. The question is: what is the general advice in such considerations regarding tablespaces and block size? I did some research and it is hard to find a clear answer; there are resources advising that block size is not important and can be left small (8 KB), while others state that it is crucial and should be the biggest possible (64 KB). The other thing is: what part of the data should be placed where? Many resources state that keeping indexes apart from their data is a myth and a bad practice, as it may lead to decreased performance; others say that although there is no performance benefit, index tablespaces do not need to be backed up and that is why they should be split. The next idea is to have separate tablespaces for big tables, small tables, and tables accessed frequently and infrequently. How should I organize partitions in terms of tablespaces? Is it a good idea to have "old" (read-only) data partitions in separate tablespaces?
    Any help highly appreciated, and thank you in advance.
    Any help highly appreciated and thank you in advance.

    Wojtus-J wrote:
    > We are preparing to implement Data Warehouse on Oracle 11g R2 and currently I am trying to set up some storage strategy - unfortunately I have very little experience with that.
    With little experience, the key feature is to avoid big mistakes - don't try to get too clever.
    > The question is what are general advices in such considerations according table spaces and block size?
    If you need to ask about block sizes, use the default (i.e. 8KB).
    > I made some research and it is hard to find some clear answer,
    But if you get contradictory advice from this forum, how would you decide which bits to follow?
    A couple of sensible guidelines when researching on the internet - look for material that is datestamped with recent dates (last couple of years), or references recent - or at least relevant - versions of Oracle. Give preference to material that explains WHY an idea might be relevant, and greater preference to material that DEMONSTRATES why an idea might be relevant. Check that any explanations and demonstrations are relevant to your planned setup.
    > The other thing is what part of data should be placed where? Many resources state that keeping indexes apart from its data is a myth and a bad practice as it may lead to decrease of performance, others say that although there is no performance benefit, index table spaces do not need to be backed up and thats why it should be split. The next idea is to have separate table spaces for big tables, small tables, tables accessed frequently and infrequently. How should I organize partitions in terms of table spaces? Is it a good idea to have "old" data (read only) partitions on separate table spaces?
    It is often convenient, and sometimes very important, to separate data into different tablespaces based on some aspect of functionality. The performance thing was mooted (badly) in an era when discs were small and (disk) partitions were hard; but all your other examples of why to split are potentially valid for administrative reasons: big/small, table/index, old/new, read-only/read-write, fact/dimension etc.
    For data warehouses a fairly common practice is to identify some sort of aging pattern for the data, and try to pick a boundary that allows you to partition data so that a large fraction of the data can eventually be made read-only: using tablespaces to mark time-boundaries can be a great convenience - note that the tablespace boundary need not match the partition boundary - e.g. daily partitions in a monthly tablespace. If you take this type of approach, you might have a "working" tablespace for recent data, and then copy the older data to a "time-specific" tablespace, packing it and making it read-only as you do so.
    Tablespaces are (broadly speaking) about strategy, not performance. (Temporary tablespaces / tablespace groups are probably the exception to this thought.)
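    A sketch of the "daily partitions in a monthly tablespace" idea described above (hypothetical names):
    CREATE TABLESPACE ts_2011_01
      DATAFILE '/u01/oradata/dw/ts_2011_01.dbf' SIZE 8G;
    -- each day of January lands in the monthly tablespace
    ALTER TABLE sales ADD PARTITION p20110115
      VALUES LESS THAN (DATE '2011-01-16') TABLESPACE ts_2011_01;
    -- once the month is complete and the data packed, freeze it
    ALTER TABLESPACE ts_2011_01 READ ONLY;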
    Regards
    Jonathan Lewis

  • Update data automatically in fact table in Data Warehouse

    Hi,
    I'm working on the creation of a data warehouse that includes several data sources: SQL Server performance (more than one server), Active Directory users, server performance (more than one server), and Exchange server mailboxes. The problem is that performance data changes frequently (like CPU and memory), so my question is how to update the data in the fact table every 5 seconds automatically with SSIS.
    Thank you for any advice.

    I'm assuming you have already figured out how to capture the data (e.g. PowerShell, extended events, MDW etc.) and just need to know what dimension and fact tables you need.
    You need to decide how often you are going to capture this data, and based on that you will have dimensions of the appropriate grain. Don't try to cram everything into the same fact table if it is not of the same granularity. Also, separate processes usually have separate fact tables.
    In addition to the Date dimension, you will need a Time dimension with a grain of 1 second (or maybe 5 seconds, if that is when you get your data); then run the SSIS package every 5 seconds to capture and append that data to the fact table. A sketch of such a Time dimension follows.
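    A possible way to populate a seconds-grain Time dimension (Oracle-style row generation, to match the rest of this page; on SQL Server a numbers table or recursive CTE plays the same role, and the dim_time column names are hypothetical):
    INSERT INTO dim_time (time_key, hour_of_day, minute_of_hour, second_of_minute)
    SELECT n,
           TRUNC(n / 3600),           -- hour 0-23
           TRUNC(MOD(n, 3600) / 60),  -- minute 0-59
           MOD(n, 60)                 -- second 0-59
    FROM  (SELECT LEVEL - 1 AS n FROM dual CONNECT BY LEVEL <= 86400);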
    - Aalamjeet Rangi | (Blog)

  • Difference between general DB and Data Warehouse DB

    Hi,
    We have a server on which an Oracle database is already installed. We want to use it as a data warehouse. My question: would this database be sufficient for a data warehouse, or would I have to create a new database for it? Is it possible to find out whether the installation was General Purpose or Data Warehouse?
    Also, if I go ahead, would there be any impact if I install the new database directly, without uninstalling the previous Oracle database?
    Appreciate your help
    regards,
    Edited by: user10243788 on Mar 23, 2010 2:09 AM

    While installing you can select either 'General Purpose' or 'Data Warehouse'; the only difference between the two at installation time is that the init.ora parameters get higher values for a data warehouse database, and they can also be updated later manually. So you can go ahead and install a general purpose database as well, but later you need to modify the init.ora parameters to specify higher memory values for parameters like shared_pool_size, java_pool_size, db_cache_size etc., as sketched below.
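    For example (the sizes are placeholders - tune them to your hardware; requires an spfile and takes effect after an instance restart):
    ALTER SYSTEM SET shared_pool_size     = 512M SCOPE = SPFILE;
    ALTER SYSTEM SET java_pool_size       = 128M SCOPE = SPFILE;
    ALTER SYSTEM SET db_cache_size        = 4G   SCOPE = SPFILE;
    ALTER SYSTEM SET pga_aggregate_target = 2G   SCOPE = SPFILE;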

  • Imported sealed Management Pack and its not showing up in Data Warehouse Job MPSyncJob

    Hello,
    When I try to import a sealed Management Pack (with new Dimensions defined for the Data Warehouse) it works fine, and it shows up under Management Packs in the Administration pane. But it's not showing up in the Data Warehouse job "MPSyncJob".
    There are no errors in the event log on the DW management server, and I can't see any errors in failed Data Warehouse jobs either.
    I imported the MP three days ago and still nothing.
    Now what?
    /Maekee

    I have a smaller SCSM lab into which I can import this MP, and the classes are created in the DW. So the question is why it's not working in my PreProd environment, and why I don't get any errors in the event log. Can anyone point me in the right direction on this?
    /Maekee

  • Export and Import in Data warehouse

    Hi,
    I have built a data warehouse project with a few dimensions and a cube on a server machine, say server1. Now I have another server machine on which I want to develop this project for bulk production.
    I want to import all the development work from server1 to the new server machine, say server2, to save time. I exported from server1 using the export functionality and imported the resulting file into server2 using the import function. My question is:
    Is it possible to use all the imported objects (cube, dimensions, mappings, tables and sequences) on server2 in the same way as I was using them on server1? If yes, what configuration do I have to do? The imported dimensions currently cannot be deployed on the new server2 machine; I get various errors, especially invalid location and invalid table errors for the mappings and dimensions.
    thanks in advance
    imran

    You can use the MDL file on your new server, but you have to do the following:
    1. Register the locations.
    2. Configure the mappings to use the newly registered locations.
    3. Configure the Control Center locations.
    Then register the location in the Control Center and deploy the mappings/process flows.
    The invalid location error occurs because the location is not configured and registered in your Connection Explorer.

  • Reinstall Data Warehouse to remove test data

    We are wanting to go live with SCSM soon, hopefully in a couple of weeks.
    Unfortunately, we don't have the infrastructure with which I could set up separate test and production environments. I've managed to remove the test data from the ServiceManager database via the SMLets, but we want to remove the test data from the data warehouse as well. I've attempted manually deleting it from the database (I know, unsupported), though this did not work and just gave many database errors (as expected).
    What I'm wondering is whether it's possible to uninstall/reinstall the DW. Upon reinstall, I know I would likely need to re-register it with SCSM.
    In doing this, will all of the custom classifications/statuses/templates/etc. stay in place?
    Once installed, would all of the sync jobs pick up as expected and sync the current data in Service Manager (no Incidents/Changes/etc.) to the DW?
    We have all databases/Service Manager/Portal running on one server. DW running on a 2nd server.
    If anyone could provide some insight around this, it would be much appreciated. Any direction towards documentation would also be great!
    Thanks

    The best scenario (and my recommendation) would be to export and take a backup of all your management packs, reinstall the entire Service Manager environment, and import them again. That way you should get a fresh, functional Service Manager with all your settings retained.
    However, to answer your question, you would need to:
    - Unregister with SCSM DW
    - Uninstall DW
    - Delete the three DW databases
    - Install DW
    - Register with DW
    This would not affect any settings in your SCSM environment and you would have an empty DW.
    Regards
    //Anders
    Anders Asp | Lumagate | www.lumagate.com | Sweden

  • Database and Data Warehouse, SAP BW Vs Oracle

    Hello Gurus,
    I would like to know the differences between a database and a data warehouse.
    Oracle acts as the database for SAP BW. I understand it this way: all the data is stored in Oracle, and BW tells the database how to store it, with all the links etc.
    Please tell me whether I am correct.
    It’s my pleasure to award points,
    Thanks and best wishes,
    i-bi

    hi,
    A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP). Data warehousing is not meant for current, "live" data.
    A database is a collection of information stored in a computer in a systematic way, such that a computer program can consult it to answer questions. The software used to manage and query a database is known as a database management system (DBMS). The properties of database systems are studied in information science.
    http://www.webopedia.com/TERM/D/data_warehouse.html
    Hope this helps.
    Regards,
    yunus

  • Oracle Development Survey: Data Warehouses Customers

    At the start of most data warehouse projects, or even during a project, I am sure you as customers try to find answers to the following questions to help you plan and manage your environments:
    * Where can I find trend and comparison information to help me plan for future growth of my data warehouse?
    * How many CPUs do other customers use per terabyte?
    * How many partitions are typically used in large tables? How many indexes?
    * How much memory should I allocate for the buffer cache?
    * How does my warehouse compare to others of similar and larger scale?
    The data warehouse development team, here at Oracle would like to help provide answers to these questions. However, to do this we need your help. If you have an existing data warehouse environment, we would like to obtain more technical information about your environment(s) by running a simple measurement script and returning the output files to us, here at Oracle. This will allow our developers to provide comprehensive documents that explain best practices and get a better understanding of which features our customers use the most. This will also allow you as Customers, to benchmark your environments compared to other customers’ environments.
    From a Company perspective we are also interested to get feedback on features we have added to the database, are these features used, how are they used etc. For example we are keen to understand:
    * Which initialization parameters are most frequently used at what values?
    * How many Oracle data warehouses run on RAC? on single nodes?
    * Is there a trend one-way or the other, especially as data volumes increase?
    * Does this change with newer releases of the database?
    All results from these scripts will be held confidential. No customers will be mentioned by name; only summaries and trends will be reported (e.g., “X percent of tables are partitioned and Y percent are indexed in data warehouses that are Z terabytes and larger in size.” or “X percent of Oracle9i and Y percent of Oracle10g data warehouses surveyed run RAC”). Results will be written up as a summarized report. Every participating customer will receive a copy of the report.
    Terabyte and larger DWs are the primary interest, but information on any data warehouse environment is useful. We would like to have as many customers as possible submit results, ideally by the end of this week. However, this will be an ongoing process, so regular feedback after this week is extremely useful.
    To help our developers and product management team please download and run the DW measurement script kit from OTN which is available from the following link:
    http://www.oracle.com/technology/products/bi/db/10g/dw_survey_0206.html
    Please return the script outputs using the link shown on the above web page, see the FAQ section, or alternatively mail them directly to me: [email protected].
    Thank you and we look forward to your responses.
    Message was edited by:
    klaker


  • Oracle Development Survey on Data Warehouses: How Does Yours Compare?


    969224 wrote:
    Hi Guys, just a quick question: when we have a primary key on 4 columns and we have, say, 20 million rows, and we want to add one extra row, how does Oracle check whether the data in the primary key is unique compared to the 20 million existing rows? Does it actually compare the record being added to all the rows present in the table?
    Edited by: 969224 on May 10, 2013 8:14 AM

    Not the whole row - it compares the 4 columns in the INDEX against the 4 columns in the new row.
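    In other words, the uniqueness check is an index probe, not a table scan. A sketch with a hypothetical table: the constraint below is enforced by the index that backs it, and an INSERT descends that index on the 4 key columns, so the cost grows with the index height, not with the 20 million rows.
    CREATE TABLE account_facts (
      col1 NUMBER, col2 NUMBER, col3 NUMBER, col4 NUMBER,
      CONSTRAINT account_facts_pk PRIMARY KEY (col1, col2, col3, col4)
    );
    -- probes the PK index on (col1, col2, col3, col4);
    -- raises ORA-00001 if the combination already exists
    INSERT INTO account_facts (col1, col2, col3, col4) VALUES (1, 2, 3, 4);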
