Saving big data efficiently for processing?

Every 15 minutes we read 250 XML files. Each XML file is an
element, and each element (XML file) is composed of 5
sub-elements. Each sub-element has
400 counters, so every XML file carries 5 x 400 = 2,000 counter values. With 250 XML files, that is 500,000 counters every 15 minutes.
The data looks like this (this is one XML file; there are 249 more like it):
ELEM1
- ELEM1_1
  - Counter1: 54
  - Counter2: 12
  - Counter3: 6
  - ...
  - Counter400: 9
- ELEM1_2
  - Counter1: 43
  - Counter2: 65
  - Counter3: 98
  - ...
  - Counter400: 12
- ELEM1_3
  - Counter1: 43
  - Counter2: 23
  - Counter3: 64
  - ...
  - Counter400: 1
- ELEM1_4
  - Counter1: 4
  - Counter2: 2
  - Counter3: 8
  - ...
  - Counter400: 12
- ELEM1_5
  - Counter1: 43
  - Counter2: 98
  - Counter3: 2
  - ...
  - Counter400: 12
The first and most common thought was to create a table whose columns are the counter names. But this was done before with similar data, and performance was sub-par, to say the least.
So my question is: what would be the best way to store all these counters in a database?
Thanks.
VM

These are the classic design conundrums when working with XML:
Where should I perform the XML shredding, in the application or in the database?
Should I store the data in native XML format or shred it out into relational
form?
The database engine is an RDBMS and is designed for working with relational data. This should give you a steer as to how you might wish to store your data.
If you need to query the contents of the XML fragments, then you're probably better off shredding them out into relational data structures, but it really depends on your specific use case.
There are optimizations that can improve the performance of querying the XML data type via various XML indexes, but accessing the same information through a relational structure is almost always faster than XML processing (XQuery, XPath, etc.).
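To make this concrete, here is a minimal sketch of the shredded, relational approach: a narrow table with one row per counter reading rather than 400 counter columns. The table, column and class names are hypothetical, and the loader assumes a plain JDBC connection to your RDBMS:

    import java.sql.*;

    public class CounterLoader {
        // Hypothetical narrow table, one row per counter reading:
        //   CREATE TABLE CounterValue (
        //       CollectedAt  DATETIME NOT NULL,
        //       ElementId    INT      NOT NULL,
        //       SubElementId INT      NOT NULL,
        //       CounterId    INT      NOT NULL,
        //       Value        INT      NOT NULL
        //   );
        // Loads the 400 counters of one sub-element as a single batch.
        public static void loadSubElement(Connection conn, Timestamp collectedAt,
                                          int elementId, int subElementId,
                                          int[] counterValues) throws SQLException {
            String sql = "INSERT INTO CounterValue "
                       + "(CollectedAt, ElementId, SubElementId, CounterId, Value) "
                       + "VALUES (?, ?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < counterValues.length; i++) {
                    ps.setTimestamp(1, collectedAt);
                    ps.setInt(2, elementId);
                    ps.setInt(3, subElementId);
                    ps.setInt(4, i + 1);            // Counter1 .. Counter400
                    ps.setInt(5, counterValues[i]);
                    ps.addBatch();
                }
                ps.executeBatch(); // one round trip per sub-element (400 rows)
            }
        }
    }

At 500K rows per 15-minute cycle you would want to batch the inserts as above (or use your engine's bulk-load facility) and index the table on something like (CollectedAt, ElementId, SubElementId, CounterId) so that time-range aggregations do not scan everything.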
If you are just wanting a document storage system for XML, then an RDBMS may not be the most suitable technology to use.
Can you expand on your comment, "performance was sub-par"? Specifically, what was not performing as required?
John Sansom | SQL Server MCM | Blog | Twitter | LinkedIn | SQL Consulting

Similar Messages

  • IU Elim. "No data found for processing using current selection conditions"

    Dear Experts,
    While executing the Interunit Elimination task in the Consolidation Monitor, I am getting the message "No data found for processing using current selection conditions".
    The example is:
    A)
    In Unit X
    GL (399999) Account      Dr. 65000 (Customer Recon. A/c) (with Trading Partner X)
    GL (499999) Rev. A/c         65000 (with Trading Partner X)
    In Unit Y
    GL (199999) Exp. A/c     Dr. 65000 (with Trading Partner Y)
    GL (299999) Account          65000 (Vendor Recon. A/c) (with Trading Partner Y)
    B) The GLs in InfoCube 0FIGL_C01 are:
    GL Account | CCode | Trading Partner | Debit | Credit
    199999     | Y     | Y               | 65000 | 00000
    299999     | Y     | Y               | 00000 | 65000
    399999     | X     | X               | 65000 | 00000
    499999     | X     | X               | 00000 | 65000
    In the Consolidation Workbench:
    1) I have created a Document Type
    2) Method -
    In the General Tab:
    a) Two-Sided Selection
    b) Per Transaction Currency selected
    In the Selection Tab:
    1st Selection
    GL Account = 299999 (Customer Recon. Account)
    Company    = X
    Trading Partner = X
    2nd Selection
    GL Account = 399999 (Vendor Recon. Account)
    Company    = Y
    Trading Partner = Y
    Difference Tab:
    a) Post Diff to "Unit from Selection 1"
    b) Key Figure "Period Value GC"
    c) Check Limit Per Difference Row
    Other Difference
    GL Account = 100099 (Other GL)
    Currency Diff
    GL Account = 100510 (Other GL)
    Questions:
    1) Is the posting appropriate, and does it attract IU Elimination?
    2) Are the InfoCube details correct?
    3) Is there any config issue?
    Can anyone suggest why I am not able to get the data?
    Thanks
    Rakesh Shrivastav

    Dear Sir,
    Following are the views at my end in the context of your suggestions:
    1. Check the BCS totals data to ensure the trading partner is included.
    This is the view of the source InfoCube 0FIGL_C01:
    GL Account | CCode | Trading Partner | Debit | Credit
    199999     | Y     | Y               | 65000 | 00000
    299999     | Y     | Y               | 00000 | 65000
    399999     | X     | X               | 65000 | 00000
    499999     | X     | X               | 00000 | 65000
    2. Execute the task for the cons group that includes both cons units X and Y.
    The cons group is XYZ:
    X - Cons Unit
    Y - Cons Unit
    In the Cons Monitor I am executing a test run at the XYZ level.
    3. Although the trading partner for cons unit X should be Y and vice versa, the elimination should still occur with the cons unit X and trading partner X records.
    As in the query description.
    4. Make sure that the items 199999, 299999, 399999 and 499999 are included in the elimination method for either selection 1 or selection 2.
    The Method Selection Tab view is:
    1st Selection
    GL Account = 299999, 199999
    Company = Y
    TP = Y
    2nd Selection
    GL Account = 399999, 499999
    Company = X
    TP = X
    What is your view on that?

  • J1INUT Error - No data exist for processing with the given selection option

    Hi Guru's,
    I am using transaction J1INUT (utilization of provision of TDS on services), for which I have made the provision with the help of J1INPR. But when I execute transaction J1INUT, the following error message is displayed:
    No data exist for processing with the given selection options
    I have followed the steps below:
    1) ME21N - PO Creation
    2) ML81N - Service Entry
    3) J1INPR - Provision of TDS
    4) MIRO - Invoice Posting
    I have checked the table J_1IEWTPROV, and the system is updating that table. I have also activated table TRWCA for field IND.
    But I am still getting the same error. Any suggestions to resolve this?
    I appreciate your inputs. Thanks in advance.
    Regards,
    DeepaK

    Hi Deepak,
    Refer to the link below and follow the steps - Provision for Taxes on Service Recieved.
    Re: Provision for Taxes on Service Recieved
    I hope it is useful to you.
    Regards,
    Govind Bhaskaran

  • GoldenGate for Big Data 12c for Win x64?

    I was looking for the GoldenGate for Big Data download for Win x64, and all I found on edelivery was the Linux, Solaris, HP-UX and AIX platforms, but no Windows at all. I wonder if it just hasn't been released yet, or is it an unfortunate omission?
    Thanks
    Andy

    Thanks for your reply, Karan!
    I tried following your advice but bumped into yet another similar problem. I've installed OGG 12c, and now I can't seem to find the matching version of the GG Application Adapters for JMS and Flat File for the Win x64 platform. The latest version of the Application Adapters available on edelivery is 11.1.1.0.0, which means I would need to downgrade OGG to the same version. No big deal, but I wanted to make sure I'm not missing anything.
    I wonder if anybody has any idea as to whether the Application Adapters 12c for JMS and Flat File are available for the Win x64 platform, and if so, where I can download them from?
    Thanks
    Andy

  • What is the best big data solution for interactive queries of rows with up?

    We have a simple table like the following:
    | Name | Attribute1 | Attribute2 | Attribute3 | ... | Attribute200 |
    | Name1 | Value1 | Value2 | null | ... | Value3 |
    | Name2 | null | Value4 | null | ... | Value5 |
    | Name3 | Value6 | null | Value7 | ... | null |
    | ... |
    But there could be up to hundreds of millions of rows/names. The data will be populated every hour or so.
    The goal is to get results for interactive queries on the data within a couple of seconds.
    Most queries look like:
    select count(*) from table
    where Attribute1 = Value1 and Attribute3 = Value3 and Attribute113 = Value113;
    The where clause contains an arbitrary number of attribute name-value pairs.
    I'm new to big data and wondering what the best option is in terms of data store (MySQL, HBase, Cassandra, etc.) and processing engine (Hadoop, Drill, Storm, etc.) for interactive queries like the above.

    Hi,
    As always, the correct answer is "it depends".
    - Will there be more reads (queries) or writes (INSERTs)?
    - Will there be any UPDATEs?
    - Does the use case require any of the ACID guarantees, or would "eventual consistency" be fine?
    At first glance, Hadoop (HDFS + MapReduce) doesn't look like a viable option, since you require "interactive queries". Also, if you require any level of ACID guarantees or UPDATE capabilities, the best (and arguably only) solution is an RDBMS. Keep in mind that millions of rows is pocket change for a modern RDBMS on average hardware.
    On the other hand, if there will be a lot more queries than inserts, very few or no updates at all, and eventual consistency is not a problem, I'd probably recommend you test a key-value store (such as Oracle NoSQL Database). The idea would be to use (AttributeX, ValueY) as the key, and a sorted list of the Names that have ValueY for their AttributeX as the value. This way you do only as many reads as there are attributes in the WHERE clause, and then compute the intersection (very easy and fast with sorted lists).
    Also, I'd do this computation manually, as in the sketch below. SQL may be comfortable, but I don't think it's Big Data ready yet (unless you chose the RDBMS way, of course).
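    For illustration, here is a minimal sketch of that merge-style intersection over two sorted name lists (the names and values are hypothetical):
    import java.util.*;
    public class SortedIntersect {
        // Intersect two sorted lists in O(m + n) by advancing two cursors.
        static List<String> intersect(List<String> a, List<String> b) {
            List<String> result = new ArrayList<>();
            int i = 0, j = 0;
            while (i < a.size() && j < b.size()) {
                int cmp = a.get(i).compareTo(b.get(j));
                if (cmp == 0) { result.add(a.get(i)); i++; j++; }
                else if (cmp < 0) i++;
                else j++;
            }
            return result;
        }
        public static void main(String[] args) {
            // Names stored under the keys (Attribute1, Value1) and (Attribute3, Value3).
            List<String> attr1 = Arrays.asList("Name1", "Name4", "Name7");
            List<String> attr3 = Arrays.asList("Name3", "Name4", "Name7", "Name9");
            System.out.println(intersect(attr1, attr3)); // [Name4, Name7]
        }
    }
    The count(*) of the original query is then just the size of the final intersection.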
    I hope it helped,
    Joan

  • Data Type for Process Flow... PB with Date?

    I've got a problem with passing parameters in a process flow.
    I have a mapping with a parameter DATE_EXEC (data type: DATE) and a default value of TO_DATE('20/01/2007' , 'dd/mm/yyyy'). My mapping works fine when I launch it.
    I have a process flow which contains the mapping. The process has a parameter DATE_EXEC (data type: DATE). I bound the two DATE_EXEC parameters, but when I launch the mapping the value is not recognized. I tried:
    - TO_DATE('20/01/2007' , 'dd/mm/yyyy')
    - 20/01/2007
    - 2007.01.20
    - 2007-01-20
    My question is: what are the data types in a process flow? They are not Oracle types.
    For example, a parameter in a mapping which is a VARCHAR2 must be entered between quotes, but if you bind it to a parameter of a process flow which is a STRING (not an Oracle data type), must you enter it without quotes?
    Does anybody have some rules about that?
    I apologize for my English; I'm French.

    Here is some information on the literal quote-or-not-quote question, and what I think you need to do, at the end; hope it helps. Not exactly intuitive, since the flow designer (you) has to know what is a PL/SQL object and what is not.
    1. Literal = FALSE
    When Literal = FALSE is set, the value entered must be a valid PL/SQL expression which is evaluated at the Control Center, e.g.
    'Hello World!'
    22 / 7
    2. Literal = TRUE
    When Literal = TRUE, the value depends on the type of Activity. If the activity is a PL/SQL object, i.e. a Mapping or Transformation, then the value is a PL/SQL snippet. The critical difference here is that the value is macro-substituted into the call for the object. The format of the value is identical to that entered as the default value in the Mapping editor, e.g.
    'Hello World!'
    sysdate()
    If the activity type is not a PL/SQL object, then the value is language independent, e.g.
    Hello World
    3.1427571
    What you should try:
    Check the map activity parameter in your process flow to see if Literal is false (an expression); set it to false and then try using your TO_DATE('20/01/2007' , 'dd/mm/yyyy') expression, deploy your flow and execute it. Alternatively, the user guide defines the DATE type for flows with the format YYYY-MM-DD, so you can set the parameter value to '2007-01-20', use Literal equal to true, and remember to quote your value.
    Cheers
    David

  • Problem when saving the Data basis for the Consolidation

    Hi Gurus,
    I am having 2 problems.
    1) When I try to execute the UCWB transaction, it gives an information message:
    "Data basis DB needs to be generated (after upgrade)". I am following the procedure given there: "Run maintenance of data basis DB in display mode. Go to the "Data Streams" tab page. Choose the "Generate" button.
    If changes to Customizing settings are permitted in the current system or client, as an alternative you can maintain the data basis in change mode and save. In the case of systems supplied with Customizing transports, as an alternative you can generate the data basis in the source system and then transport it again."
    Even after following that procedure, when I execute UCWB again it gives me the same message.
    2) When I try to save the data basis for the consolidation, it gives me this error message:
    "Field 0HC_ATCCODE: This compound differs from that of basic field 0HC_MEDCTG"
    I have checked the referenced InfoObjects, checked the compounded InfoObjects, and activated them again. They don't differ in any way, but I still get this problem.
    Please help me out. If someone has come across this error and solved it, please help me.
    Regards
    satish

    Hi Satish,
    I constantly receive a message like your #2, saying that there was a critical change in InfoObject X. It's just a warning, and I found several OSS notes saying that this message is not correct. Just ignore it.
    In the case of your Q #1: the data basis can be generated in two ways, by pressing the Save icon, or by clicking the Generate icon (the system refers to this very option) and then Save. The latter way is used in a productive environment. Try it.
    Hope this helps.

  • Saving the data fetched for scheduled Crystal Reports

    We have a requirement to schedule report generation. An .rpt file will be provided as input to the scheduler, and whenever the schedule is triggered, the .rpt should run to fetch data from the database (without viewing the report) and be saved, with the fetched data, on the file system in .rpt format. Is there an API in the SDK that provides this functionality?

    "Scheduling" is available only in CR Server or BOE suite of products. You can "export" the report to disk instead. This will run the report to fetch data and save it to disk with data. Here is an example code:
    import com.crystaldecisions.reports.sdk.*;
    import com.crystaldecisions.sdk.occa.report.lib.*;
    import com.crystaldecisions.sdk.occa.report.exportoptions.*;
    import java.io.*;
    public class JRCExportReport {
        static final String REPORT_NAME = "JRCExportReport.rpt";
        static final String EXPORT_FILE = "C:\\myExportedReport.pdf";
        public static void main(String[] args) {
            try {
                // Open the report.
                ReportClientDocument reportClientDoc = new ReportClientDocument();
                reportClientDoc.open(REPORT_NAME, 0);
                // NOTE: If parameters or database login credentials are required,
                // they need to be set before calling the export() method of the
                // PrintOutputController.
                // Export the report and obtain an input stream that can be written to disk.
                // See the Java Reporting Component Developer's Guide for more information
                // on the export format enumerations possible with the JRC.
                ByteArrayInputStream byteArrayInputStream = (ByteArrayInputStream) reportClientDoc.getPrintOutputController().export(ReportExportFormat.PDF);
                // Release the report.
                reportClientDoc.close();
                // Use the Java I/O libraries to write the exported content to the file system.
                byte[] byteArray = new byte[byteArrayInputStream.available()];
                // Create a new file that will contain the exported result.
                File file = new File(EXPORT_FILE);
                FileOutputStream fileOutputStream = new FileOutputStream(file);
                ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(byteArrayInputStream.available());
                int x = byteArrayInputStream.read(byteArray, 0, byteArrayInputStream.available());
                byteArrayOutputStream.write(byteArray, 0, x);
                byteArrayOutputStream.writeTo(fileOutputStream);
                // Close the streams.
                byteArrayInputStream.close();
                byteArrayOutputStream.close();
                fileOutputStream.close();
                System.out.println("Successfully exported report to " + EXPORT_FILE);
            } catch (ReportSDKException ex) {
                ex.printStackTrace();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }
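    One note on the format: the sample exports to PDF, but since the original requirement was to save the fetched data in .rpt format, the same pattern should work with the RPT export enumeration (ReportExportFormat.crystalReports, if I recall the constant correctly) in place of ReportExportFormat.PDF; that saves the report together with its data.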

  • Working with R packages for Big Data

    Hi,
    I wonder which R packages from the big data and parallel processing family are relevant for working in ML Studio.
    It depends on whether ML Studio uses MapReduce when running R scripts; if it does, the RHadoop package seems not useful.
    Would using the snowfall package for parallel processing help with high-volume datasets? Would it exploit several CPUs?
    Thanks in advance

    Currently, the R scripts are executed on a single VM. You can manually set up a map-reduce pattern by splitting the data and having multiple Execute R Script modules in parallel in your experiment graph.
    -Roope

  • Scheduled finish date for process orders not updating in BW

    We are having problems with the scheduled finish date (GLTRS) for process orders. When the process order is already released, changes made in R/3 to the scheduled finish date do not result in a delta, hence the data in BW is not updated. Table AFKO, where the data comes from, is always updated with the changes, though. 2LIS_04_P_MATNR is used for extracting the data.
    Has anyone experienced the same problem? We are looking for possible ways to get a delta whenever there are changes to the process order, even after it is already released.
    Any help would be greatly appreciated...

    Hi Donna,
    If I've understood your problem correctly, you can increase the load frequency, as Sriee has pointed out.
    If you want the latest data frequently and you are on BI 7.0, then you can look at Real-Time Data Acquisition (RDA).
    Regards,
    Tom.

  • Where is the data saved in the structure for a particular tcode?

    Where is the data saved in the structure for a particular tcode?
    For example, in MM01 there is a field with RMMG1.
    Where does the data from this field get saved? In the backend it shows the structure, and a structure can hold data only at runtime.
    Please help me.
    Thanks in advance
    chinnu

    Hi,
    The structure RMMG1 holds data at runtime only, and it can hold only one record at a time.
    So we use this structure to hold data at runtime, and then we insert those values into the database tables for that program.
    Here you can see the values stored and processed in the program in the tables MARA (for matnr, mbrsh, mtart, ...), MARC (for the plant) and MARD (for the storage location, lgort), etc.
    So if you want to fetch some data, you first have to choose the right table to fetch the information from, by passing values on the selection screen.
    Regards
    Sivaparvathi
    Please reward points if helpful...

  • If I use Informatica Big Data Edition, do I still need to use Hadoop or MapR or any other similar system to process data? What activities will be specific to Informatica BDE or Hadoop?

    If I use Informatica Big Data Edition, do I still need to use Hadoop or MapR or any other similar system to process data? What activities will be specific to Informatica BDE and Hadoop?
    1. My requirement is to process both structured and unstructured data in real time, so is Informatica Big Data Edition sufficient, or do I have to use Hadoop or MapR or Cloudera etc. to process the data? If so, what activities will be performed by Informatica and what activities will require Hadoop?
    2. Also, for scheduling, do I need to go for ActiveBatch, or can I use the native Informatica Scheduler itself?

  • I am using the big date calendar template, and when I submit it to Apple for printing I lose the names of two months. These names are not text boxes. I see the names when I send it in, but something happens during the transmission to Apple. It was suggested

    I am using the big date calendar template in iPhoto. I am on Lion 10.7.2, MacBook Air. The names of the months are on each calendar page, but something happens when I send the data to Apple. The names are part of the template; they are not text boxes. I lose two names on the calendar after it is sent to Apple. Apple suggested I make a PDF file of my calendar before sending it in, and check to make sure every name shows. I did this with a calendar I just sent in. The calendar was correct; all names of the months were showing. After sending the data, two month names disappeared: when the calendar arrived by mail, it was incorrect. Apple looked at my calendar via a PDF file and it was incorrect. This is the second time this has happened. I called Apple and they had me delete several folders in the Library folder and some preferences, and do a complete reinstall of iPhoto. I have not yet remade the defective calendar. I am wondering if anyone else has had this problem?
    kathy

    Control-click on the background of the view all pages window and select "Preview Calendar" from the contextual menu.
    You can also save the PDF as a file to compare to the printed calendar. If the two names are visible in the PDF file, then the printed copy should show them. Contact Apple for a refund: Apple Print Products - Apple Store (U.S.)

  • Good for processing data from a web application?

    It seems like all the examples provided for ASA are about processing data streaming in from IoT or mobile apps. Is ASA appropriate for processing data from websites? For example, I have a multi-tenant web API, and I need to roll up usage and calculate billing for my clients. My clients can upload resources to me, which I store in blob storage. But blob storage gives me no way of knowing how many resources have been uploaded and how long I have stored them. Would ASA be a good fit for calculating these figures?

    I have solved this task in the following way:
    I added an ADF read-only form to my page (which I need anyway). The form displays the data selected in the graph (using another VO, which is linked to the graph VO). A command button calls my managed bean, which handles the data via the binding's executables (view iterators).

  • What are the settings for the Save button for saving plan data in a WAD template?

    Hi,
    I created a SAVE button in the Web template for saving plan data, but the button is not active. Is there any setting for this?
    Thanks in advance
    chandu

    Hi Chandu,
    These are the settings to configure the Save function:
    Right-click the Button Group item → Properties → Button →
    type the caption 'Save' → Command → select the 'All Commands' tab →
    de-select 'Execute Planning Function' and then select the checkbox 'Save Changed Data' →
    double-click 'Save Changed Data' → OK → OK → save your template.
    Please assign points if the info is useful.
    Regards
    CSM Reddy
