Avoid duplicates processing

Hi All
I have a requirement in which I receive a flat file from a source system every day.
I load this flat file into a staging table. The file can contain up to 6 million records.
Each day's file contains both changed and unchanged records,
i.e. even if a record has not changed, it appears again in the next day's file.
From the staging table, the data is processed and goes to other tables.
Since 75% of the data will be duplicates, I do not want to process all of it every day.
How do I do this? Getting only the changed records from the source system is ruled out, and
I can't run a Unix "diff" between the new and old files to load only the changed records.
I am thinking of two ways to do it:
1) Create a duplicate of the staging table and load the previous day's data into
it. When selecting the data, use the MINUS operator against that table and process only the result.
2) Retain the previous day's data in the same table and SELECT DISTINCT from the table
while processing.
Which is the best way to do this, or are there alternative solutions?
Performance is a major factor here, and I am using Oracle 8i.
Thanks
Ashwin N.

If you are going to pre-process the rows to eliminate the duplicates, I would suspect that the MINUS approach would be faster. I would do something like:
-- load the file into new
SELECT * FROM new
MINUS
SELECT * FROM old;
-- process the result into the real table
DROP TABLE old;
RENAME new TO old;
CREATE TABLE new AS
SELECT * FROM old
WHERE 1=0;

However, I'm not entirely convinced that the processing into the real table would be enough faster to offset the time taken to eliminate the duplicates before processing. My first impulse, if you can do your processing in a SQL statement, would be to just load the records into an empty staging table every day and do an update and an insert statement, something like:
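-- The change test belongs in WHERE EXISTS: if it sat only in the SET subquery,
-- matched but unchanged rows would have their columns set to NULL.
-- Note that <> never matches when either side is NULL.
UPDATE real_table r
SET (col1, col2, ... , coln) =
    (SELECT s.col1, s.col2, ... , s.coln
     FROM staging_table s
     WHERE r.pk1 = s.pk1 AND
           r.pk2 = s.pk2)
WHERE EXISTS (SELECT 1
              FROM staging_table s
              WHERE r.pk1 = s.pk1 AND
                    r.pk2 = s.pk2 AND
                    (r.col1 <> s.col1 OR
                     r.col2 <> s.col2 OR
                     r.coln <> s.coln));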
INSERT INTO real_table
SELECT *
FROM staging_table s
WHERE NOT EXISTS (SELECT 1
                  FROM real_table r
                  WHERE s.pk1 = r.pk1 AND
                        s.pk2 = r.pk2);

You should rebuild the PK on staging_table after the load. If you do this in parallel with NOLOGGING, it should be reasonably quick. Even better would be to get the data sorted by PK from the source system, because you could then use the NOSORT option on the index.
TTFN
John
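
To make the MINUS idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration only, not the Oracle 8i setup from the thread: SQLite spells the set-difference operator EXCEPT rather than MINUS, and the table names old_load and new_load and the columns pk and col1 are hypothetical stand-ins for yesterday's and today's staging data.

```python
import sqlite3


def changed_rows(conn):
    """Return rows in today's load that are new or differ from yesterday's load."""
    # SQLite uses EXCEPT where Oracle uses MINUS; both return the rows of the
    # first query that do not appear in the second.
    return conn.execute(
        "SELECT pk, col1 FROM new_load "
        "EXCEPT "
        "SELECT pk, col1 FROM old_load "
        "ORDER BY pk"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE old_load (pk INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE new_load (pk INTEGER PRIMARY KEY, col1 TEXT);
    -- Yesterday's file (hypothetical sample data).
    INSERT INTO old_load VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Today's file: row 2 changed, row 4 is new, rows 1 and 3 are unchanged.
    INSERT INTO new_load VALUES (1, 'a'), (2, 'B'), (3, 'c'), (4, 'd');
    """
)

delta = changed_rows(conn)
print(delta)  # [(2, 'B'), (4, 'd')]
```

Only the two rows in delta would need further processing; the unchanged rows 1 and 3 drop out. Whether this pre-filter pays for itself on 6 million rows is exactly the trade-off raised in the reply above.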

Similar Messages

  • Avoid Duplicate Tasks when Expanding Groups for Custom Task Process

    Is there a way to:
    Avoid Duplicate Tasks when Expanding Groups for Custom Task Process?
    I've got a people metadata column that I am planning on putting groups into.  I want the groups to expand and send a task to every user in the groups.  I also want to avoid creating multiple tasks if a user happens to be in two groups at the same time.
    I'm trying to work out a way to assign users a read task based on job training requirements.  Right now, assigning groups and using a workflow task to confirm the read is what I'm trying to accomplish.  I just end up getting two tasks for a user if they're in multiple groups.
    David Jenkins

    Hi David,
    Please verify the following:
    After Participants, select Parallel (all at once)
    Expand Task Options, select ‘Assign a task to each member within groups’
    Open the action properties, make sure ExpandGroup is Yes
    Also, in SharePoint Designer, you can edit the properties of the Start Approval Process action to enable ExpandGroup.
    Reference:
    https://social.msdn.microsoft.com/Forums/office/en-US/d14da1c4-bd5a-459b-8698-3a89bb01e6ad/expand-groupnot-creating-tasks-for-users-issue-in-sharepoint-2013-designer-workflow?forum=sharepointgeneral
    https://social.technet.microsoft.com/Forums/office/en-US/ac245d45-ff66-4341-815c-79213efc4394/sharepoint-2010-designer-workflows-and-sharepoint-user-groups?forum=sharepointcustomizationprevious
    Best Regards,
    Eric
    TechNet Community Support

  • Avoid Duplicate IDOC :

    Hi All,
    I need to add code to avoid duplicate IDocs when my program converts one IDoc to another IDoc. The code is below.
      LOOP  AT  t_seldoc.
        SELECT SINGLE * FROM  edidc
              WHERE docnum  EQ t_seldoc-idoc.
        REFRESH: t_idocst,
                 t_edidd.
        IF edidc-mestyp = c_msg_type.
          MOVE:  c_new_type     TO  edidc-mestyp,
                 c_st69         TO  edidc-status,
                 c_st69         TO  t_seldoc-status,
                 t_seldoc-idoc  TO  t_idocst-docnum.
        ELSE.
          MOVE:  'Z_NGI_SBT_TICKET'     TO  edidc-mestyp,
                 c_st69         TO  edidc-status,
                 c_377          TO  edidc-stdmes, "Add the stdmes for acks
                 c_st69         TO  t_seldoc-status,
                 t_seldoc-idoc  TO  t_idocst-docnum.
        ENDIF.
        APPEND  t_idocst  TO  t_idocst.
        PERFORM  update_idoc.
        READ TABLE t_output  WITH KEY idoc = t_seldoc-idoc.
        MOVE  sy-tabix  TO  l_tabix.
        MOVE c_upd_idoc TO  t_output-status.
        MODIFY t_output INDEX  l_tabix.
        MODIFY t_seldoc.
      ENDLOOP.
    This is the PERFORM routine:
    FORM update_idoc.
    " CHANGE BY Swati Namdev 28042009
      TYPES: BEGIN OF ty_vbak,
               vbeln TYPE vbak-vbeln,
             END OF ty_vbak.
      DATA: lt_dup_check TYPE STANDARD TABLE OF z1ng_sbttickethd,
            it_vbak      TYPE STANDARD TABLE OF ty_vbak.
    " End here Swati Namdev 28042009
      CALL FUNCTION 'EDI_DOCUMENT_OPEN_FOR_EDIT'
        EXPORTING
          document_number                     = t_seldoc-idoc
    "     ALREADY_OPEN                        = 'N'
    "   IMPORTING
    "     IDOC_CONTROL                        =
        TABLES
          idoc_data                           =  t_edidd
        EXCEPTIONS
          document_foreign_lock               = 1
          document_not_exist                  = 2
          document_not_open                   = 3
          status_is_unable_for_changing       = 4
          OTHERS                              = 5.
      IF sy-subrc  NE 0.
        MESSAGE ID     sy-msgid
                TYPE   sy-msgty
                NUMBER sy-msgno
                  WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
        EXIT.
      ENDIF.
      LOOP AT t_edidd  WHERE segnam  EQ  c_tickey_hdr.
        MOVE  t_edidd-sdata  TO  z1tickethd.
        IF  z1tickethd-tkt_type  EQ  '0'.
          MOVE  '3'  TO  z1tickethd-tkt_type.
        ELSEIF
            z1tickethd-tkt_type  EQ  '1'.
          MOVE  '4'  TO  z1tickethd-tkt_type.
        ENDIF.
        MOVE  z1tickethd   TO  t_edidd-sdata.
        MODIFY  t_edidd.
      ENDLOOP.
      DATA: z1ng_sbttickethd LIKE z1ng_sbttickethd,
            z1ng_sbtticketdt LIKE z1ng_sbtticketdt,
            z1ng_ticketdt LIKE z1ng_ticketdt.
      LOOP AT t_edidd  WHERE segnam  EQ  'Z1NG_TICKETDT'.
        MOVE  t_edidd-sdata  TO  z1ng_ticketdt.
        CLEAR: z1ng_sbtticketdt.
        MOVE-CORRESPONDING z1ng_ticketdt TO z1ng_sbtticketdt.
        MOVE  z1ng_sbtticketdt  TO  t_edidd-sdata.
        t_edidd-segnam = 'Z1NG_SBTTICKETDT'.
        MODIFY  t_edidd.
        CALL FUNCTION 'EDI_CHANGE_DATA_SEGMENT'
             EXPORTING
                  idoc_changed_data_record = t_edidd
             EXCEPTIONS
                  idoc_not_open            = 1
                  data_record_not_exist    = 2
                  OTHERS                   = 3.
      ENDLOOP.
      LOOP AT t_edidd  WHERE segnam  EQ  'Z1NG_TICKETHD'.
        MOVE  t_edidd-sdata  TO  z1ng_tickethd.
        CLEAR: z1ng_sbttickethd.
        MOVE-CORRESPONDING z1ng_tickethd TO z1ng_sbttickethd.
        MOVE  z1ng_sbttickethd  TO  t_edidd-sdata.
        t_edidd-segnam = 'Z1NG_SBTTICKETHD'.
        MODIFY  t_edidd.
    " CHANGE BY Swati Namdev 28042009
    " Collect each header segment for the duplicate check below.
        APPEND z1ng_sbttickethd TO lt_dup_check.
    " End here Swati Namdev 28042009
      ENDLOOP.
    " CHANGE BY Swati Namdev 28042009
      REFRESH it_vbak. CLEAR it_vbak.
      IF lt_dup_check[] IS NOT INITIAL.
        SELECT vbeln FROM vbak INTO TABLE it_vbak
          FOR ALL ENTRIES IN lt_dup_check
          WHERE kunnr     = lt_dup_check-cust
            AND zztkt_nbr = lt_dup_check-tkt_nbr.
        IF it_vbak[] IS NOT INITIAL.
          MESSAGE text-002 TYPE 'E'.
        ENDIF.
      ENDIF.
    " End here Swati Namdev 28042009
      CALL FUNCTION 'EDI_CHANGE_CONTROL_RECORD'
           EXPORTING
                idoc_changed_control         = edidc
           EXCEPTIONS
                idoc_not_open                = 1
                direction_change_not_allowed = 2
                OTHERS                       = 3.
      IF  sy-subrc NE  0.
        MESSAGE ID     sy-msgid
                TYPE   sy-msgty
                NUMBER sy-msgno
                  WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
        EXIT.
      ENDIF.
      CALL FUNCTION 'EDI_CHANGE_DATA_SEGMENT'
           EXPORTING
                idoc_changed_data_record = t_edidd
           EXCEPTIONS
                idoc_not_open            = 1
                data_record_not_exist    = 2
                OTHERS                   = 3.
      IF sy-subrc <> 0.
        MESSAGE ID     sy-msgid
                TYPE   sy-msgty
                NUMBER sy-msgno
                  WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
        EXIT.
      ENDIF.
      CALL FUNCTION 'EDI_DOCUMENT_CLOSE_EDIT'
         EXPORTING
            document_number        = t_seldoc-idoc
            do_commit              = c_yes
            do_update              = c_yes
               WRITE_ALL_STATUS       = 'X'
         TABLES
                STATUS_RECORDS     =  T_EDI_DS40
         EXCEPTIONS
            idoc_not_open          = 1
            db_error               = 2
            OTHERS                 = 3.
      IF sy-subrc <> 0.
        MESSAGE ID     sy-msgid
                TYPE   sy-msgty
                NUMBER sy-msgno
                  WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
        EXIT.
      ENDIF.
      CALL FUNCTION 'IDOC_STATUS_WRITE_TO_DATABASE'
           EXPORTING
                idoc_number = t_seldoc-idoc
           TABLES
                idoc_status = t_idocst.
      COMMIT WORK.
      CALL FUNCTION 'EDI_DOCUMENT_DEQUEUE_LATER'
           EXPORTING
                docnum = t_seldoc-idoc.
      COMMIT WORK.
    ENDFORM.                    " UPDATE_IDOC
    At present, I check whether the IDoc is a duplicate and raise an error message. Now I need to set status 51 on the duplicate IDoc instead and continue processing the remaining IDocs.
    Please provide the solution.
    regards
    Swati
    Edited by: Swati Namdev on May 5, 2009 11:26 AM

    Hi all
      Any inputs on this, please?
      Q: If the same IDoc is received a second time, how do I stop the duplicate IDoc
           from being processed?
           (I understood the question this way.)
    regards
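
The control flow asked for here (flag a duplicate with status 51 and carry on with the rest, instead of stopping with an error message) can be sketched as follows. This is a language-neutral illustration in Python, not ABAP and not the SAP IDoc API: the function assign_statuses, the dictionary keys, and the sample data are all hypothetical, and "69" is only an assumed value for the constant c_st69 in the posted code.

```python
STATUS_FAILED = "51"     # status the poster wants on duplicate IDocs
STATUS_CONVERTED = "69"  # assumption: the value behind c_st69 in the posted code


def assign_statuses(idocs, existing_tickets):
    """Map each document number to a status without aborting on duplicates."""
    statuses = {}
    for idoc in idocs:
        # Same key as the posted VBAK lookup: customer plus ticket number.
        ticket_key = (idoc["cust"], idoc["tkt_nbr"])
        if ticket_key in existing_tickets:
            # Duplicate: record the failure status and move on to the next
            # document instead of raising an error that ends the whole run.
            statuses[idoc["docnum"]] = STATUS_FAILED
            continue
        statuses[idoc["docnum"]] = STATUS_CONVERTED
    return statuses


# Hypothetical sample data.
idocs = [
    {"docnum": "0001", "cust": "C100", "tkt_nbr": "T1"},
    {"docnum": "0002", "cust": "C100", "tkt_nbr": "T2"},
    {"docnum": "0003", "cust": "C200", "tkt_nbr": "T1"},
]
existing_tickets = {("C100", "T2")}

statuses = assign_statuses(idocs, existing_tickets)
print(statuses)  # {'0001': '69', '0002': '51', '0003': '69'}
```

In the posted routine, the equivalent change would be to replace the type 'E' message with logic that records the failure status for that document and returns to the calling loop, so the loop over t_seldoc continues.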

  • Avoiding duplicates in Pi

    Hi All,
    I have a scenario where SAP receives data from an ERP system through PI (JMS to ABAP Proxy). Is there any mechanism in PI to filter out duplicates coming from the sender before they are sent to SAP? If yes, can you please provide the procedure?
    Regards,
    Rama

    Write an adapter module to stop the processing of duplicates in PI.
    Please refer to this blog:
    /people/sandeep.jaiswal/blog/2008/05/13/adapter-module-to-stop-processing-of-duplicate-file-ftp-location

  • Avoid duplicate standard recipe qty

    Dear All,
    I ran into a problem while building a report. In transaction C203 we can see the product recipe. Generally there is only one recipe group per product, but for some products I found two recipe groups, like 5....100 and 5...200; that is fine and it does happen.
    Now I need to fetch the standard quantity for input materials versus the process order quantity for input materials. Currently I fetch two rows, 0001...820 for one recipe group and 0001...820 for the second recipe group, but I need the quantity for only one recipe group. As a result the standard quantity appears doubled against the process order quantity, because the BOM number (STLNR) is the same for both recipe groups.
    In transaction COR3, on the Master Data tab, I can also see that a particular recipe group, like 5...100, is assigned, and this is reflected in table AFKO. But what I mainly need is the standard quantity of the recipe, so I looked at tables STAS, STKO and STPO. In STPO I can see the standard quantity of the input materials, and in STKO the product number and its batch size. The field STLAL exists in STAS and in STKO, but not in STPO, for linking purposes. In STPO I see:
    STLNR        IDNRK           Qty
    00000639   0001...820    50
    00000639   0001...820    50
    In my report the standard quantity comes out as 100, but I want 50; I have not found any link that lets me filter to one BOM number (STLNR).
    Are there any other tables I can look at, or what else can I do?
    Regards,
    Shivam.

    Hi Shivam,
    You can use DELETE ADJACENT DUPLICATES to remove duplicate records from an internal table.
    STLNR IDNRK Qty
    00000639 0001...820 50
    00000639 0001...820 50
    SORT itab BY stlnr idnrk.
    DELETE ADJACENT DUPLICATES FROM itab COMPARING stlnr idnrk.
    (List any further key fields in both statements as needed.)
    Regards,
    Mohammed Rasul.S
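
The sort-then-drop-adjacent-duplicates technique from this reply can be sketched outside ABAP as well. The following Python version is an illustration only: the helper delete_adjacent_duplicates and the dictionary layout are hypothetical, while the two rows are the STPO sample from the question.

```python
from itertools import groupby
from operator import itemgetter


def delete_adjacent_duplicates(rows, key_fields):
    """Sort rows by the key fields, then keep the first row of each key run."""
    key = itemgetter(*key_fields)
    # Sorting makes equal keys adjacent, which is what the ABAP statement
    # DELETE ADJACENT DUPLICATES relies on as well.
    ordered = sorted(rows, key=key)
    return [next(group) for _, group in groupby(ordered, key=key)]


# The two STPO rows quoted in the question.
bom_items = [
    {"stlnr": "00000639", "idnrk": "0001...820", "qty": 50},
    {"stlnr": "00000639", "idnrk": "0001...820", "qty": 50},
]

unique_items = delete_adjacent_duplicates(bom_items, ("stlnr", "idnrk"))
total_qty = sum(item["qty"] for item in unique_items)
print(total_qty)  # 50
```

This removes the doubling only because both rows carry the same quantity; if the two recipe groups could hold different quantities for the same component, the report would still need a rule for which one to keep.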

  • Avoid duplicate records

    Hi guys,
    Could you please let me know where the option for avoiding duplicate records is?
    1. In the case of an InfoPackage
    2. In the case of a DTP

    Hi,
    For an InfoPackage in 3.5: Processing tab -> select only PSA, update subsequent data targets, ignore double data records.
    In 7.0, the Processing tab selects only PSA by default.
    For a DTP: Update tab -> select handle duplicate data records.

  • How to avoid Duplicate Records  while joining two tables

    Hi,
    I am trying to join three tables. Two of them have the same structure (one is a history table), so I wrote a query like this:
    select
    e.id,
    e.seqNo,
    e.name,
    d.resDate,
    d.details
    from employees e
    join ((select * from dept) union (select * from dept_hist)) d
    on d.id=e.id and e.seqno=d.seqno
    but this is returning duplicate records.
    Could anyone please tell me how to avoid duplicate records in this query?

    Once a record is processed it is moved to the hist table, so the two tables will not contain the same records. I need the records from both tables, so I take the union of the two, and d holds the union of both.
    But I am getting duplicate records even if I use DISTINCT.

  • How do I "avoid" duplicate calendar entries on my devices?

    I have an iPod touch, an iPhone 4, an iPad 2 and a MacBook Pro running OS X 10.8.2.  My software/iOS is up to date on all devices.  My main calendar is my Google Calendar.  Whenever I sync, my devices show multiple double calendar entries.  Now, I have found a large amount of information on removing the duplicate entries, which has been helpful.  However, I am having difficulty finding an answer as to how to eliminate this in the future.  What I want is to be able to go into iCal, on any of my devices, and see the same calendar (which it appears I currently can) without ever having double entries for the same events.  How do I need to have it set up?  Yes, I have searched ad nauseam, but I can't find my answer.  Maybe I am just incompetent.  Any help and/or guidance would be greatly appreciated.

    O.K., first, thanks for the response, Mistimp.  Here is my specific situation: my Google calendar is my "main" calendar and is what populates iCal.  I hope that makes sense.  I want to be able to make calendar entries on any device, either in iCal or Google.  I believe I can do that.  I'll have to check further.  If I'm not mistaken, the duplicate entries show up every time I sync.  Then I have to go through the process of turning off a calendar and re-adding it.  I want to avoid that step and just avoid duplicate entries forever.

  • How to look for duplicate process instances?

    In Oracle BPM 11g, is there a good way to look for duplicate process instances based on process data attributes? For example, I have entered an instance of a process for 'John Smith' with a date of '4/1/2010' and I want to see if there is another instance in the same process with these same data values to evaluate as a potential duplicate. I believe we can write a Java service to invoke the API to do this, but I am wondering if there is a better way within the process design to do this (XPath extension functions or something?). It seems like this would be a common need.

    I am really looking for an approach to finding duplicate instances within my process flow, not from EM. So, if the user starts a new instance of a process, I can check for another instance that appears to be a duplicate and direct the flow to a human activity to review the potential duplicate and decide whether to continue processing the new instance or reject it. My guess is that we need to use a service task to invoke Java code which uses the API to investigate other instances with the same values. But I was hoping for a simpler solution. I have to think that this is not too uncommon.

  • Avoid duplicate batch (batch management)

    Dear all,
    We are facing a problem related to batch management. We use manual batch entry, and we don't want the same batch number (one already assigned to a material) to be entered again for any material. What is the solution to avoid duplicate batch entry?
    Can you tell me the settings or any user exit with which we can prevent the duplicate batch?
    regards

    Hi Hema,
    In our scenario the user enters the raw material batch manually in MIGO: when we post a goods receipt in MIGO against a purchase order, we enter the batch manually. We need a batch that has already been assigned to a raw material to be rejected if it is entered again. If the user enters a previous batch, the system should give an error that the batch already exists.
    Maybe you know the problem.

  • How to avoid duplicate posting of noted items for advance payment requests?

    How to avoid duplicate posting of noted items for advance payment requests?

    Puttasiddappa,
    In the PS module, we allow the deletion of a component purchase requisition although a purchase order exists. The system sends message CN707 "A purchase order already exists for purchase requisition &" as an information message by design, to allow flexible project management.
    If, however, you want message CN707 to be of type E, you have to
    modify the standard coding. To do so, use SE91 to invoke the
    where-used list of message 707 in message class CN, and change
      i707(cn)
    to
      e707(cn)
    where desired.
    Also, user exit CNEX0039 provides the possibility to reject the
    deletion of a component according to the customer's needs, e.g. you may
    check here whether a purchase order exists and reject the deletion.
    Hope this helps!
    Best regards
    Martina Modolell

  • How to avoid duplicate BOM Item Numbers?

    Hello,
    is there a way to avoid duplicate BOM Item Numbers (STPO-POSNR) within one BOM?
    For Routings I could avoid duplicate Operation/Activity Numbers with transaction OP46 by setting T412-FLG_CHK = 'X' for Task List Check. Is there an equivalent for BOMs?
    Regards,
    Helmut Gante


  • #MULTIVALUE even after checking avoid duplicate row agg.

    Hi experts,
    I am getting the #MULTIVALUE error in a few rows even after checking the "avoid duplicate row aggregation" option.
    Any ideas?
    regards

    Hi,
    #MULTIVALUE: this error can occur in 3 ways:
    1) #MULTIVALUE in aggregation:
      the error occurs when the output context does not include the input context.
    2) #MULTIVALUE in break headers or footers
    3) #MULTIVALUE at section level
    Please provide us with a description of the issue you are facing.
    Regards,
    Chitha.

  • How can I avoid duplicates on contacts and how do I get contacts created on iPhone/ipad synchronized on my mac? so far it doesn't work correctly, just sometimes. same for icalendar

    how can I avoid duplicates on contacts and how do I get contacts created on iPhone/ipad synchronized on my mac? so far it doesn't work correctly, just sometimes. same for icalendar

    On your Mac, for duplicates, switching Contacts off then back on in System Preferences > iCloud may prevent duplicates.
    On the iPhone / iPad tap Settings > iCloud. Make sure Contacts and Calendars are switched on.
    Try restarting your Mac and your iOS devices when items won't sync as they should.
    To restart an iOS device:  Hold the On/Off Sleep/Wake button down until the red slider appears. Slide your finger across the slider to turn off iPhone. To turn iPhone back on, press and hold the On/Off Sleep/Wake button until the Apple logo appears.

  • How to avoid duplicate record in a file to file

    Hi Guys,
              Could you please provide a solution
              to avoid duplicate entries in a flat file based on a key field?
              I am asking in terms of standard functions,
              either at message mapping level or by configuring the file adapter.
    warm regards
    mahesh.

    Hi Mahesh,
    Write a module processor that checks for duplicate records in the file adapter,
    or
    eliminate the duplicate records with a Java/ABAP mapping.
    Also check these links:
    Re: How to Handle this "Duplicate Records"
    Duplicate records
    Ignoring Duplicate Records--urgent
    Re: Duplicate records frequently occurred
    Re: Reg ODS JUNK DATA
    http://help.sap.com/saphelp_nw2004s/helpdata/en/d0/538f3b294a7f2de10000000a11402f/frameset.htm
    regards
    srinivas
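
Whichever place the logic ends up in (adapter module or mapping), the core of key-based de-duplication of a flat file is small. Below is a minimal sketch in Python under stated assumptions: a comma-delimited file whose first field is the key, and a hypothetical helper drop_duplicate_records. It is not PI adapter or mapping code.

```python
import csv
import io


def drop_duplicate_records(lines, key_index=0, delimiter=","):
    """Keep the first record seen for each key, preserving file order."""
    seen = set()
    kept = []
    for record in csv.reader(lines, delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        key = record[key_index]
        if key in seen:
            continue  # a later record with an already-seen key is dropped
        seen.add(key)
        kept.append(record)
    return kept


# Hypothetical sample file: key 1001 appears twice.
sample = io.StringIO("1001,widget\n1002,gadget\n1001,widget again\n")
records = drop_duplicate_records(sample)
print(records)  # [['1001', 'widget'], ['1002', 'gadget']]
```

Keeping the first occurrence is a design choice; if the last record for a key should win instead, store the records in a dictionary keyed by the key field and overwrite on each repeat.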
