Best way to remove duplicates based on multiple tables

Hi,
I have a mechanism which loads flat files into multiple tables (can be up to 6 different tables) using external tables.
Whenever a new file arrives, I need to insert the duplicate rows into a side table, but the duplicates are to be searched for in all 6 tables according to a given set of columns which exists in all of them.
In the SQL Server version of the same mechanism (which I'm migrating to Oracle), it uses an additional "UNIQUE" table with only 2 columns (Checksum1, Checksum2), which hold the checksum values of 2 different sets of columns per inserted record. When a new file arrives, it computes these 2 checksums for every record and looks them up in the unique table to avoid searching all the different tables.
We know that working with checksums is not bulletproof but with those sets of fields it seems to work.
My questions are:
Should I use the same checksum mechanism? If so, should I use the owa_opt_lock.checksum function to calculate the checksums?
Or should I look for duplicates in all tables one after the other (indexing some of the columns we check for duplicates with)?
Note:
These tables are partitioned with day partitions and can be very large.
Any advice would be welcome.
Thanks.
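For what it's worth, the checksum-lookup idea is easy to prototype outside the database. A minimal Python sketch, with zlib.crc32 standing in for SQL Server's CHECKSUM / Oracle's owa_opt_lock.checksum; the column sets and row layout are invented for illustration:

```python
import zlib

# Hypothetical column sets; in the real mechanism these are the two
# sets of columns shared by all 6 tables.
CHECKSUM1_COLS = ("cust_id", "order_date")
CHECKSUM2_COLS = ("amount", "source_file")

def row_checksums(row):
    """Compute the two per-row checksums (zlib.crc32 stands in for the
    database checksum functions)."""
    c1 = zlib.crc32("|".join(str(row[c]) for c in CHECKSUM1_COLS).encode())
    c2 = zlib.crc32("|".join(str(row[c]) for c in CHECKSUM2_COLS).encode())
    return c1, c2

# The "UNIQUE" side table: the set of (checksum1, checksum2) pairs seen so far.
seen = set()

def is_duplicate(row):
    """True if this row's checksum pair was already loaded; otherwise
    record the pair and treat the row as new. The set lookup plays the
    role of the index probe against the UNIQUE table."""
    key = row_checksums(row)
    if key in seen:
        return True
    seen.add(key)
    return False
```

As noted above, checksums are not bulletproof: two different rows can collide, so a positive hit should ideally be confirmed against the real columns before the row is diverted to the side table.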

> I need to keep duplicate rows in a side table and not load them into table1...table6
Does that mean that you don't want ANY row if it has a duplicate on your 6 columns?
Let's say I have six records that have identical values for your 6 columns. One record meets the condition for table1, one for table2 and so on.
Do you want to keep one of these records and put the other 5 in the side table? If so, which one should be kept?
Or do you want all 6 records put in the side table?
You could delete the duplicates from the temp table as the first step. Or better
1. add a new column WHICH_TABLE NUMBER to the temp table
2. update the new column to -1 for records that are dups.
3. update the new column (might be done with one query) to set the table number based on the conditions for each table
4. INSERT INTO TABLE1 SELECT * FROM TEMP_TABLE WHERE WHICH_TABLE = 1
...
INSERT INTO TABLE6 SELECT * FROM TEMP_TABLE WHERE WHICH_TABLE = 6
When you are done, WHICH_TABLE will be flagged with:
1. NULL if a record was not a dup but was not inserted into any of your tables - a possible error record to examine
2. -1 if a record was a dup
3. 1 if the record went to table 1 (2 for table 2, and so on)
This 'flag and then select' approach is more performant than deleting records after each select, especially if the flagging can be done in one pass (a full table scan).
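The flag-and-route steps above can be sketched end to end. A toy run in Python with sqlite3; the table names, columns, and routing conditions are invented for illustration (the real mechanism would use the 6 Oracle tables and their actual conditions):

```python
import sqlite3

# Toy schema: one staging table with a WHICH_TABLE column, two targets.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE temp_table (k TEXT, v INTEGER, which_table INTEGER);
    CREATE TABLE table1 (k TEXT, v INTEGER);
    CREATE TABLE table2 (k TEXT, v INTEGER);
    INSERT INTO table1 VALUES ('a', 1);          -- loaded by an earlier file
    INSERT INTO temp_table VALUES
        ('a', 1, NULL),    -- dup of the row already in table1
        ('b', 2, NULL),    -- should route to table1 (v < 10)
        ('c', 20, NULL);   -- should route to table2 (v >= 10)
""")

# Step 2: flag duplicates with -1, checking every target table.
db.execute("""
    UPDATE temp_table SET which_table = -1
    WHERE EXISTS (SELECT 1 FROM table1 t1
                  WHERE t1.k = temp_table.k AND t1.v = temp_table.v)
       OR EXISTS (SELECT 1 FROM table2 t2
                  WHERE t2.k = temp_table.k AND t2.v = temp_table.v)
""")

# Step 3: one pass sets the target-table number for the remaining rows.
db.execute("""
    UPDATE temp_table
    SET which_table = CASE WHEN v < 10 THEN 1 ELSE 2 END
    WHERE which_table IS NULL
""")

# Step 4: route each row; rows flagged -1 would go to the side table instead.
db.execute("INSERT INTO table1 SELECT k, v FROM temp_table WHERE which_table = 1")
db.execute("INSERT INTO table2 SELECT k, v FROM temp_table WHERE which_table = 2")
```

After the run, temp_table still carries the flags, so the -1 rows can be copied to the side table and NULL rows examined as possible errors.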
See this other thread (or many, many others on the net) from today for how to find and remove duplicates
Best way of removing duplicates

Similar Messages

  • Best way to do a lookup with multiple tables

    Hello,
    I am looking for an example or how to do a lookup through ESB XSL transformation. What I am wanting to accomplish is something like below:
    I am reading in a flat file and then want to use a field in that file to do a lookup in multiple tables.
    i.e.
    Select c.customer_name
    from customers c, order_lines ol, order_headers oh
    where ol.customer_id = c.customer_id
    and ol.header_id = oh.header_id
    and oh.order_number = p_order_num;
I know that there is the lookup-table function but didn't know if that would work with this scenario. Any suggestions are appreciated.
    Thanks,
    Jason
    Edited by: Colby J on Oct 21, 2008 11:13 AM
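For reference, the join itself can be exercised outside the ESB/XSL layer. A toy run with Python's sqlite3, using the table and column names from the post (the data is invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers     (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE order_headers (header_id INTEGER, order_number TEXT);
    CREATE TABLE order_lines   (customer_id INTEGER, header_id INTEGER);
    INSERT INTO customers     VALUES (1, 'Acme');
    INSERT INTO order_headers VALUES (10, 'ORD-42');
    INSERT INTO order_lines   VALUES (1, 10);
""")

def lookup_customer(p_order_num):
    # The same three-table join as in the post, with the order number
    # bound as a parameter.
    row = db.execute("""
        SELECT c.customer_name
        FROM   customers c, order_lines ol, order_headers oh
        WHERE  ol.customer_id = c.customer_id
        AND    ol.header_id   = oh.header_id
        AND    oh.order_number = ?
    """, (p_order_num,)).fetchone()
    return row[0] if row else None
```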


  • Best Way to Remove Duplicate Movies

I'm a professor and use Keynote for my lectures. I noticed that my lecture files are HUGE (my summary lecture was over 650mb). When I looked at the package I found several duplicates of movies and images that I use in my slides. Is there any efficient and risk-free way to delete the extras? Is there a way of putting them in without making duplicates in the first place? I often duplicate slides and move things around each term to improve my class.

    Really easy:
    Find all the movies in your iPhoto Library. Then Export them to a Folder. Then delete them from iPhoto.
    What makes this easy is that every movie is automatically keyworded when you import it.
    So to find all the Movies:
    File -> New Smart Album
    Keyword -> is -> Movie
    Now that you've found them. Select All and
File -> Export, and set Kind: to Original (Note: setting Kind: to Original is vital)
    And export them.
    Then trash them from iPhoto.
    Regards
    TD

  • Best way to implement oracle TEXT on multiple tables with regular updates

    Hi,
    I have the following situation:
    5 tables where we want full text search on multiple columns.
    Some of the tables have a master/detail relation. (1 to 1000, or more)
    because of the number of transactions on these tables we can't have a lag in the sync time.
Currently I have created a dummy table just for the search with 2 columns: one for the primary key to all the other tables and one for the update trigger.
    I use the user_datastore with a procedure to join all the necessary columns resulting in a clob.
    My question is regarding the update.
Of course I can create triggers to update the dummy field in the search table, but this will give a lot of updates on that table, with possible locking issues.
    What would be the best approach to have this search functionality working?
    I am open for any ideas!
    Thanks,
    Edward

Ok, I will focus on building a solution on 12c.
Right now I have a USER_DATASTORE with a procedure to glue all the fields together into one document.
    This works fine for the search.
    I have created a dummy table on which the index is created and also has an extra field which contains the key related to all the tables.
    So, I have the following tables:
    dummy_search
    contracts
    contract_ref
    person_data
    nac_data
    and some other tables...
    the current design is:
    the index is on dummy_search.
When we update the contracts table, a trigger will update dummy_search.
Same configuration for the other tables.
Now we see locking issues when there are a lot of updates on these tables at the same time.
    What is you advice for this situation?
    Thanks,
    Edward
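One common way to take trigger pressure off the indexed table is to have the triggers queue keys instead of updating dummy_search directly, and apply the queue in batches from a background job. A hypothetical sketch with Python's sqlite3 (the real solution would be an Oracle trigger plus a scheduled job; all names are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contracts    (key INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE dummy_search (key INTEGER PRIMARY KEY, dummy INTEGER);
    -- Instead of touching dummy_search, each trigger only appends the
    -- changed key to a small queue table (cheap, append-only, no
    -- contention on the indexed row).
    CREATE TABLE pending_sync (key INTEGER);
    CREATE TRIGGER contracts_sync AFTER UPDATE ON contracts
    BEGIN
        INSERT INTO pending_sync VALUES (NEW.key);
    END;
""")
db.execute("INSERT INTO contracts VALUES (1, 'v1'), (2, 'v1')")
db.execute("INSERT INTO dummy_search SELECT key, 0 FROM contracts")

# Two updates to the same contract queue the key twice...
db.execute("UPDATE contracts SET body = 'v2' WHERE key = 1")
db.execute("UPDATE contracts SET body = 'v3' WHERE key = 1")

# ...but the background job drains the queue in one batch, touching each
# dummy_search row at most once no matter how many updates queued it.
db.execute("""
    UPDATE dummy_search SET dummy = dummy + 1
    WHERE key IN (SELECT DISTINCT key FROM pending_sync)
""")
db.execute("DELETE FROM pending_sync")
```

The trade-off is a small sync lag (the job's interval), which has to be weighed against the "can't have a lag" requirement stated earlier in the thread.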

  • Best way to find duplicate pages / find duplicate background images

    What is the best way to detect duplicate pages?
The pages I am dealing with are searchable images (scanned image background with selectable text on top). In this case, any two pages that have the exact same background image will be duplicates.
    I only know how to get page text though, so I've been getting the text and hashing it, then checking for duplicate hashes. This works for the most part, but I fear running into two different pages with the exact same text.
    What about looking at the background image? If a PDF has multiple pages with the same background image, I assume it would store the image once and then just reference it from the pages? Is it possible to check duplicate pages this way?
    Or Does Acrobat have a built-in checking solution I haven't discovered? As always, any help is appreciated

Ok, well for the most part doing it by text works, but it sometimes flags things that aren't duplicates: for example, two copies of the same worksheet that were not filled out will have the exact same text, despite being completely different pages.
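The text-hashing approach can be sketched as follows; folding the background image bytes into the hash, when they are extractable, is one hedge against the identical-text false positives described above. A Python sketch (the function names are invented):

```python
import hashlib

def page_fingerprint(page_text, background_bytes=None):
    """Fingerprint a page. Hashing only the text (as the poster does)
    makes two blank copies of the same worksheet collide; folding in the
    raw bytes of the background image stream, when available,
    disambiguates pages with identical text."""
    h = hashlib.sha256(page_text.encode("utf-8"))
    if background_bytes is not None:
        h.update(background_bytes)
    return h.hexdigest()

def find_duplicates(fingerprints):
    """Return the indices of pages whose fingerprint was already seen."""
    seen, dups = {}, []
    for i, fp in enumerate(fingerprints):
        if fp in seen:
            dups.append(i)
        else:
            seen[fp] = i
    return dups
```

Whether PDFs actually share one image object across pages depends on how the file was produced, so the image-bytes refinement is an assumption to verify against the files at hand.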

  • Best way to handle duplicate headings stemming from linked TOC book?

What's the best way to handle duplicate topic titles stemming from TOC books that contain links to a topic that you want to have appear in the body? The problem I've had for years now is that the TOC generates one heading, and the topic itself generates one heading. This results in duplicate headings in the printed output.
    I have a large ~2500 topic project I have to print every release, and to date we've been handling it in post-build word macros, but it's not 100% effective, so we're looking to fix this issue on the source side of the fence. On some of our smaller projects, we've actually marked with the heading in the topic itself with the Online CBT and that seems to work. We're thinking of doing the same to our huge project unless there's a better way of handling this. Any suggestions?

    See the tip immediately above this link. http://www.grainge.org/pages/authoring/printing/rh9_printing.htm#wizard_page3
    The alternative is to remove the topic from the print layout so that it only generates by virtue of its link to the book.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge

  • Best way to remove Stateful session beans

    Hi folks.
    I'm running Weblogic 6.1. I'm trying to find the best way of removing
    stateful session beans. I know to call EJBObject.remove() on the
    client side, but this will not always happen if the client crashes or
    times out. This is a java client application connection to weblogic,
    no servlets are involved.
    Is there a way to signal the appserver to remove all stateful session
    beans associated with a user when the User logs out? I would rather
    not remove them using a time out mechanism.
    thanks.
    rob.

    But in the documentation and also based on my experience I noticed that the
    timeout does not take effect till the max-beans-in-cache limit is reached.
    How do you handle that?
    "Thomas Christen" <[email protected]> wrote in message
    news:3e35795d$[email protected]..
    Hi,
Is there a way to signal the appserver to remove all stateful session beans associated with a user when the User logs out? I would rather not remove them using a time out mechanism.
Had the same problem and solved it the following way:
- The client has a thread polling its session bean at the server (every 30 sec.)
- The session bean has a short timeout (2 minutes)
If the client fails, the timeout will catch it; otherwise the client will gracefully call remove before exit.
    Regards
    Tomy
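The poll-plus-short-timeout pattern Tomy describes is language-neutral. A minimal sketch in Python (the timeout is a parameter so the example can run in fractions of a second rather than minutes; the real thing would live in the EJB container):

```python
import time

class SessionRegistry:
    """Server side of the pattern: clients ping periodically; a reaper
    removes any session whose last ping is older than the timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_ping = {}

    def ping(self, session_id):
        # Client poll (the every-30-seconds heartbeat).
        self.last_ping[session_id] = time.monotonic()

    def remove(self, session_id):
        # Graceful client exit: explicit remove before disconnecting.
        self.last_ping.pop(session_id, None)

    def reap(self):
        # Timeout sweep: catches clients that crashed without removing.
        now = time.monotonic()
        dead = [s for s, t in self.last_ping.items()
                if now - t > self.timeout_s]
        for s in dead:
            del self.last_ping[s]
        return dead
```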

  • Best way to remove workstation?

Hi! When we move a computer to another room, we rename the computer, then do a WSREG -UNREG and WSREG. The new workstation object is created, but the old workstation object stays there.
If I delete the old object, the information stays in the Sybase db!
So what is the best way to remove a workstation object?
I have a removal policy that works, but what if I don't want to wait?
    thank you,
    Eric.

    eric,
    It appears that in the past few days you have not received a response to your
    posting. That concerns us, and has triggered this automated reply.
    Has your problem been resolved? If not, you might try one of the following options:
    - Visit http://support.novell.com and search the knowledgebase and/or check all
    the other self support options and support programs available.
    - You could also try posting your message again. Make sure it is posted in the
    correct newsgroup. (http://forums.novell.com)
    Be sure to read the forum FAQ about what to expect in the way of responses:
    http://support.novell.com/forums/faq_general.html
    If this is a reply to a duplicate posting, please ignore and accept our apologies
    and rest assured we will issue a stern reprimand to our posting bot.
    Good luck!
    Your Novell Product Support Forums Team
    http://support.novell.com/forums/

  • Remove duplicates based on a condition

    Hi all,
    I need help on a query to remove duplicates based on a condition.
    E.g. My table is
FE    CC    DATE   FLAG
FE1   CC1   10/10  FB
FE1   CC1   9/10   FB
FE1   CC1   11/10  AB
FE1   CC2   9/10   AB
FE1   CC2   10/10  FB
FE1   CC2   11/10  AB
    I want to remove all duplicate rows on FE and CC based on the below condition :
    DATE <MAX(DATE) WHERE FLAG='FB'
    That means I want to remove the row FE1 CC1 9/10 FB
    but not the rows
    FE1 CC1 10/10 FB
    and
    FE1 CC1 11/10 AB
    as only the row FE1 CC1 9/10 FB has date <MAX(DATE) WHERE FLAG='FB'.
    Similarly I want to keep
    FE1 CC2 10/10 FB
    FE1 CC2 11/10 AB
    but not
    FE1 CC2 9/10 AB
    Many thanks.

    Hi,
    Do you want to DELETE rows from the table, or just not show some rows in the output? Since you're talking about a "query", rather that a "DELETE statement", I'll assume you want to leave those rows in the table, but not show them in the output.
    Here's one way:
WITH got_r_num AS
(
    SELECT fe, cc, dt, flag
    ,      RANK () OVER ( PARTITION BY fe, cc, flag
                          ORDER BY     dt DESC
                        )            AS r_num
    FROM   table_x
)
SELECT fe
,      cc
,      TO_CHAR (dt, 'fmMM/YY')      AS dt
,      flag
FROM   got_r_num
WHERE  flag  != 'FB'
OR     r_num  = 1
;
If you'd care to post CREATE TABLE and INSERT statements for your sample data, then I could test it.
    This assumes that the column you called DATE (which is not a good column name, so I called it dt) is a DATE, and that you are displaying it in MM/YY format.
    This also assumes that dt and flag are never NULL.
    If I guessed wrong about these things, then the query can be changed; it will just be a little messier.
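For what it's worth, the condition as literally stated (drop rows whose date is earlier than the latest 'FB' date in the same FE/CC group) can also be checked with a correlated MAX. A toy run with Python's sqlite3, dates spelled as ISO strings so plain string comparison orders them (sample data adapted from the post; the year is invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (fe TEXT, cc TEXT, dt TEXT, flag TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("FE1", "CC1", "2010-10", "FB"),
    ("FE1", "CC1", "2010-09", "FB"),
    ("FE1", "CC1", "2010-11", "AB"),
    ("FE1", "CC2", "2010-09", "AB"),
    ("FE1", "CC2", "2010-10", "FB"),
    ("FE1", "CC2", "2010-11", "AB"),
])

# Keep a row unless its date is earlier than the latest 'FB' date in the
# same (fe, cc) group -- the condition exactly as stated.
# Caveat: a group with no 'FB' row at all is dropped entirely here
# (dt >= NULL is never true); handle that case separately if it can occur.
kept = db.execute("""
    SELECT fe, cc, dt, flag
    FROM   t
    WHERE  dt >= (SELECT MAX(f.dt) FROM t f
                  WHERE  f.fe = t.fe AND f.cc = t.cc AND f.flag = 'FB')
    ORDER BY fe, cc, dt
""").fetchall()
```

Note this keeps exactly the rows the poster listed: 9/10 FB and the CC2 9/10 AB row are dropped, everything else survives.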

  • Crooked stickers / decals on my new X220 - best way to remove ?

    ok, is it just me or are other also extremely annoyed by the fact that they just shelled out over a grand for a notebook and it arrives with all stickers put on there crooked.
    From the energy star sticker on the outside to the Windows 7 decal, the lenovo enhanced experience and the intel inside.
    it is like buying a 911 and the 911 logo on the back is crooked !!!
    why do they not just leave them off instead of making it look like a cheap notebook ???
    (click i have this question too or reply here please)
    and on a second note - what is the best way to remove said decals since i do not and cannot live with them like this and the soft touch surface makes it a pain in the butt to remove them.
    blowdryer, razor blade and goo-gone ? (extremely careful to only use the blade to lift the sticker not touch the notebook)
    and on a positive note - i am pretty amazed at the battery time i am getting on the new machine and the speed of the i7 and the 6GB of RAM. but i am still uninstalling crap and installing all the things i need
    (see also Firefox 13.0 and password manager not working AGAIN thread).....
    thanks in advance for your replies.
    PS: yes, i am aware i am suffering from OCD - but still ....
    someone is already getting paid to put them on there - why not do it right ?

if a chemical is needed to clean the residue, off-the-shelf rubbing alcohol is best.   'goo gone' is petroleum-based and can damage or discolor the plastic.   i agree with lead_org though, they'll likely just pull off without much of a fight.
    the decals are there because of regulations and contractual advertising.   energy star devices require a decal stating compliance.   intel and microsoft require decals as part of including their products with PCs.   PC manufacturers could pay intel and microsoft extra to not include the decals but the costs would be passed onto the system price and consumers would complain about the increase.
    given how easy it is to remove the decals, i pull them off when a system arrives and enjoy a clean palmrest.
    ThinkStation C20
    ThinkPad X1C · X220 · X60T · s30 · 600

  • Best way to remove last line-feed in text file

    What is the best way to remove last line-feed in text file? (so that the last line of text is the last line, not a line-feed). The best I can come up with is: echo -n "$(cat file.txt)" > newfile.txt
    (as echo -n will remove all trailing newline characters)

    What is the best way to remove last line-feed in text file? (so that the last line of text is the last line, not a line-feed). The best I can come up with is: echo -n "$(cat file.txt)" > newfile.txt
    (as echo -n will remove all trailing newline characters)
    According to my experiments, you have removed all line terminators from the file, and replaced those between lines with a space.
    That is to say, you have turned a multi-line file into one long line with no line terminator.
    If that is what you want, and your files are not very big, then your echo statement might be all you need.
If you need to deal with larger files, you could try using the 'tr' command, and something like
tr '\n' ' ' <file.txt >newfile.txt
    The only problem with this is, it will most likely give you a trailing space, as the last newline is going to be converted to a space. If that is not acceptable, then something else will have to be arranged.
    However, if you really want to maintain a multi-line file, but remove just the very last line terminator, that gets a bit more complicated. This might work for you:
perl -ne 'chomp; print "\n" if $n++ != 0; print;' file.txt >newfile.txt
    You can use cat -e to see which lines have newlines, and you should see that the last line does not have a newline, but all the others still do.
    I guess if you really did mean to remove all newline characters and replace them with a space, except for the last line, then a modification of the above perl script would do that:
perl -ne 'chomp; print " " if $n++ != 0; print;' file.txt >newfile.txt
    Am I even close to understanding what you are asking for?
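If Python is an option, the same result can be had without quoting gymnastics; a sketch (strip only the final newline, either on a byte string or, for big files, in place by truncation):

```python
import os

def strip_last_newline(data: bytes) -> bytes:
    """Remove only the final line terminator; interior newlines survive."""
    return data[:-1] if data.endswith(b"\n") else data

def strip_last_newline_inplace(path: str) -> None:
    """For large files: if the last byte is a newline, truncate it off
    in place instead of rewriting the whole file."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size:
            f.seek(size - 1)
            if f.read(1) == b"\n":
                f.truncate(size - 1)
```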

  • How is the best way to remove something from a photo?

    How is the best way to remove something from a photo?

    This is difficult to answer without fully knowing what you are trying to do.
That said, a few excellent and user-friendly retouching tools include: the Spot Healing Brush Tool, Healing Brush Tool, Patch Tool, and the Clone Stamp Tool.

  • I got some hair spray on my new retina display screen. What is the best way to remove.

    I got some hair spray on my new Mac Book Pro Retina Display. Any thoughts on the best way to remove?

I would use this.  I use it on my MBPs and it does an excellent job.  I cannot say with authority that it will remove your hair spray residue.
    Ciao.
http://www.soap.com/p/windex-for-electronics-aerosol-97299?site=CA&utm_source=Google&utm_medium=cpc_S&utm_term=ASJ-294&utm_campaign=GoogleAW&CAWELAID=1323111033&utm_content=pla&adtype=pla&cagpspn=pla&noappbanner=true
    I clicked the reply button too early.
    Message was edited by: OGELTHORPE

  • What is the best way to create shared variable for multiple PXI(Real-Time) to GUI PC?

What is the best way to create a shared variable for multiple real-time (PXI) systems and a GUI PC? I have 16 PXI systems on the network and 1 GUI PC. I want to send commands to all the PXI systems using a single variable from the GUI PC (like Start Data Acquisition, Stop Data Acquisition), and I also want data from each PXI system on the GUI PC for display purposes. Can anybody suggest the best-performing configuration? Where should I create the variable? (On the host PC or on each individual PXI system?)

Dear Ravens,
I want to control the real-time application from the host (commands from the GUI PC to the PXI). The host PC should have access to all 16 sets of PXI variables. During a communication failure with a PXI, the host will stop the data display for that particular station.
    Ravens Fan wrote:
    Either.  For the best performance, you need to determine what that means.  Is it more important for each PXI machine to have access to the shared variable, or for the host PC to have access to all 16 sets of variables?  If you have slowdown or issue with the network communication, what kinds of problems would it cause for each machine?
You want to locate the shared variable library on whatever machine is more critical.  That is probably each PXI machine, but only you know your application.

  • What's the best way to remove inactive iChat users from jabberd2.db?

    I'm about to run Autobuddy for users on my iChat server. However, there are several users that are no longer around and I don't want their records showing up in everyone's buddy list.
    What's the safest/best way to remove them?
    My plan is to use sqlite3 on the command line and use SQL to remove the entries from the "active" table, but I don't know what impact that may have on the rest of the database.
    Any thoughts or suggestions?

    Never mind...
    Thought I had looked through enough threads.  Found the following just after posting my question:
    /usr/bin/jabber_autobuddy -d [email protected]
    Works like a charm.
