Best way to remove duplicates based on multiple tables

Hi,
I have a mechanism which loads flat files into multiple tables (can be up to 6 different tables) using external tables.
Whenever a new file arrives, I need to insert the duplicate rows into a side table, but the duplicates are to be searched for in all 6 tables according to a given set of columns which exists in all of them.
In the SQL Server version of the same mechanism (which I'm migrating to Oracle), it uses an additional "UNIQUE" table with only 2 columns (Checksum1, Checksum2), which hold the checksum values of 2 different sets of columns per inserted record. When a new file arrives, it computes these 2 checksums for every record and looks them up in the unique table to avoid searching all the different tables.
We know that working with checksums is not bulletproof but with those sets of fields it seems to work.
My questions are:
Should I use the same checksum mechanism? If so, should I use the owa_opt_lock.checksum function to calculate the checksums?
Or should I look for duplicates in all tables one after the other (indexing some of the columns we check for duplicates with)?
Note:
These tables are partitioned with day partitions and can be very large.
Any advice would be welcome.
Thanks.
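For what it's worth, the checksum-lookup idea is easy to prototype outside the database. A minimal Python sketch, with zlib.crc32 standing in for SQL Server's CHECKSUM / Oracle's owa_opt_lock.checksum; the column sets and row layout are invented for illustration:

```python
import zlib

# Hypothetical column sets; in the real mechanism these are the two
# sets of columns shared by all 6 tables.
CHECKSUM1_COLS = ("cust_id", "order_date")
CHECKSUM2_COLS = ("amount", "source_file")

def row_checksums(row):
    """Compute the two per-row checksums (zlib.crc32 stands in for the
    database checksum functions)."""
    c1 = zlib.crc32("|".join(str(row[c]) for c in CHECKSUM1_COLS).encode())
    c2 = zlib.crc32("|".join(str(row[c]) for c in CHECKSUM2_COLS).encode())
    return c1, c2

# The "UNIQUE" side table: the set of (checksum1, checksum2) pairs seen so far.
seen = set()

def is_duplicate(row):
    """True if this row's checksum pair was already loaded; otherwise
    record the pair and treat the row as new. The set lookup plays the
    role of the index probe against the UNIQUE table."""
    key = row_checksums(row)
    if key in seen:
        return True
    seen.add(key)
    return False
```

As noted above, checksums are not bulletproof: two different rows can collide, so a positive hit should ideally be confirmed against the real columns before the row is diverted to the side table.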

> I need to keep duplicate rows in a side table and not load them into table1...table6
Does that mean that you don't want ANY row if it has a duplicate on your 6 columns?
Let's say I have six records that have identical values for your 6 columns. One record meets the condition for table1, one for table2 and so on.
Do you want to keep one of these records and put the other 5 in the side table? If so, which one should be kept?
Or do you want all 6 records put in the side table?
You could delete the duplicates from the temp table as the first step. Or better
1. add a new column WHICH_TABLE NUMBER to the temp table
2. update the new column to -1 for records that are dups.
3. update the new column (might be done with one query) to set the table number based on the conditions for each table
4. INSERT INTO TABLE1 SELECT * FROM TEMP_TABLE WHERE WHICH_TABLE = 1
...
INSERT INTO TABLE6 SELECT * FROM TEMP_TABLE WHERE WHICH_TABLE = 6
When you are done, WHICH_TABLE will be flagged with:
1. NULL if a record was not a dup but was not inserted into any of your tables - a possible error record to examine
2. -1 if a record was a dup
3. 1 if the record went to table 1 (2 for table 2, and so on)
This 'flag and then select' approach is more performant than deleting records after each select, especially if the flagging can be done in one pass (a full table scan).
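The flag-and-route steps above can be sketched end to end. A toy run in Python with sqlite3; the table names, columns, and routing conditions are invented for illustration (the real mechanism would use the 6 Oracle tables and their actual conditions):

```python
import sqlite3

# Toy schema: one staging table with a WHICH_TABLE column, two targets.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE temp_table (k TEXT, v INTEGER, which_table INTEGER);
    CREATE TABLE table1 (k TEXT, v INTEGER);
    CREATE TABLE table2 (k TEXT, v INTEGER);
    INSERT INTO table1 VALUES ('a', 1);          -- loaded by an earlier file
    INSERT INTO temp_table VALUES
        ('a', 1, NULL),    -- dup of the row already in table1
        ('b', 2, NULL),    -- should route to table1 (v < 10)
        ('c', 20, NULL);   -- should route to table2 (v >= 10)
""")

# Step 2: flag duplicates with -1, checking every target table.
db.execute("""
    UPDATE temp_table SET which_table = -1
    WHERE EXISTS (SELECT 1 FROM table1 t1
                  WHERE t1.k = temp_table.k AND t1.v = temp_table.v)
       OR EXISTS (SELECT 1 FROM table2 t2
                  WHERE t2.k = temp_table.k AND t2.v = temp_table.v)
""")

# Step 3: one pass sets the target-table number for the remaining rows.
db.execute("""
    UPDATE temp_table
    SET which_table = CASE WHEN v < 10 THEN 1 ELSE 2 END
    WHERE which_table IS NULL
""")

# Step 4: route each row; rows flagged -1 would go to the side table instead.
db.execute("INSERT INTO table1 SELECT k, v FROM temp_table WHERE which_table = 1")
db.execute("INSERT INTO table2 SELECT k, v FROM temp_table WHERE which_table = 2")
```

After the run, temp_table still carries the flags, so the -1 rows can be copied to the side table and NULL rows examined as possible errors.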
See this other thread (or many, many others on the net) from today for how to find and remove duplicates
Best way of removing duplicates

Similar Messages

  • Best way to do a lookup with multiple tables

    Hello,
    I am looking for an example or how to do a lookup through ESB XSL transformation. What I am wanting to accomplish is something like below:
    I am reading in a flat file and then want to use a field in that file to do a lookup in multiple tables.
    i.e.
    Select c.customer_name
    from customers c, order_lines ol, order_headers oh
    where ol.customer_id = c.customer_id
    and ol.header_id = oh.header_id
    and oh.order_number = p_order_num;
I know that there is the lookup-table function but didn't know if that would work with this scenario. Any suggestions are appreciated.
    Thanks,
    Jason
    Edited by: Colby J on Oct 21, 2008 11:13 AM
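For reference, the join itself can be exercised outside the ESB/XSL layer. A toy run with Python's sqlite3, using the table and column names from the post (the data is invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers     (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE order_headers (header_id INTEGER, order_number TEXT);
    CREATE TABLE order_lines   (customer_id INTEGER, header_id INTEGER);
    INSERT INTO customers     VALUES (1, 'Acme');
    INSERT INTO order_headers VALUES (10, 'ORD-42');
    INSERT INTO order_lines   VALUES (1, 10);
""")

def lookup_customer(p_order_num):
    # The same three-table join as in the post, with the order number
    # bound as a parameter.
    row = db.execute("""
        SELECT c.customer_name
        FROM   customers c, order_lines ol, order_headers oh
        WHERE  ol.customer_id = c.customer_id
        AND    ol.header_id   = oh.header_id
        AND    oh.order_number = ?
    """, (p_order_num,)).fetchone()
    return row[0] if row else None
```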


  • Best Way to Remove Duplicate Movies

I'm a professor and use Keynote for my lectures. I noticed that my lecture files are HUGE (my summary lecture was over 650mb). When I looked at the package I found several duplicates of movies and images that I use in my slides. Is there any efficient and risk-free way to delete the extras? Is there a way of putting them in without making duplicates in the first place? I often duplicate slides and move things around each term to improve my class.

    Really easy:
    Find all the movies in your iPhoto Library. Then Export them to a Folder. Then delete them from iPhoto.
    What makes this easy is that every movie is automatically keyworded when you import it.
    So to find all the Movies:
    File -> New Smart Album
    Keyword -> is -> Movie
    Now that you've found them. Select All and
File -> Export, and set Kind: to Original (Note: setting Kind: to Original is vital)
    And export them.
    Then trash them from iPhoto.
    Regards
    TD

  • Best way to implement oracle TEXT on multiple tables with regular updates

    Hi,
    I have the following situation:
    5 tables where we want full text search on multiple columns.
    Some of the tables have a master/detail relation. (1 to 1000, or more)
    because of the number of transactions on these tables we can't have a lag in the sync time.
Currently I have created a dummy table just for the search with 2 columns: one for the primary key to all the other tables and one for the update trigger.
    I use the user_datastore with a procedure to join all the necessary columns resulting in a clob.
    My question is regarding the update.
Of course I can create triggers to update the dummy field in the search table, but this will give a lot of updates on that table, with possible locking issues.
    What would be the best approach to have this search functionality working?
    I am open for any ideas!
    Thanks,
    Edward

Ok, I will focus on building a solution on 12c.
Right now I have a USER_DATASTORE with a procedure to glue all the fields together into one document.
    This works fine for the search.
    I have created a dummy table on which the index is created and also has an extra field which contains the key related to all the tables.
    So, I have the following tables:
    dummy_search
    contracts
    contract_ref
    person_data
    nac_data
    and some other tables...
    the current design is:
    the index is on dummy_search.
When we update the contracts table, a trigger will update dummy_search.
Same configuration for the other tables.
Now we see locking issues when there are a lot of updates on these tables at the same time.
    What is you advice for this situation?
    Thanks,
    Edward
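One common way to take trigger pressure off the indexed table is to have the triggers queue keys instead of updating dummy_search directly, and apply the queue in batches from a background job. A hypothetical sketch with Python's sqlite3 (the real solution would be an Oracle trigger plus a scheduled job; all names are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contracts    (key INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE dummy_search (key INTEGER PRIMARY KEY, dummy INTEGER);
    -- Instead of touching dummy_search, each trigger only appends the
    -- changed key to a small queue table (cheap, append-only, no
    -- contention on the indexed row).
    CREATE TABLE pending_sync (key INTEGER);
    CREATE TRIGGER contracts_sync AFTER UPDATE ON contracts
    BEGIN
        INSERT INTO pending_sync VALUES (NEW.key);
    END;
""")
db.execute("INSERT INTO contracts VALUES (1, 'v1'), (2, 'v1')")
db.execute("INSERT INTO dummy_search SELECT key, 0 FROM contracts")

# Two updates to the same contract queue the key twice...
db.execute("UPDATE contracts SET body = 'v2' WHERE key = 1")
db.execute("UPDATE contracts SET body = 'v3' WHERE key = 1")

# ...but the background job drains the queue in one batch, touching each
# dummy_search row at most once no matter how many updates queued it.
db.execute("""
    UPDATE dummy_search SET dummy = dummy + 1
    WHERE key IN (SELECT DISTINCT key FROM pending_sync)
""")
db.execute("DELETE FROM pending_sync")
```

The trade-off is a small sync lag (the job's interval), which has to be weighed against the "can't have a lag" requirement stated earlier in the thread.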

  • Best way to find duplicate pages / find duplicate background images

    What is the best way to detect duplicate pages?
The pages I am dealing with are searchable images (scanned image background with selectable text on top). In this case, any two pages that have the exact same background image will be duplicates.
    I only know how to get page text though, so I've been getting the text and hashing it, then checking for duplicate hashes. This works for the most part, but I fear running into two different pages with the exact same text.
    What about looking at the background image? If a PDF has multiple pages with the same background image, I assume it would store the image once and then just reference it from the pages? Is it possible to check duplicate pages this way?
    Or Does Acrobat have a built-in checking solution I haven't discovered? As always, any help is appreciated

Ok, well for the most part doing it by text works, but it sometimes flags things that aren't duplicates: for example, two copies of the same worksheet that were not filled out will have the exact same text, despite being completely different pages.
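The text-hashing approach can be sketched as follows; folding the background image bytes into the hash, when they are extractable, is one hedge against the identical-text false positives described above. A Python sketch (the function names are invented):

```python
import hashlib

def page_fingerprint(page_text, background_bytes=None):
    """Fingerprint a page. Hashing only the text (as the poster does)
    makes two blank copies of the same worksheet collide; folding in the
    raw bytes of the background image stream, when available,
    disambiguates pages with identical text."""
    h = hashlib.sha256(page_text.encode("utf-8"))
    if background_bytes is not None:
        h.update(background_bytes)
    return h.hexdigest()

def find_duplicates(fingerprints):
    """Return the indices of pages whose fingerprint was already seen."""
    seen, dups = {}, []
    for i, fp in enumerate(fingerprints):
        if fp in seen:
            dups.append(i)
        else:
            seen[fp] = i
    return dups
```

Whether PDFs actually share one image object across pages depends on how the file was produced, so the image-bytes refinement is an assumption to verify against the files at hand.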

  • Best way to handle duplicate headings stemming from linked TOC book?

What's the best way to handle duplicate topic titles stemming from TOC books that contain links to a topic that you want to have appear in the body? The problem I've had for years now is that the TOC generates one heading, and the topic itself generates one heading. This results in duplicate headings in the printed output.
    I have a large ~2500 topic project I have to print every release, and to date we've been handling it in post-build word macros, but it's not 100% effective, so we're looking to fix this issue on the source side of the fence. On some of our smaller projects, we've actually marked with the heading in the topic itself with the Online CBT and that seems to work. We're thinking of doing the same to our huge project unless there's a better way of handling this. Any suggestions?

    See the tip immediately above this link. http://www.grainge.org/pages/authoring/printing/rh9_printing.htm#wizard_page3
    The alternative is to remove the topic from the print layout so that it only generates by virtue of its link to the book.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge

  • Best way to remove Stateful session beans

    Hi folks.
    I'm running Weblogic 6.1. I'm trying to find the best way of removing
    stateful session beans. I know to call EJBObject.remove() on the
    client side, but this will not always happen if the client crashes or
    times out. This is a java client application connection to weblogic,
    no servlets are involved.
    Is there a way to signal the appserver to remove all stateful session
    beans associated with a user when the User logs out? I would rather
    not remove them using a time out mechanism.
    thanks.
    rob.

    But in the documentation and also based on my experience I noticed that the
    timeout does not take effect till the max-beans-in-cache limit is reached.
    How do you handle that?
    "Thomas Christen" <[email protected]> wrote in message
    news:3e35795d$[email protected]..
    Hi,
Is there a way to signal the appserver to remove all stateful session beans associated with a user when the User logs out? I would rather not remove them using a time out mechanism.
Had the same problem and solved it the following way:
- The client has a thread polling its session bean at the server (every 30 sec.)
- The session bean has a short timeout (2 minutes)
If the client fails, the timeout will catch it; otherwise the client will gracefully call remove before exit.
    Regards
    Tomy
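The poll-plus-short-timeout pattern Tomy describes is language-neutral. A minimal sketch in Python (the timeout is a parameter so the example can run in fractions of a second rather than minutes; the real thing would live in the EJB container):

```python
import time

class SessionRegistry:
    """Server side of the pattern: clients ping periodically; a reaper
    removes any session whose last ping is older than the timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_ping = {}

    def ping(self, session_id):
        # Client poll (the every-30-seconds heartbeat).
        self.last_ping[session_id] = time.monotonic()

    def remove(self, session_id):
        # Graceful client exit: explicit remove before disconnecting.
        self.last_ping.pop(session_id, None)

    def reap(self):
        # Timeout sweep: catches clients that crashed without removing.
        now = time.monotonic()
        dead = [s for s, t in self.last_ping.items()
                if now - t > self.timeout_s]
        for s in dead:
            del self.last_ping[s]
        return dead
```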

  • Best way to remove workstation?

Hi! When we move a computer to another room, we rename the computer, then do a WSREG -UNREG and WSREG. The new workstation object is created, but the old workstation object stays there.
If I delete the old object, the information stays in the Sybase db!
So what is the best way to remove a workstation object?
I have a removal policy that works, but what if I don't want to wait?
    thank you,
    Eric.

    eric,
    It appears that in the past few days you have not received a response to your
    posting. That concerns us, and has triggered this automated reply.
    Has your problem been resolved? If not, you might try one of the following options:
    - Visit http://support.novell.com and search the knowledgebase and/or check all
    the other self support options and support programs available.
    - You could also try posting your message again. Make sure it is posted in the
    correct newsgroup. (http://forums.novell.com)
    Be sure to read the forum FAQ about what to expect in the way of responses:
    http://support.novell.com/forums/faq_general.html
    If this is a reply to a duplicate posting, please ignore and accept our apologies
    and rest assured we will issue a stern reprimand to our posting bot.
    Good luck!
    Your Novell Product Support Forums Team
    http://support.novell.com/forums/

  • Remove duplicates based on a condition

    Hi all,
    I need help on a query to remove duplicates based on a condition.
    E.g. My table is
FE    CC    DATE   FLAG
FE1   CC1   10/10  FB
FE1   CC1   9/10   FB
FE1   CC1   11/10  AB
FE1   CC2   9/10   AB
FE1   CC2   10/10  FB
FE1   CC2   11/10  AB
    I want to remove all duplicate rows on FE and CC based on the below condition :
    DATE <MAX(DATE) WHERE FLAG='FB'
    That means I want to remove the row FE1 CC1 9/10 FB
    but not the rows
    FE1 CC1 10/10 FB
    and
    FE1 CC1 11/10 AB
    as only the row FE1 CC1 9/10 FB has date <MAX(DATE) WHERE FLAG='FB'.
    Similarly I want to keep
    FE1 CC2 10/10 FB
    FE1 CC2 11/10 AB
    but not
    FE1 CC2 9/10 AB
    Many thanks.

    Hi,
    Do you want to DELETE rows from the table, or just not show some rows in the output? Since you're talking about a "query", rather that a "DELETE statement", I'll assume you want to leave those rows in the table, but not show them in the output.
    Here's one way:
WITH got_r_num AS
(
    SELECT fe, cc, dt, flag
    ,      RANK () OVER ( PARTITION BY fe, cc, flag
                          ORDER BY     dt DESC
                        )            AS r_num
    FROM   table_x
)
SELECT fe
,      cc
,      TO_CHAR (dt, 'fmMM/YY')      AS dt
,      flag
FROM   got_r_num
WHERE  flag  != 'FB'
OR     r_num  = 1
;
If you'd care to post CREATE TABLE and INSERT statements for your sample data, then I could test it.
    This assumes that the column you called DATE (which is not a good column name, so I called it dt) is a DATE, and that you are displaying it in MM/YY format.
    This also assumes that dt and flag are never NULL.
    If I guessed wrong about these things, then the query can be changed; it will just be a little messier.
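For what it's worth, the condition as literally stated (drop rows whose date is earlier than the latest 'FB' date in the same FE/CC group) can also be checked with a correlated MAX. A toy run with Python's sqlite3, dates spelled as ISO strings so plain string comparison orders them (sample data adapted from the post; the year is invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (fe TEXT, cc TEXT, dt TEXT, flag TEXT)")
db.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("FE1", "CC1", "2010-10", "FB"),
    ("FE1", "CC1", "2010-09", "FB"),
    ("FE1", "CC1", "2010-11", "AB"),
    ("FE1", "CC2", "2010-09", "AB"),
    ("FE1", "CC2", "2010-10", "FB"),
    ("FE1", "CC2", "2010-11", "AB"),
])

# Keep a row unless its date is earlier than the latest 'FB' date in the
# same (fe, cc) group -- the condition exactly as stated.
# Caveat: a group with no 'FB' row at all is dropped entirely here
# (dt >= NULL is never true); handle that case separately if it can occur.
kept = db.execute("""
    SELECT fe, cc, dt, flag
    FROM   t
    WHERE  dt >= (SELECT MAX(f.dt) FROM t f
                  WHERE  f.fe = t.fe AND f.cc = t.cc AND f.flag = 'FB')
    ORDER BY fe, cc, dt
""").fetchall()
```

Note this keeps exactly the rows the poster listed: 9/10 FB and the CC2 9/10 AB row are dropped, everything else survives.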

  • Crooked stickers / decals on my new X220 - best way to remove ?

    ok, is it just me or are other also extremely annoyed by the fact that they just shelled out over a grand for a notebook and it arrives with all stickers put on there crooked.
    From the energy star sticker on the outside to the Windows 7 decal, the lenovo enhanced experience and the intel inside.
    it is like buying a 911 and the 911 logo on the back is crooked !!!
    why do they not just leave them off instead of making it look like a cheap notebook ???
    (click i have this question too or reply here please)
    and on a second note - what is the best way to remove said decals since i do not and cannot live with them like this and the soft touch surface makes it a pain in the butt to remove them.
    blowdryer, razor blade and goo-gone ? (extremely careful to only use the blade to lift the sticker not touch the notebook)
    and on a positive note - i am pretty amazed at the battery time i am getting on the new machine and the speed of the i7 and the 6GB of RAM. but i am still uninstalling crap and installing all the things i need
    (see also Firefox 13.0 and password manager not working AGAIN thread).....
    thanks in advance for your replies.
    PS: yes, i am aware i am suffering from OCD - but still ....
    someone is already getting paid to put them on there - why not do it right ?

if a chemical is needed to clean the residue, off-the-shelf rubbing alcohol is best.   'goo gone' is petroleum-based and can damage or discolor the plastic.   i agree with lead_org though, they'll likely just pull off without much of a fight.
    the decals are there because of regulations and contractual advertising.   energy star devices require a decal stating compliance.   intel and microsoft require decals as part of including their products with PCs.   PC manufacturers could pay intel and microsoft extra to not include the decals but the costs would be passed onto the system price and consumers would complain about the increase.
    given how easy it is to remove the decals, i pull them off when a system arrives and enjoy a clean palmrest.
    ThinkStation C20
    ThinkPad X1C · X220 · X60T · s30 · 600

  • Best way to remove last line-feed in text file

    What is the best way to remove last line-feed in text file? (so that the last line of text is the last line, not a line-feed). The best I can come up with is: echo -n "$(cat file.txt)" > newfile.txt
    (as echo -n will remove all trailing newline characters)

    What is the best way to remove last line-feed in text file? (so that the last line of text is the last line, not a line-feed). The best I can come up with is: echo -n "$(cat file.txt)" > newfile.txt
    (as echo -n will remove all trailing newline characters)
    According to my experiments, you have removed all line terminators from the file, and replaced those between lines with a space.
    That is to say, you have turned a multi-line file into one long line with no line terminator.
    If that is what you want, and your files are not very big, then your echo statement might be all you need.
If you need to deal with larger files, you could try using the 'tr' command, and something like
tr '\n' ' ' <file.txt >newfile.txt
    The only problem with this is, it will most likely give you a trailing space, as the last newline is going to be converted to a space. If that is not acceptable, then something else will have to be arranged.
    However, if you really want to maintain a multi-line file, but remove just the very last line terminator, that gets a bit more complicated. This might work for you:
perl -ne 'chomp; print "\n" if $n++ != 0; print;' file.txt >newfile.txt
    You can use cat -e to see which lines have newlines, and you should see that the last line does not have a newline, but all the others still do.
    I guess if you really did mean to remove all newline characters and replace them with a space, except for the last line, then a modification of the above perl script would do that:
perl -ne 'chomp; print " " if $n++ != 0; print;' file.txt >newfile.txt
    Am I even close to understanding what you are asking for?
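If Python is an option, the same result can be had without quoting gymnastics; a sketch (strip only the final newline, either on a byte string or, for big files, in place by truncation):

```python
import os

def strip_last_newline(data: bytes) -> bytes:
    """Remove only the final line terminator; interior newlines survive."""
    return data[:-1] if data.endswith(b"\n") else data

def strip_last_newline_inplace(path: str) -> None:
    """For large files: if the last byte is a newline, truncate it off
    in place instead of rewriting the whole file."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size:
            f.seek(size - 1)
            if f.read(1) == b"\n":
                f.truncate(size - 1)
```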

  • How is the best way to remove something from a photo?

    How is the best way to remove something from a photo?

    This is difficult to answer without fully knowing what you are trying to do.
That said, a few excellent and user-friendly retouching tools include: the Spot Healing Brush Tool, Healing Brush Tool, Patch Tool, and the Clone Stamp Tool.

  • I got some hair spray on my new retina display screen. What is the best way to remove.

    I got some hair spray on my new Mac Book Pro Retina Display. Any thoughts on the best way to remove?

I would use this.  I use it on my MBPs and it does an excellent job.  I cannot say with authority that it will remove your hair spray residue.
    Ciao.
http://www.soap.com/p/windex-for-electronics-aerosol-97299?site=CA&utm_source=Google&utm_medium=cpc_S&utm_term=ASJ-294&utm_campaign=GoogleAW&CAWELAID=1323111033&utm_content=pla&adtype=pla&cagpspn=pla&noappbanner=true
    I clicked the reply button too early.
    Message was edited by: OGELTHORPE

  • What is the best way to create shared variable for multiple PXI(Real-Time) to GUI PC?

What is the best way to create a shared variable for multiple real-time (PXI) systems and a GUI PC? I have 16 PXI systems on the network and 1 GUI PC. I want to send commands to all the PXI systems using a single variable from the GUI PC (like Start Data Acquisition, Stop Data Acquisition), and I also want data from each PXI system on the GUI PC for display purposes. Can anybody suggest the best-performing configuration? Where should I create the variable? (On the host PC or on each individual PXI system?)

Dear Ravens,
I want to control the real-time application from the host (commands from the GUI PC to the PXI). The host PC should have access to all 16 sets of PXI variables. During a communication failure with a PXI, the host will stop the data display for that particular station.
    Ravens Fan wrote:
    Either.  For the best performance, you need to determine what that means.  Is it more important for each PXI machine to have access to the shared variable, or for the host PC to have access to all 16 sets of variables?  If you have slowdown or issue with the network communication, what kinds of problems would it cause for each machine?
You want to locate the shared variable library on whatever machine is more critical.  That is probably each PXI machine, but only you know your application.

  • What's the best way to remove inactive iChat users from jabberd2.db?

    I'm about to run Autobuddy for users on my iChat server. However, there are several users that are no longer around and I don't want their records showing up in everyone's buddy list.
    What's the safest/best way to remove them?
    My plan is to use sqlite3 on the command line and use SQL to remove the entries from the "active" table, but I don't know what impact that may have on the rest of the database.
    Any thoughts or suggestions?

    Never mind...
    Thought I had looked through enough threads.  Found the following just after posting my question:
    /usr/bin/jabber_autobuddy -d [email protected]
    Works like a charm.
