Fastest way to find duplicates

10.2.0.5
I would be interested if anything was added in 11g.
I have a 148 GB table that is not partitioned and does not have a unique index. I am not allowed to add an index to see if there are duplicates (I know how to do this to get the rowids).
Group by generates too much temp even if I increase hash and sort area size.
I can try parallel, but this does not seem to help much if the table is not partitioned.
Is there anything better than select count(distinct fields)?
Is an analytic better?
Anything new in 11g that is better?
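For the record, the standard SQL duplicate check is to group on the candidate key and keep groups larger than one; the analytic variant is COUNT(*) OVER (PARTITION BY ...), which can also hand back the rowids in one pass. A minimal sketch against SQLite for portability (the bigtable name and columns here are made up):

```python
import sqlite3

# Hypothetical toy table standing in for the big unindexed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO bigtable VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c"), (3, "c")])

# Classic duplicate check: group on the candidate key and keep
# groups that occur more than once.
dupes = conn.execute("""
    SELECT id, payload, COUNT(*) AS n
    FROM bigtable
    GROUP BY id, payload
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()
print(dupes)  # [(2, 'b', 2), (3, 'c', 3)]
```

In Oracle the GROUP BY still has to sort or hash the whole key set, so it does not dodge the temp-space problem by itself; it only avoids a second scan compared with count vs. count(distinct).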

What I used to do when I had more data than Oracle could handle was something like this in korn shell:
echo '
username/password
set linesize 200
set pages 0
set heading off
set termout off
set verify off
set echo off
set feedback off
-- any other sets needed
select * from bigtable
/'|sqlplus -s|grep -v '^$'|sort > j$$1
echo '
username/password
set linesize 200
set pages 0
set heading off
set termout off
set verify off
set echo off
set feedback off
-- any other sets needed
select * from bigtable
/'|sqlplus -s|grep -v '^$'|sort -u > j$$2
diff j$$1 j$$2
rm j$$*

I think I may have fed these into pipes and diff'd the pipes to avoid temp files, but I don't remember; it was 10-20 years ago. There might be some way to tee the sqlplus output to two pipes which then do the differential sorts and feed pipes to diff, but I never tried that.
Edit: I meant select unique identifier, not select *, of course.
Edited by: jgarry on Jun 30, 2011 8:47 AM
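A rough single-pass equivalent of the sort / sort -u / diff trick, sketched in Python: count a digest of each row, so memory scales with the number of distinct rows rather than total data (for 148 GB you would shard the counts by hash prefix or spill them to disk; the row strings here are made up):

```python
import hashlib
from collections import Counter

def duplicate_keys(rows):
    """Count a digest of each row; any digest seen more than once
    marks a duplicated row value. Hashing keeps per-entry memory
    fixed regardless of row width."""
    counts = Counter(
        hashlib.sha256(row.encode()).hexdigest() for row in rows
    )
    return sum(1 for n in counts.values() if n > 1)

rows = ["a|1", "b|2", "a|1", "c|3"]
print(duplicate_keys(rows))  # 1 (the "a|1" row appears twice)
```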

Similar Messages

  • Best way to find duplicates

    Hello everyone. I'm trying to free up space on my hard drive. What is the easiest way to find and delete duplicate photos and files, specifically in iPhoto? Thank you.

    You can use one of these applications to identify and remove duplicate photos from an iPhoto Library:
    iPhoto Library Manager - $29.95
    Duplicate Cleaner for iPhoto - free
    Duplicate Annihilator - $7.95 - only app able to detect duplicate thumbnail files or faces files when an iPhoto 8 or earlier library has been imported into another.
    PhotoSweeper - $9.95 - This app can search by comparing the image's bitmaps or histograms thus finding duplicate images with different file names and dates.
    DeCloner - $19.95 - can find duplicates in iPhoto Libraries or in folders on the HD.
    OT

  • Quickest way to find duplicate photos on mac?

    So I'm trying to clean up my MacBook and I have a ton of duplicate photos. It would take me days to go through all my photos manually, so I am wondering if there is a way to find all duplicates? iPhoto really messes up the ability to search for a photo (I have tried many times to search my Mac for a photo that is in iPhoto, and nothing comes up, why?!). So I have no way of knowing which of my photos in iPhoto are already saved in other photo folders I have on my computer.
    Is there software that can make this process easier?

    Use a third party software such as dupeGuru:
    http://www.hardcoded.net/dupeguru/
    Ciao.

  • Best Way To Find Duplicate Songs in iTunes

    I have 3000+ songs in iTunes. I know I have a few duplicates. What is the best way to find them? Just sort by name and look? Or is there a simpler way?
    Thanks.

    Oh crap <insert egg on face here>
    I am so used to Windows Media Player that I must have just skimmed over the menus.
    Thanks!

  • Best way to find duplicate pages / find duplicate background images

    What is the best way to detect duplicate pages?
    The pages I am dealing with are searchable images (a scanned image background with selectable text on top). In this case, any two pages that have the exact same background image are duplicates.
    I only know how to get page text though, so I've been getting the text and hashing it, then checking for duplicate hashes. This works for the most part, but I fear running into two different pages with the exact same text.
    What about looking at the background image? If a PDF has multiple pages with the same background image, I assume it would store the image once and then just reference it from the pages? Is it possible to check duplicate pages this way?
    Or Does Acrobat have a built-in checking solution I haven't discovered? As always, any help is appreciated

    Ok, well for the most part doing it by text works, but it sometimes flags things that aren't duplicates: two copies of the same worksheet that were not filled out will have the exact same text, despite being completely different pages.
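The text-hashing approach described above can be sketched like this (text extraction is assumed to have happened already; as the poster notes, identical text does not guarantee identical background images, so buckets are only candidates):

```python
import hashlib
from collections import defaultdict

def group_pages_by_text(pages):
    """Map a hash of each page's extracted text to the page numbers
    sharing it; any bucket with more than one page is a candidate
    duplicate. Same text does not prove the same background image."""
    buckets = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        buckets[digest].append(page_no)
    return [nums for nums in buckets.values() if len(nums) > 1]

pages = ["invoice 001", "blank worksheet", "invoice 001", "blank worksheet"]
print(group_pages_by_text(pages))  # [[1, 3], [2, 4]]
```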

  • Fastest way to copy/duplicate  a table?

    I want to copy an Oracle 8.0.5 DB table to another table with a different name. What is the fastest way to do this? I currently use INSERT INTO table_name (SELECT xxxxxxxxx).

    A CTAS (create table as select) with NOLOGGING and in parallel (if you have the additional processors) would typically be the fastest.
    Dom
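The CTAS shape can be sketched against SQLite for portability (SQLite has no NOLOGGING or PARALLEL clauses, but the statement is the same shape; the src table is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b")])

# CREATE TABLE ... AS SELECT copies structure and data in one statement.
# Oracle layers NOLOGGING and PARALLEL hints on this same shape.
conn.execute("CREATE TABLE copy_of_src AS SELECT * FROM src")

print(conn.execute("SELECT COUNT(*) FROM copy_of_src").fetchone()[0])  # 2
```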

  • What is the fastest way to delete duplicate songs?

    I will spend hours deleting songs from the duplicate view unless someone has a faster way to delete duplicates.

    Import the photos with your computer as with any other digital camera. Most computer photo importing apps include an option to delete the photos from the camera after the import process is complete.
    Or select Edit followed by selecting multiple photos that you want to delete followed by selecting Delete.

  • Fastest way to find parents with only one child?

    I have two very large tables (both >6 million rows) in an Oracle 8i DB. They have a parent-child relationship and I would like to construct a query to give me the parents that have only one child. Syntactically, what's the best way to construct this query?
    I was going to try:
    select join_id
    FROM parent
    where join_id in (select join_id, count(join_id)
    FROM child
    group by join_id
    having count(*)=1)
    but then I realized that the subselect has two columns to return and the primary query is only expecting one column, so this will not work.
    I suspect there's a quick and dirty way to find out what parents have only one child....
    I thought of using rowids but am not sure this is the best way, and in the example below I'm having problems because functions are not allowed in the where clause...
    select join_id
    from child d
    where rowid in (select min(rowid)
              FROM child s
              WHERE min(d.rowid)=max(s.rowid)
              AND s.join_id=d.join_id)
    Any thoughts?

    The two tables are order_header and order_detail. The order_header carries order specific information and the detail contains item specific information.
    So if you had ordered three separate products, you would have:
    one row in the order_header table (parent)
    and three rows in the order_detail table (child)
    They are linked by order_number.
    I presented the problem this way to make it more accessible to more posters.....;)
    One possible solution that I've thought of for my problem is this:
    select join_id
    from child_table d
    where (d.rowid, d.rowid) IN (select min(rowid) MIN_ROW
                   ,max(rowid) MAX_ROW
                   FROM child_table s
                   WHERE s.str_order_number=d.str_order_number
                   AND s.date>='30-JAN-2005'
                   AND s.date<='31-JAN-2005')
    I think this might work because I think that we can safely assume that if the minimum rowid and the maximum rowid (with the same join_id ) are the same then there is only one child record.....is this logic correct?
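For what it's worth, both attempts reduce to a plain GROUP BY on the child table with HAVING COUNT(*) = 1, which needs no rowid tricks. A sketch against SQLite with a made-up child table (join_id stands in for order_number):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (join_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO child VALUES (?, ?)",
                 [(100, "x"), (101, "y"), (101, "z"), (102, "w")])

# Parents with exactly one child: group the child table on the join
# key and keep the groups of size one.
only_children = conn.execute("""
    SELECT join_id FROM child
    GROUP BY join_id
    HAVING COUNT(*) = 1
    ORDER BY join_id
""").fetchall()
print(only_children)  # [(100,), (102,)]
```

The rowid min=max logic is correct for the same reason: within a group, min(rowid) equals max(rowid) exactly when the group has one row.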

  • What is the fastest way to find a Node in an xml ?

    Hi all,
    We have a procedure which has to process huge XML documents (files of up to 250 MB).
    Without going too deep in the algorithm, I need to read data from legacy and build nodes in XML.
    In the legacy data I have written in tabular form the xml hierarchy
    For example Root-Branch1-SubBranch2-Leaf1
    I need the fastest possible algorithm to point to SubBranch2, so that I can append Leaf1 Node, in the example.
    Can anybody suggest the right approach? For example, I wonder if I can use SAX for finding the parent node together with DOM for appending new nodes, or can I use XPath?
    Any suggestion is really welcome!
    Thanks
    Francesco

    Not sure if XPath is built on top of DOM or SAX in most implementations, but that would be a major decider. If DOM, you cannot possibly hope to read that many megabytes into memory on a consistent basis (especially if your system is handling other requests). If SAX, then memory should not be as much of an issue and it may be a viable option.
    However, if you are only descending two or three levels into the hierarchy of a document, XPath would really only be a 'convenience' feature. It would be trivial for you to write your own SAX parser to descend to this level. At the same time, you would have a custom XML serializer (just outputting the strings read in from SAX) that would append additional markup according to your requirements.
    Due to the size of your document, memory could be an issue. This will tend to make you gravitate towards SAX.
    - Saish
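The streaming idea can be sketched with Python's iterparse, which walks the document SAX-style but hands back element objects, so a node deep in the hierarchy can be located without first building the full DOM (the Root/Branch1/SubBranch2 names are taken from the example above):

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical document mirroring the Root-Branch1-SubBranch2-Leaf1 path.
doc = b"<Root><Branch1><SubBranch2><Leaf1/></SubBranch2></Branch1></Root>"

# iterparse streams parse events like SAX but yields Element objects,
# so we can stop as soon as the target node appears.
found = None
for event, elem in ET.iterparse(BytesIO(doc), events=("start",)):
    if elem.tag == "SubBranch2":
        found = elem
        break

print(found.tag)  # SubBranch2
```

Once located, the element can be extended (e.g. ET.SubElement(found, "Leaf1")) and the tree reserialized; for truly huge files the serializer would stream the untouched parts through rather than hold them all.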

  • PowerShell - what is the most efficient/fastest way to find an object in an arraylist

    Hi
    I work with a lot of array lists in PowerShell when working as a SharePoint administrator. I first used arrays but found them slow and jumped over to array lists.
    Often I want to find a specific object in the array list, but the response time varies a lot. Does anyone have code for doing this the most efficient way?
    Hope for some answers:-)
    brgs
    Bjorn

    Often I want to find a specific object in the array list, but the response time varies a lot. Does anyone have code for doing this the most efficient way?
    As you decided to use an ArrayList, you must keep your collection sorted, and then use the method BinarySearch() to find the objects you're looking for.
    Consider using a dictionary, and if your objects are string type, then a StringDictionary.
    You still fail to understand that the slowness is not in the ArrayList. It is in the creation of the ArrayList, which is completely unnecessary. Set up a SharePoint server and create a very large list and test. You will see: an ArrayList
    with 10000 items takes forever to create. A simple SharePoint search can be done in a few milliseconds.
    Once created the lookup in any collection is dependent on the structure of the key and the type of collection.  A string key can be slow if it is a long key.
    The same rules apply to general database searches against an index.
    The main point here is that SharePoint IS a database, and searching it as a database is the fastest method.
    Prove me wrong devil!    Submit!  Back to your cage! Fie on thee!
    ¯\_(ツ)_/¯
    You seem to be making a lot of assumptions about what he's doing with those ArrayLists that don't seem justified based on the information in the posted question.
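The three lookup strategies under discussion can be sketched in Python (linear scan, binary search over sorted keys, hash lookup; the item names are made up):

```python
from bisect import bisect_left

# Hypothetical objects keyed by name, sorted by key.
items = [("alpha", 1), ("mu", 2), ("zeta", 3)]

# Linear scan: O(n) per lookup -- the ArrayList behaviour complained about.
linear = next(v for k, v in items if k == "mu")

# Binary search over sorted keys: O(log n) -- the BinarySearch() advice.
keys = [k for k, _ in items]
idx = bisect_left(keys, "mu")
binary = items[idx][1] if idx < len(keys) and keys[idx] == "mu" else None

# Hash lookup: O(1) average -- the dictionary advice.
table = dict(items)
hashed = table["mu"]

print(linear, binary, hashed)  # 2 2 2
```

None of this helps if the real cost is building the collection in the first place, which is the other poster's point.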

  • Is there a way to find duplicate photos in iPhoto?

    I'm sure I have plenty but not sure if this application or some other can accomplish this.
    Ideas?

    For dealing with duplicates in iPhoto check out Duplicate Annihilator

  • Fastest Way To Find String In Map?

    Problem: basically I'm trying to match the value of one Map with the key of another. My thinking is that this requires me to loop through the first Map and, for each value, through the second Map and match up the key to it. Here's my code so far:
    private static Map equiJoin(Map relation1, Map relation2) { // The 2 maps
        Map resultRelation = new TreeMap();
        Iterator keys1Iterator = relation1.keySet().iterator();
        Iterator values1Iterator = relation1.values().iterator();
        while (keys1Iterator.hasNext()) { // Loop through relation1
            String primaryKey1 = (String) keys1Iterator.next();
            String foreignKey1 = (String) values1Iterator.next();
            List list = new ArrayList();
            Iterator keys2Iterator = relation2.keySet().iterator();
            Iterator values2Iterator = relation2.values().iterator();
            while (keys2Iterator.hasNext()) { // Loop through relation2
                String primaryKey2 = (String) keys2Iterator.next();
                String foreignKey2 = (String) values2Iterator.next();
                if (primaryKey2.equals(foreignKey1)) {
                    list.add(foreignKey1);
                    list.add(foreignKey2);
                    resultRelation.put(primaryKey1, list);
                }
            }
        }
        return resultRelation;
    }
    I'm trying to come up with ways to make this faster, perhaps by turning the second while loop/linear search into a binary search.
    Any thoughts would be appreciated.

    Yep. The inner loop is not needed. Just use the Map API.
    Object value2 = relation2.get(primaryKey1);
    // and check for null for value2, if null don't add to the list, if not null, do a resultRelation.put
    BTW, your list add logic:
    if (primaryKey2.equals(foreignKey1)) {
        list.add(foreignKey1);
        list.add(foreignKey2);
        resultRelation.put(primaryKey1, list);
    }
    Why do you add both foreignKey1 and foreignKey2 to the list when they are equal? Since they are equal, aren't they redundant?
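The reply's point, sketched in Python rather than Java: replace the inner scan with a keyed lookup, turning O(n*m) into O(n):

```python
def equi_join(relation1, relation2):
    """Join relation1's values to relation2's keys with a hash lookup
    instead of a nested loop. relation1 maps primary key -> foreign
    key; relation2 maps that foreign key -> its value."""
    result = {}
    for primary_key, foreign_key in relation1.items():
        if foreign_key in relation2:  # replaces the whole inner while loop
            result[primary_key] = [foreign_key, relation2[foreign_key]]
    return result

r1 = {"order1": "custA", "order2": "custB"}
r2 = {"custA": "Alice"}
print(equi_join(r1, r2))  # {'order1': ['custA', 'Alice']}
```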
    --lichu

  • How can i find duplicates

    Is there an automated way to find duplicate photos? My wife and I have thousands of pics and (maybe because I am ignorant) I can't find any easy way to simply identify and delete duplicate photos and video clips. HELP. Thank you.

    Duplicate Annihilator
    LN

  • Find duplicates in aperture

    Is there a way to find duplicates/triplicates of the exact same file, especially the master, in Aperture (and iPhoto and iMovie)?
    I'm finding that the longer I have Aperture running (referenced files), the more duplicates seem to show up.
    Ideas? Solutions?
    Thanks so much
    Robert

    thanks for the replies.
    I recently switched from a G5 to the newest MacBook Pro (circa last fall). I decided that the only option that would keep me running was storage and lots of it, since I have dabbled somewhat seriously in photos and videos over the last few years.
    I originally started with iPhoto, a small digital camera, and a simple G4 circa 2003-ish. Then I went to a used dual G5 around 2008-ish when I realized I was maxing out the old machine. But there was no going back. I like the interface, products, etc. I purchased Final Cut Express somewhere along the way, eventually got Aperture 2, and that really rocked my world. Put out some amateurish videos, but the family and I had so much fun. Relatives lined up for me to make videos... you probably know how that goes. Had to eventually put a stop to that, way too time intensive.
    Cancer hit in 2008, spent 8 months in the hospital and am a different man now. I am alive, grateful to be, but deal with a body that is physically tired and has some ongoing medical challenges.
    That leads me to the current.
    My oldest daughter graduated from high school in 2011. I purchased her a 15" macbook pro. I did that thinking that I knew a new system was in the near future for me as the IBM chipset was being phased out. I saw the future. Also, the G5 dual was slowing down and it had some odd things about eSata and FW800 that just didn't make for smooth sailing with external hard drives.
    I found out at first. I loved having a laptop. But I could not go without massive amounts of expandable storage. Just not an option. But I did discover that FW800 was reliable on her little laptop and that sort of opened doors for me to deal with storage issues.
    By now, photos spread between iPhoto, Aperture 2, FCX Videos, and her laptop.
    Well, graduation came and went and I had to relent on her laptop. I had just purchased Aperture 3 and was stunned at the amount of upgrades.
    That left me with an Ap3 library on her computer, too.
    Can y'all sort of get the drift where I'm going?
    I now have a G4 (still runs like a charm), a G5, I pulled all her stuff off her MBP and stored it on a new hard drive FW800.
    So what does a man who is insane do?
    I purchased the latest everything, spared no expense. Ugh, the pocket book hurts. Bad.
    But I have to say, Thunderbolt is everything it is cracked up to be. I have been stunned at the speed. I got the 12 TB station primarily as I don't want to keep purchasing this, that, and the other. I want to basically start taking the last 9-10 years of pictures, videos, files, and find a way to organize, clean up, consolidate, and so forth.
    Oh, being a redundancy FREAK, I did purchase a drobo and buffalo as random duplicate TM backups and other file backups, nothing more. The Drobo does do hot swapping and such well, but it is S-L-O-W.  Painfully so.
    The buffalo I had some problems with. Technical support was super and replaced the unit that didn't function. The newer model was rather speedy through ethernet. Not bad, so I put that thing to regular use for redundancy.
    I now have files spread out among various computers, large NAS externals, and multiple individual drives (FW800, USB, etc).
    Oh, I purchased the thunderbolt display too. That surprised me. Cut down on the number of wires and helped organize the physical lay out.
    As photography and such went from physical to digital, so did my camera and video. I went through a few video cameras, , digital cameras, and so forth as things evolved and I tried various formats. That leaves me with various digital formats, including 3G pics, newer iphone 4 and 4s pics/videos.
    As all this upgraded, so did iLife and ways to manage data.
    Not being a pro, I don't really know about "workflow". It is all fun for me. I did try my hand for a short while last year at a formal photography business, but found out being good at photoshopping pics and fairly decent at taking pics does not mean I would enjoy the "business" of photography. But what does one do at that point? Upgrade to a Canon 7D and get a few nice lenses.
    My logic was hey, I need this infusion of IViG every 28 days to sustain my life/immune system (chemo destroyed my bone marrow) and this costs about $13,000 every 28 days. Insurance covers most, but bottom line, we were sudden spending about $10,000 a year cash out of pocket. I shut the business down a few weeks ago as I physically am too tired and can't stay focused. Chemo brain is a true phenomenon I have come to find out.
    So, after writing this small book to say that I am somehow trying to take the new way of doing things, fast external drive(s) with a laptop and TB display, and now try to clean up and consolidate this mess.
    There is redundancy, a lot of lack of knowledge on my part, and the recent purchases have now allowed me to start trying to get this entire pile of digital spread-out jewels of memories into a more cohesive, organized, consolidated system.
    Since I had to start referencing Ap3 files, that helped with the storage on the laptop drive. But it caused another problem: I can't see the referenced movie files on iMovie. If I can, I don't know how.
    So I am now beginning the cumbersome slow project of pulling everything under the umbrella of iMovie that is movie related, then pulling everything that is a photographic image under the umbrella of Ap3. I had to consolidate various iphoto libraries (found a decent app for that), and so forth.
    Yet, as I'm doing all this, I get this feeling that I may not need another set up for a good while. Point being, why not slow down this spurt of energy to organize and consolidate and THINK about it a bit. Ha. right?   I mean, I'm doing what I have always done, which is evolve over time and just sort of grow larger and more cumbersome.
    When this terribly original idea hit me a few weeks ago, I didn't initially think much about it except "get on it."
    Now I'm starting to back up and say to myself, "what if I'm just taking a big conglomeration of so many spread out files, duplicating as I copy, and am not really making for an efficient workflow?"
    Workflow is my new buzzword, my new thought. New to me. Quit laughing all you peeps who have it down. Some of us aren't that organized from the front side and look back 20/20 and say how in the world did I get here?
    So now I'm reaching out, probably doing the smartest thing I've ever done, and trying to reach out to you all. When someone asked about workflow, I thought "what does that mean?"
    I think I am starting to get what it means, but could use some sharing of ideas/mentoring here. I'm guessing workflow means exactly what I am living: how to organize, move around, handle, and deal with all this data. Am I right on this?
    I used to take it to mean, how did I have projects in Ap3 set up. Maybe that is a part of it, but I think (emphasis on think) that I am starting to realize workflow might just have a bigger meaning and application. In other words, how do I plan on organizing and dealing with ALL this digital data.
    One example from this morning and I'll stop for the moment to allow for some great feedback. Since I couldn't get iMovie to see the referenced movie files in Ap3 (have preview sharing turned on), I decided that I wouldn't use Ap3 as my movie manager. So I take 420 varying kinds of files from iphones to the canon 7d and export them to my desktop.
    Then I started thinking, I had better write all this out and reach out for some help before I just duplicate files and create another digital mess.
    I was going to import them into iMovie. Seemed simple. It is, but many are duplicates. Then I thought I would go through and start naming, finding duplicates, blah blah. Talk about time consuming.
    I'm willing to do all that if the consensus is that this is the best overall approach.
    Someone mentioned the new Final Cut. I've held out, but might consider going that route for managing all movie files.
    So here I am with tons of digital everything and I sure could use some wisdom on "workflow", "management," and basically setting myself up for the best pragmatic leveraging of the Apple set up for my intermediate future.
    If you've read all this and have remained interested, you are an amazing person. If you have ideas and feedback, I will listen carefully. I think it is relatively clear that I am willing to purchase or do just about whatever needs to be done, I've just sort of fumbled my way to this point. Basically, I feel like a privileged kid dealing with pro equipment and not really utilizing it to its capacity, not even close.
    I'll stop for now and will be back on in a few days to read the feedback and respond. I do appreciate your time and energy and feedback.
    Thanks,
    Robert

  • Find duplicates using EXIF data

    Is there any way to find duplicates of images in iPhoto or just on my desktop using the EXIF data on the images?
    I have a folder of around 4000 images that I recovered (after my hard drive went down) and now I'm left with the terrible task of trying to sort all of these images out. I know there are duplicates in the folder but I don't fancy looking through every one of 4000 images to find them!
    Thanks in advance for anybody's help.

    Alan:
    Yes. Give Duplicate Annihilator a try. It's compatible with iPhoto 6. You do not want to remove any apparent duplicates via the Finder (Don't tamper with files in the iPhoto Library folder from the Finder). Do all culling from within iPhoto itself or with DA.
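Short of parsing EXIF, byte-identical duplicates (which by definition carry identical EXIF blocks, since EXIF lives inside the file) can be found by hashing file contents; a sketch, noting it will not catch re-encoded or resized copies of the same photo:

```python
import hashlib
import tempfile
from pathlib import Path
from collections import defaultdict

def find_duplicate_files(folder):
    """Group files under folder by a hash of their bytes; any bucket
    holding more than one path is a set of byte-identical duplicates."""
    buckets = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            buckets[digest].append(path)
    return [paths for paths in buckets.values() if len(paths) > 1]

# Tiny demo on throwaway files.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.jpg").write_bytes(b"same pixels")
    Path(d, "b.jpg").write_bytes(b"same pixels")
    Path(d, "c.jpg").write_bytes(b"different")
    dupe_groups = find_duplicate_files(d)
print(len(dupe_groups))  # 1
```

As noted above, though, don't delete anything inside the iPhoto Library folder via the Finder; cull from within iPhoto or a tool like Duplicate Annihilator.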
