SQL Query to identify duplicates

Hi All,
I want to identify and display some duplicate data from the database.
By duplicates I mean rows in a table that contain identical values in a combination of fields (Last Name, First Name, City).
Example:
ID1   ID2   Last Name  First Name  City       Phone
1005  2010  Krieger    Jeff        San Ramon  9252997100
1012  2010  Krieger    Jeff        San Ramon  9252997100
1017  2010  Krieger    Jeff        San Ramon  9252997100
Now I want to select the IDs (ID1 and ID2) that identify each duplicate row.
To do that I want to write a query with a subquery, similar to this:
SELECT ID1, ID2, LastName, FirstName
   FROM Customers
   INTO TABLE
   WHERE (LastName, FirstName, City) IN
   (SELECT LastName, FirstName, City
       FROM Customers
       GROUP BY LastName, FirstName, City
       HAVING COUNT(*) > 1)
   ORDER BY LastName, FirstName.
What would the syntax be in ABAP? I am not able to specify more than one field after WHERE.
Any ideas on how to identify the duplicate data and return the IDs?
Thanks for any help!

Is this something you need to do via code, or can you use an analysis tool? TAANA and TAANA_AV allow you to analyze distribution spreads by table fields (you can pick your table and the fields). They are primarily used for archiving analysis, but they are helpful in other scenarios as well. They are not optimal for finding single-record dupes, but I thought I'd throw it out there.
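The query the original poster describes can be sketched without a multi-column IN list by joining each row to the groups that occur more than once. This is only an illustration: it runs against an in-memory SQLite database from Python rather than against an SAP system, the Customers layout is taken from the example above, and the extra "Smith" row is made up so that the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (ID1 INTEGER, ID2 INTEGER, LastName TEXT, "
    "FirstName TEXT, City TEXT, Phone TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1005, 2010, "Krieger", "Jeff", "San Ramon", "9252997100"),
        (1012, 2010, "Krieger", "Jeff", "San Ramon", "9252997100"),
        (1017, 2010, "Krieger", "Jeff", "San Ramon", "9252997100"),
        (1020, 2011, "Smith", "Anna", "Dublin", "9255550100"),  # made-up non-duplicate
    ],
)

# Join every row to the (LastName, FirstName, City) groups that occur more
# than once; only one field is compared per condition, so no tuple IN is needed.
DUPLICATES_SQL = """
SELECT c.ID1, c.ID2, c.LastName, c.FirstName
FROM Customers AS c
JOIN (SELECT LastName, FirstName, City
      FROM Customers
      GROUP BY LastName, FirstName, City
      HAVING COUNT(*) > 1) AS d
  ON  c.LastName  = d.LastName
  AND c.FirstName = d.FirstName
  AND c.City      = d.City
ORDER BY c.LastName, c.FirstName, c.ID1
"""

duplicate_rows = conn.execute(DUPLICATES_SQL).fetchall()
for row in duplicate_rows:
    print(row)
```

In ABAP the same two-step idea might be expressed by selecting the grouped keys into an internal table first and then reading the matching rows with FOR ALL ENTRIES; that is a suggestion, not something confirmed in this thread.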

Similar Messages

Sql query to identify all the responsibilities attached to a form

(Oracle Apps) Can anyone help me with a SQL query that identifies all the responsibilities attached to a form, where the corresponding menu is not in the menu exclusions?
Thanks in advance
Venki

Bump

ABAP / Query to Identify Duplicate Rows in Cube

Dear Experts,
We have a situation where some of our Cubes (due to compression and varying levels of forceful reloads) now contain duplicate rows.
What I need to know is:
1) Is there a way to identify duplicate rows where one of the characteristics is different but all key figures are identical?
2) If so, which is easier to achieve: an ABAP routine/program or a query?
3) If ABAP, suggestions on how to code it.
4) If a query, the same.
What I need it to do is tell me which ClaimNo record (primary key) has duplicates and which characteristic has caused it.
I know I am asking for a lot but I really need to get this resolved as it's causing mayhem and trying to pinpoint these records is both time consuming and painful. What we are looking to do with the records is establish how they became duplicated so we can prevent this happening in the future.
Your help as always much appreciated.
Regards
Craig
Message was edited by: Craig Armstead

Hi Craig,
My previous answer finds out which cubes and data targets have been loaded by a given request.
For your question, the following should be useful.
tables: /BIC/**, /BIC**.   " source and target tables
parameters: fieldname like /BIC/****-fieldname.   " in your case, the primary key or the duplicated entry
data: itab_source like /BIC/*** occurs 0 with header line,
      itab_destination like /BIC/*** occurs 0 with header line.
data: wa_itab_destination like line of itab_destination.
select *
from /BIC/*****
into corresponding fields of table itab_source
where fieldname = fieldname.
* Include your piece of code for deleting records here.
* DELETE ADJACENT DUPLICATES only compares neighbouring rows, so sort first.
sort itab_source by characteristic.
delete adjacent duplicates from itab_source comparing characteristic.   " i.e. the duplicate characteristic you specified
* Use this to delete the ODS data before writing into it
call function 'RSDRI_ODSO_DELETE_RFC'
exporting
    i_odsobject = 'ODS Name'
    i_delete_all = 'X'.
if sy-subrc = 0.
loop at itab_source.
    move-corresponding itab_source to itab_destination.
    append itab_destination.
endloop.
modify /BIC/*** from table itab_destination[].   " the target is written from the internal table
commit work.
endif.
Please Reward points if this helps really.
Thanks,
Srinivas.
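Craig's actual question (which ClaimNo has duplicates, and which characteristic caused them) is a reporting problem rather than a deletion problem. A minimal sketch of that logic, assuming the cube rows have been extracted into plain records: group on the key plus all key figures, then list the characteristics that vary inside each group. Every field name below (claim_no, region, status, amount, quantity) is hypothetical.

```python
from collections import defaultdict

# Hypothetical extract of cube rows; all field names are illustrative.
rows = [
    {"claim_no": "C001", "region": "EAST", "status": "OPEN", "amount": 100.0, "quantity": 2},
    {"claim_no": "C001", "region": "WEST", "status": "OPEN", "amount": 100.0, "quantity": 2},
    {"claim_no": "C002", "region": "EAST", "status": "OPEN", "amount": 50.0, "quantity": 1},
]
KEY = "claim_no"
CHARACTERISTICS = ["region", "status"]
KEY_FIGURES = ["amount", "quantity"]


def find_duplicates(rows):
    """Map (key, key figures...) to the characteristics that differ among its rows."""
    groups = defaultdict(list)
    for row in rows:
        signature = (row[KEY],) + tuple(row[k] for k in KEY_FIGURES)
        groups[signature].append(row)

    report = {}
    for signature, members in groups.items():
        if len(members) > 1:  # same key and same key figures more than once
            report[signature] = [
                c for c in CHARACTERISTICS
                if len({m[c] for m in members}) > 1
            ]
    return report


duplicates = find_duplicates(rows)
for signature, differing in duplicates.items():
    print(signature[0], "is duplicated; differing characteristics:", differing)
```

The same grouping could be done in ABAP with a sorted internal table, but the report structure is the point here: the differing characteristic is what hints at how the duplicate was loaded.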

Need sql query to remove duplicates using UNION ALL clause

Hi,
I have a SQL query that uses a UNION clause, but the UNION is causing performance issues.
To work around that I switched to UNION ALL, which improves performance but returns duplicates.
Could anyone send a sample SQL query that uses UNION ALL but still returns only unique rows (eliminating the duplicates)?
Any help would be appreciated.
Thanks and Regards

Why not UNION? :(
Another way is to use MINUS:
SQL>
SQL> with t as
(
select 1 if from dual union all
select 2 if from dual union all
select 1 if from dual union all
select 3 if from dual union all
select 3 if from dual
)
,t2 as
(
select 1 if from dual union all
select 2 if from dual union all
select 3 if from dual union all
select 4 if from dual union all
select 5 if from dual
)
(select if from t
union all
select if from t2)
/
        IF
         1
         2
         1
         3
         3
         1
         2
         3
         4
         5
10 rows selected
SQL> so
SQL>
SQL> with t as
(
select 1 if from dual union all
select 2 if from dual union all
select 1 if from dual union all
select 3 if from dual union all
select 3 if from dual
)
,t2 as
(
select 1 if from dual union all
select 2 if from dual union all
select 3 if from dual union all
select 4 if from dual union all
select 5 if from dual
)
(select if from t
union all
select if from t2)
minus
select -99 from dual
/
        IF
         1
         2
         3
         4
         5
SQL>
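The MINUS trick works because set operators other than UNION ALL return distinct rows, so subtracting a value that never occurs (-99) leaves the de-duplicated set. A small reproduction, assuming SQLite as a stand-in for Oracle: EXCEPT is SQLite's spelling of MINUS, and the column is named id here rather than if.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t  (id INTEGER);
CREATE TABLE t2 (id INTEGER);
INSERT INTO t  VALUES (1), (2), (1), (3), (3);
INSERT INTO t2 VALUES (1), (2), (3), (4), (5);
""")

# UNION ALL simply concatenates both inputs, duplicates included.
all_rows = conn.execute(
    "SELECT id FROM t UNION ALL SELECT id FROM t2"
).fetchall()

# Compound SELECTs group left to right, so this is (t UNION ALL t2) EXCEPT (-99).
# EXCEPT returns distinct rows, which removes the duplicates as a side effect.
deduped = conn.execute(
    "SELECT id FROM t UNION ALL SELECT id FROM t2 "
    "EXCEPT SELECT -99 ORDER BY id"
).fetchall()

print(len(all_rows), [r[0] for r in deduped])
```

Note that the duplicate elimination still has to happen somewhere, so this is not necessarily cheaper than a plain UNION; it is worth comparing the execution plans before adopting it.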

SQL query to pull duplicate employee suppliers

We have about 11,000 employee suppliers in Oracle 11.5.10.2. At least several thousand of them are duplicates. One set of employee supplier records is in this format: JOHN R DOE
Another set of records is in this format: Doe, Mr. John R.
I am trying to write a query that will have two columns for each row, one employee supplier name all upper case and the other the supplier name in the second format. I then want to create a dataloader file to use for vendor merge. I realize the query won't be perfect and get all the records, but if it can get the majority I can possibly handle the remainder manually.
I am not a coder and don't know how to write the query -- can anyone help?
Thanks in advance.

Wow this is bad database design!
For starters, your query would include the original column as well as a column using the UPPER function:
SELECT column_name, UPPER(column_name)
FROM table_name;
As for identifying duplicates, you'll have to analyze the data to see what possible formats the column values are in. I would suggest using queries with DISTINCT or GROUP BY to see what your distinct values are. You can then use REGEXP functions to parse out the values.
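The parsing step suggested above can be prototyped outside the database before committing to REGEXP expressions. The sketch below assumes only the two formats mentioned in the question ("JOHN R DOE" and "Doe, Mr. John R.") and a guessed list of honorifics; normalize and pair_up are illustrative names, not part of any Oracle API.

```python
import re
from collections import defaultdict

TITLES = {"MR", "MRS", "MS", "MISS", "DR"}  # assumed list of honorifics


def normalize(name):
    """Turn 'Doe, Mr. John R.' into 'JOHN R DOE'; upper-case names pass through."""
    if "," in name:
        last, rest = name.split(",", 1)
        tokens = [t.strip(".") for t in rest.split()]
        tokens = [t for t in tokens if t.upper() not in TITLES]
        name = " ".join(tokens + [last.strip()])
    return re.sub(r"\s+", " ", name).strip().upper()


def pair_up(names):
    """Return (upper-case name, 'Last, First' name) pairs that normalize identically."""
    by_key = defaultdict(lambda: {"upper": [], "formatted": []})
    for name in names:
        kind = "formatted" if "," in name else "upper"
        by_key[normalize(name)][kind].append(name)
    return [
        (upper, formatted)
        for group in by_key.values()
        for upper in group["upper"]
        for formatted in group["formatted"]
    ]


suppliers = ["JOHN R DOE", "Doe, Mr. John R.", "JANE ROE"]
pairs = pair_up(suppliers)
print(pairs)
```

Each pair is one candidate row for the vendor-merge file; names that only differ by a missing middle initial would not match and would still need manual review.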

Sql query to identify transports in dev but not in prod

I use sqlplus to select fields from saperp.e070v so I can compare transports in prod with those in QA. The purpose is to identify the transports that I need to run when I H-copy prod to QA. I'm using this where clause: (TRFUNCTION='K' OR TRFUNCTION='W'). Should I also look at TRSTATUS? I'd appreciate any recommendations or ideas. I do not have access to ABAP.

Thanks Markus.
Here are the distinct codes I found. Are there any of these that I need to be concerned about? Is there a key/legend for these?
SQL> select distinct trstatus from saperp.e070v;
T
R
D
O
SQL> select distinct trfunction from saperp.e070v;
T
W
R
P
K
D
Q
M
T
X
F
S
T
G
12 rows selected.
SQL> describe saperp.e070v;
Name                                      Null?    Type
TRKORR                                    NOT NULL VARCHAR2(20)
TRFUNCTION                                NOT NULL VARCHAR2(1)
TRSTATUS                                  NOT NULL VARCHAR2(1)
TARSYSTEM                                 NOT NULL VARCHAR2(10)
AS4USER                                   NOT NULL VARCHAR2(12)
AS4DATE                                   NOT NULL VARCHAR2(8)
AS4TIME                                   NOT NULL VARCHAR2(6)
STRKORR                                   NOT NULL VARCHAR2(20)
LANGU                                     NOT NULL VARCHAR2(1)
AS4TEXT                                   NOT NULL VARCHAR2(60)
CLIENT                                    NOT NULL VARCHAR2(3)
SQL>
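Once the rows from both systems are available side by side, the comparison itself is an anti-join on TRKORR. The sketch below assumes each system's e070v rows have been copied into local tables; the names prod_e070v and qa_e070v and the transport numbers are made up. It keeps the poster's TRFUNCTION filter and deliberately does not filter on TRSTATUS, since which status codes matter is the open question in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prod_e070v (trkorr TEXT, trfunction TEXT, trstatus TEXT);
CREATE TABLE qa_e070v   (trkorr TEXT, trfunction TEXT, trstatus TEXT);
INSERT INTO prod_e070v VALUES
    ('DEVK900001', 'K', 'R'),
    ('DEVK900002', 'W', 'R');
INSERT INTO qa_e070v VALUES
    ('DEVK900001', 'K', 'R'),
    ('DEVK900002', 'W', 'R'),
    ('DEVK900003', 'K', 'R'),
    ('DEVK900004', 'T', 'D');
""")

# Transports present in QA (with the poster's TRFUNCTION filter) that have
# no matching TRKORR in prod.
MISSING_SQL = """
SELECT q.trkorr
FROM qa_e070v AS q
WHERE q.trfunction IN ('K', 'W')
  AND NOT EXISTS (SELECT 1 FROM prod_e070v AS p WHERE p.trkorr = q.trkorr)
ORDER BY q.trkorr
"""

missing_in_prod = [r[0] for r in conn.execute(MISSING_SQL)]
print(missing_in_prod)
```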

Better SQL Query for clients without boundaries?

I've been using and modifying/experimenting with Chris Nackers' SQL query for missing boundaries (http://myitforum.com/myitforumwp/2011/12/07/sql-query-to-identify-missing-smsconfigmgr-boundaries/), shown below (changed to add aliases), but it seems to mainly show us non-clients: several computers that were indeed missing boundaries are not being listed, and everything in the listing has NULL SYS.Client0. We are using SCCM 2007 SP2 R3, and all our boundaries are protected; most are IP Range, a few IP Subnet, none AD Site.
Is there a better query to pinpoint this issue, or maybe using something (error code or log?) that would show computers that can't find a distribution point or some other evidence of not having a boundary?
Thanks!
SELECT DISTINCT SYS.Name0, SYS.Client0, IPA.IP_Addresses0, IPS.IP_Subnets0, SMSAS.SMS_Assigned_Sites0
FROM dbo.v_R_System SYS
LEFT OUTER JOIN dbo.v_RA_System_IPSubnets IPS ON SYS.ResourceID = IPS.ResourceID
LEFT OUTER JOIN dbo.v_RA_System_IPAddresses IPA ON SYS.ResourceID = IPA.ResourceID
LEFT OUTER JOIN dbo.v_RA_System_SMSAssignedSites SMSAS ON SYS.ResourceID = SMSAS.ResourceID
LEFT OUTER JOIN dbo.v_RA_System_SystemOUName SOU ON SYS.ResourceID = SOU.ResourceID
WHERE (SMSAS.SMS_Assigned_Sites0 IS NULL)
AND (NOT (IPA.IP_Addresses0 IS NULL))
AND (NOT (IPS.IP_Subnets0 IS NULL))
AND SYS.Operating_System_Name_and0 LIKE 'microsoft%server%'
ORDER BY IPS.IP_Subnets0, SYS.Name0

I gotcha now... I think most people, myself included, rely on finding clients that are not assigned to determine if a boundary is missing. If you expect clients to not be assigned that's not going to work for you.
WHERE (SMSAS.SMS_Assigned_Sites0 IS NULL)
AND (NOT (IPA.IP_Addresses0 IS NULL))
AND (NOT (IPS.IP_Subnets0 IS NULL))
This is saying: show me all clients that are not assigned, but that do have an IP address and do have a subnet discovered.
In the case of CM12 it is actually possible for that not to work anyway because you can have separate boundaries for client assignment and content lookup.
I am not aware of any query that can compare the IP address, AD Site and IP subnet from each client to what's configured in boundaries and find machines that do not fall into any boundary.
John Marcum | http://myitforum.com/myitforumwp/author/johnmarcum/
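Outside of SQL, the comparison John describes is straightforward if the boundary definitions and client addresses can be exported. The sketch below is an assumption-laden illustration, not an SCCM feature: the boundary values and machine names are invented, only IP Subnet and IP Range boundaries are handled (AD Site boundaries are ignored), and it uses Python's standard ipaddress module.

```python
import ipaddress

# Hypothetical boundary export: subnets in CIDR form, ranges as (start, end).
subnet_boundaries = [ipaddress.ip_network("10.1.0.0/24")]
range_boundaries = [
    (ipaddress.ip_address("10.2.0.10"), ipaddress.ip_address("10.2.0.200")),
]


def in_any_boundary(ip_text):
    """True if the address falls inside at least one subnet or range boundary."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in subnet_boundaries):
        return True
    return any(start <= ip <= end for start, end in range_boundaries)


# Hypothetical client export: machine name -> discovered IP address.
clients = {"SRV01": "10.1.0.25", "SRV02": "10.2.0.150", "SRV03": "10.3.0.7"}

uncovered = sorted(name for name, ip in clients.items() if not in_any_boundary(ip))
print(uncovered)
```

Machines with several discovered addresses would need to be checked per address, and a machine counts as covered if any one of them falls inside a boundary.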

T-SQL Query Duplicate dates.

A table is called ‘ImportEvents’. There are currently 50 records in this table. The user can enter the ‘Import Date’ manually, and multiple imports could have happened on the same date. What SQL query would you use to list all the dates on which an upload has occurred, without duplicates?

If I want the number of import dates that don't have duplicate values, how could I aggregate the query?
In sum:
of 50 records, say 18 were created on one date and 32 on 32 different dates. How can I retrieve the list of 33 records based on the import date?
DECLARE @ImportEvents TABLE (importDate date)
INSERT INTO @ImportEvents (importDate)
VALUES ('2014-01-01'),('2014-01-01'),('2014-01-01'),('2014-01-01'),('2014-01-01'),('2014-01-01'),('2014-01-01'),('2014-01-01'),('2014-01-01'),
('2014-01-02'),('2014-01-03'),('2014-01-04'),('2014-01-05'),('2014-01-06'),
('2014-01-07'),('2014-01-07'),('2014-01-07'),('2014-01-07'),('2014-01-07'),('2014-01-07'),('2014-01-07'),('2014-01-07'),('2014-01-07')
select importDate
From @ImportEvents
group by importDate
having count(*) = 1
Thanks
Richard
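The two questions in this thread lead to two different queries: a list of every date that has at least one import (DISTINCT, or GROUP BY without HAVING), and a list of dates that occur exactly once (GROUP BY with a COUNT filter). A quick check of both against the sample data above, using SQLite from Python as a stand-in for SQL Server and storing the dates as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ImportEvents (importDate TEXT)")

# Same distribution as the sample: 9 rows on Jan 1, one each on Jan 2-6, 9 on Jan 7.
dates = (
    ["2014-01-01"] * 9
    + ["2014-01-02", "2014-01-03", "2014-01-04", "2014-01-05", "2014-01-06"]
    + ["2014-01-07"] * 9
)
conn.executemany("INSERT INTO ImportEvents VALUES (?)", [(d,) for d in dates])

# Every date on which at least one import happened, listed once.
distinct_dates = [r[0] for r in conn.execute(
    "SELECT DISTINCT importDate FROM ImportEvents ORDER BY importDate"
)]

# Only the dates that were imported exactly once.
single_dates = [r[0] for r in conn.execute(
    "SELECT importDate FROM ImportEvents "
    "GROUP BY importDate HAVING COUNT(*) = 1 ORDER BY importDate"
)]

print(len(distinct_dates), len(single_dates))
```

For the 50-record case described in the question, the first query would give the 33 dates and the second the 32 dates that occur only once.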

SQL query needed to identify cancelled invoice where distribution lines

I need a SQL query to identify cancelled invoices whose distribution lines do not balance, i.e. where the debit lines do not equal the credit lines.
Is there a way, from the back end (ap_invoice_distributions_all), to find the cancelled invoices where the distribution debit lines do not equal the credit lines?
Regards,
Prakash Ranjan

Hello Prakash
Can you please see if this query helps you?
SELECT i.invoice_id, i.invoice_amount, nvl(sum(d.amount),0)
FROM ap_invoice_distributions_all d, ap_invoices_all i
WHERE i.org_id = <your org_id>
AND i.invoice_id = d.invoice_id
AND d.line_type_lookup_code not in ('PREPAY')
AND i.cancelled_date IS NOT NULL
GROUP BY i.invoice_id, i.invoice_amount
HAVING (i.invoice_amount <> nvl(sum(d.amount),0))
ORDER BY i.invoice_id asc
Octavio

Query to find duplicate sql executed

What is the sql to find the duplicate sql executed and count of executions.
I need the query ( though we can get directly the results from OEM)
Please let me know.
Thanks
Naveen

> What is the sql to find the duplicate sql executed and count of executions.
> I need the query ( though we can get directly the results from OEM)
Get to know V$SQL, V$SQLAREA and V$SQLTEXT.
Check out Christopher Lawson's Oracle performance tuning book - it's
very good on the basics of this subject.
HTH.
Paul...
Naveen--
When asking database related questions, please give other posters
some clues, like OS (with version), version of Oracle being used and DDL.
Other trivia such as CPU, RAM + Disk configuration might also be useful.
The exact text and/or number of error messages is useful (!= "it didn't work!"). Thanks.
Furthermore, as a courtesy to those who spend time analysing and attempting to help,
please do not top post and do try to trim your replies!

Please advise what is the query to identify a SQL executed time

Hi all
Please advise what is the query to identify a SQL executed time.
eg, a DML executed at 16-Apr-2013 11:45hrs

Try like..
select LAST_LOAD_TIME, ELAPSED_TIME, MODULE, SQL_TEXT from v$sql order by LAST_LOAD_TIME desc

Query to find duplicate datas

Hi,
Can anyone help me with a query to find the duplicate data in a column?

maybe this example might be of some help.
SQL> select * from employees;
YEAR EM NAME       PO
2001 04 Sarah      02
2001 05 Susie      06
2001 02 Scott      91
2001 02 Scott      01
2001 02 Scott      07
2001 03 Tom        81
2001 03 Tom        84
2001 03 Tom        87
8 rows selected.
SQL> -- based on the output above we know that there are duplicates for Scott and Tom
SQL> -- now we need to identify how many duplicates there are by grouping on year, empcode, and name
SQL> select year, empcode, name, position,
       row_number() over (partition by year, empcode, name
                          order by year, empcode, name, position) as rn,
       count(*) over (partition by year, empcode, name) as cnt
  from employees;
YEAR EM NAME       PO         RN        CNT
2001 02 Scott      01          1          3
2001 02 Scott      07          2          3
2001 02 Scott      91          3          3
2001 03 Tom        81          1          3
2001 03 Tom        84          2          3
2001 03 Tom        87          3          3
2001 04 Sarah      02          1          1
2001 05 Susie      06          1          1
8 rows selected.
SQL> -- we have identified the duplicates in the output above by their counts
SQL> -- now we want to query only the rows that have duplicates
SQL> select emp.year, emp.empcode, emp.name, emp.position, emp.cnt
  from (select year, empcode, name, position,
               row_number() over (partition by year, empcode, name
                                  order by year, empcode, name, position) as rn,
               count(*) over (partition by year, empcode, name) as cnt
          from employees) emp
 where rn = 1
   and cnt > 1;
YEAR EM NAME       PO        CNT
2001 02 Scott      01          3
2001 03 Tom        81          3
SQL>

How To Write A Sql Query to Show 0's

I never can remember this :-/ What do I have to do to get a SQL query to return 0 if the count for a condition is 0? For example, let's say Joe, Jay and Jim could log on, but only Joe and Jim logged in. Jay would not be returned by the query below. What would I need to set up so that all 3 users are returned?
Select logonName, Count(Logon)
From logonUserInfo
where logonName is not null
GROUP BY logonName

Currently there is only 1 table involved. I would need a separate table? What would this table need to hold?
You wouldn't need a separate table unless you want to store who logged in at what time, and so on. In that case your table structures would look like this:
CREATE TABLE loginnames
(
loginid INT,
name VARCHAR(30)
)
GO
CREATE TABLE loginTimes
(
loginid INT,
logintime DATETIME
)
GO
Now you can do a left join between the two tables for a given date and find out who actually logged in and who didn't, with a count of 0 for the ones who didn't. That also follows the principles of normalization.
If you don't want to use another table, you would have to duplicate the login name every time that user logs in, and you would have no record of the date on which each login happened, so you would be counting from day 1 to date.
Please mark as answer, if this has helped you solve the issue.
Good Luck :) .. visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.
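The left join described in the reply can be sketched as follows. It uses the two tables from the reply, run through SQLite from Python as a stand-in for SQL Server; the user IDs and login times are made up. The important detail is counting a column from the right-hand table, which is NULL for users without logins, rather than COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loginnames (loginid INTEGER, name TEXT);
CREATE TABLE loginTimes (loginid INTEGER, logintime TEXT);
INSERT INTO loginnames VALUES (1, 'Joe'), (2, 'Jay'), (3, 'Jim');
INSERT INTO loginTimes VALUES
    (1, '2014-01-01 08:00'),
    (1, '2014-01-01 17:30'),
    (3, '2014-01-01 09:15');
""")

# LEFT JOIN keeps every user; COUNT(t.loginid) ignores the NULLs produced
# for users with no matching login rows, so they come back with 0.
counts = conn.execute("""
SELECT n.name, COUNT(t.loginid) AS logins
FROM loginnames AS n
LEFT JOIN loginTimes AS t ON t.loginid = n.loginid
GROUP BY n.name
ORDER BY n.name
""").fetchall()

print(counts)
```

A date filter on logintime would have to go into the ON clause rather than the WHERE clause, otherwise the users with no logins are filtered out again.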

How to measure the performance of sql query?

Hi Experts,
How to measure the performance, efficiency and cpu cost of a sql query?
What are all the measures available for an sql query?
How to identify i am writing optimal query?
I am using Oracle 9i...
It ll be useful for me to write efficient query....
Thanks & Regards

psram wrote:
Hi Experts,
How to measure the performance, efficiency and cpu cost of a sql query?
What are all the measures available for an sql query?
How to identify i am writing optimal query?
I am using Oracle 9i...
You might want to start with a feature of SQL*Plus: the AUTOTRACE (TRACEONLY) option, which executes your statement, fetches all records (if there is something to fetch) and shows you some basic statistics, including the number of logical I/Os performed, the number of sorts, etc.
This gives you an indication of the effectiveness of your statement, so that can check how many logical I/Os (and physical reads) had to be performed.
Note however that there are more things to consider, as you've already mentioned: The CPU bit is not included in these statistics, and the work performed by SQL workareas (e.g. by hash joins) is also credited only very limited (number of sorts), but e.g. it doesn't cover any writes to temporary segments due to sort or hash operations spilling to disk etc.
You can use the following approach to get a deeper understanding of the operations performed by each row source:
alter session set statistics_level=all;
alter session set timed_statistics = true;
select /* findme */ ... <your query here>
SELECT
         SUBSTR(LPAD(' ',DEPTH - 1)||OPERATION||' '||OBJECT_NAME,1,40) OPERATION,
         OBJECT_NAME,
         CARDINALITY,
         LAST_OUTPUT_ROWS,
         LAST_CR_BUFFER_GETS,
         LAST_DISK_READS,
         LAST_DISK_WRITES
FROM     V$SQL_PLAN_STATISTICS_ALL P,
         (SELECT *
          FROM   (SELECT   *
                  FROM     V$SQL
                  WHERE    SQL_TEXT LIKE '%findme%'
                           AND SQL_TEXT NOT LIKE '%V$SQL%'
                           AND PARSING_USER_ID = SYS_CONTEXT('USERENV','CURRENT_USERID')
                  ORDER BY LAST_LOAD_TIME DESC)
          WHERE ROWNUM < 2) S
WHERE    S.HASH_VALUE = P.HASH_VALUE
         AND S.CHILD_NUMBER = P.CHILD_NUMBER
ORDER BY ID
/
Check the V$SQL_PLAN_STATISTICS_ALL view for more statistics available. In 10g there is a convenient function DBMS_XPLAN.DISPLAY_CURSOR which can show this information with a single call, but in 9i you need to do it yourself.
Note that "statistics_level=all" adds a significant overhead to the processing, so use with care and only when required:
http://jonathanlewis.wordpress.com/2007/11/25/gather_plan_statistics/
http://jonathanlewis.wordpress.com/2007/04/26/heisenberg/
Regards,
Randolf
Oracle related stuff blog:
http://oracle-randolf.blogspot.com/
SQLTools++ for Oracle (Open source Oracle GUI for Windows):
http://www.sqltools-plusplus.org:7676/
http://sourceforge.net/projects/sqlt-pp/

SQL query was failed in my report after migrating to Oracle 10gR2

We have an application running on an Oracle 9.2.0.4 DB. We migrated our DB to Oracle 10gR2. While running the report, we got an error:
ORA-00904: "A3"."FACILITY_SYSTEM_ID": invalid identifier
ORA-06512: at "CEAS_MK_RPT.GET_LIST", line 12
The SQL query which causes this error is:
INSERT INTO gtt_facility_seq_generator
(date_range)
SELECT get_list
(CURSOR (SELECT auf.facility_system_id
FROM authorized_facility auf
WHERE auf.facility_system_id = afs.facility_system_id
AND auf.credit_application_system_id = :p_cred_appln_id
)
)
FROM authorized_facility afs,
(SELECT af.facility_system_id facility_system_id
FROM facility_obligor fo, authorized_facility af, party p
WHERE fo.facility_system_id = af.facility_system_id
AND fo.party_system_id = p.party_system_id
AND NOT EXISTS (
SELECT NULL
FROM facility_third_party_subst iftps
WHERE iftps.facility_system_id =
fo.facility_system_id)
) tab1
WHERE afs.credit_application_system_id = :p_cred_appln_id
AND tab1.facility_system_id = afs.facility_system_id
ORDER BY afs.creation_date;
The content of function get_list() is:
CREATE OR REPLACE FUNCTION CEAS_MK_RPT.Get_List
(
p_cursor IN sys_refcursor
)
RETURN VARCHAR2
IS
l_sep VARCHAR2(4);
l_text VARCHAR2(30000);
l_text_return VARCHAR2(30000);
BEGIN
LOOP
FETCH p_cursor INTO l_text;
EXIT WHEN p_cursor%NOTFOUND;
l_text_return := l_text_return || l_sep || l_text;
l_sep := CHR(10);
END LOOP;
CLOSE p_cursor;
RETURN l_text_return;
dbms_output.put_line ('a');
END Get_List;
The same report executed perfectly in Oracle 9i. Kindly help us to sort out this issue. Thanks in advance.
Regards,
Sengol S

Hi Nirav,
Many thanks for your response. I verified the same, but the same query was executed successfully after removing the INSERT INTO part(executed the SELECT part only). What might be the cause for this error?
For your kind information :
I have 2 schema in my DB( ceas_mk_app and ceas_mk_rpt ).
All the source tables are present in ceas_mk_app schema - These tables are used by ceas_mk_rpt schema(using synonym) to generate the report.
I'm executing the above said query from ceas_mk_rpt to insert the necessary data in to report temp. table.
Thanks,
Sengol S
Edited by: seng1256 on Dec 22, 2008 12:31 PM
