SQL: How to remove duplicate rows from a table in SQL Server

How to remove duplicate rows from a table in SQL Server

http://support.microsoft.com/kb/139444

For this example, we will use the following table with duplicate PK values. In this table the primary key is the two columns (col1, col2). We cannot create a unique index or PRIMARY KEY constraint since two rows have duplicate PKs. This procedure illustrates how to identify and remove the duplicates.

create table t1(col1 int, col2 int, col3 char(50))

insert into t1 values (1, 1, ‘data value one’)

insert into t1 values (1, 1, ‘data value one’)

insert into t1 values (1, 2, ‘data value two’)

The first step is to identify which rows have duplicate primary key values:

SELECT col1, col2, count(*)

FROM t1

GROUP BY col1, col2

HAVING count(*) > 1

This will return one row for each set of duplicate PK values in the table. The last column in this result is the number of duplicates for the particular PK value.

Collapse this tableExpand this table

col1 col2
1 1 2

If there are only a few sets of duplicate PK values, the best procedure is to delete these manually on an individual basis. For example:

set rowcount 1

delete from t1

where col1=1 and col2=1

The rowcount value should be n-1 the number of duplicates for a given key value. In this example, there are 2 duplicates so rowcount is set to 1. The col1/col2 values are taken from the above GROUP BY query result. If the GROUP BY query returns multiple rows, the "set rowcount" query will have to be run once for each of these rows. Each time it is run, set rowcount to n-1 the number of duplicates of the particular PK value.

Before deleting the rows, you should verify that the entire row is duplicate. While unlikely, it is possible that the PK values are duplicate, yet the row as a whole is not. An example of this would be a table with Social Security Number as the primary key, and having two different people (or rows) with the same number, each having unique attributes. In such a case whatever malfunction caused the duplicate key may have also caused valid unique data to be placed in the row. This data should copied out and preserved for study and possible reconciliation prior to deleting the data.

If there are many distinct sets of duplicate PK values in the table, it may be too time-consuming to remove them individually. In this case the following procedure can be used:

1. First, run the above GROUP BY query to determine how many sets of duplicate PK values exist, and the count of duplicates for each set.

2. Select the duplicate key values into a holding table. For example:

3. SELECT col1, col2, col3=count(*)

4. INTO holdkey

5. FROM t1

6. GROUP BY col1, col2

7. HAVING count(*) > 1

8. Select the duplicate rows into a holding table, eliminating duplicates in the process. For example:

9. SELECT DISTINCT t1.*

10. INTO holddups

11. FROM t1, holdkey

12. WHERE t1.col1 = holdkey.col1

13. AND t1.col2 = holdkey.col2

14. At this point, the holddups table should have unique PKs, however, this will not be the case if t1 had duplicate PKs, yet unique rows (as in the SSN example above). Verify that each key in holddups is unique, and that you do not have duplicate keys, yet unique rows. If so, you must stop here and reconcile which of the rows you wish to keep for a given duplicate key value. For example, the query:

15. SELECT col1, col2, count(*)

16. FROM holddups

17. GROUP BY col1, col2

should return a count of 1 for each row. If yes, proceed to step 5 below. If no, you have duplicate keys, yet unique rows, and need to decide which rows to save. This will usually entail either discarding a row, or creating a new unique key value for this row. Take one of these two steps for each such duplicate PK in the holddups table.

18. Delete the duplicate rows from the original table. For example:

19. DELETE t1

20. FROM t1, holdkey

21. WHERE t1.col1 = holdkey.col1

22. AND t1.col2 = holdkey.col2

23. Put the unique rows back in the original table. For example:

24. INSERT t1 SELECT * FROM holddups

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s