Finding Duplicate Rows in SQL Server

Summary: in this tutorial, you will learn how to use the GROUP BY clause or ROW_NUMBER() function to find duplicate values in a table.

Technically, you use the UNIQUE constraints to enforce the uniqueness of rows in one or more columns of a table. However, sometimes you may find duplicate values in a table due to the poor database design, application bugs, or uncleaned data from external sources. Your job is to identify these duplicate values in effective ways.

To find the duplicate values in a table, you follow these steps:

First, define criteria for duplicates: values in a single column or multiple columns.
Second, write a query to search for duplicates.

If you want to also delete the duplicate rows, you can go to the deleting duplicates from a table tutorial.

Let’s set up a sample table for the demonstration.

Setting up a sample table

First, create a new table named t1 that contains three columns id, a, and b.

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
    id INT IDENTITY(1, 1), 
    a  INT, 
    b  INT, 
    PRIMARY KEY(id)
);
Code language: SQL (Structured Query Language) (sql)

Then, insert some rows into the t1 table:

INSERT INTO
    t1(a,b)
VALUES
    (1,1),
    (1,2),
    (1,3),
    (2,1),
    (1,2),
    (1,3),
    (2,1),
    (2,2);
Code language: SQL (Structured Query Language) (sql)

The t1 table contains the following duplicate rows:

(1,2)
(2,1)
(1,3)
Code language: SQL (Structured Query Language) (sql)

Your goal is to write a query to find the above duplicate rows.

Using GROUP BY clause to find duplicates in a table

This statement uses the GROUP BY clause to find the duplicate rows in both a and b columns of the t1 table:

SELECT 
    a, 
    b, 
    COUNT(*) occurrences
FROM t1
GROUP BY
    a, 
    b
HAVING 
    COUNT(*) > 1;
Code language: SQL (Structured Query Language) (sql)

Here is the result:

SQL Server Find Duplicates using GROUP BY clause

How it works:

First, the GROUP BY clause groups the rows into groups by values in both a and b columns.
Second, the COUNT() function returns the number of occurrences of each group (a,b).
Third, the HAVING clause keeps only duplicate groups, which are groups that have more than one occurrence.

To return the entire row for each duplicate row, you join the result of the above query with the t1 table using a common table expression (CTE):

WITH cte AS (
    SELECT
        a, 
        b, 
        COUNT(*) occurrences
    FROM t1
    GROUP BY 
        a, 
        b
    HAVING 
        COUNT(*) > 1
)
SELECT 
    t1.id, 
    t1.a, 
    t1.b
FROM t1
    INNER JOIN cte ON 
        cte.a = t1.a AND 
        cte.b = t1.b
ORDER BY 
    t1.a, 
    t1.b;
Code language: SQL (Structured Query Language) (sql)

Here is the output:

Generally, the query for finding the duplicate values in one column using the GROUP BY clause is as follows:

SELECT 
    col, 
    COUNT(col)
FROM 
    table_name
GROUP BY 
    col
HAVING 
    COUNT(col) > 1;
Code language: SQL (Structured Query Language) (sql)

The query for finding the duplicate values in multiple columns using the GROUP BY clause :

SELECT 
    col1,col2,...
    COUNT(*)
FROM 
    table_name
GROUP BY 
    col1,col2,...
HAVING 
    COUNT(*) > 1;
Code language: SQL (Structured Query Language) (sql)

Using ROW_NUMBER() function to find duplicates in a table

The following statement uses the ROW_NUMBER() function to find duplicate rows based on both a and b columns:

WITH cte AS (
    SELECT 
        a, 
        b, 
        ROW_NUMBER() OVER (
            PARTITION BY a,b
            ORDER BY a,b) rownum
    FROM 
        t1
) 
SELECT 
  * 
FROM 
    cte 
WHERE 
    rownum > 1;
Code language: SQL (Structured Query Language) (sql)

Here is the result:

SQL Server Find Duplicates using ROW_NUMBER

How it works:

First, the ROW_NUMBER() distributes rows of the t1 table into partitions by values in the a and b columns. The duplicate rows will have repeated values in the a and b columns, but different row numbers as shown in the following picture:

Second, the outer query removes the first row in each group.

Generally, This statement uses the ROW_NUMBER() function to find the duplicate values in one column of a table:

WITH cte AS (
    SELECT 
        col,
        ROW_NUMBER() OVER (
            PARTITION BY col
            ORDER BY col) row_num
    FROM 
        t1
) 
SELECT * FROM cte 
WHERE row_num > 1;Code language: SQL (Structured Query Language) (sql)

In this tutorial, you have learned how to use the GROUP BY clause or ROW_NUMBER() function to find duplicate values in SQL Server.

Was this tutorial helpful?