Finding Duplicate Records Using GROUP BY in SQL Server

feature-sql

There are various times when we need to find duplicate records in SQL Server. It is possible to find duplicates using DISTINCT, ROW NUMBER as well as the GROUP BY approach.

Duplicate records can create problems sometimes when displaying reports or performing a Multiple Insert update. Finding duplicate records in a database needs further investigation. In some cases, duplicate records are positive, but it all depends on the data and the database design as well.

For example, if a customer has ordered the same product twice on the same date with the the same shipping and billing address, then this may result in a duplicate record.

Let us create a table Customer with First Name, Last Name, and Mobile Number fields.

CREATE TABLE CUSTOMER
(
FirstName VARCHAR(50),
LastName  VARCHAR(50),
MobileNo  VARCHAR(15)
);

INSERT INTO CUSTOMER VALUES ('Niraj','Yadav',989898);

INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);
INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);

INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);
INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);

INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);

INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);
INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);

SELECT * FROM CUSTOMER;

Finding Duplicates Records Using Group by in SQL Server (2)

Using the DISTINCT approach, we can quickly get unique rows in a table.

SELECT DISTINCT FirstName, LastName, MobileNo FROM CUSTOMER;

Finding Duplicates Records Using Group by in SQL Server (3)

However, this does not show how many times a row has been duplicated. Using the GROUP BY approach, we can find this.

Finding Duplicates Using GROUP BY

Adding grouping and a counting field to our display of FirstName, LastName and MobileNo combination shows how many times each customer’s name appears.

SELECT		FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM		CUSTOMER
GROUP BY	FirstName, LastName, MobileNo;

Finding Duplicates Records Using Group by in SQL Server (4)

GROUP BY will show just one record for each combination of FirstName, LastName and MobileNo.

The count CNT shows how many times the row has been duplicated.
CNT = 1 indicates that row appears only once.

Let us filter out using the Having clause to exclude rows that appear only once.

SELECT		FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM		CUSTOMER
GROUP BY	FirstName, LastName, MobileNo
HAVING		COUNT(1) > 1;

Finding Duplicates Records Using Group by in SQL Server (1)

 

About Vishwanath Dalvi

Vishwanath Dalvi is a gifted engineer and tech enthusiast. He enjoys music, magic, movies, and gaming. When not hacking around or supporting the open source community, he is trying to overcome his phobia of dogs.
View more articles by Vishwanath Dalvi

The Conversation

Follow the reactions below and share your own thoughts.