Finding duplicates in a table by grouping columns SQL Server

The GROUP By clause is used in a SELECT query to group data in rows based on certain columns.
It is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set.

To find duplicates in SQL Server table using GROUP BY clause based on multiple columns say name and email as below:

Id NAME EMAIL
1 John John.Smith@abc.com
2 James James.Connor@abc.com
3 Bob Bob.Smith@abc.com
4 Robin Robin.Sharma@abc.com
5 Bob Bob.Smith@abc.com
6 John John.Smith@abc.com

SELECT name, email, COUNT(*) AS cnt FROM userdetails
GROUP BY name, email
HAVING COUNT(*)>1

Output:

NAME EMAIL cnt
John John.Smith@abc.com 2
Bob Bob.Smith@abc.com 2

SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.