SQL Server INSTEAD OF trigger to prevent duplicates

You might face a scenario where a REST web service or Web API receives the same request from the user interface several times in quick succession, causing duplicate inserts. Because the requests reach the API at practically the same timestamp, the API's existence check against the database does not catch the duplicate: the record does not yet exist in the table at that point in time.

To prevent duplicates from being inserted, there are several options at the database level, and one of them is an INSTEAD OF trigger.

With an INSTEAD OF trigger, you can conditionally choose to INSERT into the table or take some other action, depending on the requirement.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER TRIGGER [dbo].[PREVENT_DUP_INSERT] ON [dbo].[tbl_submission]
INSTEAD OF INSERT
AS
BEGIN
	SET NOCOUNT OFF;
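	-- Insert only rows that do not already exist for the same question and survey instance.
	-- Set-based, so multi-row inserts work too; column names follow the INSERT list.
	INSERT INTO dbo.tbl_submission(user_id,q_uid,[value],[group],user_inst_id,created_at,updated_at)
	SELECT i.user_id,i.q_uid,i.[value],i.[group],i.user_inst_id,i.created_at,i.updated_at
	FROM inserted AS i
	WHERE NOT EXISTS(
		-- UPDLOCK, HOLDLOCK keep a concurrent request from passing the same check
		-- before this insert commits.
		SELECT 1 FROM dbo.tbl_submission AS s WITH (UPDLOCK, HOLDLOCK)
		WHERE s.q_uid=i.q_uid
		AND s.user_inst_id=i.user_inst_id
	);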
	SELECT [id] FROM [dbo].[tbl_submission] WHERE @@ROWCOUNT > 0 AND [id] = scope_identity();
END

The body of the INSTEAD OF trigger must return the id of the inserted row. This matters especially if you’re using an ORM such as Entity Framework, which may otherwise throw a concurrency-related exception.

You can also apply a UNIQUE constraint on selected columns to prevent duplicates and handle the insert exceptions in the API or in the trigger itself. In my case that was not an option, since the table already contained a few duplicate rows and one of the columns was VARCHAR(MAX), which cannot be a key column of a UNIQUE index.
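As a minimal sketch of the UNIQUE-index alternative, the script below uses a hypothetical temporary table, #submission_demo, whose column names follow the trigger above but whose types are assumptions. It shows a unique index on the two columns that define a duplicate, plus a TRY...CATCH block that ignores only the duplicate-key errors (2601 and 2627) and re-raises everything else.

```sql
-- Hypothetical demo table; column names follow the trigger above, types are assumed.
CREATE TABLE #submission_demo (
    id           INT IDENTITY(1,1) PRIMARY KEY,
    q_uid        INT          NOT NULL,
    user_inst_id INT          NOT NULL,
    [value]      VARCHAR(MAX) NULL   -- a (MAX) column cannot be an index key column
);

-- This statement fails if the table already holds duplicate (q_uid, user_inst_id) pairs.
CREATE UNIQUE INDEX UX_submission_demo ON #submission_demo (q_uid, user_inst_id);

INSERT INTO #submission_demo (q_uid, user_inst_id, [value]) VALUES (1, 100, 'first');

BEGIN TRY
    -- Same (q_uid, user_inst_id) pair again: rejected by the unique index.
    INSERT INTO #submission_demo (q_uid, user_inst_id, [value]) VALUES (1, 100, 'second');
END TRY
BEGIN CATCH
    -- 2601: duplicate key row in a unique index; 2627: unique or primary key constraint violation.
    IF ERROR_NUMBER() NOT IN (2601, 2627)
    BEGIN
        THROW;  -- re-raise anything that is not a duplicate-key error
    END;
    PRINT 'Duplicate submission ignored.';
END CATCH;

-- Only the row with [value] = 'first' remains.
SELECT id, q_uid, user_inst_id, [value] FROM #submission_demo;

DROP TABLE #submission_demo;
```

The index enforces uniqueness even under concurrent requests, so the caller only has to decide what to do with the rejected insert: ignore it, as here, or report it back to the API.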

Finding duplicates in a table by grouping columns in SQL Server

The GROUP BY clause is used in a SELECT query to group rows based on the values of certain columns.
It is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set.

To find duplicates in a SQL Server table, use the GROUP BY clause on the columns that define a duplicate, say name and email, in a userdetails table like the one below:

Id  NAME   EMAIL
1   John   John.Smith@abc.com
2   James  James.Connor@abc.com
3   Bob    Bob.Smith@abc.com
4   Robin  Robin.Sharma@abc.com
5   Bob    Bob.Smith@abc.com
6   John   John.Smith@abc.com

SELECT name, email, COUNT(*) AS cnt FROM userdetails
GROUP BY name, email
HAVING COUNT(*)>1

Output:

NAME  EMAIL               cnt
John  John.Smith@abc.com  2
Bob   Bob.Smith@abc.com   2

As of SQL Server 2017, SQL Server still requires every non-aggregated column in the SELECT list to appear in the GROUP BY clause.
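Because of that rule, a column such as Id cannot simply be added to the SELECT list of the duplicate query. The sketch below shows two common ways around it, using a hypothetical temporary table #userdetails loaded with the sample rows above (the column types are assumptions): wrapping the column in an aggregate, or counting per group with a window function so that each duplicate row keeps its own Id.

```sql
-- Hypothetical temp table holding the sample rows from the example above.
CREATE TABLE #userdetails (
    id    INT PRIMARY KEY,
    name  VARCHAR(50)  NOT NULL,
    email VARCHAR(100) NOT NULL
);

INSERT INTO #userdetails (id, name, email) VALUES
    (1, 'John',  'John.Smith@abc.com'),
    (2, 'James', 'James.Connor@abc.com'),
    (3, 'Bob',   'Bob.Smith@abc.com'),
    (4, 'Robin', 'Robin.Sharma@abc.com'),
    (5, 'Bob',   'Bob.Smith@abc.com'),
    (6, 'John',  'John.Smith@abc.com');

-- Option 1: id is not in the GROUP BY, so it has to be wrapped in an aggregate.
-- Returns (John, first_id 1, cnt 2) and (Bob, first_id 3, cnt 2).
SELECT name, email, MIN(id) AS first_id, COUNT(*) AS cnt
FROM #userdetails
GROUP BY name, email
HAVING COUNT(*) > 1;

-- Option 2: count per (name, email) group with a window function,
-- which keeps every duplicate row together with its own id.
-- Returns ids 3 and 5 (Bob), then 1 and 6 (John).
SELECT d.id, d.name, d.email
FROM (
    SELECT id, name, email,
           COUNT(*) OVER (PARTITION BY name, email) AS cnt
    FROM #userdetails
) AS d
WHERE d.cnt > 1
ORDER BY d.name, d.id;

DROP TABLE #userdetails;
```

The first form is enough when only one representative row per group is needed; the second is the more convenient starting point when the duplicate rows themselves have to be reviewed or cleaned up.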