Hi all,
I need to write a query to remove duplicate rows from my Customers
table based on the
Email
column. Can someone help me with this?
Home / DeveloperSection / Forums / Help with Writing a Query to Remove Duplicates from a Table in SQL Server
Hi all,
I need to write a query to remove duplicate rows from my Customers
table based on the
Email
column. Can someone help me with this?
Ravi Vishwakarma
16-Jul-2024To remove duplicates from a table in SQL Server, you typically need to identify the duplicates based on certain columns and then delete the extra rows while retaining one instance of each duplicate group. Here’s a step-by-step approach:
Step 1: Identify Duplicates
First, identify the duplicate rows based on the columns that define the duplicates. You can use a
ROW_NUMBER()
window function to assign a unique number to each row within a group of duplicates.Step 2: Delete Duplicates
Use a common table expression (CTE) or a subquery to delete rows where the row number is greater than 1, thereby retaining only one instance of each duplicate group.
Example
Assume you have a table named
MyTable
with columnsID
,Column1
, andColumn2
, and you want to remove duplicates based onColumn1
andColumn2
.Step 1: Identify Duplicates
This query assigns a row number to each row within each group of duplicates defined by
Column1
andColumn2
. Rows withRowNum
greater than 1 are considered duplicates.Example
Step 2: Delete Duplicates
Explanation:
WITH CTE AS (...)
: Defines a Common Table Expression (CTE) namedCTE
.ROW_NUMBER() OVER (PARTITION BY Column1, Column2 ORDER BY ID) AS RowNum
: Assigns a unique row number to each row within the partition defined byColumn1
andColumn2
. Rows are ordered byID
.DELETE FROM CTE WHERE RowNum > 1
: Deletes rows from the CTE whereRowNum
is greater than 1, effectively removing duplicates and keeping only the first occurrence of each group.Important Note:
Ensure that you have a backup of your data before performing delete operations, as this action cannot be undone.
If your table has a primary key or a unique identifier (like
ID
in this example), this approach works well. If not, you may need to adapt the query to suit your specific table schema.Read more
Write a query to n-th highest salary.
Optimize SQL Server for high-concurrency workloads?
Designing a normalized database schema in SQL Server
How to use SQL Server indexing to optimize query performance?
Explain the Dynamic SQL Query with examples in SQL Server.