blog.sqlauthority.com-sql-server-delete-duplicate-records-rows
Table in example has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
------
blog.sqlauthority.com-delete-duplicate-rows-Using CTE
/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO
---
Refer Source: How can I remove duplicate rows?
...
Another possible way of doing this is
;WITH ct.
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
ORDER BY ( SELECT 0)) RN
FROM #MyTable)
DELETE FROM cte
WHERE RN > 1
I am using ORDER BY (SELECT 0) above as it is arbitrary which row to preserve in the event of a tie.
To preserve the latest one in RowID order for example you could use ORDER BY RowID DESC
Refer Source:
SQL Query - Delete duplicates if more than 3 dups?
with cte as (
select row_number() over (partition by dupcol1, dupcol2 order by ID) as rn
from table)
delete from cte
where rn > 2; -- or >3 etc
The query is manufacturing a 'row number' for each record, grouped by the (dupcol1, dupcol2) and ordered by ID. In effect this row number counts 'duplicates' that have the same dupcol1 and dupcol2 and assigns then the number 1, 2, 3.. N, order by ID. If you want to keep just 2 'duplicates', then you need to delete those that were assigned the numbers
3,4,.. N
and that is the part taken care of by the DELLETE.. WHERE rn > 2;
Using this method you can change the
ORDER BY
to suit your preferred order (eg. ORDER BY ID DESC
), so that the LATEST
has rn=1
, then the next to latest is rn=2 and so on. The rest stays the same, the DELETE
will remove only the oldest ones as they have the highest row numbers.
Unlike this closely related question, as the condition becomes more complex, using CTEs and row_number() becomes simpler. Performance may be problematic still if no proper access index exists.
Refer Source: ROW_NUMBER (Transact-SQL)
Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
Refer Source:
Here is our final SQL statement, which achieves the business logic we wanted to implement.
SELECT
[PersonID]
[FamilyID]
,[FirstName]
,[LastName]
,[DateOfBirth]
,ROW_NUMBER() over(PARTITION BY FamilyID,
CONVERT(NVARCHAR(25), DateOfBirth, 111)
ORDER BY DateOfBirth ASC) TwinCode
FROM [People]
ORDER BY PersonID
IIn the
ROW_NUMBER
function above, I am doing several things. I’m grouping on FamilyID, and also grouping on a converted DateOfBirth. I convert the DateOfBirth to an nvarchar
using the 111 conversion code, because that gets results like ‘2009/10/11′ and ‘2009/10/12′ which can easily be grouped by to achieve distinct dates.------------------------------------------------------------------------------------------------------------
Reference:
What is CTE ( Common table expression) in SQL Server?
------------------------------------------------------------------------------------------------------------
No comments:
Post a Comment