Wednesday, 13 May 2015

How to filter duplicate rows (duplicate records by column)

Today my old project manager ask me for a favor to give him a SQL query which exclude duplicate rows.

let me direct come to the point and show you some pictorial presentation.

--Query--
select data.provider_id, data.person_id, data.rank, data.RowNumber from
 (
 select provider_id, person_id
 , Rank()
 over (
 PARTITION BY provider_id
 ORDER BY provider_id ) AS [rank],
 ROW_NUMBER() OVER (
 PARTITION BY provider_id
 ORDER BY provider_id DESC ) as RowNumber
 from testtble

 ) as data
 where data.RowNumber <= 5
 order by provider_id



you can see that in row 2 and 3 provider and person id's are repeated.
To avoid the duplicacy i have to first find the duplicate rows based on columns records.
So i have created below query to remove that duplicacy from result.

--Query--
;WITH CTE AS(
SELECT *,(RowsNumber - Rank_Number + 1) AS Rank_Item FROM
    ( SELECT PROVIDER_ID ,
                PERSON_ID ,
                ROW_NUMBER() OVER( ORDER BY provider_id) AS RowsNumber ,
                RANK() OVER( ORDER BY provider_id) AS Rank_Number
           FROM(
        SELECT
  PROVIDER_ID ,
  PERSON_ID ,
  COUNT(*) AS CNT
                   FROM
  TESTTBLE
                   GROUP BY
  PROVIDER_ID ,
                            PERSON_ID
               )A
       )B
     )SELECT * FROM CTE WHERE Rank_Item<=5
      ORDER BY PROVIDER_ID;


   

2 comments: