Delete duplicates based on two columns, keeping the row with the minimum value in another column

Updated: 2022-12-10 09:50:27

You can use this query to delete all duplicate entries, leaving the earliest one:

DELETE d
FROM discog d
JOIN discog d1 ON d1.artist = d.artist AND d1.track = d.track AND d1.year < d.year;
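
To try the idea without a MySQL server, here is a minimal sketch that runs the same logic against an in-memory SQLite database from Python. It is an adaptation, not the query above verbatim: SQLite has no multi-table `DELETE ... JOIN`, so the self-join is rewritten as a correlated `EXISTS`. The function name `dedupe_keep_earliest` and the sample rows are made up for illustration.

```python
import sqlite3


def dedupe_keep_earliest(rows):
    """Load (id, artist, track, year) rows, delete later duplicates, return survivors.

    Hypothetical helper for illustration; the table layout is assumed
    from the column names used in the answer.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE discog (id INTEGER, artist TEXT, track TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO discog VALUES (?, ?, ?, ?)", rows)

    # SQLite equivalent of the MySQL self-join delete: remove every row for
    # which another row with the same (artist, track) has a strictly lower year.
    conn.execute(
        """
        DELETE FROM discog
        WHERE EXISTS (
            SELECT 1
            FROM discog d1
            WHERE d1.artist = discog.artist
              AND d1.track = discog.track
              AND d1.year < discog.year
        )
        """
    )
    survivors = conn.execute(
        "SELECT id, artist, track, year FROM discog ORDER BY id"
    ).fetchall()
    conn.close()
    return survivors


# Made-up sample data: rows 2 and 5 tie on the earliest year for ('A', 'X').
sample = [
    (1, "A", "X", 2000),
    (2, "A", "X", 1995),
    (3, "A", "Y", 2001),
    (4, "B", "X", 1999),
    (5, "A", "X", 1995),
]
print(dedupe_keep_earliest(sample))
```

Because the comparison is a strict `<`, rows that tie on the earliest year (ids 2 and 5 in the sample) all survive. If exactly one row per (artist, track) is required, a tie-breaker such as the id would have to be added to the condition.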

Update

An alternative that should be more efficient for really large tables is to create a copy of the table with a UNIQUE index on (artist, track), so that duplicate rows are rejected on insert:

CREATE TABLE discog_copy (id INT, artist VARCHAR(50), track VARCHAR(50), year INT);
ALTER TABLE discog_copy ADD UNIQUE KEY (artist, track);
INSERT IGNORE INTO discog_copy SELECT * FROM discog ORDER BY year;

The unique key covers the combination of artist name and track name, so an artist can still have several tracks and different artists can share a track name. Because the SELECT part of the query has ORDER BY year, the (artist, track, year) row with the lowest year is inserted first; any later row with the same (artist, track) is then skipped by INSERT IGNORE because it would violate the unique key.
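
The copy approach can be sketched the same way against an in-memory SQLite database from Python. This is an adaptation under stated assumptions: SQLite spells MySQL's `INSERT IGNORE` as `INSERT OR IGNORE`, the unique constraint is declared inline rather than via `ALTER TABLE`, and `id` is added to the `ORDER BY` as a tie-breaker that the original query does not have. The function name `copy_keep_earliest` and the sample rows are hypothetical.

```python
import sqlite3


def copy_keep_earliest(rows):
    """Copy (id, artist, track, year) rows into a table with a unique
    (artist, track) key, earliest year first, and return the copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE discog (id INTEGER, artist TEXT, track TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO discog VALUES (?, ?, ?, ?)", rows)

    # The unique constraint rejects a second row for the same (artist, track).
    conn.execute(
        """
        CREATE TABLE discog_copy (
            id INTEGER, artist TEXT, track TEXT, year INTEGER,
            UNIQUE (artist, track)
        )
        """
    )
    # Rows arrive in ascending year order, so the earliest row claims the key
    # and later duplicates are silently skipped. Ordering by id as well is an
    # added assumption that makes ties on year deterministic.
    conn.execute(
        "INSERT OR IGNORE INTO discog_copy SELECT * FROM discog ORDER BY year, id"
    )
    kept = conn.execute(
        "SELECT id, artist, track, year FROM discog_copy ORDER BY id"
    ).fetchall()
    conn.close()
    return kept


# Made-up sample data: rows 2 and 5 tie on the earliest year for ('A', 'X').
sample = [
    (1, "A", "X", 2000),
    (2, "A", "X", 1995),
    (3, "A", "Y", 2001),
    (4, "B", "X", 1999),
    (5, "A", "X", 1995),
]
print(copy_keep_earliest(sample))
```

Since the unique key admits only one row per (artist, track), this variant leaves a single row even when several rows share the earliest year; which of the tied rows is kept depends on the insert order.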

Demo on rextester