且构网

Delete duplicate rows, leaving only the oldest row?

Updated: 2023-02-10 15:59:55

I have a table of data and there are many duplicate entries from user submissions.

I want to delete all duplicate rows based on the field subscriberEmail, leaving only the original submission.

In other words, I want to search for all duplicate emails, and delete those rows, leaving only the original.

How can I do this without swapping tables?
My table contains unique IDs for each row.

Since you're using the id column as an indicator of which record is 'original':

delete x
  from myTable x
  join myTable z on x.subscriberEmail = z.subscriberEmail
 where x.id > z.id

This will leave one record per email address.
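
If you want to watch this happen on throwaway data first, here is a minimal sketch using Python's built-in sqlite3 module. Assumptions: the sample rows, the example.com addresses and the function name are made up for illustration, and SQLite stands in for your real database. SQLite does not accept the `delete x from ... join` form, so the same self-join is wrapped in a subquery here; the matching logic is unchanged.

```python
import sqlite3


def dedupe_keep_lowest_id(rows):
    """Delete later duplicates by subscriberEmail; return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, subscriberEmail TEXT)"
    )
    conn.executemany(
        "INSERT INTO myTable (id, subscriberEmail) VALUES (?, ?)", rows
    )
    # Same self-join as the answer, expressed as a subquery because SQLite
    # has no DELETE ... JOIN syntax (an adaptation, not the original query).
    conn.execute(
        """
        DELETE FROM myTable
        WHERE id IN (
            SELECT x.id
            FROM myTable x
            JOIN myTable z ON x.subscriberEmail = z.subscriberEmail
            WHERE x.id > z.id
        )
        """
    )
    survivors = conn.execute(
        "SELECT id, subscriberEmail FROM myTable ORDER BY id"
    ).fetchall()
    conn.close()
    return survivors


# Hypothetical sample data: ids 3, 4 and 5 are later duplicates.
sample = [
    (1, "a@example.com"),
    (2, "b@example.com"),
    (3, "a@example.com"),
    (4, "a@example.com"),
    (5, "b@example.com"),
]
print(dedupe_keep_lowest_id(sample))
# [(1, 'a@example.com'), (2, 'b@example.com')]
```

Only the lowest id per address survives, which is the "original submission" in the question's terms.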

edit to add:

To explain the query above...

The idea here is to join the table against itself. Pretend that you have two copies of the table, each named something different. Then you could compare them to each other and find the lowest id for each email address. You'd then see the duplicate records that were created later on and could delete them. (I was visualizing Excel when thinking about this.)

To do that with a single table, comparing it to itself while still being able to tell the two sides apart, you use table aliases. x is a table alias. It is assigned in the from clause like so: from <table> <alias>. x can now be used elsewhere in the same query as a shorthand for that table.

delete x starts the query off with our action and target. We're going to perform a query to select records from multiple tables, and we want to delete records that appear in x.

Aliases are used to refer to both 'instances' of the table. from myTable x join myTable z on x.subscriberEmail = z.subscriberEmail bumps the table up against itself where the emails match. Without the where clause that follows, every record would be selected, because each record at least matches itself.
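
To make the "every record would be selected" point concrete, here is a small sketch that lists the (x.id, z.id) pairs the unfiltered self-join produces. It runs against an in-memory SQLite database via Python's sqlite3 module; the three sample rows and the function name are assumptions for illustration only.

```python
import sqlite3


def self_join_pairs(rows):
    """Return every (x.id, z.id) pair produced by the unfiltered self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, subscriberEmail TEXT)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    pairs = conn.execute(
        """
        SELECT x.id, z.id
        FROM myTable x
        JOIN myTable z ON x.subscriberEmail = z.subscriberEmail
        ORDER BY x.id, z.id
        """
    ).fetchall()
    conn.close()
    return pairs


# Hypothetical rows: id 3 duplicates the address of id 1.
pairs = self_join_pairs(
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")]
)
print(pairs)
# [(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)]
```

Every id shows up on the x side, including the non-duplicated id 2 (paired with itself), so deleting from x at this point would empty the table.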

The where clause limits the records that are selected. where x.id > z.id allows the 'instance' aliased x to contain only the records that match emails but have a higher id value. The data that you really want in the table, unique email addresses (with the lowest id) will not be part of x and will not be deleted. The only records in x will be duplicate records (email addresses) that have a higher id than the original record for that email address.
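
A cautious way to check this before deleting anything is to run the same join as a select and look at which ids land in x. The sketch below does that against in-memory SQLite using Python's sqlite3 module; the sample rows and function name are hypothetical, and `DISTINCT` is added only because a row with several earlier duplicates matches more than once.

```python
import sqlite3


def ids_flagged_for_delete(rows):
    """Return the ids that the filtered self-join would place in x."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, subscriberEmail TEXT)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    cursor = conn.execute(
        """
        SELECT DISTINCT x.id
        FROM myTable x
        JOIN myTable z ON x.subscriberEmail = z.subscriberEmail
        WHERE x.id > z.id
        ORDER BY x.id
        """
    )
    ids = [row[0] for row in cursor]
    conn.close()
    return ids


# Hypothetical sample data: ids 1 and 2 are the originals.
sample = [
    (1, "a@example.com"),
    (2, "b@example.com"),
    (3, "a@example.com"),
    (4, "a@example.com"),
    (5, "b@example.com"),
]
print(ids_flagged_for_delete(sample))
# [3, 4, 5]
```

The originals (ids 1 and 2) never appear, because no row with the same address has a lower id for them to be compared against.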

The join and where clauses could be combined in this case:

delete x 
  from myTable x 
  join myTable z
    on x.subscriberEmail = z.subscriberEmail
      and x.id > z.id
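
As a quick sanity check that moving the id comparison into the on clause changes nothing, the sketch below runs both variants as selects and compares the results. It uses in-memory SQLite through Python's sqlite3 module, with made-up sample rows and a made-up function name. The equivalence shown holds because this is an inner join; with an outer join, moving a condition between on and where can change the result.

```python
import sqlite3

# Hypothetical sample data: ids 3, 4 and 5 are later duplicates.
ROWS = [
    (1, "a@example.com"),
    (2, "b@example.com"),
    (3, "a@example.com"),
    (4, "a@example.com"),
    (5, "b@example.com"),
]


def flagged_ids(filter_in_on):
    """Return duplicate ids with the id test in ON (True) or WHERE (False)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, subscriberEmail TEXT)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)
    if filter_in_on:
        sql = """
            SELECT DISTINCT x.id
            FROM myTable x
            JOIN myTable z
              ON x.subscriberEmail = z.subscriberEmail
             AND x.id > z.id
            ORDER BY x.id
        """
    else:
        sql = """
            SELECT DISTINCT x.id
            FROM myTable x
            JOIN myTable z ON x.subscriberEmail = z.subscriberEmail
            WHERE x.id > z.id
            ORDER BY x.id
        """
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids


print(flagged_ids(True) == flagged_ids(False))
# True
```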

To prevent new duplicates once the table is cleaned up, consider putting a UNIQUE index on the subscriberEmail column.
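
Here is a rough sketch of what a unique index buys you, again on in-memory SQLite via Python's sqlite3 module. The index name `idx_subscriber_email`, the sample address and the function name are assumptions, and the exact DDL and error type will differ in your own database. The index has to be created after the existing duplicates are removed, otherwise creating it fails.

```python
import sqlite3


def try_insert_duplicate():
    """Insert the same address twice; return (was_rejected, row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, subscriberEmail TEXT)"
    )
    # Hypothetical index name; any name works.
    conn.execute(
        "CREATE UNIQUE INDEX idx_subscriber_email ON myTable (subscriberEmail)"
    )
    conn.execute(
        "INSERT INTO myTable (subscriberEmail) VALUES ('a@example.com')"
    )
    try:
        conn.execute(
            "INSERT INTO myTable (subscriberEmail) VALUES ('a@example.com')"
        )
        rejected = False
    except sqlite3.IntegrityError:
        # The database refuses the second row, so no cleanup is needed later.
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
    conn.close()
    return rejected, count


print(try_insert_duplicate())
# (True, 1)
```

The application then only has to handle the rejected insert, for example by telling the user the address is already subscribed.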