且构网

How to remove duplicates in a SPARQL query

Updated: 2023-02-13 15:59:32

Natan Cox's answer shows the typical way to exclude this kind of pseudo-duplicate. The results aren't actually duplicates, because in one row, e.g., George Burns is the ?actor, and in the other he is the ?person2. In many cases, you can add a filter that requires the two values to be ordered, which removes the duplicate cases. E.g., when you have data like:

:a :likes :b .
:a :likes :c .

and you search with

select ?x ?y where { 
  :a :likes ?x, ?y .
}

you can add filter(?x < ?y) to enforce an ordering between ?x and ?y, which will remove these pseudo-duplicates. However, in this case, it's a bit trickier, since ?actor and ?person2 aren't found using the same criteria. If DBpedia contains

:PersonB dbo:spouse :PersonA

but not

:PersonA dbo:spouse :PersonB

then the simple filter won't work, because you'll never find the triple where the subject PersonA is less than the object PersonB. So in this case, you also need to modify your query a bit to make the criteria symmetric:

select distinct ?actor ?spouse (count(?film) as ?count) {
  ?film dbo:starring ?actor, ?spouse .
  ?actor dbo:spouse|^dbo:spouse ?spouse .
  filter(?actor < ?spouse)
}
group by ?actor ?spouse
having (count(?film) > 9)
order by ?actor

(This query also shows that you don't need a subquery here; you can use having to "filter" on aggregate values.) But the important part is using the property path dbo:spouse|^dbo:spouse to find a value for ?spouse such that either ?actor dbo:spouse ?spouse or ?spouse dbo:spouse ?actor. This makes the relationship symmetric, so that you're guaranteed to get all the pairs, even if the relationship is only declared in one direction.
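
To see why the ordering filter alone can lose a pair, and why the symmetric path brings it back, here is a rough plain-Python model of the two queries. It is only a sketch, not how a SPARQL engine evaluates anything: the film data, the count_pairs helper and its more_than parameter are made up for illustration, and Python string comparison stands in for whatever ordering the endpoint applies to the terms.

```python
from collections import Counter
from itertools import product

# Toy data from the answer: :a :likes :b . :a :likes :c .
likes = {("a", "b"), ("a", "c")}

# `:a :likes ?x, ?y` binds ?x and ?y independently, i.e. a cross product.
liked = sorted(o for s, o in likes if s == "a")
all_rows = list(product(liked, repeat=2))
# filter(?x < ?y): keeps one row per unordered pair (and drops ?x = ?y rows).
ordered_rows = [(x, y) for x, y in all_rows if x < y]

# Hypothetical film data; the spouse triple is stated in one direction only.
starring = {
    ("film1", "PersonA"), ("film1", "PersonB"),
    ("film2", "PersonA"), ("film2", "PersonB"),
}
spouse = {("PersonB", "PersonA")}  # there is no (PersonA, PersonB) triple


def count_pairs(symmetric, more_than=0):
    """Count shared films per (actor, spouse) pair with actor < spouse."""
    counts = Counter()
    for film in sorted({f for f, _ in starring}):
        cast = [p for f, p in starring if f == film]
        for actor, other in product(cast, repeat=2):
            related = (actor, other) in spouse  # ?actor dbo:spouse ?spouse
            if symmetric:  # dbo:spouse|^dbo:spouse also checks the inverse
                related = related or (other, actor) in spouse
            if related and actor < other:  # filter(?actor < ?spouse)
                counts[(actor, other)] += 1
    # Plays the role of having (count(?film) > more_than).
    return {pair: n for pair, n in counts.items() if n > more_than}


print(all_rows)      # includes both (b, c) and (c, b)
print(ordered_rows)  # only (b, c) survives the ordering filter
print(count_pairs(symmetric=False))  # the pair is lost
print(count_pairs(symmetric=True))   # the pair is found once
```

With the one-directional spouse triple, the non-symmetric version matches only (PersonB, PersonA), which the ordering filter then rejects, so nothing is returned. The symmetric version also matches (PersonA, PersonB), which passes the filter and is counted once per film.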