且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

SQL Server 中的 INNER JOIN 与 LEFT JOIN 性能

更新时间:2022-11-30 17:49:24

LEFT JOIN 绝对不比 INNER JOIN 快.事实上,它更慢;根据定义,外连接(LEFT JOINRIGHT JOIN)必须完成 INNER JOIN 的所有工作以及 null 的额外工作- 扩展结果.预计还会返回更多行,从而进一步增加总执行时间,因为结果集较大.

A LEFT JOIN is absolutely not faster than an INNER JOIN. In fact, it's slower; by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.

(即使 LEFT JOIN 特定情况下由于一些难以想象的因素汇合而更快,在功能上不等同于 INNER JOIN,所以你不能简单地用另一个替换一个的所有实例!)

(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!)

很可能您的性能问题出在其他地方,例如没有正确索引候选键或外键.9 张桌子需要加入很多,所以几乎可以在任何地方放慢速度.如果您发布自己的架构,我们或许可以提供更多详细信息.

Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.

进一步思考这一点,我可以想到一种情况,在这种情况下,LEFT JOIN 可能比 INNER JOIN 更快,那就是:

Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:

  • 有些表格非常很小(比如不到 10 行);
  • 表没有足够的索引来覆盖查询.
  • Some of the tables are very small (say, under 10 rows);
  • The tables do not have sufficient indexes to cover the query.

考虑这个例子:

CREATE TABLE #Test1
(
    ID int NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')

CREATE TABLE #Test2
(
    ID int NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')

SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name

SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name

DROP TABLE #Test1
DROP TABLE #Test2

如果你运行这个并查看执行计划,你会看到INNER JOIN查询确实比LEFT JOIN花费更多,因为它满足两个以上标准.这是因为 SQL Server 想要对 INNER JOIN 进行哈希匹配,但对 LEFT JOIN 进行嵌套循环;前者通常要快得多,但是由于行数太少而且没有索引可以使用,因此哈希运算是整个过程中开销最大的部分.查询.

If you run this and view the execution plan, you'll see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN, but does nested loops for the LEFT JOIN; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.

您可以通过使用您最喜欢的编程语言编写程序来对具有 5 个元素的列表与具有 5 个元素的哈希表执行大量查找,从而看到相同的效果.由于大小,哈希表版本实际上更慢.但是将其增加到 50 个元素,或 5000 个元素,列表版本会慢到爬行,因为哈希表的 O(N) 与 O(1).

You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hashtable.

但是将此查询更改为在 ID 列而不是 Name 列上,您会看到一个非常不同的故事.在这种情况下,它会为两个查询执行嵌套循环,但是 INNER JOIN 版本能够用搜索替换聚集索引扫描之一 - 这意味着这实际上是 大量行数更快.

But change this query to be on the ID column instead of Name and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek - meaning that this will literally be an order of magnitude faster with a large number of rows.

所以结论或多或少是我上面几段提到的;这几乎可以肯定是索引或索引覆盖问题,可能与一个或多个非常小的表相结合.在这些情况下,SQL Server 可能有时会为 INNER JOIN 选择比 LEFT JOIN 更差的执行计划.

So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN.