且构网

Is SQL IN bad for performance?

Updated: 2023-02-17 15:09:42

Several considerations can affect performance when you write a query using the IN operator.

First, most databases internally rewrite an IN clause to use the OR logical connective, so col IN ('a','b','c') becomes (col = 'a') OR (col = 'b') OR (col = 'c'). Assuming you have an index on col, the execution plans for the two forms will likely be equivalent.
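To check the equivalence on a concrete engine, here is a minimal sketch using Python's built-in sqlite3 module. The table t, the index idx_t_col, and the sample data are made up for illustration, and SQLite stands in for whatever database you actually use; its planner output may differ from other engines.

```python
import sqlite3

# Hypothetical table and data, used only to compare the two query forms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.execute("CREATE INDEX idx_t_col ON t (col)")
conn.executemany("INSERT INTO t (col) VALUES (?)", [(c,) for c in "abcdeabc"])

in_sql = "SELECT id FROM t WHERE col IN ('a', 'b', 'c') ORDER BY id"
or_sql = "SELECT id FROM t WHERE col = 'a' OR col = 'b' OR col = 'c' ORDER BY id"

in_rows = conn.execute(in_sql).fetchall()
or_rows = conn.execute(or_sql).fetchall()
print("same rows:", in_rows == or_rows)

# Print the plan SQLite picks for each form so they can be compared by eye.
for sql in (in_sql, or_sql):
    print(sql)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row)
```

Matching results do not prove matching plans, so the plan output is the part worth reading on your own database.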

Second, when you use IN or OR with a variable number of arguments, the database has to re-parse the query and rebuild an execution plan each time the arguments change. Building the execution plan for a query can be an expensive step. Most databases cache the execution plans for the queries they run using the EXACT query text as the key, so if you execute a similar query with different literal values in the predicate, you will most likely cause the database to spend a significant amount of time parsing and planning again. This is why bind variables are strongly recommended as a way to ensure optimal query performance. Bind variables keep the query text stable when only the values change; a different number of arguments still produces a different query text.
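One way to keep the query text stable is sketched below, again with sqlite3 and a made-up table t. The values are passed as bind parameters, and the list is padded to a small set of fixed sizes by repeating its last value, so that lists of different lengths share the same SQL text. The padding trick and the bucket sizes are assumptions added for this example, not something the answer prescribes; repeating a value inside IN does not change the result.

```python
import sqlite3

# Hypothetical bucket sizes; tune them to the list lengths you actually see.
BUCKETS = (1, 4, 16, 64)

def pad_values(values, buckets=BUCKETS):
    """Pad a non-empty list to the next bucket size by repeating its last value."""
    if not values:
        raise ValueError("values must not be empty")
    if len(values) > buckets[-1]:
        raise ValueError("too many values for the largest bucket")
    size = next(b for b in buckets if b >= len(values))
    return list(values) + [values[-1]] * (size - len(values))

def in_query(n_values):
    """Build the query text; it depends only on the number of placeholders."""
    placeholders = ", ".join("?" * n_values)
    return f"SELECT id FROM t WHERE col IN ({placeholders}) ORDER BY id"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO t (col) VALUES (?)", [(c,) for c in "abcde"])

first = pad_values(["a", "b"])        # 2 values padded to 4
second = pad_values(["c", "d", "e"])  # 3 values padded to 4

# Both calls send the same SQL text; only the bound values differ.
same_text = in_query(len(first)) == in_query(len(second))
first_ids = [r[0] for r in conn.execute(in_query(len(first)), first)]
second_ids = [r[0] for r in conn.execute(in_query(len(second)), second)]
print(same_text, first_ids, second_ids)
```

Whether the server actually reuses a cached plan for the shared text depends on the database and driver, so treat this as a way to reduce distinct query texts rather than a guarantee.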

Third, many databases limit the complexity of the queries they will execute, and one of those limits is the number of logical connectives that can be included in a predicate. In your case, a few dozen values are unlikely to reach the database's built-in limit, but if you expect to pass hundreds or thousands of values to an IN clause, it can definitely happen, in which case the database will simply reject the query.
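If a long value list is unavoidable, one common workaround is to split it into batches that stay under the limit and merge the results. The sketch below assumes a made-up table t and a hypothetical chunk_size of 500, chosen to sit below the 999 bind-parameter default of older SQLite builds; the right number for another database will differ. It also assumes the values are distinct, otherwise a row could be returned by more than one batch.

```python
import sqlite3

def select_in_chunks(conn, values, chunk_size=500):
    """Run the IN query once per batch of values and merge the matching ids."""
    ids = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        placeholders = ", ".join("?" * len(chunk))
        sql = f"SELECT id FROM t WHERE col IN ({placeholders})"
        ids.extend(row[0] for row in conn.execute(sql, chunk))
    return sorted(ids)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.execute("CREATE INDEX idx_t_col ON t (col)")
conn.executemany(
    "INSERT INTO t (col) VALUES (?)", [(f"v{i}",) for i in range(5000)]
)

# 2500 distinct values: too many for one statement on some builds and engines.
wanted = [f"v{i}" for i in range(0, 5000, 2)]
ids = select_in_chunks(conn, wanted)
print(len(ids))
```

Loading the values into a temporary table and joining against it is another option when the list runs into the thousands.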

Fourth, queries that include IN and OR in the predicate cannot always be optimally rewritten in a parallel environment. There are various cases where parallel server optimizations are not applied; MSDN has a decent introduction to optimizing queries for parallelism. Generally, though, queries that use the UNION ALL operator are trivially parallelizable in most databases, and are preferred to logical connectives (like OR and IN) when possible.
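The UNION ALL rewrite can be sketched as follows, using the same made-up table t. SQLite does not run these branches in parallel, so the example only shows that the rewritten query returns the same rows; any speedup has to be measured on a database that parallelizes UNION ALL. The rewrite is safe here because the branches test different values of the same column and so cannot overlap; with overlapping branches UNION ALL would return duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.execute("CREATE INDEX idx_t_col ON t (col)")
conn.executemany("INSERT INTO t (col) VALUES (?)", [(c,) for c in "abcdeabc"])

or_sql = "SELECT id FROM t WHERE col = 'a' OR col = 'b' OR col = 'c'"

# One branch per value; each branch is a simple equality lookup on the index.
union_sql = (
    "SELECT id FROM t WHERE col = 'a' "
    "UNION ALL SELECT id FROM t WHERE col = 'b' "
    "UNION ALL SELECT id FROM t WHERE col = 'c'"
)

or_ids = sorted(r[0] for r in conn.execute(or_sql))
union_ids = sorted(r[0] for r in conn.execute(union_sql))
print(or_ids == union_ids)
```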