且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

SQLite是否在WHERE子句中使用多个AND条件优化查询?

更新时间:2023-10-23 13:36:46

(基于评论和后续测试的更新答案.)

您问题的实际答案

如何确保如果我们有100万行,则对查询...进行优化,以便仅在易于测试的第二个条件已经为True时才测试第一个条件(CPU昂贵)? /p>

how to make sure that, if we have 1 million rows, the query ... is optimized so that the 1st condition (CPU-expensive) is only tested if the easy-to-test second condition is already True?

取决于

  • WHERE子句中的实际条件,和
  • SQLite查询优化器在估算这些条件的成本方面有多聪明.

一个简单的测试应该告诉您查询是否可以充分优化"以满足您的需求.好消息是,至少在某些情况下,SQLite 将首先执行轻松(廉价)条件.

A simple test should tell you whether your query would be sufficiently "optimized" for your needs. The good news is that SQLite will perform the easy (inexpensive) condition first, at least under certain circumstances.

对于测试表"mytable"

For a test table "mytable"

CREATE TABLE mytable (
    description TEXT(50) NOT NULL,
    column2 INTEGER NOT NULL,
    CONSTRAINT mytable_PK PRIMARY KEY (column2)
);

包含一百万行

description  column2
-----------  -------
row000000          0
row000001          1
row000002          2
...
row999999     999999

Python测试代码

the Python test code

import sqlite3
import time

log_file_spec = r'C:\Users\Gord\Desktop\log_file.txt'

def myfunc(thing):
    with open(log_file_spec, 'a') as log:
        log.write('HODOR\n')
    return(int(thing[-6:]))


with open(log_file_spec, 'w'):
    pass  # just empty the file
cnxn = sqlite3.connect(r'C:\__tmp\SQLite\test.sqlite')
cnxn.create_function("myfunction", 1, myfunc)
crsr = cnxn.cursor()
t0 = time.time()
sql = """\
SELECT COUNT(*) AS n FROM mytable
WHERE myfunction(description) < 500 AND column2 < 1000
"""
crsr.execute(sql)
num_rows = crsr.fetchone()[0]
print(f"{num_rows} rows found in {(time.time() - t0):.1f} seconds")

cnxn.close()

返回

500 rows found in 1.2 seconds

计算我们看到的log_file.txt中的行

and counting the lines in log_file.txt we see

C:\Users\Gord>find /C "HODOR" Desktop\log_file.txt

---------- DESKTOP\LOG_FILE.TXT: 1000

表示我们的函数仅被调用了1000次,而不是一百万次. SQLite显然已经首先应用了column2 < 1000,然后在第一个条件之后的行子集上应用了myfunction(description) < 500条件.

indicating that our function was only called one thousand times, not one million times. SQLite has clearly applied the column2 < 1000 first, and then applied the myfunction(description) < 500 condition on the subset of rows from the first condition.


(原始的袖手旁观"答案.)

问题的实际答案取决于查询优化器的智能程度.一个简单的测试应该告诉您查询是否可以充分优化"以满足您的需求.

The actual answer to your question depends on how clever the query optimizer is. A simple test should tell you whether your query would be sufficiently "optimized" for your needs.

但是,如果您的测试发现您的原始方法太慢,则您有两种选择:

However, you do have a couple of options if your tests find that your original approach is too slow:

选项1:尝试简单比较第一"

更改订单可能会影响查询计划,例如

Changing the order might affect the query plan, e.g.

... WHERE <easy_condition> AND <expensive_condition>

可能会比

... WHERE <expensive_condition> AND <easy_condition> 

选项2:尝试使用子查询强制执行订单

再次,取决于查询优化器的聪明程度

Again, depending on the cleverness of the query optimizer

SELECT easy.* 
FROM 
    (SELECT * FROM mytable WHERE column2 < 1000) easy
WHERE myfunction(easy.description) < 500

可能首先应用便宜的条件,然后对结果的行子集应用昂贵的条件. (但是,有评论指出,SQLite太复杂了,无法接受这种策略.)

might apply the inexpensive condition first, then apply the expensive condition on the resulting subset of rows. (However, a comment indicates that SQLite is too sophisticated to fall for that ploy.)