且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用SQLAlchemy在SQLite上创建全文搜索索引并进行查询?

更新时间:2023-01-21 23:31:59

FTS5提供了支持全文本搜索的虚拟表.换句话说,您不能在现有表中的列上创建全文索引.相反,您可以创建FTS5虚拟表,并从原始表中复制相关数据以建立索引.为了避免两次存储相同的数据,您可以将其设置为外部内容表,仍然必须确保手动或与触发器保持FTS5表保持同步.

FTS5 provides virtual tables that support full-text search. In other words you cannot create a full-text index on a column in an existing table. Instead you can create an FTS5 virtual table and copy relevant data from your original table over for indexing. In order to avoid storing the same data twice you can make it an external content table, though you will still have to make sure that the FTS5 table is kept in sync, either manually or with triggers.

您可以创建一个通用的自定义DDL构造,该构造将处理创建镜像另一个表的FTS5虚拟表:

You could create a generic custom DDL construct that'd handle creating a FTS5 virtual table that mirrors another table:

class CreateFtsTable(DDLElement):
    """Represents a CREATE VIRTUAL TABLE ... USING fts5 statement, for indexing
    a given table.

    """

    def __init__(self, table, version=5):
        self.table = table
        self.version = version


@compiles(CreateFtsTable)
def compile_create_fts_table(element, compiler, **kw):
    """
    """
    tbl = element.table
    version = element.version
    preparer = compiler.preparer
    sql_compiler = compiler.sql_compiler

    tbl_name = preparer.format_table(tbl)
    vtbl_name = preparer.quote(tbl.name + "_idx")

    text = "\nCREATE VIRTUAL TABLE "
    text += vtbl_name + " "
    text += "USING fts" + str(version) + "("

    separator = "\n"

    pk_column, = tbl.primary_key
    columns = [col for col in tbl.columns if col is not pk_column]

    for column in columns:
        text += separator
        separator = ", \n"
        text += "\t" + preparer.format_column(column)

        if not isinstance(column.type, String):
            text += " UNINDEXED"

    text += separator
    text += "\tcontent=" + sql_compiler.render_literal_value(
            tbl.name, String())

    text += separator
    text += "\tcontent_rowid=" + sql_compiler.render_literal_value(
            pk_column.name, String())

    text += "\n)\n\n"
    return text

给定的实现有点天真,默认情况下索引所有文本列.通过在原始表名之后添加 _idx 来隐式命名创建的虚拟表.

The given implementation is a bit naive and indexes all text columns by default. The created virtual table is implicitly named by adding _idx after the original table name.

但是,仅凭这还不够,如果您想使表与触发器保持自动化,并且由于您只为一个表添加索引,则可以选择在迁移脚本中使用文本DDL构造:

But that alone is not enough, if you want to automate keeping the tables in sync with triggers, and since you're adding an index for just one table, you could just opt to use text DDL constructs in your migration script:

def upgrade():
    ddl = [
        """
        CREATE VIRTUAL TABLE person_idx USING fts5(
            name,
            thumb UNINDEXED,
            content='person',
            content_rowid='id'
        )
        """,
        """
        CREATE TRIGGER person_ai AFTER INSERT ON person BEGIN
            INSERT INTO person_idx (rowid, name, thumb)
            VALUES (new.id, new.name, new.thumb);
        END
        """,
        """
        CREATE TRIGGER person_ad AFTER DELETE ON person BEGIN
            INSERT INTO person_idx (person_idx, rowid, name, thumb)
            VALUES ('delete', old.id, old.name, old.thumb);
        END
        """,
        """
        CREATE TRIGGER person_au AFTER UPDATE ON person BEGIN
            INSERT INTO person_idx (person_idx, rowid, name, thumb)
            VALUES ('delete', old.id, old.name, old.thumb);
            INSERT INTO person_idx (rowid, name, thumb)
            VALUES (new.id, new.name, new.thumb);
        END
        """
    ]

    for stmt in ddl:
        op.execute(sa.DDL(stmt))

如果您的人员表包含现有数据,请记住也将其插入到创建的虚拟表中以进行索引.

If your person table contains existing data, remember to insert those to the created virtual table as well for indexing.

为了实际使用创建的虚拟表,您可以创建一个

In order to actually use the created virtual table you could create a non-primary mapper for Person:

person_idx = db.Table('person_idx', db.metadata,
                      db.Column('rowid', db.Integer(), primary_key=True),
                      db.Column('name', db.Text()),
                      db.Column('thumb', db.Text()))

PersonIdx = db.mapper(
    Person, person_idx, non_primary=True,
    properties={
        'id': person_idx.c.rowid
    }
)

并使用例如MATCH进行全文查询:

And to make a full-text query using for example MATCH:

db.session.query(PersonIdx).\
    filter(PersonIdx.c.name.op("MATCH")("john")).\
    all()

请注意,结果是 Person 对象的列表. PersonIdx 只是 映射器 .

Note that the result is a list of Person objects. PersonIdx is just a Mapper.

如Victor K所述,不推荐使用非主要映射器,而新的替代方法是使用 aliased() .设置基本相同,但是使用以下命令创建 person_idx Table 时,需要进行 rowid id 的映射 Column key 参数:

As noted by Victor K. the use of non-primary mappers is deprecated and the new alternative is to use aliased(). The setup is mostly the same, but the rowid to id mapping needs to take place when creating person_idx Table using the key parameter of Column:

person_idx = db.Table('person_idx', db.metadata,
                      db.Column('rowid', db.Integer(), key='id', primary_key=True),
                      db.Column('name', db.Text()),
                      db.Column('thumb', db.Text()))

,而不是新的映射器,创建别名:

and instead of a new mapper create the alias:

PersonIdx = db.aliased(Person, person_idx, adapt_on_names=True)

别名的工作方式与映射类更相似,因为您不通过 .c 而是直接访问

The alias works more like the mapped class in that you do not access mapped attributes through .c, but directly:

db.session.query(PersonIdx).\
    filter(PersonIdx.name.op("MATCH")("john")).\
    all()