且构网

What is the proper index for querying array structures in Postgres jsonb?

Updated: 2022-11-25 16:52:38

First of all, you cannot access JSON array values like that. For a given JSON value:

[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
 {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
 {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]

A valid test against the first array element would be:

WHERE e->0->>'event_slug' = 'test_1'
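
To try this out, a small test table helps. The following is a minimal sketch, not the asker's actual schema: it assumes a table locations with a key location_id and a jsonb column events (names borrowed from the queries further down), and the column types are guesses.

```sql
-- Assumed schema; only the names locations, location_id and events
-- appear in the answer, the types are illustrative.
CREATE TABLE locations (
   location_id serial PRIMARY KEY
 , events      jsonb
);

INSERT INTO locations (events) VALUES
  ('[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
     {"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
     {"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]');

-- "-> 0" picks the first array element, "->>" extracts a key as text.
SELECT location_id
FROM   locations
WHERE  events->0->>'event_slug' = 'test_1';
```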

But you probably don't want to limit your search to the first element of the array. With the jsonb data type in Postgres 9.4 you have additional operators and index support. To index elements of an array you need a GIN index.

The built-in operator classes for GIN indexes do not support "greater than" or "less than" operators > >= < <=. This is true for jsonb as well, where you can choose between two operator classes. Per documentation:

Name             Indexed Data Type  Indexable Operators
...
jsonb_ops        jsonb              ? ?& ?| @>
jsonb_path_ops   jsonb              @>

(jsonb_ops being the default.) You can cover the equality test, but neither of those operators covers your requirement for >= comparison. You would need a btree index.

To support the equality check with an index:

CREATE INDEX locations_events_gin_idx ON locations
USING gin (events jsonb_path_ops);

SELECT * FROM locations WHERE events @> '[{"event_slug":"test_1"}]';
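
The shape of the right-hand operand matters for @>. As a quick illustration with literal values (no table needed), an array on the left contains an array on the right, but a bare object on the right does not match a top-level array:

```sql
-- Array contains array: every right-hand element must be contained
-- in some left-hand element.
SELECT '[{"event_slug":"test_1","end_time":"2014-10-12"}]'::jsonb
       @> '[{"event_slug":"test_1"}]'::jsonb;   -- true

-- A bare object on the right does not match a top-level array on the left.
SELECT '[{"event_slug":"test_1","end_time":"2014-10-12"}]'::jsonb
       @> '{"event_slug":"test_1"}'::jsonb;     -- false
```

So the search value has to be wrapped in an array, as in the query above.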

This might be good enough if the filter is selective enough.

Assuming end_time >= start_time, we don't need two checks. Checking only end_time is cheaper and equivalent:

SELECT l.*
FROM   locations l
     , jsonb_array_elements(l.events) e
WHERE  l.events @> '[{"event_slug":"test_1"}]'
AND   (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;

Utilizing an implicit JOIN LATERAL. Details (last chapter):

  • PostgreSQL unnest() with element number
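
For readers who prefer the join spelled out, the comma-separated function call can be written as an explicit CROSS JOIN LATERAL. This is only a sketch: the column alias elem is made up for illustration, and the extra predicate on event_slug is an addition that assumes the time check should apply to the same array element that carries the slug.

```sql
SELECT l.*
FROM   locations l
CROSS  JOIN LATERAL jsonb_array_elements(l.events) AS e(elem)  -- elem: illustrative alias
WHERE  l.events @> '[{"event_slug":"test_1"}]'    -- can use the GIN index
AND    e.elem->>'event_slug' = 'test_1'           -- assumption: same element must match
AND   (e.elem->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;
```

The containment test narrows down candidate rows through the index; the remaining predicates are then evaluated per array element.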

Careful with the different data types! What you have in the JSON value looks like timestamp [without time zone], while your predicates use timestamp with time zone literals. The timestamp value is interpreted according to the current time zone setting, while the given timestamptz literals must be cast to timestamptz explicitly or the time zone would be ignored! The above query should work as desired. Detailed explanation:

  • Ignoring timezones altogether in Rails and PostgreSQL
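
The difference between the two casts can be seen directly. The sketch below assumes a session time zone of UTC purely for illustration; results of the last two statements depend on that setting.

```sql
SET timezone = 'UTC';  -- assumed session setting, for illustration only

-- Cast to timestamp: the offset in the literal is silently ignored.
SELECT '2014-10-30 14:04:06 -0400'::timestamp;    -- 2014-10-30 14:04:06

-- Cast to timestamptz: the offset is applied.
SELECT '2014-10-30 14:04:06 -0400'::timestamptz;  -- 2014-10-30 18:04:06+00

-- Mixed comparison: the timestamp side is interpreted in the session time zone.
SELECT '2014-10-12'::timestamp
       >= '2014-10-30 14:04:06 -0400'::timestamptz;  -- false
```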

More explanation for jsonb_array_elements():

  • PostgreSQL joining using JSONB

If the above is not good enough, I would consider a MATERIALIZED VIEW that stores relevant attributes in normalized form. This allows plain btree indexes.

The code assumes that your JSON values have a consistent format as displayed in the question.

Setup:

CREATE TYPE event_type AS (
   event_slug  text
 , start_time  timestamp
 , end_time    timestamp
);

CREATE MATERIALIZED VIEW loc_event AS
SELECT l.location_id, e.event_slug, e.end_time  -- start_time not needed
FROM   locations l, jsonb_populate_recordset(null::event_type, l.events) e;

Related answer for jsonb_populate_recordset():

  • How to convert PostgreSQL 9.4's jsonb type to float

CREATE INDEX loc_event_idx ON loc_event (event_slug, end_time, location_id);

location_id is included as well to allow index-only scans. (See the manual page and the Postgres Wiki.)
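
Keep in mind that a materialized view is a snapshot: it does not follow later changes in locations on its own. A minimal sketch of refreshing it and then inspecting the plan follows; whether the planner actually picks an index-only scan depends on table size and statistics, so no particular plan is promised here.

```sql
-- Re-run the view's defining query and replace its contents.
REFRESH MATERIALIZED VIEW loc_event;

-- Inspect the plan; only indexed columns are referenced,
-- which is the precondition for an index-only scan.
EXPLAIN
SELECT location_id
FROM   loc_event
WHERE  event_slug = 'test_1'
AND    end_time  >= '2014-10-30 14:04:06 -0400'::timestamptz;
```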

Query:

SELECT *
FROM   loc_event
WHERE  event_slug = 'test_1'
AND    end_time  >= '2014-10-30 14:04:06 -0400'::timestamptz;

Or, if you need full rows from the underlying locations table:

SELECT l.*
FROM  (
   SELECT DISTINCT location_id
   FROM   loc_event
   WHERE  event_slug = 'test_1'
   AND    end_time  >= '2014-10-30 14:04:06 -0400'::timestamptz
   ) le
JOIN locations l USING (location_id);
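
An alternative formulation, offered as a sketch rather than as part of the original answer, uses an EXISTS semi-join instead of the DISTINCT subquery. It returns each matching location once without a separate de-duplication step; which variant is faster would need testing on real data.

```sql
SELECT l.*
FROM   locations l
WHERE  EXISTS (
   SELECT 1
   FROM   loc_event le
   WHERE  le.location_id = l.location_id
   AND    le.event_slug  = 'test_1'
   AND    le.end_time   >= '2014-10-30 14:04:06 -0400'::timestamptz
   );
```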