且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何优化查询以计算与行相关的日期时间关系?

更新时间:2023-02-27 14:38:08

首先,如果参照完整性是通过FK约束实施的,则可以从查询中完全删除 Patient 表:

First, if referential integrity is enforced with FK constraints, you can drop the patient table from the query completely:

SELECT COUNT(DISTINCT patient)  -- still not optimal
FROM   event a
JOIN   event o USING (patient_id)
JOIN   event m USING (patient_id)
WHERE  a.category = 'admission'
AND    o.category = 'operation'
AND    m.category = 'medication'
AND    m.date > o.date
AND    o.date > a.date;

接下来,使用DISTINCT 来解决外部 SELECT 中的问题.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-EXISTS"rel =" nofollow noreferrer> EXISTS 改为半联接:

Next, get rid of the repeated multiplication of rows and the DISTINCT to counter that in the outer SELECT by using EXISTS semi-joins instead:

SELECT COUNT(*)
FROM   event a
WHERE  EXISTS (
   SELECT FROM event o
   WHERE  o.patient_id = a.patient_id
   AND    o.category = 'operation'
   AND    o.date > a.date
   AND    EXISTS (
      SELECT FROM event m
      WHERE  m.patient_id = a.patient_id
      AND    m.category = 'medication'
      AND    m.date > o.date
      )
   )
AND    a.category = 'admission';

请注意,录取中仍然存在重复项,但这可能是您的数据模型/查询设计中的主要问题,需要进行注释中所述的澄清.

Note, there can still be duplicates in the admission, but that's probably a principal problem in your data model / query design, and would need clarification as discussed in the comments.

如果您确实出于某种原因想要将同一例患者的所有病例都合并在一起,则可以采用多种方法在初始步骤中尽早让每位患者尽早入院,并重复类似的方法每增加一个步骤.针对您的情况可能最快(将患者表重新引入查询中):

If you indeed want to lump all cases of the same patient together for some reason, there are various ways to get the earliest admission for each patient in the initial step - and repeat a similar approach for every additional step. Probably fastest for your case (re-introducing the patient table to the query):

SELECT count(*)
FROM   patient p
CROSS  JOIN LATERAL ( -- get earliest admission
   SELECT e.date
   FROM   event e
   WHERE  e.patient_id = p.id 
   AND    e.category = 'admission'
   ORDER  BY e.date
   LIMIT  1
   ) a
CROSS  JOIN LATERAL ( -- get earliest operation after that
   SELECT e.date
   FROM   event e
   WHERE  e.patient_id = p.id 
   AND    e.category = 'operation'
   AND    e.date > a.date
   ORDER  BY e.date
   LIMIT  1
   ) o
WHERE EXISTS (  -- the *last* step can still be a plain EXISTS
      SELECT FROM event m
      WHERE  m.patient_id = p.id
      AND    m.category = 'medication'
      AND    m.date > o.date
      );

请参阅:

您可以通过缩短冗长的(和多余的)类别名称来优化表设计.使用查找表并仅存储整数(甚至将 int2 "char" 值存储为FK.)

You might optimize your table design by shortening the lengthy (and redundant) category names. Use a lookup table and only store an integer (or even int2 or "char" value as FK.)

为了获得***性能(这很关键),在(parent_id,类别,日期DESC)上使用多列索引,并确保定义了所有三列不为空.索引表达式的顺序很重要. DESC 在这里大部分是可选的.在您的情况下,Postgres可以按默认的 ASC 排序顺序使用索引,效果几乎一样有效.

For best performance (and this is crucial) have a multicolumn index on (parent_id, category, date DESC) and make sure all three columns are defined NOT NULL. The order of index expressions is important. DESC is mostly optional here. Postgres can use the index with default ASC sort order almost as efficiently in your case.

如果 VACUUM (***以autovacuum的形式)可以跟上写操作,或者您开始​​时处于只读情况,那么您会非常快地

If VACUUM (preferably in the form of autovacuum) can keep up with write operations or you have a read-only situation to begin with, you'll get very fast index-only scans out of this.

相关: