更新时间:2023-02-27 14:38:08
首先,如果参照完整性是通过FK约束实施的,则可以从查询中完全删除 Patient
表:
First, if referential integrity is enforced with FK constraints, you can drop the patient
table from the query completely:
SELECT COUNT(DISTINCT patient) -- still not optimal
FROM event a
JOIN event o USING (patient_id)
JOIN event m USING (patient_id)
WHERE a.category = 'admission'
AND o.category = 'operation'
AND m.category = 'medication'
AND m.date > o.date
AND o.date > a.date;
接下来,使用 SELECT
中的问题.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-EXISTS"rel =" nofollow noreferrer> EXISTS
改为半联接:
Next, get rid of the repeated multiplication of rows and the DISTINCT
to counter that in the outer SELECT
by using EXISTS
semi-joins instead:
SELECT COUNT(*)
FROM event a
WHERE EXISTS (
SELECT FROM event o
WHERE o.patient_id = a.patient_id
AND o.category = 'operation'
AND o.date > a.date
AND EXISTS (
SELECT FROM event m
WHERE m.patient_id = a.patient_id
AND m.category = 'medication'
AND m.date > o.date
)
)
AND a.category = 'admission';
请注意,录取中仍然存在重复项,但这可能是您的数据模型/查询设计中的主要问题,需要进行注释中所述的澄清.
Note, there can still be duplicates in the admission, but that's probably a principal problem in your data model / query design, and would need clarification as discussed in the comments.
如果您确实出于某种原因想要将同一例患者的所有病例都合并在一起,则可以采用多种方法在初始步骤中尽早让每位患者尽早入院,并重复类似的方法每增加一个步骤.针对您的情况可能最快(将患者表重新引入查询中):
If you indeed want to lump all cases of the same patient together for some reason, there are various ways to get the earliest admission for each patient in the initial step - and repeat a similar approach for every additional step. Probably fastest for your case (re-introducing the patient table to the query):
SELECT count(*)
FROM patient p
CROSS JOIN LATERAL ( -- get earliest admission
SELECT e.date
FROM event e
WHERE e.patient_id = p.id
AND e.category = 'admission'
ORDER BY e.date
LIMIT 1
) a
CROSS JOIN LATERAL ( -- get earliest operation after that
SELECT e.date
FROM event e
WHERE e.patient_id = p.id
AND e.category = 'operation'
AND e.date > a.date
ORDER BY e.date
LIMIT 1
) o
WHERE EXISTS ( -- the *last* step can still be a plain EXISTS
SELECT FROM event m
WHERE m.patient_id = p.id
AND m.category = 'medication'
AND m.date > o.date
);
请参阅:
您可以通过缩短冗长的(和多余的)类别名称来优化表设计.使用查找表并仅存储整数
(甚至将 int2
或"char"
值存储为FK.)
You might optimize your table design by shortening the lengthy (and redundant) category names. Use a lookup table and only store an integer
(or even int2
or "char"
value as FK.)
为了获得***性能(这很关键),在(parent_id,类别,日期DESC)
上使用多列索引,并确保定义了所有三列不为空
.索引表达式的顺序很重要. DESC
在这里大部分是可选的.在您的情况下,Postgres可以按默认的 ASC
排序顺序使用索引,效果几乎一样有效.
For best performance (and this is crucial) have a multicolumn index on (parent_id, category, date DESC)
and make sure all three columns are defined NOT NULL
. The order of index expressions is important. DESC
is mostly optional here. Postgres can use the index with default ASC
sort order almost as efficiently in your case.
如果 VACUUM
(***以autovacuum的形式)可以跟上写操作,或者您开始时处于只读情况,那么您会非常快地
If VACUUM
(preferably in the form of autovacuum) can keep up with write operations or you have a read-only situation to begin with, you'll get very fast index-only scans out of this.
相关:
To implement your additional time frames (your "advanced use case"), build on the second query since we have to consider all events again.
在相关情况下,您应该确实具有病例ID或更明确的名称,以将手术与入院和药物与手术等联系起来.(可能只是引用事件的 id
!)仅日期/时间戳容易出错.
You should really have case IDs or something more definitive to tie operation to admission and medication to operation etc. where relevant. (Could simply be the id
of the referenced event!) Dates / timestamps alone are error-prone.
SELECT COUNT(*) -- to count cases
-- COUNT(DISTINCT patient_id) -- to count patients
FROM event a
WHERE EXISTS (
SELECT FROM event o
WHERE o.patient_id = a.patient_id
AND o.category = 'operation'
AND o.date >= a.date -- or ">"
AND o.date < a.date + 7 -- based on data type "date"!
AND EXISTS (
SELECT FROM event m
WHERE m.patient_id = a.patient_id
AND m.category = 'medication'
AND m.date >= o.date -- or ">"
AND m.date < o.date + 30 -- syntax for timestamp is different
)
)
AND a.category = 'admission';
关于日期
/时间戳
算术: