且构网

Speeding up a PostgreSQL query (checking whether entries exist in another table)

Updated: 2022-05-10 23:35:20

This is your query:

SELECT "genomic_accession", "assembly", "product_accession", "tmpcol",
       ("product_accession" IN ( SELECT product_accession FROM "cacheDB" ) OR
        "tmpcol" IN ( SELECT product_accession FROM "cacheDB")
       ) AS "CACHE",
       ("product_accession" IN ( SELECT product_accession FROM "SBPDB" ) OR
        "tmpcol" IN ( SELECT product_accession FROM "SBPDB" )
       ) AS "SBP"
FROM (SELECT * FROM "pairTable2" LIMIT 500000) "dbplyr_031";

I would get rid of all the double quotes: don't create column names or table names that need to be escaped. Beyond that, EXISTS with the right indexes often performs better than IN with a subquery:

SELECT "genomic_accession", "assembly", "product_accession", "tmpcol",
       (EXISTS (SELECT 1
                FROM "cacheDB" c
                WHERE c.product_accession IN (pt.product_accession, pt.tmpcol)
               )
       ) AS CACHE,
       (EXISTS (SELECT 1
                FROM "SBPDB" s
                WHERE s.product_accession IN (pt.product_accession, pt.tmpcol ) 
               )
       ) AS SBP
FROM (SELECT * FROM "pairTable2" LIMIT 500000) pt;

Then, for performance, you want indexes on cachedb(product_accession) and sbpdb(product_accession).
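
As a rough sketch of that last step, the indexes could be created along these lines. The index names are hypothetical, and the table names are double-quoted only on the assumption that the tables were created with the mixed-case names used in the queries above (unquoted, PostgreSQL folds identifiers to lowercase).

```sql
-- Hypothetical index names; adjust to your own naming scheme.
-- The quoted identifiers assume the tables really are named "cacheDB" and "SBPDB".
CREATE INDEX IF NOT EXISTS idx_cachedb_product_accession
    ON "cacheDB" (product_accession);

CREATE INDEX IF NOT EXISTS idx_sbpdb_product_accession
    ON "SBPDB" (product_accession);

-- Refresh planner statistics for both lookup tables after the change.
ANALYZE "cacheDB";
ANALYZE "SBPDB";
```

Running EXPLAIN (ANALYZE) on the query before and after is a reasonable way to check whether the planner actually uses the new indexes for the EXISTS lookups.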