且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

PostgreSQL cluster大幅减少nestloop离散IO的优化方法

更新时间:2022-08-13 10:25:32

背景

对于较大数据量的表,如果在索引字段上面有小结果集JOIN,用nestloop JOIN是比较好的方法。

但是nestloop带来的一个问题就是离散IO,这个是无法回避的问题,特别是硬件IO能力不行的情况下,性能会比较糟糕。

有什么优化方法呢?

PostgreSQL提供了一个命令,可以修改物理存储的顺序,减少离散IO就靠它了。

例子

创建两张表

postgres=# create unlogged table test01(id int primary key, info text);
CREATE TABLE
postgres=# create unlogged table test02(id int primary key, info text);
CREATE TABLE

产生一些离散primary key数据

postgres=# insert into test01 select trunc(random()*10000000), md5(random()::text) from generate_series(1,10000000) on conflict on constraint test01_pkey do nothing;
INSERT 0 6322422

postgres=#  insert into test02 select trunc(random()*10000000), md5(random()::text) from generate_series(1,10000000) on conflict on constraint test02_pkey do nothing;
INSERT 0 6320836

分析表

postgres=# analyze test01;
postgres=# analyze test02;

清除缓存,并重启

$ pg_ctl stop -m fast
# echo 3 > /proc/sys/vm/drop_caches
$ pg_ctl start

第一次调用,耗费大量的离散IO,执行时间18.490毫秒(我这台机器是SSD,IOPS能力算好的,差的机器时间更长)

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=19.25..7532.97 rows=623 width=74) (actual time=0.465..17.221 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=1929 read=1039 dirtied=188
   ->  Bitmap Heap Scan on public.test01 t1  (cost=18.82..2306.39 rows=623 width=37) (actual time=0.416..8.019 rows=640 loops=1)
         Output: t1.id, t1.info
         Recheck Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Heap Blocks: exact=637
         Buffers: shared hit=5 read=637 dirtied=123
         ->  Bitmap Index Scan on test01_pkey  (cost=0.00..18.66 rows=623 width=0) (actual time=0.254..0.254 rows=640 loops=1)
               Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
               Buffers: shared hit=4 read=1
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.38 rows=1 width=37) (actual time=0.013..0.013 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=1924 read=402 dirtied=65
 Planning time: 26.668 ms
 Execution time: 18.490 ms
(17 rows)

第二次,缓存命中5.4毫秒

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=19.25..7532.97 rows=623 width=74) (actual time=0.392..5.150 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=2968
   ->  Bitmap Heap Scan on public.test01 t1  (cost=18.82..2306.39 rows=623 width=37) (actual time=0.373..1.760 rows=640 loops=1)
         Output: t1.id, t1.info
         Recheck Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Heap Blocks: exact=637
         Buffers: shared hit=642
         ->  Bitmap Index Scan on test01_pkey  (cost=0.00..18.66 rows=623 width=0) (actual time=0.218..0.218 rows=640 loops=1)
               Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
               Buffers: shared hit=5
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.38 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=2326
 Planning time: 0.956 ms
 Execution time: 5.434 ms
(17 rows)

根据索引字段调整表的物理顺序,降低离散IO。

postgres=# cluster test01 using test01_pkey;
CLUSTER
postgres=# cluster test02 using test02_pkey;
CLUSTER
postgres=# analyze test01;
postgres=# analyze test02;

清除缓存,重启数据库

$ pg_ctl stop -m fast
# echo 3 > /proc/sys/vm/drop_caches
$ pg_ctl start

第一次调用,降低到了5.4毫秒

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.86..5618.07 rows=668 width=74) (actual time=0.069..4.072 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=2323 read=12
   ->  Index Scan using test01_pkey on public.test01 t1  (cost=0.43..30.79 rows=668 width=37) (actual time=0.040..0.557 rows=640 loops=1)
         Output: t1.id, t1.info
         Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Buffers: shared hit=5 read=6
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.35 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=2318 read=6  --  注意在cluster之后,shared hit并没有下降,因为LOOP了多次,但是性能确比cluster 之前提升了很多,因为需要访问的HEAP page少了,OS cache可以瞬间命中。 
 Planning time: 42.356 ms
 Execution time: 5.426 ms
(13 rows)

第二次调用,3.6毫秒

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.86..5618.07 rows=668 width=74) (actual time=0.055..3.414 rows=402 loops=1)
   Output: t1.id, t1.info, t2.id, t2.info
   Buffers: shared hit=2335
   ->  Index Scan using test01_pkey on public.test01 t1  (cost=0.43..30.79 rows=668 width=37) (actual time=0.037..0.374 rows=640 loops=1)
         Output: t1.id, t1.info
         Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
         Buffers: shared hit=11
   ->  Index Scan using test02_pkey on public.test02 t2  (cost=0.43..8.35 rows=1 width=37) (actual time=0.003..0.004 rows=1 loops=640)
         Output: t2.id, t2.info
         Index Cond: (t2.id = t1.id)
         Buffers: shared hit=2324
 Planning time: 1.042 ms
 Execution time: 3.620 ms
(13 rows)

小结

通过cluster, 将表的物理顺序和索引对齐,所以如果查询的值是连续的,在使用嵌套循环时可以大幅减少离散IO,取得非常好查询优化的效果。

如果查询的值是跳跃的,那么这种方法就没有效果啦,不过好在PostgreSQL有bitmap index scan,在读取heap tuple前,会对ctid排序,按排序后的ctid取heap tuple,也可以起到减少离散IO的作用。