update testj3
set object_id = (select USER_ID
from testj2
where testj3.owner = testj2.username)
考虑如上的语句,如何估算其时间,实际上这个语句不管怎么样都会更新所有的行,
匹配的行更新为相应的值,不匹配的则更新为NULL。
同时其中包含了内联子查询,其执行时间受到查询时间的影响有着巨大的差别。其
方法类似于NEST LOOP,如下:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
| 1 | UPDATE | TESTJ3 | 1 | | 0 |00:00:00.01 | 92 |
| 2 | TABLE ACCESS FULL| TESTJ3 | 1 | 29 | 29 |00:00:00.01 | 3 |
|* 3 | TABLE ACCESS FULL| TESTJ2 | 29 | 1 | 28 |00:00:00.01 | 87 |
---------------------------------------------------------------------------------------
其中这里我的我的TESTJ3中有29个不同的值,所以这里被驱动表TESTJ2被驱动了29次。
注意这里29是TESTJ3中OBJECT_ID不同的值。而不是行数
如果TESTJ4表中有192条记录但是不同的值只有3个会怎么样呢?
SQL> select count(*) from testj4;
COUNT(*)
----------
192
SQL> select distinct(object_id) from testj4;
OBJECT_ID
----------
63
58
60
执行计划将会如下:
SQL_ID chyutr057uqv8, child number 0
-------------------------------------
update /*+ gather_plan_statistics */ testj4 set object_id = (select
USER_ID from testj2 where
testj4.owner = testj2.username)
Plan hash value: 1040199981
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
| 1 | UPDATE | TESTJ4 | 1 | | 0 |00:00:00.01 | 20 |
| 2 | TABLE ACCESS FULL| TESTJ4 | 1 | 192 | 192 |00:00:00.01 | 7 |
|* 3 | TABLE ACCESS FULL| TESTJ2 | 3 | 1 | 3 |00:00:00.01 | 9 |
---------------------------------------------------------------------------------------
可以看到实际只是驱动了3次被驱动表而已,试想如果TESTJ2表巨大,同时TEST4中有着很多的不同值
那么效率可想而知。
当然类似NEST LOOP其被驱动表中如果加入索引也就是这里的TESTJ2表的username字段那么效率当然会
有很大的提高。
如下:
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID 40nt9qhbr7jau, child number 0
-------------------------------------
update /*+ gather_plan_statistics */ testj4 set object_id = (select USER_ID
from testj2 where testj4.owner = testj2.username)
Plan hash value: 877285848
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 1 | UPDATE | TESTJ4 | 1 | | 0 |00:00:00.01 | 18 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
| 2 | TABLE ACCESS FULL | TESTJ4 | 1 | 192 | 192 |00:00:00.01 | 7 |
| 3 | TABLE ACCESS BY INDEX ROWID| TESTJ2 | 3 | 1 | 3 |00:00:00.01 | 5 |
|* 4 | INDEX RANGE SCAN | TEST_JI | 3 | 1 | 3 |00:00:00.01 | 2 |
--------------------------------------------------------------------------------------------------
总结一下,
类似
update testj3
set object_id = (select USER_ID
from testj2
where testj3.owner = testj2.username)
这样的UPDATE,需要考虑到UPDATE本身的时间,同时查询时间将会是需要考虑的另一个重点,
关于内层表被驱动的次数和驱动表中关联字段的DISTINCT值密切相关,同时建议内层表关联字段
***使用索引。
其实这样的列子还有很多。比如INSERT INTO SELECT 需要考虑的时间同样如此。