且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

oracle等待事件3——高速缓冲内enq锁

更新时间:2022-08-22 10:51:29

6、 enq:TC-contention

在手动执行检查点操作中,一部分需要获得TC锁(thread checkpointlock 或 tablespace checkpointlock )在获得TC锁过程中,若发生争用,则需要等待enq:TC-contention 事件。事实上获得TC锁的过程稍微复杂。

1) 服务器进程首先以X模式获得TC锁

2) 服务器进程将已获得的TC锁变更为SSX模式。同时,CKPT进程以SS模式获得该锁。CKPT获得锁后执行检查点操作。

3) 服务器欲重新以X模式获得TC锁,等待CKPT释放该锁,这时的等待事件就是enqueue:TC-contention

4) 检查点工作结束后,CKPT进程将会释放TC锁,服务器进程就会获得TC锁,因此得知检查点工作已经结束。


Enq:TC-contention 等待即便在没有多个进程引起争用的情况下,也可以发生,在这一点上与其他锁争用引起的等待现象不同。需要理解的是在等待现象中,存在只有争用才能引发的等待现象,但是也存在不发生争用,也会单纯为了等待工作结束而等待的情况。。

发生检查点的情况虽然很多,但不是所有的情况都会发生TC锁引起的等待,之后再进程由服务器进程引发的检查点同步过程中发生。

enq:TC-contention 等待发生的代表案例如下:并行查询   和 表空间热备

1:、并行查询(parallel query)

pq发生检查点的原因是slave session引起的direct path read。这就是所谓的“直接路径读”,它不经过高速缓冲区直接读取数据文件。oracle在如下三种情况下使用direct path read(也叫physical read direct) 方式的读取。

(1)内存区域上不能完成排序工作时,会在临时段的区域里存储和读取的过程中,发生direct path write ,direct path read 。这时的等待事件可以通过direct path read temp、direct path write temp 观察。

(2)slave session(从属会话)为了扫描直接读取的数据文件时,使用direct path read 。这时等待事件通过direct path read 事件观察。

(3)若判断是因为I/O系统的性能下降,导致不能将以足够快的速度读取,oracle为了临时方便会使用direct path read。


slave session执行direct path read 对象时数据文件,从数据文件上直接读取数据时,因为不经过SGA,所以可能发生当期SGA上的块和数据文件上的块之间版本不一致的现象,为了防止这些现象,oracle对数据文件执行direct path read 之前,应该执行检查点。coordinate session在驱动slave session之前,对于执行direct path read现象,请求段级别的检查点,检查点发生之前一直处于enq:TC-contention等待事件状态。coordinate session上可以发现enq:TC-eontention等待,slave session上则可以发现direct path read 等待。

2、表空间热备份(tablespace hot backup)

执行alter tablespace 。。begin backup后,将属于此表空间的所有高速缓冲区的脏数据记录到磁盘上,这个过程经历enq:TC-contention 等待。



7、enq:CI-contention 和 enq:RO-contention


“Cross Instance call Enqueue”是一种在一个或多个instance实例间调用后台进程行为时用到的队列锁,具体调用的后台进程行为包括检查点checkpoint、日志切换logfile switch、shutdown实例、载入数据文件头等等。需要注意的是这种Enqueue Lock并不仅仅在RAC中使用,即便是单节点也会用到。CI锁的数量取决于并行执行Cross Instance Call调用的进程的总数。

SQL> col ksqsttyp for a20
SQL> col ksqstrsn for a20
SQL> col ksqstexpl for a80
SQL> set linesize 200 pagesize 2000;
SQL> select ksqsttyp,ksqstrsn,ksqstexpl from x$ksqst where ksqsttyp='CI';

KSQSTTYP             KSQSTRSN             KSQSTEXPL
-------------------- -------------------- --------------------------------------------------------------------------------
CI                   contention           Coordinates cross-instance function invocations

SQL> show parameter cluster_database

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     FALSE
cluster_database_instances           integer     1


SQL> select * from v$enqueue_stat where eq_type='CI';

   INST_ID EQ TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
---------- -- ---------- ----------- ---------- ----------- -------------
         1 CI        595           0        595           0             0

当系统中出现有大量这种跨实例后台进程调用时,将出现CI队列锁的争用。

假设在一个RAC场景中,同时有大量的回话开始对不同的数据表执行TRUNCATE截断操作,TRUNCATE的一个前提是在所有实例上(因为对象表的dirty buffer可能分布在多个实例上)发生对象级别的检查点(object level checkpoint),检查点发生时CKPT进程会通知DBWR写出指定对象表相关的脏块,DBWR需要扫描Buffer Cache以找出脏块,而如果Buffer Cache很大那么扫描将花费大量的时间,而在此过程中前台进程将一直排他地持有着本地的CI队列锁,这就将造成CI锁的严重争用。

为了减少CI队列锁地争用,我们第一步所要做的是找出实际的Cross Instance call跨实例调用的类型。这里要另外提一下的是在10g以前不管是v$session_wait或statspack中都不会将enqueue锁等待事件的具体enqueue lock类型写明,一般需要我们从p1/p2/p3列中找出enqueue的具体身份,例如”WAIT #1: nam=’enqueue’ ela= 910796 p1=1128857606   p2=1   p3=4″,这里的p1为1128857606也就是16进制的43490006,高位的’4349′转换为ascii码也就是’CI’,而这里的p2/p3对应为V$lock中的ID1/ID2,ID1=1代表了”Reuse (checkpoint and invalidate) block range”,ID2=4代表了”Mounted excl, use to allocate mechanism”。

具体ID1/ID2代表的含义在不同版本中有所变化,可以参考下表:

Id1, Id2 Combination:

   Oracle 10gR1

27 TO 29  *Same as 9i R2

                30    process waiters after row cache requeue
                31    Active Change Directory extent relocation
                32    block change tracking state change
                33    kgl mulitversion obsolete
                34    set previous resetlogs data
                35    set recovery destination pointer
                36    fast object reuse request
                37    test ksbcic()
                38    ASM diskgroup discovery wait
                39    ASM diskgroup release
                40    ASM push DB updates
                41    ASM add ACD chunk
                42    ASM map resize message
                43    ASM map lock message
                44    ASM map unlock message (phase 1)
                45    ASM map unlock message (phase 2)
                46    ASM generate add disk redo marker
                47    ASM check of PST validity
                48    ASM offline disk CIC
                49    Logical Standby Sync Point SCN
                50    update SQL Tuning Base existence bitvector
                51    PQ induced Checkpointing
                52    ASM F1X0 relocation
                53    Scheduler autostart
                54    KZS increment grant/revoke counter
                55    ASM disk operation message
                56    ASM I/O error emulation
                57    DB Supp log cursor invalidation
                58    Cache global range invalidation
                59    Cache global object invalidation
                60    ASM Pre-Existing Extent Lock wait
                61    Perform a ksk action through DBWR
                62    ASM diskgroup refresh wait 

   Oracle 10gR2

       30 to 62 *Same as 10gR1

               63    KCBO object checkpoint
               64    KCBO object pq checkpoint
               65    global health check event
               66    Oracle Label Security refresh
               67    thread internal enable
               68    cross-instance registration
               69    KGL purge unused subheaps
               70    clear pin instance flag
               71    Rolling operations CIC

   Oracle 9iR2

                 Id1   Meaning
                 ~~~   ~~~~~~

             25 TO 26  *Same as 9i R1

                 27    set Database Force Logging mode
                 28    invalidate cached file address translations
                 29    Cursor Unauthorize Mode
                 30    snapshot too old diagnosis
                 31    process waiters after row cache requeue

   Oracle 9iR1

                Id1    Meaning
                ~~~    ~~~~~~~~

             18 TO 24  *Same as Oracle 8i

                25     Update Dscn Tracking (ktcndt)
                26     Purge dictionary Object number Cache

   Oracle 8i
                Id1     Meaning
                ~~~     ~~~~~~~

             0 TO 17    *Same as Oracle 8(please see "Oracle8*" for the Meaning)

                18      Object reuse request
                19      Rolling release checks
                20      Propagate begin backup scn for a file
                21      Refresh top plan (for db scheduler)
                22      Clear checkpoint progress record
                23      Drop temp file
                24      Quiesce database Restricted 

               Id2      Meaning
               ~~~      ~~~~~~~

               0x01     Used to pass in parameters
               0x02     Used to invoke the function in backgroud process
               0x03     Used to indicate the foreground has not returned
               0x04     Mounted excl, use to allocate mechanism
               0x05     Used to queue up interested clients 

    Oracle 8*

		Id1 	Meaning
		~~~	~~~~~~~
                0       Checkpoint block range
                1       Reuse (checkpoint and invalidate) block range
                2       LGWR Checkpointing and Hot Backup
                3       DBWR syncronization of SGA with control file
                4       Log file add/drop/rename notification
                5       Write buffer for CR read
                6       Test call
                7       Invalidate KCK cache in all instances
                8       Alter rollback segment optimal
                9       Signal Query Servers/coordinator
                10      Create Remote parallel query Server
                11      Set Global Partitions
                12      Stop Disk Writes
                13      Drop Sort Segments
                14      Release unused space from Sort Segments
                15      Instance Recovery for Parallel operation Group
                16      Validate parallel slave Lock Value
                17      Check transaction state objects
                18      Flush blocks in object
                19      Rolling release checks
                20      Propagate begin backup scn for a file
                21      Clear checkpoint progress record

     Oracle 7

		Id1 	Meaning
		~~~	~~~~~~~
		0 	Flush buffers for reuse as new class
		1 	LGWR checkpointing and Hot Backup
		2 	DBWR synchronization of SGA with control file
		3 	Log file add/drop/rename notification
		4 	Write buffer for CR read
		5 	Test Call
		6 	Invalidate KCK cache in all instances
		7 	Alter rollback segment optimal
		8 	Signal Query Servers/coordinator
		9 	Create Remote Parallel Query Server
		10 	Set Global Partitions
		11 	Stop Disk Writes
		12 	Drop Sort Segments
		13 	Release unused space from Sort Segments
		14 	Instance Recovery for Parallel operation Group
		15 	Validate parallel slave Lock Value
		16 	Check Transaction State Objects

		Id2 	Meaning
		~~~	~~~~~~~
		1 	Pass in Parameters
		2 	Invoke the call in background process
		3 	Foreground has not returned yet
		4 	Used to allocate the CI call
		5 	Used to queue up interested clients


enq: RO - fast object reuse 等待事件

查了一下这个等待,出现这个等待比较高的情况一般都有异常:
1.truncate表或者分区表时
2.收集统计信息采用degree>1时
这个event表示在等待DBWR to clean cache.
出现异常的时候症状:the CKPT background process is the one holding the needed RO enqueue although it is actually doing nothing.

Bug:7385253.

这个wait event表示在等待DBWR to clean cache.

如果要优化这个问题,需要综合考虑,比如减少cache size,增加dbwr process或减少MTTR等等。

Is it still locked? 
For some reason the truncate is still waiting for CKPT process. When you truncate or drop a table, CKPT does a range flush of the db_cache_size, which seems to be completed according to your alert_log. 
That was an issue in 9i-10g. This looks like bug 4201369 which is supposed to be fixed in 10.1.0.5. I will suggest you open a tar on this! They will make you do a hanganalysis which should clarify the issue.


该等待事件多与bug 相关

 

2.1 Bug 1Bug 7385253

 

Bug 7385253 - Slow Truncate / DBWR useshigh CPU / CKPT blocks on RO enqueue [ID 7385253.8]

 

Affects:

Product (Component)

Oracle Server (Rdbms)

Range of versions believed to be affected

Versions >= 10 but BELOW 11.2

Versions confirmed as being affected

Platforms affected

Generic (all / most platforms affected)

Fixed:

This issue is fixed in

 

 

该Bug的3个表现:

(1)    Hang(Involving Shared Resource)

(2)    PerformanceAffected (General)

(3)    Waits for "enq:RO - fast object reuse"

 

 

DBWR may use alot of CPU and seem to spin in or around kcbo_write_qdue to large number offree buffers on the object reuse queue or checkpoint queue.

 

In some casesthe CKPT holds the RO enqueue for very long blocking other operations  with waitevent "enq: RO - fast objectreuse".

 

Operations so farreported being affected are :

- Apply Processes in StandBy databases

- Gather stats

- Truncates

- drop/shrink/alter tablespace

 

Note: This fix was previously incorrectlylisted as not affecting 11g.

     The bug itself is present in 11g but it is unlikely to show anysignificant symptom due to other 11g changes meaning that free buffers are nolonger kept on the object queue.

 

对与该Bug 的解决方法:

setting _db_fast_obj_truncate=FALSE <--did not fix the issue
enabling asyn i/o <-- customer refused to implement to avoid corruptionsrisk
applying 7287289 <-- did not fix the issue

 

2.2 文档二

 

'enq: RO - fastobject reuse' contention when gathering schema/table statistics in parallel [ID762085.1]

 

 

Symptoms

(1)Database has been recently upgradedfrom 10.2.0.1 to 10.2.0.4.
(2)There is 'enq: RO - fastobject reuse' contention when gathering schema/table statistics in parallelusing DBMS_STATS package (with DEGREE>1).

 

其也是因为Bug 7385253导致这个问题。

 

解决方法:

1) Flushing the buffer cache.
OR
2) Setting "_db_fast_obj_truncate" =FALSE. This reverts back to the9i way of invalidating buffers in the buffer cache. 

Kindly note thatboth workarounds could have an impact on the database performance. Instead, itis recommended applying the corresponding patch.

--2种解决方法对db 性能都有很大影响,建议应用合适的patch

 

2.3 文档三

Bug8544896 - Waits for "enq: RO - fast object reuse" with high DBWR CPU[ID 8544896.8]

 

Affects:

Product (Component)

Oracle Server (Rdbms)

Range of versions believed to be affected

Versions >= 10.2.0.4 but BELOW 10.2.0.5

Versions confirmed as being affected

Platforms affected

Generic (all / most platforms affected)


 It is believed to be a regression in default behaviour thus:
   Regression introduced in 10.2.0.4

Fixed:

This issue is fixed in

 

 

This problem is introduced in 10.2.0.4.

 

Sessions can wait on "enq: RO - fastobject reuse" while DBWR consumes lots of CPU when performing truncatetype operations.

 

 

Workaround:

(1)Flush the buffer cache beforetruncating

 OR

(2) set _db_fast_obj_truncate = FALSE.

 

 

我这里出现这2个等待事件都与Truncate 操作有关。