且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

[0424]11GR2 Automatic block media repair

更新时间:2022-09-09 14:09:25

[20140424]11GR2 Automatic block media repair.txt

摘要:PacktPub.Oracle.Data.Guard.11gR2.Administration.Beginners.Guide.Jun.2013.pdf P388

In Oracle 11gR2, when Actve Data Guard is being used with Real-Time Apply, if a physical corrupton is detected on the
primary database, Oracle will automatcally try to repair the corrupton using the non-corrupted block on the standby
database. This operaton is also valid in the opposite directon, which means standby database corrupton will be repaired
using the data block on the primary database. A notfcaton will be printed in the alert log about the automatc block
media repair operaton in the meantme; this repair operaton is completely transparent to database users.

We can also manually repair a corrupted data block with the RMAN command's RECOVER BLOCKcommand. By default, this
command will try to use an Active Data Guard physical standby database if it exists. In order to exclude the standby
database as a source to repair corruption, we must use the EXCLUDE STANDBYoption of this command.

--自己做一个另类的测试:
1.建立测试环境:
DGMGRL> show configuration
Configuration - study
  Protection Mode: MaxPerformance
  Databases:
    test   - Primary database
    testdg - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

SYS@test> @ver
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

SCOTT@test> show parameter db_block_
NAME               TYPE     VALUE
------------------ -------- ---------
db_block_buffers   integer  0
db_block_checking  string   FALSE
db_block_checksum  string   TYPICAL
db_block_size      integer  8192
--这些参数是缺省设置.

SCOTT@test> select rowid,a.* from dept1 a where deptno=60;

ROWID                  DEPTNO DNAME          LOC
------------------ ---------- -------------- -------------
AAAcC1AAIAAAACHAAA         60 MMMM           DDDDDx

SCOTT@test> @lookup_rowid AAAcC1AAIAAAACHAAA
    OBJECT       FILE      BLOCK        ROW DBA                  TEXT
---------- ---------- ---------- ---------- -------------------- ----------------------------------------
    114869          8        135          0 8,135                alter system dump datafile 8 block 135 ;

SCOTT@test> select name from v$datafile where file#=8;
NAME
--------------------------------------------------
/u01/app/oracle11g/oradata/test/test01.dbf

2.关闭数据库,安全起见,做一个冷备份,修改该块信息:

SYS@test> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

$ cp /u01/app/oracle11g/oradata/test/test01.dbf /data/testtest/

-- 135*8192=1105920,使用bvi修改该块信息,将'MMMM'替换成'AAAA'.
$ bvi -b 1105920 -s 8192 /u01/app/oracle11g/oradata/test/test01.dbf

3.启动数据库:
--由于修改信息,导致检查和不一致,看看是否会报错.
$ dbv file=/u01/app/oracle11g/oradata/test/test01.dbf

DBVERIFY: Release 11.2.0.3.0 - Production on Thu Apr 24 09:14:33 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle11g/oradata/test/test01.dbf
DBVERIFY - Verification complete
Total Pages Examined         : 8192
Total Pages Processed (Data) : 1507
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 13
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 225
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 6447
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 3269838040 (0.3269838040)
--昏!这样修改dbv竟然检查不出来.我以前没有想到这种情况.

SCOTT@test> select rowid,a.* from dept1 a where deptno=60;
ROWID                  DEPTNO DNAME          LOC
------------------ ---------- -------------- -------------
AAAcC1AAIAAAACHAAA         60 AAAA           DDDDDx

--可以发现并没有报错.而实际上检查和应该不一致.如果这个时候修改这条记录的Loc信息:
SCOTT@test> update scott.dept1 set loc='DDDDDy' where deptno=60;
1 row updated.

SCOTT@test> commit ;
Commit complete.

--检查testdg库看看.
SYS@testdg> select rowid,a.* from scott.dept1 a where deptno=60;
ROWID                  DEPTNO DNAME          LOC
------------------ ---------- -------------- -------------
AAAcC1AAIAAAACHAAA         60 MMMM           DDDDDy

-- 很明显修改后两边的信息不一致.这种逻辑损坏,oracle并不监测在缺省的参数下db_block_*参数.
-- 使用rman检查逻辑数据库的验证检查.
backup validate check logical datafile 8;
RMAN> backup validate check logical datafile 8;

Starting backup at 2014-04-24 09:35:59
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=11 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00008 name=/u01/app/oracle11g/oradata/test/test01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    OK     0              6447         8192            3269839017
  File Name: /u01/app/oracle11g/oradata/test/test01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1507
  Index      0              13
  Other      0              225

Finished backup at 2014-04-24 09:36:07

--可以发现错误已经不存在,因为我修改信息时已经将正确的检查和写入块中.
--将两边信息修改一致,再来测试看看.

SCOTT@test> update scott.dept1 set dname='NNNN' where deptno=60;
1 row updated.

SCOTT@test> commit ;
Commit complete.

4.再来重复测试:
--关闭数据库
-- 135*8192=1105920,使用bvi修改该块信息,将'NNNN'替换成'AAAA'.
$ bvi -b 1105920 -s 8192 /u01/app/oracle11g/oradata/test/test01.dbf

-- 启动到mount模式.
SYS@test> startup mount
ORACLE instance started.

Total System Global Area 1603411968 bytes
Fixed Size                  2228784 bytes
Variable Size            1006636496 bytes
Database Buffers          587202560 bytes
Redo Buffers                7344128 bytes
Database mounted.

RMAN> backup validate check logical datafile 8;

Starting backup at 2014-04-24 09:42:45
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=206 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00008 name=/u01/app/oracle11g/oradata/test/test01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    OK     0              6447         8192            3269839703
  File Name: /u01/app/oracle11g/oradata/test/test01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1507
  Index      0              13
  Other      0              225

Finished backup at 2014-04-24 09:42:51
--昏,逻辑检查也不能发现问题.
SYS@test> alter system set db_block_checksum=full scope=memory;
System altered.

SYS@test> alter system set db_block_checking=full scope=memory;
System altered.

SYS@test> alter database open;
Database altered.

SCOTT@test> select rowid,a.* from dept1 a where deptno=60;
ROWID                  DEPTNO DNAME          LOC
------------------ ---------- -------------- -------------
AAAcC1AAIAAAACHAAA         60 AAAA           DDDDDy

--昏这样也不报错误吗?

5.现在修改两边一致:
SCOTT@test> select rowid,a.* from dept1 a where deptno=60;
ROWID                  DEPTNO DNAME          LOC
------------------ ---------- -------------- -------------
AAAcC1AAIAAAACHAAA         60 MMMM           DDDDDz

--关闭数据库.
--这次使用bvi写入一些垃圾数据到该数据块头部中.

$ dbv file=/u01/app/oracle11g/oradata/test/test01.dbf

DBVERIFY: Release 11.2.0.3.0 - Production on Thu Apr 24 10:17:24 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/app/oracle11g/oradata/test/test01.dbf
Page 135 is influx - most likely media corrupt
Corrupt block relative dba: 0x02000087 (file 8, block 135)
Fractured block found during dbv:
Data in bad block:
type: 0 format: 0 rdba: 0x00000000
last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xd1b90601
check value in block header: 0x0
block checksum disabled


--启动执行,等1小会,显示正常:
SCOTT@test> select rowid,a.* from dept1 a where deptno=60;
ROWID                  DEPTNO DNAME          LOC
------------------ ---------- -------------- -------------
AAAcC1AAIAAAACHAAA         60 MMMM           DDDDDz

--看看alert*.log文件:

Hex dump of (file 8, block 135) in trace file /u01/app/oracle11g/diag/rdbms/test/test/trace/test_ora_9395_127_0_0_1.trc
Corrupt block relative dba: 0x02000087 (file 8, block 135)
Fractured block found during multiblock buffer read
Data in bad block:
type: 0 format: 0 rdba: 0x00000000
last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xd1b90601
check value in block header: 0x0
block checksum disabled
Reading datafile '/u01/app/oracle11g/oradata/test/test01.dbf' for corruption at rdba: 0x02000087 (file 8, block 135)
Reread (file 8, block 135) found same corrupt data (no logical check)
Starting background process ABMR
Thu Apr 24 10:18:35 2014
ABMR started with pid=36, OS id=9397
Automatic block media recovery service is active.
Automatic block media recovery requested for (file# 8, block# 135)
Thu Apr 24 10:18:35 2014
Automatic block media recovery successful for (file# 8, block# 135)
Automatic block media recovery successful for (file# 8, block# 135)
WARNING: AutoBMR fixed mismatched on-disk block 0 with in-mem rdba 2000087.


6.重复测试,使用rman:
--关闭数据库.
--这次使用bvi写入一些垃圾数据到该数据块头部中.
$ dbv file=/u01/app/oracle11g/oradata/test/test01.dbf
DBVERIFY: Release 11.2.0.3.0 - Production on Thu Apr 24 10:24:14 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle11g/oradata/test/test01.dbf
Page 135 is influx - most likely media corrupt
Corrupt block relative dba: 0x02000087 (file 8, block 135)
Fractured block found during dbv:
Data in bad block:
type: 0 format: 0 rdba: 0x00000000
last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xd41e0601
check value in block header: 0x0
block checksum disabled

RMAN> list backup ;
using target database control file instead of recovery catalog
specification does not match any backup in the repository
--我没有做任何备份!

RMAN> recover datafile 8 block 135;
Starting recover at 2014-04-24 10:28:37
using channel ORA_DISK_1
finished standby search, restored 1 blocks
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 2014-04-24 10:28:41

--从提示也可以看出来,恢复是从standby取出数据.
SCOTT@test> select rowid,a.* from dept1 a where deptno=60;

ROWID                  DEPTNO DNAME          LOC
------------------ ---------- -------------- -------------
AAAcC1AAIAAAACHAAA         60 MMMM           DDDDDz

总结:
测试有点困惑,如果这样造成的逻辑损坏,oracle无法查看吗?不知那位知道我的测试问题在那里....