start_udev caused RAC network services to become unavailable

Environment: 11.2.0.4 RAC, with ASM device permissions managed by udev. After new disk volumes were added on the storage array, the permissions of the newly mapped devices had to be made effective, so the following steps were carried out:
1. fdisk -l|grep "/dev/sd*"|wc -l <<<<< the devices mapped from the storage are visible on both RAC nodes
2. multipath -ll <<<<<<<< the devices mapped from the storage are not yet visible as aggregated devices under the Linux multipathing software (multipath)
3. multipath -v2 <<<<<<<< make the multipath mappings take effect
4. Check the permissions

[oracle@hpaydb1:/etc/udev/rules.d]$ls -lat /dev/dm*
brw-rw---- 1 grid asmadmin 252, 13 Feb 20 10:08 /dev/dm-13
brw-rw---- 1 grid asmadmin 252, 18 Feb 20 10:08 /dev/dm-18
brw-rw---- 1 grid asmadmin 252, 24 Feb 20 10:08 /dev/dm-24
brw-rw---- 1 grid asmadmin 252, 19 Feb 20 10:08 /dev/dm-19
brw-rw---- 1 grid asmadmin 252, 21 Feb 20 10:08 /dev/dm-21
brw-rw---- 1 grid asmadmin 252, 7 Feb 20 10:08 /dev/dm-7
brw-rw---- 1 grid asmadmin 252, 23 Feb 20 10:08 /dev/dm-23
brw-rw---- 1 grid asmadmin 252, 6 Feb 20 10:08 /dev/dm-6
brw-rw---- 1 grid asmadmin 252, 10 Feb 20 10:07 /dev/dm-10
brw-rw---- 1 grid asmadmin 252, 14 Feb 20 10:07 /dev/dm-14
brw-rw---- 1 grid asmadmin 252, 2 Feb 20 10:07 /dev/dm-2
brw-rw---- 1 grid asmadmin 252, 22 Feb 20 10:07 /dev/dm-22
brw-rw---- 1 grid asmadmin 252, 11 Feb 20 10:07 /dev/dm-11
brw-rw---- 1 grid asmadmin 252, 12 Feb 20 10:07 /dev/dm-12
brw-rw---- 1 grid asmadmin 252, 16 Feb 20 10:07 /dev/dm-16
brw-rw---- 1 grid asmadmin 252, 8 Feb 20 10:07 /dev/dm-8
brw-rw---- 1 grid asmadmin 252, 28 Feb 20 10:07 /dev/dm-28
brw-rw---- 1 grid asmadmin 252, 20 Feb 20 10:07 /dev/dm-20
brw-rw---- 1 grid asmadmin 252, 9 Feb 20 10:07 /dev/dm-9
brw-rw---- 1 grid asmadmin 252, 15 Feb 20 10:07 /dev/dm-15
brw-rw---- 1 grid asmadmin 252, 27 Feb 20 10:07 /dev/dm-27
brw-rw---- 1 grid asmadmin 252, 4 Feb 20 10:07 /dev/dm-4
brw-rw---- 1 grid asmadmin 252, 3 Feb 20 09:49 /dev/dm-3
brw-rw---- 1 grid asmadmin 252, 17 Feb 20 09:49 /dev/dm-17
brw-rw---- 1 grid asmadmin 252, 5 Feb 20 09:49 /dev/dm-5
brw-rw---- 1 root disk 252, 29 Feb 19 17:56 /dev/dm-29 <<<<<<<<<<<<<<<<<<<<
brw-rw---- 1 root disk 252, 26 Feb 19 15:03 /dev/dm-26
brw-rw---- 1 root disk 252, 25 Feb 4 17:43 /dev/dm-25
brw-rw---- 1 root disk 252, 0 Feb 4 17:43 /dev/dm-0
brw-rw---- 1 root disk 252, 1 Feb 4 17:43 /dev/dm-1

5. udevadm control --reload-rules
start_udev <<<<<<<< apply the permissions of the new devices <<<<<<<<<<<<<<<<<<<<<<<< this step removed the Oracle RAC public network interface and caused the services to fail over

Symptom:
During start_udev, udev deleted the public network interface; this caused the listener to crash, and clusterware moved all services, SCAN listeners and the VIP from node 2 to node 1.

Cause:
Running a “start_udev” will cause the network hotplug action to be applied to every interface configuration file on the host which does not have HOTPLUG=no set.
This will activate any interface which does not have HOTPLUG=no set, regardless of the ONBOOT setting.

Temporary workaround:
1. Restart the Oracle network resources
2. Relocate the services back onto node 2

Solution 1: (recommended by Oracle MOS for OL7 and later; it has also reportedly been used on OL 6.2 without problems, still to be tested here)

To add or reload udev rules, use the commands below.
/sbin/udevadm control --reload-rules
/sbin/udevadm trigger --type=devices --action=change
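For reference, the kind of rule that these commands reload might look like the following (a sketch only; the file name, multipath alias and ownership shown are placeholders, not the actual values from this system):

# /etc/udev/rules.d/99-oracle-asmdevices.rules (hypothetical example)
# hand the multipath device whose alias is "ocrvote1" to grid:asmadmin
KERNEL=="dm-*", ENV{DM_NAME}=="ocrvote1", OWNER="grid", GROUP="asmadmin", MODE="0660"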

Solution 2:
As per RHEL guidance, you are advised to set HOTPLUG="no" in the network configuration scripts.
If you would like to avoid this behaviour, ensure that each ifcfg file has HOTPLUG=no set.
Add HOTPLUG="no" to the ifcfg-eth0 (public), ifcfg-eth1 (private) and ifcfg-eth2 (backup) network config files in the /etc/sysconfig/network-scripts directory.
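A sketch of what the resulting public-interface file might then contain (the device name and addresses are placeholders):

# /etc/sysconfig/network-scripts/ifcfg-eth0 (public interface, example values)
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HOTPLUG=no
IPADDR=192.168.10.11
NETMASK=255.255.255.0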

Analysis of an 11.2.0.4 RAC OCR disk group being dismounted unexpectedly


The cluster alert log reported that the CRSD process had failed because the OCR disk group was inaccessible, followed by an error accessing the physical storage:

2019-02-16 03:51:38.327: 
[crsd(44425)]CRS-1006:The OCR location +OCRVOTE is inaccessible. Details in /u01/app/11.2.0/grid/log/hpaypr2/crsd/crsd.log. <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2019-02-16 03:51:45.622: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(45127)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:5:14} in /u01/app/11.2.0/grid/log/hpaypr2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-02-16 03:51:45.622: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(45115)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:4420} in /u01/app/11.2.0/grid/log/hpaypr2/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-02-16 03:51:45.622: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(20887)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:29:12211} in /u01/app/11.2.0/grid/log/hpaypr2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-02-16 03:51:45.622: 
[/u01/app/11.2.0/grid/bin/scriptagent.bin(77182)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:27:28} in /u01/app/11.2.0/grid/log/hpaypr2/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2019-02-16 03:51:45.624: 
[ohasd(42993)]CRS-2765:Resource 'ora.crsd' has failed on server 'hpaypr2'.
2019-02-16 03:51:46.916: 
[crsd(142730)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/hpaypr2/crsd/crsd.log.
2019-02-16 03:51:46.919: 
[crsd(142730)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/hpaypr2/crsd/crsd.log.

The grid user's oraagent_grid.log shows that the agent received a stop request for the OCRVOTE resource and stopped it successfully (the same can be seen in the crsd log):

2019-02-16 01:07:22.692: [ AGFW][2783635200]{2:56937:2} Agent received the message: AGENT_HB[Engine] ID 12293:14940775
2019-02-16 01:07:52.694: [ AGFW][2783635200]{2:56937:2} Agent received the message: AGENT_HB[Engine] ID 12293:14940786
2019-02-16 01:08:19.827: [ AGFW][2783635200]{2:56937:17198} Agent received the message: RESOURCE_STOP[ora.OCRVOTE.dg hpaypr2 1] ID 4099:14940799
2019-02-16 01:08:19.827: [ AGFW][2783635200]{2:56937:17198} Preparing STOP command for: ora.OCRVOTE.dg hpaypr2 1
2019-02-16 01:08:19.827: [ AGFW][2783635200]{2:56937:17198} ora.OCRVOTE.dg hpaypr2 1 state changed from: ONLINE to: STOPPING <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2785736448]{2:56937:17198} [stop] (:CLSN00108:) clsn_agent::stop {
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2785736448]{2:56937:17198} [stop] DgpAgent::stop: enter { 
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2785736448]{2:56937:17198} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2785736448]{2:56937:17198} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2785736448]{2:56937:17198} [stop] DgpAgent::stop: tha exit }
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2785736448]{2:56937:17198} [stop] DgpAgent::stopSingle status:2 }
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2785736448]{2:56937:17198} [stop] (:CLSN00108:) clsn_agent::stop }
2019-02-16 01:08:19.827: [ AGFW][2785736448]{2:56937:17198} Command: stop for resource: ora.OCRVOTE.dg hpaypr2 1 completed with status: SUCCESS <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2019-02-16 01:08:19.827: [ AGFW][2783635200]{2:56937:17198} Agent sending reply for: RESOURCE_STOP[ora.OCRVOTE.dg hpaypr2 1] ID 4099:14940799
2019-02-16 01:08:19.827: [ora.OCRVOTE.dg][2768303872]{2:56937:17198} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-02-16 01:08:19.843: [ora.OCRVOTE.dg][2768303872]{2:56937:17198} [check] DgpAgent::runCheck: asm stat asmRet 0
2019-02-16 01:08:19.843: [ora.OCRVOTE.dg][2768303872]{2:56937:17198} [check] DgpAgent::getConnxn connected
2019-02-16 01:08:19.846: [ora.OCRVOTE.dg][2768303872]{2:56937:17198} [check] DgpAgent::queryDgStatus excp no data found
2019-02-16 01:08:19.846: [ora.OCRVOTE.dg][2768303872]{2:56937:17198} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2019-02-16 01:08:19.846: [ora.OCRVOTE.dg][2768303872]{2:56937:17198} [check] DgpAgent::queryDgStatus dgName OCRVOTE ret 1
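The state the agent queries can also be checked manually from the ASM instance; a minimal example against v$asm_diskgroup_stat (the view referenced in the log above) plus the per-disk view:

-- run as sysasm in the +ASM instance
select name, state, total_mb, free_mb from v$asm_diskgroup_stat;
select group_number, name, mode_status, mount_status from v$asm_disk_stat;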

The ocssd log shows thread-scheduling timeouts, and the other node reported similar messages:

2019-02-16 01:07:48.092: [ CSSD][315041536]clssscMonitorThreads clssnmvWorkerThread not scheduled for 16090 msecs
2019-02-16 01:07:48.092: [ CSSD][315041536]clssscMonitorThreads clssnmvWorkerThread not scheduled for 16010 msecs
2019-02-16 01:07:48.092: [ CSSD][315041536]clssscMonitorThreads clssnmvWorkerThread not scheduled for 16060 msecs
2019-02-16 01:07:49.092: [ CSSD][315041536]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 16070 msecs
2019-02-16 01:07:49.777: [ CSSD][820770560]clssgmUnregisterShared: Cross group member share client 2 (0x7f6e28636a10), grp UFG_+ASM2, member 1
2019-02-16 01:07:49.777: [ CSSD][820770560]clssgmTermShare: (0x7f6e28636b80) local grock UFG_+ASM2 member 1 type 2
2019-02-16 01:07:49.777: [ CSSD][820770560]clssgmUnreferenceMember: local grock UFG_+ASM2 member 1 refcount is 3
2019-02-16 01:07:49.777: [ CSSD][820770560]clssgmUnregisterShared: Cross group member share client 2 (0x7f6e28636a10), grp DBHPAYPR, member 1
2019-02-16 01:07:49.777: [ CSSD][820770560]clssgmTermShare: (0x7f6e282c1cd0) global grock DBHPAYPR member 1 type 2
2019-02-16 01:07:49.777: [ CSSD][820770560]clssgmUnreferenceMember: global grock DBHPAYPR member 1 refcount is 33
2019-02-16 01:07:49.778: [ CSSD][820770560]clssgmExitGrock: client 2 (0x7f6e28636a10), grock DG_LOCAL_ARCH, member 0
2019-02-16 01:07:49.778: [ CSSD][820770560]clssgmUnregisterPrimary: Unregistering member 0 (0x7f6e2831d660) in local grock DG_LOCAL_ARCH
2019-02-16 01:07:49.778: [ CSSD][820770560]clssgmUnreferenceMember: local grock DG_LOCAL_ARCH member 0 refcount is 10
2019-02-16 01:07:52.180: [ CSSD][318195456]clssnmSendingThread: sending status msg to all nodes
2019-02-16 01:07:52.180: [ CSSD][318195456]clssnmSendingThread: sent 5 status msgs to all nodes
2019-02-16 01:07:56.613: [ CSSD][820770560]clssscMonitorThreads clssnmvWorkerThread not scheduled for 24580 msecs
2019-02-16 01:07:57.181: [ CSSD][318195456]clssnmSendingThread: sending status msg to all nodes
2019-02-16 01:07:57.181: [ CSSD][318195456]clssnmSendingThread: sent 5 status msgs to all nodes
2019-02-16 01:08:02.181: [ CSSD][318195456]clssnmSendingThread: sending status msg to all nodes
2019-02-16 01:08:02.181: [ CSSD][318195456]clssnmSendingThread: sent 5 status msgs to all nodes
2019-02-16 01:08:07.182: [ CSSD][318195456]clssnmSendingThread: sending status msg to all nodes
2019-02-16 01:08:07.182: [ CSSD][318195456]clssnmSendingThread: sent 5 status msgs to all nodes
2019-02-16 01:08:12.183: [ CSSD][318195456]clssnmSendingThread: sending status msg to all nodes
2019-02-16 01:08:12.183: [ CSSD][318195456]clssnmSendingThread: sent 5 status msgs to all nodes
2019-02-16 01:08:17.184: [ CSSD][318195456]clssnmSendingThread: sending status msg to all nodes
2019-02-16 01:08:17.184: [ CSSD][318195456]clssnmSendingThread: sent 5 status msgs to all nodes
2019-02-16 01:08:19.811: [ CSSD][820770560]clssgmUnregisterShared: Cross group member share client 5 (0x7f6e281ac450), grp DB+ASM, member 1
2019-02-16 01:08:19.811: [ CSSD][820770560]clssgmTermShare: (0x7f6e281c1f10) global grock DB+ASM member 1 type 2
2019-02-16 01:08:19.811: [ CSSD][820770560]clssgmUnreferenceMember: global grock DB+ASM member 1 refcount is 7
2019-02-16 01:08:19.811: [ CSSD][820770560]clssgmExitGrock: client 5 (0x7f6e281ac450), grock DG_OCRVOTE, member 1
2019-02-16 01:08:19.811: [ CSSD][820770560]clssgmUnregisterPrimary: Unregistering member 1 (0x7f6e281a6ff0) in global grock DG_OCRVOTE
2019-02-16 01:08:19.812: [ CSSD][820770560]clssgmAllocateRPCIndex: allocated rpc 881 (0x7f6e31059238)
2019-02-16 01:08:19.812: [ CSSD][820770560]clssgmRPC: rpc 0x7f6e31059238 (RPC#881) tag(3710034) sent to node 2
2019-02-16 01:08:19.812: [ CSSD][820770560]clssgmUnreferenceMember: global grock DG_OCRVOTE member 1 refcount is 2
2019-02-16 01:08:19.812: [ CSSD][321349376]clssgmHandleMemberChange: [s(2) d(2)]
2019-02-16 01:08:19.812: [ CSSD][321349376]clssgmRPCDone: rpc 0x7f6e31059238 (RPC#881) state 6, flags 0x100
2019-02-16 01:08:19.812: [ CSSD][321349376]clssgmChangeMemCmpl: rpc 0x7f6e31059238, ret 0, client 0x7f6e281ac450 member 0x7f6e281a6ff0
2019-02-16 01:08:19.812: [ CSSD][321349376]clssgmFreeRPCIndex: freeing rpc 881
2019-02-16 01:08:19.812: [ CSSD][321349376]clssgmAllocateRPCIndex: allocated rpc 884 (0x7f6e31059430)
2019-02-16 01:08:19.812: [ CSSD][820770560]clssgmDiscEndpcl: gipcDestroy 0x21d0
2019-02-16 01:08:19.812: [ CSSD][321349376]clssgmRPCBroadcast: rpc(0x3740034), status(1), sendcount(1), filtered by specific properties: 
2019-02-16 01:08:19.813: [ CSSD][321349376]clssgmRPCDone: rpc 0x7f6e31059430 (RPC#884) state 4, flags 0x402
2019-02-16 01:08:19.813: [ CSSD][321349376]clssgmBroadcastGrockRcfgCmpl: RPC(0x3740034) of grock(DG_OCRVOTE) received all acks, grock update sequence(9)

The +ASM alert log shows that write I/O to the PST disks timed out after 15 seconds, in particular for group 4 (OCRVOTE):

Sat Feb 16 01:07:48 2019
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 4.
Sat Feb 16 01:07:48 2019
NOTE: process _b000_+asm2 (145243) initiating offline of disk 1.3473295321 (OCRVOTE2) with mask 0x7e in group 4
NOTE: process _b000_+asm2 (145243) initiating offline of disk 2.3473295322 (OCRVOTE3) with mask 0x7e in group 4
NOTE: checking PST: grp = 4
GMON checking disk modes for group 4 at 44 for pid 44, osid 145243
ERROR: no read quorum in group: required 2, found 1 disks
NOTE: checking PST for grp 4 done.
NOTE: initiating PST update: grp = 4, dsk = 1/0xcf0647d9, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 4, dsk = 2/0xcf0647da, mask = 0x6a, op = clear
GMON updating disk modes for group 4 at 45 for pid 44, osid 145243
ERROR: no read quorum in group: required 2, found 1 disks
Sat Feb 16 01:07:49 2019
NOTE: cache dismounting (not clean) group 4/0xA966B74E (OCRVOTE)
NOTE: messaging CKPT to quiesce pins Unix process pid: 145290, image: oracle@hpaypr2.zr.hpay (B001)
Sat Feb 16 01:07:49 2019
NOTE: halting all I/Os to diskgroup 4 (OCRVOTE) <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Sat Feb 16 01:07:49 2019
NOTE: LGWR doing non-clean dismount of group 4 (OCRVOTE)
NOTE: LGWR sync ABA=23.21 last written ABA 23.21
WARNING: Offline for disk OCRVOTE2 in mode 0x7f failed.
WARNING: Offline for disk OCRVOTE3 in mode 0x7f failed.
Sat Feb 16 01:07:49 2019
kjbdomdet send to inst 1
detach from dom 4, sending detach message to inst 1
Sat Feb 16 01:07:49 2019
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 12)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 4 invalid = TRUE
130 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Sat Feb 16 01:07:49 2019
WARNING: dirty detached from domain 4
NOTE: cache dismounted group 4/0xA966B74E (OCRVOTE)
SQL> alter diskgroup OCRVOTE dismount force /* ASM SERVER:2842081102 */ <<<<<<<<<<<<<<<<<<<<<<<<<<,
Sat Feb 16 01:07:49 2019
NOTE: cache deleting context for group OCRVOTE 4/0xa966b74e
GMON dismounting group 4 at 46 for pid 45, osid 145290
NOTE: Disk OCRVOTE1 in mode 0x7f marked for de-assignment
NOTE: Disk OCRVOTE2 in mode 0x7f marked for de-assignment
NOTE: Disk OCRVOTE3 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 4
Sat Feb 16 01:07:51 2019
ASM Health Checker found 1 new failures
Sat Feb 16 01:08:19 2019
SUCCESS: diskgroup OCRVOTE was dismounted <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
SUCCESS: alter diskgroup OCRVOTE dismount force /* ASM SERVER:2842081102 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCRVOTE

Storage load chart

Reviewing the storage metrics and the AWR reports showed that every night, and especially on Saturday night, a large number of long-running statistical/reporting SQL statements run from about 01:02:00 to 01:30:00 and drive I/O very high. As a result the write I/O used for the PST check on the OCR disks timed out, and the disk group was force-dismounted.
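Besides moving the reporting jobs away from the peak window and addressing the storage latency itself, a mitigation often cited for this PST write timeout is to raise the ASM heartbeat I/O wait above its 15-second default. This is a hidden parameter, so treat the following only as a sketch and confirm it with Oracle Support before applying it:

-- on each ASM instance; the 120-second value is an example, not a tested recommendation
alter system set "_asm_hbeatiowait"=120 scope=spfile sid='*';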

11g switchover from a single-instance primary to a RAC standby

Preparation on the primary

SQL> select inst_id,database_role,OPEN_MODE from  gv$database;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         1 PRIMARY        READ WRITE

SQL>  CREATE RESTORE POINT SWITCHOVER_START_GRP GUARANTEE FLASHBACK DATABASE;
Restore point created.
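The guaranteed restore point can be verified before continuing, for example:

select name, scn, guarantee_flashback_database from v$restore_point;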

Preparation on the standby

SQL> select inst_id,database_role,OPEN_MODE from  gv$database;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         1 PHYSICAL STANDBY READ ONLY WITH APPLY
         2 PHYSICAL STANDBY READ ONLY WITH APPLY

[oracle@hpadb2-prod admin]$  srvctl stop instance -d hpadb -i hpadb2
SQL> select inst_id,database_role,OPEN_MODE from  gv$database;
   INST_ID DATABASE_ROLE    OPEN_MODE
---------- ---------------- --------------------
         1 PHYSICAL STANDBY READ ONLY WITH APPLY

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.

SQL> CREATE RESTORE POINT SWITCHOVER_START_GRP GUARANTEE FLASHBACK DATABASE;
Restore point created.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

Switchover on the primary

SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
SESSIONS ACTIVE

SESSIONS ACTIVE means user sessions are still connected, which is why the switchover below is issued WITH SESSION SHUTDOWN.

SQL>  ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
Database altered.

Switchover on the standby

SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
TO PRIMARY

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
Database altered.

SQL> ALTER DATABASE OPEN;
Database altered.

Handling the old primary (now the new standby)

SQL> shutdown immediate;
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory

[oracle@hpaydg:/home/oracle]$ps -ef |grep pmon
oracle   45444 57941  0 06:59 pts/0    00:00:00 grep pmon

SQL> startup mount
ORACLE instance started.
Total System Global Area 4.0486E+10 bytes
Fixed Size              2261968 bytes
Variable Size              5368712240 bytes
Database Buffers    3.5031E+10 bytes
Redo Buffers               84606976 bytes
Database mounted.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE cancel;
Database altered.

SQL> alter database open;
Database altered.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

On the new primary

SQL> DROP RESTORE POINT SWITCHOVER_START_GRP;
Restore point dropped.

On the new standby

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 4.0486E+10 bytes
Fixed Size              2261968 bytes
Variable Size              5771365424 bytes
Database Buffers    3.4628E+10 bytes
Redo Buffers                84606976 bytes
Database mounted.

SQL> DROP RESTORE POINT SWITCHOVER_START_GRP;
Restore point dropped.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.

SQL> alter database open;
Database altered.

SQL>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Database altered.

Back on the new primary

srvctl start instance -d hpadb -i hpadb2
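A quick post-switchover sanity check (a sketch; run the first query on each database, the second on the new standby):

-- confirm the new roles
select inst_id, database_role, open_mode from gv$database;
-- confirm redo is shipping and being applied on the new standby
select name, value from v$dataguard_stats where name in ('transport lag', 'apply lag');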

Incremental roll-forward of an 11gR2 Data Guard standby

 

1. Cancel redo apply on the standby

alter database recover managed standby database cancel;

2. Check the current SCN on the standby

select to_char(current_scn) from v$database;
     CURRENT_SCN
---------------
  3056655162
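Note: the usual roll-forward procedure takes the lower of v$database.current_scn and the minimum datafile checkpoint SCN, in case individual files are further behind; a sketch of that extra check:

select min(checkpoint_change#) from v$datafile_header;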

3. Using that SCN, take an RMAN incremental backup on the primary

backup incremental from scn 3056655162 database format '/u02/rman/riskdb/ForStandby_%U.bka' tag 'forstandby';
or
backup as compressed backupset incremental from scn 3056655162 database format '/u02/rman/riskdb/ForStandby_%U.bka' tag 'forstandby';

-- back up a control file for the standby

backup current controlfile for standby format '/u02/rman/riskdb/ForStandbyCTRL.bkc';


4. Shut the standby down and restart it to MOUNT

SQL>  shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup nomount;
ORACLE instance started.
Total System Global Area 8551575552 bytes
Fixed Size              2270360 bytes
Variable Size              4227861352 bytes
Database Buffers    4311744512 bytes
Redo Buffers                 9699328 bytes

SQL> alter database mount standby database;
Database altered.

5. Catalog the incremental backup pieces copied over from the primary

[oracle@hpdbpre3:/home/oracle]$rman target / nocatalog
Recovery Manager: Release 11.2.0.4.0 - Production on Wed Jan 3 13:09:42 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: RISKDB (DBID=2017870605, not open)
using target database control file instead of recovery catalog

RMAN> catalog start with '/u02/rman/riskdb1';
searching for all files that match the pattern /u02/rman/riskdb1
List of Files Unknown to the Database
=====================================
File Name: /u02/rman/riskdb1/ForStandby_mosnlfhk_1_1.bka
File Name: /u02/rman/riskdb1/ForStandbyCTRL.bkc
File Name: /u02/rman/riskdb1/ForStandby_mpsnll4t_1_1.bka
Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /u02/rman/riskdb1/ForStandby_mosnlfhk_1_1.bka
File Name: /u02/rman/riskdb1/ForStandbyCTRL.bkc
File Name: /u02/rman/riskdb1/ForStandby_mpsnll4t_1_1.bka

6. Recover the database from the incremental backup

RMAN> RECOVER DATABASE NOREDO;
Starting recover at 03-JAN-18
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=351 device type=DISK
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /u02/oradata/riskrp1/riskrp1/system01.dbf
destination for restore of datafile 00002: /u02/oradata/riskrp1/riskrp1/sysaux01.dbf
destination for restore of datafile 00003: /u02/oradata/riskrp1/riskrp1/undotbs01.dbf
destination for restore of datafile 00004: /u02/oradata/riskrp1/riskrp1/users01.dbf
destination for restore of datafile 00005: /u02/oradata/riskrp1/riskrp1/hprisk01.dbf
destination for restore of datafile 00006: /u02/oradata/riskrp1/riskrp1/hpriskidx01.dbf
destination for restore of datafile 00007: /u02/oradata/riskrp1/riskrp1/hpriskbase01.dbf
destination for restore of datafile 00008: /u02/oradata/riskrp1/riskrp1/hpriskbaseidx01.dbf
destination for restore of datafile 00009: /u02/oradata/riskrp1/riskrp1/hprisk02.dbf
destination for restore of datafile 00010: /u02/oradata/riskrp1/riskrp1/hprisk03.dbf
destination for restore of datafile 00011: /u02/oradata/riskrp1/riskrp1/hprisk04.dbf
destination for restore of datafile 00012: /u02/oradata/riskrp1/riskrp1/credit.dbf
destination for restore of datafile 00013: /u02/oradata/riskrp1/riskrp1/geedun01.dbf
destination for restore of datafile 00014: /u02/oradata/riskrp1/riskrp1/geedunidx01.dbf
destination for restore of datafile 00015: /u02/oradata/riskrp1/riskrp1/hprisk05.dbf
destination for restore of datafile 00016: /u02/oradata/riskrp1/riskrp1/hprisk06.dbf
destination for restore of datafile 00017: /u02/oradata/riskrp1/hprisk07.dbf
destination for restore of datafile 00018: /u02/oradata/riskrp1/hprisk08.dbf
destination for restore of datafile 00019: /u02/oradata/riskrp1/hprisk09.dbf
destination for restore of datafile 00020: /u02/oradata/riskrp1/hprisk10.dbf
destination for restore of datafile 00021: /u02/oradata/riskrp1/hprisk11.dbf
destination for restore of datafile 00022: /u02/oradata/riskrp1/hprisk12.dbf
destination for restore of datafile 00023: /u02/oradata/riskrp1/hprisk13.dbf
destination for restore of datafile 00024: /u02/oradata/riskrp1/hprisk14.dbf
destination for restore of datafile 00025: /u02/oradata/riskrp1/hprisk15.dbf
destination for restore of datafile 00026: /u02/oradata/riskrp1/riskrp1/hprisk16
destination for restore of datafile 00027: /u02/oradata/riskrp1/riskrp1/hprisk17
destination for restore of datafile 00028: /u02/oradata/riskrp1/riskrp1/hprisk18.dbf
destination for restore of datafile 00029: /u02/oradata/riskrp1/riskrp1/hprisk16.dbf
destination for restore of datafile 00030: /u02/oradata/riskrp1/riskrp1/hprisk19.dbf
destination for restore of datafile 00031: /u02/oradata/riskrp1/riskrp1/hprisk20.dbf
destination for restore of datafile 00032: /u02/oradata/riskrp1/riskrp1/hprisk21.dbf
destination for restore of datafile 00033: /u02/oradata/riskrp1/hpay01.dbf
destination for restore of datafile 00034: /u02/oradata/riskrp1/hpayidx01.dbf
destination for restore of datafile 00035: /u02/oradata/riskrp1/smhpay01.dbf
destination for restore of datafile 00036: /u02/oradata/riskrp1/smhpayidx01.dbf
destination for restore of datafile 00037: /u02/oradata/riskrp1/CHALIDX01.dbf
destination for restore of datafile 00038: /u02/oradata/riskrp1/CHANNEL01.dbf
destination for restore of datafile 00039: /u02/oradata/riskrp1/AHPAY01.dbf
destination for restore of datafile 00040: /u02/oradata/riskrp1/AHPAYIDX01.dbf
destination for restore of datafile 00041: /u02/oradata/riskrp1/RCTL01.dbf
destination for restore of datafile 00042: /u02/oradata/riskrp1/RCTLIDX01.dbf
destination for restore of datafile 00043: /u02/oradata/riskrp1/HIST01.dbf
destination for restore of datafile 00044: /u02/oradata/riskrp1/HISTIDX01.dbf
destination for restore of datafile 00045: /u02/oradata/riskrp1/riskrp1/hprisk22.dbf
destination for restore of datafile 00046: /u02/oradata/riskrp1/riskdb/hprisk23.dbf
destination for restore of datafile 00047: /u02/oradata/riskrp1/riskdb/hprisk24.dbf
destination for restore of datafile 00048: /u02/oradata/riskrp1/riskdb/hprisk25.dbf
destination for restore of datafile 00049: /u02/oradata/riskrp1/riskdb/hprisk26.dbf
destination for restore of datafile 00050: /u02/oradata/riskrp1/riskdb/hprisk27.dbf
destination for restore of datafile 00051: /u02/oradata/riskrp1/riskdb/hprisk28.dbf
destination for restore of datafile 00052: /u02/oradata/riskrp1/riskdb/hprisk29.dbf
destination for restore of datafile 00053: /u02/oradata/riskrp1/riskdb/hprisk30.dbf
destination for restore of datafile 00054: /u02/oradata/riskrp1/riskdb/hprisk31.dbf
destination for restore of datafile 00055: /u02/oradata/riskrp1/riskdb/hprisk32.dbf
destination for restore of datafile 00056: /u02/oradata/riskpre1/riskdb/geedun02.dbf
destination for restore of datafile 00057: /u02/oradata/riskpre1/riskdb/hprisk33.dbf
destination for restore of datafile 00058: /u02/oradata/riskpre1/riskdb/hprisk34.dbf
destination for restore of datafile 00059: /u02/oradata/riskpre1/riskdb/hprisk35.dbf
destination for restore of datafile 00060: /u02/oradata/riskpre1/riskdb/hprisk36.dbf
destination for restore of datafile 00061: /u02/oradata/riskpre1/riskdb/hprisk37.dbf
destination for restore of datafile 00062: /u02/oradata/riskpre1/riskdb/hprisk38.dbf
channel ORA_DISK_1: reading from backup piece /u02/rman/riskdb1/ForStandby_mosnlfhk_1_1.bka
channel ORA_DISK_1: piece handle=/u02/rman/riskdb1/ForStandby_mosnlfhk_1_1.bka tag=FORSTANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 05:14:17
Finished recover at 03-JAN-18

SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup nomount;
ORACLE instance started.
Total System Global Area 8551575552 bytes
Fixed Size              2270360 bytes
Variable Size              4227861352 bytes
Database Buffers    4311744512 bytes
Redo Buffers                 9699328 bytes

7. Restore the standby control file

RMAN> restore standby controlfile from '/u02/rman/riskdb1/ForStandbyCTRL.bkc';
Starting restore at 04-JAN-18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=816 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
output file name=/u02/oradata/riskpre1/control01.dbf
output file name=/u02/oradata/riskpre1/control02.dbf
output file name=/u02/oradata/riskpre1/control03.dbf
Finished restore at 04-JAN-18

8. Start the standby to MOUNT

SQL> startup nomount;
ORACLE instance started.

Total System Global Area 8551575552 bytes
Fixed Size              2270360 bytes
Variable Size              4227861352 bytes
Database Buffers    4311744512 bytes
Redo Buffers                 9699328 bytes

SQL> alter database mount standby database;
Database altered.


9. Because the control file was restored, the standby redo log files may need to be re-added (optional)

alter database add standby logfile group 8 ('/u02/oradata/riskrp1/riskrp1/standby08.log') size 500M reuse;
alter database add standby logfile group 9 ('/u02/oradata/riskrp1/riskrp1/standby09.log') size 500M reuse;
alter database add standby logfile group 10 ('/u02/oradata/riskrp1/riskrp1/standby10.log') size 500M reuse;
alter database add standby logfile group 11 ('/u02/oradata/riskrp1/riskrp1/standby11.log') size 500M reuse;
alter database add standby logfile group 12 ('/u02/oradata/riskrp1/riskrp1/standby12.log') size 500M reuse;
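Before re-adding them, it is worth checking which standby redo logs the restored control file already knows about, for example:

select group#, thread#, bytes/1024/1024 as size_mb, status from v$standby_log;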

10. Start the MRP process

SQL> alter database recover managed standby database disconnect from session using current logfile;
Database altered.
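Whether MRP is running and applying redo can be confirmed with, for example:

select process, status, thread#, sequence#, block# from v$managed_standby where process like 'MRP%';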

11. Open the standby and resume synchronization

SQL> alter database open read only;
Database altered.

Thu Jan 04 09:09:11 2018
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 56 - see DBWR trace file
ORA-01110: data file 56: '/u02/oradata/riskrp1/riskdb/geedun02.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 57 - see DBWR trace file
ORA-01110: data file 57: '/u02/oradata/riskrp1/riskdb/hprisk33.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 58 - see DBWR trace file
ORA-01110: data file 58: '/u02/oradata/riskrp1/riskdb/hprisk34.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 59 - see DBWR trace file
ORA-01110: data file 59: '/u02/oradata/riskrp1/riskdb/hprisk35.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 60 - see DBWR trace file
ORA-01110: data file 60: '/u02/oradata/riskrp1/riskdb/hprisk36.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 61 - see DBWR trace file
ORA-01110: data file 61: '/u02/oradata/riskrp1/riskdb/hprisk37.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 62 - see DBWR trace file
ORA-01110: data file 62: '/u02/oradata/riskrp1/riskdb/hprisk38.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/riskpre1/riskpre1/trace/riskpre1_dbw0_16368.trc:
ORA-01157: cannot identify/lock data file 63 - see DBWR trace file
ORA-01110: data file 63: '/u02/oradata/riskrp1/riskdb/users02.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
MRP0: Background Media Recovery terminated with error 1110

MRP terminated because data files 56 through 63 could not be found at the paths recorded in the restored standby control file. The affected data files have to be backed up on the primary, copied over and restored on the standby; shown here for files 61, 62 and 63:

backup datafile 61,62,63 format '/u02/rman/riskdb/datafile3.bkp';

restore datafile 61,62,63;
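Once the missing data files are in place, restart managed recovery and confirm the standby is catching up again (a sketch):

alter database recover managed standby database using current logfile disconnect;
select process, status, sequence# from v$managed_standby where process like 'MRP%';
select name, value from v$dataguard_stats where name in ('transport lag', 'apply lag');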