11gR2 RAC+ADG+RAC Actual Operation
一、Environment configuration information
PRIMARY:
1.OS information
[oracle@trsenpr1:/home/oracle]$uname -a
Linux trsenpr1.zr.trsen 3.8.13-16.2.1.el6uek.x86_64 #1 SMP Thu Nov 7 17:01:44 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
[oracle@trsenpr1:/home/oracle]$more /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
2.IP informations
[oracle@trsenpr1:/home/oracle]$more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#public IP
10.88.100.40 trsenpr1 trsenpr1.zr.trsen
10.88.100.41 trsenpr2 trsenpr2.zr.trsen
#virtual IP
10.88.100.42 trsenpr1-vip trsenpr1-vip.zr.trsen
10.88.100.43 trsenpr2-vip trsenpr2-vip.zr.trsen
3.Database version
SQL> select version,comments from registry$history;
VERSION COMMENTS
------------------------------ --------------------------------------------------
11.2.0.4 Patchset 11.2.0.2.0集群版本:
4.path information
./opatch lsinventory
Not opatch for the database and cluster.
STANDBY.
1.OS information
[oracle@trsendb1:/home/oracle]$uname -a
Linux trsendb1.zr.trsen 3.8.13-16.2.1.el6uek.x86_64 #1 SMP Thu Nov 7 17:01:44 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
[oracle@trsendb1:/home/oracle]$more /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
2.IP informations
[oracle@trsendb1:/home/oracle]$more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#public ip
10.88.100.50 trsendb1 trsendb1.zr.trsen
10.88.100.51 trsendb2 trsendb2.zr.trsen
#virtual IP
10.88.100.52 trsendb1-vip trsendb1-vip.zr.trsen
10.88.100.53 trsendb2-vip trsendb2-vip.zr.trsen
3.Database version
SQL> select version,comments from registry$history;
VERSION COMMENTS
------------------------------ --------------------------------------------------
11.2.0.4 Patchset 11.2.0.2.0
4.path information
./opatch lsinventory
Not opatch for the database and cluster.
二、Prerequisites / Preparation
1.We will switchover 100.50/51 to primary database and 100.40/41 to standby database.
2.We will Data Guard Physical Standby Switchover using SQL*Plus
3.Review Primary Database Initialization Parameters
Ensure that the LOG_ARCHIVE_CONFIG & DG_CONFIG initialization parameters are established at the primary database and 'compatible' is set correctly and to the same Value on the Primary and Standby Site.
4.Verify the Physical Standby Database Is Performing Properly
a.on the standby database, query the V$ARCHIVED_LOG view to identify existing files in the archived redo log.
SELECT SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
b.On the primary database, issue the ALTER SYSTEM SWITCH LOGFILE statement to force a log switch and archive the current online redo log file group
ALTER SYSTEM SWITCH LOGFILE;
c.On the standby database, query the V$ARCHIVED_LOG view to verify the redo data was received and archived on the standby database
SELECT SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
d.On the standby database, query the V$ARCHIVED_LOG view to verify that received redo has been applied
SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
5.Verify Managed Recovery is running on the standby
The following query at the standby verifies that managed recovery is running
SELECT PROCESS FROM V$MANAGED_STANDBY WHERE PROCESS LIKE 'MRP%';
6.The following query at the Primary verifies that recovery is running with “REAL TIME APPLY” option.
In the example below, LOG_ARCHIVE_DEST_5 is established to ship redo to the target standby (dest_id=5):
SQL> SELECT RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID=5;
RECOVERY_MODE
-----------------------
MANAGED REAL TIME APPLY
If managed standby recovery is not running or not started with real-time apply, restart managed recovery with real-time apply enabled:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Note: If you previously defined a delay for this standby the delay is ignored when you start real time apply
7.Verify Sufficient Number of Archiver Processes
Make sure LOG_ARCHIVE_MAX_PROCESSES is set to 4 or higher on every primary and standby database in the Data Guard configuration.
Take care not to set it too high as the additional archivers can increase the time to shutdown the database.
This parameter can be set dynamically via ALTER SYSTEM.
Ensure Online Redo Log Files on the Target Physical Standby have been cleared.
Online redo logs on the target physical standby need to be cleared before that standby database can become a primary database.
Although this will automatically happen as part of the SWITCHOVER TO PRIMARY command, it is recommended that the logs are cleared prior to the switchover.
If your databases are using Oracle Managed Files (OMF) or you have already set the parameter LOG_FILE_NAME_CONVERT you can skip this step as the online log files will always be cleared automatically.
8.Verify there are no large Gaps
Identify the current sequence number for each thread on the primary database
SQL> SELECT THREAD#, SEQUENCE# FROM V$THREAD;
Verify the target physical standby database has applied up to, but not including the logs from the primary query. On the standby the following query should be within 1 or 2 of the primary query result.
SELECT THREAD#, MAX(SEQUENCE#) FROM V$ARCHIVED_LOG
WHERE APPLIED = 'YES'
AND RESETLOGS_CHANGE# = (SELECT RESETLOGS_CHANGE#
FROM V$DATABASE_INCARNATION WHERE STATUS = ‘CURRENT’)
GROUP BY THREAD#;
9.Verify Primary and Standby tempfiles match and all datafiles are ONLINE
10.If The Primary is a RAC or Standby is a RAC, then shutdown all secondary primary instances
A normal or immediate shutdown can be done, but to expedite the shutdown issue a SHUTDOWN ABORT on secondary RAC instances on the primary cluster only leaving one instance up.
Wait until the remaining instance has completed cluster reconfiguration (and performed recovery if you chose to abort the secondary instances) before continuing.
NOTE:it's very very important.
11.Verify that the primary database can be switched to the standby role
Query the SWITCHOVER_STATUS column of the V$DATABASE view on the primary database:
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
-----------------
TO STANDBY
三.Switchover
NOTE:These steps are completed as part of the switchover process on the day of the planned outage.
1.Cancel any apply delay for the target standby
Remove any delay in applying redo that may be in effect on the standby database that will become the new primary database.
If there is a delay then on the target standby database execute the following command.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE NODELAY DISCONNECT FROM SESSION;
Clear Potential Blocking Parameters & Jobs and Capture current job state on the primary
SELECT * FROM DBA_JOBS_RUNNING;
Depending on what the running job is, be ready to terminate the job if necessary.
SELECT OWNER, JOB_NAME, START_DATE, END_DATE, ENABLED FROM DBA_SCHEDULER_JOBS WHERE ENABLED=’TRUE’ AND OWNER <> ‘SYS”;
Note: Job candidates to be disabled among others: oracle text sync and optimizer, RMAN backups, application garbage collectors, application background agents.
Block further job submission
ALTER SYSTEM SET job_queue_processes=0 SCOPE=BOTH SID=’*’;
Disable any jobs that may interfere.
SQL> EXECUTE DBMS_SCHEDULER.DISABLE( <job_name> );
Tail the alert logs
tail –f <background_dump_dest location>/alert*
check transaction
select xidusn,xidslot,xidsqn,status,start_time from gv$transaction;
Switchover the primary to a standby database
ALTER SYSTEM CHECKPOINT;
ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
alter database recover managed standby database using current logfile disconnect from session;
If an ORA-16139 error is encountered, as long as V$DATABASE.DATABASE_ROLE=’PHYSICAL STANDBY’, then you can proceed.
A common case where this can occur is when there are a large number of data files.
Once managed recovery is started on the new standby, the database will recover.
If the role was not changed then you need to cancel the switchover and review the alert logs and trace files further.
Verify the standby has received the end-of-redo (EOR) log(s)
In the primary alert log you will see messages like these:
Reconfiguration started (old inc 4, new inc 6)<<<<<<<<======shutdown immedaite second node,reconfiguration.
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Tue Nov 22 03:04:31 2016
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Nov 22 03:04:31 2016
Tue Nov 22 03:04:31 2016
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Tue Nov 22 03:04:32 2016
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Tue Nov 22 03:04:45 2016
Redo thread 2 internally disabled at seq 6493 (CKPT)
Tue Nov 22 03:04:45 2016
ARC3: Archiving disabled thread 2 sequence 6493
Archived Log entry 43702 added for thread 2 sequence 6493 ID 0x43f83f07 dest 1:
Tue Nov 22 03:05:37 2016
Decreasing number of real time LMS from 3 to 0
Tue Nov 22 03:06:41 2016
ALTER SYSTEM SET log_archive_dest_state_3='DEFER' SCOPE=BOTH;
Tue Nov 22 03:06:51 2016
ALTER SYSTEM SET log_archive_dest_state_5='DEFER' SCOPE=BOTH;
Tue Nov 22 03:08:06 2016
alter database commit to switchover to physical standby WITH SESSION SHUTDOWN<<<<<<<<=======staring switchover primary to standby.
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY [Process Id: 144279] (trsenpr1)
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Waiting for all FAL entries to be archived...
All FAL entries have been archived.
Waiting for potential Physical Standby switchover target to become synchronized...
Active, synchronized Physical Standby switchover target has been identified
Switchover End-Of-Redo Log thread 1 sequence 7911 has been fixed
Switchover: Primary highest seen SCN set to 0x4.0x2f9d035c
ARCH: Noswitch archival of thread 1, sequence 7911
ARCH: End-Of-Redo Branch archival of thread 1 sequence 7911
ARCH: LGWR is actively archiving destination LOG_ARCHIVE_DEST_2
ARCH: Standby redo logfile selected for thread 1 sequence 7911 for destination LOG_ARCHIVE_DEST_2
Archived Log entry 43706 added for thread 1 sequence 7911 ID 0x43f83f07 dest 1:
ARCH: Archiving is disabled due to current logfile archival
Primary will check for some target standby to have received alls redo
Final check for a synchronized target standby. Check will be made once.
LOG_ARCHIVE_DEST_2 is a potential Physical Standby switchover target
Active, synchronized target has been identified
Target has also received all redo
Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/trsenpr/trsenpr1/trace/trsenpr1_ora_144279.trc
Clearing standby activation ID 1140342535 (0x43f83f07)
The primary database controlfile was created using the
'MAXLOGFILES 192' clause.
There is space for up to 182 standby redo logfiles
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl5.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl6.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl7.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl8.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl9.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl10.f' SIZE 1073741824;
ALTER DATABASE ADD STANDBY LOGFILE 'srl11.f' SIZE 1073741824;
Archivelog for thread 1 sequence 7911 required for standby recovery
Switchover: Primary controlfile converted to standby controlfile succesfully.
Switchover: Complete - Database shutdown required
......
.....
Tue Nov 22 03:09:23 2016
ALTER DATABASE OPEN
This instance was first to open
Beginning Standby Crash Recovery.
Serial Media Recovery started
Tue Nov 22 03:09:23 2016
Managed Standby Recovery starting Real Time Apply
CHANGE TRACKING change stream 1 is open.
CHANGE TRACKING file is not usable. The above change
stream(s) are open. Change tracking was not closed
cleanly the last time it was shutdown.
CHANGE TRACKING is reinitializing the change tracking file.
Starting background process CTWR
Tue Nov 22 03:09:23 2016
CTWR started with pid=47, OS id=4150
ARC7: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Block change tracking service is active.
Media Recovery Log +ARCH/trsenpr/1_7911_909679324.dbf
SUCCESS: diskgroup ARCH was mounted
Identified End-Of-Redo (switchover) for thread 1 sequence 7911 at SCN 0x4.2f9d035c
Tue Nov 22 03:09:24 2016
NOTE: dependency between database trsenpr and diskgroup resource ora.ARCH.dg is established
Resetting standby activation ID 0 (0x0)
Incomplete Recovery applied until change 17978688348 time 11/22/2016 03:08:08
Block change tracking service stopping.
Stopping background process CTWR
Completed Standby Crash Recovery.
Picked Lamport scheme to generate SCNs
Tue Nov 22 03:09:25 2016
SMON: enabling cache recovery
Dictionary check beginning
Dictionary check complete
Database Characterset is ZHS16GBK
Opening with Resource Manager plan: trsenPLAN
Tue Nov 22 03:09:26 2016
Starting background process VKRM
Tue Nov 22 03:09:26 2016
VKRM started with pid=47, OS id=4249
Starting background process GTX0
Tue Nov 22 03:09:26 2016
GTX0 started with pid=49, OS id=4252
replication_dependency_tracking turned off (no async multimaster replication found)
Physical standby database opened for read only access.
Completed: ALTER DATABASE OPEN
Tue Nov 22 03:09:26 2016
db_recovery_file_dest_size of 51100 MB is 0.23% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Nov 22 03:09:52 2016
alter database recover managed standby database using current logfile disconnect from session
Attempt to start background Managed Standby Recovery process (trsenpr1)
Tue Nov 22 03:09:52 2016
MRP0 started with pid=53, OS id=4605
MRP0: Background Managed Standby Recovery process started (trsenpr1)
started logmerger process
Tue Nov 22 03:09:57 2016
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 32 slaves
Tue Nov 22 03:09:59 2016
Block change tracking file is current.
Starting background process CTWR
Tue Nov 22 03:09:59 2016
CTWR started with pid=87, OS id=4891
Block change tracking service is active.
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Clearing online redo logfile 1 +DATA/trsenpr/redo1-1.log
Clearing online log 1 of thread 1 sequence number 7908
Completed: alter database recover managed standby database using current logfile disconnect from session
And correspondingly in the standby alert log file you should see messages like these
Tue Nov 22 03:04:46 2016
RFS[28]: Assigned to RFS process 1500
RFS[28]: Opened log for thread 2 sequence 6493 dbid 1097341969 branch 909679324
Archived Log entry 729 added for thread 2 sequence 6493 rlc 909679324 ID 0x43f83f07 dest 2:
Tue Nov 22 03:04:46 2016
Media Recovery Log +ARCH/trsendb/2_6493_909679324.dbf
Tue Nov 22 03:05:31 2016
Decreasing number of real time LMS from 3 to 0
Tue Nov 22 03:08:08 2016
RFS[29]: Assigned to RFS process 3944
RFS[29]: Selected log 11 for thread 1 sequence 7911 dbid 1097341969 branch 909679324
Tue Nov 22 03:08:08 2016
Archived Log entry 730 added for thread 1 sequence 7911 ID 0x43f83f07 dest 1:
Tue Nov 22 03:08:08 2016
Resetting standby activation ID 1140342535 (0x43f83f07)
Media Recovery End-Of-Redo indicator encountered
Media Recovery Continuing
Media Recovery Waiting for thread 1 sequence 7912
Tue Nov 22 03:08:09 2016
RFS[17]: Possible network disconnect with primary database
Tue Nov 22 03:08:09 2016
RFS[30]: Assigned to RFS process 59932
RFS[30]: Possible network disconnect with primary database
Tue Nov 22 03:08:09 2016
RFS[24]: Possible network disconnect with primary database
In versions prior to Oracle Database 11g Release 2, the MRP (Redo Apply coordinator) would stop automatically after processing the End-of-Redo marker.
With Oracle Database 11g Release 2, it no longer stops leaving all bystander standby databases still ready to apply redo from the new primary database without having to be restarted.
The MRP process will be shut down automatically by the switchover command when executed at the target standby database.
Verify that the standby database can be switched to the primary role
Query the SWITCHOVER_STATUS column of the V$DATABASE view on the standby database:
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
-----------------
TO PRIMARY
A value of TO PRIMARY or SESSIONS ACTIVE indicates that the standby database is ready to be switched to the primary role.
If neither of these values is returned, verify that redo apply is active and that redo transport is configured and working properly.
Continue to query this column until the value returned is either TO PRIMARY or SESSIONS ACTIVE.
Switchover the standby database to a primary
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
In the standby alert log file you should see messages like these:
lter database commit to switchover to primary WITH SESSION SHUTDOWN
ALTER DATABASE SWITCHOVER TO PRIMARY (trsendb1)
Maximum wait for role transition is 15 minutes.
Switchover: Media recovery is still active
Role Change: Canceling MRP - no more redo to apply
Tue Nov 22 03:11:38 2016
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/app/oracle/diag/rdbms/trsendb/trsendb1/trace/trsendb1_pr00_71666.trc:
ORA-16037: user requested cancel of managed recovery operation
Tue Nov 22 03:11:38 2016
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Tue Nov 22 03:11:45 2016
MRP0: Background Media Recovery process shutdown (trsendb1)
Tue Nov 22 03:11:47 2016
Role Change: Canceled MRP
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
....
Active process 31997 user 'grid' program 'oracle@trsendb1.zr.trsen'
Active process 121903 user 'grid' program 'oracle@trsendb1.zr.trsen'
Active process 31999 user 'grid' program 'oracle@trsendb1.zr.trsen'
Active process 31002 user 'grid' program 'oracle@trsendb1.zr.trsen'
Active process 4500 user 'grid' program 'oracle@trsendb1.zr.trsen'
Active process 4502 user 'grid' program 'oracle@trsendb1.zr.trsen'
Active process 747 user 'grid' program 'oracle@trsendb1.zr.trsen'
Active process 4504 user 'grid' program 'oracle@trsendb1.zr.trsen'
CLOSE: all sessions shutdown successfully.
Tue Nov 22 03:11:50 2016
SMON: disabling cache recovery
Tue Nov 22 03:12:00 2016
Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/trsendb/trsendb1/trace/trsendb1_ora_196197.trc
SwitchOver after complete recovery through change 17978688348
Online log +DATA/trsendb/redo1-1.log: Thread 1 Group 1 was previously cleared
Online log +DATA/trsendb/redo1-2.log: Thread 1 Group 2 was previously cleared
Online log +DATA/trsendb/redo1-3.log: Thread 1 Group 3 was previously cleared
Online log +DATA/trsendb/redo1-4.log: Thread 1 Group 4 was previously cleared
Online log +DATA/trsendb/redo1-5.log: Thread 1 Group 5 was previously cleared
Online log +DATA/trsendb/redo2-1.log: Thread 2 Group 6 was previously cleared
Online log +DATA/trsendb/redo2-2.log: Thread 2 Group 7 was previously cleared
Online log +DATA/trsendb/redo2-3.log: Thread 2 Group 8 was previously cleared
Online log +DATA/trsendb/redo2-4.log: Thread 2 Group 9 was previously cleared
Online log +DATA/trsendb/redo2-5.log: Thread 2 Group 10 was previously cleared
Standby became primary SCN: 17978688346
Switchover: Complete - Database mounted as primary
Completed: alter database commit to switchover to primary WITH SESSION SHUTDOWN
Tue Nov 22 03:12:08 2016
ARC1: Becoming the 'no SRL' ARCH
Tue Nov 22 03:12:16 2016
Starting ORACLE instance (normal)
Tue Nov 22 03:12:53 2016
alter database open
This instance was first to open
Open the new primary database
ALTER DATABASE OPEN;
Correct any tempfile mismatch
If there was a tempfile that was not corrected during the pre-switchover check, then correct it now on the new primary.
Restart the new standby
If the new standby database (former primary database) was not shutdown since switching it to standby, bring it to the mount state and start managed recovery.
This can be done in parallel to the new primary open.
SHUTDOWN immediate;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
The blog reference the 11.2 Data Guard Physical Standby Switchover Best Practices using SQL*Plus (Doc ID 1304939.1)
The blog come from real cases.