Monday, May 8, 2017

FAL[server, ARC0]: Error 12541 creating remote archivelog file while setting up a physical standby database for EBS 12.2



While I was working on setting up a physical standby database for EBS 12.2, I ran into an error where the RFS processes were not coming up on the standby database. RFS is a background process that should start automatically once Data Guard is configured. Apart from RFS, there are several other important processes:
================

-     Archiver Process (ARCn or ARCH): The archiver process is responsible for archiving the online redo logs. The archival destination can be a local destination or a remote standby database site. In a Data Guard configuration, one of the archival destinations must be a standby database. The archiver process of the primary database writes out the archived redo log files.
For the higher data protection modes, standby redo log files can be configured on the standby database. In that case, the archiver process on the standby site archives the standby redo log files.

-     Log Writer (LGWR): The log writer process on the primary database writes entries from the redo log buffer to the online redo log file. When the current online redo log file is full, it triggers the archiver process to start archiving. In some configurations, the log writer also writes the redo entries to the standby redo log files of the standby database in addition to the primary's online redo log file. In that arrangement, LGWR acts as the log transport agent and is set up to achieve the higher data protection modes.

-     Remote File Server (RFS) Process: The RFS process runs on the standby database and handles communication between the primary and the standby database. For the log transport service, the RFS process on the standby receives the redo records from the archiver or log writer process of the primary database over Oracle Net and writes them out on the standby site's file system.

-     Fetch Archive Log (FAL): The FAL mechanism has two components, the FAL client and the FAL server, both used for archive gap resolution. If the Managed Recovery Process (MRP) on the standby site detects an archive gap, it initiates a fetch request through the FAL client on the standby site. This, in turn, requests the FAL server process on the primary database to re-transmit the archived log files needed to resolve the gap. Archive gap resolution is covered in more detail later in this post; a quick query for checking whether a gap currently exists is shown right after this list.

Once the log transport service completes the transmission of redo records to the standby site, the log apply service starts applying the changes to the standby database. The log apply service operates solely on the standby database. The following processes on the standby site facilitate the log apply operations:

-     Managed Recovery Process (MRP): The MRP applies the redo entries from the archived redo logs onto the physical standby database.

-     Logical Standby Process (LSP): The LSP applies the redo records from the archived redo logs to a logical standby database. The LSP uses the Oracle LogMiner engine for the SQL apply operations: with it, the LSP recreates the SQL statements that were executed on the primary database from the redo logs and then applies them to the standby database to keep it current with the primary.
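If an archive gap is suspected, the standby reports it directly in v$archive_gap. This is a quick check I find useful here (it is not part of the original write-up); no rows returned means no gap is currently detected:

SQL> SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap;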

=======================

Now, coming back to the error: after configuring Data Guard, I first did some sanity checking using the query below and found that the RFS processes were not running.

SQL> select thread#,sequence#,process,client_process,status,blocks from v$managed_standby;

   THREAD#  SEQUENCE# PROCESS   CLIENT_P STATUS           BLOCKS
---------- ---------- --------- -------- ------------ ----------
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         1         99 MRP0      N/A      WAIT_FOR_LOG          0

9 rows selected.


I checked the standby alert log and found the following messages:

Managed Standby Recovery starting Real Time Apply
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Waiting for thread 1 sequence 99
2017-02-26 21:44:59.709000 -05:00
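At this point it is also worth checking, on the standby, which sequence was last received and which was last applied. This is my own sanity check rather than something from the original steps:

SQL> SELECT max(sequence#) last_received FROM v$archived_log;
SQL> SELECT max(sequence#) last_applied FROM v$archived_log WHERE applied = 'YES';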


Hmm, the communication between the primary and the standby was not yet fully established, so I logged into the primary and ran the queries below.

SQL> SELECT thread#, dest_id, gvad.status, error, fail_sequence FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;

   THREAD#    DEST_ID STATUS
---------- ---------- ---------
ERROR                                                             FAIL_SEQUENCE
----------------------------------------------------------------- -------------
         1          1 VALID
                                                                              0

         1          2 ERROR
ORA-12541: TNS:no listener                                                    0


SQL> SELECT gvi.thread#, timestamp, message FROM gv$dataguard_status gvds, gv$instance gvi WHERE gvds.inst_id = gvi.inst_id AND severity in ('Error','Fatal') ORDER BY timestamp, thread#;

   THREAD# TIMESTAMP
---------- ------------------
MESSAGE
--------------------------------------------------------------------------------
         1 24-FEB-17
Error 12541 received logging on to the standby

         1 24-FEB-17
FAL[server, ARC0]: Error 12541 creating remote archivelog file 'PRODDR'



So the connection from the primary to the standby is not working. I then checked the pfile and found a discrepancy in the parameter *.log_archive_dest_2: it was initially pointing to the primary database service. I had to modify it to point to the correct values:

*.log_archive_dest_2='SERVICE=PRODDR LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=PRODDR OPTIONAL REOPEN=15 MAX_FAILURE=100 NET_TIMEOUT=300'
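Since ORA-12541 means that no listener could be reached at the destination, it is worth confirming from the primary host that the PRODDR alias resolves and that the standby listener answers before bouncing anything (a basic check, assuming PRODDR is defined in the primary's tnsnames.ora):

$ tnsping PRODDR
$ sqlplus sys@PRODDR as sysdba

Also note that if the instance uses an spfile, log_archive_dest_2 can be changed dynamically with ALTER SYSTEM SET ... SCOPE=BOTH instead of editing the pfile and restarting.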


I restarted the database and the listener, then ran the query again:

SQL> SELECT gvi.thread#, timestamp, message FROM gv$dataguard_status gvds, gv$instance gvi WHERE gvds.inst_id = gvi.inst_id AND severity in ('Error','Fatal') ORDER BY timestamp, thread#;

no rows selected

SQL> /


No error messages were reported this time. I logged into the standby and ran the query:

SQL> select thread#,sequence#,process,client_process,status,blocks from v$managed_standby;

   THREAD#  SEQUENCE# PROCESS   CLIENT_P STATUS           BLOCKS
---------- ---------- --------- -------- ------------ ----------
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         1        104 ARCH      ARCH     CLOSING             605
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         0          0 ARCH      ARCH     CONNECTED             0
         1        105 MRP0      N/A      APPLYING_LOG    2048000
         1        105 RFS       LGWR     IDLE                  1
         0          0 RFS       UNKNOWN  IDLE                  0

   THREAD#  SEQUENCE# PROCESS   CLIENT_P STATUS           BLOCKS
---------- ---------- --------- -------- ------------ ----------
         0          0 RFS       UNKNOWN  IDLE                  0
         0          0 RFS       UNKNOWN  IDLE                  0



All is fine now; the RFS processes are running. We can also verify the same from the standby alert log:

RFS[1]: Assigned to RFS process (PID:13064)
RFS[1]: Selected log 3 for thread 1 sequence 105 dbid 345897223 branch 934034088
RFS[2]: Assigned to RFS process (PID:13068)
RFS[2]: Selected log 4 for thread 1 sequence 104 dbid 345897223 branch 934034088
2017-02-26 22:38:36.939000 -05:00
RFS[3]: Assigned to RFS process (PID:13076)
RFS[3]: Opened log for thread 1 sequence 100 dbid 345897223 branch 934034088
RFS[4]: Assigned to RFS process (PID:13080)
RFS[4]: Opened log for thread 1 sequence 99 dbid 345897223 branch 934034088
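To double-check that transport and apply really work end to end, a log switch can be forced on the primary and the lag then checked on the standby. This is extra verification on my part, not something from the referenced note.

On the primary:

SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;

On the standby:

SQL> SELECT name, value FROM v$dataguard_stats WHERE name IN ('transport lag','apply lag');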


Reference: RFS Is not coming up on standby Due to ORA-12154 on transport (primary side) (Doc ID 2196182.1)

*******************************************************************
When you are using Data Guard, there are several scenarios in which the physical standby can go out of sync with the primary database.

Before doing anything to correct the problem, we need to verify why the standby is not in sync with the primary. In this particular article, we are covering the scenario where a log is missing from the standby but, apart from that missing log, all other logs are available.

Verify from v$archived_log that there is a gap in the sequence numbers. All the logs up to the gap should have APPLIED=YES, and all the sequence#s after the missing log's sequence# should have APPLIED=NO. This means that, because of the missing log, MRP is not applying the logs on the standby, but the logs are still being transmitted to the standby and are available there.

SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;

So, for example, if the missing log's sequence# is 400, the query above should show that everything up to sequence# 399 has APPLIED=YES and everything from 401 onwards has APPLIED=NO.
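On a busy database the full listing from v$archived_log can be long, so the boundary can also be picked out directly. These are convenience queries of mine, assuming a single-thread (non-RAC) primary; for RAC, group the queries by thread#:

SQL> SELECT max(sequence#) last_applied FROM v$archived_log WHERE applied = 'YES';
SQL> SELECT min(sequence#) first_unapplied FROM v$archived_log WHERE applied = 'NO';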

There are a few steps to perform when the standby is not in sync with the primary because of a gap in the logs on the standby.

These steps are:

STEP #1: Take an incremental backup of the primary from the SCN where the standby is lagging behind and apply it on the standby server.

STEP #2: If step #1 does not bring the standby back in sync, re-create the controlfile of the standby database from the primary.

STEP #3: If after step #2 you still find that logs are not being applied on the standby, check the alert log; you may need to re-register the logs with the standby database.

*******************************************************************************************

STEP#1

1. On the STANDBY database, query the v$database view and record the current SCN of the standby database:

SQL> SELECT CURRENT_SCN FROM V$DATABASE;

CURRENT_SCN
-----------
1.3945E+10

SQL> SELECT to_char(CURRENT_SCN) FROM V$DATABASE;

TO_CHAR(CURRENT_SCN)
----------------------------------------
13945141914

2. Stop Redo Apply on the standby database:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
*
ERROR at line 1:
ORA-16136: Managed Standby Recovery not active

If you see the above error, it means managed recovery is already stopped.

You can also check the view v$managed_standby to see whether the MRP is running or not:

SQL> SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

3. Connect to the primary database as the RMAN target and create an incremental backup from the current SCN of the standby database that was recorded in step 1:

For example,

BACKUP INCREMENTAL FROM SCN 13945141914 DATABASE FORMAT '/tmp/ForStandby_%U' TAG 'FOR STANDBY';

You can choose a location other than /tmp also.
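If the incremental backup is large and space or transfer time is a concern, the same backup can usually be taken as a compressed backup set. This is just a variation on the command above, not something from the original note:

RMAN> BACKUP AS COMPRESSED BACKUPSET INCREMENTAL FROM SCN 13945141914 DATABASE FORMAT '/tmp/ForStandby_%U' TAG 'FOR STANDBY';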

4. Recover the standby database using the incremental backup of the primary taken above:

On the standby server, without connecting to the recovery catalog, catalog the backup piece(s) of the incremental backup taken above. Before this, of course, you need to copy the backup piece(s) to a location accessible to the standby server.

$ rman nocatalog target /
RMAN> CATALOG BACKUPPIECE '/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1';
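If the incremental backup produced more than one backup piece, they can all be cataloged in one go with the directory form of the command instead of cataloging piece by piece:

RMAN> CATALOG START WITH '/dump/proddb/inc_bkup/';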

Now in the same session, start the recovery

RMAN> RECOVER DATABASE NOREDO;

You should see something like:

Starting recover at 2015-09-17 04:59:57
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=309 devtype=DISK
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
....
..
..
.
channel ORA_DISK_1: reading from backup piece /dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1
channel ORA_DISK_1: restored backup piece 1
piece handle=/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1 tag=FOR STANDBY
channel ORA_DISK_1: restore complete, elapsed time: 01:53:08
Finished recover at 2015-07-25 05:20:3

Delete the backup set from standby:

RMAN> DELETE BACKUP TAG 'FOR STANDBY';
using channel ORA_DISK_1
List of Backup Pieces

BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
17713   17713   1   1   AVAILABLE   DISK        /dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1

Do you really want to delete the above objects (enter YES or NO)? YES

deleted backup piece

backup piece handle=/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1 recid=17713 stamp=660972421

Deleted 1 objects

5. Try to start the managed recovery.

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

If you get an error here, you need to go to STEP #2 to bring the standby back in sync.

If there is no error, then use the view v$managed_standby to verify that the MRP process has started and has the status APPLYING_LOG.

6. After this, check whether the logs are being applied on the standby or not:

SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;

After recovering with the incremental backup, you will no longer see the sequence#s that were earlier showing APPLIED=NO, because their changes have been absorbed by the incremental backup and applied to the standby during the recovery.

The APPLIED column starts showing YES for the logs that are being transmitted now, which means logs are being applied again.

Check the status of the MRP process in the view v$managed_standby. The status should be APPLYING_LOG while the available logs are being applied, and once all available logs have been applied, the status should become WAIT_FOR_LOG.
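The check described above boils down to something like this; the WHERE clause simply narrows the output to the recovery process:

SQL> SELECT process, status, thread#, sequence# FROM v$managed_standby WHERE process LIKE 'MRP%';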

7. Another check to verify that primary and standby are in sync. Run the following query on both standby and primary:

SQL> select max(sequence#) from v$log_history;

The output should be the same on both databases.

*******************************************************************************************

STEP #2:

Since managed recovery failed even after applying the incremental backup, we need to recreate the controlfile of the standby. The reason for recreating the controlfile is that the database SCN in the controlfile was not updated after the incremental backup was applied, even though the SCNs of the datafiles were, so the state recorded in the controlfile no longer matched the datafiles. Consequently, the standby database was still looking for the old log file to apply.

A good MOS note for re-creating the controlfile in such a scenario is Doc ID 734862.1.

Steps to recreate the standby controlfile and start the managed recovery on standby:

1. Take a backup of the controlfile from the primary:

rman target sys/oracle@proddb catalog rman/cat@emrep
backup current controlfile for standby;
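A recovery catalog is not strictly required for this, and giving the piece an explicit format makes it easier to locate. This is a variation I tend to use, not the exact command from the note:

$ rman target /
RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY FORMAT '/tmp/ForStandbyCTL_%U';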

2. Copy the controlfile backup to the standby system (if it is on a common NFS mount, there is no need to transfer or copy it) and restore the controlfile onto the standby database.

Shut down all instances of the standby (if the standby is RAC).

sqlplus / as sysdba
shutdown immediate
exit

Start up one instance in NOMOUNT mode.

sqlplus / as sysdba
startup nomount
exit

Restore the standby controlfile:

rman nocatalog target /
restore standby controlfile from '/tmp/o1_mf_TAG20070220T151030_.bkp';
exit

3. Start up the standby with the new controlfile:

sqlplus / as sysdba
shutdown immediate
startup mount
exit

4. Restart managed recovery on one instance of the standby database (if the standby is RAC):

sqlplus / as sysdba
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

The above statement may succeed without errors, but the MRP process will still not start. The reason is that, since the controlfile has been restored from the primary, it points to the datafiles at their locations on the primary rather than on the standby. For example, if the primary datafiles are located under '+DATA/proddb_1/DATAFILE' and the standby datafiles under '+DATA/proddb_2/DATAFILE', the new controlfile will show the datafiles' location as '+DATA/proddb_1/DATAFILE'. This can be verified on the standby instance, as shown below. We need to rename all the datafiles so that they reflect the correct location on the standby.
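A quick way to see where the restored controlfile is pointing is to list the datafile names on the mounted standby; every name should be under the standby's path (+DATA/proddb_2 in this example), and any name still under the primary's path needs to be renamed:

SQL> SELECT file#, name FROM v$datafile ORDER BY file#;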

There are two ways to rename the datafiles:

1. Without using RMAN

Set the parameter standby_file_management=MANUAL in the standby's parameter file, then rename each datafile:

ALTER DATABASE RENAME FILE '+DATA/proddb_1/datafile/users.310.620229743' TO '+DATA/proddb_2/datafile/USERS.1216.648429765';
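If there are many datafiles and the standby path differs from the primary only by a directory (typical on a regular file system; with ASM OMF names like the ones above, the file and incarnation numbers differ, so the RMAN method below is the easier route), the rename statements can be generated instead of typed by hand. This is only a sketch, and the paths are made-up examples:

SQL> SELECT 'ALTER DATABASE RENAME FILE ''' || name || ''' TO ''' || REPLACE(name, '/u01/proddb_1/', '/u01/proddb_2/') || ''';' FROM v$datafile;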

2. Using RMAN

rman nocatalog target /

Catalog the files; the string specified should refer to the disk group/file system destination of the standby datafiles.

RMAN> catalog start with '+diskgroup/<dbname>/datafile/';

e.g.:

RMAN> catalog start with '+DATA/proddb_2/datafile/';

This will give a list of files and ask whether they should all be cataloged. Review the list and answer YES if all the datafiles are listed correctly.

Once that is done, commit the changes to the controlfile:

RMAN> switch database to copy;
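If standby_file_management was switched to MANUAL for the rename step, remember to set it back to AUTO before restarting recovery, so that datafiles added on the primary are created automatically on the standby:

SQL> ALTER SYSTEM SET standby_file_management = AUTO;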

Now start the managed recovery as:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

and check the processes in the view v$managed_standby. The MRP process should be there, and it will start applying all the archived logs that were missing since the last applied log. This process might take hours.

5. Another check to verify that primary and standby are in sync:

Run the following query on both standby and primary after all logs in v$archived_log show APPLIED=YES:

SQL> select max(sequence#) from v$log_history;

The output should be the same on both databases.

*****************************************************************************************

STEP #3

If, after recreating the controlfile, you still find that logs are being transmitted but not applied on the standby, check the alert log of the standby. For example, see whether you find something similar to the snippet below:

Fetching gap sequence in thread 1, gap sequence 74069-74095
Wed Sep 17 06:45:47 2015
RFS[1]: Archived Log: '+DATA/ipwp_sac1/archivelog/2008_09_17/thread_1_seq_74093.259.665649929'
Wed Sep 17 06:45:55 2015
Fetching gap sequence in thread 1, gap sequence 74069-74092
Wed Sep 17 06:45:57 2015
RFS[1]: Archived Log: '+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74094.258.665649947'
Wed Sep 17 06:46:16 2015
RFS[1]: Archived Log: '+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74095.256.665649957'
Wed Sep 17 06:46:26 2015
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 74069-74092

The contents of the alert log show that log sequence#s 74069 to 74092 may have been transmitted but not applied. The view v$archived_log shows sequence#s starting from 74093 with APPLIED=NO.

So this situation means that logs up to 74068 were applied as part of the incremental backup, and logs from 74069 to 74093 have been transferred to the standby server but must have failed to register with the standby database. Try the following steps:

Locate the archived logs for the sequence#s shown in the alert log (for example, 74069 to 74092), e.g. +DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74069.995.665630861, and register each of these archived logs with the standby database:
alter database register logfile '+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74069.995.665630861';

alter database register logfile '+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74070.998.665631405';

alter database register logfile '+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74071.792.665633755';

alter database register logfile '+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74072.263.665633713';

...and so on, until the last one.
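When the gap spans many sequences, the REGISTER LOGFILE statements can be generated from a listing of the archive destination instead of being typed one by one. With ASM, something along these lines can work; this is a rough sketch only, and the path, the sequence filter, and the asmcmd environment are assumptions based on the example above:

$ asmcmd ls +DATA/proddb_2/archivelog/2008_09_17/ | grep thread_1_seq_740 | \
  sed "s|.*|alter database register logfile '+DATA/proddb_2/archivelog/2008_09_17/&';|"

The generated statements can then be pasted into SQL*Plus on the standby.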


Now check the view v$archived_log and you should finally see the logs being applied. The status of the MRP should change from WAIT_FOR_GAP to APPLYING_LOG and eventually to WAIT_FOR_LOG.






