Saturday, October 10, 2015

adop fs_clone fails with error - The following nodes are not in sync

Issue:
The EBS application is a multi-node setup. A DBA applied a patch from the primary node using adop with the hotpatch option. The patch succeeded on one node and failed on the other. The DBA did not notice the failure and went on to apply another patch, which also failed on the second node.

The DBA then identified and fixed the problem on the second node (oraapp02), and tried to apply the patch on that node only, with allnodes=no:

$ adop phase=apply patches=20012197 workers=8 hotpatch=yes allnodes=no action=nodb

adop completed and reported that the patch was applied successfully.

The next step was to sync the file systems, so fs_clone was run across all nodes as below.

$ adop phase=fs_clone

Unfortunately, fs_clone failed with the errors below:
Checking for pending adop sessions...
    No pending session exists.
    Staging new adop session...
    [UNEXPECTED]The following nodes are not in sync : oraapp02
    Please bring the nodes in sync and then continue
    [UNEXPECTED]Unrecoverable error occured. Exiting the current session.


The DBA tried running the prepare and cleanup phases, but neither helped; both failed as well.

Solution:

Even though the patch was reapplied on the failed node and adop reported it as successful, adop did not update patchrun_id for that node in the table AD_ADOP_SESSION_PATCHES.
Below is the output seen:
select bug_number, patchrun_id, node_name from AD_ADOP_SESSION_PATCHES where bug_number in ('20126243')
BUG_NUMBER                     PATCHRUN_ID            NODE_NAME
------------------------------ ---------------------- -------------------------------
20126243                       24398                  oraapp01
20126243                       -1                     oraapp02

Based on this output, patchrun_id for oraapp02 should be 24398, the same as for oraapp01, but it shows "-1" instead. This mismatch is why fs_clone reports the node as not in sync.
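
When several patches or nodes are involved, it can help to scan the query output for every row that still has patchrun_id = -1. The snippet below is only a rough sketch: find_unsynced_nodes is a hypothetical helper name (not part of adop), and it assumes the query output has been captured as plain text with the three columns in the order shown above.

```shell
#!/bin/sh
# Hypothetical helper (not an adop utility): reads query output on stdin,
# expecting the columns BUG_NUMBER, PATCHRUN_ID, NODE_NAME in that order,
# and prints the node name of every row whose PATCHRUN_ID is -1.
find_unsynced_nodes() {
  # Only data rows start with a numeric bug number, so the header
  # and the dashed separator line are skipped.
  awk '$1 ~ /^[0-9]+$/ && $2 == "-1" { print $3 }'
}

# Sample input: the rows from the output shown above.
sample_output='BUG_NUMBER                     PATCHRUN_ID            NODE_NAME
------------------------------ ---------------------- -------------------------------
20126243                       24398                  oraapp01
20126243                       -1                     oraapp02'

printf '%s\n' "$sample_output" | find_unsynced_nodes
```

Any node name printed by this check is one whose patchrun_id still needs to be corrected with one of the two solutions below.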

Below are two solutions to fix this issue:

Sol 1: 
If the patch in question is a small one-off patch, try reapplying it on all nodes with the forceapply and nodatabaseportion options, as below:

$ adop phase=apply  patches=20126243 workers=8 hotpatch=yes options=forceapply,nodatabaseportion

This step updates patchrun_id in the table AD_ADOP_SESSION_PATCHES.

Sol 2: 
Manually update patchrun_id for node 2 to the same value as node 1 (take a backup of the table first):
SQL> update AD_ADOP_SESSION_PATCHES set patchrun_id=24398 where bug_number='<bug_number>' and adop_session_id='<adop_sess_id>' and node_name='oraapp02';
SQL> commit;

Run fs_clone now.