Tuesday 10 January 2012

If One Voting Disk is Corrupted - RAC

Voting disks are used in a RAC configuration to maintain node membership and are critical pieces of the cluster configuration. Starting with Oracle 10gR2, it is possible to mirror both the OCR and the voting disks. With the default mirroring template (three voting disks), a strict majority of the voting disks, at least two of the three, must remain online for the cluster to function normally.
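The majority rule above can be sketched with a little shell arithmetic. This is only an illustration of the rule, not an Oracle command; the numbers match the three-disk scenario used in this post.

```shell
# Hedged sketch: CSS requires a strict majority of the configured voting
# disks to be online, so with n disks the cluster tolerates (n-1)/2
# failures (integer division).
n=3                        # voting disks configured in this scenario
majority=$(( n / 2 + 1 ))  # disks that must stay online
tolerated=$(( (n - 1) / 2 ))
echo "configured=$n majority=$majority tolerated_failures=$tolerated"
```

With three disks the cluster survives the loss of exactly one, which is why the single-disk corruption below is recoverable.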

Scenario Setup
In this scenario, the crash of one voting disk is simulated with the following steps:
1.    Identify the voting disks:
crsctl query css votedisk
0. 0 /dev/raw/raw1
1. 0 /dev/raw/raw2
2. 0 /dev/raw/raw3
2.    Corrupt one of the voting disks (as root):
dd if=/dev/zero of=/dev/raw/raw3 bs=1M


Recoverability Steps
1.    Check the “$CRS_HOME/log/[hostname]/alert[hostname].log” file. A message like the following should appear there, identifying which voting disk became corrupted:
[cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/aut-arz-ractest1/cssd/ocssd.log.

2.    According to the listing above, Voting1 is the corrupted disk. Shut down the CRS stack:
srvctl stop database -d fitstest -o immediate
srvctl stop asm -n aut-vie-ractest1
srvctl stop asm -n aut-arz-ractest1
srvctl stop nodeapps -n aut-vie-ractest1
srvctl stop nodeapps -n aut-arz-ractest1
crs_stat -t
On every node as root:
crsctl stop crs

3.    Pick a good voting disk from the remaining ones identified in the setup (here /dev/raw/raw2) and copy it over the corrupted one:
dd if=/dev/raw/raw2 of=/dev/raw/raw3 bs=1M
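The copy step can be rehearsed safely with plain temporary files standing in for the raw devices. This is a hedged simulation only; the file paths below are scratch files created by `mktemp`, not real voting disks, and the `dd` invocation is the same shape as in the recovery step.

```shell
# Simulate step 3: copy a "good" disk image over a "corrupted" one.
good=$(mktemp)   # stands in for a surviving voting disk
bad=$(mktemp)    # stands in for the corrupted voting disk
printf 'VOTEDATA' > "$good"
printf 'GARBAGE'  > "$bad"              # pretend-corrupted contents
dd if="$good" of="$bad" bs=1M 2>/dev/null  # same dd form as the recovery step
cmp -s "$good" "$bad" && result=same || result=diff
echo "after dd the two files are: $result"
rm -f "$good" "$bad"
```

Because `dd` truncates the output file by default, the target ends up as an exact byte-for-byte copy of the source, which is precisely what the recovery relies on.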

4.    Start CRS (on every node as root):
crsctl start crs

5.    Check the “$CRS_HOME/log/[hostname]/alert[hostname].log” file again. It should contain messages like the following:
[cssd(14463)]CRS-1601:CSSD Reconfiguration complete. Active nodes are aut-vie-ractest1 aut-arz-ractest1 .
2011-11-10 15:19:53.954
[crsd(14268)]CRS-1012:The OCR service started on node aut-vie-ractest1.
2011-11-10 11:20:53.987
[evmd(14228)]CRS-1401:EVMD started on node aut-vie-ractest1.
2011-11-10 11:20:55.861
[crsd(14268)]CRS-1201:CRSD started on node aut-vie-ractest1.
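Rather than eyeballing the alert log, the relevant CRS messages can be counted with `grep`. In the sketch below a temporary file holds sample lines from this post; on a real node, point `ALERT_LOG` at `$CRS_HOME/log/<hostname>/alert<hostname>.log` instead.

```shell
# Hedged sketch: scan the alert log for voting-file offline (CRS-1604)
# and reconfiguration-complete (CRS-1601) messages.
ALERT_LOG=$(mktemp)   # sample log stand-in; use the real alert log on a node
cat > "$ALERT_LOG" <<'EOF'
[cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1.
[cssd(14463)]CRS-1601:CSSD Reconfiguration complete. Active nodes are aut-vie-ractest1 aut-arz-ractest1 .
EOF
offline=$(grep -c 'CRS-1604' "$ALERT_LOG")
recfg=$(grep -c 'CRS-1601' "$ALERT_LOG")
echo "offline_messages=$offline reconfigurations=$recfg"
rm -f "$ALERT_LOG"
```

A nonzero CRS-1604 count after the restart would mean a voting disk is still offline and the recovery has to be revisited.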

6.    After a couple of minutes, check the status of the whole CRS stack:
[oracle@aut-vie-ractest1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM2.asm application ONLINE ONLINE aut-...est1
ora....T1.lsnr application ONLINE ONLINE aut-...est1
ora....st1.gsd application ONLINE ONLINE aut-...est1
ora....st1.ons application ONLINE ONLINE aut-...est1
ora....st1.vip application ONLINE ONLINE aut-...est1
ora....SM1.asm application ONLINE ONLINE aut-...est1
ora....T1.lsnr application ONLINE ONLINE aut-...est1
ora....st1.gsd application ONLINE ONLINE aut-...est1
ora....st1.ons application ONLINE ONLINE aut-...est1
ora....st1.vip application ONLINE ONLINE aut-...est1
ora....test.db application ONLINE ONLINE aut-...est1
ora....t1.inst application ONLINE ONLINE aut-...est1
ora....t2.inst application ONLINE ONLINE aut-...est1
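Scanning `crs_stat -t` output by eye gets tedious on larger clusters; the Target/State columns can be checked mechanically instead. The sketch below runs `awk` over hard-coded sample lines (the resource names are illustrative, modeled on the `fitstest` database above, and one is deliberately shown OFFLINE); on a real node you would pipe `crs_stat -t` into the same `awk` filter.

```shell
# Hedged sketch: list resources whose Target ($3) or State ($4) is not ONLINE.
crs_output='ora.fitstest.db application ONLINE ONLINE aut-vie-ractest1
ora.fitstest.inst application ONLINE OFFLINE aut-vie-ractest1'
bad=$(printf '%s\n' "$crs_output" | awk '$3 != "ONLINE" || $4 != "ONLINE" {print $1}')
echo "not fully online: ${bad:-none}"
```

An empty result ("none") would match the healthy listing above, where every resource shows ONLINE/ONLINE.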
