Sunday 22 April 2012

Oracle Voting disk concept

A voting disk is a file that resides on shared storage and must be accessible by all nodes in the cluster. All nodes in the cluster register their heartbeat information in the voting disk to confirm that they are operational. If a node's heartbeat information is missing from the voting disk, that node is evicted from the cluster. The CSS (Cluster Synchronization Services) daemon in the Clusterware maintains the heartbeat of every node to the voting disk. When a node is unable to send its heartbeat to the voting disk, it reboots itself, which helps avoid split-brain syndrome.

For high availability, Oracle recommends that you have an odd number of voting disks, with a minimum of three.

According to Oracle – “An absolute majority of voting disks configured (more than half) must be available and responsive at all times for Oracle Clusterware to operate.” In other words, to survive the loss of ‘N’ voting disks, you must configure at least ‘2N+1’ voting disks.

Suppose you have 5 voting disks configured for your 2-node cluster; the cluster can then survive the loss of up to 2 voting disks.
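The majority rule is easy to sanity-check with a little shell arithmetic (a sketch, not an Oracle tool): with V voting disks, more than half of them must stay accessible, so the cluster tolerates the loss of at most (V-1)/2 disks (integer division).

```shell
#!/bin/sh
# Tolerated voting-disk failures = (V - 1) / 2 (integer division),
# because strictly more than half of the V disks must remain accessible.
tolerated() {
    echo $(( ($1 - 1) / 2 ))
}

tolerated 3   # prints 1: a 3-disk configuration survives 1 lost disk
tolerated 5   # prints 2: a 5-disk configuration survives 2 lost disks
```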

Keep in mind that having multiple voting disks is only worthwhile if you keep them on different disks/volumes/SAN arrays, so that your cluster survives the loss of one disk/volume/array. There is no point in configuring multiple voting disks on a single disk/LUN/array.

But there is a special scenario in which all nodes in the cluster can see all the voting disks, yet the cluster interconnect between the nodes has failed. To avoid split-brain syndrome in this scenario, a node eviction must happen. The question is: which node?

According to Oracle – “The node with the lower node number will survive the eviction (the first node to join the cluster).” So, the very first node that joined the cluster survives.

Operations

1.) Obtaining voting disk information –
         $ crsctl query css votedisk

2.) Adding Voting Disks

First shut down Oracle Clusterware on all nodes, then run the following command as the root user.
          # crsctl add css votedisk [path of voting disk]

3.) Removing a voting disk:

First shut down Oracle Clusterware on all nodes, then run the following command as the root user.
          # crsctl delete css votedisk [path of voting disk]

Do not use the -force option to add or remove a voting disk while the Oracle Clusterware stack is active; it can corrupt the cluster configuration. When the cluster is down, however, you can use -force with either command to modify the voting disk configuration without interacting with active Oracle Clusterware daemons.

4.) Backing up Voting Disks

Perform a backup whenever the cluster configuration changes, for example after adding/deleting nodes or adding/deleting voting disks.
          $ dd if=current_voting_disk of=backup_file_name

If your voting disk is stored on a raw device, specify the device name -
 
          $ dd if=/dev/sdd1 of=/tmp/vd1_.dmp
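The dd copy-and-restore mechanics can be illustrated safely with an ordinary file (the paths below are hypothetical stand-ins; on a real cluster the input would be the voting disk file or device):

```shell
#!/bin/sh
# Hypothetical demonstration of dd backup/restore mechanics, using an
# ordinary file in /tmp as a stand-in for a real voting disk device.
printf 'fake voting disk contents' > /tmp/vd_demo      # stand-in "voting disk"
dd if=/tmp/vd_demo of=/tmp/vd_demo.bak 2>/dev/null     # back it up
cmp /tmp/vd_demo /tmp/vd_demo.bak && echo "backup verified"   # byte-for-byte check
dd if=/tmp/vd_demo.bak of=/tmp/vd_demo 2>/dev/null     # restore from the backup
```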

5.) Recovering Voting Disks

A bad voting disk can be restored from a backup copy.
 
          $ dd if=backup_file_name of=current_voting_disk

Sunday 15 April 2012

Bug 9689310 Excessive child cursors / high VERSION_COUNT / OERI:17059 due to bind mismatch

We observed a high version count for the following queries in the ORCL database.

Data Collected:

SQL_ID          MODULE                                   VERSION_COUNT

92bxfrxwu51v5   oracl@orcl01.xyz.in (TNS V1-V3)                     17
087g1afdgd4fh   oracl@orcl01.xyz.in (TNS V1-V3)                     12
carh6s6nmqh5w   oracl@orcl01.xyz.in (TNS V1-V3)                     25
b0kxk28kwfhn9   otrans@picggx02.xyz.in (TNS V1-V3)                  46
ftuhvswznfpv8   otrans@picggx02.xyz.in (TNS V1-V3)                  17


SQL Text (b0kxk28kwfhn9)
INSERT INTO M_CACHE01_ITEM (MESG_TYP, SITE_ID, JOB_NO, JOB_DT, INV_SERNO,
ITEM_NO, SCHEME_CD, RITC_CODE, DESC_GD01, DESC_GD02, DESC_GD03, QTY_UNITS,
QUANTITY, ITEM_RATE, UNIT_OF_RATE, NO_OF_UNITS, PMV, NOTN_NO, THIRD_PARTY,
FILE_NAME, UNIQUE_ID) VALUES (:1, :2, :3, TO_DATE(:4, 'YYYY-MM-DD HH24:MI:SS'),
:5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21)


Cause:
V$SQL_BIND_METADATA showed that the bind lengths differed between child cursors for the same bind variables.
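One way to confirm a diagnosis like this (a sketch; the SQL_ID is taken from the table above) is to check V$SQL_SHARED_CURSOR, which records why each child cursor could not be shared, alongside the per-child bind lengths in V$SQL_BIND_METADATA:

```sql
-- Which child cursors were not shared because of a bind mismatch
-- (BIND_MISMATCH = 'Y' flags a bind length/type difference)
SELECT sql_id, child_number, bind_mismatch
  FROM v$sql_shared_cursor
 WHERE sql_id = 'b0kxk28kwfhn9';

-- Bind metadata (maximum bind length per position) for each child cursor
SELECT s.child_number, m.position, m.datatype, m.max_length
  FROM v$sql s, v$sql_bind_metadata m
 WHERE s.sql_id = 'b0kxk28kwfhn9'
   AND s.child_address = m.address
 ORDER BY s.child_number, m.position;
```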


Impact:
 A high amount of the 'latch: library cache' wait event was observed in the database due to the high version count.


Solution:
Apply Patch  9689310 

Sunday 1 April 2012

Oracle Log Mining Steps on RAC

*The whole activity must be done in one session.

*Check the number of archive logs for which you want to do log mining.

Check the archive log sequence numbers from both nodes (both redo threads) that you want to mine.

Suppose,
10.10.10.1 - node1
Mon Dec 13 12:26:16 2010
Thread 1 advanced to log sequence 15289 (LGWR switch)

Mon Dec 13 16:02:50 2010
Thread 1 advanced to log sequence 15298 (LGWR switch)


10.10.10.2 - node2
Mon Dec 13 12:26:16 2010
Thread 2 advanced to log sequence 15062 (LGWR switch)

Mon Dec 13 16:22:04 2010
Thread 2 advanced to log sequence 15072 (LGWR switch)
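Instead of reading the alert logs on each node, the sequence numbers per thread can also be pulled from V$ARCHIVED_LOG (a sketch; adjust the time window to your case):

```sql
-- Archived logs generated by both RAC threads in the window of interest
SELECT thread#, sequence#, first_time, name
  FROM v$archived_log
 WHERE first_time BETWEEN TO_DATE('2010-12-13 12:00', 'YYYY-MM-DD HH24:MI')
                      AND TO_DATE('2010-12-13 17:00', 'YYYY-MM-DD HH24:MI')
 ORDER BY thread#, sequence#;
```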


Steps:
1) Set utl_file_dir
   alter system set utl_file_dir='/oracle/test' scope=spfile;  (the path/location where you put the archive logs)
   Bounce the database (utl_file_dir is a static parameter)

2) Extracting the LogMiner dictionary to a flat file (a flat file is one of the options)
  EXECUTE DBMS_LOGMNR_D.BUILD('dictionary.ora','/oracle/test',DBMS_LOGMNR_D.STORE_IN_FLAT_FILE);

3)Add Logfiles
EXECUTE DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => '/oracle/test/1_15289_66678900.dbf',OPTIONS => DBMS_LOGMNR.new);
EXECUTE DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => '/oracle/test/1_15290_66678900.dbf',OPTIONS => DBMS_LOGMNR.ADDFILE);
EXECUTE DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => '/oracle/test/1_15291_66678900.dbf',OPTIONS => DBMS_LOGMNR.ADDFILE);
EXECUTE DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => '/oracle/test/1_15292_66678900.dbf',OPTIONS => DBMS_LOGMNR.ADDFILE);

4)Filtering Data that is returned (Showing only Committed transactions)
 EXECUTE DBMS_LOGMNR.START_LOGMNR(OPTIONS => DBMS_LOGMNR.COMMITTED_DATA_ONLY);

5)Extract relevant information from the view v$logmnr_contents
  Example:
  select seg_owner,operation,sql_redo,sql_undo from v$logmnr_contents where SEG_owner='SCOTT';

6)To end the Log mining process
  exec dbms_logmnr.end_logmnr;
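While the LogMiner session is active (between steps 4 and 6), V$LOGMNR_CONTENTS can also be aggregated to see which segments generated the most redo, along the lines of MOS note 300395.1 (a sketch):

```sql
-- Rough breakdown of the mined redo by owner, segment and operation
SELECT seg_owner, seg_name, operation, COUNT(*) AS redo_entries
  FROM v$logmnr_contents
 GROUP BY seg_owner, seg_name, operation
 ORDER BY redo_entries DESC;
```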


Note: See the following MOS notes for more information:
Using LogMiner, How to determine the cause of lots of redo generation. [ID 300395.1]
LogMiner Utility Release 8.1.x - 10g [ID 291686.1]