Knowledge increases exponentially. Today, you probably own more books than the great universities of times past; Cambridge University owned fewer than two hundred books in the fifteenth century. First came the invention of writing, then alphabets, then paper, then the printing press, then mechanization. Each step caused an exponential increase in collective human knowledge. In our generation, Al Gore invented the internet and the last barriers to the spread of knowledge have been broken. Today, everybody has the ability to contribute, communicate, and collaborate. We are all caught up in a tsunami, an avalanche, a conflagration, a veritable explosion of knowledge for the betterment of humankind. This is the blog of the good folks at Database Specialists, a brave band of Oracle database administrators from the great state of California. We bid you greeting, traveler. We hope you find something of value on these pages and we wish you good fortune in your journey.

A Sanity Check for External Redundancy

As with any DBA, my days are filled with tasks that can seem unrelated to the profession: writing reports, attending meetings, installing releases, planning capacity, answering alerts, patching binaries and running upgrades. It is easy to forget the three things I should be focusing on as a remote DBA:

  • Security
  • Availability
  • Performance

Some aspects of these core functions can be delegated to other groups. For example, Network Engineering may manage LDAP to augment database authentication (security), or, as in this case, ASM external redundancy may leave the handling of disk failures to the storage layer (availability).

Beware: YOU MAY DELEGATE AUTHORITY, BUT RESPONSIBILITY CAN BE VERY STICKY. Do not think that because you authorize someone or something to take over part of a task, you escape responsibility when things go wrong. In this recent example, I noticed warning messages in the ASM alert log after engineers completed FRU battery maintenance on a SAN.

ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +FRA
ORA-00600: internal error code, arguments: [kffbAddBlk04], [],

Note:  A recoverable backup was taken prior to the storage maintenance per standard operating procedures.

At this point we took the database out of the cluster and redirected db_recovery_file_dest to local storage while we investigated why the FRA disk group mounted and then crashed the instance with an ORA-600 shortly afterwards.
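Redirecting the recovery area is a dynamic parameter change. The following is a minimal sketch, not the exact commands used in this incident: the directory /u01/app/oracle/local_fra and the 50G limit are hypothetical values, the directory is assumed to exist and be writable by the Oracle software owner, and SCOPE=BOTH assumes the instance uses an spfile.

```sql
-- Hypothetical values: adjust the path and size to your environment.
-- A size limit must be defined before a destination can be set.
ALTER SYSTEM SET db_recovery_file_dest_size = 50G SCOPE=BOTH;
ALTER SYSTEM SET db_recovery_file_dest = '/u01/app/oracle/local_fra' SCOPE=BOTH;

-- Confirm the new location and how much of the limit is in use.
SELECT name, space_limit, space_used, number_of_files
FROM   v$recovery_file_dest;
```

Only files created after the change go to the new location; anything already written to +FRA stays where it was.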

Suspecting metadata corruption we attempted a repair:

SQL> alter diskgroup FRA check all repair
NOTE: starting check of diskgroup FRA
SUCCESS: check of diskgroup FRA  found no errors

A little more time went by, and then:

ORA-00600: internal error code, arguments: [kccpb_sanity_check_2]…
Shutting down instance (abort)

After contacting Oracle Support and exhausting our options for mounting the FRA disk group, we ended up initializing the disks with dd and rebuilding the disk group.

Summary: Backups to a Flash Recovery Area may not be as reliable as you think, especially with external redundancy on a single physical or virtual disk. In these situations it is good practice to maintain additional redo log members on at least two disk groups and to use RMAN to regularly copy archived logs and backups to a secondary location.
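The checks behind the redo log recommendation can be sketched in SQL. This is an illustration under assumptions: the disk group name +DATA and the group numbers 1 to 3 are hypothetical, and +DATA is assumed to sit on different storage from +FRA.

```sql
-- List disk groups and their redundancy type.
-- TYPE = 'EXTERN' means ASM does no mirroring and relies on the storage layer.
SELECT name, state, type, total_mb, free_mb
FROM   v$asm_diskgroup;

-- On the database instance: see where each redo log member lives.
SELECT group#, member
FROM   v$logfile
ORDER  BY group#, member;

-- Add a member in a second disk group to each redo log group
-- (hypothetical: groups 1-3 exist and +DATA is a separate disk group).
ALTER DATABASE ADD LOGFILE MEMBER '+DATA' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '+DATA' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '+DATA' TO GROUP 3;
```

An extra member only helps if the second disk group does not share the same physical or virtual disk as the first, so it is worth confirming the underlying layout with the storage team.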
