Knowledge increases exponentially. Today, you probably own more books than great universities of times past—Cambridge University owned fewer than two hundred books in the fifteenth century. First came the invention of writing, then alphabets, then paper, then the printing press, then mechanization. Each step caused an exponential increase in the collective human knowledge. In our generation, Al Gore invented the internet and the last barriers to the spread of knowledge were broken. Today, everybody has the ability to contribute, communicate, and collaborate. We are all caught up in a tsunami, an avalanche, a conflagration, a veritable explosion of knowledge for the betterment of humankind. This is the blog of the good folks at Database Specialists, a brave band of Oracle database administrators from the great state of California. We bid you greeting, traveler. We hope you find something of value on these pages and we wish you good fortune in your journey.

Oracle Introduces New Database Software

Oracle recently announced a new product, Oracle Big Data SQL. It’s designed to run on the Oracle Exadata database machine, creating a shared connection for exchanging data more efficiently between a company’s data silos. One of the long-standing issues with collecting big data is that it’s difficult to compile and really maximize its value. Oracle Big Data SQL is intended to help companies leverage big data – getting more bang for their buck, so to speak. Oracle announced the new technology last month, and it’s now available for purchase through a remote database administrator or an Oracle product manager.

Integration
Big data has made a huge impact on the ways enterprise companies operate, from marketing programs to crucial decisions. Many businesses keep information captured from social media, relational databases and other sources in separate places. Oracle Big Data SQL can scan, cache and integrate data from a diverse range of sources, making the information more useful.

Compatibility
The database software uses a single query optimizer to connect data from Exadata, Hadoop, NoSQL and relational databases. It also offers a high performance Smart Scan and indexing capability. Oracle Big Data SQL runs on the Oracle Big Data Appliance and with Cloudera software. It’s compatible with Oracle 12c, which was released last summer. Companies migrating to 12c can discuss the need for the Exadata and Big Data SQL with their database administrator to determine whether it’s a good fit for their needs.

Security
In a time when security is an important element of any IT product, Big Data SQL software can protect data across a variety of connected engines. It uses levels of authentication and encryption that follow all of the rules and permissions used by Oracle, extending them to all of the user’s connected data sources. Big Data SQL can also run local queries on Hadoop and NoSQL databases without physically transferring the data, which eliminates risks that data transfer can introduce.

Clearing Unused Corrupt (Nologging) Blocks

I recently had the unfortunate experience of having to restore / recover a database that had undergone ‘nologging’ operations after the latest backup. Of course, the presence of blocks recovered through a nologging operation causes ORA-01578 errors such as

ORA-01578: ORACLE data block corrupted (file # 11, block # 91658)
ORA-01110: data file 11: '+DATADG/tollprd/datafile/toll_t01.256.849581711'
ORA-26040: Data block was loaded using the NOLOGGING option

Thankfully the users were able to drop and recreate (and reload) the affected tables and LOB segments, and we rebuilt the indexes, so the users were able to quickly get back to work on the system. However, RMAN was still populating v$database_block_corruption, i.e. the blocks that were previously marked corrupt remained. Although they were no longer attached to segments, they were still showing as corrupt. Some of the blocks had even been re-attached to new segments and were still registering as corrupt. The problem is that once a block gets marked as corrupt it needs to be re-formatted to clear the corruption, and that only happens when a new INSERT operation is performed on the block. So to clean the blocks we have to force an INSERT on them. If the block has been re-attached to a table, you can reformat it by simply inserting rows into the table until the block gets used, which cleans it. You do not need to commit the inserted rows; a rollback does not undo the block formatting. If the corrupt block is 'free space', you need to lay down a dummy table on top of it and insert rows into the dummy table until the block gets used. At that point you can drop the dummy table.

Here is a more detailed description of the process we went through; the broad strokes are described in MOS (Metalink) note 336133.1. First, ensure that v$database_block_corruption is current, which you can do by letting a full RMAN backup run or by running a

RMAN> validate database;
or
RMAN> validate database check logical;
command.

For each combination of FILE#,BLOCK# in v$database_block_corruption check the status of the segment. As far as I could find there are only 3 possibilities:

a) the block is attached to a segment
b) the block is a segment header
c) the block is now free space

We only encountered a) & c), so we will focus on these.
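As a side note, rather than checking one block at a time, a query along the following lines can map every corrupt block to its owning segment in a single pass. This is only a sketch: it keys on the first block of each corrupt range, dba_extents can be slow to scan on a large database, and any corrupt block with no matching extent is most likely free space.

SELECT c.file#, c.block#, e.owner, e.segment_name, e.segment_type
FROM   v$database_block_corruption c,
       dba_extents e
WHERE  e.file_id = c.file#
AND    c.block# BETWEEN e.block_id AND e.block_id + e.blocks - 1
ORDER  BY c.file#, c.block#;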

Case a) the block is attached to a segment
To figure out if a block is attached to a segment or is free space run the following SQL

SELECT *
FROM dba_extents
WHERE file_id = FILE_ID# AND
BLOCK# between block_id AND block_id + blocks - 1;

Plug in the values for FILE_ID# & BLOCK# from v$database_block_corruption. If a block does not appear to be attached to a segment, check whether it is free space using this query:

SELECT *
FROM dba_free_space
WHERE file_id = FILE_ID#
AND BLOCK# between block_id AND block_id + blocks - 1;

Again, plug in the values for FILE_ID# & BLOCK# from v$database_block_corruption. Now remember that in our situation we had already dropped or truncated all the tables that contained nologging blocks and rebuilt the indexes, so any nologging blocks now attached to a table segment should be empty. You can confirm this by checking for rows with that FILE#, BLOCK# using this query:

SELECT COUNT(*)
FROM <TABLE_NAME>
WHERE DBMS_ROWID.ROWID_TO_ABSOLUTE_FNO(ROWID, '<OWNER>', '<TABLE_NAME>') = FILE# AND
      DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) = BLOCK#

The <TABLE_NAME> is the segment_name (of segment_type = TABLE, of course) and <OWNER> the owner from the dba_extents query above. Assuming you get no rows from the above, then the corrupt block is attached to the table segment, empty, and marked corrupt, and you can clear it by simply inserting rows into the table. Once you insert into the affected block, Oracle will re-format it and clear the corruption. You can monitor when the block becomes clean by running

RMAN> validate datafile FILE#;

and checking v$database_block_corruption. Again, we don’t need to commit the new rows; a rollback is sufficient and does not undo the block formatting.

Case c) the block is now free space
The slightly more complicated situation is when the block is 'free space'. As I mentioned above, in this case we need to create a table on top of the corrupt block, e.g.

create table dummy_nl
(
n number,
c varchar2(4000)
) nologging tablespace TOLL_T01 pctfree 99;

then manually allocate extents to the above table until you cover the block i.e. run

alter table dummy_nl allocate extent
(datafile '+DATADG/tollprd/datafile/toll_t01.256.849581711' size 100M);

Obviously, depending upon your situation, you may want to change the extent size. You can check whether you have covered the block by re-running the dba_extents query from above. Once the block appears in your dummy_nl table, you want to 'fill' the segment. I did this by simply inserting a row

INSERT INTO DUMMY_NL VALUES (1,RPAD('*',4000,'*'))

and then rapidly doubling up the number of rows via repeated runs of the following

INSERT INTO DUMMY_NL SELECT * FROM DUMMY_NL

Obviously, we can check the size of the segment we need to fill using this query:

SELECT sum(bytes) FROM dba_segments WHERE segment_name = 'DUMMY_NL'

As you approach this size you may want to tailor back the inserts using a rownum filter such as

INSERT INTO DUMMY_NL SELECT * FROM DUMMY_NL WHERE ROWNUM <= 500000

You just need to fill up the pre-allocated space. At any point you can run the RMAN

RMAN> validate datafile FILE#;

command and check v$database_block_corruption. Once you have overwritten the corrupt block, it will of course clear from v$database_block_corruption. Finally, once you have cleared the block, you can drop the DUMMY_NL table.
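For completeness, the wrap-up at the end might look like the following sketch (the PURGE keyword simply keeps the dummy table out of the recycle bin so its space is released immediately):

SQL> DROP TABLE dummy_nl PURGE;

RMAN> validate database check logical;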

Intel Introduces Custom Chip Design for Oracle

Intel has added one more custom processor to its list of tailored processor chips. Oracle’s new database machine, the Exadata Database X4-8, will now have core count and processor frequency dynamically scaled for its software workloads. This is ideal for businesses involved with integration of big data, a growing mobile community and more complex database needs.

Intel’s new Oracle chip is based on the Xeon E7 v2. The X4-8 uses 15-core processor chips in its database server hardware, making the most of their power with elastic acceleration of peak performance without increasing the data footprint.

The new Exadata Database X4-8 offers up to 12 terabytes of system memory per rack, a higher InfiniBand interconnect speed and double the local disk space. The new Intel chip was designed specifically to help Oracle software users maximize In-Memory. While the two companies have been collaborating for over a decade, the latest adaptation of the chip to create the Xeon E7-8895 v2 processor is a big step towards meeting the growing demands of modern businesses.

“This type of collaboration requires complete understanding of what the software is doing, and what the hardware can do better to take advantage of the software capabilities,” said Edward Goldman in a recent blog for Intel.

Oracle and Intel have collaborated on three new projects: Oracle Exadata Database Machine X4-8, Oracle Sun Server X4-8 and Oracle Database In-Memory. One of their goals is to help the available infrastructure keep up with trends toward higher speeds, lower energy consumption, specific workloads and the realities of cloud computing.

Understanding UNDO

Sooner or later, every Oracle DBA will hit a problem of some sort that is related to the concept of Oracle “undo”. Oracle undo performs a few very important functions in the database: supporting read-consistent queries and Flashback queries, supporting transactions (COMMIT and ROLLBACK), and recovering from database crashes.

Along with the SYSTEM, SYSAUX, and TEMP tablespaces, the UNDO tablespace is a required part of every database. If SYSTEM or UNDO is lost during a database crash, it’s very likely that whichever is corrupted will have to be recovered from a backup before the database can be opened.

Undo basically keeps the before-images of uncommitted changes in a separate region of the database, so that transactions can be rolled back and so that the database can present a 'read consistent' view of the data. In fact, it is impossible to commit a transaction that generates more undo than the undo space it is assigned to can hold; in other words, if you have a tiny UNDO tablespace (say 100M), you probably won’t be able to commit any transaction bigger than about half that size (50M), and you are guaranteed not to be able to commit a transaction bigger than 100M. So at an absolute minimum, UNDO has to be at least as big as your largest transaction.

This can be a confusing concept; in Oracle, a query sees the data as of the time the query began (and in a read-only or serializable transaction, as of the time the transaction began), the so-called 'read consistency' feature. If you have a very long-running query (likely doing a full-table scan), and other processes are changing data in the table while the scan is running, the scan may at some point find that it can no longer reconstruct what a block of data previously looked like. When this happens, you get the 'ORA-01555' (snapshot too old) error, which is a very common error, even today. It causes no corruption, but whatever statement was running is rolled back and killed.

Years ago (v7, v8) we had to manually configure 'undo', which was then called 'rollback', and it often ended up that the configuration wasn’t perfect – in fact, it was common practice to create a few extra-large rollback segments dedicated to very large transactions. Starting in v9, Oracle introduced a new method of keeping undo information, in 'undo' segments in an undo tablespace. Oracle will automatically create and size new undo segments in the undo tablespace, and it will grow them and offline/drop them dynamically, depending on the load.

So – Oracle today manages undo automatically when the UNDO_MANAGEMENT=AUTO parameter is set. The key parameter for tuning retention is UNDO_RETENTION which, if the undo tablespace can grow, tells Oracle to try to keep at least that many seconds of undo available. So, if your longest query takes 20 minutes and you set UNDO_RETENTION to 20 minutes (1200 seconds), it should not get any ORA-01555 errors, even if the data in the table changes while it’s running.

You can also set UNDO_RETENTION while not allowing the undo data files to expand. However, in that case Oracle may not be able to honor the requested retention, and you will get ORA-01555s. If the undo tablespace is too small, you’ll start to see application error logs (and the alert log & trace files) fill up with ORA-01555 errors.
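A quick way to see how the current settings are holding up is to look at v$undostat, which keeps one row per ten-minute interval; a minimal sketch:

SELECT begin_time, end_time, undoblks, maxquerylen, nospaceerrcnt, ssolderrcnt
FROM   v$undostat
ORDER  BY begin_time;

Here maxquerylen shows the longest-running query (in seconds) in each interval and ssolderrcnt counts 'snapshot too old' (ORA-01555) errors, which makes this view a reasonable guide when choosing a value for UNDO_RETENTION.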

There’s a lot of information available about UNDO; Oracle has some pretty good documentation here (for v11.2):  Oracle UNDO documentation for 11.2


Recovery of ASM Disks Using Backup Blocks on AIX

There are some simple commands (not all well documented) to debug an ORA-15032 “not all operations performed” error when a disk group won’t mount.

  1. Validate the device permissions and ownership on all nodes with a simple “ls -l /dev/rhdisk*” command and correct any issues found.
  2. Run the storage check of the cluster verify utility.
    cluvfy comp ssa -n node1,node2,node3,node4 -verbose
  3. As the +ASM owner, use the ASM command line utility “asmcmd” to list the devices and their status with “lsdsk” and “lsdsk --candidate”. This could also be done in SQL*Plus by selecting from v$asm_disk. The output should be the same on all nodes.

A common problem during SAN failures is multiple reboots of the host: on successive restarts AIX can modify the ODM, causing a disk to appear as a ‘CANDIDATE’ rather than associated with the proper disk group (even if the device paths are maintained). Review the referenced documents below prior to running the repair operation:

Assigning a Physical Volume ID (PVID) To An Existing ASM Disk Corrupts the ASM Disk Header (Doc ID 353761.1)
Corrective Action for ASM Diskgroup with Disks Having PVIDs on AIX (Doc ID 750016.1)

The “kfed” command can be used to detect and correct the issue using the disk backup blocks.  Note: Always have a backup of the database, OCR and CRS disks before attempting these commands.
  1. Confirm the disk’s backup blocks are available with the “kfed read” command; sample output is below.
    kfed  read /dev/rhdisk# anum=1 blknum=254 > rhdisk#_backup_block.txt
    kfbh.block.blk: 254 ; 0x004: blk=254
    kfdhdb.dskname: RECO_DG_0004 ; 0x028: length=14
  2. Shut down the database instances and ASM on all nodes.
  3. Repair the disk
    kfed  repair /dev/rhdisk#
  4. Start ASM and verify the disk is online and part of the intended disk group.

Just the basics; managing Oracle disk space

Introduction

Here’s a quick quiz: of all of the problems that can beset an Oracle database, which one happens most often? Do you think it’s concurrency or locking issues? The CPU being overwhelmed, or the disk I/O rate being overwhelmed? Oracle internal bugs? Disk space issues? Problems encountered with backup & recovery? Account security? Memory issues, either causing performance problems or pools not being large enough?

Think about that for a minute…

What’s your answer? If you answered disk space issues, collect your prize!

Indeed, in my experience with literally hundreds of Oracle databases over many years, the most common issue I encounter is in the management of disk space. This presentation will cover a few ideas on why this happens, as well as how to approach it.

Why do Oracle databases run out of disk space?

Here are the main reasons that this problem occurs so frequently:

  • Most Oracle database schemas are not designed around the lifecycle of data; from input, to updating, to archival.
  • Most Oracle databases only grow over time, never shrinking in terms of data volume retained.
  • When rows are deleted from a table, neither the table segment nor its index segments shrink, so by default these segments only get bigger over time.
  • When segments are removed from a tablespace, there is no automatic shrinking of the datafiles involved – they only get bigger over time.
  • Oracle database performance is most often tied to disk I/O performance, and the IOPS that a disk subsystem can handle. Using cheaper lower-performing disk subsystems just won’t work. This means that it usually takes quite a bit of effort to order, purchase, install and configure the disk subsystem.
  • The amount of overhead space that Oracle requires is quite large. Extreme example: we want to store data in a 1-row, 1-column table with an index. Overhead is over 80%. Oracle stores data in blocks (not bits); Oracle doesn’t use the entire block by default but keeps 10% of each block free for future updates; it allocates ITLs and other data structures in each block; indexes can effectively double the amount of space a row requires; and there’s overhead for Oracle internal uses like TEMPORARY space, REDO/UNDO, and the system data dictionary. This all amounts to each row stored requiring a LOT more space (+50% to +95%) than the raw data.
  • It’s the nature of a database to grow over time as data is added.
  • Unfortunately, disk space is often not monitored closely.
  • It’s the nature of business to grow in chaotic spurts. This can completely obliterate any former planning that has been done.
  • Disk subsystems have been improving and getting cheaper every year since their invention. It doesn’t make economic sense to buy much more disk space than you’ll require today if you won’t need the extra capacity for, say, a year.

For the reasons above, and more, it’s just a reality that needs to be handled, and one that needs to be understood by those signing purchase orders for more disk.

Tips on managing and maximizing disk space

Tips on managing the growth itself

  • Regularly estimate how fast things are growing, and send regular reports to management on how much time is left before there is no room, assuming no change in the business requirements. This is done tablespace by tablespace; a sample free-space query follows this list.
  • If there are a lot of tablespaces, spread among many mount points or LUNs, it can be difficult to figure out which one(s) will fill up first. Fewer, larger LUNs can help quite a bit – and for this reason ASM usually makes this easier, because most ASM setups don’t have many different disk groups: typically a SYSTEM disk group, a DATA disk group, and a flash recovery area.
  • Have a monitor program in place that will escalate warnings as a tablespace gets full, like Database Rx ®.
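Here is the kind of free-space query referred to above, a minimal sketch that compares allocated space to free space per tablespace (it ignores autoextend headroom and temporary tablespaces, so treat the numbers as a floor):

SELECT df.tablespace_name,
       ROUND(df.total_mb)                               AS total_mb,
       ROUND(NVL(fs.free_mb, 0))                        AS free_mb,
       ROUND(100 * NVL(fs.free_mb, 0) / df.total_mb, 1) AS pct_free
FROM   (SELECT tablespace_name, SUM(bytes)/1048576 AS total_mb
        FROM   dba_data_files
        GROUP  BY tablespace_name) df,
       (SELECT tablespace_name, SUM(bytes)/1048576 AS free_mb
        FROM   dba_free_space
        GROUP  BY tablespace_name) fs
WHERE  df.tablespace_name = fs.tablespace_name (+)
ORDER  BY pct_free;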

Conclusion

The main point of this presentation is this: even though disk space usage is often chaotic in nature, it needs to be closely monitored so that the database can continue to grow. Failing in this responsibility will usually result in the application crashing and being unable to support users, and the problem can take considerable time to solve (days) because it often requires new hardware.

Date: 2014-03-24 Mon

Author: Jay Stanley


Oracle Scheduled Job Email

Introduction

In Oracle v10, a new feature was introduced to replace the existing DBMS_JOB job scheduling API: DBMS_SCHEDULER. Up until that point, the only way to schedule recurring jobs inside the database was to use a very simplistic system, DBMS_JOB. Its limitations included:

  • No logging of individual runs, whether successful or failed
  • Could only call stored procedures,
  • Had no method of ‘chaining’ jobs together so that if one failed, special action could be taken
  • No method of notification if a job failed,
  • No concept of execution windows of time, or of using Oracle Services, or RAC in scheduling jobs.
  • Many other missing features, too numerous to list

The original DBMS_JOB API can be thought of as similar to the Unix cron daemon.
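For contrast, a minimal DBMS_JOB submission looked something like the following sketch (mytest is the example procedure used later in this paper):

VARIABLE jobno NUMBER
BEGIN
  dbms_job.submit(
    job       => :jobno,
    what      => 'mytest;',
    next_date => SYSDATE,
    interval  => 'SYSDATE + 1/24');  -- run again every hour
  COMMIT;                            -- the job is not visible to the job queue until committed
END;
/
PRINT jobno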

In this short paper, I’ll show how you can enable the new DBMS_SCHEDULER package to send email to a set of recipients if a job does not complete successfully.

Using the new DBMS_SCHEDULER

The upside to using DBMS_SCHEDULER is that it is far more flexible, and it contains many new features over DBMS_JOB.

The disadvantage is that it is more complicated to work with.

Let’s start by creating a test stored procedure, and then enable the scheduler to send email when it fails.

Oracle considers any PLSQL_BLOCK type job to have failed when it exits with an unhandled exception. If the code of a called PLSQL_BLOCK looks like this:

BEGIN
  mytest;
EXCEPTION
  WHEN OTHERS THEN
    NULL;
END;

then it will never fail according to how Oracle measures things. Any exception that mytest raises will be silently handled by the exception block.
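If you do want a catch-all handler but still need the scheduler to record the run as failed, a common pattern is to log the error and then re-raise it (a sketch):

BEGIN
  mytest;
EXCEPTION
  WHEN OTHERS THEN
    -- log the error somewhere useful here, then re-raise it so the
    -- scheduler still marks this run as JOB_FAILED
    RAISE;
END;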

Here’s a test stored procedure that will fail every time:

create or replace procedure showme
as
begin
raise NO_DATA_FOUND;
end;
/

Every time this procedure is called, it will exit with an unhandled exception – that being NO_DATA_FOUND, which is a pre-defined Oracle exception.

Let’s try running this in SQL*Plus:

SQL> execute showme;
BEGIN showme; END;
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at "JSTANLEY.SHOWME", line 5
ORA-06512: at line 1
SQL>

Let’s create a job that will run this hourly at 50 minutes after each hour. In this example, the schema name is ‘JSTANLEY’.

BEGIN
dbms_scheduler.create_job(
JOB_NAME => 'JSTANLEY.JOB_SHOW_ME',
job_type => 'PLSQL_BLOCK',
repeat_interval => 'FREQ=hourly;byminute=50',
start_date => sysdate,
enabled    =>  TRUE,
comments   => 'test',
job_action=>'BEGIN showme; END;'
);
END;
/

We can check this by querying the DBA_SCHEDULER_JOBS view.
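For example, something along these lines (a sketch) confirms the job exists and shows its schedule:

SELECT owner, job_name, enabled, state, repeat_interval
FROM   dba_scheduler_jobs
WHERE  job_name = 'JOB_SHOW_ME';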

Now that we have the job in there, let’s enable full logging for that job, as well as email if it fails.

SQL> execute dbms_scheduler.set_attribute('JSTANLEY.JOB_SHOW_ME','LOGGING_LEVEL', DBMS_SCHEDULER.LOGGING_FULL);
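With full logging enabled, every run of the job is recorded and can be reviewed later, for example with a query like this sketch:

SELECT log_date, status, error#, run_duration
FROM   dba_scheduler_job_run_details
WHERE  job_name = 'JOB_SHOW_ME'
ORDER  BY log_date;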

Next, we need to configure DBMS_SCHEDULER so that it knows how to send mail: where the mail server is, and what the sender’s email address is. You can find out whether these are configured with the following script; note that if they’re NULL, email will not work. This quick script will print out all of the scheduler attributes. In this example, we’re really only checking EMAIL_SENDER, EMAIL_SERVER, and MAX_JOB_SLAVE_PROCESSES.

set serveroutput on size 100000
set lines 200 pages 10000
declare
  v_value varchar2(2000);
begin
  dbms_scheduler.get_scheduler_attribute('current_open_window',v_value);
  dbms_output.put_line('current open window '||v_value);
  dbms_scheduler.get_scheduler_attribute('default_timezone',v_value);
  dbms_output.put_line('default_timezone '||v_value);
  dbms_scheduler.get_scheduler_attribute('email_sender',v_value);
  dbms_output.put_line('email_sender '||v_value);
  dbms_scheduler.get_scheduler_attribute('email_server',v_value);
  dbms_output.put_line('email_server '||v_value);
  dbms_scheduler.get_scheduler_attribute('event_expiry_time',v_value);
  dbms_output.put_line('event_expiry_time '||v_value);
  dbms_scheduler.get_scheduler_attribute('log_history',v_value);
  dbms_output.put_line('log_history '||v_value);
  dbms_scheduler.get_scheduler_attribute('max_job_slave_processes',v_value);
  dbms_output.put_line('max_job_slave_processes '||v_value);
end;
/

To set them, run the DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE() call as follows:

SQL> execute  dbms_scheduler.set_scheduler_attribute('email_sender','jstanley@dbspecialists.com');
SQL> execute  dbms_scheduler.set_scheduler_attribute('email_server','localhost');

Now that it is configured, we need to enable emailing for the job in question.

BEGIN
dbms_scheduler.add_job_email_notification(
job_name=>'JSTANLEY.JOB_SHOW_ME',
recipients=>'jstanley@dbspecialists.com',
sender=>'jstanley@dbspecialists.com',
events=>'JOB_FAILED,JOB_BROKEN,JOB_SCH_LIM_REACHED,JOB_CHAIN_STALLED,JOB_OVER_MAX_DUR'
);
END;
/

OK!

Now, assuming that the SMTP host is correct, and that it doesn’t refuse email sent from the IP address of the Oracle database or object to the sender’s email address, when the job fails, it will send an email!
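To test the notification without waiting for the next scheduled run, you can force the job to run immediately (a sketch):

BEGIN
  -- run the job now, in a background job slave, exactly as the scheduler would
  dbms_scheduler.run_job(
    job_name            => 'JSTANLEY.JOB_SHOW_ME',
    use_current_session => FALSE);
END;
/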

Below is a sample email in full, including headers:

X-From-Line: jstanley@dbspecialists.com  Wed Mar 19 13:50:59 2014
Return-Path: <jstanley@dbspecialists.com>
Received: from localhost (localhost [127.0.0.1])
by dbspecialists.com (8.13.1/8.13.1) with SMTP id s2JKoxOC010760
for jstanley@dbspecialists.com; Wed, 19 Mar 2014 13:50:59 -0700
Date: Wed, 19 Mar 2014 13:50:59 -0700
From: Jay Stanley <jstanley@dbspecialists.com>
X-Gnus-Mail-Source: pop:jstanley@dbrxdev.databaserx.com
Message-Id: <201403192050.s2JKoxOC010760@dbspecialists.com>
To: jstanley@dbspecialists.com
Subject: Oracle Scheduler Job Notification - JSTANLEY.JOB_SHOW_ME JOB_FAILED
X-IMAPbase: 1371854655 571
Status: O
X-UID: 571
X-Content-Length: 235
X-Keywords:
Content-Length: 235
Lines: 13
Xref: emacs1 Mail.Other:10705
Job: JSTANLEY.JOB_SHOW_ME
Event: JOB_FAILED
Date: 19-MAR-14 01.50.57.838644 PM -07:00
Log id: 13776
Job class: DEFAULT_JOB_CLASS
Run count: 25
Failure count: 22
Retry count: 0
Error code: 1403
Error message:
ORA-01403: no data found

Conclusion

This shows a relatively easy way to enable the DBMS_SCHEDULER API to inform users when a scheduled job exits with an unhandled exception.

Note that setting up email for DBMS_SCHEDULER is completely unrelated to setting it up for Enterprise Manager, or to the UTL_MAIL package – it is self-contained.

Date: 2014-03-19 Wed

Author: Jay Stanley


Just the basics; Upgrading the Oracle RDBMS

Introduction

This is the first installment of a series of shorter whitepapers that just cover the basics of a particular aspect of the Oracle RDBMS. The first one is on the subject of upgrading Oracle. After doing literally hundreds of Oracle RDBMS upgrades and patches over the past 18 years, these are my recommendations.

Why upgrade?

The first question that you need to ask is why? Most often, it is to patch a particular bug that is actively causing problems in production. Less often, it’s to gain extra features that the old version lacks. Sometimes, it’s to ensure that the database’s security is as good as it can be. There’s one other reason to upgrade: Oracle is going to stop supporting and patching the version you are on.

All of these are very reasonable reasons to perform an upgrade. If it’s for another reason, you might ask why – if something is working well and is stable, why change things?

The next question is: what version are you going to upgrade to? If at all possible, stay away from upgrading to the first release of a particular version of Oracle. Oracle 7.0 was terrible, while 7.3.4 was arguably one of the most stable versions ever; the first 8.0 release was terrible, but by 8.1.7.4 it was quite stable; 9.0.1 was extremely buggy, but it was solid by 9.2.0.5; the first 10g release was rough, but things were fixed by 10.2.0.5; and I will not be surprised if the first 12c release turns out the same in terms of the number of open bugs and stability problems. Those early versions are like fire; mess with them and you WILL get burned.

Recommendations for the upgrade itself

  • Definitely do a trial upgrade of a copy of your target database. Yes, it is a lot of work, but there is no better way to find out whether you will run into any issues, and to understand how long it will take.
  • Also, do a trial rollback after the trial upgrade. It’s surprising how often, when problems do develop in a production upgrade, it turns out that nobody has tried the rollback first.
  • Be sure to keep your COMPATIBLE setting at the old version until the upgrade is successful and you are sure that you will not be rolling it back.
  • Upgrade into a different $ORACLE_HOME if possible. Since v11.2, Oracle has changed the way that most upgrades happen; a new, patched $ORACLE_HOME is created, and then the database is shut down, restarted with the new $ORACLE_HOME, and upgraded. This is a much, much better method as it allows you to fail back to the old version if the upgrade fails, and there is no downtime while the patch is applied to the binaries, making the maintenance window shorter.
  • Shut down the listener if possible during the upgrade – this keeps application sessions from connecting. A client connection starting work in the middle of an upgrade can really mess things up.
  • Thoroughly read the latest updated My Oracle Support (MOS) document on the exact upgrade(s) that are happening, and make notes. This can be a long read, but any part of it can become immediately important if an upgrade has trouble. Create a separate full plan document that has each step, what to expect, how long each should take, and any difficulties encountered.
  • Be very sure that you have a good backup before starting the upgrade. Another option if using Enterprise Edition is to create a Guaranteed Restore Point, so that it’s relatively trivial and fast to undo the upgrade; a sketch follows this list.
  • Be sure that there is enough tablespace (especially SYSTEM, SYSAUX and TEMP) for the upgrade. If these fill up, it can again cause a lot of problems.
  • Take extra care to review memory pools as well – some upgrades really require a JAVA_POOL memory pool for example, and will fail if there isn’t enough room.
  • Be sure to de-schedule all dba_scheduler & dba_jobs database jobs for the upgrade, possibly by setting JOB_QUEUE_PROCESSES=0. Note that export-datapump (and import-datapump) requires a non-zero JOB_QUEUE_PROCESSES to work. It is likely a good idea to make a copy of the DBA_JOBS and DBA_SCHEDULER_JOBS tables before starting for reference. Also, de-schedule any jobs scheduled through crontab like backups as well.
  • Communicate to operations management very clearly regarding how long it will take for the upgrade, including how long it will take to back it out worst-case, and how long it will take to restore the database from backup if everything goes horribly wrong.
  • Be sure that either a QA team, or someone who does quality control, can check the database & application after upgrade to make sure there are no problems. This should be done after the ‘trial’ upgrade, as well as the production upgrade.
  • Take a backup of any parameter files (Oracle RDBMS, ASM) before starting – upgrades can change parameter values in spfiles, so if a database is rolled back it can come in quite handy.
  • If you have separate development/QA-test/production databases, start by upgrading your development first, then QA, and finally production. This should make it more unlikely that new issues are encountered when the most critical database(s) (production) are upgraded, and it gives time to the developers and QA folks to find any potential issues before the upgrade happens.
  • Schedule the upgrade to minimize impact. If the busiest day of the week for the database is over a weekend, don’t do an upgrade on Friday evening. If it’s least busy at 2:00 am on a Tuesday, consider doing it then.
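Here is the kind of Guaranteed Restore Point mentioned above, as a sketch only: the restore point name is arbitrary, the database needs a flash recovery area configured, and in a real rollback you would also switch back to the old $ORACLE_HOME first.

-- before the upgrade
CREATE RESTORE POINT before_upgrade GUARANTEE FLASHBACK DATABASE;

SELECT name, guarantee_flashback_database, time FROM v$restore_point;

-- if the upgrade has to be backed out (run from SQL*Plus as SYSDBA)
SHUTDOWN IMMEDIATE
STARTUP MOUNT
FLASHBACK DATABASE TO RESTORE POINT before_upgrade;
ALTER DATABASE OPEN RESETLOGS;

-- once the upgrade is declared good, drop it so it stops holding flashback logs
DROP RESTORE POINT before_upgrade;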

Conclusion

Following my recommendations above will very likely make any Oracle RDBMS upgrade go smoother.

RAC Attack - Installing a 2-node RAC Database on Your Laptop

Terry Sutton and I recently had the pleasure of volunteering at a 12cR1 “RAC Attack” lab held at the November conference of the Northern California Oracle Users Group (NoCOUG). Jimmy Brock from NoCOUG also volunteered, and we had several other experts drop by and help out. For those of you unfamiliar with a “RAC Attack” lab, it is a hands-on session where attendees follow the instructions from the RAC Attack Wikibook, either

http://en.wikibooks.org/wiki/RAC_Attack_-_Oracle_Cluster_Database_at_Home for 11gR2 or
http://en.wikibooks.org/wiki/RAC_Attack_-_Oracle_Cluster_Database_at_Home/RAC_Attack_12c for 12cR1

and install a 2-node Oracle RAC database on their laptop. This was the first time that NoCOUG had held such a lab, and our first involvement with one, but it was well attended with over 30 attendees, and since they had to turn out the lights to get us to leave, it appeared to be a great success. A concise description of the material, taken from the above (12c) URL, follows:


RAC Attack is a free curriculum and platform for hands-on learning labs related to Oracle RAC (cluster database), motivated by the belief that the best way to learn RAC is through lots of hands-on experience. The original contributors were Jeremy Schneider, Dan Norris and Parto Jalili. This curriculum has been used since 2008 by organizers of events, by instructors in classes and by individuals at home. Its goal is to help students learn about Oracle RAC cluster databases through guided examples.

RAC Attack differs in depth from other tutorials currently available:
  • Every keystroke and mouse click is carefully documented.
  • The process is covered from the very beginning to the very end - from the very first installation of the Virtual Hypervisor on your laptop to various experiments on your running cluster database… with everything in between.
  • The labs in the main workbook have been tested thoroughly and repeatedly.

Of course, the 12c material is relatively new and “Part II: Exploring RAC” is still under development, but it is already excellent and can be used without any major issues to complete the install.

Limitations: Of course this virtualization setup is purely for education, and it necessarily omits a few features that a production RAC configuration would typically have, for example jumbo frames on the interconnect, bonded network interfaces and multipath I/O.

We came across a few issues in the lab, most of them unrelated to the material as such and more about the preparation and environment. Here is a set of tips for anybody wanting to complete the material by themselves (or in a future lab).

1.
We used the latest version of VirtualBox available from http://www.virtualbox.org, which was v4.3.2 at the time. Jimmy was our “Mac Guy” and confirmed the lab works fine with this version on Macs.

2.
As with all downloads, it is highly desirable to verify them by comparing the checksum against the one published by the source, to ensure the download is correct. One promising lab attempt was brought to an end by an apparently corrupt Oracle Linux .iso.

3.
When unzipping the four Oracle Grid Infrastructure and Database 12c (12.1.0.1) downloads, you want to end up with only two directories: one called ‘grid’ and one called ‘database’. Some unzipping alternatives leave you with four directories, in which case you should merge them back into the above two.

4.
One annoying problem several attendees encountered was due to their “hardware virtualization extensions”. This appeared as a few different errors, but the most common was people getting “kernel requires an x86_64 cpu, but only detected an i686 cpu unable to boot - please use a kernel appropriate for your cpu” when Oracle Linux attempted to start towards the end of the install. This issue is discussed in detail in the VirtualBox FAQ at

https://forums.virtualbox.org/viewtopic.php?f=24&t=8669

but a summary is “…a 64bit GUEST NEEDS VT-X/AMD-V to be active and enabled….”. If you get the above error, you need to enable these extensions by changing your BIOS settings, usually in the Chipset or Processor menus, but possibly in Security Settings or other non-standard menus. If anybody knows of a simple test to ensure you have these extensions enabled, please let us know. We know of the VMware Processor Check for 64-Bit Compatibility, but it unfortunately appears to require registration.

5.
In a lab environment with people working on their laptops there is an ergonomic consideration. It is tricky on a relatively small single screen to constantly shuffle backwards and forwards between the RAC Attack Wikibook and the virtual machine windows. If you are following the material in an environment with multiple large monitors this is not a concern, but in a lab, on a laptop, it is a hassle. The NoCOUG folks were kind enough to provide attendees with a nice printed manual, which helped greatly. However, some of the commands are complicated enough that they need to be pasted from an electronic source rather than manually typed, and ‘bind’ is very fussy about indentation! In short, when the instructions say ‘cut and paste the following command’, we found it most reliable to cut and paste from the wikibook. If you do not have access to the web page, ensure you have a suitable electronic source available.

6.
If you only have 8GB of RAM available on your laptop, we suggest allocating 1536MB to each VM. The RAC Attack material is currently a little inconsistent on this point, e.g. it shows 3072MB as the “Memory Size” in some screens but “Base Memory: 1500MB” in others.

7.
Before step 11 on URL http://en.wikibooks.org/wiki/RAC_Attack_-_Oracle_Cluster_Database_at_Home/RAC_Attack_12c/Configure_Second_Linux_VM (i.e. before rebooting the collabn2 VM) check your /etc/sysconfig/network-scripts/ifcfg-eth2 file and if it contains a line

PEERDNS=YES

then change it to

PEERDNS=no

This step is mentioned in the collabn1 setup but not when cloning collabn2. Without this step I found that my /etc/resolv.conf file was getting overwritten on collabn2, which messes up the DNS lookups.

In summary, the RAC Attack material is a powerful resource for anybody wanting to learn about RAC. Its creators and contributors have done a great job of documenting the process so thoroughly that anybody can complete it in a few hours.

Many thanks to Iggy Fernandez, Jimmy Brock and the rest of the NoCOUG Board for their invaluable assistance and support during the preparation and running of the lab.

A quick introduction to dbms_pipe

One thing about working with PL/SQL is the fact that it runs inside the database (Oracle Forms notwithstanding), rather than in a normal OS process. This means that interaction with the database server OS itself isn’t easily possible. You can create a library in ‘C’ and call that from PL/SQL, but this isn’t usually a trivial exercise.

So, if you need your PL/SQL program to communicate with an outside process, there is one really simple way to go about it – use an API called dbms_pipe. This API is available in all editions of Oracle Server, including Standard Edition One; it has been around for a very long time, since Oracle v7, and in my experience it is very stable.

The Oracle dbms_pipe API is not related to Unix (or POSIX) pipes – it is its own thing that only communicates between SQL sessions. It’s a very quick method that you can use to inform an external process immediately when attention is needed. For example, one way to tell whether something needs attention in a database is to poll a table or run a query every few seconds – if the query is resource-intensive, this can really eat up a lot of CPU, I/O and memory. Instead of polling, the dbms_pipe API can be used to reduce this load considerably.

dbms_pipe -vs- AQ

Oracle also has a product that can be licensed for Enterprise Edition called AQ, or Advanced Queuing. This allows you to create message queues in the database, where basically you have one or more publisher processes which add things to the queue, and one or more subscriber processes which take messages from the queue, though it also supports a point-to-point model. AQ is a very advanced and complicated package; messages can queue within or between systems, you can have rules and transformations for these messages in the database, and it can interface with other systems using HTTP, HTTPS or SMTP. However, AQ is an extra license on top of the Enterprise Edition, so it’s not cheap and it’s not possible to use with a Standard Edition database.

dbms_pipe is not nearly that sophisticated. Using dbms_pipe, one process puts a message into a named database pipe, and other processes can retrieve it, very much like a FIFO (first-in, first-out) queue. The process that creates the message is never blocked. When the process reading the pipe asks for a message and there isn’t one available, it blocks and will ‘hold’ there for up to a timeout period.

Adding a message to a dbms_pipe

Before using dbms_pipe, the schema that owns the procedures using it will need execute access:

SQL> grant execute on dbms_pipe to myuser;

The API requires that you name the pipe over which messages will flow. I believe the name can be almost anything, but the processes that put messages into the pipe and take messages out will need to use the same pipe name.
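By default a pipe is created implicitly, as a public pipe, the first time it is referenced. If you want a private pipe (visible only to sessions of the same user) you can pre-create it explicitly, something like this sketch:

DECLARE
  v_status INTEGER;
BEGIN
  -- private => TRUE restricts the pipe to sessions of the creating user
  v_status := dbms_pipe.create_pipe(pipename => 'BIGWIDGET', private => TRUE);
END;
/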

To send a message, the dbms_pipe.pack_message() is called, followed by the dbms_pipe.send_message() procedure like this:

create or replace procedure send_pipe(p_message in varchar2)
is
  v_status number;
begin
  dbms_pipe.pack_message(length(p_message));
  dbms_pipe.pack_message(p_message);
  v_status := dbms_pipe.send_message('BIGWIDGET');
  if (v_status != 0) then
     raise_application_error(-20099,'put error');
  end if;
end send_pipe;

Note that in the above code, the pipe name BIGWIDGET is given as the first argument to send_message. There are actually two items packed into the message: the first being the length of the message text (numeric), and the second being the text itself. This makes it easy for the routine that retrieves the message to know that it got the entire payload – obviously this could become more sophisticated still (maybe a CRC for the message as well?), but it isn’t strictly necessary.

When the above code is run, a message is added to the pipe named BIGWIDGET, and it will stay there until it is read (pipe contents live in the SGA, so they do not survive an instance shutdown).

Retrieving a message from dbms_pipe

The important thing to remember about retrieving messages from the pipe is that, by default, the call will not return until a message arrives on the same named pipe.

Here’s some code to retrieve the message:

create or replace procedure receive_pipe(p_message out varchar2)
is
   v_length  number;
   v_message varchar2(2000);
   v_status  number;
begin
   v_status := dbms_pipe.receive_message('BIGWIDGET');
   if (v_status = 0) then
      dbms_pipe.unpack_message(v_length);
      dbms_pipe.unpack_message(v_message);
      p_message := v_message;
   else
     raise_application_error(-20091,'get error');
   end if;
end receive_pipe;

Note that if there is a message for that pipe, this will return immediately. If there are no messages, it will just sit there and wait until there is one.
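If you do not want to block indefinitely, receive_message accepts an optional timeout in seconds and returns 1 when it expires without a message; a minimal sketch:

DECLARE
  v_status  NUMBER;
  v_length  NUMBER;
  v_message VARCHAR2(2000);
BEGIN
  -- wait at most 10 seconds for a message on the pipe
  v_status := dbms_pipe.receive_message('BIGWIDGET', timeout => 10);
  IF v_status = 0 THEN
    dbms_pipe.unpack_message(v_length);
    dbms_pipe.unpack_message(v_message);
    dbms_output.put_line('got: ' || v_message);
  ELSIF v_status = 1 THEN
    dbms_output.put_line('no message within 10 seconds');
  END IF;
END;
/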

Example in real-life

Your company, Big and Small Widgets Inc, makes really big widgets, and also some small ones. The big ones are so large and expensive that it is a major accomplishment when your company sells one. The company has an IRC server where a great bulk of the interoffice business gets taken care of.

The Big Boss would like a message to appear in your IRC channel whenever there is a sale of a Big Widget. The problem is that it is a very expensive query to determine if a Big Widget has been sold – the query typically takes a minute or more, and it consumes a lot of database resources. It is important to the Big Boss that the message be sent immediately when the sale closes.

It’s easy enough to create an IRC ‘bot’ that will do exactly what you need in terms of connecting to the IRC server, polling messages in the database, and sending them, perhaps using the Perl POE::Component::IRC module.

The problem here is that the IRC Bot won’t be able to just execute that really big query over and over without affecting other things that the database is being used for.

This is where using dbms_pipe would come in really handy.

Step 1: in the IRC bot, have a call to the receive_pipe() procedure above. The bot will execute this, and it will basically pause until a message is received. Following the call to receive_pipe(), it will perform the large query to figure out the details of the sale, and then it will broadcast it on the desired IRC channel(s).

Step 2: change the sales logic so that as soon as a big widget is sold, a call to the send_pipe() procedure is called.

In this way, when a sale of a Big Widget happens, the IRC bot will know immediately, and will inform the IRC channel as soon as it completes the (resource-intensive) query. The query will not run unless a sale has just taken place, reducing the load on the database.

Conclusion

Although the dbms_pipe API is relatively simple, it is definitely very useful when one SQL session needs to be informed by another SQL session that something has happened.
