Knowledge increases exponentially. Today, you probably own more books than great universities of times past—Cambridge University owned fewer than two hundred books in the fifteenth century. First came the invention of writing, then alphabets, then paper, then the printing press, then mechanization. Each step caused an exponential increase in collective human knowledge. In our generation, Al Gore invented the internet, and the last barriers to the spread of knowledge have been broken. Today, everybody has the ability to contribute, communicate, and collaborate. We are all caught up in a tsunami, an avalanche, a conflagration, a veritable explosion of knowledge for the betterment of humankind. This is the blog of the good folks at Database Specialists, a brave band of Oracle database administrators from the great state of California. We bid you greeting, traveler. We hope you find something of value on these pages and we wish you good fortune in your journey.

Exploring Common and Uncommon Bottlenecks

They are nearly everywhere, preventing you from moving forward with your everyday business – bottlenecks. Whether it’s a capacity, resource, or performance issue, your business cycle is affected by such bottlenecks. There are several common bottlenecks that you can easily recognize and adapt your business to improve upon, but there are also a few uncommon bottlenecks that can be more difficult to recognize and fix.
Some of the general bottlenecks we are going to focus on solving are:
-    Decision Control: If the decision process has to flow through one central team member, then a bottleneck is inevitable.  A growing company cannot sustain growth with just one person in the decision-making position.
-    Solution: Allow for others within your business to make decisions.
-    Quality Assurance: This is the last step in the production process before the product reaches the end customer. There are multiple steps to assure the quality of the product, and you can’t skip any of them without compromising that quality. Therefore, QA can sometimes become a bottleneck.
-    Solution: If you have a QA department, make sure there are enough people available to help with the workload.
In addition to process and general bottlenecks, there are IT bottlenecks where the flow of data and information can completely stop if the system isn’t set up to process it all efficiently.
-    Capacity: When the system has insufficient capacity, the flow of information can be slowed or even stopped completely. Slow performance can snowball into a plethora of other performance problems for the company.
-    Solution: Improve the system’s ability to process data including storage size and memory.
-    Workloads: Networks can slow tremendously when there are a variety of processes and workloads running at the same time.
-    Solution: Isolate and organize workloads so that the data can be handled and found more efficiently.
When companies use Oracle as their database software, they can experience bottlenecks without necessarily realizing that they aren’t running at their full level of productivity. To find out more about how to solve your Oracle bottlenecks and improve your productivity, read our full whitepaper here.

Leveraging Data: How to Improve Your Business

Understanding the information that your business is collecting is crucial, but learning how to leverage that information to better your business relationships is by far more important. Oftentimes the information is gathered but not sorted into a digestible or useful form, so you are left to sift through it and figure out how to use it to improve your business.
Analysis & Organization
When your company collects data, there are three basic types of information that you will gather from your customers:
- CRM: This carries not only contact information for your customers but also product sensor data, barcode scans, and similar details that help track a product’s information and journey through the supply chain.
- Customer Transactions: You can track what your customer buys and how often they buy it, the method of payment they used, and whether or not someone helped them during the buying process.
- Human Input – Work with your customers to gather the data you need by using satisfaction surveys, email queries, case studies and social media posts. They can tell you all you need to know when you ask them.
Why Outsourcing Helps
When it comes to understanding how to best leverage all of the data within your database, it sometimes makes the most sense to partner with a remote database administrator (DBA) to help. They carefully monitor the information to ensure that you’re properly managing and gathering data. In addition, DBA services will focus on improving the performance of your database, and plan out how you can leverage the information to better connect with your customers. Such services include:
- Security: a DBA specialist will provide the right monitoring tools and provide security checks and precautions to keep your information safe.
- Infrastructure: a DBA specialist will ensure that the right infrastructure is put in place for cohesiveness and ease of use.
- Repair & Recovery: if there is ever a problem, a DBA specialist can secure, recover and repair your database.
Improving Your Business
So, how can all of this actually improve your business? Well, through careful management and analysis you can better connect with your audience and serve your customers. Here are a few ways in which you can improve with better management:
- Increase sales – look at the data to see where your customers are coming from, identify trends, and optimize for them.
- Cross-selling – know what products your customers are buying and see how you can cross-sell or up-sell other products.
- Making a better product – whether the feedback is positive or negative, your company can use it to improve your current product.
Overall, gathering information is helpful, but the most critical part of having a database is knowing how to read, analyze and leverage that information to connect better with your customers and drive more sales for your business. To find out more information, read our full whitepaper here.

Oracle Introduces New Database Software

Oracle recently announced a new product, Oracle Big Data SQL. It’s designed to run on the Oracle Exadata database machine, creating a shared connection for exchanging data more efficiently between a company’s data silos. One of the long-time issues with collecting big data is that it’s difficult to compile and really maximize its value. Oracle Big Data SQL is designed to help companies leverage big data – getting more bang for their buck, so to speak. Oracle announced the new technology last month, and it’s now available for purchase through a remote database administrator or an Oracle product manager.

Integration
Big data has made a huge impact on the ways enterprise companies operate, from marketing programs to crucial decisions. Many businesses keep information captured from social media, relational databases and other sources in separate places. Oracle Big Data SQL can scan, cache and integrate data from a diverse range of sources, making the information more useful.

Compatibility
The database software uses a single query optimizer to connect data from Exadata, Hadoop, NoSQL and relational databases. It also offers a high performance Smart Scan and indexing capability. Oracle Big Data SQL runs on the Oracle Big Data Appliance and with Cloudera software. It’s compatible with Oracle 12c, which was released last summer. Companies migrating to 12c can discuss the need for the Exadata and Big Data SQL with their database administrator to determine whether it’s a good fit for their needs.

Security
In a time when security is an important element of any IT product, Big Data SQL can protect data across a variety of connected engines. It uses levels of authentication and encryption that follow all of the rules and permissions used by Oracle, extending them to all of the user’s connected data sources. Big Data SQL can also run local queries on Hadoop and NoSQL databases without physically transferring the data, which eliminates risks that could arise during data transfer.

Clearing Unused Corrupt (Nologging) Blocks

I recently had the unfortunate experience of having to restore / recover a database that had undergone ‘nologging’ operations after the latest backup. Of course, the presence of blocks recovered through a nologging operation causes ORA-01578 errors such as

ORA-01578: ORACLE data block corrupted (file # 11, block # 91658)
ORA-01110: data file 11: '+DATADG/tollprd/datafile/toll_t01.256.849581711'
ORA-26040: Data block was loaded using the NOLOGGING option

Thankfully, the users were able to drop, recreate, and reload the affected tables and LOB segments, and we rebuilt the indexes, so the users were able to get back to work on the system quickly. However, RMAN was still populating v$database_block_corruption; that is, the blocks that had previously been marked corrupt remained listed. Although they were now no longer attached to segments, they were still showing as corrupt, and some of the blocks had even been re-attached to new segments and were still registering as corrupt. The problem is that once a block gets marked as corrupt, it needs to be re-formatted to clear the corruption, and that only happens when a new INSERT operation is performed on the block. So to clean the blocks we have to force an INSERT on them. If the block has been re-attached to a table, you can reformat it by simply inserting rows into the table until the block gets used, which cleans it. You do not need to commit the inserted rows; a rollback does not undo the block formatting. If the corrupt block is 'free space', you need to lay down a dummy table on top of it and insert rows into the dummy table until the block gets used. At that point you can drop the dummy table.

Here is a more detailed description of the process we went through; the broad strokes are described in MOS (Metalink) note 336133.1. First, ensure that v$database_block_corruption is current, which you can do by letting a full RMAN backup run or by running a

RMAN> validate database;
or
RMAN> validate database check logical;
command.

For each combination of FILE#,BLOCK# in v$database_block_corruption check the status of the segment. As far as I could find there are only 3 possibilities:

a) the block is attached to a segment
b) the block is a segment header
c) the block is now free space

We only encountered a) and c), so we will focus on those.

Case a) the block is attached to a segment
To figure out whether a block is attached to a segment or is free space, run the following SQL

SELECT *
FROM dba_extents
WHERE file_id = FILE_ID# AND
BLOCK# between block_id AND block_id + blocks - 1;

plug in the values for FILE_ID# and BLOCK# from v$database_block_corruption. If a block does not appear to be attached to a segment, check to see if it is free space using the query

SELECT *
FROM dba_free_space
WHERE file_id = FILE_ID#
AND BLOCK# between block_id AND block_id + blocks - 1;

again, plug in the values for FILE_ID# and BLOCK# from v$database_block_corruption. Now remember, in our situation we had already dropped or truncated all the tables that contained nologging blocks and rebuilt the indexes, so any nologging blocks now attached to a table segment should be empty. You can confirm this by checking for rows with that FILE#, BLOCK# using the query

SELECT COUNT(*)
FROM <TABLE_NAME>
WHERE DBMS_ROWID.ROWID_TO_ABSOLUTE_FNO(ROWID, '<OWNER>', '<TABLE_NAME>') = FILE# AND
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) = BLOCK#

the <OWNER> and <TABLE_NAME> are the owner and segment_name (of segment_type=TABLE, of course) from the dba_extents query above. Assuming you get no rows from the above, then the corrupt block is attached to the table segment, empty, and marked corrupt, and you can clear it by simply inserting rows into the table. Once you insert into the affected block, Oracle will re-format it and clear the corruption. You can monitor to see when the block is clean by running

RMAN> validate datafile FILE#;

and checking v$database_block_corruption. Again, we don’t need to commit the new rows; a rollback is sufficient and does not undo the block formatting.

Case c) the block is now free space
The slightly more complicated situation is when the block is 'free space'. As I mentioned above, in this case we need to create a table on top of the corrupt block, e.g.

create table dummy_nl
(
n number,
c varchar2(4000)
) nologging tablespace TOLL_T01 pctfree 99;

then manually allocate extents to the above table until you cover the block i.e. run

alter table dummy_nl allocate extent
(datafile '+DATADG/tollprd/datafile/toll_t01.256.849581711' size 100M);

Obviously, depending upon your situation you may want to change the extent size. You can check whether you have covered the block by re-running the dba_extents query from above. Once the block appears in your dummy_nl table, you want to ‘fill’ the segment. I did this by simply inserting a row

INSERT INTO DUMMY_NL VALUES (1,RPAD('*',4000,'*'))

and then rapidly doubling up the number of rows via repeated runs of the following

INSERT INTO DUMMY_NL SELECT * FROM DUMMY_NL

Obviously, we can estimate the size of the segment we need to fill using the query

SELECT sum(bytes) FROM dba_segments WHERE segment_name = 'DUMMY_NL'

As you approach this size, you may want to scale back the inserts using a ROWNUM filter such as

INSERT INTO DUMMY_NL SELECT * FROM DUMMY_NL WHERE ROWNUM <= 500000

You just need to fill up the pre-allocated space. At any point you can run the RMAN

RMAN> validate datafile FILE#;

command and check v$database_block_corruption. Once you have overwritten the corrupt block, it will of course clear from v$database_block_corruption. Finally, once you have cleared the block, you can drop the DUMMY_NL table.

Intel Introduces Custom Chip Design for Oracle

Intel has added one more custom processor to its list of tailored processor chips. Oracle’s new database machine, the Exadata Database X4-8, will now have core count and processor frequency dynamically scaled for its software workloads. This is ideal for businesses involved with integration of big data, a growing mobile community and more complex database needs.

Intel’s new Oracle chip is based on the Xeon E7 v2. The X4-8 uses 15-core processor chips for its database server hardware, making the most of its power with elastic acceleration of peak performance without increasing the data footprint.

The new Exadata Database X4-8 offers up to 12 terabytes of system memory per rack, a higher InfiniBand interconnect speed and double the local disk space. The new Intel chip was designed specifically to help Oracle software users maximize In-Memory. While the two companies have been collaborating for over a decade, the latest adaptation of the chip to create the Xeon E7-8895 v2 processor is a big step towards meeting the growing demands of modern businesses.

“This type of collaboration requires complete understanding of what the software is doing, and what the hardware can do better to take advantage of the software capabilities,” said Edward Goldman in a recent blog for Intel.

Oracle and Intel have collaborated on three new projects; Oracle Exadata Database Machine X4-8, Oracle Sun Server X4-8 and Oracle Database In-Memory. One of their goals is to help the available infrastructure to keep up with trends toward higher speeds, lower energy consumption, specific workloads and the realities of cloud computing.

Understanding UNDO

Sooner or later, every Oracle DBA will hit a problem of some sort that is related to the concept of Oracle “Undo”. Undo performs a few very important functions in the database: supporting read-consistent queries and Flashback queries, supporting transactions (COMMIT and ROLLBACK), and recovering from database crashes.

Along with the SYSTEM, SYSAUX, and TEMP tablespaces, the UNDO tablespace is a required part of every database. If SYSTEM or UNDO is lost during a database crash, it’s very likely that whichever one is corrupted will have to be recovered from a backup before the database can be opened.

Undo works by keeping the before-images of changed data in a separate region of the database, so that transactions can be rolled back and so that the database can provide a ‘read-consistent’ view of the data. In fact, it is impossible to complete a transaction that generates more undo than the undo segment it is assigned to can hold; in other words, if you have a tiny UNDO tablespace (say 100M), you probably won’t be able to commit any transaction bigger than about half that size (50M), and you are guaranteed not to be able to commit a transaction bigger than 100M. So at a real minimum, UNDO has to be at least as big as your largest transaction.

This can be a confusing concept. In Oracle, a query sees the data as of the moment it started (and a serializable or read-only transaction sees data as of the moment the transaction started) – the so-called ‘read-consistent’ feature. If you have a very long-running query (likely doing a full-table scan) and other processes are changing data in the table while the scan is running, the scan may at some point find that it can no longer reconstruct what a block of data previously looked like. When this happens, you get the ORA-01555 error, which is still a very common error today. It causes no corruption, but whatever statement was running is rolled back and fails.

Years ago (v7, v8) we had to manually configure ‘undo’, which was then called ‘rollback’, and the configuration often ended up less than perfect – in fact, it was common practice to create a few extra-large rollback segments dedicated to very large transactions. Starting in v9, Oracle introduced a new way to keep undo information, in new ‘undo’ segments in an undo tablespace. Oracle will automatically create and size new undo segments in the undo tablespace, and it will automatically grow them, and offline/drop them, dynamically, depending on the load.

So – Oracle today has the ability to manage undo automatically if the UNDO_MANAGEMENT=AUTO parameter is set. The key parameter for tuning retention is UNDO_RETENTION, which, if the undo tablespace is allowed to grow, tells Oracle how many seconds of undo to retain. So, if your longest query takes 20 minutes and you set UNDO_RETENTION to 20 minutes (1200 seconds), that query should not get ORA-01555 errors, even if the data in the table changes while it’s running.
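As a rough sketch of how you might check and adjust this (the views and parameters below are standard; the 1200-second value is just an example matching the 20-minute query above):

-- Current undo configuration
SHOW PARAMETER undo_management
SHOW PARAMETER undo_retention
SHOW PARAMETER undo_tablespace

-- Undo blocks consumed and the longest query seen, per 10-minute interval
SELECT begin_time, end_time, undoblks, maxquerylen
FROM v$undostat
ORDER BY begin_time;

-- If the longest query runs about 20 minutes, ask Oracle to retain at least 1200 seconds of undo
ALTER SYSTEM SET undo_retention = 1200;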

You can force Oracle to try to honor the UNDO_RETENTION policy by setting an undo retention and then not allowing the undo data files to expand. However, when you do that, Oracle may not be able to supply the requested retention, and you will get ORA-01555s. If the undo tablespace is too small, you’ll start to see application error logs (and the alert log and trace files) fill up with ORA-01555 errors.
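For what it’s worth, the supported way to turn the retention target into a hard guarantee is the RETENTION GUARANTEE clause on the undo tablespace; be aware that with a guarantee in place, DML can fail for lack of undo space instead. A minimal sketch, assuming an undo tablespace named UNDOTBS1:

ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;
-- and to return to the default best-effort behavior:
ALTER TABLESPACE undotbs1 RETENTION NOGUARANTEE;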

There’s a lot of information available about UNDO; Oracle has some pretty good documentation here (for v11.2):  Oracle UNDO documentation for 11.2


Recovery of ASM Disks Using Backup Blocks on AIX

There are some simple commands (not all well documented) to debug an ORA-15032 “not all operations performed” error when a disk group won’t mount.

  1. Validate the device permissions and ownership on all nodes with a simple "ls -l /dev/rhdisk*" command and correct any issues found.
  2. Run the storage check of the cluster verify utility.
    cluvfy comp ssa -n node1,node2,node3,node4 -verbose
  3. Under the +ASM owner, use the ASM command line utility "asmcmd" to list the devices and their status with "lsdsk" and "lsdsk --candidate". This could also be done in SQL*Plus by selecting from v$asm_disk (see the query sketch just after this list). The output should be the same on all nodes.
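A minimal sketch of that SQL*Plus check (these are standard v$asm_disk columns; the formatting is just a suggestion):

    SET LINESIZE 200
    COLUMN path FORMAT a40
    SELECT group_number, path, name, header_status, mount_status
    FROM v$asm_disk
    ORDER BY group_number, path;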

A common problem during SAN failures is multiple reboots of the host; on successive restarts AIX can modify the ODM, causing a disk to become a ‘CANDIDATE’ rather than being associated with the proper disk group (even if the device paths are maintained). Review the referenced documents below prior to running the repair operation:

Assigning a Physical Volume ID (PVID) To An Existing ASM Disk Corrupts the ASM Disk Header (Doc ID 353761.1)
Corrective Action for ASM Diskgroup with Disks Having PVIDs on AIX (Doc ID 750016.1)

The “kfed” command can be used to detect and correct the issue using the disk backup blocks.  Note: Always have a backup of the database, OCR and CRS disks before attempting these commands.
  1. Confirm the disks backup blocks are available with the “kfed read” command and sample output below.
    kfed  read /dev/rhdisk# anum=1 blknum=254 > rhdisk#_backup_block.txt
    kfbh.block.blk: 254 ; 0x004: blk=254
    kfdhdb.dskname: RECO_DG_0004 ; 0x028: length=14
  2. Shut down the database instances and ASM on all nodes
  3. Repair the disk
    kfed  repair /dev/rhdisk#
  4. Start ASM and verify the disk is online and part of the intended disk group.

Just the basics; managing Oracle disk space

Introduction

Here’s a quick quiz: of all of the problems that can beset an Oracle database, which one happens most often? Do you think it’s concurrency or locking issues? The CPU or the disk I/O rate being overwhelmed? Oracle internal bugs? Disk space issues? Problems encountered with backup and recovery? Account security? Memory issues, either causing performance problems or simply not being large enough?

Think about that for a minute…

What’s your answer? If you answered disk space issues, collect your prize!

Indeed, in my experience with literally hundreds of Oracle databases over many years, the most common issue I encounter is in the management of disk space. This presentation will cover a few ideas on why this happens, as well as how to approach it.

Why do Oracle databases run out of disk space?

Here are the main reasons that this problem occurs so frequently:

  • Most Oracle database schemas are not designed around the lifecycle of data; from input, to updating, to archival.
  • Most Oracle databases only grow over time, never shrinking in terms of data volume retained.
  • When rows are deleted from a table, neither the table segment nor its index segments shrink, so these segments only get bigger over time by default.
  • When segments are removed from a tablespace, there is no automatic shrinking of the datafiles involved – they only get bigger over time.
  • Oracle database performance is most often tied to disk I/O performance and the IOPS that a disk subsystem can handle. Using cheaper, lower-performing disk subsystems just won’t work. This means that it usually takes quite a bit of effort to order, purchase, install, and configure the disk subsystem.
  • The amount of overhead space that Oracle requires is quite large. As an extreme example, suppose we want to store data in a 1-row, 1-column table with an index: the overhead is over 80%. Oracle stores data in blocks (not bits); by default it doesn’t use the entire block but keeps 10% free for future use; it allocates ITLs and other data structures in each block; indexes can effectively double the amount of space a row requires; and there is overhead for Oracle internal uses like TEMPORARY space, REDO/UNDO, and the system data dictionary. This all adds up to each row stored requiring a LOT more space (+50% to +95%) than the raw data.
  • Being a database, it’s the nature of a database to grow over time as data is added.
  • Unfortunately, disk space is often not monitored closely.
  • It’s the nature of business to grow in chaotic spurts. This can completely obliterate any former planning that has been done.
  • Disk subsystems have been improving and getting cheaper every year since their invention, so it doesn’t make economic sense to buy much more disk space than you require today if you won’t need the extra capacity for, say, a year.

For the reasons above, and more, disk space growth is just a reality that needs to be handled, and one that needs to be understood by those signing the purchase orders for more storage.

Tips on managing and maximizing disk space

Tips on managing the growth itself

  • Regularly estimate how fast things are growing, and provide regular reports to management on how much time is left before there is no room, assuming no change in the business requirements. This is done per tablespace (see the example query just after this list).
  • If there are a lot of tablespaces spread among many mount points or LUNs, it can be difficult to figure out which one(s) will fill up first. Fewer, larger LUNs can help quite a bit – and for this reason ASM usually makes this easier, because most ASM setups don’t have many different disk groups: typically a SYSTEM disk group, followed by a DATA disk group, and then a flash recovery area.
  • Have a monitor program in place that will escalate warnings as a tablespace gets full, like Database Rx ®.
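Here is a minimal sketch of a per-tablespace usage report built on the standard DBA views (it ignores autoextend headroom, so treat the percentages as a starting point rather than an exact fill date):

SELECT df.tablespace_name,
       ROUND(SUM(df.bytes)/1024/1024)        AS allocated_mb,
       ROUND(NVL(fs.free_bytes,0)/1024/1024) AS free_mb,
       ROUND(100 * (1 - NVL(fs.free_bytes,0)/SUM(df.bytes)), 1) AS pct_used
FROM dba_data_files df
LEFT JOIN (SELECT tablespace_name, SUM(bytes) AS free_bytes
           FROM dba_free_space
           GROUP BY tablespace_name) fs
  ON fs.tablespace_name = df.tablespace_name
GROUP BY df.tablespace_name, fs.free_bytes
ORDER BY pct_used DESC;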

Conclusion

The main point of this presentation is this: even though disk space usage is often chaotic in nature, it needs to be closely monitored so that the database can continue to grow. Failing in this responsibility will usually result in the application crashing and being unable to support users, and the problem can take considerable time to solve (days) because it often requires new hardware.

Date: 2014-03-24 Mon

Author: Jay Stanley


Oracle Scheduled Job Email

Introduction

In Oracle v10, a new feature was introduced to replace the existing DBMS_JOB job scheduling software and API: DBMS_SCHEDULER. Up until that point, the only way to schedule recurring jobs inside the database was to use a very simplistic system, DBMS_JOB. Its limitations included:

  • No logging of whether individual runs succeeded or failed
  • Could only call stored procedures,
  • Had no method of ‘chaining’ jobs together so that if one failed, special action could be taken
  • No method of notification if a job failed,
  • No concept of execution windows of time, or of using Oracle Services, or RAC in scheduling jobs.
  • Many other missing features, too numerous to list

The original DBMS_JOB API can be thought of as similar to the Unix cron daemon.

In this short paper, I’ll show how you can enable the new DBMS_SCHEDULER package to send email to a set of recipients if a job does not complete successfully.

Using the new DBMS_SCHEDULER

The upside to using DBMS_SCHEDULER is that it is far more flexible, and it contains many new features over DBMS_JOB.

The disadvantage is that it is more complicated to work with.

Let’s start by creating a test stored procedure, and then enable the scheduler to send email when it fails.

Oracle considers any PLSQL_BLOCK type job to have failed when it exits with an unhandled exception. If the code of a called PLSQL_BLOCK looks like this:

BEGIN
  mytest;
EXCEPTION
  WHEN OTHERS THEN
    NULL;
END;

Then it will never fail, according to how Oracle measures things. Any exception that mytest raises will be silently handled by the exception block.

Here’s a test stored procedure that will fail every time:

create or replace procedure showme
as
begin
raise NO_DATA_FOUND;
end;
/

Every time this procedure is called, it will exit with an unhandled exception – that being NO_DATA_FOUND, which is a pre-defined Oracle exception.

Let’s try running this in SQL*Plus:

SQL> execute showme;
BEGIN showme; END;
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at "JSTANLEY.SHOWME", line 5
ORA-06512: at line 1
SQL>

Let’s create a job that will run this hourly at 50 minutes after each hour. In this example, the schema name is ‘JSTANLEY’.

BEGIN
  dbms_scheduler.create_job(
    job_name        => 'JSTANLEY.JOB_SHOW_ME',
    job_type        => 'PLSQL_BLOCK',
    repeat_interval => 'FREQ=hourly;byminute=50',
    start_date      => sysdate,
    enabled         => TRUE,
    comments        => 'test',
    job_action      => 'BEGIN showme; END;'
  );
END;
/

We can check this by viewing the DBA_SCHEDULER_JOBS view.
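For example, a quick check of the job we just created (these are standard DBA_SCHEDULER_JOBS columns):

SELECT owner, job_name, enabled, state, repeat_interval, next_run_date
FROM dba_scheduler_jobs
WHERE owner = 'JSTANLEY' AND job_name = 'JOB_SHOW_ME';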

Now that we have the job in there, let’s enable full logging for that job, as well as email if it fails.

SQL> execute dbms_scheduler.set_attribute('JSTANLEY.JOB_SHOW_ME','LOGGING_LEVEL', DBMS_SCHEDULER.LOGGING_FULL);
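With full logging in place, each run and its outcome is recorded in the scheduler log views; for example (again, standard views, using the job created above):

SELECT log_date, status, error#, additional_info
FROM dba_scheduler_job_run_details
WHERE owner = 'JSTANLEY' AND job_name = 'JOB_SHOW_ME'
ORDER BY log_date;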

Next, we need to configure DBMS_SCHEDULER so that it knows how to send mail – where the mail server is, and what the sender’s email address is. You can find out whether these are configured with the following script; note that if they’re NULL, email will not work. This quick script prints out all of the scheduler attributes. In this example, we’re really only interested in EMAIL_SENDER, EMAIL_SERVER, and MAX_JOB_SLAVE_PROCESSES.

set serveroutput on size 100000
set lines 200 pages 10000
declare
  v_value varchar2(2000);
begin
  dbms_scheduler.get_scheduler_attribute('current_open_window',v_value);
  dbms_output.put_line('current open window '||v_value);
  dbms_scheduler.get_scheduler_attribute('default_timezone',v_value);
  dbms_output.put_line('default_timezone '||v_value);
  dbms_scheduler.get_scheduler_attribute('email_sender',v_value);
  dbms_output.put_line('email_sender '||v_value);
  dbms_scheduler.get_scheduler_attribute('email_server',v_value);
  dbms_output.put_line('email_server '||v_value);
  dbms_scheduler.get_scheduler_attribute('event_expiry_time',v_value);
  dbms_output.put_line('event_expiry_time '||v_value);
  dbms_scheduler.get_scheduler_attribute('log_history',v_value);
  dbms_output.put_line('log_history '||v_value);
  dbms_scheduler.get_scheduler_attribute('max_job_slave_processes',v_value);
  dbms_output.put_line('max_job_slave_processes '||v_value);
end;
/

To set them, run the DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE() call as follows:

SQL> execute  dbms_scheduler.set_scheduler_attribute('email_sender','jstanley@dbspecialists.com');
SQL> execute  dbms_scheduler.set_scheduler_attribute('email_server','localhost');

Now that it is configured, we need to enable emailing for the job in question.

BEGIN
dbms_scheduler.add_job_email_notification(
job_name=>'JSTANLEY.JOB_SHOW_ME',
recipients=>'jstanley@dbspecialists.com',
sender=>'jstanley@dbspecialists.com',
events=>'JOB_FAILED,JOB_BROKEN,JOB_SCH_LIM_REACHED,JOB_CHAIN_STALLED,JOB_OVER_MAX_DUR'
);
END;
/

OK!

Now, assuming that the SMTP host is correct, and that it doesn’t refuse email sent from the IP address of the Oracle database or object to the sender’s email address, when the job fails, it will send an email!

Below is a sample email in full, including headers:

X-From-Line: jstanley@dbspecialists.com  Wed Mar 19 13:50:59 2014
Return-Path: <jstanley@dbspecialists.com>
Received: from localhost (localhost [127.0.0.1])
by dbspecialists.com (8.13.1/8.13.1) with SMTP id s2JKoxOC010760
for jstanley@dbspecialists.com; Wed, 19 Mar 2014 13:50:59 -0700
Date: Wed, 19 Mar 2014 13:50:59 -0700
From: Jay Stanley <jstanley@dbspecialists.com>
X-Gnus-Mail-Source: pop:jstanley@dbrxdev.databaserx.com
Message-Id: <201403192050.s2JKoxOC010760@dbspecialists.com>
To: jstanley@dbspecialists.com
Subject: Oracle Scheduler Job Notification - JSTANLEY.JOB_SHOW_ME JOB_FAILED
X-IMAPbase: 1371854655 571
Status: O
X-UID: 571
X-Content-Length: 235
X-Keywords:
Content-Length: 235
Lines: 13
Xref: emacs1 Mail.Other:10705
Job: JSTANLEY.JOB_SHOW_ME
Event: JOB_FAILED
Date: 19-MAR-14 01.50.57.838644 PM -07:00
Log id: 13776
Job class: DEFAULT_JOB_CLASS
Run count: 25
Failure count: 22
Retry count: 0
Error code: 1403
Error message:
ORA-01403: no data found

Conclusion

This shows a relatively easy way to enable the DBMS_SCHEDULER API to inform users when a scheduled job exits with an unhandled exception.

Note that setting up email for DBMS_SCHEDULER is completely unrelated to setting it up for Enterprise Manager, or to the UTL_MAIL package – it is self-contained.

Date: 2014-03-19 Wed

Author: Jay Stanley


Just the basics; Upgrading the Oracle RDBMS

Introduction

This is the first installment of a series of shorter whitepapers that just cover the basics of a particular aspect of the Oracle RDBMS. The first one is on the subject of upgrading Oracle. After doing literally hundreds of Oracle RDBMS upgrades and patches over the past 18 years, these are my recommendations.

Why upgrade?

The first question that you need to ask is: why? Most often, it is to patch a particular bug that is actively causing problems in production. Less often, it’s to gain extra features that the old version lacks. Sometimes, it’s to ensure that the database’s security is as good as it can be. There’s one other reason to upgrade: Oracle is going to stop supporting and patching the version you are on.

All of these are very reasonable reasons to perform an upgrade. If it’s for another reason, you might ask why? – if something is working well and is stable, why change things?

The next question is: what version are you going to upgrade to? If at all possible, stay away from upgrading to the first release of a particular version of Oracle. Oracle 7.0 was terrible, while 7.3.4 was arguably one of the most stable versions ever; 8.0.1 was terrible, but by 8.2.0.4 it was quite stable; 9.0.1 was extremely buggy, but it was solid by 9.2.0.5; 10.0.1 was horrible, but things were fixed by 10.2.0.5; and I will not be surprised if 12.0.1 turns out to be terrible in terms of the number of open bugs and stability problems. Those early versions are like fire; mess with them and you WILL get burned.

Recommendations for the upgrade itself

  • Definitely do a trial upgrade of a copy of your target database. Yes, it is a lot of work, but there is no better way to find out whether you will run into any issues, and to understand how long the upgrade will take.
  • Also, do a trial rollback after the upgrade. It’s surprising how often, when problems do develop in a production upgrade, nobody tried a rollback first.
  • Be sure to keep your COMPATIBLE setting at the old version until the upgrade is successful and you are sure that you will not be rolling it back.
  • Upgrade into a different $ORACLE_HOME if possible. Since v11.2, Oracle has changed the way most upgrades happen: a new, patched $ORACLE_HOME is created, then the database is shut down, restarted with the new $ORACLE_HOME, and upgraded. This is a much, much better method, as it allows you to fall back to the old version if the upgrade fails, and there is no downtime while the patch is applied to the binaries, making the maintenance window shorter.
  • Shut down the listener if possible during the upgrade – this keeps application sessions from connecting. A client connection starting work in the middle of an upgrade can really mess things up.
  • Thoroughly read the latest updated My Oracle Support (MOS) document on the exact upgrade(s) that are happening, and make notes. This can be a long read, but any part of it can become immediately important if an upgrade has trouble. Create a separate full plan document that has each step, what to expect, how long each should take, and any difficulties encountered.
  • Be very sure that you have a good backup before starting the upgrade. Another option if using Enterprise Edition is to create a Guaranteed Restore Point, so that it’s relatively trivial and fast to undo the upgrade.
  • Be sure that there is enough tablespace (especially SYSTEM, SYSAUX and TEMP) for the upgrade. If these fill up, it can again cause a lot of problems.
  • Take extra care to review memory pools as well – some upgrades really require a JAVA_POOL memory pool for example, and will fail if there isn’t enough room.
  • Be sure to de-schedule all dba_scheduler and dba_jobs database jobs for the upgrade, possibly by setting JOB_QUEUE_PROCESSES=0 (see the sketch after this list). Note that export-datapump (and import-datapump) requires a non-zero JOB_QUEUE_PROCESSES to work. It is likely a good idea to make a copy of the DBA_JOBS and DBA_SCHEDULER_JOBS views before starting, for reference. Also, de-schedule any jobs scheduled through crontab, like backups, as well.
  • Communicate to operations management very clearly regarding how long it will take for the upgrade, including how long it will take to back it out worst-case, and how long it will take to restore the database from backup if everything goes horribly wrong.
  • Be sure that either a QA team, or someone who does quality control, can check the database & application after upgrade to make sure there are no problems. This should be done after the ‘trial’ upgrade, as well as the production upgrade.
  • Take a backup of any parameter files (Oracle RDBMS, ASM) before starting – upgrades can change parameter values in spfiles, so if a database is rolled back it can come in quite handy.
  • If you have separate development/QA-test/production databases, start by upgrading your development first, then QA, and finally production. This should make it more unlikely that new issues are encountered when the most critical database(s) (production) are upgraded, and it gives time to the developers and QA folks to find any potential issues before the upgrade happens.
  • Schedule the upgrade to minimize impact. If the busiest day of the week for the database is over a weekend, don’t do an upgrade on Friday evening. If it’s least busy at 2:00 am on a Tuesday, consider doing it then.
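As referenced in the de-scheduling bullet above, here is a minimal sketch of quiescing scheduled work before the upgrade window (the pre_upgrade_* table names are just examples for keeping reference copies; SCOPE=BOTH assumes an spfile):

-- stop job-queue processes from launching jobs during the upgrade
ALTER SYSTEM SET job_queue_processes = 0 SCOPE=BOTH;

-- keep reference copies of the current job definitions
CREATE TABLE pre_upgrade_dba_jobs   AS SELECT * FROM dba_jobs;
CREATE TABLE pre_upgrade_sched_jobs AS SELECT * FROM dba_scheduler_jobs;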

Conclusion

Following my recommendations above will very likely make any Oracle RDBMS upgrade go more smoothly.