Understanding Oracle Redo
In the Oracle RDBMS, one of the most frequently misunderstood concepts I see in remote database administration work is the role and importance of redo. The Oracle 10g Concepts manual defines redo as follows:
The primary function of the redo log is to record all changes made to data. If a failure prevents modified data from being permanently written to the datafiles, then the changes can be obtained from the redo log, so work is never lost.
To me, that’s not an entirely satisfying explanation.
It’s easier to understand redo if we look at the history of a transaction. A transaction here refers to a session that issues a set of SQL statements that do SQL inserts, updates and/or deletes, and then issues a ‘commit’ statement.
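To make the transaction lifecycle concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle (the table and values are illustrative); the pattern is the same: a set of changes, followed by an explicit commit that makes them durable.

```python
import sqlite3

# SQLite stands in for Oracle here; the transaction pattern is identical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("INSERT INTO accounts VALUES (2, 50)")
conn.commit()

# One transaction: move 25 from account 1 to account 2, then commit.
conn.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 25 WHERE id = 2")
conn.commit()  # only now are the changes durable

print(list(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
```

Until the final `commit()`, the two updates form one atomic unit of work, which is exactly the unit the redo machinery described below must make durable.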
The first thing that happens while the inserts/updates/deletes are running is that a record of every datafile block change is placed into a memory structure called the ‘redo log buffer’. When the session issues a ‘commit’, a process (LGWR) flushes the redo log buffer to the online redo log files. Every transaction that Oracle completes is written to these online redo log files; under normal circumstances, when a session commits changes to the database, the session does not regain control until information about those changes has been written to the online redo log file(s).
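The buffer-then-flush behavior can be modeled in a few lines. This is a toy sketch of the idea, not Oracle's actual implementation; the class and method names are invented for illustration.

```python
# Toy model: block changes accumulate in memory, and a commit flushes
# them to the current online redo log file (LGWR's job in Oracle).
class RedoLogBuffer:
    def __init__(self):
        self.buffer = []      # in-memory change records
        self.online_log = []  # stands in for the online redo log file

    def record_change(self, change):
        self.buffer.append(change)  # every block change is recorded first

    def commit(self):
        # The flush is synchronous in real Oracle: the session only
        # regains control after the redo is safely on disk.
        self.online_log.extend(self.buffer)
        self.buffer.clear()

log = RedoLogBuffer()
log.record_change("update block 42")
log.record_change("insert block 17")
log.commit()
print(len(log.online_log), len(log.buffer))  # 2 changes written, buffer empty
```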
These online redo log files are of a fixed size; when the current file fills up, LGWR switches to the next one. What happens next depends on how the Oracle database is configured.
Oracle can run in two modes. The first is archivelog mode, in which each online redo log file is copied to another destination once LGWR has finished writing to it; the copies are called archived redo log files, and they are made by a process called ‘ARCH’, the redo archiver. ARCH only copies files that LGWR has finished writing. Alternatively, Oracle can run in noarchivelog mode, in which these copies are not made and there is no ARCH process. Since the copies consume resources, running a database in noarchivelog mode can make it faster; however, it also means the database cannot be recovered past the time of its last backup, so almost all production databases run in archivelog mode.
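The circular use of fixed-size logs, with ARCH copying each completed file, can be sketched as follows. The constants (`NUM_LOGS`, `LOG_SIZE`) are tiny illustrative values, not Oracle parameters.

```python
# Sketch of log switching and archiving: LGWR fills fixed-size logs in a
# circle; in archivelog mode, ARCH copies each completed log aside.
NUM_LOGS, LOG_SIZE = 3, 2  # 3 log groups, 2 records each (tiny for the demo)

online_logs = [[] for _ in range(NUM_LOGS)]
archived = []  # stands in for the archived redo log destination
current = 0

def write_redo(record):
    global current
    if len(online_logs[current]) == LOG_SIZE:        # current log is full:
        archived.append(list(online_logs[current]))  # ARCH copies it out
        current = (current + 1) % NUM_LOGS           # LGWR switches logs
        online_logs[current].clear()                 # the next group is reused
    online_logs[current].append(record)

for i in range(7):
    write_redo(f"change {i}")

print(len(archived))  # logs ARCH has copied so far: 3
```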
The main advantage of running a database in archivelog mode is that if you have a cold backup of the database (or a valid hot backup, which is available only in archivelog mode), you can recover the database by replaying the transactions in the archived (and online) redo logs, so those transactions are not lost and a full recovery is possible. Without them, you can only recover to the time the backup was taken, so transactions will very likely be lost.
Because ARCH only copies completed online redo logs when the database is in archivelog mode, the LGWR process will refuse to write to an online redo log file that has not yet been archived. If LGWR ends up waiting on ARCH, every session in the database that issues a ‘commit’ must wait too; control is not returned to the client program, and no new commits are accepted. Sessions simply wait for Oracle to return control to them, which means transaction throughput drops to zero. No session can continue past its commit until ARCH completes, which allows LGWR to work again.
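The stall condition can be reduced to a small simulation. Everything here is illustrative: `MAX_LOGS` is a made-up group count, and ARCH simply never runs, so the demo shows when commits would freeze.

```python
# Toy simulation of LGWR stalling on ARCH: LGWR cannot reuse a log that
# ARCH has not finished copying, so once every log is pending archive,
# commits stop until ARCH catches up.
logs_pending_archive = 0
MAX_LOGS = 3  # total online redo log groups (illustrative)

def lgwr_switch():
    """Return True if the switch succeeds, False if LGWR must wait on ARCH."""
    global logs_pending_archive
    if logs_pending_archive == MAX_LOGS:
        return False  # every log awaits ARCH: all commits now stall
    logs_pending_archive += 1  # the filled log joins the archive queue
    return True

stalled_at = None
for switch in range(1, 6):
    if not lgwr_switch():  # ARCH never runs in this demo
        stalled_at = switch
        break

print(stalled_at)  # the switch at which commits stop: 4
```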
Over time, the archived redo log files will fill whatever target directory they are written to, so some process (there are several ways to do this) needs to remove them before that happens. If it doesn’t, the target directory fills up, ARCH can no longer finish archiving an online redo log file, LGWR fills the remaining online redo log files, and finally LGWR ends up waiting on ARCH until disk space is available again.
There are a few interesting facts about this setup, which is virtually identical in every disk-based modern relational database:
- Redo is only generated by database changes (commonly inserts/updates/deletes). A read-only database will generate no redo.
- If the disk containing the online redo logs can’t keep up in raw bandwidth (MB/second), transactions will slow down; there will be a pause after every client ‘commit’.
- If the disks can’t keep up in latency (if each write takes a long time to complete), transactions will again slow down. This is different from bandwidth: if 1000 sessions commit every second, there will likely be a demand of 1000 writes/second to the online redo log files. Most modern magnetic disks can only do about 100-200 operations per second, so this is a very common problem, usually solved with RAID.
- If archiving is overall slower than the redo rate, then eventually LGWR will need to wait for ARCH to finish as explained above. Again, sessions end up waiting on the ‘commit’ statement.
- If a client application issues a lot of small transactions with frequent commits, LGWR will need to do more separate writes to the online redo log files. Doing fewer commits means that the IOPS of the online redo log files is reduced, improving throughput if this is a bottleneck.
- This has very little to do with the writes to the datafiles themselves; those are handled by a completely separate process (DBWR) using asynchronous operating system calls. LGWR uses synchronous calls, which do not return until it is confirmed that the data really is on disk.
- When ARCH copies an online redo log file, it writes it as one continuous sequential stream, so the target archivelog directory typically doesn’t require the IOPS that the online redo log files do.
- If the online redo log files are too small, LGWR switches log groups more often, and since each switch pauses redo writes briefly, this also causes sessions to wait on ‘commits’.
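The latency and commit-frequency points above amount to simple arithmetic. The figures below are the illustrative numbers from the list (1000 commits/second, a ~150 IOPS magnetic disk, batches of 10), not measurements.

```python
# Back-of-envelope check: roughly one LGWR write per commit.
commits_per_sec = 1000  # illustrative workload from the text
disk_iops = 150         # a typical single magnetic disk (100-200 IOPS)

required_iops = commits_per_sec
print(required_iops > disk_iops)  # True: one disk cannot keep up

# Batching 10 small transactions per commit cuts redo writes tenfold.
batched_iops = commits_per_sec // 10
print(batched_iops <= disk_iops)  # True: now within one disk's reach
```

This is why both RAID (more IOPS available) and less frequent commits (fewer IOPS demanded) attack the same bottleneck from opposite sides.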
There are a lot of considerations about redo; I have not touched on the topics of standby databases, supplemental logging, or duplexed online redo log files, and there are many others.
To see if your database is suffering from a slow redo stream, look for the ‘log file sync’ wait event: in the Database Specialists DBRX portal if you are a customer, or with Enterprise Manager, AWR reports, statspack reports, or other tools.