Intelligent date handling
There is an often overlooked issue when people design new databases; how to accurately store dates and times. Often, little or no thought is given to the best way to do this.
Date and time columns in a database are often used to answer key business questions, such as how long does it take us to fill an order?. Date and time columns are also used often to aggregate data from several sources, for example to pinpoint exact timespans of production issues, and this can include servers that span timezones.
This paper gives some insight into the best way to approach this issue.
Overview of available datatypes in Oracle
The main datatypes for storing dates/times in Oracle are:
|DATE||Stores date down to 1 second resolution; no timezone given|
|TIMESTAMP||Stores date with given resolution, down to 9 digits of a fractional second|
|TIMESTAMP WITH TIMEZONE||Same as TIMESTAMP, but in addition the TIMEZONE is stored.|
|TIMESTAMP WITH LOCAL TIMEZONE||Same as TIMESTAMP, but all dates normalized to a particular timezone, and no timezone information stored.|
Old-school; store dates as
DATE datatypes in the local timezone
There are many, many database that choose to simply store dates as of the date in the local timezone; in other words, if the database is physically housed in New York City, NY, the ‘EST5EDT’ timezone is used.
This actually works fairly well for many applications, until data needs to be stored that originates somewhere else, or needs to be presented to a user in a different timezone. For data that would originated in London, UK, to store it accurately will require any loading process to calculate the time in EST5EDT given time in UTC.
The largest problem, however, has to do with daylight savings time; each year, for most timezones, an hour is lost (disappears), and is then added as daylight savings time comes into and out of force each year.
This can really cause havoc on all sorts of calculations; for example, if you are graphing orders by hour, you will have a time in the fall where there will be an hour that has NO orders, and another hour in the spring where there will be two hours of orders compressed into 1 hour of time. This can even affect monthly calculations, as the month when daylight savings time ends will be short an hour.
It can also cause havoc on scheduled jobs, say using ‘cron’ on a Unix machine for hourly jobs. You wil have an hour in the spring where no jobs will be run, and another in the fall where jobs will run twice in an hour.
A further issue also arises; when a company expands into markets that are not in the original timezone, often this causes a lot of problems, because there can be only one ‘official’ timezone for the company’s data. This makes it more difficult to aggregate data from sources in separate timezones.
New-school; store timezone information
The ‘new’ way to handle this, is to store a timestamp along with a time-zone, using the
TIMESTAMP WITH TIMEZONE or
TIMESTAMP WITH LOCAL TIMEZONE datatype. Unfortunately, this also will show problems when you graph things during daylight-savings-time switches. Another disadvantage is that it takes a lot more storage to store a TIMESTAMP WITH TIMEZONE than it does a DATE.
The other issue with this, is that we’ve seen many, many changes in the past 3-4 years with regards to when each timezone is specified to gain or lose an hour each year. This means that if you use this method, you’re going to need to insure that all of your timezone patches, that typically include not only the Oracle database, but also the OS and any end-user language libraries (ie Java) are up to date.
Post-modern school; just use UTC everywhere!
There is one timezone that is guaranteed to never have any daylight-savings-time issues; that being UTC, otherwise known as GMT.
The great thing about UTC (Universal Coordinated Time), is that it is generally accepted as being the best timezone to use for all date calculations; it’s used often in science, for instance, for that very reason.
You will never lose or gain an hour each year using UTC.
And, UTC can be very reliably used to calculate the date in any timezone in history.
If you store your dates as a
DATE datatype, and use UTC, it should be realatively easy for any client program to calculate the date/time to any timezone that your user wishes.
The imporance of using ntp, the Network Time Protocol
When you use the database to set times and dates in an application (ie using sysdate), you’re making the assumption that the time on the database server is correct.
There is really only one way that it can be correct with modern hardware, and that requires that the server is running the ‘ntp’, or Network Time Protocol daemon, and that it is slaved with known-good time servers. It is a good idea to have more than one time server that it syncs with.
All servers, no matter how expensive, cannot keep very accurate time; the time will ‘drift’ after a period of days/months. WHen this happens, all of the timestamps recorded in a database can be off.
It is especially important to insure that ntp is running correctly and reliably when you use Oracle RAC; RAC really depends on extremely accurate date/time synchronization across all of the nodes in the RAC cluster to perform reliably.
Conclusion; use UTC everywhere; let clients calculate time for end-users
For the reasons above, I would suggest that:
- All servers/network equipment be based in UTC,
- All dates are stored as a ‘date’ datatype in UTC in the database,
- Any user interface should calculate the dates to display based upon the user’s or application’s preferences
- Definitely set up ntp to insure that the server’s time/date is correct at all times.
Date: February 16, 2011
HTML generated by org-mode 6.34c in emacs 22