.\" $NetBSD: parsedate.3,v 1.26 2021/05/16 19:42:35 kre Exp $
.\"
.\" Copyright (c) 2006 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Christos Zoulas.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd May 16, 2021
.Dt PARSEDATE 3
.Os
.Sh NAME
.Nm parsedate
.Nd date parsing function
.Sh LIBRARY
.Lb libutil
.Sh SYNOPSIS
.In util.h
.Ft time_t
.Fn parsedate "const char *datestr" "const time_t *time" "const int *tzoff"
.Sh DESCRIPTION
The
.Fn parsedate
function parses a date and time from
.Ar datestr
described in English relative to an optional
.Ar time
point,
and an optional timezone offset (in minutes behind/west of UTC)
specified in
.Ar tzoff .
If
.Ar time
is
.Dv NULL
then the current time is used.
If
.Ar tzoff
is
.Dv NULL ,
then the current time zone is used.
.Pp
The
.Ar datestr
is a sequence of white-space separated items.
The white-space is optional if the concatenated items are not ambiguous.
The string contains data which can specify a base time (used in
conjunction with the
.Ar time
parameter, totally replacing that parameter's value if sufficient data
appears in
.Ar datestr
to do so), and data specifying an offset from the base time.
Both of those are optional.
If no data specifies the base time, then
.Nm
simply uses the value given by
.Ar \&*time
.Pq "or now" .
If there is no offset data then no offset is applied.
An empty
.Ar datestr ,
or a
.Ar datestr
containing nothing but whitespace,
is equivalent to midnight at the start of the day specified by
.Ar \&*time
.Pq "or today" .
.Pp
The following words have the indicated numeric meanings:
.Dv last =
\-1,
.Dv this =
0,
.Dv first , next ,
or
.Dv one =
1,
.Dv second
is unused so that it is not confused with
.Dq seconds ,
.Dv two =
2,
.Dv third
or
.Dv three =
3,
.Dv fourth
or
.Dv four =
4,
.Dv fifth
or
.Dv five =
5,
.Dv sixth
or
.Dv six =
6,
.Dv seventh
or
.Dv seven =
7,
.Dv eighth
or
.Dv eight =
8,
.Dv ninth
or
.Dv nine =
9,
.Dv tenth
or
.Dv ten =
10,
.Dv eleventh
or
.Dv eleven =
11,
.Dv twelfth
or
.Dv twelve =
12.
.Pp
The following words are recognized in English only:
.Dv AM ,
.Dv PM ,
.Dv a.m. ,
.Dv p.m. ,
.Dv midnight ,
.Dv mn ,
.Dv noon .
.Pp
The months:
.Dv january ,
.Dv february ,
.Dv march ,
.Dv april ,
.Dv may ,
.Dv june ,
.Dv july ,
.Dv august ,
.Dv september ,
.Dv october ,
.Dv november ,
.Dv december ,
and common abbreviations for them.
When a month name (or its ordinal number) is given,
the number of some particular day of that month is required to accompany it.
This is generally true of any data that specifies a period
with a duration longer than a day, so simply specifying a year,
or a month, is invalid, as also is specifying a year and a month.
.Pp
The days of the week:
.Dv sunday ,
.Dv monday ,
.Dv tuesday ,
.Dv wednesday ,
.Dv thursday ,
.Dv friday ,
.Dv saturday ,
and common abbreviations for them.
Weekday names are typically ignored if any other data
is given to specify the date, even if the name given
is not the day on which the specified date occurred.
.Pp
Time units:
.Dv year ,
.Dv month ,
.Dv fortnight ,
.Dv week ,
.Dv day ,
.Dv hour ,
.Dv minute ,
.Dv min ,
.Dv second ,
.Dv sec ,
.Dv tomorrow ,
.Dv yesterday .
.Pp
Timezone names:
.Dv gmt (+0000) ,
.Dv ut (+0000) ,
.Dv utc (+0000) ,
.Dv wet (+0000) ,
.Dv bst (+0100) ,
.Dv wat (-0100) ,
.Dv at (-0200) ,
.Dv nft (-0330) ,
.Dv nst (-0330) ,
.Dv ndt (-0230) ,
.Dv ast (-0400) ,
.Dv adt (-0300) ,
.Dv est (-0500) ,
.Dv edt (-0400) ,
.Dv cst (-0600) ,
.Dv cdt (-0500) ,
.Dv mst (-0700) ,
.Dv mdt (-0600) ,
.Dv pst (-0800) ,
.Dv pdt (-0700) ,
.Dv yst (-0900) ,
.Dv ydt (-0800) ,
.Dv hst (-1000) ,
.Dv hdt (-0900) ,
.Dv cat (-1000) ,
.Dv ahst (-1000) ,
.Dv nt (-1100) ,
.Dv idlw (-1200) ,
.Dv cet (+0100) ,
.Dv met (+0100) ,
.Dv mewt (+0100) ,
.Dv mest (+0200) ,
.Dv swt (+0100) ,
.Dv sst (+0200) ,
.Dv fwt (+0100) ,
.Dv fst (+0200) ,
.Dv eet (+0200) ,
.Dv bt (+0300) ,
.Dv it (+0330) ,
.\".Dv zp4 (+0400) ,
.\".Dv zp5 (+0500) ,
.Dv ist (+0530) ,
.\".Dv zp6 (+0600) ,
.Dv ict (+0700) ,
.Dv wast (+0800) ,
.Dv wadt (+0900) ,
.Dv awst (+0800) ,
.Dv awdt (+0900) ,
.Dv cct (+0800) ,
.Dv sgt (+0800) ,
.Dv hkt (+0800) ,
.Dv jst (+0900) ,
.Dv cast (+0930) ,
.Dv cadt (+1030) ,
.Dv acst (+0930) ,
.Dv acdt (+1030) ,
.Dv east (+1000) ,
.Dv eadt (+1100) ,
.Dv aest (+1000) ,
.Dv aedt (+1100) ,
.Dv gst (+1000) ,
.Dv nzt (+1200) ,
.Dv nzst (+1200) ,
.Dv nzdt (+1300) ,
.Dv idle (+1200) .
.Pp
The timezone names simply specify an offset from
Coordinated Universal Time (UTC)
and do not imply validating the time/date to be reasonable in any zone
that happens to use the abbreviation specified.
.Pp
A variety of unambiguous dates are recognized:
.Bl -tag -compact -width "20 Jun 1994"
.It 9/10/69
For years between 69-99 we assume 1900+ and for years between 0-68
we assume 2000+.
.It 2006-11-17
An ISO-8601 date.
Note that when using the ISO-8601 format date and time with the
.Sq T
designator to separate date and time-of-day,
this must appear at the start of the input string,
with no preceding whitespace.
Other modifiers may optionally follow.
.It 67-09-10
The year in an ISO-8601 date is always taken literally,
so this is the year 67, not 2067.
.It 10/1/2000
October 1, 2000; the common, but bizarre, US format.
.It 20 Jun 1994
.It 23jun2001
.It 1-sep-06
Other common abbreviations.
.It 1/11
The year can be omitted.
A missing year is taken from the
.Ar \&*time
value, or
.Dq now
if
.Ar time
is NULL.
Again, this is the US month/day format (the 11th of January).
.El
.Pp
Standard e-mail (RFC822, RFC2822, etc)
formats and the output from
.Xr date 1 ,
and
.Xr asctime 3
are all supported as input,
as is cvs date format (where years < 100 are treated as
20th century).
.Pp
Times can also be specified in common forms:
.Bl -tag -compact -width 12:11:01.000012
.It 10:01
.It 10:12pm
.It 12:11:01.000012
.It 12:21-0500
.El
Fractions of seconds (after a decimal point, or comma) are parsed, but ignored.
If no time is given, midnight on the specified date is assumed.
If a time is given without a date, that time on the day
specified by
.Ar \&*time
.Pq or now
is used.
Missing minutes, or seconds, are taken to be zero.
.Pp
A variety of forms for relative items to specify
an offset from the base time are also supported:
.Bl -tag -compact -width "this thursday"
.It -1 month
.It last friday
.It one week ago
.It this thursday
.It next sunday
.It +2 years
.El
.Pp
Note that, as a special case for
.Dv midnight
with the name of a day only,
.Dq "midnight tuesday"
implies 00:00 at the beginning of Tuesday,
.Pq "the midnight before Tuesday"
whereas
.Dq "Sat mn"
implies 00:00 at the end of Saturday
.Pq "midnight after Saturday"
.Pq "i.e. early Sunday morning" .
.Pp
Seconds since epoch, UTC, (also known as UNIX time) are also supported
to specify the base time:
.Bl -tag -compact -width "E.g.:\ @735275209\ \ \ \ "
.It "E.g.: @735275209"
to specify: Tue Apr 20 03:06:49 UTC 1993
.El
provided that the value given is within the range
that can be represented as a
.Va "struct tm" .
Negative values
(times before the epoch)
are permitted, but no other significant data as part of
the base time \(en the value given specifies year, month,
day, hour, minute, and second, there is no more.
An offset from this base time may still be included.
Thus
.Dq "@735275209 +2 months 5 hours 15 minutes"
produces a time_t which represents
.Dq "Sun Jun 20 08:21:49 UTC 1993" .
.Pp
Text in
.Ar datestr
enclosed in parentheses
.Ql \&(
and
.Ql \&)
is treated as a comment, and ignored.
Parentheses nest (the comment ends when there have
been the same number of closing parentheses as there
were opening parentheses.)
There is no escape character in comments,
.Ql \&)
always ends
(or decreases the nesting level of)
the comment.
.Sh RETURN VALUES
.Fn parsedate
returns the number of seconds passed since,
or before (if negative,)
the Epoch, or
.Dv \-1
if the date could not be parsed properly.
A non-error result of
.Dv \-1
can be distinguished from an error by setting
.Va errno
to
.Dv 0
before calling
.Fn parsedate ,
and checking the value of
.Va errno
afterwards.
.Sh ENVIRONMENT
If the
.Ar tzoff
parameter is given as
.Dv NULL ,
then:
.Bl -tag -width iTZ
.It Ev TZ
The timezone to which the input is relative,
when no zone information is otherwise specified in the
.Ar datestr
input.
.El
.Sh SEE ALSO
.Xr date 1 ,
.Xr touch 1 ,
.Xr errno 2 ,
.Xr ctime 3 ,
.\" WTF ???? eeprom(8)!! Why? Just because it calls this function? Weird!
.Xr eeprom 8
.Sh HISTORY
The parser used in
.Fn parsedate
was originally written by Steven M. Bellovin while at the University
of North Carolina at Chapel Hill.
It was later tweaked by a couple of people on Usenet.
Completely overhauled by Rich $alz and Jim Berets in August, 1990.
Further mangled during its residence with
.Nx .
.Pp
The
.Fn parsedate
function first appeared in
.Nx 4.0 .
.Sh BUGS
.Bl -tag -compact -width 1
.It 1
The
.Fn parsedate
function is not re-entrant or thread-safe.
.It 2
The
.Fn parsedate
function assumes years less than 0 mean \(mi
.Fa year ,
and in non ISO formats,
that years less than 69 mean 2000 +
.Fa year ,
otherwise
years less than 100 mean 1900 +
.Fa year .
That is except in the CVS format, where years less than 100
mean 1900 +
.Fa year .
.It 3
The
.Fn parsedate
function accepts
.Dq "12 am"
where
.Dq "12 midnight"
is correct, and similarly
.Dq "12 pm"
for
.Dq "12 noon" .
The correct forms are also accepted.
.It 4
There are various weird cases that are hard to explain,
but are nevertheless considered correct.
.It 5
It is very hard to specify years BC,
and in any case,
conversions of times before the
commencement of the modern Gregorian calendar
(when that occurred depends upon location,
but late 16th century is a rough guide)
are suspicious at best,
and depending upon context,
often just plain wrong.
.It 6
Despite what is stated above,
.Dq next
is actually 2.
The input
.Dq "next January" ,
instead of producing a timestamp for January of the
following year, produces one for January 2nd, of the
current year.
Use caution with
.Dq next
it rarely does what humans expect.
For example, on a Sunday
.Dq "next sunday"
means the following Sunday (7 days hence)
whereas
.Dq "next monday"
means the monday that follows that (8 days hence)
rather than
.Dq tomorrow
or just
.Dq Mon
.Pq without the Dq next
which is the nearest subsequent Monday.
.El