What is the date for a blog entry?
by ZetaGecko | 2 Comments | Atom/RSS
This blog entry will be dated Tuesday, July 13, 2004 at 8:51 AM, MDT. What does that mean? Can it change? Should it change? What should be required before it is changed? Is it the "right" date? What other dates might be associated with this blog entry? Are there dates associated with it that no one cares about? Who does care? About what? What format should the date be expressed in when this entry appears in a feed? What about timezone information? These are some of the questions we're attempting to answer on the AtomPub Working Group Mailing List.
A lot of discussion has gone into the question of dates in Atom, but yesterday it seemed we weren't getting much nearer any conclusions. The 0.3 version of the format had three date elements: created, modified, and issued. The meanings of the first two seem straightforward at first glance, and it doesn't take too long to get across the point that "issued" has something to do with when an entry is published. But is it supposed to express the instant in time of publishing, or is it like a magazine, where the August issue comes out in July? The modified date also turns out not to be as straightforward as it seems: specifically, what constitutes a "modification"? Is a spelling correction a modification? Or does it have to be something bigger? A big area of disagreement was over whether dates should be expressible in any timezone, or whether they should all be converted to UTC (aka GMT).
Everyone in the conversation seemed to be operating on different assumptions.
One of the reasons I didn't post anything in any of my blogs yesterday was that I'd decided in the morning to take a stab at organizing this whole discussion, getting everyone on the same page, and seeing if we could get on course to put the issue to rest. That started with an email that seemed a little too long and detailed to reproduce here in full... but on second thought, I think it illustrates a point I'd like to make. So here it is (feel free to speed-read or even skip it completely):
Dates (where "date" = "date and time") that might be associated with an entry:
1) objective creation date
2) each objective major modification date
3) each objective minor modification date
4) each objective publication date
5) any subjective creation date the author may wish to associate with the entry
6) any subjective major modification date the author may wish to associate with the entry
7) any subjective minor modification date the author may wish to associate with the entry
8) any subjective publication date the author may wish to associate with the entry
Questions:
A) Which of these should appear in the feed?
B) Which of these can be consolidated into a single element in a feed?
C) What timezone requirements/recommendations should each carry?
Interested parties:
I) Author: Most interested in 8. Prefers their local timezone.
II) Publishing client software: Just an intermediary--probably doesn't care about anything.
III) Publishing server software: Most interested in 2-4, possibly including all iterations. No strong timezone preference.
IV) Aggregators: Most interested in 2-4, perhaps including only the first, last, or first and last of each. Prefers UTC.
V) Feed reader software: Most interested in 2-3 for ordering and determining whether to call something "new" or "updated". No strong timezone preference.
VI) Human reader: Most interested in the last of each in 2-3 and one of 4 or 8 (depends on the person)--possibly also interested in the first of 4 or 8--and may wish the timezone to be either the author's local timezone or their own local timezone.
Comments (correct me if I'm wrong or you disagree--these are my perceptions and opinions):
i) The publishing software can store the dates in whatever timezone it wants. It can present them to the author in the author's local timezone, and can publish them in whatever form best benefits everyone else.
ii) Nobody really cares about 5-7.
iii) Nobody reading the feed really cares about 1, so including it in the feed should be optional. I suppose there may be uses for it, but I wouldn't complain if it weren't even supported.
iv) Nobody reading the feed cares much about anything that occurred before the first 4.
Conclusions:
a) 2-4 and 8 are the only dates that may need to be in the feed.
b) At most the first and last of each should appear in the feed.
c) Only 8 should be allowed to omit the timezone.
d) 2 could be used instead of 4 in the feed.
Proposed Date Constructs:
atom:first-issued or atom:first-published (first 4)
	An objective publication date. REQUIRED. MUST specify a timezone (which might be -00:00 if the timezone is unknown) which MUST be numeric.
atom:issued or atom:published (last 8)
	A subjective publication date. OPTIONAL. It is RECOMMENDED that it include a timezone, that the timezone be numeric, and that it be the author's timezone, or the timezone relevant to the content of the entry (for example, if I, in Utah, am writing about something that happened in Iraq, I might use Iraq's timezone).
atom:modified (last 2)
	The objective date when the last major change--a change which alters or non-trivially expands the message--was made to the entry. REQUIRED if different from atom:first-issued|published. It MUST specify a timezone (which might be -00:00 if the timezone is unknown), which MUST be UTC.
atom:updated (last 3)
	The objective date when the last minor change--for example a spelling correction, clarification which doesn't expand the message, etc.--was made to the entry. RECOMMENDED if different from atom:first-issued|published and later than atom:modified. MUST NOT be included if earlier than atom:modified. MUST specify a timezone (which might be -00:00 if the timezone is unknown), which MUST be UTC.
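To make the inclusion rules in the proposed Date Constructs above concrete, here is a minimal Python sketch of how publishing software might decide which elements to emit for one entry. The function names (`build_date_elements`, `to_utc`) and the idea that the server keeps the first publication time, last major change, and last minor change as timezone-aware datetimes are assumptions for illustration, not part of any spec. It also chooses to normalize atom:first-issued to UTC, although the proposal allows any numeric timezone there.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def to_utc(dt: datetime) -> str:
    """Format an aware datetime in UTC with a numeric "+00:00" offset."""
    return dt.astimezone(timezone.utc).isoformat()


def build_date_elements(first_published: datetime,
                        last_major: datetime,
                        last_minor: Optional[datetime] = None,
                        subjective: Optional[str] = None) -> Dict[str, str]:
    """Pick which proposed Date Constructs to emit for one entry (hypothetical helper).

    Assumes timezone-aware datetimes with first_published <= last_major;
    last_major equals first_published when no major change has been made.
    """
    elements = {"atom:first-issued": to_utc(first_published)}  # REQUIRED
    if subjective is not None:
        # OPTIONAL subjective date: passed through exactly as the author wrote it.
        elements["atom:issued"] = subjective
    if last_major != first_published:
        # REQUIRED only when it differs from atom:first-issued.
        elements["atom:modified"] = to_utc(last_major)
    if last_minor is not None and last_minor > last_major:
        # Emitted only when later than the last major change, never when earlier.
        elements["atom:updated"] = to_utc(last_minor)
    return elements


# Usage with this entry's times (MDT is UTC-6); the 10:05 typo fix is hypothetical.
mdt = timezone(timedelta(hours=-6))
posted = datetime(2004, 7, 13, 9, 27, tzinfo=mdt)
print(build_date_elements(
    first_published=posted,
    last_major=posted,                                    # no major change yet
    last_minor=datetime(2004, 7, 13, 10, 5, tzinfo=mdt),  # hypothetical minor edit
    subjective="2004-07-13T08:51:00-06:00",
))
```

Because a minor edit only counts when it is later than the last major change, a publisher never has to emit an atom:updated that contradicts atom:modified.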
I would also propose that the spec text for each Date Construct, or some text appearing before the list of Date Constructs, point out that the requirements for each are different because some are intended for use by software and others for presentation to humans.
The "objective" dates should have spec text pointing out that these dates MUST NOT be modified except to correct errors in the actual timestamp (for example, if the computer's clock was incorrect).
Note that the dates that are required to be specified in UTC may be converted by client software to the timezone indicated in atom:issued|published if it appears in the feed. All dates may also be converted to the user's local timezone.
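The conversion described in the note above can be sketched in a few lines of Python. This is a hedged illustration, not spec text: `parse_timestamp` and `show_in_author_zone` are hypothetical names, and the sketch assumes timestamps in the RFC 3339 style used throughout this discussion (e.g. `2004-07-13T08:51:00-06:00`). When a legacy atom:issued|published carries no timezone, it simply leaves the UTC value alone.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339-style timestamp such as 2004-07-13T08:51:00-06:00."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to the equivalent numeric offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def show_in_author_zone(utc_value: str, issued_value: str) -> str:
    """Re-express a UTC date in the offset carried by atom:issued|published."""
    author_tz = parse_timestamp(issued_value).tzinfo
    if author_tz is None:
        # A legacy subjective date without a timezone gives nothing to convert to.
        return utc_value
    return parse_timestamp(utc_value).astimezone(author_tz).isoformat()


print(show_in_author_zone("2004-07-13T16:05:00Z", "2004-07-13T08:51:00-06:00"))
```

Converting to the reader's own timezone instead would just mean calling `astimezone()` with no argument.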
Why so much detail? Because when arguments get really messy, sometimes you have to go back to the beginning and build from the most basic concepts, including the ones you know from the start you're going to throw out, to make sure everything has been covered and that you can clearly identify exactly which details people disagree on. To emphasize it again: clearly identifying points of agreement and disagreement is key to resolving differences of opinion, especially on complex matters.
Interestingly, the process of writing that email led even me to some conclusions that I wouldn't have agreed with if someone had stated them in isolation--specifically, that only one of the dates attached to an entry has any need to have a non-UTC timezone attached to it.
A little after I posted the email, I got a much appreciated stamp of approval from Sam Ruby, who reposted the entire message under a new subject in order to emphasize that he liked my approach (even if he didn't necessarily agree with my conclusions). Laboriously argued messages like this aren't always appreciated, since sometimes they really are little more than laborious.
After a day of discussion, this morning I posted a follow-up which I think summarizes where the opinion of the group sits, and made suggestions on unresolved issues. Here it is. Assuming I haven't misconstrued the consensus, you can see how clear the issues to be debated can become once a discussion has been carefully organized. Note that some of the things I'm asserting we agree on were definitely NOT agreed on yesterday morning (and I may be assuming too much in assuming that they are now):
1) objective creation date
2) each objective major modification date
3) each objective minor modification date
4) each objective publication date
5) any subjective creation date the author may wish to associate with the entry
6) any subjective major modification date the author may wish to associate with the entry
7) any subjective minor modification date the author may wish to associate with the entry
8) any subjective publication date the author may wish to associate with the entry
Let me see if I can get a sense of what we agree on and what we don't. Would I be correct to assume that we can all agree that:
A) to support legacy systems/data which have subjective dates, possibly without timezones, we have to support #8?
B) that A (#8) be REQUIRED?
C) that A (#8) be the only REQUIRED date construct (again, to support legacy systems & data)?
D) that A (#8) be the only subjective date?
E) that all Date Constructs MUST specify a timezone, which MUST be -00:00 if it is unknown?
F) that the timezones MUST be numeric, except that either Z or +00:00 MAY be used for UTC?
G) that all dates except A (#8) MUST be in UTC (Z or +00:00), unless unknown?
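As a rough illustration of what points E, F, and G would ask of a feed validator, here is a hypothetical Python check (`timezone_problem` is an invented name). It only inspects the trailing timezone of each value: `Z` or a numeric `+HH:MM`/`-HH:MM` offset is accepted, and any date other than #8 must use one of the UTC forms, with `-00:00` allowed for the "unknown" case. Per RFC 3339, `-00:00` means the time is in UTC but the local offset it came from is unknown, which is why it still satisfies G.

```python
import re
from typing import Optional

# Trailing timezone: "Z" or a numeric offset such as "-06:00" (rule F).
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:\d{2})$")
# UTC forms allowed for objective dates (rule G); "-00:00" covers the unknown case.
_UTC_FORMS = {"Z", "+00:00", "-00:00"}


def timezone_problem(value: str, is_subjective: bool) -> Optional[str]:
    """Return a description of the first rule violated, or None if the value passes."""
    match = _TZ_SUFFIX.search(value)
    if match is None:
        return "missing or non-numeric timezone (E, F)"
    if not is_subjective and match.group(1) not in _UTC_FORMS:
        return "objective date not in UTC (G)"
    return None


print(timezone_problem("2004-07-13T08:51:00-06:00", is_subjective=True))
print(timezone_problem("2004-07-13T08:51:00-06:00", is_subjective=False))
```

A real validator would also need to check the date and time fields themselves. This sketch deliberately looks only at the timezone suffix those three points are about.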
I've only heard one indirect, and uncommon, argument against my having omitted the creation date from my original proposed list. Does anyone have a use case for why that needs to be in the feed?
What we don't agree on, or where consensus is unclear:
i) Can the issued/published date ever change?
ii) If it can, do we want to preserve a "first issued" date in the feed? (first #4)
iii) Do we want different Date Constructs for specifying major and minor changes, imperfect though the distinction may be? (last #2 and #3)
iv) If so, what if any guidelines do we want to provide for when to change each?
v) What assumptions do we want people to be able to make about omitted Date Constructs? (This will be easier to answer after we've decided on which Date Constructs are going to exist).
vi) What do we want to call each of the Date Constructs?
Finally, my answers to the above questions:
i) Yes, as long as we track the first issued date.
ii) Yes.
iii) Yes.
iv) A major update alters or non-trivially expands the scope of the message. A minor update does not, and includes such things as spelling corrections and rewording to clarify existing meaning.
v) If a major update date exists, but no minor update date exists, they are the same. (Also, if both exist, the minor update must not be earlier than the major update). If no update dates exist, but the objective issued date does, the update dates are the same as the issued date. If only the subjective date exists, no other assumptions are possible.
vi) My heart is not set on anything in particular, but here's what comes to mind: subjective date: issued or published; first issued date: first-issued or first-published (I can't think of an unambiguous single word); last major modification: modified; last minor modification: updated.
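Answer (v) above amounts to a small set of defaulting rules a feed reader could apply. The following Python sketch is one hypothetical way to write them down; `EntryDates` and `fill_defaults` are invented names, and the case where only a minor update date is present is not covered by the answer, so the sketch just leaves it untouched (an assumption).

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass
class EntryDates:
    subjective_issued: Optional[str] = None   # #8, kept exactly as the author wrote it
    first_issued: Optional[datetime] = None   # first #4
    major_update: Optional[datetime] = None   # last #2
    minor_update: Optional[datetime] = None   # last #3


def fill_defaults(d: EntryDates) -> EntryDates:
    """Apply the proposed assumptions about omitted Date Constructs (hypothetical)."""
    major, minor = d.major_update, d.minor_update
    if major is not None and minor is not None and minor < major:
        raise ValueError("minor update must not be earlier than major update")
    if major is None and minor is None and d.first_issued is not None:
        # No update dates: both default to the objective issued date.
        major = minor = d.first_issued
    elif major is not None and minor is None:
        # Major update only: the minor update date is the same.
        minor = major
    # Anything else (e.g. only the subjective date) permits no further assumptions.
    return replace(d, major_update=major, minor_update=minor)


print(fill_defaults(EntryDates(first_issued=datetime(2004, 7, 13, 15, 27))))
```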
This stuff sure can be fun for people like me who like to knuckle down and solve problems.






July 13th, 2004 at 9:31 am
Here's an interesting note: the entry was date-stamped at 8:51--that's the time I STARTED writing it. I didn't post it till 9:27--not even as a draft. So you might say (and I would say) that the date-stamp is wrong. I could change it, but under the circumstances, I'll let it sit as is.
July 14th, 2004 at 6:33 pm
CORRECTION: Sam Ruby pointed out to me today that -00:00 doesn't mean that the timezone is indeterminate, it means that the time is in UTC, but the timezone from which it originated is unknown.