Do feeds need to be able to carry data other than text or HTML?
by ZetaGecko | Atom/RSS
A proposal was added to the Atom wiki this morning to drastically simplify the types of content that can appear in an Atom feed. My first reaction was, "Whoa! Let's not hamstring the format!" But the more I think about it, the more I like it. The proposal suggests limiting content constructs, which could formerly carry virtually any content type, to just three: plain text, escaped (X)HTML, and inline XHTML. Base64-encoded content would be gone completely.
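To make the three remaining options concrete, here is a rough Python sketch of how a reader might treat each one. The `content` element, the `type` attribute and its values ("text", "html", "xhtml"), and the `describe` helper are all assumptions for illustration; the proposal's actual syntax may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element name and "type" values are assumed, not quoted
# from the proposal.
SAMPLES = {
    "text": '<content type="text">AT&amp;T &lt;b&gt; is not markup here</content>',
    "html": '<content type="html">AT&amp;amp;T is &lt;b&gt;bold&lt;/b&gt;</content>',
    "xhtml": (
        '<content type="xhtml">'
        '<div xmlns="http://www.w3.org/1999/xhtml">AT&amp;T is <b>bold</b></div>'
        "</content>"
    ),
}


def describe(xml_source):
    """Return (type, payload) for one content construct.

    The payload is what the XML parser hands back, before any rendering.
    """
    elem = ET.fromstring(xml_source)
    kind = elem.get("type")
    if kind == "text":
        # Literal characters: angle brackets are shown, never interpreted.
        return kind, elem.text
    if kind == "html":
        # Still HTML source: it needs a second pass through an HTML parser.
        return kind, elem.text
    if kind == "xhtml":
        # Markup arrived as real child elements; collect their text content.
        div = elem[0]
        return kind, "".join(div.itertext())
    raise ValueError(f"unsupported content type: {kind!r}")


if __name__ == "__main__":
    for name, source in SAMPLES.items():
        print(name, "->", describe(source)[1])
```

The point of the sketch is that a tool only has three code paths to get right, instead of one per media type.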
The rationale behind the proposal is that we've come up with a few good methods for linking to other types of data, so there's no longer a need to support any other types in content constructs. Simplifying content constructs would make tool developers' jobs a lot easier, and it does not appear to make any real-world, or even plausible, use of Atom more difficult.
I looked back through this blog at entries where I explored future uses for digests, and couldn't find a single one that would be more difficult if this proposal were adopted. That was when I became convinced. As long as we come up with a good solution for clearly defining which linked content is supposed to be rendered with the feed, and which is merely linked to by the feed for the benefit of those who might like to view it, I'm all for the change.
Sam Ruby raised the question on the mailing list of whether plain text is needed. I tried coming up with a case where it would be, and was surprised at first to fail, but eventually found an issue: whitespace. In XHTML, multiple spaces collapse down to a single space, but in plain text, they are preserved. Likewise, line breaks are preserved in plain text but not in XHTML. Aside from that, I think it's likely that tools would render some text incorrectly, creating the same kind of confusion as exists with RSS. If an inline XHTML content construct contained "&amp;copy;", but no namespaced element to define the &copy; entity, it should render as the literal text "&copy;". I suspect that many tools would render it as "©".
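Both concerns can be sketched in a few lines of Python. This is only an approximation under stated assumptions: `collapse_whitespace` stands in for what an XHTML renderer does to whitespace in normal flow, `parse_twice` models a hypothetical tool that decodes entities one time too many, and the `content` element and sample strings are made up for the example.

```python
import html
import re
import xml.etree.ElementTree as ET


def collapse_whitespace(text):
    """Roughly mimic default XHTML rendering: whitespace runs become one space."""
    return re.sub(r"\s+", " ", text)


def parse_once(xml_source):
    """Expected handling: the XML parser unescapes the content exactly once."""
    return ET.fromstring(xml_source).text


def parse_twice(xml_source):
    """Hypothetical buggy tool: applies a second round of entity decoding."""
    return html.unescape(parse_once(xml_source))


# Made-up samples: a space-aligned table and an escaped entity reference.
table = "name    qty\nwidget  3"
entry = "<content>&amp;copy; Example</content>"

if __name__ == "__main__":
    print(repr(table))                       # alignment and line break intact
    print(repr(collapse_whitespace(table)))  # alignment and line break lost
    print(parse_once(entry))                 # literal entity text
    print(parse_twice(entry))                # copyright sign
```

Plain text keeps the alignment that the XHTML-style collapse destroys, and the second decoding pass is what turns the literal entity text into a copyright sign.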