Why should RSS and Atom be XML?
by ZetaGecko | Atom/RSS
RSS and Atom are both XML-based formats. One complaint many Atom advocates have about RSS is that it has become commonly accepted for RSS feeds to be invalid XML, with RSS readers compensating for a variety of errors made by publishers. Not everyone in the Atom community advocates cracking down on invalid feeds, but promoting validity is certainly among the community's goals. A question that arises, partly out of the challenges of generating valid XML, is whether RSS (and Atom) feeds need to be XML at all. What benefit is there to using XML in digest formats? Is it worth the pain?
A major benefit of basing a file format on XML is that many tools already exist for processing XML. A developer writing a feed reader can simply pass the feed off to an XML parser and get it back all chopped up into its parts and ready for further handling. Of course, this only works if the feed is valid XML (or, strictly speaking, well-formed XML)--otherwise the XML parser reports an error and generally does not continue to process the document.
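Here is a minimal sketch of that hand-off, assuming Python's standard xml.etree.ElementTree module. The feed contents and the function name parse_items are made up for illustration and don't come from any real feed reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, for illustration only.
GOOD_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Hi there</title><link>http://www.xyz.com</link></item>
    <item><title>Second post</title><link>http://www.xyz.com/2</link></item>
  </channel>
</rss>"""

# Same idea, but the <title> element is never closed.
BAD_FEED = "<rss><channel><item><title>Hi there</item></channel></rss>"


def parse_items(feed_text):
    """Return (title, link) pairs, or None if the parser rejects the feed."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # The parser stops at the first error instead of guessing.
        return None
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


print(parse_items(GOOD_FEED))
print(parse_items(BAD_FEED))
```

The reader code never scans for angle brackets itself. It just asks the parsed tree for the pieces it wants, and the feed with the mismatched tag gets rejected outright rather than half-parsed.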
Another benefit of using XML versus inventing another format is that a lot of thought has gone into designing XML and a variety of related standards (XSLT, XPath, XLink, etc.). Many of the stumbling blocks to creating a format that works have been dealt with already. Unless there's a significant benefit to going a different route, there's no point reinventing the wheel.
I've seen a number of proposals for plain-text digest formats that look something like:
item
title=Hi there
link=http://www.xyz.com
summary=This is a website, blah, blah, blah
item
title=...
While the simplicity of such a format has its appeal, it also limits the format to doing simple things. For example, what if you want your summary to contain the following:
The XYZ digest format can contain a title that looks like this:
title=This is the summary
Everything between the equals sign and the end of the line is part of the title.
First, how do you put linebreaks in the summary? Second, how do you know that the "title=" inside the summary is not the start of a title? You can always encode the linebreaks as \n or something, but then you need to figure out how to encode \n too, and so on. In the end, you either have a complex format (so you may as well use XML instead) or you have to tell people they can't use linebreaks in field values.
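To make the ambiguity concrete, here is a rough sketch of the kind of naive parser such a format invites. The function name parse_plain_digest and the item title "About XYZ" are my own inventions for the example, and a real proposal might handle these lines differently.

```python
def parse_plain_digest(text):
    """Naively parse the 'item' / key=value format sketched above."""
    items = []
    for line in text.splitlines():
        if line == "item":
            items.append({})        # a bare "item" line starts a new item
        elif "=" in line and items:
            key, value = line.split("=", 1)
            items[-1][key] = value  # a repeated key overwrites the earlier one
        # Any other line is dropped: the format cannot continue a value.
    return items


# Hypothetical feed whose summary is the three-line text from the example.
FEED = """item
title=About XYZ
summary=The XYZ digest format can contain a title that looks like this:
title=This is the summary
Everything between the equals sign and the end of the line is part of the title."""

parsed = parse_plain_digest(FEED)
print(parsed[0]["title"])
print(parsed[0]["summary"])
```

With this input, the second line of the summary is taken for a new title and replaces the real one, and the third line of the summary is thrown away because it matches nothing the parser knows about.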
Another thing that makes the example text format simpler in one way and more complex in another is that each item doesn't have to be terminated by anything special. When you start a new item, you know that the last item has ended. While this makes generating a feed easier, it makes reading it more complex, because the parser has to understand that "item" starts a new item, and needs similar knowledge of any other hierarchical aspects of the format. In XML, the XML parser portion of the program only needs to know that <something> starts an element and </something> ends that element (or that <something /> both starts and ends the element). It doesn't have to know anything about each particular element in the particular format--just about XML. You could always add something like "/item" at the end of each item or other hierarchical element, but once again, that just gets you closer to XML. By the time you're done, you may find that you've added the rest of XML into your format to solve the little problems that keep cropping up.
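For comparison, here is a small sketch of the same troublesome summary carried in XML, again assuming Python's standard xml.etree.ElementTree module. The element names, the sample title, and the helper outline are illustrative choices of mine, not part of RSS or Atom.

```python
import xml.etree.ElementTree as ET

SUMMARY = ("The XYZ digest format can contain a title that looks like this:\n"
           "title=This is the summary\n"
           "Everything between the equals sign and the end of the line "
           "is part of the title.")

# Build one item; the library escapes special characters when serializing.
item = ET.Element("item")
ET.SubElement(item, "title").text = "About <XYZ> & friends"
ET.SubElement(item, "summary").text = SUMMARY
xml_text = ET.tostring(item, encoding="unicode")


def outline(element, depth=0):
    """List (depth, tag) pairs without knowing any tag names in advance."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


reparsed = ET.fromstring(xml_text)
print(outline(reparsed))
print(reparsed.findtext("summary") == SUMMARY)
```

The linebreaks and the embedded "title=" line survive the round trip because the end of the summary is marked by its closing tag, not by the end of a line. The outline function recovers the nesting without being told what an "item" is, which is the generic knowledge the plain-text parser lacked.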
XML is useful not just because it's buzzword-compliant, but because a lot of work has gone into designing it to solve real-world problems. It's not the only solution, but it's certainly a good choice. Using valid XML ensures that the benefits of all that design work, and of the work already done on tools for handling XML, are used to their full potential.