How to Embed One XML Format in Another
by ZetaGecko | Atom/RSS, XML
I've got an idea I'd like to suggest, but don't feel inclined to champion right now. So I'm posting it here, and I'll either come back to it later or let someone else run with it.
First, a little background.
While working on the Atom feed format, one issue we ran into was how to embed XHTML content in an Atom feed. We defined 3 special values for indicating the type of content: text, html, and xhtml.
- "text" means that after unescaping any XML entities (e.g. &lt; for <), the resulting content should be treated as text. Thus, if the resulting content contained <img src="...">, that should NOT be treated as an HTML image tag -- it should be treated as text.
- "html" means that after unescaping, the resulting content should be treated as HTML code. In that case, for example, <img src="..."> would be treated as an HTML image tag.
- "xhtml" means that the content isn't escaped -- it's an XHTML fragment. "Fragment" means that it's not a complete XHTML document (with an html tag, head section, body tag, etc.), but a fragment from the body section of an XHTML document.
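To make the difference between the first two types concrete, here's a minimal sketch using Python's standard xml.etree.ElementTree. The describe() helper and the a.png file name are made up for illustration; a real feed reader would do a lot more than this.

```python
import xml.etree.ElementTree as ET

# The same escaped markup, labeled once as "text" and once as "html".
entry = ET.fromstring(
    '<entry>'
    '<content type="text">&lt;img src="a.png"&gt;</content>'
    '<content type="html">&lt;img src="a.png"&gt;</content>'
    '</entry>'
)

def describe(content):
    """Hypothetical helper: report how a consumer should treat the payload."""
    kind = content.get("type", "text")
    payload = content.text or ""  # the XML parser has already unescaped &lt; and &gt;
    if kind == "text":
        return ("literal text", payload)
    if kind == "html":
        return ("HTML markup to be parsed", payload)
    return ("XHTML child elements", None)

results = [describe(c) for c in entry]
```

Both payloads unescape to the same string. Only the type attribute tells the consumer whether to show the angle brackets or render an image.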
One difficulty with embedding XHTML had to do with namespaces. The XHTML tags either had to have a namespace prefix, or the default namespace had to be changed (assuming you'd set the default namespace to the Atom namespace, which is what's usually done). The question was where to put the namespace declaration.
You can't do this:
<content type="xhtml" xmlns="(XHTML's namespace)">XHTML content goes here</content>
...because the namespace declaration applies to the element it appears on, so that content tag would land in the XHTML namespace instead of Atom's. Instead, you'd have to do something like this:
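If you want to see the problem for yourself, here's a small sketch with Python's xml.etree.ElementTree. I've filled in the published Atom 1.0 and XHTML namespace URIs as the ATOM and XHTML constants; those names are mine, just for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

feed = ET.fromstring(
    f'<feed xmlns="{ATOM}">'
    f'<content type="xhtml" xmlns="{XHTML}">XHTML content goes here</content>'
    '</feed>'
)
content = feed[0]
# ElementTree spells a qualified name as "{namespace}localname".
content_tag = content.tag
```

Here content_tag comes out with the XHTML namespace, so a consumer looking for Atom's content element won't find it.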
<feed xmlns="(Atom's namespace)" xmlns:atom="(Atom's namespace)">
...
<atom:content type="xhtml" xmlns="(XHTML's namespace)">XHTML content goes here</atom:content>
That way, you can use the default namespace for most of your Atom tags, and the "atom:" prefix for any element on which you're changing the default namespace to XHTML's.
It works. But it's a little weird -- declaring the same namespace as the default and with a prefix.
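A quick check of the prefixed version, again as a sketch with Python's xml.etree.ElementTree. The ATOM and XHTML constants and the little <p>Hello</p> payload are my own stand-ins, not something from a real feed.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

feed = ET.fromstring(
    f'<feed xmlns="{ATOM}" xmlns:atom="{ATOM}">'
    f'<atom:content type="xhtml" xmlns="{XHTML}"><p>Hello</p></atom:content>'
    '</feed>'
)
content = feed[0]
# feed and content resolve to Atom; the unprefixed p picks up the new default.
tags = (feed.tag, content.tag, content[0].tag)
```

The prefix keeps the content element in Atom's namespace while its children fall into XHTML's.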
Here's another solution:
<content type="xhtml"><div xmlns="(XHTML's namespace)">...
This is what you'll actually see in most feeds. But the question (if we hadn't specified Atom the way we did) would be, is that "div" part of the content, or was it just tacked on in order to change the default namespace? Without going into all the odd things that might happen if a consuming application guesses wrong, I'll just say that we didn't want to leave any ambiguity there.
So I proposed that whenever the "xhtml" type was used, the entire content of the atom:content element be required to be surrounded with a div which was not to be considered part of the content.
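On the consuming side, that rule might look something like the sketch below. xhtml_payload() is a hypothetical helper of mine, not anything from the Atom spec or a library, and it only hands back the pieces inside the wrapper rather than serializing them.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

def xhtml_payload(content):
    """Hypothetical helper: return (leading text, child elements) from inside the wrapper div."""
    children = list(content)
    if len(children) != 1 or children[0].tag != f"{{{XHTML}}}div":
        raise ValueError("xhtml content must be wrapped in exactly one XHTML div")
    div = children[0]
    # The div itself is dropped; only what it contains is the content.
    return div.text, list(div)

content = ET.fromstring(
    f'<content type="xhtml" xmlns="{ATOM}">'
    f'<div xmlns="{XHTML}">Hello, <em>world</em>!</div>'
    '</content>'
)
text, elements = xhtml_payload(content)
```

The consumer never has to guess: the outermost div is always the wrapper, and everything inside it is the content.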
It was a fairly controversial suggestion. I don't particularly like it myself. But I think it was the best option we had.
Here's my suggestion
What I'm suggesting now, which would have solved the problem for us, is an addition to XML. XML has a few predefined attributes like xml:base, xml:lang, etc. The "xml" prefix does not have to be declared. What I'd propose is a new element named something like "xml:embed" to be used when embedding one XML format in another.
If we'd had xml:embed, we could have done this:
<content type="xhtml"><xml:embed xmlns="(XHTML's namespace)">XHTML content goes here</xml:embed></content>
Clean and unambiguous, even if you never read the Atom spec and discover that the required div isn't part of the content.
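For what it's worth, here's a sketch of how a consumer could recognize the wrapper, using Python's xml.etree.ElementTree. Keep in mind that xml:embed doesn't exist; it's only my proposal. The one real thing here is that the "xml" prefix is already bound to the XML namespace URI, which is why the parser accepts the element without a declaration. The XML_NS name and the sample div are mine.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # what the reserved "xml" prefix is bound to

content = ET.fromstring(
    f'<content type="xhtml" xmlns="{ATOM}">'
    # xml:embed is the hypothetical element proposed in this post.
    f'<xml:embed xmlns="{XHTML}"><div>Is this div content? Yes.</div></xml:embed>'
    '</content>'
)
wrapper = content[0]
is_embed = wrapper.tag == f"{{{XML_NS}}}embed"
# Everything inside the wrapper, including any div, is real content.
payload_tags = [child.tag for child in wrapper]
```

With a dedicated wrapper, a div inside it is unmistakably part of the content, and no format-specific rule is needed to tell the two apart.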