Linebreaking conventions
by ZetaGecko | Technology
As a side note to an issue that arose on the Atom syntax mailing list yesterday, I'd like to rant a little about the variety of linebreaking conventions in use today.
Operating systems and internet standards define a variety of ways to indicate a linebreak. UNIX uses a newline (a.k.a. linefeed) character. Windows uses a carriage return followed by a linefeed. MacOS 9 and lower used just a carriage return. MacOS X, now that it's UNIX-based, uses just a newline. Good--one step has been made toward unification. Now we just need to get Windows to drop the extra character and we're one big happy family, right? Nope.
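To make the three conventions concrete, here is a minimal Python sketch that folds any of them into a plain newline. The function name normalize_linebreaks and the sample strings are my own inventions for illustration, not part of any standard or library.

```python
def normalize_linebreaks(text: str) -> str:
    """Convert CR/LF pairs and lone carriage returns to a single newline."""
    # Handle the two-character CR/LF sequence first; otherwise the lone-CR
    # pass below would turn each CR/LF into two linebreaks.
    text = text.replace("\r\n", "\n")
    # Any carriage return still left is a lone CR (the classic MacOS style).
    return text.replace("\r", "\n")


# Hypothetical samples: the same two lines under each convention.
samples = {
    "unix": "one\ntwo\n",
    "windows": "one\r\ntwo\r\n",
    "classic_mac": "one\rtwo\r",
}
normalized = {name: normalize_linebreaks(s) for name, s in samples.items()}
```

The order of the two replacements is the only subtle part: swapping them would double every Windows linebreak.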
First of all, even if future versions of Windows do make the change (I can dream, but I don't expect Redmond to be too anxious to change; they seem more likely to try to make everyone else do things their way), a lot of people are still using Windows 95 and 98, not to mention ME, XP, etc. There are also plenty of people still using MacOS 9 and lower.
Second of all, non-newline linebreaks are part of at least one internet standard: SMTP. When emails are transferred between SMTP servers, linebreaks are coded with the Windows-style CR/LF.
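Going the other direction, text headed for an SMTP server needs every linebreak turned into CR/LF, whatever it started as. A rough sketch of that translation follows; to_crlf is a hypothetical helper of mine, and a real mail library would normally take care of this for you.

```python
import re

# Match CR/LF first so an existing pair is treated as one linebreak,
# then fall back to a lone CR or a lone LF.
_ANY_LINEBREAK = re.compile(r"\r\n|\r|\n")


def to_crlf(text: str) -> str:
    """Rewrite every linebreak in `text` as the CR/LF pair that SMTP uses."""
    return _ANY_LINEBREAK.sub("\r\n", text)


# Hypothetical message body with all three conventions mixed together.
mixed = "Subject: hi\n\nfirst line\rsecond line\r\nthird line"
wire = to_crlf(mixed)
```

Because the pattern is applied in a single pass, text that already uses CR/LF comes out unchanged.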
But isn't SMTP about due for replacement anyway (to get rid of spam)? Once again, I can dream, but it seems more likely that it will be updated with various anti-spam schemes and live on indefinitely. I suppose if the new additions are effective at throttling spam without making it unnecessarily expensive to do business online, then SMTP is fine with me (but that's a topic for another blog). Can we modify SMTP to just use newlines, though? We'd have to keep the ability to do it either way for backwards compatibility, which would require intermediate SMTP servers to sometimes translate between the two conventions. This is sounding very unlikely to happen.
Maybe the best that we can hope for is that as new standards are developed which operate at a level where they get to specify linebreaking conventions, they'll all use the same convention. My vote goes to using the newline character.
What does this have to do with Atom? Nothing. Discussion of how to treat whitespace in plain text content just got me thinking.