
January 1, 2003

No More Gloom and Doom

Okay, no more gloom and doom today. That kind of stuff is so last year. Today? Back to tech.

A lot of people out there seem to think the Web should be entirely built on XML. They seem to think that simply saying “This is a heading, this is a paragraph, this is a list” doesn’t go far enough to describe the oh-so-important stuff they’re writing. After all, doesn’t every weblog need to be machine readable? They think the Semantic Web should be mandatory and widespread, rather than a niche corner. And they’re very vocal about their beliefs and, of course, are very disapproving of anyone who disagrees. Well, Mark Pilgrim shot them down Monday, as only he can.

I’m not familiar with all the arguments, but saying that everything on the Web should be semantically marked-up XML is just wrong. Some things should be, of course. And it’s great that XML is catching on and that machines are going to be able to use it to send data back and forth with little hassle. But if you look at the majority of what’s out there on the Web, it doesn’t need to be semantic! It’s all designed for one purpose: reading and comprehension by humans. All the browser needs to know is “this is a heading, this is a paragraph.” It doesn’t go beyond that. HTML and CSS are all you need to give humans the visual and contextual cues they need to figure out the semantics for themselves.

That’s not to say semantics don’t have their place. Any weblog or CMS or other suitably complex system should have semantics built into it on the backend, whether as a database or a series of XML files or something similar. But those semantics don’t need to carry over to the Web. There has to be some kind of transformation from a semantically rich backend to a simply structured HTML doc. The places where semantics are needed are on other channels, like the Web Services front, and that interface has to be separate from the human-readable side of things. There’s just no reason to send semantically rich XML to a web browser. People aren’t going to read it, and the browser isn’t going to do anything with it.
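To make that backend-to-HTML transformation a little more concrete, here is a rough sketch in Python. The element names (entry, title, author, posted, body, para) and the sample content are made up for illustration; they are not any particular weblog tool’s format. The point is only that the author and timestamp are distinct fields on the backend, and by the time the browser sees them they are just a heading and some paragraphs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical backend record. The element names and values are
# invented for this example, not taken from a real weblog system.
ENTRY_XML = """
<entry>
  <title>Sample Entry</title>
  <author>Example Author</author>
  <posted>2003-01-01</posted>
  <body>
    <para>First paragraph of the post.</para>
    <para>Second paragraph of the post.</para>
  </body>
</entry>
"""


def entry_to_html(xml_text: str) -> str:
    """Flatten a semantically rich entry into plain headings and paragraphs."""
    entry = ET.fromstring(xml_text)
    title = entry.findtext("title", default="")
    author = entry.findtext("author", default="")
    posted = entry.findtext("posted", default="")

    # The backend knows which string is the author and which is the date.
    # The browser only ever gets a heading and paragraphs.
    parts = [f"<h2>{escape(title)}</h2>"]
    parts.append(f"<p>Posted by {escape(author)} on {escape(posted)}</p>")
    for para in entry.findall("body/para"):
        parts.append(f"<p>{escape(para.text or '')}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(entry_to_html(ENTRY_XML))
```

The same record could just as easily come out of a database row; XML is used here only because it keeps the example self-contained.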

If you think your weblog needs to be machine readable, that’s when you use RSS. If you want other computers to be able to mine your data, you provide some kind of Web Services interface. What you don’t want to do is replace XHTML with XML in places where a human is going to be reading the document. There’s just no point.
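For the machine-readable side, a feed can be generated from the same backend data without touching the pages people read. The following is a minimal sketch that builds an RSS 2.0-style document with Python’s standard library. The build_rss helper, the dictionary keys, and the example.com URLs are assumptions for illustration, and a real feed would usually carry more fields (dates, GUIDs, and so on).

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, items):
    """Build a bare-bones RSS 2.0-style feed and return it as a string.

    `items` is a list of dicts with hypothetical keys:
    "title", "link", and "summary".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description

    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["summary"]

    # encoding="unicode" makes tostring() return str instead of bytes.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    feed = build_rss(
        "Example Weblog",
        "http://example.com/",
        "A placeholder weblog used for this sketch.",
        [
            {
                "title": "Sample Entry",
                "link": "http://example.com/2003/01/01/sample",
                "summary": "First paragraph of the post.",
            }
        ],
    )
    print(feed)
```

The human-readable page and the feed are two separate outputs of one backend, which is the separation argued for above.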

(Yes, I realize XHTML is really XML. And yes, I realize XHTML has its own built-in semantics: heading, paragraph, list. I’m not saying those are pointless. But those are very simple, and in being simple, they serve their purpose very well. A browser doesn’t need to know what’s in the paragraph, whether it be an author’s name, a timestamp, or a recipe. That’s the type of data XML is good at describing, but it’s all irrelevant to a web browser.)