Stop Comparing JSON and XML — No, Really

Stop comparing XML and JSON. No, really. You are comparing two things built for different purposes. By saying unequivocally that JSON is ‘better’ than XML or vice versa, you are doing the equivalent of saying that a fork is unequivocally better than a spoon. Yes, you can sometimes use one to do the job of the other. But I wouldn’t want to try to eat soup with a fork.

What is the difference? Let’s ask Tim Bray, the ‘father of XML’, who is also now responsible for codifying the JSON specification into RFCs for the IETF. He says:

If it looks like a document, use XML. If it looks like an object, use JSON. It’s that simple.
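To make Bray’s rule concrete, here is a small Python sketch that reads the same object from JSON and from XML using only the standard library. The record and its field names are invented for illustration. JSON maps straight onto the language’s own dicts, lists, strings, and numbers; the XML version needs hand-written mapping code, including turning the number back from text into an integer.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record, serialized both ways.
as_json = '{"id": 42, "name": "Ada", "tags": ["math", "engines"]}'
as_xml = (
    "<record><id>42</id><name>Ada</name>"
    "<tags><tag>math</tag><tag>engines</tag></tags></record>"
)

# JSON: one call yields native dicts, lists, strings, and ints.
from_json = json.loads(as_json)

# XML: every value arrives as text, so structure and types are rebuilt by hand.
root = ET.fromstring(as_xml)
from_xml = {
    "id": int(root.findtext("id")),
    "name": root.findtext("name"),
    "tags": [tag.text for tag in root.find("tags")],
}

print(from_json == from_xml)  # True
```

None of this says XML is bad; it only shows that object-shaped data is the case JSON was made for.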

Now, there are some things which cross the ‘simple’ boundary established here. Syndication feeds, for instance, are collections of objects (articles) containing documents (the content of the articles). RSS and Atom were both born in the pre-JSON era, when the difference between documents and objects was mostly ignored in popular serialization theory anyway, so they decided that the ‘document’ part was more important than the ‘collection of objects’ part. And there are some other contemporary XML-based technologies, like SVG, which really should not have gone down the XML road either. (SVG shows this especially clearly: its designers had to invent entirely separate mini-languages and syntaxes, such as the path data carried in the d attribute, and stuff them inside attribute values.)
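As a rough illustration of the SVG point: an XML parser will happily hand you a path’s `d` attribute, but only as an opaque string, and making sense of it takes a second, SVG-specific parser. The mini-parser below is a deliberately simplified, hypothetical sketch that handles only the `M`, `L`, and `Z` commands with plain decimal numbers; the real path grammar has many more commands and syntax rules.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
svg = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M10 10 L90 10 L50 80 Z"/></svg>'

# Step 1: the XML parser. Its job ends at the attribute boundary.
root = ET.fromstring(svg)
d = root.find(SVG_NS + "path").get("d")  # just the string "M10 10 L90 10 L50 80 Z"

# Step 2: a second, hypothetical mini-parser for the language inside the attribute.
# Simplified: only M/L/Z commands and numbers like 10, -3 or 2.5.
TOKEN = re.compile(r"[MLZ]|-?\d+(?:\.\d+)?")

def parse_path(d):
    """Group path tokens into (command, [numbers]) pairs."""
    commands = []
    for tok in TOKEN.findall(d):
        if tok.isalpha():
            commands.append((tok, []))
        elif not commands:
            raise ValueError("path data must start with a command")
        else:
            commands[-1][1].append(float(tok))
    return commands

print(parse_path(d))
# [('M', [10.0, 10.0]), ('L', [90.0, 10.0]), ('L', [50.0, 80.0]), ('Z', [])]
```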

For reliably telling the difference between documents and objects, let me introduce Naggum’s rule of thumb (originally formulated to decide when to use attributes vs elements in schema design, but equally applicable to the question of whether a ‘markup language’ is what you actually need in the first place). Erik Naggum said:

Even attributes is a good idea when the textual element contents is the “real meat” of the document and attributes only aid processing, so that the printed version of a fully marked-up document has the same characters as the document sans tags.

Does it make sense to print out the thing you’re serializing as text characters? Would the characters on the page be essentially identical to the characters in the marked-up document file stripped of markup? If so, you should probably use XML. If not, JSON may be the way to go.
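One way to try Naggum’s test by hand is to strip the markup and look at what is left. This Python sketch uses the standard library’s `ElementTree`; the two sample snippets are invented for illustration. Document-like markup leaves readable prose behind, while object-like markup leaves a run of values with no meaning of their own.

```python
import xml.etree.ElementTree as ET

def strip_markup(xml_text: str) -> str:
    """Return only the character data of an XML string, with all tags removed."""
    root = ET.fromstring(xml_text)
    # itertext() yields every text and tail segment in document order.
    return "".join(root.itertext())

# Document-like: stripping the tags leaves readable prose.
doc = "<p>Use <em>XML</em> when it looks like a <strong>document</strong>.</p>"

# Object-like: stripping the tags leaves an unreadable jumble of values.
obj = "<person><name>Ada</name><born>1815</born><langs>en</langs></person>"

print(strip_markup(doc))  # Use XML when it looks like a document.
print(strip_markup(obj))  # Ada1815en
```

This is a prompt for judgment rather than an automatic classifier: the question is still whether the leftover characters are what you would want on the printed page.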

Two more things

  1. Now, I fully grant that XML is not itself a great technology. I think a lot of the effort that went into XML would have been much better spent developing better, simpler tools to process SGML. However, XML is better to use than SGML for the simple reason that the tools used to manipulate XML are still generally being maintained; by contrast, the maintainers of OpenJade/OpenSP say that the current release is probably the last.

    XML has a wealth of excellent tools like libxml2 and Saxon which are built for processing document data.

    JSON also has implementations in every major language, which is not (yet) true of some of its arguably superior competitors like MessagePack, CBOR, and Protocol Buffers. Nonetheless, I suggest keeping an eye on those formats, since switching to one of them could improve performance or rigour.

  2. Lastly, if you’re still reading this, I’d recommend going to check out Sean B. Palmer’s essay on Technobunkum, which explains why I wasted my time writing this page.