Markin’ Up

Markup languages add a layer of data to plain text, such as formatting or a semantic description of the information. I might be understating things here, but there are a lot of markup languages. If you include all the variations on XML, you could probably count hundreds, if not thousands, of different markups used for different purposes.

As a technical writer, though, I don’t concern myself with the vast majority of markup languages; I’m primarily concerned with those used to craft documentation. A huge number of technical writers, myself included, spend much of the day looking at and editing documentation in a markup language, which is then processed to produce other, more consumer-friendly formats like web pages or printed documents. Even with that limitation, however, we’ve still got a lot of choices. Here are the most prominent I’ve run into in the course of my work:

  • HTML
  • Markdown
  • reStructuredText
  • TeX
  • Wikitext
  • XML

Unfortunately, there’s no one markup language for every application. If you’re working exclusively with the web, for example, HTML is the way to go. If guaranteed cross-platform consistency is a requirement, XML is perhaps the only choice. Others fall on a continuum of features, cross-platform support, and popularity. While some tools, like Pandoc, make it possible to move from one markup to another, the choice of a markup language is still a deeply difficult one, since other writers (including your future self) may come to hate your decision. Below I’ve tried to round up common markup languages for documentation and to provide an overview of each, to help guide you to a selection and to avoid some common pitfalls.
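To make the Pandoc point a little more concrete, here is a rough sketch of how a conversion might be scripted from Python. The function names and file names are my own inventions for illustration; the only real pieces are the pandoc command itself and its -f, -t, and -o options. Treat it as a starting point rather than a finished tool.

```python
import shutil
import subprocess


def pandoc_command(source, target, from_format, to_format):
    """Build the argument list for a Pandoc conversion (hypothetical helper)."""
    return ["pandoc", "-f", from_format, "-t", to_format, "-o", target, source]


def convert(source, target, from_format, to_format):
    """Run the conversion if Pandoc is installed; return True on success."""
    if shutil.which("pandoc") is None:
        return False  # Pandoc is not on the PATH, so there is nothing to run
    result = subprocess.run(pandoc_command(source, target, from_format, to_format))
    return result.returncode == 0


# The file names here are placeholders, not files that ship with anything.
print(pandoc_command("guide.md", "guide.rst", "markdown", "rst"))
```

Keeping the command construction separate from the call that runs it means you can inspect exactly what would be executed before handing your documentation over to it.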

HTML

HTML is the One True Format for the web. All web browsers use it to make the web happen. It’s hugely important. Because of its centrality to the web, it’s also the lingua franca of markup languages: for practically every markup language out there, there’s a tool which will convert it to HTML. Unfortunately, HTML is somewhat cumbersome to actually work in, partly because the syntax itself is a little on the heavy side, but also because there are varying ideas of how it should actually be used. To illustrate both points, consider that there’s more than one way to mark a bit of text for emphasis:

<!-- Both of these are (usually) bold -->
<b>Some text</b>
<strong>Some other text</strong>

<!-- Both of these are (usually) in italics -->
<i>Some text</i>
<em>Some other text</em>

None of this stops some people from writing excellent documentation in HTML. HTML is actually somewhat compelling for this purpose: practically anything you would care to do (code samples, footnotes, embedded media, etc.) can be done in HTML, but you pay for it in increased overhead when doing simple things, like basic formatting and hyperlinks. Though it’s an increasingly uncommon concern for technical writers, producing printed documentation may be difficult with HTML, since it wasn’t designed for such use.
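If a team does settle on one spelling of emphasis, the other can be cleaned up mechanically. The following is a minimal sketch, assuming you want the b and i tags rewritten as strong and em; the mapping and the function name are my own choices, and a regular expression like this is only suitable for simple, well-behaved markup, not arbitrary HTML.

```python
import re

# Assumed house style: presentational tags map to these semantic ones.
SEMANTIC = {"b": "strong", "i": "em"}

# Matches <b>, </b>, <i>, </i> (with optional attributes), but not <br> or <img>.
TAG = re.compile(r"<(/?)(b|i)(\s[^>]*)?>", re.IGNORECASE)


def normalize_emphasis(html):
    """Rewrite <b>/<i> tags, opening and closing, as <strong>/<em>."""
    def swap(match):
        slash = match.group(1)
        name = match.group(2).lower()
        attrs = match.group(3) or ""
        return f"<{slash}{SEMANTIC[name]}{attrs}>"
    return TAG.sub(swap, html)


print(normalize_emphasis("<b>Some text</b> and <i>Some text</i>"))
```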

Markdown

In contrast to HTML, Markdown is probably the most minimalist of markup languages used by writers. Markdown was created by John Gruber and Aaron Swartz. It’s quite nice to look at without even converting to HTML for use in a browser:

*emphasized*
**strong**
_or_ __like this__.

A [hyperlink][1]

<!-- example.com is a placeholder address -->
[1]: http://example.com/ "Looks like this."

As much as I like working in Markdown (many posts here are first written in Markdown and then converted for use with WordPress), Markdown suffers from the lack of an active maintainer. Thus there are several Markdown variants and implementations; while this or that system may claim to offer Markdown support, discovering which features are actually supported is oftentimes a matter of experimentation. Otherwise, it’s a flexible markup, with a wide range of tools to produce pretty, consumable output.
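To see why implementations drift apart, it helps to notice how little code it takes to handle the emphasis syntax shown earlier. This is a toy converter of my own, not any real Markdown implementation: it covers only the four emphasis markers and ignores everything else in the language.

```python
import re

# Order matters: handle the doubled markers before the single ones.
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"__(.+?)__"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"_(.+?)_"), r"<em>\1</em>"),
]


def toy_emphasis(text):
    """Convert Markdown-style emphasis markers to HTML tags (toy version)."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(toy_emphasis("_or_ __like this__."))
```

Feed it something like snake_case_name and it will happily italicize the middle word. Whether that is a bug or a feature is exactly the kind of question on which Markdown variants disagree.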

reStructuredText

reStructuredText, part of Python’s Docutils project, is similar to Markdown in that many common formatting tasks, like hyperlinks and emphasis, are easier to do than the equivalent HTML. reStructuredText provides a broader feature set, like sophisticated table formatting and shorthand substitutions, while still maintaining a simple syntax. A cursory look at reStructuredText doesn’t reveal its flaws, however. I use it on a daily basis, and I’ve found that it suffers from somewhat unusual design decisions. For example, you cannot nest inline formatting: a hyperlink cannot be made bold, and bold text cannot also be italicized. Despite such limitations, it’s a very powerful markup language for documentation, particularly when coupled with Sphinx, which provides tables of contents, multiple output formats, and more.

TeX

TeX (and its most popular variant, LaTeX) is a popular tool among academics, for its ability to handle complex mathematical formulae and for high-quality typesetting. Of all the markups listed here, it’s probably the most capable at certain difficult tasks, but at the expense of ease of use. While you can get (La)TeX to produce beautiful output just the way you like it, it may take a lot of effort to get there, particularly if you’re starting from scratch. (La)TeX particularly shines as an intermediary format for other tools going to print. Even if you’re not using it as a primary authoring markup, you may find it in your tool chain nonetheless.
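To give a flavor of the markup, here is a minimal LaTeX document that typesets the quadratic formula. The choice of formula is mine, purely for illustration.

```latex
\documentclass{article}
\begin{document}
The roots of $ax^2 + bx + c = 0$ are
\[
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\]
\end{document}
```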

Wikitext

Wikitext often refers to the markup used with MediaWiki (of Wikipedia fame), though other wiki software implements similar markup. Wikitext appears in the context of documentation since so much documentation is now composed collaboratively with wiki software. Unfortunately, different wiki software packages frequently use different, incompatible markups, making documentation much harder to reuse than if it were written in a markup common outside wikis.

XML

XML stands for eXtensible Markup Language, a name that reflects its core feature: a highly flexible but verifiable format. XML strove to be what HTML has become in many ways: a common markup used the world over. Unfortunately, its flexibility has, in practice, created many competing XML-derived toolkits, like DITA and DocBook. These toolkits are rather complex, making the markup aspect of XML somewhat limited; instead, special authoring tools are typically used to create content which is then stored using the markup. Such tools are the primary hurdle to using an XML-based format: they’re often proprietary and add (yet another) new piece of software to learn and write with.
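The simplest form of that verifiability can be seen with nothing more than Python’s standard library. This sketch only checks that a document is well-formed (every tag properly closed and nested); it does not validate against a DocBook or DITA schema, which takes additional tooling. The function name and the sample elements are my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The second sample never closes its <para> element.
print(is_well_formed("<note><para>Hi</para></note>"))
print(is_well_formed("<note><para>Hi</note>"))
```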
