Your content's author

I read content on the Web daily, and I find it important to know who wrote a specific article and when it was published.

Many blogs and even news sites don’t get this right. This is especially obvious when using Instapaper or similar apps that extract content from the site, removing it from its usual context, where you could hunt for that “about” link if you had the time.

This is why I wanted to make authorship explicit on every post on my site by putting this byline directly after the title of an article:

By Mislav Marohnić on 17 January 2012

This makes it obvious to people, but what about machines? What is the proper markup for authorship information?

My goal was to enable Instapaper, Readability and Google to pick up this information. Here’s how to achieve that.

Markup formats

Instapaper’s text parser seems to support several markup formats for authorship information, although this was difficult to test reliably because of its caching strategy. While it might support the Person microdata schema and the <meta name=author> tag, it definitely supports the rel=author attribute:

  By <a href="/" rel=author lang=hr>Mislav Marohnić</a>
  on <time pubdate datetime="...">17 Jan 2012</time>

The <time pubdate> element holds machine-readable information about the publish date.
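For comparison, here is a rough sketch of the two alternatives mentioned above, the <meta name=author> tag and Person microdata. It assumes the schema.org vocabulary for the Person type, which is my choice for illustration. As noted, I could not confirm that Instapaper actually reads either of these, so treat it as an untested possibility rather than a recommendation:

```html
<!-- In the document head: a page-wide author declaration -->
<meta name="author" content="Mislav Marohnić">

<!-- In the byline: Person microdata around the author name.
     The schema.org itemtype is an assumption made for this sketch. -->
<p>
  By <span itemscope itemtype="http://schema.org/Person">
    <a href="/" itemprop="url"><span itemprop="name" lang="hr">Mislav Marohnić</span></a>
  </span>
  on <time pubdate datetime="...">17 Jan 2012</time>
</p>
```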

Readability advises marking up your content with hNews, which basically boils down to combining the hAtom and hCard microformats:

<article class=hentry>
  <p class="author vcard">
    By <a href="/" class=fn lang=hr>Mislav Marohnić</a>
    on <time pubdate class=published datetime="...">17 Jan 2012</time>
  </p>
  ...
</article>

Notice the class=published addition to <time>. This is part of hAtom.

Readability will also extract and display the article summary if it finds one marked up with hNews (“entry-summary”) or Article microdata (“description”).
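A minimal sketch of both ways to mark up a summary might look like this. The summary text is a placeholder, and the schema.org itemtype for Article is an assumption on my part; only the “entry-summary” class and the “description” property come from the paragraph above:

```html
<!-- hNews / hAtom: the summary carries the entry-summary class -->
<article class="hentry">
  <h1 class="entry-title">Your content's author</h1>
  <p class="entry-summary">Placeholder summary of the article.</p>
  <div class="entry-content">...</div>
</article>

<!-- Article microdata: the summary carries itemprop="description" -->
<article itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name">Your content's author</h1>
  <p itemprop="description">Placeholder summary of the article.</p>
</article>
```

You would normally pick one of the two rather than ship both on the same page.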

Google instructs you to link your content to your Google+ profile. A nice side effect is a well-formatted snippet in Google search results that includes your name and profile picture.
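As a sketch, such a link could reuse the rel=author attribute from the byline. PROFILE_ID is a placeholder, not a real profile, and the exact form of link Google expects is an assumption here, so check Google’s own instructions before relying on it:

```html
<!-- PROFILE_ID is a placeholder for your own Google+ profile ID -->
By <a href="https://plus.google.com/PROFILE_ID" rel="author">Mislav Marohnić</a>
```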

All of the above can be tested with Google’s rich snippets test tool, which recognizes many different microdata/RDFa schemas and microformats.

Ignoring duplicate content

Now that I’ve fed authorship and publish date information to the Instapaper and Readability text parsers, I don’t want them to show this byline twice (once in their own UI, and once in my content). Fortunately, both provide a way to mark certain elements to be ignored by their text parsers.

Instapaper respects the “instapaper_ignore” class name, while Readability defines the “entry-unrelated” extension to hAtom:

<article class=hentry>
  <p class="instapaper_ignore entry-unrelated">
    By <a href="/" lang=hr>Mislav Marohnić</a>
  </p>
  ...
</article>

I doubt that other clients respect these class names, but that’s fine, since we can’t be sure that other clients will present the author and date in their own UI. For those that don’t, such as Safari’s Reader feature, it’s best to keep the byline as part of your article content.