An Introduction to Microformats


1. Introduction

Microformats 1 are a set of open data conventions built upon existing and widely adopted standards for embedding semantics for specific problem domains in human-readable XHTML documents, RSS/Atom feeds, or anywhere XHTML can be embedded.

Put simply, microformats provide easy conventions for marking up data on the Web to convey semantics.

For example, when a traditional web page contains information about an event such as a business meeting, the markup used conveys little more than the formatting of the text describing the event. The markup fails to describe what the text means. While a human reading the page can recognise and process the semantics of the event, for a machine any semantics are lost.

However, by applying the microformat conventions for describing events (1.3.1. hCalendar) and people (1.3.2. hCard) to the event, information such as the start time, venue, and the individuals attending can be unambiguously identified by both human and machine.

1.1. Microformats and the Lowercase Semantic Web

Microformats constitute what has been dubbed the lowercase semantic web, as opposed to the ongoing Semantic Web project led by the Web's creator, Tim Berners-Lee. 2

The lowercase semantic web is defined 3 as adding simple semantics through microformats, as evolutionary rather than revolutionary (i.e. seeking to add semantics to today's Web rather than waiting for the Semantic Web to fulfil its promise), and as designed for humans first, and machines second.

1.2. Elemental and Compound Microformats

Microformats can be divided into two distinct types: elemental and compound.

Elemental microformats represent a minimal solution to a problem domain, built using standard XHTML elements.

Compound microformats are an XHTML rendition of a common data type, built from other elemental microformats and selected XHTML elements. Compound microformats are often a 1:1 mapping of an existing standardised schema that describes a compound data type (1.3.1. hCalendar, 1.3.2. hCard).

1.3. Microformats in Practice

1.3.1. hCalendar

hCalendar 4 is a 1:1 mapping into semantic XHTML of the iCalendar 5 format, an open and widely adopted IETF specification for calendar data exchange.

The iCalendar to hCalendar mapping is relatively straightforward, and is based on the following rules:

  • iCalendar object/property names are converted to lower-case XHTML class names, and nested iCalendar objects are mapped directly to nested XHTML
  • URL in iCalendar becomes <a class="url" href="...">...</a> inside the XHTML element with class="vevent"
  • ATTENDEE, CONTACT, and ORGANIZER in iCalendar can be represented by the addition of a hCard microformat
  • A named LOCATION (optionally with an address and/or geo) in iCalendar can be represented by a nested hCard. An address LOCATION can be represented by an adr 6 microformat, while a geo (latitude and longitude) LOCATION can be represented by a geo 7 microformat
  • UID in iCalendar becomes another semantic applied to a specific URL

For singular properties (e.g. N and FN from vCard), the first descendant element with that class takes effect, with any others being ignored. For plural properties (e.g. TEL from vCard), each class instance should create an instance of that property.

Plural properties with subtypes (e.g. TEL with WORK, HOME, CELL from vCard) can be optimised to share a common element for the property itself, with each instance of subtype being an appropriately classed descendant of the property element.
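
The singular and plural rules above can be sketched in Python. This is a minimal illustration rather than a conforming parser: the split of properties into SINGULAR and PLURAL is a hand-picked assumption, the class name PropertyCollector is hypothetical, the input is assumed to be well-formed XHTML, and subtypes are not handled.

```python
from html.parser import HTMLParser

# Assumption for this sketch: a small, hand-picked split of vCard properties.
SINGULAR = {"fn", "n"}
PLURAL = {"tel", "email"}


class PropertyCollector(HTMLParser):
    """Collects the text of elements whose class names a known property."""

    def __init__(self):
        super().__init__()
        self.properties = {}    # name -> str (singular) or list of str (plural)
        self._claimed = set()   # singular properties already matched
        self._open = []         # one (names, text_parts) entry per open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        names = []
        for name in classes:
            if name in PLURAL:
                names.append(name)
            elif name in SINGULAR and name not in self._claimed:
                # Only the first element in document order takes effect.
                self._claimed.add(name)
                names.append(name)
        self._open.append((names, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        names, parts = self._open.pop()
        value = "".join(parts).strip()
        for name in names:
            if name in SINGULAR:
                self.properties[name] = value
            else:
                # Each instance of a plural property creates a new value.
                self.properties.setdefault(name, []).append(value)
```

Feeding the collector a fragment with two fn elements and two tel elements leaves a single fn value (the first) and a list of two tel values.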

When an XHTML <abbr> element is used for a property, the title attribute of the element is the value of the property. The contents of the element are instead used to provide a human presentable version of the value. <abbr> elements are used for the DTSTART, DTEND, DURATION, RDATE, RRULE iCalendar properties.
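
As a rough Python sketch of the <abbr> rule, the helper below returns the title attribute when the property sits on an <abbr> element and falls back to the element's text otherwise. The names ValueExtractor and property_value are hypothetical, and well-formed XHTML is assumed.

```python
from html.parser import HTMLParser


class ValueExtractor(HTMLParser):
    """Finds the first element with a given class and reads its value."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.value = None   # machine-readable value, once found
        self._depth = 0     # > 0 while inside the matching element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1
        elif self.value is None and self.prop in (attrs.get("class") or "").split():
            if tag == "abbr" and attrs.get("title"):
                # The title attribute carries the value; the visible
                # text is only the human-presentable version.
                self.value = attrs["title"]
            else:
                self._depth = 1   # fall back to the element's text

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.value = "".join(self._text).strip()


def property_value(xhtml, prop):
    """Returns the value of the first `prop` element, or None if absent."""
    parser = ValueExtractor(prop)
    parser.feed(xhtml)
    parser.close()
    return parser.value
```

With this sketch, an element such as <abbr class="dtstart" title="20050901">1 September 2005</abbr> yields the value 20050901, while a reader of the page sees the friendlier date.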

A sample mapping illustrates the process:

BEGIN:VCALENDAR
PRODID:-//XYZproduct//EN
VERSION:2.0
BEGIN:VEVENT
URL:http://www.comp.dit.ie/k268/
DTSTART:20050901
SUMMARY:Course code changed from K268 to DT249
LOCATION:DIT Kevin Street, School of Computing
END:VEVENT
END:VCALENDAR

<span class="vcalendar">
  <span class="vevent">
    <a class="url" href="http://www.comp.dit.ie/k268/">
      <span class="dtstart">20050901</span>
      <span class="summary">Course code changed from K268 to DT249</span>: 
      <span class="location">DIT Kevin Street, School of Computing</span>
    </a>
  </span>
</span>
					

iCalendar PRODID is not required: when converting from hCalendar back to iCalendar, the transforming engine should add its own product ID. iCalendar VERSION information is likewise not required. The surrounding element with class="vcalendar" is optional, as the context of vcalendar is implied when vevent is encountered; the implied scope is that of the document.
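
The reverse transformation, from hCalendar back to iCalendar, might be sketched in Python as follows. This is an illustration under several assumptions: only the four properties used in the sample are handled, each element carries at most one property, the input is well-formed XHTML, and the text escaping and line folding that the iCalendar specification calls for are omitted. The names VEventParser and to_icalendar, and the default product ID, are hypothetical.

```python
from html.parser import HTMLParser

PROPS = ("url", "dtstart", "summary", "location")  # subset used in the sample


class VEventParser(HTMLParser):
    """Extracts a handful of hCalendar properties from one event."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._stack = []   # one (property name or None, text parts) per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        name = next((p for p in PROPS if p in classes and p not in self.event), None)
        if name == "url" and attrs.get("href"):
            # URL takes its value from the href, not from the link text.
            self.event[name], name = attrs["href"], None
        elif name and tag == "abbr" and attrs.get("title"):
            # <abbr> properties take their value from the title attribute.
            self.event[name], name = attrs["title"], None
        self._stack.append((name, []))

    def handle_data(self, data):
        for _, parts in self._stack:
            parts.append(data)

    def handle_endtag(self, tag):
        if self._stack:
            name, parts = self._stack.pop()
            if name:
                # Collapse runs of whitespace in the element's text.
                self.event.setdefault(name, " ".join("".join(parts).split()))


def to_icalendar(xhtml, prodid="-//Example hCalendar sketch//EN"):
    """Builds iCalendar text; the converter supplies its own PRODID."""
    parser = VEventParser()
    parser.feed(xhtml)
    parser.close()
    lines = ["BEGIN:VCALENDAR", f"PRODID:{prodid}", "VERSION:2.0", "BEGIN:VEVENT"]
    lines += [f"{p.upper()}:{parser.event[p]}" for p in PROPS if p in parser.event]
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"
```

Note that PRODID and VERSION come from the converter rather than from the markup, in line with the rule that neither needs to appear in the hCalendar itself.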

1.3.2. hCard

hCard 8 is a 1:1 mapping into semantic XHTML of the vCard 9 format, an open and widely adopted IETF specification for the representation of people, companies, and organisations.

vCard to hCard mapping is based on the following rules:

  • vCard object/property names are converted to lower-case XHTML class names, and nested vCard objects are mapped directly to nested XHTML
  • URL in vCard becomes <a class="url" href="...">...</a> inside the XHTML element with class="vcard"
  • EMAIL in vCard becomes <a class="email" href="mailto:...">...</a>
  • PHOTO in vCard becomes <img class="photo" src="..." alt="Photo of ..." /> or <object class="photo" data="..." type="...">Photo of ...</object>
  • UID in vCard becomes another semantic applied to a specific URL

Singular and plural properties, and use of the XHTML <abbr> element, are handled in the same manner as with hCalendar (1.3.1. hCalendar). Other mapping caveats and optimisations (e.g. N, FN) are detailed in the official specification.

A sample mapping illustrates the process:

BEGIN:VCARD
VERSION:3.0
N:Lawless;Derek
FN:Derek Lawless
EMAIL:hi@dereklawless.net
URL:http://dereklawless.net/
ORG:TransGloboCorp
END:VCARD

<div class="vcard">
  <a class="url fn" href="http://dereklawless.net/">Derek Lawless</a>
  <a class="email" href="mailto:hi@dereklawless.net">hi@dereklawless.net</a>
  <div class="org">TransGloboCorp</div>
</div>
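
A comparable Python sketch for reading an hCard back into vCard is given below. It is illustrative only: it handles just the properties in the sample, assumes well-formed XHTML, and derives N from a two-word FN, which is a simplified reading of the N optimisation described in the hCard specification. The names HCardParser and to_vcard are hypothetical.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Reads url, email, fn, and org from a single hCard."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []   # one (text-valued property names, text parts) per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href") or ""
        if "url" in classes and href:
            self.card.setdefault("url", href)
        if "email" in classes and href.startswith("mailto:"):
            self.card.setdefault("email", href[len("mailto:"):])
        # One element may carry several properties, e.g. class="url fn".
        names = [c for c in classes if c in ("fn", "org")]
        self._stack.append((names, []))

    def handle_data(self, data):
        for _, parts in self._stack:
            parts.append(data)

    def handle_endtag(self, tag):
        if self._stack:
            names, parts = self._stack.pop()
            for name in names:
                self.card.setdefault(name, " ".join("".join(parts).split()))


def to_vcard(xhtml):
    """Builds vCard 3.0 text from the properties found in the markup."""
    parser = HCardParser()
    parser.feed(xhtml)
    parser.close()
    card = parser.card
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    words = card.get("fn", "").split()
    if len(words) == 2:
        # Simplified assumption: "Given Family" becomes N:Family;Given.
        lines.append(f"N:{words[1]};{words[0]}")
    for name in ("fn", "email", "url", "org"):
        if name in card:
            lines.append(f"{name.upper()}:{card[name]}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"
```

The anchor with class="url fn" contributes two properties at once: URL from its href attribute and FN from its text.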
					

1.4. Advantages

Built on Existing Standards

By leveraging XHTML, microformats can be transparently introduced wherever XHTML can be parsed. Further, by codifying schemas from existing IETF specifications such as iCalendar and vCard as compound microformats, data sharing and transformation are made much simpler.

Designed for Humans First

Microformats are designed for humans first, and machines second. One of the principles of microformats is that they must not only be parseable; they must also be presentable. They are intended to be easy to write and decipher.

Because of this, the format for both human and machine consumption is the same, and microformats are embedded directly into content. This also reduces the redundancy caused by having one resource for humans and another for machines (such as a separate RSS news feed).

Modular and Embeddable

Microformats are designed with modularity in mind. By avoiding unnecessary dependencies on external resources (such as the URI of the hosting page), they can be easily embedded in any XHTML content.

It is also easy to compose microformats using other microformats, creating new compound microformats in the process.

Implicit Metadata

Implicit metadata can be added to content by virtue of using the correct microformat convention. As microformats are machine-readable, search engines and other applications can conceivably use this metadata to infer semantics, or to establish relationships and connections between individuals and content.

Plug and Play JavaScript

Because microformats leverage XHTML, they can be targeted and manipulated using JavaScript and the DOM. An application of this can be seen in the Mozilla Firefox extension Tails Export, which supports the detection and export of microformats found on a web page.

1.5. Disadvantages

Dependent on XHTML

A microformat is an inherently dependent rather than standalone format, in that it uses XHTML. As such, in order to utilise a microformat, XHTML must also be supported in the host document.

Verbose

As microformats are implemented using XHTML, they inherit the verbosity common to XML-based languages. Of more concern is that by using the same format for human and machine consumption, services such as the syndication of Atom or RSS feeds may become much more bandwidth-intensive 10.

Harvestable

As microformats are open conventions that seek to apply semantic value to XHTML content, the risk of codified content being successfully harvested for less than desirable purposes is increased. The abuse of hCard for spamming purposes is of particular concern.

1.6. Summary

In this article a definition of microformats has been provided and the issues of conveying semantics on the Web that microformats seek to address have been highlighted.

The microformat hierarchy has been discussed, along with further examination of the two compound microformats of specific interest to this project, hCalendar and hCard, and how they provide semantic XHTML representations of the IETF iCalendar and vCard specifications.

Finally, some of the advantages and disadvantages of microformats have been examined.

References

  1. Microformats.org
  2. Semantic Web
  3. Celik T., Real World Semantics
  4. hCalendar
  5. Internet Engineering Task Force, RFC 2445: Internet Calendaring and Scheduling Core Object Specification
  6. adr microformat
  7. geo microformat
  8. hCard
  9. Internet Engineering Task Force, RFC 2426: vCard MIME Directory Profile
  10. Pilgrim M., Syndication is not publication

Colophon

Last updated November 12, 2007. Creative Commons licensed.