microformats2 parsing specification

From Microformats Wiki

microformats2 is a simple, open format for marking up data in HTML. The microformats2 parsing specification describes how to implement a microformats2 parser, independent of any specific vocabularies.

Status
This is a Living Specification with several interoperable implementations. This specification is stable, subject to editorial changes only for improving clarity of existing meaning. While substantive changes are unexpected, it is a living specification subject to substantive change by issues and errata filed in response to implementation experience, requiring consensus among participating implementers (since 2015-01-21) as part of an explicit change control process. There are currently no draft or proposed new features in this specification, and if any were to be added, they would be explicitly labeled as such.
Note: This specification is only marked as a "Draft Specification" because of pending edits from resolved issues before 2016-06-20. Once those edits have been completed, the link to [[Category:Draft Specifications]] at the bottom of this document should be changed to [[Category:Specifications]].
Participate
Open Issues
Resolved issues before 2016-06-20
IRC: #microformats on Libera
Editor
Tantek Çelik
License
Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of 2025-06-11, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0.

algorithm

parse a document for microformats

To parse a document for microformats, follow the HTML parsing rules and do the following:

  • start with an empty JSON "items" array and "rels" & "rel-urls" hashes:
{
 "items": [],
 "rels": {},
 "rel-urls": {}
}

Parsers may simultaneously parse the document for both class and rel microformats (e.g. in a single tree traversal).
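
As a minimal sketch of that starting point, the empty result can be built as a plain dictionary. The function name `new_result` is hypothetical and only used for illustration; the three keys are the ones listed above.

```python
def new_result() -> dict:
    """Return a fresh, empty parse result with the three top-level keys."""
    # A new list and new dicts are created on every call, so separate
    # parses never share state by accident.
    return {"items": [], "rels": {}, "rel-urls": {}}
```

A parser that handles class and rel microformats in one traversal could fill all three containers of the same dictionary as it walks the tree.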

parse an element for class microformats

To parse an element for class microformats:

  • parse element class for root class name(s) "h-*" and if none, backcompat root classes
    • if none found, parse child elements for microformats (depth first, doc order)
    • else if found, start parsing a new microformat
      • keep track of whether the root class name(s) was from backcompat
      • create a new { } structure with:
        • type: [array of unique microformat "h-*" type(s) on the element sorted alphabetically],
        • properties: { } - to be filled in when that element itself is parsed for microformats properties
        • if the element has a non-empty id attribute:
          • id: string value of element's id attribute
      • parse child elements (document order) by:
        • if parsing a backcompat root, parse child element class name(s) for backcompat properties
        • else parse a child element class for property class name(s) "p-*,u-*,dt-*,e-*"
        • if such class(es) are found, it is a property element
          • add properties found to current microformat's properties: { } structure
        • parse a child element for microformats (recurse)
          • if that child element itself has a microformat ("h-*" or backcompat roots) and is a property element, add it to the array of values for that property as a { } structure, and add to that { } structure:
            • value:
              • if it's a p-* property element, use the first p-name of the h-* child
              • else if it's an e-* property element, re-use its { } structure with existing value: inside.
              • else if it's a u-* property element and the h-* child has a u-url, use the first such u-url
              • else use the parsed property value per p-*,u-*,dt-* parsing respectively
          • else add found elements that are microformats to the "children" array
      • imply properties for the found microformat (see below)
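
The "value" selection for a nested microformat that is also a property element can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `embed_nested` and its parameters are hypothetical, the e-* result is assumed to be a dictionary that already carries its own "value" key, and a p-* element whose child has no name is assumed to fall through to the ordinarily parsed value.

```python
def embed_nested(prefix: str, child: dict, parsed) -> dict:
    """Build the { } structure stored in a property's array of values.

    prefix: "p", "u", "dt" or "e", taken from the property class name
    child:  the parsed { } structure of the nested microformat
    parsed: the result of ordinary p-*/u-*/dt-*/e-* parsing of the element
    """
    structure = dict(child)  # shallow copy, so the caller's dict is untouched
    props = child.get("properties", {})

    if prefix == "p" and props.get("name"):
        # p-*: use the first name of the nested microformat
        structure["value"] = props["name"][0]
    elif prefix == "e":
        # e-*: assumed to be a dict with its own "value" already inside,
        # so its keys are merged in unchanged
        structure.update(parsed)
    elif prefix == "u" and props.get("url"):
        # u-*: use the first url of the nested microformat
        structure["value"] = props["url"][0]
    else:
        # everything else: keep the ordinarily parsed property value
        structure["value"] = parsed
    return structure
```

For example, with a nested microformat whose properties are `{"name": ["Alice"], "url": ["https://example.com/"]}`, a p-* property element would get "Alice" as its value and a u-* property element would get "https://example.com/".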

The "*" for root (and property) class names consists of an optional vendor prefix (a series of one or more digits or lowercase a-z characters, i.e. [0-9a-z]+, followed by '-'), then one or more '-' separated lowercase a-z words.
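
One way to express that naming rule is as a regular expression. The sketch below is illustrative only: the names `ROOT_RE`, `PROPERTY_RE`, `root_types` and `property_names` are hypothetical, and splitting the class attribute with `str.split()` is a simplification of HTML's ASCII-whitespace splitting.

```python
import re

# Optional vendor prefix ([0-9a-z]+ followed by '-'), then one or more
# '-' separated lowercase a-z words.
_NAME = r"(?:[0-9a-z]+-)?[a-z]+(?:-[a-z]+)*"
ROOT_RE = re.compile(rf"h-{_NAME}")
PROPERTY_RE = re.compile(rf"(?:p|u|dt|e)-{_NAME}")


def root_types(class_attr: str) -> list[str]:
    """Return the unique "h-*" class names of an element, sorted alphabetically."""
    names = class_attr.split()
    return sorted({n for n in names if ROOT_RE.fullmatch(n)})


def property_names(class_attr: str) -> list[str]:
    """Return the "p-*", "u-*", "dt-*" and "e-*" class names in attribute order."""
    return [n for n in class_attr.split() if PROPERTY_RE.fullmatch(n)]
```

With these definitions, `root_types("vcard h-x-foo H-Card h-card")` keeps only `h-card` and `h-x-foo`, because matching is case-sensitive and `vcard` has no "h-" prefix.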

parse an element for properties

parsing a p- property

To parse an element for a p-x property value (whether explicit p-* or backcompat equivalent):

  • Parse the element for the value-class-pattern. If a value is found, return it.
  • If abbr.p-x[title] or link.p-x[title], return the title attribute.
  • else if data.p-x[value] or input.p-x[value], then return the value attribute
  • else if img.p-x[alt] or area.p-x[alt], then return the alt attribute
  • else return the textContent of the element after:
    • dropping any nested