Does the Google/Bing/Yahoo Schema.org Markup Promote Invalid HTML?

[Update on 10-Jun-2011: The answer to the question of the title is “not really” – see the update at the bottom of this post and the comments for more information.]

Yesterday Google, Microsoft Bing, and Yahoo! announced a project to promote machine-readable markup for structured data on web pages.

Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.

- schema.org - Home

The problem, I think, is that the markup they describe on their site produces invalid HTML. Did they really do this?

Take this example from the Event description page:

< !DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Test</title>
</head>
<body>
<div itemscope itemtype="http://schema.org/Event">
  <a itemprop="url" href="nba-miami-philidelphia-game3.html">
  NBA Eastern Conference First Round Playoff Tickets:
  Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)
  </a>

  <time itemprop="startDate" datetime="2011-04-21T20:00">
    Thu, 04/21/11
    8:00 p.m.
  </time>

  <div itemprop="location" itemscope itemtype="http://schema.org/Place">
    <a itemprop="url" href="wells-fargo-center.html">
    Wells Fargo Center
    </a>
    <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
      <span itemprop="addressLocality">Philadelphia</span>,
      <span itemprop="addressRegion">PA</span>
    </div>
  </div>

  <div itemprop="offers" itemscope itemtype="http://schema.org/AggregateOffer">
    Priced from: <span itemprop="lowPrice">$35</span>
    <span itemprop="offerCount">1,938</span> tickets left
  </div>
</div>
</body>
</html>

The problem is in the first <div> line: the ‘itemscope’ attribute has no value associated with it. If you copy and paste that markup into the W3C validator (using the “Validate by Direct Input” option and manually removing the space between the less-than sign and the exclamation point in the first line), it comes back with:

Line 7, Column 16: required character (found i) (expected =)

A bare attribute may be valid in some forms of HTML, but it certainly isn’t valid XML, and I think that will cause all sorts of problems down the line. Does anyone else see this as an issue?

Update

I heard back from one of the keepers of W3C’s validator: the xmlns="http://www.w3.org/1999/xhtml" attribute of the html tag was triggering the XML version of the validator. The bare itemscope attribute is valid HTML but invalid XML, which matters for the XML serialization of HTML. It can be fixed by writing it as itemscope="itemscope". See the comments for more information.
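
For anyone who wants to reproduce the XML side of this without the validator, here is a minimal sketch using the XML parser in Python's standard library. The two one-line fragments and the helper name `is_well_formed_xml` are my own simplifications for illustration, not part of the schema.org example, and this only checks XML well-formedness, not HTML validity.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the first <div> of the Event example (assumed, not
# the full schema.org markup).
BARE = '<div itemscope itemtype="http://schema.org/Event">NBA tickets</div>'
XML_SAFE = '<div itemscope="itemscope" itemtype="http://schema.org/Event">NBA tickets</div>'


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An attribute name that is not followed by "=" lands here.
        return False


print(is_well_formed_xml(BARE))      # bare attribute: rejected by the XML parser
print(is_well_formed_xml(XML_SAFE))  # attribute with a value: accepted
```

The first call fails for the same reason the validator complained: an XML parser expects an equals sign after every attribute name. An HTML parser accepts both spellings, so the longer form is the one to use if the same markup has to survive both.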