Does the Google/Bing/Yahoo Schema.org Markup Promote Invalid HTML?

[Update on 10-Jun-2011: The answer to the question of the title is "not really" -- see the update at the bottom of this post and the comments for more information.]

Yesterday Google, Microsoft Bing, and Yahoo! announced a project to promote machine-readable markup for structured data on web pages.

Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.

The problem is, I think, that the markup they describe on their site generates invalid HTML. Did they really do this?

Google Search Engine Adds Support for RDFa, Or Do They?

Via a post and an interview on the O’Reilly Radar blog, Google announced limited support for parsing RDFa statements and microformat properties in web page HTML and using those statements to enhance search results with so-called “rich snippets”. In looking at the example review markup outlined in the O’Reilly post, though, I was struck by some unusual and unexpected markup. Specifically, the namespace was this http://rdf.data-vocabulary.org/ thing that I had never seen before, and the “rating” property didn’t have any corresponding range that would make that numeric value useful in a computational sense.

PHP Script for hCalendar to iCalendar Conversion

I try to do the “right thing” in postings on DLTJ. In the context of this discussion, “right” is an attempt to be progressive: including hCalendar microformat markup in postings that mention events. The latest example of this was yesterday’s posting about the Learning, Libraries and Technology Conference. Embedded in the first paragraph is markup that another application reading the DLTJ feed can use to understand that the posting is talking about an event. (The “Technorati Events” service is one example.) The key parts of the HTML are bolded below: