Does the Google/Bing/Yahoo Markup Promote Invalid HTML?

[Update on 10-Jun-2011: The answer to the question of the title is “not really” — see the update at the bottom of this post and the comments for more information.]

Yesterday Google, Microsoft Bing, and Yahoo! announced a project to promote machine-readable markup for structured data on web pages.

Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.

The problem, I think, is that the markup they describe on their site generates invalid HTML. Did they really do this?
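For readers who have not seen it, the markup in question is schema.org's microdata vocabulary, which relies on the itemscope, itemtype, and itemprop attributes. As a minimal sketch of what that markup looks like and how a consumer might pull it out of a page, here is a short Python example; the sample HTML fragment is my own illustration, not taken from the announcement.

```python
# A minimal sketch: pick out schema.org microdata attributes
# (itemscope / itemtype / itemprop) from a sample HTML fragment.
# The fragment below is illustrative only.
from html.parser import HTMLParser

SAMPLE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="director">James Cameron</span>
</div>
"""

class MicrodataSniffer(HTMLParser):
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs or "itemprop" in attrs:
            print(tag, {k: v for k, v in attrs.items()
                        if k in ("itemscope", "itemtype", "itemprop")})

MicrodataSniffer().feed(SAMPLE)
```

Those three attributes are exactly the ones a pre-HTML5 validator will complain about, which is what prompted the question in the title.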

“The Challenges of User Consent” — Handling Shibboleth User Attributes

One of the great things about the Shibboleth inter-institution single sign-on software package is the ability for the Identity Provider to limit how much a Service Provider knows about a user’s request for service. (Not familiar with those capitalized terms? Read on for definitions.) But with this capability comes great flexibility, and with that flexibility can come a lot of management overhead. So I was intrigued to see the announcement of a webinar from the InCommon Shibboleth Federation titled “The Challenges of User Consent,” covering the issues of managing who gets access to what information about users.

Recordings from Code4Lib Virtual Lightning Talks Available

Thanks to everyone for participating in the first Code4Lib Virtual Lightning Talks on Friday. In particular, my gratitude goes out to Ed Corrado, Luciano Ramalho, Michael Appleby, and Jay Luker for being the first presenters to try this scheme for connecting library technologists. My apologies also to those who couldn’t connect, in particular to Elias Tzoc Caniz, who had signed up but found himself locked out by the simultaneous user limit in the presentation system. Recordings of the presentation audio and screen capture video are now up in the Internet Archive.

Name: Edward M. Corrado
Topic: CodaBox: Using E-Prints for a small scale personal repository

What To Do With ISO 2709:2008?

My employer recently became a member of NISO and I was made the primary representative. This is my first formal interaction with the standards organization hierarchy (NISO → ANSI → ISO), and as one of the side effects I’m being asked to provide advice to NISO on how its vote should be cast on relevant ISO ballots. Much of it has been pretty routine so far, but today one jumped out at me — the systematic review for the standard ISO 2709:2008, otherwise blandly known as Information and documentation — Format for information exchange. You might know it as the underlying structure of MARC. (Though, to describe it accurately, MARC is a subset or profile of ISO 2709.) And the voting options are: Confirm (as is), Revise/Amend, Withdraw (the standard), or Abstain (from the vote).

Iron Mountain to Close its Virtual File Store Service

About two years ago I wrote a blog post wondering if we could outsource the preservation of digital bits. What prompted that blog post was an announcement from Iron Mountain of a Cloud-Based File Archiving service. Since then, a number of other services more attuned to the needs of cultural heritage communities have sprung up (DuraCloud and Chronopolis come to mind), but I have wondered whether the commercial sector had a way to do this cheaply and efficiently. The answer to that question is “maybe not,” as Iron Mountain has told Gartner Group (PDF archive) that it is closing its Virtual File Store service and its Archive Service Platform.

IPv4 Address Space Disappearing, Here Comes IPv6

Last week in DLTJ Thursday Threads I posted an entry about running out of IP addresses. Since I posted that, I’ve run across a couple of other stories and websites that bring a little more context to the consequences of last week’s distribution of the last blocks of IP addresses from the world-wide pool of available addresses. The short version: channel any panic you might be feeling into making sure your systems are ready to communicate using both the existing network standard (IPv4) and the new network standard (IPv6).
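As a quick way to test one half of that readiness (whether a given host publishes both IPv4 and IPv6 addresses), a small sketch along these lines will do; the hostname is only a placeholder.

```python
# A quick sketch: check whether a hostname resolves to both IPv4 (A)
# and IPv6 (AAAA) addresses, as a rough dual-stack readiness test.
# The hostname below is only an example.
import socket

def address_families(host):
    families = {"IPv4": set(), "IPv6": set()}
    for family, _, _, _, sockaddr in socket.getaddrinfo(host, None):
        if family == socket.AF_INET:
            families["IPv4"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            families["IPv6"].add(sockaddr[0])
    return families

if __name__ == "__main__":
    for proto, addrs in address_families("www.example.org").items():
        status = ", ".join(sorted(addrs)) if addrs else "none found"
        print(f"{proto}: {status}")
```

If the IPv6 line comes back empty for your own services, that is the gap to start closing now.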

The Imagined Frequently Asked Questions

New Web Expectations and Mobile Web Techniques

Late last year I was asked to put together a 20-minute presentation for my employer (LYRASIS) on what I saw as upcoming technology milestones that could impact member libraries. It was a good piece, so I thought I’d share what I learned with others as well. The discussion was in two parts — general web technologies/expectations and mobile applications/web.

Slight Tweak to WordPress Broken Link Checker Plugin

In a futile effort to fight link rot on DLTJ, I installed the Broken Link Checker plugin by “White Shadow”. I like the way it scans the entire content of this blog — posts, pages, comments, etc. — looking for pages linked from here that don’t respond with an HTTP 200 “Ok” status code. The dashboard of problem links has a nice interface for updating or deleting these links, including the ability to add a CSS style to deleted links to note that they were formerly there. One of the things I wished it did, though, was to add a message to posts/pages that noted a link was changed or deleted. You know — just to document that something changed since the page was first published. Tonight I hacked into the code to add this function. And with apologies to the original author of this beautifully structured object-oriented PHP code, it is a gruesome hack.
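The plugin’s own PHP does the real work; purely as a sketch of the underlying check it performs (request each linked URL and flag anything that does not answer with HTTP 200), the idea looks roughly like this in Python, with the URL list standing in for links extracted from a post.

```python
# A rough sketch of the core link-checking idea: request each URL
# found in a post and flag anything that does not answer with
# HTTP 200.  The URLs below are placeholders.
import urllib.request
import urllib.error

def check_links(urls, timeout=10):
    broken = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status != 200:
                    broken.append((url, resp.status))
        except urllib.error.URLError as exc:
            broken.append((url, str(exc)))
    return broken

for url, reason in check_links(["https://example.org/",
                                "https://example.org/missing"]):
    print(f"Broken link: {url} ({reason})")
```

My hack adds a step after detection (writing a note back into the affected post), which the sketch above leaves out.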

The PERL Way to Add OmniFocus Inbox Entries from Twitter

Over the weekend I got the bright idea of asking OmniGroup to ask an iPhone voice recognition application (like Dragon Dictation) to add a link to the OmniFocus iPhone application. That way I could simply dictate new inbox items on the iPhone rather than laboriously typing them with the on-screen keyboard. Before making the suggestion, I searched the OmniFocus User Forum for “voice recognition” to see if anyone else had suggested the same thing. As it turns out, there were a few posts that had instructions from people using Twitter as an intermediary. Unfortunately, they either required a desktop Twitter client to be running all of the time or used the now deprecated BasicAuth-based Twitter authentication scheme. So I created my own.
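The script described in the post is written in Perl; just to illustrate the shape of the approach (OAuth-signed polling of a Twitter timeline rather than the deprecated BasicAuth scheme, with each new tweet turned into an inbox entry), here is a rough Python sketch. The endpoint URL, the credential placeholders, and the hand-off to OmniFocus are all assumptions, not the post’s actual code.

```python
# Illustration only: poll a Twitter timeline with OAuth 1.0a
# credentials (instead of the deprecated BasicAuth) and hand each
# new item off to a to-do inbox.  Endpoint, credentials, and the
# hand-off step are placeholders; the real script is Perl.
import requests
from requests_oauthlib import OAuth1

API_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"  # assumed endpoint
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

def fetch_new_items(since_id=None):
    params = {"count": 20}
    if since_id:
        params["since_id"] = since_id
    resp = requests.get(API_URL, auth=auth, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

def add_to_inbox(text):
    # Placeholder: the real script creates an OmniFocus inbox entry here.
    print("Would add to inbox:", text)

for tweet in fetch_new_items():
    add_to_inbox(tweet["text"])
```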

Using Twitter For Service Outage Awareness

Emily Clasper of the Suffolk County Library posted about some work she had done to embed status messages in the catalog using Twitter. This sounded like a really great idea because it is an out-of-band way (i.e., something that doesn’t rely on OhioLINK infrastructure for reporting downtimes) to get messages to member staff and users. But I didn’t get a chance to work on my own implementation for a while, so for over a year ideas bubbled around in my head about ways to apply this technique and improve on it. I finally carved out some spare time to actually work on it, and came up with my take on the concept. The result is the OhioLINK Status-Via-Twitter service.

A demo of the TwitterJS implementation using a copy of the OhioLINK homepage.
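The demo page handles the display side in the browser with TwitterJS; the posting side of the concept (an out-of-band check that pushes a notice to a status account when a service stops answering) can be sketched roughly like this in Python. The URLs, credentials, and message are placeholders, not the actual OhioLINK implementation.

```python
# Concept sketch only: a monitoring check that posts a notice to a
# status Twitter account when a service stops responding, so pages
# embedding that account's feed show the outage out-of-band.
# All URLs and credentials below are placeholders.
import requests
from requests_oauthlib import OAuth1

SERVICE_URL = "https://example.org/catalog"  # service being watched (placeholder)
UPDATE_URL = "https://api.twitter.com/1.1/statuses/update.json"  # assumed endpoint
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

def service_is_up():
    try:
        return requests.get(SERVICE_URL, timeout=10).status_code == 200
    except requests.RequestException:
        return False

if not service_is_up():
    notice = "The catalog is not responding; staff have been notified."
    requests.post(UPDATE_URL, auth=auth, data={"status": notice}, timeout=10)
```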