Interesting Shibboleth Use Case: Enforcing Geographic Restrictions

Last month’s HathiTrust newsletter had an interesting technical tidbit at the top about access to out-of-print and brittle or missing items:

One of the lawful uses of in-copyright works HathiTrust has been pursuing is to provide access on an institutional basis to works that fall under United States Copyright Law Section 108 conditions: works in HathiTrust that are not available on the market at a fair price, and for which print copies owned by HathiTrust member institutions are damaged, deteriorating, lost or stolen. As a part of becoming a member, institutions are required to submit information about their print holdings for fee calculation purposes. We have also been requesting information about the holdings status and condition of works, to facilitate uses of works where permissible by law (specifications for HathiTrust holdings data are available at http://www.hathitrust.org/print_holdings).

E-mail Phishing Attempts Get Trickier: Fake bounced mail and Fake mail-from-scanner

Two phishing1 attempts made it through the work spam filter earlier this month, and they show the creativity of bad guys as they try to get access to your machine. The attempts at social engineering were interesting enough I thought I’d describe them here. We’re getting pretty close the line where we can’t tell a legitimate e-mail from ones with nasty side effects.

The Fake Bounced Message


This message has the appearance of being a bounced e-mail from a server called ‘cyber.net.pk’.
Screenshot of a fake bounced e-mail message.

Screenshot of a fake bounced e-mail message.

W3C Incubator Group Report on Library Linked Data Published

This morning the World Wide Web Consortium (W3C) announced the publication of the final report of the Library Linked Data Incubator Group. The abstract is reproduced below.

The mission of the W3C Library Linked Data Incubator Group, chartered from May 2010 through August 2011, has been “to help increase global interoperability of library data on the Web, by bringing together people involved in Semantic Web activities — focusing on Linked Data — in the library community and beyond, building on existing initiatives, and identifying collaboration tracks for the future.” In Linked Data, data is expressed using standards such as Resource Description Framework (RDF), which specifies relationships between things, and Uniform Resource Identifiers (URIs, or “Web addresses”). This final report of the Incubator Group examines how Semantic Web standards and Linked Data principles can be used to make the valuable information assets that library create and curate — resources such as bibliographic data, authorities, and concept schemes — more visible and re-usable outside of their original library context on the wider Web.

IETF May Form Working Group on “Reputation Services”

Last week I saw a post on the IETF Announcement List seeking feedback on the possible formation of a “Reputation Services” working group. That posting has more information, but the basic abstract is posted below. Now I will admit up front that I tend to see the world through librarian-colored glasses, but creating a mechanism that helps uses make a “meaningful choice about the handling of content requires an assessment of its safety or ‘trustworthiness'” sounds like something librarians should be involved with.

Fixing a Bad SSH authorized_keys under Amazon EC2

I was doing some maintenance on the Amazon EC2 instance that underpins DLTJ and in the process managed to mess up the .ssh/authorized_keys file. (Specifically, I changed the permissions so it was group- and world-readable, which causes `sshd` to not allow users to log in using those private keys.) Unfortunately, there is only one user on this server, so effectively I just locked myself out of the box.

$ ssh -i .ssh/EC2-dltj.pem me@dltj.org
Identity added: .ssh/EC2-dltj.pem (.ssh/EC2-dltj.pem)
Permission denied (publickey).

After browsing the Amazon support forums I managed to puzzle this one out. Since I didn’t see this exact solution written up anywhere, I’m posting it here hoping that someone else will find it useful. And since you are reading this, you know that they worked.

Call for Public Comment — W3C Library Linked Data Incubator Group

The W3C Library Linked Data (LLD) Incubator Group invites librarians, publishers, linked data researchers, and other interested parties to review and comment on drafts of reports to be published later this year. The LLD group has been chartered from May 2010 through August 2011 to prepare a series of reports on the existing and potential use of Linked Data technology for publishing library data. The group is currently preparing:

  • A report describing Benefits of LLD, an Overview of Existing Vocabularies and Data Sets, Relevant Technologies, Implementation Challenges, and Recommendations
  • A survey report of use cases describing existing projects

PPTP VPN for iOS with AT&T Uverse and DD-WRT

Wandering into public or semi-public wireless networks makes me nervous because I know how my network traffic can be easily watched, and because I’m a geek with control issues I’m even more nervous when using devices that I can’t get to the insides of (like phones and tablets). One way to tamp down my concerns is to use a Virtual Private Network (VPN) to tunnel the device’s network connection through the public wireless network to a trusted end-point, but most of those options require a subscription to a VPN service or a VPN installed in a corporate network. I thought about using one of the open source VPN implementations with an Amazon EC2 instance, but it isn’t possible with the EC2 network configuration judging from the comments on the Amazon Web Services support forums. (Besides, installing one of the open source VPN software implementations looks far from turnkey.) Just before I lost hope, though, I saw a reference to using the open source DD-WRT consumer router firmware to do this. After plugging away at it for an hour or so, I made it work with my home router, a AT&T U-verse internet connection, and iOS devices. It wasn’t easy, so I’m documenting the steps here in case I need to set this up again.

Does the Google/Bing/Yahoo Schema.org Markup Promote Invalid HTML?

[Update on 10-Jun-2011: The answer to the question of the title is “not really” — see the update at the bottom of this post and the comments for more information.]

Yesterday Google, Microsoft Bing, and Yahoo! announced a project to promote machine-readable markup for structured data on web pages.

Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.

The problem is, I think, that the markup they describe on there site generates invalid HTML. Did they really do this?

“The Challenges of User Consent” — Handling Shibboleth User Attributes

One of the great things about the Shibboleth inter-institution single sign-on software package is the ability for the Identity Provider to limit how much a Service Provider knows about a user’s request for service. (Not familiar with those capitalized terms? Read on for definitions.) But with this capability comes great flexibility, and with the flexibility can come lots of management overhead. So I was intrigued to see the announcement for an online webinar from the InCommon Shibboleth Federation with the title “The Challenges of User Consent” covering the issues of managing who gets access to what information about users.

Recordings from Code4Lib Virtual Lightning Talks Available

Thanks to everyone for participating in the first Code4Lib Virtual Lightning Talks on Friday. In particular, my gratitude goes out to Ed Corrado, Luciano Ramalho, Michael Appleby, and Jay Luker being the first presenters to try this scheme for connecting library technologists. My apologies also to those who couldn’t connect, in particular to Elias Tzoc Caniz who had signed up but found himself locked out by a simultaneous user count in the presentation system. Recordings of the presentation audio and screen capture video are now up in the Internet Archive.

Name Topic
Edward M. Corrado CodaBox: Using E-Prints for a small scale personal repository