
Google and DataNet: Two Ships Passing in the Night, or Maybe Something More?

Wired Magazine’s blog network says “Google to Host Terabytes of Open-Source Science Data” while the National Science Foundation (NSF) is reviewing submissions to the DataNet solicitation “to catalyze the development of a system of science and engineering data collections that is open, extensible and evolvable.” On the surface, you might think they are working on the same project, but there is more here than meets the eye (or, rather, the ear listening to these two sound-bites).

Disclosure: OhioLINK is a named party in a submission by The Ohio State University to the NSF DataNet solicitation. We’re looking forward to a positive reception to our proposal in the first round of DataNet reviews.

As with most things Google, the real nuts and bolts of their strategy are unknown until Google chooses to unveil them. This much seems to be known: under the moniker of “Google Research” the company will make large datasets available to the world for free. According to the Wired article, “two planned datasets are all 120 terabytes of Hubble Space Telescope data and the images from the Archimedes Palimpsest, the 10th century manuscript that inspired the Google dataset storage project.” Sources at Google told Wired that the Research site will offer YouTube-style annotation features and data visualization technology purchased from Gapminder last year. Part of the plan also includes shipping and loaning large disk packs so the data doesn’t have to flow across the internet. The presumed home of Google Research is http://research.google.com/. At this point, that URL describes contributions by Google staff to the research community, but I’m guessing that will change when the new service is made public.

On the other hand, the NSF DataNet solicitation envisions a new type of organization that “will integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise to: provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data…; anticipate and adapt to changes in technologies and in user needs and expectations; [perform R/D in] computer and information science and cyberinfrastructure…; and serve as component elements of an interoperable data preservation and access network.” More than a service, DataNet seeks a model of organization that brings varied expertise together on the issues surrounding data curation. By way of comparison, it would seem like NSF thinks of this as a people challenge while Google Research thinks of it as a technology platform challenge.

A technology platform is certainly part of what DataNet needs, but not all of it. As one of the commenters on the Wired article noted, “masses of data are of course completely useless without extensive meta-data describing provenance.” Still, given the cyberinfrastructure that Google can bring to bear on the problem of large-scale data archiving and the dataset visualization technology that they now have in house, Google could be a big part of a potential solution. One wonders about the viability of creating a response to the DataNet solicitation that, in effect, outsources the cyberinfrastructure piece to Google and focuses on building the sustainable organization model surrounding the description and dissemination of the data.

Anybody working on that?

4 Comments

  1. Stuart Weibel | January 24, 2008 at 7:14 pm

    The speculation at the end of the post:

    One wonders about the viability of creating a
    response to the DataNet solicitation that, in effect, outsources
    the cyberinfrastructure piece to Google and focuses on building the
    sustainable organization model surrounding the description and
    dissemination of the data.

    raises the question of why
    Google would enter into such an arrangement? Their model — bring
    eyes into proximity to Google-ads — may be an answer, but by and
    large they aren’t known as an organization eager to surrender
    policy or strategy choices to others, which is pretty much what
    would be implied in one of the ‘new organizations’ that NSF calls
    out as a central objective of the DataNet solicitation. Interesting
    idea, though. stu, who is also a DataNet supplicant.

  2. Stuart Weibel | January 24, 2008 at 7:17 pm

    hoping it’s obvious that I’ve misused the “quote selected text”
    option… the jester’s words, not mine, are quoted from the above
    post

  3. the jester | January 24, 2008 at 9:41 pm

    Stu –

    First, I patched the problem that attributed the quote to you; it is apparently a known bug in the Quoter plug-in to WordPress. The attribution line is gone, at least, until a permanent fix is made.

    As to your question of whether Google would enter into a ‘new organization’ arrangement with partners: that is a real unknown in this scheme. Google has not been very transparent in many of its motives beyond its stated mission to “organize the world’s information and make it universally accessible and useful.” Long-time readers of DLTJ know that I’m aware that commercial motives are not in perfect alignment with not-for-profit (particularly higher education) motives. Still, if we are not successful in the first round of DataNet partner selection, I would propose that we approach Google about a partnership that would pair their high-tech cyberinfrastructure with a high-touch cadre of specialists who would help researchers effectively get their stuff into the Google Research buckets. Or, said another way, the high-touch portions of the partnership would bring better data and metadata to the Google Research site.

    It would mean, of course, that the working relationship would extend to the parts of the DataNet solicitation that call for interoperability with other DataNet sites, and that might mean opening up programming interfaces into Google Research that they had not intended to pursue. It would also likely mean that Google would need to add data preservation to their suite of services surrounding the Google Research project. From the outside, it is difficult to tell whether Google Research is already considering the issues of preservation, or if they just consider it an “access” platform.

    If anyone inside Google is reading this and finds it interesting, feel free to give me a call. My contacts inside your organization are severely limited to non-existent.

  4. simonfj | March 12, 2008 at 4:03 pm

    Peter, (Stu),

    I floated into your (very nice) domain while trying to track down the Blue Ribbon task force’s first page and also checked out your comments elsewhere.

    A question re: “the new organization”. I know everyone should have a blog today. The idea I guess is so that you have the pleasure of splitting every conversation, about similar things at different times, between hundreds of domains. As librarians I suppose you’re ensuring the growth of stuff which will need classifying (with metadata of course).

    Can you tell me though. Taking that everyone in your library world considers the term ‘data’ to be interchangeable with ‘information’, and that communication between peers must reflect their institution’s standing i.e. each must pump out the same information, like this press release, http://www.oclc.org/news/releases/200692.htm, which litters about 20 domains (so far). Do you think there may come a time where you and your peers might collaborate to produce and share an interactive environment?

    I was just considering this embryonic interactive TV station. Like most others, it hits the wall as soon as they try to figure out how communities can be categorized in a (tool-centric) domain. http://www.scivee.tv/about

    Any chance we could reverse the approach = classify a community first (maybe using a bibliographic number) and then see about backing some tools into their domain.


From the Disruptive Library Technology Jester (http://dltj.org/), printed on Friday the 25th of July 2008 at 8:17:27 AM EDT (-0400). The URL to this page is http://dltj.org/article/google-and-datanet/

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.