Two entries on big data lead this week's edition of DLTJ Thursday Threads. The first is at the grandest scale possible: a calculation of the amount of information in the world. Add up all the digital memory (in cell phones, computers, and other devices) and analog media (for instance, paper) and it comes to a very big number. The authors try to put it in perspective, which for me brought home how insignificant my line of work can be. (All of our information is still less than 1% of what is encoded in one human being's DNA?) The second "big data" entry describes an effort to make sense of huge amounts of data in the National Archives through the use of visualization tools. Rounding out this week is a warning to those who run public computers -- be on the lookout for key loggers that can be used to steal information from users.
If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my FriendFeed stream (or subscribe to its feed in your feed reader). Comments and tips, as always, are welcome.
How Much Information Is There in the World?
[caption id="attachment_2649_video" align="alignright" width="229" caption="Video with author of paper (4 minutes)"][/caption] So how much information is there in the world? How much has it grown?
Prepare for some big numbers:
- Looking at both digital memory and analog devices, the researchers calculate that humankind is able to store at least 295 exabytes of information. (Yes, that’s a number with 20 zeroes in it.)
Put another way, if a single star is a bit of information, that’s a galaxy of information for every person in the world. But it’s still less than 1 percent of the information stored in all the DNA molecules of a human being.
- 2002 could be considered the beginning of the digital age, the first year worldwide digital storage capacity overtook total analog capacity. As of 2007, almost 94 percent of our memory is in digital form.
- In 2007, humankind successfully sent 1.9 zettabytes of information through broadcast technology such as televisions and GPS. That’s equivalent to every person in the world reading 174 newspapers every day.
Feeling swamped in data? You probably don't have it too bad. Also see the video with the author of the paper (4 minutes, embedded above) and a podcast interview (12 minutes) with one of the authors that briefly describe some of the findings in the original paper (subscription to Science Magazine required).
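As a rough sanity check on the figures quoted above, here is a back-of-envelope sketch in Python. It is not the authors' method. It assumes decimal SI prefixes (1 exabyte = 10^18 bytes) and a 2007 world population of roughly 6.6 billion, neither of which comes from the paper; the function and constant names are made up for illustration. Under those assumptions, the stored total works out to a few hundred billion bits per person, which is in the range commonly estimated for the number of stars in our galaxy.

```python
# Back-of-envelope check of the quoted storage and broadcast figures.
# Assumptions (not from the paper): decimal SI prefixes and a 2007
# world population of about 6.6 billion people.

EXABYTE = 10**18    # bytes, decimal SI prefix
ZETTABYTE = 10**21  # bytes, decimal SI prefix
WORLD_POPULATION_2007 = 6_600_000_000  # assumed, approximate


def storage_bits_per_person(exabytes, population=WORLD_POPULATION_2007):
    """Stored capacity expressed as bits per person."""
    total_bits = exabytes * EXABYTE * 8
    return total_bits / population


def broadcast_bytes_per_person_per_day(zettabytes,
                                       population=WORLD_POPULATION_2007,
                                       days=365):
    """One year's broadcast volume expressed as bytes per person per day."""
    total_bytes = zettabytes * ZETTABYTE
    return total_bytes / population / days


if __name__ == "__main__":
    bits = storage_bits_per_person(295)
    per_day = broadcast_bytes_per_person_per_day(1.9)
    print(f"Stored bits per person: {bits:.3g}")
    print(f"Broadcast bytes per person per day: {per_day:.3g}")
```

Changing the population estimate shifts the results proportionally, so treat the output as an order-of-magnitude figure only.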
A Window on the Archives of the Future
In collaborating with NARA, members of TACC’s Data and Information Analysis group developed a multi-pronged approach to address technical challenges. The overall goal of their research is to investigate different data analysis methods within a visualization framework. The visualization interface is the bridge between the archivist and the analysis results, which are rendered visually onscreen as the archivists make selections and interact with the data. The results are presented as forms, colors and ranges of color to assist in synthesis and to facilitate an understanding of large-scale electronic records collections.
This article from the Texas Advanced Computing Center describes a research project to visualize the volumes of digital data in the National Archives. The visualization provides information about the amount of particular types of information, an assessment of the risks to files in the archive based on file type, and other metrics. A brief paper from the Society for Imaging Science and Technology "Archives" proceedings last year, Visualization for Archival Appraisal of Large Digital Collections [PDF], goes into more detail.
Hardware keyloggers discovered at public libraries
[caption id="attachment_2649" align="alignright" width="170" caption="USB Key Logger, courtesy of Sophos"][/caption] Public libraries in Manchester, England, have been advised to keep their eyes peeled for USB bugs after two devices were discovered monitoring every keystroke made by every user of affected PCs.
According to local media reports, the small surveillance devices were found attached to the keyboard sockets at the back of two PCs in Wilmslow and Handforth libraries.
Sophos, maker of internet security software, posted this notice about key-logging devices attached to public library computers in the U.K. Such a device makes it possible to capture usernames and passwords typed at the keyboard by patrons. The article goes on to suggest actions: conduct frequent checks of hardware, and plug keyboards into USB ports on the front of computers for easy visual inspection. [Via Jessamyn West]