Issue 104: Long Term Digital Storage
This week's Thursday Threads looks at digital storage from the past and the future. There are articles about the mechanics of massive data storage systems in tech giants like Google and Amazon, the continued use of floppy disks in certain industries, and the herculean efforts of digital archivists to access stored data from outdated media.
This week:
- Hard drives seem indestructible, especially compared to previous forms of storage. So we went all in on digitizing and converting and storing on hard drives. But what if the hard drives start failing?
- You've been tasked with storing data. You don't know what the data is or how important it is, but you have to give it back when asked. When your goal is outliving the heat death of the universe.
- It is the rare company that reaches the size of Google, Amazon, or Facebook. These companies have a lot of data, and they want to make sure it is findable and usable anywhere in the company. What distributed storage looks like.
- When was the last time you used a floppy disk? There are some industries that still use them every day.
- Archives everywhere have stacks of old floppy disks. Read about the techniques that archivists use to recover what is on them.
- Your job is to store data that outlasts your career. What medium do you use? How do you distribute it? How to think about century-scale storage.
- This Week I Learned: In Ethiopia, time follows the sun like nowhere else.
- Obligatory Cat Photo: Alan and Mittens squabble in the cat tree
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page. If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Hard Drives Go Bad
This article focuses on the music industry, but its story is applicable across all fields. Music production once used multi-track analog tape (where splicing was done with physical cuts and tape); when the process was done, the analog tape went into storage. Alarms went up in the field about media deterioration and a lot of effort was made to digitize the source materials. Those digitized artifacts were stored on hard drives, and everyone assumed they were now safe. But preservation of digital media is an active process — one can't assume that the disks will spin and that the software to read the files still runs.
When your goal is outliving the heat death of the universe
The idea that struck me in this article is that a service provider like Amazon can't distinguish between what is important and what is not: if a customer asks Amazon to store something, it will do its best to make sure it is retrievable. How much storage is in use (multiple copies on multiple drives in multiple servers and multiple locations) for files that have zero value?
Distributed Storage Systems
This 13-year-old article explores the massive data storage systems used by major tech companies like Google, Amazon, and Facebook to manage their vast information stores. Traditional methods of scaling storage, such as increasing disk capacity or adding more servers, fall short at the scale of cloud computing environments. While you may never operate at the scale of these companies, it is interesting to read about how the tech giants do data storage and management. (The article's subtitle also refers to "big data", a phrase that was fashionable in the previous decade but one we don't hear much anymore.)
Industries are still using floppy disks
8-inch floppy disks were invented in the early 1970s; they could store a megabyte apiece. 5.25-inch floppy disks were introduced in the late 1970s; while obviously smaller, their high-density version could also store about a megabyte and a quarter per disk. 3.5-inch disks (no longer called "floppy" because they were in a hard plastic case) came to the market in the early 1980s and could store a megabyte and a half. Each of these formats is still used today. (Maybe not the 8-inch floppies; those were retired from nuclear weapons silos in 2019.)
Reading old floppy disks
Speaking of floppy disks, digital archivists from Cambridge University Library and Churchill Archives Centre detail their efforts to create copies of 5.25-inch floppy disks. Remember 5.25-inch floppy disks? From soliciting donations of old floppy disk drives to the hardware and software required to access these old disks on new hardware, the report is a fascinating look at the past (and maybe a preview of what future generations will need to do to read today's digital storage media).
Century-scale Storage
This 15,000-word essay looks at digital storage from the earliest hard drives (including restoring data from a 1960s-era IBM hard disk prototype) to the cloud to old-fashioned print-on-paper. There are discussions of the reliability and longevity of different storage methods, such as RAID systems, cloud storage, and physical media like vinyl records and tape drives. But it isn't just about the physical medium: the article also highlights the importance of institutional commitment, funding, and cultural values in ensuring the preservation of data. Ultimately, the writers suggest that successful century-scale storage requires a combination of methods, a culture of vigilance, and a commitment to preserving human cultural memory.
This Week I Learned: In Ethiopia, time follows the sun like nowhere else
This could have easily gone in last week's Thursday Threads on time standards. There are 12 hours of daylight, numbered 1 through 12. Then 12 hours of night, numbered 1 through 12. What could be easier?
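For the curious, the counting scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_ethiopian_hour` is made up for this example, it handles whole hours only, and it assumes the daylight count begins at 06:00 on the 24-hour clock, so that 07:00 is hour 1 of the day and 18:00 is hour 12 of the day. Actual usage may treat the boundary hours differently.

```python
def to_ethiopian_hour(hour_24: int) -> tuple[int, str]:
    """Convert a whole hour on the 24-hour clock (0-23) to a (number, cycle) pair.

    Assumption (not from the article): the daylight count starts at 06:00,
    so 07:00 is hour 1 of the day and 19:00 is hour 1 of the night.
    """
    if not 0 <= hour_24 <= 23:
        raise ValueError("hour_24 must be between 0 and 23")
    # Shift so that 07:00 maps to 1, then wrap within a 12-hour cycle.
    number = (hour_24 - 7) % 12 + 1
    # Hours 07:00 through 18:00 are counted as daylight hours 1 through 12.
    cycle = "day" if 7 <= hour_24 <= 18 else "night"
    return number, cycle


if __name__ == "__main__":
    for hour in (7, 12, 18, 19, 0, 6):
        print(f"{hour:02d}:00 ->", to_ethiopian_hour(hour))
```

Under these assumptions, noon comes out as hour 6 of the day and midnight as hour 6 of the night.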
Alan and Mittens squabble in the cat tree
These two troublemakers. Alan is the cat on top, looking down on Mittens below. In this cozy sunlit room with a cat tree by an open window, you'd think these two would get along. Not so. Alan's typical perch is on top of the cat tree, so it is Mittens who is intruding (if you could call it that).