March 21, 2009

History Lost

Crossposted from Seattlehound.

It would be tough to find someone more pro-digital than me. But at the same time there’s one bit of being all-digital that always makes me uneasy when I think about it, and that’s archiving. A digital datastore can contain an entire warehouse’s worth of documents and photos, all squeezed into the physical space of one hard drive. And that’s amazing. But it’s also amazingly easy to lose that hard drive, or to drop it and damage it, and doing so is the equivalent of an entire warehouse burning down in a matter of seconds.

So we make backups. And so we keep data off site, and sync data to the internet. I have my personal photo collection stored on two hard drives in my house and one at my office. But even that makes me uneasy, because I have to keep that collection going through active effort. It’s all tied to me, and if I’m gone there’s no one around who will know what to do with this stuff. There really is no digital equivalent of coming across a shoebox of old photos in Grandma’s attic. I wonder what we’re losing because of that.

Newspapers, too. It feels like the one thing we’re losing in the transition to digital news is the archives. If you have a daily print run of a newspaper, it’s simple for someone to grab an extra copy every day and put it in a box somewhere. Enough boxes and you’ve got a library with every issue of the newspaper going back for decades. Some newspapers are scanned onto microfilm, so they take up less space and are more durable than paper. With modern technology you can even save a PDF of the paper every day.

But when the “paper” part of the newspaper goes away, how do you do that same kind of archiving? What kind of “daily unit” is there for a blog that can be updated at any time, or a sprawling website with dozens of people contributing to it? What is there that you can take and put away in a box, or on a hard drive even, that will serve as a record for the future? There’s not a lot of money in archiving. Libraries and museums do it as a public good, but they need money from other sources to keep it going. Newspapers sometimes have a staff person responsible for archiving, but that person doesn’t add anything to the bottom line, and when it’s time to cut costs that position is at risk. Sure, you can build a website where the archives are available electronically for all time, but all it takes is one decision to dump the archives, or for the website to go out of business and the hard drive to get erased, for all of it to be gone. If a library burns down it’s a major event. But if a website goes offline and the one backup of the database is deleted, it’s scarcely noticed until it’s too late.

John C. Dvorak wrote about this recently, in an article called Our History: Error 404.

This doesn’t keep me up at night, but it gives me a pain in the pit of my stomach when I think about it. I get the same pain reading this story from Benjamin Lukoff, referencing a Seattle Times article which reports that in the last days of the P-I people were throwing boxes containing decades’ worth of research, records and documents into the trash.

In other corners of the newsroom, documents and sacred records that took years to accumulate were pulled from filing cabinets and discarded into dumpsters, gone in a matter of minutes.

In some ways this is worse, because it’s being done willfully. This is like a mob of arsonists torching the library. These documents are seen as trash, but then again just about every historic artifact probably was considered trash at the time. These documents could have been scanned, they could have been archived, they could have been donated to a museum or a library, but that all takes time. Time, money, and people, three things that are in short supply at the P-I during the transition.

Who knows what was lost? Much of it could have actually been trash, documents that will have no significance tomorrow or in a hundred years. But it’s not for us to decide what will be important in the future; it’s our job to keep this stuff safe so others can decide later. Even then, sorting through boxes and boxes of paper will take time, money, and people as well, and who’s to say those won’t still be in short supply 100 years from now?

I don’t have answers, and I don’t have ideas, but it seems like this is a big missing piece in the transition to digital that we’re making. If a solution doesn’t show up soon, it’s never going to, and the concept of archives that can last across generations will go away.