The best way to fight the bit rot apocalypse is to prevent it before it becomes a problem. As Alt Girl reminds Regular Guy in Night of the Living Bit Rot, there are three steps to keeping your data and electronic records safe: hash your data, back up your data in different places, and check your data periodically to make sure it hasn’t changed.
Hashing runs your data through a secure hash algorithm to produce a short, fixed-length value (a hash value, sometimes called a checksum) that is, for all practical purposes, unique to that data. Because the hash value is calculated from the exact bits that make up the data at a specific point in time, changing even a single bit produces a different value. Comparing hash values over time therefore shows whether the data has changed. The State Archives of North Carolina recommends using the SHA-256 algorithm for hashing data.
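If you are curious what happens behind the scenes, here is a short Python sketch that computes a SHA-256 hash value for a single file using Python’s built-in hashlib module. The function name sha256_of_file and the sample file are hypothetical, for illustration only. The file is read in chunks so that large records don’t have to fit in memory all at once.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hash value of a file as a lowercase hex string.

    The file is read in chunks (1 MiB by default) so that large records
    do not have to fit in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record = Path(tmp) / "example_record.txt"  # hypothetical file
        record.write_bytes(b"Minutes of the board meeting\n")
        before = sha256_of_file(record)

        # Simulate a tiny change: add a single period.
        record.write_bytes(b"Minutes of the board meeting.\n")
        after = sha256_of_file(record)

        print(before)
        print(after)
        print("unchanged" if before == after else "changed")
```

Running it prints two completely different hash values for files that differ by a single period, which is exactly why comparing hash values over time catches even tiny changes.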
Hashing data sounds complicated, but luckily there are tools that make creating hash values easier. The State Archives uses Bagger, a free, open-source tool created by the Library of Congress, to collect data and create hash values. Bagger packages data into a folder called a “bag”. Inside the bag is a folder containing all the data that has been put into the bag, along with a few small text files describing the bag, including a manifest that lists every file in the bag together with its hash value. Bagger can create the bag and the hash values, and it can also validate the files in the bag to confirm that they have not changed, all within an easy-to-use GUI! The State Archives has created additional guidance on how to use Bagger, which can be found here: http://archives.ncdcr.gov/For-Government/Digital-Records/State-Archives-Digital-Repository#transfer
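To give a feel for what ends up inside a bag, here is a simplified Python sketch that copies a folder of files into a data folder and writes two text files modeled on the BagIt packaging format that Bagger produces: bagit.txt, which declares the bag, and manifest-sha256.txt, which lists each file with its SHA-256 hash value. This is an illustrative approximation, not Bagger itself: the function make_minimal_bag and the folder names are hypothetical, and it skips optional BagIt pieces such as bag-info.txt and tag manifests. For real transfers, use Bagger and the State Archives’ guidance.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def _sha256(path):
    """Return the SHA-256 hash value of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir, bag_dir):
    """Copy every file under source_dir into bag_dir/data, then write
    bagit.txt and a manifest-sha256.txt listing each file's hash value.

    A simplified sketch of the BagIt layout (hypothetical helper),
    not a replacement for Bagger.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    shutil.copytree(source_dir, data)  # fails if bag_dir/data already exists

    # The bag declaration file.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One manifest line per file: "<hash>  <path relative to the bag>".
    lines = []
    for path in sorted(p for p in data.rglob("*") if p.is_file()):
        relative = path.relative_to(bag).as_posix()  # e.g. "data/minutes.txt"
        lines.append(f"{_sha256(path)}  {relative}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        records = Path(tmp) / "records"  # hypothetical folder of records
        records.mkdir()
        (records / "minutes.txt").write_text("Board minutes\n", encoding="utf-8")
        bag = make_minimal_bag(records, Path(tmp) / "my_bag")
        print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Each manifest line pairs a hash value with a path relative to the bag, such as data/minutes.txt, which is what allows the bag to be checked again later.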
Once your data has been hashed, store copies of it in multiple places so that, if the bit rot apocalypse does come for one of your storage servers, there is another, unchanged copy to fall back on. Remember the acronym LOCKSS: Lots Of Copies Keep Stuff Safe! Ideally, these copies should be kept in different geographic locations. For example, the State Archives keeps its Digital Repository in onsite storage, with backups in Asheville and multiple copies in the cloud using a service called DuraCloud. This helps us avoid losing valuable electronic records in a disaster such as a fire or flood.
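The LOCKSS idea can be sketched in a few lines of Python: copy a file to several destinations, then confirm that each copy has the same SHA-256 hash value as the original. In this sketch the function replicate_and_verify and the destination folders (onsite, offsite, cloud_sync) are hypothetical stand-ins for separate storage systems. Real offsite and cloud backups, such as DuraCloud, are managed through their own tools, and this code does not connect to any of them.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hash value of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_and_verify(source_file, destination_dirs):
    """Copy source_file into each destination folder, then confirm that
    every copy has the same SHA-256 hash value as the original.

    Returns a dict mapping each copy's path to True (hash matches)
    or False (the copy differs from the original).
    """
    source = Path(source_file)
    expected = sha256_of_file(source)
    results = {}
    for dest in map(Path, destination_dirs):
        dest.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, dest / source.name))
        results[copy] = sha256_of_file(copy) == expected
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record = Path(tmp) / "record.txt"  # hypothetical record
        record.write_text("Annual report\n", encoding="utf-8")
        # Hypothetical stand-ins for separate storage systems.
        locations = [Path(tmp) / name for name in ("onsite", "offsite", "cloud_sync")]
        for copy, ok in replicate_and_verify(record, locations).items():
            print(copy, "OK" if ok else "MISMATCH")
```

Checking the hash of every copy right after it is made means you start out knowing that all of your copies are identical, so a later mismatch points to a real change.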
Finally, don’t set it and forget it! Check your data on a regular basis to make sure that nothing has changed. Hash values make it easy to confirm that individual files have not changed at the bit level. A routine check is also a good time to review other things about your files. What file formats do you have? Can you still open them, or should you think about migrating them to a more stable file format so that you can continue to access the information they contain? What about the storage itself? How old is the drive or server where your data is stored? Is it time to think about migrating the data to newer media?
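A periodic check can reuse the hash values saved when the data was bagged. This Python sketch reads a BagIt-style manifest-sha256.txt, recalculates each file’s hash, and reports files that have changed, files that are missing, and files in the data folder that the manifest doesn’t list. It is a simplified, hypothetical helper (check_bag is not part of Bagger); for real bags, Bagger’s own validation performs this kind of check.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hash value of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bag(bag_dir):
    """Re-check every file listed in bag_dir/manifest-sha256.txt.

    Returns three sorted lists of bag-relative paths:
      changed - files whose current hash differs from the manifest
      missing - files listed in the manifest but no longer on disk
      extra   - files under data/ that the manifest does not list
    """
    bag = Path(bag_dir)
    changed, missing, listed = [], [], set()
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, relative = line.split(maxsplit=1)
        listed.add(relative)
        path = bag / relative
        if not path.is_file():
            missing.append(relative)
        elif sha256_of_file(path) != expected.lower():
            changed.append(relative)

    # Look for files that were added after the manifest was written.
    data = bag / "data"
    on_disk = (
        {p.relative_to(bag).as_posix() for p in data.rglob("*") if p.is_file()}
        if data.is_dir()
        else set()
    )
    extra = on_disk - listed
    return sorted(changed), sorted(missing), sorted(extra)
```

A scheduled job (for example, a hypothetical monthly task) could call check_bag on each bag and flag any non-empty list for follow-up, restoring affected files from an unchanged backup copy.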
Avoiding the bit rot apocalypse is a big job, but it becomes manageable when you take the time to plan ahead for the care and maintenance of your electronic records. If you feel overwhelmed by all of this information, don’t worry! Staff at the State Archives are happy to help you make sense of it all and plan ahead to care for your data. We also have a number of resources available online:
And remember: not all data is meant to be kept forever. Check your retention schedule at http://archives.ncdcr.gov/For-Government/Retention-Schedules so that you delete materials that have met their retention period and preserve materials that are meant to be kept long term.
Together, we can fight the bit rot apocalypse!