A vast and ageing volume of information exists in digital form, and the pace at which this information is created is continuously accelerating. Items such as original objects within collections are often reformatted through digitisation, adding to a rapidly growing repository of digitally generated content.
There are significant challenges associated with ensuring continued access to digital objects. There is a very real threat that these objects will be created in a way that cannot guarantee even their short-term viability, let alone their preservation for future generations.
The key issues can be broken down into the following categories:
One: Continued access
Digital objects are stored as digital files in a particular file format on physical devices.
Digital files, file formats, and physical devices all tend to change over time; that is, they are mutable. Change can come from technological obsolescence of both digital formats and physical media: when was the last time you accessed a floppy disc, or opened a Lotus 1-2-3 file?
It can also occur through accidental damage, such as a scratched CD-ROM or a power surge through a server. The most insidious form of change is decay, and digital technology is not immune. All digital storage suffers from decay to some degree: unpowered storage such as external drives is most at risk, while large, actively managed server farms are the safest option.
Backups are only useful if damaged files can be identified before they are copied into the backups themselves. Even well-maintained copies do not guarantee access, however: digital file formats change over time, and the intended user may no longer have the devices or software needed to open them.
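One way to stop a damaged file from silently replacing a good backup is to verify the original against its recorded checksum before each backup run. The sketch below is a minimal illustration under that assumption; the function names `sha256_of` and `refresh_backup` are hypothetical, and SHA-256 is just one common checksum choice.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_backup(original: Path, backup: Path, expected_checksum: str) -> bool:
    """Overwrite the backup only if the original still matches its recorded checksum.

    Returns True if the backup was refreshed. Returns False if the original
    appears damaged, in which case the existing backup is left untouched.
    """
    if sha256_of(original) != expected_checksum:
        return False
    shutil.copy2(original, backup)
    return True
```

The checksum must be recorded while the file is known to be good (for example, at ingest); otherwise the comparison has nothing trustworthy to check against.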
Two: Keeping the meaning of the data
It is imperative to maintain knowledge about the digital object, as the digital file itself may not contain enough information for the user to understand its content. Without this knowledge, drawing files, databases, photographs, and all other files become meaningless and simply take up storage.
Keeping the context of the data and its dependencies is vital to the preservation of the digital objects themselves.
Three: Maintaining trust in the data
Keeping a record detailing the custody of a digital file is critical to trust. Without provenance, the digital file is not trustworthy and cannot be relied upon as a true record.
What is Digital Preservation?
Digital preservation is a practice that gives organisations confidence in the permanence of important collections of digital files.
The American Library Association defines this as:
The term has also been defined by the Society of American Archivists in their Dictionary of Archives Terminology as:
The practice of digital preservation covers the following activities:
Storage of the objects themselves alongside metadata about the object that describes context (including provenance), relationships, and preservation activity.
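To make "metadata about the object" concrete, here is a minimal sketch of a record that keeps a file's checksum together with its provenance, relationships, and a log of preservation activity. The field names are hypothetical and only loosely inspired by the event concept in the PREMIS preservation metadata standard; they do not describe any particular system's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PreservationEvent:
    """One entry in the object's history (ingest, fixity check, migration, ...)."""
    event_type: str
    outcome: str
    agent: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class ObjectRecord:
    """Hypothetical record tying a stored file to its context and history."""
    identifier: str
    title: str
    file_path: str
    checksum_sha256: str
    provenance: list[str] = field(default_factory=list)   # custody history, oldest first
    related_to: list[str] = field(default_factory=list)   # identifiers of related records
    events: list[PreservationEvent] = field(default_factory=list)

    def log(self, event_type: str, outcome: str, agent: str) -> None:
        """Append a timestamped preservation event to the record."""
        self.events.append(PreservationEvent(event_type, outcome, agent))
```

Usage: `record.log("fixity check", "pass", "scheduler")` appends an event, so the record accumulates an auditable history alongside the file it describes.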
Unless access to the files is restricted, the storage medium should remain accessible to the intended users from their usual locations. Files that can only be reached from a location the user rarely visits or cannot easily access will not be useful. Cloud-based services often provide the most accessible and durable storage solutions.
Checking that the digital file is undamaged, usually through transparent fixity checks, and replacing the file with a good backup if it is damaged.
Fixity refers to the use of checksums to confirm that files have not been altered. A checksum acts as a ‘digital fingerprint’ of a file: even the smallest change to the file will cause the checksum to change completely. Checksums are typically created with cryptographic hash functions such as MD5 or SHA-256, and can be generated using a range of readily available, open source tools.
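As a small illustration of the ‘digital fingerprint’ idea, the sketch below uses Python's standard `hashlib` module with SHA-256 (one common choice; the example text is made up). Changing a single character produces an entirely different checksum.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of some bytes as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"Minutes of the council meeting, 12 March 1998."
altered = b"Minutes of the council meeting, 13 March 1998."  # one character changed

print(checksum(original))
print(checksum(altered))
print(checksum(original) == checksum(altered))  # False
```

For files on disk, the `sha256sum` command found on most Linux systems produces the same hex digest for the same file contents.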
Fixity checks should be run regularly (e.g., several times a year) to provide assurance that the preserved files remain healthy. This creates the opportunity to restore damaged files before their backups are overwritten or decay themselves.
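A fixity check can be sketched as two steps: record a manifest of checksums while the files are known to be good, then periodically recompute and compare. This is a minimal sketch assuming the manifest is a simple mapping of relative path to SHA-256 digest; `build_manifest` and `fixity_check` are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def fixity_check(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files on disk with the manifest and report altered or missing files."""
    report: dict[str, list[str]] = {"altered": [], "missing": []}
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(path) != expected:
            report["altered"].append(rel_path)
    return report
```

In practice the manifest would be saved outside the directory being checked (for example as JSON), so that it is not itself swept into the manifest or damaged along with the files.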
Migrating the file to a modern accessible format if the original format is deemed to be obsolete, i.e., not accessible by the intended user.
File format migration requires careful planning to ensure that all meaningful information is transferred to the modern format. A document that loses contextual formatting such as page layout, headings, and indexes becomes less useful. A photograph that no longer accurately carries its colour information may fail to convey important nuances to the viewer.
When migrating file formats, it is essential to retain a copy of the original so that embedded information is not lost; even if the original format becomes obsolete, it is usually possible to find an expert who can access the file when necessary.
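The principle of keeping the original alongside the migrated copy can be sketched as follows. This is a minimal illustration, not a real conversion tool: the `convert` argument is a hypothetical stand-in for whatever converter is actually used, and the sketch simply checks that the original is left unchanged and records checksums for both files so that both can be included in later fixity checks.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class MigrationResult:
    original: Path
    original_checksum: str
    migrated: Path
    migrated_checksum: str


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a (small) file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def migrate(original: Path, target_suffix: str,
            convert: Callable[[Path, Path], None]) -> MigrationResult:
    """Create a migrated copy next to the original without replacing it.

    `convert` is a placeholder for a real conversion tool: it should read
    `src` and write `dst`.
    """
    before = sha256_of(original)
    migrated = original.with_suffix(target_suffix)
    if migrated.exists():
        raise FileExistsError(migrated)
    convert(original, migrated)
    if sha256_of(original) != before:  # the original must survive untouched
        raise RuntimeError("conversion modified the original file")
    return MigrationResult(original, before, migrated, sha256_of(migrated))
```

Returning both checksums makes it straightforward to add the migrated file to the same fixity regime as the original.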
In short, digital preservation can be summarised as ensuring that the information contained within a file remains accessible to the intended audience with the digital tools they might reasonably be expected to have available.
How does Recollect make it easy to implement digital preservation?
Recollect has a very flexible metadata engine that lets you configure administrative, descriptive, and preservation metadata to suit your collection and your collecting policies. Each field can be configured for internal access, for management information, or for external access by users.
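The idea of per-field access can be illustrated with a small, hypothetical configuration in which each field is tagged as internal or external and a public view keeps only the external ones. The field names and the `public_view` function are assumptions made for illustration; they do not represent Recollect's actual configuration format.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    INTERNAL = "internal"   # management information, staff only
    EXTERNAL = "external"   # shown to users


@dataclass(frozen=True)
class FieldConfig:
    name: str
    kind: str       # "administrative", "descriptive" or "preservation"
    access: Access


# Hypothetical example schema
SCHEMA = [
    FieldConfig("title", "descriptive", Access.EXTERNAL),
    FieldConfig("creator", "descriptive", Access.EXTERNAL),
    FieldConfig("accession_number", "administrative", Access.INTERNAL),
    FieldConfig("checksum_sha256", "preservation", Access.INTERNAL),
]


def public_view(record: dict[str, str], schema: list[FieldConfig]) -> dict[str, str]:
    """Return only the fields configured for external access."""
    visible = {f.name for f in schema if f.access is Access.EXTERNAL}
    return {k: v for k, v in record.items() if k in visible}
```

Keeping the access rule in the field configuration, rather than in each display template, means a field only has to be classified once.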
Recollect incorporates digital preservation principles into its core operations, so no additional workflow is required beyond normal collection management processes. The digital preservation processes are as follows:
Creating the record, or intellectual entity
Digital files are ingested into Recollect with their metadata; together these form the submission information package (SIP).
During the ingest process, Recollect:
- Scans for viruses.
- Identifies and validates the digital file format using industry-standard tools, e.g., JHOVE and DROID.
- Creates a record in Recollect and stores the metadata in this record. This becomes the intellectual entity that ties together the information and the files.
- Creates and stores a checksum for the original.
- Stores the original and creates a backup copy. The original and metadata become the archival information package (AIP).
- Generates access copies of the original and stores them in a logically separate location. When combined with the metadata, these access copies become the dissemination information package (DIP).
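The core ingest steps above (checksum the original, store it, make a backup copy, and tie everything to a record) can be sketched as follows. This is a hypothetical illustration, not Recollect's implementation: `ingest` and the directory layout are assumptions, and virus scanning, format validation with tools such as JHOVE or DROID, and access-copy generation are omitted because they would call external tools.

```python
import hashlib
import shutil
import uuid
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(src: Path, metadata: dict, store: Path, backup: Path) -> dict:
    """Hypothetical ingest: create a record, checksum the file, store original and backup."""
    record_id = str(uuid.uuid4())
    checksum = sha256_of(src)

    original = store / record_id / src.name
    backup_copy = backup / record_id / src.name
    for target in (original, backup_copy):
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        if sha256_of(target) != checksum:  # confirm each copy arrived intact
            raise OSError(f"copy to {target} failed verification")

    # The record links metadata, file locations and checksum (roughly, the AIP).
    return {
        "id": record_id,
        "metadata": dict(metadata),
        "original": str(original),
        "backup": str(backup_copy),
        "checksum_sha256": checksum,
    }
```

Verifying each copy against the checksum taken at ingest means the first fixity baseline is established before the submitted file is discarded.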
Fixity Checks
Every 90 days, Recollect will recalculate the checksum of each original file. If the file fails the fixity check, the Recollect self-healing process will copy the backup over the original and revalidate the file.
Recollect provides an easy-to-read report showing the results of each fixity check scan and both the checksum and the health status can be seen in each record.
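A self-healing fixity check of the kind described above might look like the following sketch. It is an assumption-laden illustration rather than Recollect's code: `check_and_heal`, `is_due`, and the status strings are hypothetical. One design choice worth noting is that the backup is itself verified against the recorded checksum before it is copied over the original, so a damaged backup is never used as a repair source.

```python
import hashlib
import shutil
from datetime import date, timedelta
from pathlib import Path

CHECK_INTERVAL = timedelta(days=90)


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_due(last_checked: date, today: date) -> bool:
    """True once at least CHECK_INTERVAL has passed since the last check."""
    return today - last_checked >= CHECK_INTERVAL


def check_and_heal(original: Path, backup: Path, expected: str) -> str:
    """Return 'healthy', 'repaired' or 'failed' for one stored file."""
    if original.is_file() and sha256_of(original) == expected:
        return "healthy"
    # Only trust the backup if it still matches the recorded checksum.
    if backup.is_file() and sha256_of(backup) == expected:
        shutil.copy2(backup, original)
        if sha256_of(original) == expected:  # revalidate after the repair
            return "repaired"
    return "failed"
```

The returned status is the kind of per-file result that could feed a health report; a "failed" result means neither copy is trustworthy and manual intervention is needed.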
Format Migration
When files in obsolete formats are identified, Recollect allows an accessible original in a newer format to be stored alongside the original. This accessible original is also included in the fixity checks and is used to generate up-to-date access copies for dissemination.