SCAPE: Big Data Meets Digital Preservation
The digital collections of scientific and memory institutions – many of which are already in the petabyte range – are growing larger every day. The fact that the volume of archived digital content worldwide is increasing geometrically, demands that their associated preservation activities become more scalable. The economics of long-term storage and access demand that they become more automated. The present state of the art fails to address the need for scalable automated solutions for tasks like the characterization or migration of very large volumes of digital content. Standard tools break down when faced with very large or complex digital objects; standard workflows break down when faced with a very large number of objects or heterogeneous collections. In short, digital preservation is becoming an application area of big data, and big data is itself revealing a number of significant preservation challenges.