CLOCKSS: Preservation of Online Publications

A library that purchases a print subscription can manage and preserve the collection in its institution. But as more and more publishers are transitioning to an e-format, preservation of online publications is increasingly important. Libraries want to be assured that their institutions will have permanent access to the online publications that they have subscribed to or purchased.

A solution to the problem is a third-party archive, independent of the publisher, that can reach subscribers and library patrons if the publisher ceases publishing, goes out of business, or experiences a disruption in service for a long period. Even publishers that are already preserving content or have an internal contingency plan are wise to consider an added level of protection. If a publisher needs to shut down, having this additional level of protection would prevent system-management issues.

CLOCKSS (Controlled Lots of Copies Keep Stuff Safe) is one such permanent archive. A not-for-profit “dark archive” founded by the world’s leading libraries and publishers, it ensures the long-term preservation of online scholarly content.

CLOCKSS uses the award-winning LOCKSS (Lots of Copies Keep Stuff Safe) technology developed at Stanford University. LOCKSS enables librarians to preserve their electronic collections at the institutional level, providing perpetual access for the library.

CLOCKSS goes a step further by ensuring the long-term survival of digital scholarly publications for the entire world’s benefit. The technology preserves the content in the form in which it was originally published online and includes the publishers’ branding. Preservation in the original form ensures that the integrity of today’s content will remain unchanged and readable by tomorrow’s scholars. It also avoids errors that can occur when content is normalized.

Content in the archive is preserved and decentralized in a network of 12 geographically and geopolitically disparate nodes that span the globe. The nodes are in Australia, Canada, Germany, Hong Kong, Italy, Japan, Scotland, and five places in the United States.

Because CLOCKSS is a closed network that does not provide access, it is a truly dark archive, which protects the security of the content. The archive's multiple copies improve reliability and safeguard against natural disaster or political instability. The nodes hold exact copies of the content and check each other for data integrity. This redundancy is valuable if, for example, inclement weather at one of the locations causes a node to go down. If that occurs, the 11 remaining copies in the network can bring the 12th location back up to date in short order.
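
For readers who want a concrete picture of how copies can check and repair one another, the following is a minimal sketch in Python. It is not the LOCKSS software or its actual protocol. It assumes a simple majority vote over SHA-256 digests, and the node names, the sample content, and the function audit_and_repair are hypothetical and used only for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one node's copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(nodes: dict[str, bytes]) -> list[str]:
    """Compare all copies by digest and overwrite any copy that
    disagrees with the majority. Returns the names of repaired nodes.

    Illustrative assumption: a strict majority of nodes holds the
    correct copy. Real preservation networks use more elaborate polling.
    """
    digests = {name: digest(data) for name, data in nodes.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(nodes) // 2:
        raise RuntimeError("no majority copy; manual intervention needed")

    # Take the content from any node that agrees with the majority.
    good_copy = next(data for name, data in nodes.items()
                     if digests[name] == winner)

    repaired = []
    for name in nodes:
        if digests[name] != winner:
            nodes[name] = good_copy  # restore from the agreed copy
            repaired.append(name)
    return repaired


# Twelve hypothetical nodes, one of which has lost its copy.
article = b"Volume 1, Issue 1: full text as originally published"
nodes = {f"node-{i}": article for i in range(1, 13)}
nodes["node-12"] = b""  # simulate a node that went down and lost data

repaired = audit_and_repair(nodes)
print(repaired)  # ['node-12']
```

In this sketch the 11 intact copies outvote the damaged one, and the 12th node is restored from a copy that the majority agrees on.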

The archive is governed by a 24-member Board of Directors drawn from participating publishers and supporting libraries, which makes CLOCKSS a community-governed archive. The Board consists of representatives of 12 publishers and 12 libraries, all with an equal say in deciding procedures and overseeing the preservation of e-journals, e-books, and data sets from a rapidly growing list of participating publishers. It is an exceptional mix of decision makers, and many publishers find comfort in knowing that half the caretakers of the archive are also owners of content that is preserved in it.

Before releasing (triggering) content from the archive, the CLOCKSS staff works with the publisher in question to make certain that publication rights have not been transferred to another publisher or reverted to the author. The staff checks with the major aggregators to ensure that the content is not earning royalties for the rights holder. CLOCKSS does not want to provide access to a publication if it is already available, nor does it wish to interfere with a rights holder’s business model. CLOCKSS’s goal is to be the source of last resort.

When CLOCKSS is satisfied that the content is truly orphaned, it asks the Board for a decision to trigger the content. Released content is free to everyone and made available under a Creative Commons license. Two organizations represented on the Board (EDINA, a data center at the University of Edinburgh, and Stanford University) host triggered copies on their servers, but anyone may host triggered copies. Triggered copies can be viewed at the following URL: www.clockss.org/clockss/Triggered_Content.

Publishers that would like to preserve content and libraries that would like to support the CLOCKSS archive may contact Randy.Kiefer@clockss.org, pub-director@clockss.org, or info@clockss.org.


KIM SMILEY is director of publisher relations for the CLOCKSS Archive.