Beyond the question of who should archive lies the potentially more important question of how to archive the data. Given the rate of technological change, it is highly unlikely that any system implemented today will resemble whatever system is used to archive the data a couple of decades from now; media decay, standards change, and software and the machines that can run it become obsolete and are lost. The US Census information from 1960, originally stored on digital tapes, along with hundreds of other reels of tape from multiple government departments, has already become obsolete76. Any long term archive will need significant recurring investment to remain operational.
Long term archiving requires that the data be maintained and remain easily accessible, and that it can be displayed and recreated. Moreover, one cannot simply print hard copies of the archive: this defeats the purpose of a digital archive and, in many cases, much of the information (e.g. hyperlinks) cannot be meaningfully displayed on paper77. The issue of data archiving is complex and mostly beyond the scope of this paper, but we will succinctly present some of the options.
It is imperative that whatever system is used allow for easy migration of the data from one system to another, bearing in mind the exponential growth of the archived data. The ability to transfer the data, dynamically recreating the entire archive on the new technology, is very important because much of the media used to preserve digital data is, unlike paper, unstable and degrades without active preservation. Even within the lifetime of the technologies presently in use, the storage media on which the digital information is stored have finite lives; data will degrade or be corrupted78, 79. Additionally, as the archive grows and technology changes, newer, cheaper and better storage media will become available.
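As an illustration only, migration with active preservation can be sketched as copying the archive to new storage and verifying every file against a checksum manifest, so that degraded or corrupted data is detected rather than silently carried forward. The function names (`build_manifest`, `verify`, `migrate`) and the choice of SHA-256 are assumptions made for this sketch, not part of any system discussed here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or no longer match the manifest."""
    damaged = []
    for name, checksum in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != checksum:
            damaged.append(name)
    return damaged


def migrate(old_root: Path, new_root: Path) -> list[str]:
    """Copy the archive to new storage (new_root must not exist yet),
    then report any file whose copy does not match the original."""
    manifest = build_manifest(old_root)
    shutil.copytree(old_root, new_root)
    return verify(new_root, manifest)
```

Running `verify` periodically against a stored manifest, and not only at migration time, is one hypothetical way to notice media decay while an intact copy still exists elsewhere.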
What is needed is a long term solution, one that does not call for heroic efforts or continual intervention to maintain it77. One idea is to use some sort of semi-structured representation of the data, which would include basic information with each digital object: the attributes of the data (its structure and physical context), information regarding the organization of the information, and information regarding its display (e.g. a user interface)80. Platform independent technologies such as XML81 can both describe the data and provide a simple and flexible format and, as a consequence, longer lifetimes for the data82.
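To make the semi-structured idea concrete, the following minimal sketch wraps a digital object in an XML record that carries its own descriptive information alongside the content. The element names (`archived-object`, `metadata`, `structure`, `display`, and so on) are hypothetical and chosen only for illustration; they do not follow any particular archival schema.

```python
import xml.etree.ElementTree as ET


def wrap_object(identifier, title, media_type, structure, display_hint, content):
    """Serialize one digital object together with its descriptive metadata."""
    record = ET.Element("archived-object", {"id": identifier})
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "media-type").text = media_type
    ET.SubElement(meta, "structure").text = structure    # how the data is organized
    ET.SubElement(meta, "display").text = display_hint   # how it should be presented
    ET.SubElement(record, "content").text = content
    return ET.tostring(record, encoding="unicode")


def read_object(xml_text):
    """Recover the metadata and content from a serialized record."""
    record = ET.fromstring(xml_text)
    fields = {child.tag: child.text for child in record.find("metadata")}
    fields["id"] = record.get("id")
    fields["content"] = record.findtext("content")
    return fields
```

Because the record is plain text with self-describing tags, a future reader needs only a generic XML parser, not the software that originally produced the object, to recover both the content and the instructions for interpreting it.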
A related idea starts from the fact that digital archives are inherently software dependent: the original software should be kept and, as technology changes, run under emulation on future systems, since present systems have a short physical life and cannot be maintained indefinitely to run the software77. Alternatively, instead of creating emulators for outdated software, software could be designed to run on some ‘universal virtual computer’ that would be standardized and maintained83.
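The appeal of a standardized virtual computer can be shown with a toy example: if the decoder for an obsolete format is archived as a program for a very small, fully specified machine, a future system only has to reimplement that machine. The four-instruction stack machine below is a hypothetical illustration invented for this sketch, not the design proposed in the cited work, and the "obsolete" encoding (each stored byte is the character code minus 32) is likewise an assumption.

```python
def run(program, data):
    """Interpret a program for a tiny stack machine and return its output text."""
    stack, output = [], []
    for op, *args in program:
        if op == "push":      # push a constant
            stack.append(args[0])
        elif op == "load":    # push the archived byte at a given offset
            stack.append(data[args[0]])
        elif op == "add":     # replace the top two values with their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "emit":    # pop a code point and append it to the output
            output.append(chr(stack.pop()))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return "".join(output)


# Archived bytes in the assumed obsolete encoding, stored next to their decoder.
archived_data = bytes([40, 73])
decoder = [
    ("load", 0), ("push", 32), ("add",), ("emit",),
    ("load", 1), ("push", 32), ("add",), ("emit",),
]

decoded_text = run(decoder, archived_data)
```

Here `decoded_text` is recovered without any knowledge of the original software; only the interpreter, a few lines long, must be rewritten for each new generation of hardware.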
In addition to the issues concerning how to store the data, there is the more basic issue of what deserves to be stored. As stated above, there are already archives focused on informal publications, the so-called gray literature. What of the gray literature deserves to be archived? Is all scientific data pertinent to the future and worth the cost of storage? For example, will it play an important role in deciding who deserves scientific accolades and/or intellectual property rights for results? Additionally, even within the so-called formal literature, the peer reviewed articles, how many versions of an article (e.g. pre-review versions or drafts in progress) deserve to be preserved, and should they, like the final copy of an article, be preserved indefinitely?
Finally, another issue that has to be dealt with prior to the establishment of an archive is that of ownership of the articles and the underlying research results. Although we assume that scientific results, especially those funded by government grants, are intended for the public domain, this is often not the case. As a result of the Bayh-Dole Act84, universities have been encouraged to protect and profit from their research by exercising intellectual property rights. One area where the idea of ownership of scientific fact is currently hotly debated is databases85. With regard to the archive in particular, the issue of who should own the copyright of the article continues to be debated.
The copyrighting of scientific articles, like the patenting of scientific results funded by the government, has been termed a “public taxation for private privilege”86. It goes against the spirit of the law, “to promote the Progress of Science and useful Arts”, by limiting the dissemination of research results. The United States Court of Appeals for the Fifth Circuit ruled in Miller v. Universal City Studios (1981) that research results cannot be copyrighted. Still, a trend has developed over time for journal publishers to require that authors sign over all their copyrights to the journal. Authors acquiesced to this Faustian bargain, wherein they would hand over copyrights and in return receive assurance that their work would be disseminated and protected in perpetuity87. In 1996, Congress, in the National Information Infrastructure Copyright Protection Act (H.R. 2441 and S. 1284), considered expanding the rights of owners of copyrighted articles at the expense of the academic community88.
Recently it has been proposed that authors retain their copyright, either through new legislation requiring the authors of government funded research to do so89, 90, or through a grass roots campaign in which authors are encouraged not to sign over copyrights91 and, where they are forced to, to boycott the journal92. Alternatively, it has been suggested that journals hold copyrights only for a very limited time, after which the copyrights are transferred to a central journal repository23. In practice, with the growing trend toward more collaborative scientific research, it has become significantly harder even to determine who holds copyright to what93.