Data archiving is the systematic and secure storage of research data for long-term preservation. Its focus is maintaining the integrity of the data over time, organizing it in a structured manner, and often adhering to specific standards for documentation and formatting. It is crucial for maintaining a historical record of research and supporting future analyses or meta-analyses. Data should be archived regardless of whether it is suitable for open publication.
While data publishing and data archiving are related concepts, they refer to different aspects of managing research data.
Data publishing involves making research data publicly accessible, typically through repositories or other platforms, e.g. GitHub, Zenodo. It can be done at any stage of the project, can be completed in phases and can encompass multiple dataset outputs. This process includes sharing datasets along with associated metadata, documentation, and sometimes analysis scripts. Data publishing is a key part of the open research landscape, emphasizing transparency and collaboration.
Data archiving is generally done at the end of the project and involves a transfer of all data and related materials to a suitable archiving facility.
At present TU Dublin does not offer a data archiving service, but the Open Research Support Unit can provide advice on suitable archives on request.
In summary, data publishing is about sharing data openly with the community, while data archiving is about preserving and maintaining the data for the long term, ensuring its availability over time. Both are essential components of responsible and transparent research practices.
Considerations for Data Archiving
Provide comprehensive metadata describing the dataset, e.g. in a README file.
Store data in standardized, non-proprietary formats to ensure long-term compatibility and accessibility, and so that archives can batch upgrade files to avoid obsolescence.
Organize data in a clear and logical structure with well-defined directories and file naming conventions.
Develop a long-term preservation plan, including considerations for migrating data to new formats.
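As one way to support the considerations above, a dataset can be accompanied by a manifest that lists every file with its size and a checksum, so that integrity can be re-checked after transfer to an archive. The following is a minimal Python sketch, not a prescribed procedure: the function names, the tab-separated layout and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> list[tuple[str, int, str]]:
    """List every file under dataset_dir as (relative path, size in bytes, checksum)."""
    rows = []
    for path in dataset_dir.rglob("*"):
        if path.is_file():
            relative = path.relative_to(dataset_dir).as_posix()
            rows.append((relative, path.stat().st_size, sha256_of(path)))
    rows.sort()  # sort by relative path so the manifest is reproducible
    return rows


def format_manifest(rows: list[tuple[str, int, str]]) -> str:
    """Render the manifest as tab-separated text with a header line."""
    lines = ["path\tsize_bytes\tsha256"]
    lines.extend(f"{name}\t{size}\t{checksum}" for name, size, checksum in rows)
    return "\n".join(lines) + "\n"
```

The resulting text could be saved alongside the README before the data is handed over. Recomputing the checksums later and comparing them with the manifest shows whether any file has changed.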
When publishing or archiving your data, it is important to choose file types that support long-term preservation, accessibility and reusability. Where possible, use ‘open’ formats.
“An open file format is a file format for storing digital data, defined by a published specification usually maintained by a standards organization, and which can be used and implemented by anyone. For example, an open format can be implemented by both proprietary and free and open source software, using the typical software licenses used by each. In contrast to open formats, closed formats are considered trade secrets. Open formats are also called free file formats if they are not encumbered by any copyrights, patents, trademarks or other restrictions (for example, if they are in the public domain) so that anyone may use them at no monetary cost for any desired purpose.” A list of open formats can be found on Wikipedia.
Advantages of Open Formats
Files in proprietary formats can generally be opened only in the software in which they were created; anyone without a licence for that software may be unable to use the files. Open (non-proprietary) formats are generally readable by more than one application.
Open formats are often backed by robust communities or standards bodies, and so are more likely to be supported over time.
Open formats are preferred by repositories because they can be batch upgraded more easily, reducing the risk of obsolescence.
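Before depositing a dataset, it can help to list the files that are not in an open format so they can be reviewed or converted. The sketch below is only an illustration: the set of extensions is a small, non-exhaustive assumption, and the function name is hypothetical. The chosen archive's own format guidance should take precedence.

```python
from pathlib import Path

# Illustrative, non-exhaustive set of extensions for open formats
# (an assumption for this sketch, not an authoritative list).
OPEN_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".png", ".flac", ".pdf"}


def files_to_review(filenames: list[str]) -> list[str]:
    """Return the file names whose extension is not in OPEN_EXTENSIONS."""
    return [
        name
        for name in filenames
        if Path(name).suffix.lower() not in OPEN_EXTENSIONS
    ]
```

A file flagged here is not necessarily unsuitable for archiving; the check only marks it for a closer look.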
Further Considerations for File Formats
Use widely used formats, and open formats as much as possible.
Use ‘lossless’ file formats. A lossless format preserves all original data through compression and decompression: when the data is decompressed, the resulting file is an exact replica of the original, with no loss of information or quality, e.g., FLAC, PNG, ZIP.
Use standardised and documented formats that adhere to established specifications or standards and have comprehensive documentation detailing their structure, encoding, and usage, e.g., PDF.
Use formats that support metadata.
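The meaning of ‘lossless’ described above can be checked directly: compressing and then decompressing must return exactly the original bytes. The following minimal Python sketch uses the standard library's zlib module, which implements the DEFLATE method also commonly used in ZIP files; the function name is an assumption made for illustration.

```python
import zlib


def is_lossless_round_trip(original: bytes) -> bool:
    """Compress and decompress the data, then check it is byte-for-byte identical."""
    compressed = zlib.compress(original, 9)  # 9 = strongest compression level
    restored = zlib.decompress(compressed)
    return restored == original
```

A lossy format such as JPEG would fail an equivalent check, because information is discarded when the file is encoded.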
This work is licensed under CC BY-NC-SA 4.0