
Research Data Management: File Formats for Preservation

File Formats for Preservation

When publishing your data, it is important to consider which file types are best for long-term preservation, accessibility and reusability. Where possible, choose ‘open’ formats.

“An open file format is a file format for storing digital data, defined by a published specification usually maintained by a standards organization, and which can be used and implemented by anyone. For example, an open format can be implemented by both proprietary and free and open source software, using the typical software licenses used by each. In contrast to open formats, closed formats are considered trade secrets. Open formats are also called free file formats if they are not encumbered by any copyrights, patents, trademarks or other restrictions (for example, if they are in the public domain) so that anyone may use them at no monetary cost for any desired purpose.” A list of open formats can be found on Wikipedia.  

Advantages of Open Formats  

  • Files in proprietary formats can often only be opened in the software in which they were created; anyone without a licence for that software may be unable to use them. Open (non-proprietary) formats are generally readable by more than one application.

  • Open formats are often backed by robust communities or standards bodies, so they are more likely to be supported over time.

  • Repositories prefer open formats because they can be batch upgraded (migrated) more easily, reducing the risk of obsolescence.

Further Considerations for File Formats  

  • Use widely used and open formats wherever possible.

  • Use ‘lossless’ file formats. A lossless format preserves all of the original data: when data is compressed using a lossless method and then decompressed, the result is an exact replica of the original, with no loss of information or quality. Examples include FLAC (audio), PNG (images) and Zip (archives).

  • Use standardised, documented formats that adhere to established specifications or standards and have comprehensive documentation describing their structure, encoding and usage, e.g., PDF.

  • Choose formats that support metadata.
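The lossless property described in the list can be checked directly: compress a file's bytes, decompress them, and compare checksums of the original and the result. The sketch below does this in Python with the standard-library `zlib` module, which implements DEFLATE, the compression method used by PNG and by most Zip files. The function names and sample data are illustrative assumptions, and `toy_lossy_round_trip` is a hypothetical stand-in that simply discards information; it is not how real lossy formats such as JPEG or MP3 work.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return a SHA-256 checksum; identical checksums mean identical bytes."""
    return hashlib.sha256(data).hexdigest()


def lossless_round_trip(original: bytes) -> bool:
    """Compress and decompress with zlib (DEFLATE), then compare checksums."""
    compressed = zlib.compress(original, 9)  # 9 = maximum compression level
    restored = zlib.decompress(compressed)
    return sha256_hex(restored) == sha256_hex(original)


def toy_lossy_round_trip(original: bytes) -> bool:
    """Hypothetical lossy step: drop the low 4 bits of every byte (crude quantisation)."""
    approximated = bytes(b & 0xF0 for b in original)
    return sha256_hex(approximated) == sha256_hex(original)


if __name__ == "__main__":
    sample = bytes(range(256)) * 4  # illustrative stand-in for a data file
    print("lossless identical:", lossless_round_trip(sample))  # True
    print("lossy identical:", toy_lossy_round_trip(sample))    # False
```

The lossless round trip always reproduces the original bytes exactly, while the toy lossy step cannot be undone because the discarded bits are gone for good.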


This work is licensed under CC BY-NC-SA 4.0