Data often outlives the projects that create it. Increasingly, funders request or require that their funding recipients create and follow plans for managing data, storing or preserving it for the future, and making some or all of it available on open access. It is accepted that not all data can be shared, because of commercial or ethical considerations, so there may well be an opt-out clause in the funder’s policy.
What it is
Research data may be created by an individual researcher or group, or contributed to a research group by a third party. Some examples are:
Observational: data captured in real time that is usually unique, e.g. sensing data, survey data, field recordings, sample data
Experimental: data captured from lab equipment that is often reproducible, e.g. gene sequences, magnetic field data
Models or simulations: data generated from test models, where the model and metadata may be more important than the output data from the model, e.g. climate models, economic models
Derived or compiled: data resulting from processing or combining “raw” data, often reproducible
Reference or canonical: a static or organic conglomeration or collection of datasets, probably published and curated, e.g. gene sequence databanks, collections of letters, historical images
Examples of outputs to be considered
Documents, spreadsheets
Scanned laboratory notebooks, field notebooks, diaries
Online questionnaires, transcripts, surveys or codebooks
Digital audiotapes, videotapes or other digital media
Scanned photographs or films
Transcribed test responses
Database contents (video, audio, text, images)
Digital models, algorithms, scripts
Contents of an application (input, output, log files for analysis software, simulation software, schemas)
Documented methodologies and workflows
Records of standard operating procedures and protocols
Why manage data?
Data is a digital asset and needs to be managed.
Having a good plan means you can find and understand your data when you need to, there is continuity when people leave or join the project, you avoid unnecessary duplication, and you can link your data and publications together. Ultimately your research becomes more visible and has greater impact (data attracts citations). In the Sciences, data is expected to serve factual purposes and to validate the research. In the Arts & Humanities, data is more about evidence for the production of new knowledge (which may well be more subjective, ephemeral and tacit). If data is not managed properly, it can quickly become lost or unusable, for example because of obsolete file formats or hardware.
REMEMBER: DO NOTHING + DATA = NO DATA