
Data Management

This guide deals with the management of research data.

Issues

Managing datasets can raise a number of issues, and how you address them may depend on your discipline. In general, be aware of the following areas: Ownership, Sensitive data/GDPR, File Formats, Versioning, Identifiers, Licensing, Data Citation and Security.

Ownership

The data creator is normally the data owner. However, in collaborative groups data ownership must be clearly established, and this should be done at the beginning of the research project. It is important to behave ethically and to follow community norms. It is equally important to attach a license to the data that clearly states the conditions of use and reuse.

Sensitive data

Sensitive data is data that can identify an individual, species or object and that introduces a risk of discrimination or unwanted attention. When dealing with human subjects, it is important that the researcher obtain “informed consent”. This means that the subject understands why the data is needed, how it will be anonymised and how it will be used and, importantly, is assured that the data will only be used for the specific project in hand. Access to the data must be controlled, and the way this is to be handled should be clearly stated in the data management plan. Remember that data can identify someone directly, through names and addresses, but also indirectly, through details such as salary, gender, workplace or sexual orientation. When using sensitive data, be general rather than specific and remove excessive detail.
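To illustrate "be general rather than specific", here is a minimal Python sketch that drops direct identifiers and coarsens indirect ones. The field names (name, address, age, salary, workplace) and the band widths are hypothetical choices for illustration only, not a recommended anonymisation standard; follow whatever approach your data management plan sets out.

```python
# Hypothetical record layout: field names and band widths are illustrative only.
DIRECT_IDENTIFIERS = {"name", "address"}


def band(value: int, width: int) -> str:
    """Replace an exact number with the range it falls in, e.g. 34 -> '30-39'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def generalise(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed
    and indirect identifiers made less specific."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = band(out["age"], 10)          # exact age -> 10-year band
    if "salary" in out:
        out["salary"] = band(out["salary"], 10000)  # exact salary -> 10,000 band
    out.pop("workplace", None)                       # excessive detail: drop it
    return out


if __name__ == "__main__":
    participant = {"name": "A. Person", "address": "1 Main St",
                   "age": 34, "salary": 42500, "workplace": "Acme Ltd",
                   "response": "agree"}
    print(generalise(participant))
    # {'age': '30-39', 'salary': '40000-49999', 'response': 'agree'}
```

Banding keeps the values useful for analysis while making it harder to single out one person; wider bands trade more protection for less detail.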

General Data Protection Regulation (GDPR)

The GDPR establishes certain rights for data subjects. These are the right to be informed, the right to access the data, the right to rectification, the right to erasure, the right to restrict processing, the right to data portability, the right to object to the use of the data and rights in relation to automated decision making and profiling.

File Formats

Aim for maximum accessibility: save text in .txt format, column (tabular) data in .csv and images in .jpg, .png or .tiff. It is best to avoid proprietary formats, as the software needed to open them may become obsolete. Also, be aware that Word and Excel files can behave differently when opened in other applications. Make the analysis code available in its plain-text source form, and ensure it is not wrapped in executables.

Files will need a Readme file that tells users of the data what to do (see Guidelines for Creating a Readme File). For example: How do they run the analysis? What do the column labels in the Excel file mean? What are the expected outputs? How will users know whether the analysis has worked correctly?
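As a minimal sketch of the advice above, the Python below writes tabular data to plain .csv with the standard `csv` module and generates a plain-text README.txt that explains each column label. The folder name, file names, columns and descriptions are all hypothetical; a full Readme would also explain how to run the analysis and what outputs to expect.

```python
import csv
from pathlib import Path

# Hypothetical example data and column descriptions.
COLUMNS = {
    "participant_id": "Anonymised participant code",
    "age_band": "Age range in years, e.g. 30-39",
    "response": "Answer to survey question 1 (agree/disagree)",
}
ROWS = [
    {"participant_id": "P001", "age_band": "30-39", "response": "agree"},
    {"participant_id": "P002", "age_band": "40-49", "response": "disagree"},
]


def export(folder: Path) -> None:
    """Write the data as plain CSV plus a plain-text Readme describing each column."""
    folder.mkdir(parents=True, exist_ok=True)
    with open(folder / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(COLUMNS))
        writer.writeheader()
        writer.writerows(ROWS)
    lines = ["Columns in data.csv:"]
    lines += [f"- {name}: {meaning}" for name, meaning in COLUMNS.items()]
    (folder / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    export(Path("dataset_export"))
```

Both files are plain text, so they can be opened without any proprietary software.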

Versioning

A new version is created when the data is reprocessed or corrected, or when new data is added. Use versioning to track the changes, as in v1, v2, etc. Decide how many versions to keep, and identify the milestone versions. Label the different versions with a systematic naming convention using numbers and dates. Designate a specific location (or locations) for storing the master and milestone versions.
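A systematic naming convention combining a version number and a date can be generated rather than typed by hand. The Python sketch below assumes a hypothetical pattern, `<project>_<description>_v<N>_<YYYY-MM-DD>.<ext>`; adapt it to whatever convention your project agrees on.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_v<N>_<YYYY-MM-DD>.<ext>
PATTERN = re.compile(r"_v(\d+)_\d{4}-\d{2}-\d{2}\.[A-Za-z0-9]+$")


def versioned_name(project: str, description: str, version: int,
                   ext: str, when: date) -> str:
    """Build a file name that follows the convention above."""
    return f"{project}_{description}_v{version}_{when.isoformat()}.{ext}"


def next_version(existing: list[str]) -> int:
    """Return the number after the highest version found in existing file names."""
    versions = [int(m.group(1)) for name in existing
                if (m := PATTERN.search(name))]
    return max(versions, default=0) + 1


if __name__ == "__main__":
    files = ["survey_responses_v1_2024-01-15.csv",
             "survey_responses_v2_2024-02-03.csv"]
    n = next_version(files)
    print(versioned_name("survey", "responses", n, "csv", date(2024, 3, 1)))
    # survey_responses_v3_2024-03-01.csv
```

Putting the date in ISO format (YYYY-MM-DD) means that file names also sort chronologically.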

Identifiers

Assign a DOI (digital object identifier) to the data; DOIs can be obtained from the library. A DOI is like a barcode for the dataset: it uniquely identifies it. DOIs are essential for interoperability, enabling computer systems to talk to each other. This is an example of a DOI.
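Every DOI has the same basic structure: the prefix "10." followed by a registrant code, then a forward slash and a suffix chosen by the registrant. A DOI can be turned into a link through the https://doi.org/ resolver. The Python sketch below checks that shape and builds the link. The DOI `10.1234/example.5678` is made up, and the check is only a loose structural test; it does not confirm that a DOI is actually registered.

```python
import re

# Loose structural check: "10." + registrant code, "/", then a non-empty suffix.
# This does not confirm that the DOI is actually registered.
DOI_SHAPE = re.compile(r"^10\.\d+(\.\d+)*/\S+$")


def looks_like_doi(doi: str) -> bool:
    return bool(DOI_SHAPE.match(doi.strip()))


def doi_link(doi: str) -> str:
    """Turn a bare DOI into a resolvable link via the doi.org resolver.
    Assumes the suffix contains no characters that need URL-encoding."""
    if not looks_like_doi(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi.strip()}"


if __name__ == "__main__":
    print(doi_link("10.1234/example.5678"))  # hypothetical DOI
    # https://doi.org/10.1234/example.5678
    print(looks_like_doi("example.5678"))    # False: no "10." prefix
```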

 

 

Licensing

Facts and data are not protected by copyright law, but metadata and the arrangement of data sometimes are. For example, if the data is part of a database package, it will be protected under the copyright of the database's creator. So do not rely on copyright; instead, make sure the data has a license attached to it. A license stipulates the conditions under which the data can be used and reused. It is preferable to use an open license, as this facilitates discovery and sharing. Creative Commons (https://creativecommons.org/licenses) has a set of licenses based on attribution of the creator/author.

The six main licenses are:

CC BY (Attribution): This license lets others distribute, remix, tweak and build upon your work, even for commercial purposes, as long as they credit you. This is the most open of the Creative Commons licenses.

CC BY-SA (Attribution-ShareAlike): This license lets others remix, tweak and build upon your work, even for commercial purposes, as long as they credit you and license their new creations under identical terms. All new works based on yours will carry the same license.

CC BY-ND (Attribution-NoDerivatives): This license lets others reuse the work for any purpose, including commercially. However, it cannot be shared with others in adapted form, and credit must be given to you as the author.

CC BY-NC (Attribution-NonCommercial): This license lets others remix, tweak and build upon your work non-commercially. Although their new works must acknowledge you and be non-commercial, they do not have to license their derivative works on the same terms.

CC BY-NC-SA (Attribution-NonCommercial-ShareAlike): This license lets others remix, tweak and build upon your work non-commercially, as long as they credit you and license their new creations under identical terms.

CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives): This license is the most restrictive of the six main licenses, only allowing others to download your works and share them with others as long as they credit you; they cannot change them in any way or use them commercially.


A CC0 license puts material into the public domain, which means the creator waives all rights to the data.

There is a search tool (https://search.creativecommons.org/) on the Creative Commons site (https://creativecommons.org/licenses/) that will help you to decide on the appropriate license.

Data Citation

A data citation should contain the following:

Name of the creator(s), title, DOI (digital object identifier), date published and version number (if appropriate), date and time of access, and the name of the distributor (for example, if you were citing something accessed on Arrow, TU Dublin would be the distributor).

Correct citation makes your data discoverable and allows its citations to be tracked; the DOI is invaluable for both.
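The citation elements listed above can be assembled mechanically. The Python sketch below uses a hypothetical `DatasetRecord` type with one possible ordering and punctuation; it is not a specific citation style, so check the style your publisher or department requires. The creator names, title and DOI in the example are invented placeholders.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical container for the citation elements listed above."""
    creators: list[str]
    title: str
    doi: str
    published: date
    distributor: str
    version: Optional[str] = None


def cite(rec: DatasetRecord, accessed: datetime) -> str:
    """Assemble creators, publication date, title, version (if any),
    distributor, DOI link and access date/time into one string."""
    text = f"{'; '.join(rec.creators)} ({rec.published.isoformat()}). {rec.title}."
    if rec.version:
        text += f" Version {rec.version}."
    text += f" {rec.distributor}. https://doi.org/{rec.doi}"
    text += f" (accessed {accessed:%Y-%m-%d %H:%M})."
    return text


if __name__ == "__main__":
    record = DatasetRecord(["Doe, J.", "Roe, R."], "Survey responses",
                           "10.1234/example.5678", date(2023, 5, 10),
                           "TU Dublin", version="2")
    print(cite(record, datetime(2024, 3, 1, 14, 30)))
```

Writing the DOI as a full https://doi.org/ link makes the citation clickable and easier for citation-tracking services to pick up.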
 

Security

Your data is a digital asset and should be protected at all times. Keep multiple backups in multiple locations (on site and off site). The data management plan should name the person responsible for securing the data. One safe option is to place the data on Arrow with an embargo date, so that the files will not be released until a specified date. Remember that USB keys and services such as Dropbox are not secure.