Research data management
As a researcher you are responsible for efficiently and responsibly carrying out your research and for communicating the results of research effectively. Good data management can save time and give you peace of mind that your data is not at risk.
Before the project starts
When you are applying for grants and planning your project, the Library can help you find information about your legal and professional obligations. Document your decision-making, including costings, for your grant application.
Review the data management and data sharing requirements of funding agencies, taking particular note of how compliance will be assessed and the consequences of non-compliance. Review the data management requirements of partner organisations, particularly commercial organisations.
Consider the expectations of researchers in your discipline and in other disciplines, and how these might affect the way you manage (and, where possible, share) your data.
Resources
Seek advice from Cross Connect or contact your Librarian.
Data can be an important research output in its own right, as well as providing supporting evidence for published findings. In some disciplines, the availability of data has led to a quantifiable increase in the number of citations for a related publication. Internationally, infrastructure and services are emerging to support the citation of datasets.
When planning a project, consider:
- the audiences for your research and how they could make use of the data you will be collecting - is your work of interest to policy makers, not-for-profit agencies, the commercial sector or the general public, as well as to other researchers?
- the data management and data sharing requirements of journals you might publish in
- the availability of data journals for your discipline for publishing data outputs
- how you could use data to communicate your results more effectively - data in raw and visualised forms adds interest to your publications and conference presentations
- whether an institutional repository or subject repository can disseminate your data - these services assign your data a Digital Object Identifier (DOI) that will help with citation and impact tracking, and provide information about your data to search engines like Google Scholar and registries like Research Data Australia. Depositing in repositories and archives is covered in more detail under 'Towards the end of your project'.
Incorporate your data dissemination plans into the sections of grant application forms dealing with publication and research impacts and/or data sharing.
Resources
- A list of data journals (PREPARDE project, University of Leicester)
- Find Data (Australian National Data Service)
Seek advice from Cross Connect or contact your Librarian.
If possible, establish data management-related costs and include these in the proposed budget of your grant application, if the funding rules allow.
- Costing Tool: Data Management Planning (pdf: UK Data Archive)
Funders may request information about how you plan to manage data, either as part of a grant application or as a separate document. If the funder does not require a formal data management plan, you can record data planning information in an internal document, which should cover the following types of information:
- what types of data will be created
- who will own and have access to the data
- what facilities, equipment and methods will be used to capture and process the data
- where data will be stored during the project and after the project is completed
- how data could be shared or published and what conditions of re-use will apply
- who will be responsible for each of these activities.
All partners should be involved in the development and sign-off of a data plan. You can also document your data planning in a variety of other places, including funding and collaboration agreements, ethics applications, and annual reports to funding agencies. Treat these documents as corporate records and retain and dispose of them appropriately.
Resources
- ARC Funding rules and grant guidelines
- Data Management Plans (Australian Research Data Commons ARDC)
Seek advice from the Office of the DVCR.
At the start of the project
Work out who owns the data and how long it needs to be kept. Ethical commitments and the consent you seek from your participants will affect what you can do with the data later, so consider potential data sharing and re-use scenarios well before data is collected or acquired.
For data that you create or collect, you need to:
- Determine what rights, including copyright, will subsist in the data produced by the project
- Establish who will be the rights holder/s for the data. As an SCU staff member, most data you produce will be owned by the University. As a Higher Degree by Research student, you would usually own the data you generate; however, you should be aware that in certain cases you must assign your IP to the University.
- Consider what terms and conditions should be applied to the data for re-use.
As an SCU staff member, you may have permission from SCU to re-use most of your scholarly works (including data) for research and teaching purposes, and to make decisions about re-use using your professional judgment, subject to Part D of the University's Intellectual Property Policy. As a Higher Degree by Research student, unless you have assigned IP to the University, decisions about the re-use and licensing of your data are yours to make.
Resources
- SCU Intellectual Property Policy
- Good research data practices (Australian National Data Service)
For data that you are sourcing from elsewhere, you need to:
- Establish the rights holder
- Establish the terms and conditions of re-use granted by the rights holder/s and assess whether your re-use fits within these. To establish the terms and conditions of re-use, you will need to:
  - Find and keep a copy of any 'express permission' that the rights holder has given. This will usually be a licence or a set of standard terms and conditions that apply to the process by which you have obtained the data, such as downloading from websites and online data archives
  - OR, if no express permission is given that enables you to establish terms and conditions of re-use, seek permission from the rights holder directly.
Resources
- SCU Intellectual Property Policy
- Good research data practices (Australian National Data Service)
Before the project starts, work out the minimum retention period for the data, using the retention periods set out by NSW State Records in GA47.
To work out the maximum retention period, consider the longer-term value of the data in light of its potential research impact and other factors, such as whether:
- the research would be difficult or impossible to repeat
- repeating the research would be burdensome for human participants or animals
- the results are of high public interest or contention
- methods or results constitute a paradigm shift for the field of inquiry, or
- the research will result in notifiable intellectual property (e.g. a patent application).
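Once you know the period that applies, the date from which data could be considered for disposal is simple date arithmetic. The Python sketch below is illustrative only: it assumes the period is counted from the date the project is completed, which may not be the trigger GA47 specifies for your records, and the number of years must be taken from GA47 rather than from this example. The function names are hypothetical.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Add whole years to a date. 29 February becomes 28 February
    when the target year is not a leap year."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_disposal_date(completion: date, minimum_years: int,
                           extra_years: int = 0) -> date:
    """Earliest date the data could be considered for disposal.

    Assumption: the retention period runs from project completion;
    check GA47 for the trigger event that actually applies.
    minimum_years comes from GA47; extra_years reflects any longer
    retention you decide on because of the data's long-term value.
    """
    return add_years(completion, minimum_years + extra_years)
```

For example, with a hypothetical five-year minimum and ten extra years for a study that would be difficult to repeat, earliest_disposal_date(date(2024, 6, 30), 5, extra_years=10) gives 30 June 2039.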
In addition to the data, you also need to retain any corporate records related to the research data that you are generating.
Resources
- Seek advice from Corporate Records Management on disposal and retention of research data.
When completing your ethics application consider data management, and in particular data sharing and re-use, in the context of privacy, confidentiality and consent, cultural sensitivity, and community-based research.
Be explicit in your ethics application about any plans you have to make data available to other researchers or more broadly. Describe your strategies for protecting privacy and confidentiality, e.g. by ensuring:
- that participants will not be identifiable, or
- that informed consent will be sought from participants for the proposed data re-use, or
- that access controls or re-use agreements will be in place.
Be explicit in your consent forms about any plans to make data available, who will be able to access the data, and how the data would be accessed and potentially re-used.
You may enhance your ability to share data later if you identify broad types of access rather than specific services that may be unsuitable or unavailable in the future. For example, saying that you will publish data 'through web-based institutional or subject archives or repositories' will give you more flexibility than if you specify a single repository or archive that may not be available in the future.
Resources
- Working with sensitive data (Australian National Data Service)
- Ethical obligations (UK Data Archive)
- AIATSIS Code of Ethics (Australian Institute of Aboriginal and Torres Strait Islander Studies)
Seek advice from the Office of the DVCR.
During the project
Collect data in formats that are long-lasting. The Library provides training in EndNote, and can help you with information about format obsolescence, digital data preservation and documentation organisation. Data that is organised and well-documented is easier to find and use. Regularly assess your options for storing your data and moving it around. If your data is lost, stolen or misused you will lose valuable work and damage your reputation as a researcher.
Assess the durability of the file formats you will use by considering if the format is:
- endorsed and published by standards agencies such as Standards Australia or ISO
- publicly documented, i.e. complete authoritative specifications are available
- the product of collaborative development and consultative processes
- widely used and accepted as best practice within your discipline or other user communities.
You should also assess the long-term accessibility of any hardware and software used to create and manipulate research data.
If you develop software as part of your research, follow available best practice guidelines for developing, releasing and licensing your software.
Resources
- Working with research data and Community-endorsed data standards (Australian National Data Service)
- Digital preservation and curation - the danger of overlooking software (Software Sustainability Institute, UK)
Seek advice from Technology Services if required.
Digital data
You should only store master copies of digital research data on:
- SCU systems, e.g. enterprise systems such as O365 OneDrive
- SCU-approved storage services for the Australian research sector.
Consult Technology Services if you need advice about secure storage options. Technology Services can refer you to SCU storage experts and authorised off-site providers. Gathering the following information will help you explain your needs to Technology Services staff:
- current data volume - total size in MB/GB/TB - and likely rate of growth
- number of files and folders, and how they are organised
- location of your workspace/s, e.g. office, lab, home, in the field
- platform - Mac / Windows / Linux
- applications used to access and work with your data
- frequency of update, e.g. working data that changes daily, or data from a completed project that needs to be retained but would not be used often
- data type/s: spreadsheets, database, documents, images, datasets, etc.
- any special security needs, e.g. clinical data, personal data, commercial potential
- access control: Who needs access? Are they from SCU? If not, are they based in Australia or overseas? At universities or at other types of organisations?
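Some of this information can be gathered automatically. The Python sketch below walks a folder and reports its total size, the number of files and folders, and a count of files by extension (a rough proxy for data types). It is illustrative only: the function name is hypothetical, and sizes are reported in decimal gigabytes (1 GB = 1,000,000,000 bytes).

```python
from collections import Counter
from pathlib import Path


def summarise_folder(root: str) -> dict:
    """Report figures Technology Services is likely to ask about:
    total size, number of files and folders, and a count of file types."""
    total_bytes = 0
    n_files = 0
    n_folders = 0
    extensions = Counter()
    # rglob("*") visits every file and sub-folder below root
    for path in Path(root).rglob("*"):
        if path.is_dir():
            n_folders += 1
        elif path.is_file():
            n_files += 1
            total_bytes += path.stat().st_size
            # Group by lower-case extension; files without one count as "(none)"
            extensions[path.suffix.lower() or "(none)"] += 1
    return {
        "total_bytes": total_bytes,
        "total_gb": total_bytes / 1_000_000_000,  # decimal gigabytes
        "files": n_files,
        "folders": n_folders,
        "types": dict(extensions.most_common()),
    }
```

Running it on the same folder every few months (for example summarise_folder("D:/fieldwork"), where the path is a placeholder) also gives you a simple record of the rate of growth.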
Desktop and laptop computers
You should not store master copies of digital data on individual desktop or laptop computers. You should treat these as convenient working areas but not as primary stores.
Local drives fail and are often not backed-up. Local machines are regularly replaced, upgraded, allocated to other people and stolen - data is at risk of being lost or inappropriately accessed.
If you store additional working copies on local computers, schedule automatic synchronisation and/or backups and password-protect and physically secure the machines.
Removable media
You should not store master copies of digital data on removable media like CDs and DVDs, flash memory devices (e.g. USB sticks), and portable hard drives. These are:
- not always long-lasting, especially if they are not stored correctly (CDs/DVDs)
- easy to damage physically (e.g. through magnetism or shocks)
- prone to errors in writing to the media ('burning')
- a risk in terms of data security - they are easy to misplace or lose, usually are not password-protected and are an easy target for viruses and malware.
If you store additional working copies on removable media, schedule automatic synchronisation and/or regular backups. You should password-protect and encrypt the media and ensure they are as physically secure as possible.
Choose high-quality products, and follow the instructions provided by the manufacturer for care and handling, including environmental conditions and labelling.
Regularly check the media to make sure that they are not failing, and periodically 'refresh' the data (i.e. copy to a new disk, USB stick, or portable drive).
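A refresh is only useful if the new copy is identical to the old one. One common way to confirm this is to compare checksums (a short fingerprint calculated from a file's contents). The Python sketch below, which assumes Python 3.9 or later and uses hypothetical function names, copies a folder to new media and compares the SHA-256 checksum of each source file with its copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks
    so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(src_dir: str, dest_dir: str) -> list[str]:
    """Copy every file under src_dir to dest_dir, keeping the folder
    structure, then compare checksums of each original and its copy.
    Returns the relative paths of any files that did not match."""
    src, dest = Path(src_dir), Path(dest_dir)
    mismatches = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dest_file = dest / rel
        dest_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dest_file)  # copy2 also preserves timestamps
        if sha256_of(src_file) != sha256_of(dest_file):
            mismatches.append(str(rel))
    return mismatches
```

Because each copy is re-read straight after it is written, the operating system may return it from memory rather than from the new media. For a stronger check, keep the checksums and verify them again after disconnecting and reconnecting the device.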
Cloud services
With the exception of the research sector and enterprise solutions noted above, you must not store research data using services that are provided or managed externally to SCU by third parties. The reasons for this include:
- Protection of intellectual property: Some cloud services assert their ownership of the intellectual property in anything that is uploaded by users.
- Legal requirements: Storage of data that contains personal information outside Australia could be a breach of the Privacy Act.
- Risk management: The Terms and Conditions of some cloud services state that they will take no responsibility for data loss and that they can withdraw the service at any time. There are also documented security breaches of many of these systems.
Resources
- Cloud computing and the privacy principles (Office of the Information Commissioner Queensland)
Seek advice from Technology Services if required.
If you need to transfer research data, seek advice from Technology Services.
You should avoid using email for data transfer. Some of the limitations of email include:
- size restrictions - most institutions have limits on the size of emails and attachments (SCU MS Outlook service restricts you to 25MB)
- security risks - particularly if you are working with data that is personally or commercially sensitive and/or utilising personal accounts that may not meet legal and ethical requirements around privacy and confidentiality, and
- version control issues.
You should create and maintain sufficient documentation or metadata (i.e. structured information about the data) to enable research data to be identified, discovered, associated with its owners and creators, linked to other related data or publications, contextualised in time and space, and to have the quality of the data assessed and research results validated.
If you poorly document your data, it will be difficult (or impossible) to find it and manage it in the longer term. Even if you (or others, in future) can find the data, its value will be diminished if it is hard to interpret.
Practices will differ depending on your discipline, but you should always ensure that protocols are agreed early in the project and adopted by all researchers consistently.
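A metadata record can be as simple as a structured file kept alongside the data. The Python sketch below is hypothetical: its field names are illustrative and do not follow any particular metadata standard (your discipline or chosen repository may require one). The fields line up with the purposes listed above: identifying the data, associating it with its owners and creators, linking it to publications, placing it in time and space, and recording methods so that quality can be assessed.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """A minimal, hypothetical metadata record (not a formal standard)."""
    identifier: str          # unique, persistent ID (e.g. a DOI once assigned)
    title: str
    creators: list[str]      # people who created the data
    owner: str               # rights holder
    temporal_coverage: str   # e.g. "2023-01/2023-12"
    spatial_coverage: str    # place name or coordinates
    methods: str             # how data was captured and processed
    related_publications: list[str] = field(default_factory=list)
    licence: str = ""

    def missing_fields(self) -> list[str]:
        """Names of required fields that are still empty
        (related_publications and licence are treated as optional)."""
        optional = ("related_publications", "licence")
        return [name for name, value in asdict(self).items()
                if name not in optional and not value]

    def to_json(self) -> str:
        """Serialise the record, e.g. to save as a file next to the data."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)
```

Checking missing_fields() before a dataset is shared is one way to make an agreed documentation protocol easy for everyone on the project to follow.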
File naming for digital files
Digital file names can be important for identifying and finding digital files. You should develop file naming conventions early in a research project, and agree on these with colleagues and collaborators before data is created.
Conventions will differ depending on the nature and size of a research project. In all cases, filenames should be unique, persistent and consistently applied, if they are to be useful for finding and retrieving data.
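As an illustration, suppose a project agrees on the hypothetical convention PROJECT_YYYYMMDD_description_vNN.ext (for example SOIL_20240315_site3-moisture_v02.csv). A short Python script can then check that names follow the convention and flag duplicates before files are shared. The convention and function names below are examples only, not a required standard.

```python
import re
from datetime import datetime

# Hypothetical convention: PROJECT_YYYYMMDD_description_vNN.ext
PATTERN = re.compile(
    r"(?P<project>[A-Z0-9]{2,10})_"   # short upper-case project code
    r"(?P<date>\d{8})_"               # date as YYYYMMDD
    r"(?P<desc>[a-z0-9-]+)_"          # lower-case description, hyphens allowed
    r"v(?P<version>\d{2})"            # two-digit version number
    r"\.(?P<ext>[a-z0-9]+)"           # lower-case file extension
)


def check_name(filename: str) -> list[str]:
    """Return a list of problems with a file name (empty if it conforms)."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return ["does not match PROJECT_YYYYMMDD_description_vNN.ext"]
    problems = []
    try:
        datetime.strptime(match["date"], "%Y%m%d")  # rejects e.g. 20240230
    except ValueError:
        problems.append(f"invalid date {match['date']}")
    return problems


def find_duplicates(filenames: list[str]) -> set[str]:
    """Names that occur more than once, ignoring case, because some
    file systems treat 'A.csv' and 'a.csv' as the same file."""
    seen, dupes = set(), set()
    for name in filenames:
        key = name.lower()
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes
```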
Identifiers
An identifier is a reference number or name for a data object and forms a key part of your documentation and metadata. To be useful over the long-term, identifiers need to be:
- unique - globally unique if possible, but at the very least unique within your particular systems and processes, and
- persistent - the identifier should not change over time.
The standard for publicly available datasets is the Digital Object Identifier (DOI). Although DOIs have traditionally been used for electronically published journal articles, they can now be assigned to datasets. SCU can assign a DOI to a collection that you make available through the institutional repository ePublications@SCU.
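Until a DOI is assigned, you may need identifiers of your own within the project. The Python sketch below is a hypothetical example: it mints identifiers from a project prefix plus a random UUID (unique for all practical purposes) and keeps a register that maps each identifier to the data object's current location. When a file is moved, only its location changes; the identifier stays the same. This mirrors, in a very simplified way, how a DOI continues to resolve to a dataset after its web address changes.

```python
import uuid


class IdentifierRegister:
    """Hypothetical register mapping persistent identifiers to the
    current location of each data object."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._location: dict[str, str] = {}

    def register(self, location: str) -> str:
        """Mint a new identifier (prefix + random UUID) for an object."""
        identifier = f"{self.prefix}-{uuid.uuid4()}"
        self._location[identifier] = location
        return identifier

    def move(self, identifier: str, new_location: str) -> None:
        """Record a new location; the identifier itself never changes."""
        if identifier not in self._location:
            raise KeyError(f"unknown identifier {identifier}")
        self._location[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        """Return the current location of the object."""
        return self._location[identifier]
```

In practice the register itself would need to be stored securely and backed up, since losing it would break the link between identifiers and data.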
Controlled vocabularies
A vocabulary sets out the common language a discipline has agreed to use to refer to concepts of interest in that discipline. It models the concepts in a discipline by applying labels to the concepts and relating the concepts to each other in a formal structure.
Vocabularies take many forms. They include glossaries, dictionaries, gazetteers, code lists, taxonomies, subject headings, thesauri, semantic networks and ontologies.
Wherever possible, you should use an existing controlled vocabulary. Even if you need to adapt or customise an existing standard, this is preferable to creating something from scratch.
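A controlled vocabulary can be applied when keywords are recorded, by mapping the terms researchers actually type to the vocabulary's preferred terms. Thesauri commonly distinguish preferred labels from alternative labels (for example, SKOS prefLabel and altLabel). In the Python sketch below the vocabulary is invented for illustration; in practice you would load an existing vocabulary for your discipline, as recommended above.

```python
# Hypothetical vocabulary: preferred term -> accepted alternative terms
VOCABULARY = {
    "soil moisture": {"soil water content", "volumetric water content"},
    "rainfall": {"precipitation", "rain"},
}

# Reverse lookup from any label (preferred or alternative) to the preferred term
_LOOKUP = {pref: pref for pref in VOCABULARY}
_LOOKUP.update({alt: pref for pref, alts in VOCABULARY.items() for alt in alts})


def normalise_keywords(keywords: list[str]) -> tuple[list[str], list[str]]:
    """Map free-text keywords to preferred terms.
    Returns (preferred terms found, keywords not in the vocabulary)."""
    matched, unknown = [], []
    for keyword in keywords:
        term = _LOOKUP.get(keyword.strip().lower())
        if term is None:
            unknown.append(keyword)
        elif term not in matched:  # keep each preferred term once
            matched.append(term)
    return matched, unknown
```

Keywords returned as unknown can be reviewed: they may be spelling variants to correct, or genuine gaps to raise with the vocabulary's maintainers.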
Towards the end of your project
By depositing data in a repository (or archive), you can make sure that your data can be accessed and cited in the long term.
Before depositing, you should consider the implications of doing so, in terms of ownership of intellectual property, and ethical requirements like privacy and confidentiality.
Repositories differ in their discipline focus and the types of research data that they accept. It is common for repositories to specify some or all of the following:
- preferred formats that facilitate long-term access and preservation
- minimum standards for documentation and metadata that enhance the discoverability and usability of the data
- assurances from you, as the depositor, that storing the data and making it available will not infringe upon the rights of others, and
- your assignment of a licence that makes clear what rights re-users are granted.
Identifying a suitable repository for your data and discussing requirements with the repository staff is a valuable part of data planning.
Cross Connect
SCU staff have access to an institutional data repository that is not discipline-specific. The service is run by the Library and is suitable for a wide range of data. You can upload research data and make it openly accessible. You can also use the data repository to record and showcase:
- data that is hosted elsewhere - you provide metadata about the collection and links to the hosting site, and
- data that is not available online but may be accessed through negotiation with the collection custodian - you provide metadata about the collection and an access statement that tells users how to negotiate access.
Data and metadata that you choose to share publicly can be cited by others, and will be discoverable through Cross Connect, Research Data Australia, Google, Google Scholar and other services that expose your research to new audiences and potential collaborators.
Library staff will assist you in assessing and describing your data set.
When assessing your data set, consider:
- technical complexities - e.g. large volumes, high dependency between files, requirement for specialised hardware or software
- risk management
- data sensitivity and de-identification
- any special requirements the user community has about how data needs to be delivered.
Resources
- Digital repositories (Digital Curation Centre, UK)
Seek advice from crossconnect@scu.edu.au if required.
Other digital data repositories
In many disciplines, national or international repositories are available to support the long-term access to research data.
Re3Data is a searchable directory of research data repositories. In September 2014, almost 1,000 data repositories were listed in Re3Data.
In deciding whether to deposit to a repository outside SCU, consider the sustainability of the service (in terms of staffing, funding arrangements, and support from its host institution) and assess its level of support for and within your discipline.
If you add a metadata record to Cross Connect or your SCU Profile that links to the other archive or repository holding your data, your collection can still appear on Cross Connect or your SCU Profile as one of your research outputs.
Resources
- Re3Data (website)
When you disseminate data that you own or manage, you need to think about how you want others to reuse it. It is your responsibility to communicate clearly the terms and conditions that you want reusers of your data to follow.
All rights reserved: relying on the Copyright Act
You can reserve all your rights under the Copyright Act. This means people can view and download a copy of your data for private research and study only. They must credit you as the creator, and potential reusers would need to seek your permission for any other type of activity, including re-publishing.
While reserving all your rights can be useful for publications, in the case of data it can limit the research impact of your work by restricting other researchers from undertaking common activities such as deriving data or aggregating your data with other datasets.
If your goals in disseminating your data are to facilitate the greatest reuse possible, then applying an open licence will be more effective than relying on copyright legislation.
Some rights reserved: standard open licences
For openly accessible data, a standard licence is the most effective way of ensuring appropriate reuse. An open licence lets you reserve some rights as the owner of the material, but grant reusers more rights than they would have just under copyright legislation.
Adopting a standard licence is often a pre-condition to depositing in a repository or archive, but licences can also be applied to resources disseminated via the web or other means.
SCU researchers are encouraged to consider using open licences. Licences enable you to clearly indicate to others your wishes about how the data can be reused and how you want to be attributed.
The Australian National Data Service recommends AusGOAL (the Australian Governments Open Access and Licensing Framework), which has been endorsed as the preferred policy and licensing suite for government information across Australia. AusGOAL is now officially being extended into the research and innovation sector. AusGOAL's core is a suite of six standard Creative Commons (CC) licences that give you a great deal of flexibility in expressing your wishes. A good principle to apply is to use the least restrictive licence that is applicable to your data collection. If you want your data to be as widely used as possible, the Creative Commons Attribution Only licence (CC BY) is the most useful for that aim.
Some rights reserved: restricted licences and custom reuse agreements
If you would like to make data available only under certain conditions or by negotiation, you can use a restrictive licence or other written agreement (such as a Data Transfer Agreement). You might consider this when data contains personal or other confidential information, or if you want to impose some other condition such as a time limit on use or some form of payment.
Agreements of this kind could be constructed from a model template or developed especially to meet the requirements of a specific project.
A restricted licence provides you with more protection and enables you to be specific about terms and conditions, but it can also be time-consuming and, if legal advice is required, expensive.
No rights reserved: copyright waivers and public domain dedications
Some licences or agreements allow you to place your work in the public domain. When you apply these to your work, you waive all your rights and the protections offered by copyright, including the right to be credited as the creator.
You should think carefully before using a 'No rights reserved' licence. Standards and tools for data citation are emerging, and in future citation of data may be an important metric for research impact. Waiving your rights means that neither you nor SCU needs to be credited if data is reused.
If an archive or repository requires you to use a copyright waiver or public domain dedication, find out whether you can apply a "community norms" statement where practical: such statements are not legally binding, but they can signal your wishes to potential reusers.
Resources
- How to Licence Research Data (Digital Curation Centre, UK)
- Data Creator Flowchart (Australian Research Data Commons)
- Creative Commons Australia (website)
Seek advice from copyright@scu.edu.au.
When the required retention period has come to an end, you may need to destroy data to meet ethical requirements or because you determine the data no longer has any value.
The destruction process must be irreversible, meaning that there is no reasonable risk that any information may be recovered later. You must take extra care when dealing with records that contain sensitive information.
Print materials should be shredded and pulped. For non-sensitive materials, office shredders can be used. For sensitive materials, order a confidential waste bin through Facilities.
Data in digital formats must be processed so the information is irretrievable. These processes can include deleting or overwriting information, purging magnetic media through degaussing (exposure to a strong magnetic field), or destroying the physical media (e.g. CD-ROMs, DVDs).
Resources
Seek advice from the Corporate Records Unit if required.
You must not remove master copies of any working data that belongs to the University or to a third party with which the University has an agreement. The University's IP policy allows you to take a copy for teaching and research purposes; if you intend to use the data for other purposes (e.g. commercial), this should be agreed in writing with the Head of School or Research Centre.
Before leaving the University, you should arrange access for at least one other researcher or your Head of School or Research Centre to the data and any documentation relating to it.
Copies of completed data that you have deposited in Cross Connect can remain in the care of the University. They will continue to be found and cited using the Digital Object Identifier (DOI) assigned to the collection at the time of publishing.
You must remove from University systems any working data that belongs to you. On leaving SCU, it is your responsibility to ensure this data is stored and managed correctly, that the privacy and confidentiality of the data is kept intact, and that the data is deposited or disposed of appropriately at the end of the retention period.
Adapted from Best practice guidelines for researchers: Managing research data and primary materials by Griffith University which is licensed under a Creative Commons Attribution 4.0 International License.