
Guidelines for documenting data

Why is Good Documentation Important?

Maintaining comprehensive documentation of the data creation process and the steps taken requires a significant but worthwhile investment of time and resources. Documentation is more effective when it is generated during, rather than after, a data creation project. This approach results in a better-quality data collection as well as better-quality documentation, because maintaining proper documentation demands consistency and attention to detail. Documenting a data creation project can also help to refine research questions, and it can be a vital aid to communication in larger projects.

Good documentation is crucial to a data collection's long-term vitality: without it, the resource will not be suitable for future use and its provenance will be lost. Proper documentation contributes substantially to a data collection's scholarly value. The elements essential to good documentation are described below. At a minimum, documentation should provide information about a data collection's contents, provenance and structure, and the terms and conditions that apply to its use. It needs to be sufficiently detailed to allow the data creator to use the resource in the future, when the data creation process has started to fade from memory. It also needs to be comprehensive enough to enable others to explore the resource fully, and detailed enough to allow someone who has not been involved in the data creation process to understand the data collection and the process by which it was created.

Guidelines for Documenting a Data Collection

Contents

A description of the contents of the data collection should be provided in sufficient detail to allow any potential user to assess whether it is suitable for their needs. This factual description should include, where applicable:

  • Title, which describes the contents and gives an indication of the temporal and geographic coverage

  • Main types of information it contains

  • Strengths and weaknesses

  • Time period(s) covered, including details of any data which only partially cover the time period

  • Periodicity of the data collection (e.g. monthly, annual, decennial)

  • Name(s) of the country, region, county, town or village covered. If the names or the administrative units were different during the time period covered by the data collection, document those names or administrative units and their present-day equivalents

  • Types of spatial units that can be used to analyse the data collection

  • Language(s) used
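One way to keep such a description complete is to hold it in a small structured record and check which items are still blank. The following is a minimal sketch in Python (3.9 or later), not a prescribed format: the class name, the field names and the sample title are all assumptions invented for illustration, chosen to mirror the checklist above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ContentDescription:
    """Hypothetical record mirroring the contents checklist; names are illustrative."""
    title: str = ""
    main_information_types: list[str] = field(default_factory=list)
    strengths_and_weaknesses: str = ""
    time_periods: list[str] = field(default_factory=list)
    periodicity: str = ""
    # Maps each historic place name to its present-day equivalent.
    places_covered: dict[str, str] = field(default_factory=dict)
    spatial_units: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)


def missing_items(description: ContentDescription) -> list[str]:
    """Return the names of checklist items that are still empty."""
    return [f.name for f in fields(description) if not getattr(description, f.name)]


# Invented example values, used only to show the check in action.
example = ContentDescription(
    title="Parish burial register sample, 1800-1850 (invented example)",
    periodicity="annual",
    languages=["English"],
)
print(missing_items(example))
```

Because the checklist applies "where applicable", an item reported as missing is a prompt to review it rather than an error: some entries may legitimately stay empty for a given data collection.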

Provenance

The provenance of a data collection needs to be documented in detail. This information should include how, why, when and by whom the data collection was created and used.

Who created the data collection and why?

A data collection's intellectual context should be documented thoroughly enough to enable someone who has not been involved in the project to understand the intellectual framework in which it was created. This information should include:

  • Other title(s) and reference number(s) that have been used to identify the data collection during the data creation process

  • Name(s), affiliation(s) and role(s) of all the individual(s) or organisation(s) who have been involved in the data creation process

  • Names of any organisation(s) or individual(s) that funded the creation of the data collection, with grant numbers and titles where appropriate

  • Description and history of the research project (or other process) which gave rise to the data collection, including the main aims, objectives and topics of research

  • Description and history of how the data collection has been used

  • Bibliographic references for any publications based upon or about the data collection

  • Bibliographic references to any related data collections

How was the data collection created?

The way in which a data collection was created should be described in sufficient detail to allow any potential user to understand the steps that were taken. This information should include:

  • How and why the methods used and the structure and format of the data collection were chosen

  • Hardware and software used to create the data collection, and whether it has at any point been converted to new systems or formats

  • Dates relating to the creation of the data collection, including any dates when it was significantly amended

Which sources were used to create the data collection?

Detailed information about the source(s) used to create the data collection should be provided so that any user can trace the data collection back to its original source(s) and understand the relationship between the data collection and the source(s). This information should include:

  • List of sources, including archival or bibliographic references

  • Purpose, scope, content, provenance, administration and history of the source(s), including any unusual or inconsistent features such as the destruction or separation of parts of the source

  • Bibliographic references to works that describe the source(s)

  • Details of how the source(s) have been converted to digital form, including: completeness of transcription, sampling and selection methods, standardisation procedures, and the use of mark-up, classification and coding schemes

  • Details of the relationship between the data collection and the source(s), including a photocopy or image of each source, with an example showing how it is represented in the data collection

Structure

It is essential that the structure, form and organisation of a data collection be described fully. This information should include:

  • List of files and tables with information about their contents, number of records and fields, and the way in which they relate both to each other and to the source

  • List of field names used in each file with information about the characteristics of each field, including name, contents, field length, data type and any codes used, and information about the way in which the fields relate to each other and to the source, including details of derived variables

  • Format of the data collection, including the delimiters used in delimited ASCII files
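Part of this structural information can be drafted automatically from the data files themselves. The sketch below, in Python, reads a delimited text file and reports the number of records together with each field's name, an inferred data type, its maximum length and its number of distinct values. The function names, the type labels and the pipe-delimited sample data are assumptions made for illustration; the type inference is deliberately simple and would need checking against the real data.

```python
import csv
import io


def infer_type(values):
    """Classify a column as 'integer', 'decimal', 'text' or 'empty'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    # Try the strictest type first; fall through on the first value that fails.
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for value in non_empty:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def describe_fields(handle, delimiter=","):
    """Return (record count, per-field summaries) for a delimited file with a header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    summary = []
    for index, name in enumerate(header):
        # Short rows are padded with empty strings rather than raising an error.
        column = [row[index] if index < len(row) else "" for row in rows]
        summary.append({
            "name": name,
            "data_type": infer_type(column),
            "max_length": max((len(v) for v in column), default=0),
            "distinct_values": len(set(column)),
        })
    return len(rows), summary


# Invented sample data, pipe-delimited, with one missing age value.
sample = io.StringIO("id|surname|age|occupation_code\n1|Smith|34|12\n2|Jones||7\n")
record_count, field_summary = describe_fields(sample, delimiter="|")
print(record_count)
for entry in field_summary:
    print(entry)
```

A generated summary of this kind is only a starting point. The meaning of each field, the codes used, details of derived variables and the relationship of each field to the source still have to be written by someone who knows the data collection.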

Terms and conditions

It is important that all the terms and conditions that apply to the use of a data collection are fully documented. In particular, copyright and other intellectual property rights must be clearly established, and the name(s) of the copyright holder(s) both for the data collection and for the original source material must be specified. If the collection was created during your work as an employee, the copyright holder will normally be your employer under your contract of employment. Give full details if copyright is held jointly, if there are multiple copyrights, or if the collection is covered by Crown copyright. For further information about copyright see the AHDS Copyright FAQ.


  Page last updated 25 March 2008
© Copyright 2003-2012 University of Essex. All rights reserved.