Guidelines for documenting data
Why is Good Documentation Important?
Maintaining comprehensive documentation of the data creation
process, and of the steps taken during it, requires a significant
but worthwhile investment of time and resources. Documentation
is more effective if it is produced during, rather than after,
a data creation project. This approach results in a better-quality
data collection as well as better-quality documentation, because
maintaining proper documentation demands consistency and attention
to detail. Documenting a data creation project can also help
to refine research questions, and it can be a vital aid to
communication in larger projects.
Good documentation is crucial to a data collection's long-term
vitality: without it, the resource will not be suitable for
future use and its provenance will be lost. Proper documentation
contributes substantially to a data collection's scholarly
value. The elements essential to good documentation are described
below. At a minimum, documentation
should provide information about a data collection's contents,
provenance and structure, and the terms and conditions that
apply to its use. It needs to be sufficiently detailed to
allow the data creator to use the resource in the future,
when the data creation process has started to fade from memory.
It also needs to be comprehensive enough to enable others
to explore the resource fully, and detailed enough to allow
someone who has not been involved in the data creation process
to understand the data collection and the process by which
it was created.
A description of the contents of the data collection should
be provided in sufficient detail to allow any potential user
to assess whether it is suitable for their needs. This factual
description should include, where applicable:
- Title, which describes the contents and gives an indication of the temporal and geographic coverage
- Main types of information it contains
- Strengths and weaknesses
- Time period(s) covered, including details of any data which only partially cover the time period
- Periodicity of the data collection (e.g. monthly, annual, decennial)
- Name(s) of the country, region, county, town or village covered. If the names or the administrative units were different during the time period covered by the data collection, document those names or administrative units and their present-day equivalents
- Types of spatial units that can be used to analyse the data collection
- Language(s) used
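Where a collection is deposited with machine-readable metadata, the contents elements listed above can also be kept as a structured record that is easy to check for gaps. The Python sketch below is only an illustration: the class name `ContentsDescription`, its field names and all sample values are hypothetical and do not follow any particular metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContentsDescription:
    """Hypothetical record of the contents elements listed above."""
    title: str
    main_information_types: list[str]
    strengths_and_weaknesses: str
    time_periods: list[str]
    partial_coverage_notes: str        # data that only partly cover a period
    periodicity: str                   # e.g. "monthly", "annual", "decennial"
    places: dict[str, str]             # historical name -> present-day equivalent
    spatial_units: list[str]
    languages: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Return the names of elements left empty, as a completeness check."""
        return [name for name, value in asdict(self).items() if not value]


# All values below are placeholders, not a real collection.
example = ContentsDescription(
    title="Hypothetical household listings for Example Parish, 1851-1901",
    main_information_types=["household composition", "occupation"],
    strengths_and_weaknesses="",       # deliberately left empty
    time_periods=["1851-1901"],
    partial_coverage_notes="Hypothetical: one year covers only part of the parish",
    periodicity="decennial",
    places={"Example Parish": "Example Town"},
    spatial_units=["parish", "enumeration district"],
    languages=["English"],
)

print(example.missing_elements())  # ['strengths_and_weaknesses']
print(json.dumps(asdict(example), indent=2))
```

A completeness check like `missing_elements` flags undocumented elements before deposit. It cannot judge whether an entry is adequate, so the record still needs human review.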
The provenance of a data collection needs to be documented
in detail. This information should include how, why, when
and by whom the data collection was created and used.
Who created the data collection and why?
A data collection's intellectual context should be documented
thoroughly enough to enable someone who has not been involved
in the project to understand the intellectual framework in
which it was created. This information should include:
- Other title(s) and reference number(s) that have been used to identify the data collection during the data creation process
- Name(s), affiliation(s) and role(s) of all the individual(s) or organisation(s) who have been involved in the data creation process
- Names of any organisation(s) or individual(s) that funded the creation of the data collection, with grant numbers and titles where appropriate
- Description and history of the research project (or other process) which gave rise to the data collection, including the main aims, objectives and topics of research
- Description and history of how the data collection has been used
- Bibliographic references for any publications based upon or about the data collection
- Bibliographic references to any related data collections
How was the data collection created?
The way in which a data collection was created should be
described in sufficient detail to allow any potential user
to understand the steps that were taken. This information
should include:
- How and why the methods used and the structure and format of the data collection were chosen
- Hardware and software used to create the data collection, and whether it has at any point been converted to new systems or formats
- Dates relating to the creation of the data collection, including any dates when it was significantly amended
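Details of hardware, software and amendment dates are easiest to record as the work proceeds. The sketch below assumes that at least part of the processing is done with Python scripts. It uses the standard `platform` module to capture the environment of the machine running the script and keeps a simple change log. The function names `environment_record` and `log_change`, and the sample entries and dates, are hypothetical illustrations, not a standard.

```python
import json
import platform
from datetime import date
from typing import Optional


def environment_record(software_notes: dict[str, str]) -> dict:
    """Capture the environment of the machine running this script.

    Tools used outside Python (e.g. a spreadsheet or database package)
    cannot be detected here, so they are noted by hand in software_notes.
    """
    return {
        "recorded_on": date.today().isoformat(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "other_software": software_notes,
    }


# Hypothetical change log: one entry per significant amendment or conversion.
change_log: list[dict[str, str]] = []


def log_change(description: str, when: Optional[date] = None) -> None:
    """Append a dated entry; defaults to today's date."""
    change_log.append({"date": (when or date.today()).isoformat(),
                       "change": description})


# Illustrative entries only.
log_change("Initial data entry completed", date(2024, 1, 15))
log_change("Converted from spreadsheet to delimited text", date(2024, 3, 2))
record = environment_record({"data entry": "spreadsheet package (version noted by hand)"})
print(json.dumps({"environment": record, "changes": change_log}, indent=2))
```

Conversions to new systems or formats appear as ordinary change-log entries, so the dates of significant amendments and format migrations are kept in one place.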
Which sources were used to create the data collection?
Detailed information about the source(s) used to create the
data collection should be provided so that any user can trace
the data collection back to its original source(s) and understand
the relationship between the data collection and the source(s).
This information should include:
- List of sources, including archival or bibliographic references
- Purpose, scope, content, provenance, administration and history of the source(s), including any unusual or inconsistent features such as the destruction or separation of parts of the source
- Bibliographic references to works that describe the source(s)
- Details of how the source(s) have been converted to digital form, including completeness of transcription, sampling and selection methods, standardisation procedures, and the use of mark-up, classification and coding schemes
- Details of the relationship between the data collection and the source(s), including a photocopy or image of each source with an example showing how it is represented in the data collection
It is essential that the structure, form and organisation
of a data collection be described fully. This information
should include:
- List of files and tables, with information about their contents, number of records and fields, and the way in which they relate both to each other and to the source
- List of field names used in each file, with information about the characteristics of each field (name, contents, field length, data type and any codes used), the way in which the fields relate to each other and to the source, and details of any derived variables
- Format of the data collection, including the delimiters used in delimited ASCII files
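A field-level description of this kind is often called a codebook or data dictionary. It is more useful if it can also be checked against the data files themselves. The Python sketch below keeps a small codebook (field name, contents, data type, maximum length, codes and derivation), renders it as a plain-text table, and checks a delimited text file against it using the standard `csv` module. The names `FieldSpec`, `codebook_table` and `check_file`, the two supported data types and the sample fields are hypothetical choices made for illustration.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldSpec:
    """Hypothetical codebook entry for one field (column)."""
    name: str
    contents: str
    data_type: str                     # "integer" or "text" in this sketch
    max_length: int
    codes: dict[str, str] = field(default_factory=dict)  # code -> meaning
    derived_from: Optional[str] = None  # source field(s) of a derived variable


def codebook_table(fields: list[FieldSpec]) -> str:
    """Render the codebook as a tab-separated table for the documentation."""
    lines = ["name\ttype\tlength\tcontents\tcodes\tderived from"]
    for f in fields:
        codes = "; ".join(f"{k}={v}" for k, v in f.codes.items())
        lines.append(f"{f.name}\t{f.data_type}\t{f.max_length}\t"
                     f"{f.contents}\t{codes}\t{f.derived_from or ''}")
    return "\n".join(lines)


def check_file(text: str, fields: list[FieldSpec], delimiter: str) -> list[str]:
    """Check a delimited file against the codebook; return a list of problems."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = [f.name for f in fields]
    if header != expected:
        problems.append(f"header {header} does not match codebook {expected}")
        return problems
    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(fields):
            problems.append(f"line {row_no}: expected {len(fields)} fields, got {len(row)}")
            continue
        for f, value in zip(fields, row):
            if len(value) > f.max_length:
                problems.append(f"line {row_no}: {f.name} longer than {f.max_length}")
            if f.data_type == "integer" and not value.isdigit():
                problems.append(f"line {row_no}: {f.name} is not an integer")
            if f.codes and value not in f.codes:
                problems.append(f"line {row_no}: {f.name} has undocumented code {value!r}")
    return problems


# Hypothetical codebook and a small pipe-delimited file to check.
fields = [
    FieldSpec("id", "Record identifier", "integer", 6),
    FieldSpec("sex", "Sex as recorded in the source", "text", 1,
              {"M": "male", "F": "female", "U": "unknown"}),
    FieldSpec("age", "Age in years as recorded", "integer", 3),
]
data = "id|sex|age\n1|M|34\n2|X|7a\n"
print(codebook_table(fields))
print(check_file(data, fields, delimiter="|"))
# ["line 3: sex has undocumented code 'X'", 'line 3: age is not an integer']
```

The delimiter is passed explicitly rather than guessed because it belongs in the documentation: the same codebook and delimiter that the documentation states are the ones used to validate the file.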
It is important that all the terms and conditions that apply
to the use of a data collection are fully documented. In particular,
copyright and other intellectual property rights must be clearly
established, and the name(s) of the copyright holder(s) must be
specified both for the data collection and for the original
source material. If the collection was created during your
work as an employee, the copyright holder will normally be
your employer, under the terms of your contract of employment.
Give full details if copyright is held jointly, if there are
multiple copyrights, or if the collection is covered by Crown
copyright. For further information about copyright, see the
AHDS Copyright FAQ.