
Discovering Digital Resources

Two Approaches to Data Description

Hans Joergen Marker

Data description is a prerequisite for the preservation of machine-readable data. Describing information is also a necessary part of archival activity: in order to be able to retrieve information, you need to describe it. Thus both the data archives and the general archives need means for describing data, and the problem has been addressed in both settings. The solutions proposed are very different, and thus worth investigating more closely.

The Standard Study Description

The Standard Study Description originates from the Social Science Data Archive tradition. It was developed in the early seventies. The data that it was intended to describe was flat file social science data, especially survey data. The mothers of the SSD were very conscious of the available tools for storage and retrieval of information. As these tools were fairly crude at the time, the SSD is a rather ugly thing. But appearance alone is not enough to judge the SSD by. The SSD is basically a file structure upon which applications can be built. The applications need not be ugly because the system files are. The information to be provided in the SSD is divided into items, which in turn have numbers. Most of the items are codes with a finite and often small number of allowed code values. The emphasis on coding was due not only to the traditions of social science data processing, but also to a deliberate attempt to make the study description independent of language. In the SSD the unit being described is a data material, which is supposed to consist of one rectangular file with certain characteristics. If a research project results in a data collection which is organised in another way, some items of the SSD become difficult to fill in (for instance 211, units of observation, and 212, number of units).
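The idea of numbered items holding coded values can be illustrated with a minimal sketch. The item numbers 202 and 212 are taken from the SSD item list; the code values, their labels in either language and the number of cases are invented for illustration and do not come from any real SSD code book. The point is only that the stored record contains item numbers and codes, so the same record can be displayed in any language for which a label table exists.

```python
# Hypothetical sketch: the item numbers come from the SSD item list
# (202 = kind of data, 212 = number of units), but the code values
# and the labels below are invented for illustration.
CODE_LABELS = {
    "en": {"202": {1: "survey data", 2: "census data"}},
    "da": {"202": {1: "surveydata", 2: "folketællingsdata"}},
}


def render(description, lang):
    """Render a coded study description using the label table for lang."""
    lines = []
    for item in sorted(description):
        value = description[item]
        # Coded items are looked up; uncoded items are shown as stored.
        label = CODE_LABELS[lang].get(item, {}).get(value, value)
        lines.append(f"{item} {label}")
    return lines


# The stored record holds only item numbers and codes (values invented).
study = {"202": 2, "212": 1500}

english = render(study, "en")
danish = render(study, "da")
```

Only the label tables differ between the two renderings; the stored description itself is untouched, which is what makes exchange between archives in different countries practical.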

As the SSD is a real-life thing which is actually being used, it is not surprising that the items of the SSD are used somewhat differently at different archives. The many items being offered are not always relevant to particular studies, and some items, though perhaps relevant, are rarely used because archive practices just do not work that way. Not many study descriptions have the 500-items filled in, even though most studies have the background variables sought for in this group of items. Viewing the collection of data at a data archive as consisting of separate items (studies), which are distinct and can be handled and described separately in a similar manner, is very much in line with traditional data archive thinking. Actually it forms the backbone of data archive practices. It was very true at the time when the SSD was conceived. But this point of view has since been challenged by reality. Essentially the challenge has been met with work-arounds, such as splitting the data produced by a particular research project into several studies.

A real life example with some problems

In the years 1971 to 1973 the Danish historian Hans Christian Johansen made a study of rural demography in the 18th century (1). The sources for this study were parish registers from the period 1741 to 1801 and the census lists from 1787 and 1801 from 26 selected country parishes. The information from these sources was keypunched and a family reconstitution was carried out. The data from this study were deposited at the DDA on May 7th 1976. These data were far from the single rectangular file surveys for which the SSD (and data archive practices in general) was designed. The work-around employed was to construct six studies:

DDA-0101: Christenings in Selected Rural Parishes, 1741-1801,

DDA-0102: Burials in Selected Rural Parishes, 1741-1801,

DDA-0103: Weddings in Selected Rural Parishes, 1755-1801,

DDA-0106: Reconstituted families in Selected Rural Parishes, 1741-1801,

DDA-0181: Census Register for Selected Rural Parishes 1787,

DDA-0182: Census Register for Selected Rural Parishes 1801.

The resulting SSDs can be obtained from the DDA web site (http://www.dda.dk/). Needless to say, these SSDs show some duplication. All the background stuff about principal investigator, conduct of the study, deposition at the archive, publications and accessibility is completely identical in the six study descriptions. Usually we do things a little differently today than we did 20 years ago, but the example illustrates the difficulties involved in applying the SSD very nicely.
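The duplication can be made concrete with a small sketch. The study numbers, titles, investigator and deposit date are those given above; the dictionary layout is an assumption made for illustration and is not the actual SSD file format, and only two of the shared background items (131 and 122) are shown.

```python
# Hypothetical layout: each study description is a dict of items.
# Items 131 (principal investigator) and 122 (deposit date) stand in
# for all the background information shared by the six studies.
SHARED = {"131": "Hans Christian Johansen", "122": "1976-05-07"}

TITLES = {
    "DDA-0101": "Christenings in Selected Rural Parishes, 1741-1801",
    "DDA-0102": "Burials in Selected Rural Parishes, 1741-1801",
    "DDA-0103": "Weddings in Selected Rural Parishes, 1755-1801",
    "DDA-0106": "Reconstituted families in Selected Rural Parishes, 1741-1801",
    "DDA-0181": "Census Register for Selected Rural Parishes 1787",
    "DDA-0182": "Census Register for Selected Rural Parishes 1801",
}

# In the flat approach every study carries its own copy of the background.
descriptions = {
    study_id: {"title": title, **SHARED} for study_id, title in TITLES.items()
}


def duplicated_items(descs):
    """Return the items whose value is identical in every description."""
    first, *rest = descs.values()
    return {
        item: value
        for item, value in first.items()
        if all(other.get(item) == value for other in rest)
    }


common = duplicated_items(descriptions)
stored_flat = sum(len(d) for d in descriptions.values())
stored_once = len(common) + len(descriptions)  # shared items once, plus titles
```

With only two shared items the flat layout already stores 18 values where 8 would do; with the full set of background items the ratio gets correspondingly worse.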

The General International Standard Archival Description

The ISAD(G)(2) is much younger than the SSD, and thus it is designed with much more advanced data processing in mind. Consequently it is not nearly as ugly as the SSD. The ISAD(G) was developed in the early nineties by an ad hoc commission sponsored by UNESCO and various national archives. The ISAD(G) is aimed at archival material in general and not, like the SSD, specifically at machine-readable files. The ISAD(G)(3) is a description hierarchy based on four rules. Within the framework of these rules a number of elements of description are applied. The consequence of this approach is that the archive holdings are viewed as a hierarchy, in the extreme case with the entire archive at the top and single items of information at the bottom. Between top and bottom the divisions of the hierarchy are the levels of description. In principle all elements of description could be used at all levels. A particular bit of information is given at the lowest level where it does not result in duplication.

So what about the rural parishes in ISAD(G)?

I haven't actually made a full ISAD(G) description of the data from Hans Christian Johansen's study of rural demography. I would propose a description on two levels, study and file, disregarding the variable level in this example. Another possible level of description would be parish. The information about study, principal investigator, accessibility and references goes at the study level. Description of sources and file characteristics goes at the file level.
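A two-level description of this kind could be sketched as follows. This is only one possible reading of the non-repetition rule, not something ISAD(G) itself prescribes: the class name `Unit`, the method `resolve`, the study-level title and the key "Source" are all hypothetical, while "Name of creator" is borrowed from the ISAD(G) elements of description. Information is stored once at the study level, and a file-level unit finds it by walking up the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Unit:
    """One unit of description at some level of the hierarchy."""
    level: str
    title: str
    elements: dict = field(default_factory=dict)
    parent: Optional["Unit"] = None

    def resolve(self, element):
        """Return an element from this unit or from its nearest ancestor."""
        unit = self
        while unit is not None:
            if element in unit.elements:
                return unit.elements[element]
            unit = unit.parent
        return None


# Study level: information common to all files is given once.
study = Unit(
    level="study",
    title="Rural demography in the 18th century",  # hypothetical title
    elements={"Name of creator": "Hans Christian Johansen"},
)

# File level: only sources and file characteristics are described here.
files = [
    Unit("file", "Christenings in Selected Rural Parishes, 1741-1801",
         {"Source": "parish registers, 1741-1801"}, parent=study),
    Unit("file", "Census Register for Selected Rural Parishes 1787",
         {"Source": "census list, 1787"}, parent=study),
]

creators = [f.resolve("Name of creator") for f in files]
```

Adding a parish level between study and file would only require another `Unit` in the parent chain; the lookup itself would not change.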

Documentation in the next millennium

Presently a lot of SSDs exist. In the DDA alone we have a couple of thousand. Having a lot of old stuff is a reason for resistance to change. Furthermore the SSD is still a recognised standard for description in the data archive community. This makes exchange of descriptions feasible. Work on a new data description standard is ongoing. This work is intended to take into account the advances in technology over the last quarter century. The results of that work are awaited in breathless suspense.

Items in the SSD

General information

001 Status of the study in the data archive

002 Classification of the study in cluster(s)

003 Relevant keywords for the study

004 Language employed in the present study description

005 Abstract of the study description

Identifications and acknowledgements

101 Bibliographical reference

111 Local data archive where the study is stored

112 Data archive where the study was originally stored

121 Depositor (donor)

122 Date of deposit

131 Principal investigator (Research organisation)

132 Data collector

141 Research initiator

142 Funding agency

199 Other identifications/Acknowledgements (Specify):

Analysis conditions

201 Research Topic (Abstract)

202 Kind of data

211 Units of observation

212 Number of units (Cases)

213 Dimensions of the data

214 Completeness of the study stored

220 Time period covered

221 Time dimensions

222 Definition of total universe (Universe sampled)

223 Sampling procedures

225 Geographical area covered

231 Dates of data collection

232 Method of data collection

233 Type of research instrument

234 Actions to minimise losses (Specify)

235 Data gathering staff

236 Characteristics of data collection situation noted

241 Weighting

299 Other analysis conditions

Re-analysis conditions

301 Present data representation

302 Applicable analysis packages

303 Applicable retrieval systems

304 Information stored in retrieval system

305 Classification of scheme applied

311 Language(s) of written material

321 Control operations performed by original investigator

322 Control operations performed by data archive

331 Accessibility

332 Access directing authority

399 Other re-analysis conditions

References to relevant publications/results/studies

401 to 409 Publications/reports by the primary investigator

411 to 419 Other publications (Secondary analysis)

421 to 429 Unpublished papers/reports of interest

431 Results of analysis (Scales, indices etc.)

441 References to related studies

499 Other references (Specify)

Background variables included

501 Basic characteristics

502 Place of birth

503 Residence

504 Housing situation

511 Household characteristics

512 Characteristics of parental family/household

521 Place of work

522 Occupation

531 Income

541 Education

546 Social class

551 Politics

556 Religion

561 Capital assets

562 Consumption of durables

571 Readership, mass media and 'cultural' exposure

576 Organisational membership

599 Other background variables included (specify)

General International Standard Archival Description - ISAD(G)

Rules

2.1. Description from the general to the specific

2.2. Information relevant to the level of description

2.3. Linking of descriptions

2.4. Non-repetition of information

Elements of description

3.1 Identity Statement area

3.1.1 Reference codes

3.1.2 Title

3.1.3 Dates of creation of the material in the unit of description

3.1.4 Level of description

3.1.5 Extent of the unit of description

3.2 Context area

3.2.1 Name of creator

3.2.2 Administrative/Biographical history

3.2.3 Dates of accumulation of the unit of description

3.2.4 Custodial history

3.2.5 Immediate source of acquisition

3.3 Content and structure area

3.3.1 Scope and content / abstract

3.3.2 Appraisal, destruction and scheduling information

3.3.3 Accruals

3.3.4 System of arrangement

3.4 Conditions of access and use area

3.4.1 Legal Status

3.4.2 Access conditions

3.4.3 Copyright / Conditions governing reproduction

3.4.4 Language of material

3.4.5 Physical characteristics

3.4.6 Finding aids

3.5 Allied material area

3.5.1 Location of originals

3.5.2 Existence of copies

3.5.3 Related units of description

3.5.4 Associated material

3.5.5 Publication note

3.6 Note area

3.6.1 Note

Footnotes

1. Johansen, Hans Chr. Befolkningsudvikling og familiestruktur i det 18. århundrede. Odense University Press, 1975. 211 pp.

2. General International Standard Archival Description

3. Here and in the following the description of the ISAD(G) is based on: "ISAD(G): General International Standard Archival Description", Ottawa 1994. Unfortunately, the ISAD(G) is not stable yet, and if you obtain a later version than the one quoted here, some items may have been added and others altered.

 