
Discovering Digital Resources: a Workshop for Historians. Draft Workshop Report

A report from the resource discovery workshop organised by the History Data Service, and held at the University of Essex, 18th and 19th April 1997

1. Introduction and Overview

This report summarises the findings of a workshop organised by The History Data Service (HDS), and held at the University of Essex in April 1997. The workshop was one of a series organised under the auspices of the Arts and Humanities Data Service (AHDS) and the United Kingdom Office for Library & Information Networking (UKOLN), which will feed into the development of the AHDS Integrated Catalogue.

The aim of the workshop was to explore and assess the information requirements of the historical community for discovering digital resources. The participants were asked to define a set of information criteria for describing and searching for historical electronic materials, and to assess the suitability of the existing HDS catalogue records and search facilities against the agreed criteria.

The workshop participants came from a wide range of backgrounds, representing three primary groups of stakeholders: data creators; actual or potential secondary users; archivists and others working with historical data materials in related fields. Prior to the workshop, participants were supplied with a number of introductory papers, which were intended to place the workshop in its wider context.

A provisional timetable was drafted for the meeting with the proviso that it would be subject to alteration as the issues arose and the discussion developed. The workshop commenced with a series of introductory talks designed to set the scene for the workshops: Sheila Anderson presented an overview of the aims and objectives for the workshops; Pam Miller described the existing HDS catalogue system and how this adhered to the Standard Study Description standard; Cressida Chappell introduced the Dublin Core; and Hans-Joergen Marker summarised his paper describing the General International Standard Archival Description, ISAD(G). Participants were asked to consider the type of searches they were likely to undertake and the type of search strategies they might use to identify electronic material. This session set the scene for the development of the range and type of information the participants might wish to search for.

Participants moved on to identify the key elements which they wished to search on and the elements which they would wish to retrieve information on in the full catalogue record. Identifying the key elements was a time-consuming and difficult process. Participants experienced some difficulty in separating out the elements to be used for searching as opposed to those suitable for retrieval. They also regarded design as a key issue in the development of a usable and successful catalogue. They recommended that the interface to the catalogue should be simple and effective and offer a multi-level approach to searching. They recommended that the first step should be limited to a small number of elements and then offer the ability for further, more sophisticated searches on an additional range of material. The workshop recommended that these issues be taken into consideration in the development of the catalogue.

After a limited discussion, participants indicated that they were happy to accept the existing standard used by the HDS (the SSD) and did not recommend changing to an alternative standard. However, they did highlight some of the weaknesses in the SSD and recommended a number of extensions which the HDS will take forward in its own development work. Using the elements identified and the elements included in the SSD, the participants proceeded to map these against the Dublin Core in order to assess its viability as the basis of the AHDS catalogue. The conclusion reached by the workshop is that the Dublin Core is a workable base for the AHDS catalogue with some alterations and extensions. We are confident that the results of the workshop can feed into the development of the AHDS catalogue and ensure that the needs and requirements of the historical community will be catered for.

2. Resource Discovery Requirements

2.1 Relevant Standards

The workshop assessed the two main standards relevant to the work of the HDS - the Standard Study Description (SSD) and the General International Standard Archival Description (ISAD(G)). The SSD was developed by the Social Science Data Archives in the 1970s, specifically for machine-readable files. ISAD(G), which was developed during the early 1990s by an ad hoc commission sponsored by UNESCO and various national archives, is intended to describe archival material and is not specifically intended for machine-readable files. The existing HDS catalogue is based upon the Standard Study Description Scheme, modified for use with historical data files. Following the introductory sessions, participants were asked to assess the two standards for describing historical data files. Participants concluded that the existing standard used by the HDS - the SSD - met their needs, although they did recommend some minor changes and additions.

2.2 Search Elements

After an initial discussion, the participants identified four elements as essential to the discovery of historical electronic materials - source, geography, time period and topic. Following further discussion, title and person/organisation were added to the list of essential elements. Participants had been asked to distinguish between elements for searching and elements of information that they would wish to see when the catalogue record was retrieved. However, it became clear that this would be very difficult. Therefore, what follows - although primarily listing the elements upon which the participants wish to be able to search in the AHDS catalogue - also includes a range of information that may be included for retrieval purposes.

2.2.1. Title

The title function would simply involve a search of the title field. This was agreed to be useful in instances where a searcher knows the title of the dataset or wishes to search for datasets with particular words in the title. This element may be of particular use for librarians and others working in the field of information searching and provision.

2.2.2. Person/organisation

The person/organisation function would entail a search of all the fields in the catalogue record which contain the names of person(s) or organisation(s) associated with a dataset. This field would be of relevance when searching for datasets created by a particular person or organisation.

2.2.3. Time period

This function would allow for the recovery of all datasets which cover a given year, or period of years. There was a recommendation that the time period should, where possible, be linked to the source. For example, where a dataset has been created from multiple sources, the time span covered may be a significant period of time but the time period covered by each individual source may not cover the entire period; the participants were keen that this should be obvious when looking at the catalogue record. In addition, participants expressed a desire to include information on periodicity where this is appropriate to the sources used.
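The period search described above can be sketched as an interval-overlap test applied to each source rather than to the dataset as a whole. The following Python sketch is illustrative only: the record layout, the dataset titles, the source names and the years are all invented for the example and are not drawn from the HDS catalogue.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    start_year: int
    end_year: int


@dataclass
class Dataset:
    title: str
    sources: list = field(default_factory=list)

    def span(self):
        """Overall time span: earliest start to latest end across all sources."""
        return (min(s.start_year for s in self.sources),
                max(s.end_year for s in self.sources))

    def sources_covering(self, first, last):
        """Sources whose own period overlaps the requested range of years."""
        return [s for s in self.sources
                if s.start_year <= last and s.end_year >= first]


def search_by_period(catalogue, first, last=None):
    """Return (title, matching source names) for a year or a range of years."""
    last = first if last is None else last
    hits = []
    for dataset in catalogue:
        matching = dataset.sources_covering(first, last)
        if matching:
            hits.append((dataset.title, [s.name for s in matching]))
    return hits


# Hypothetical catalogue entries, invented for illustration.
catalogue = [
    Dataset("Hypothetical parish study", [
        Source("Parish registers", 1700, 1750),
        Source("Census enumerators' books", 1841, 1851),
    ]),
    Dataset("Hypothetical tax study", [
        Source("Taxation records", 1660, 1690),
    ]),
]

print(catalogue[0].span())                      # overall span of the first dataset
print(search_by_period(catalogue, 1800))        # a year inside the span but in no source
print(search_by_period(catalogue, 1740, 1745))  # a range covered by one source
```

In this sketch the first dataset spans 1700 to 1851 overall, yet a search for 1800 does not recover it because neither of its sources covers that year. This is one way of making the gap between the overall span and the individual source periods visible to the searcher.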

2.2.4. Geographical location

This function would allow for the recovery of all datasets which cover a given place at a sufficient level of detail. Two functions were highlighted: first, the requirement to search on a place name, preferably within a hierarchical thesaurus; and second, the ability to search and retrieve information on the lowest spatial unit to which the data may be disaggregated. Thus a search for Essex should recover all datasets which are indexed by the term Essex, plus all datasets which are indexed by places within Essex, and all datasets which include county level data and are indexed by a higher level index term.
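The three cases in the Essex example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the thesaurus fragment, the level labels, the `lowest_unit` field and the dataset titles are all hypothetical, and a real hierarchical thesaurus would be far larger and might allow a place to have more than one parent.

```python
# Hypothetical fragment of a hierarchical place-name thesaurus: child -> parent.
PARENT = {
    "Colchester": "Essex",
    "Essex": "England",
    "Kent": "England",
    "England": None,
}

# Assumed level of each place in the hierarchy.
LEVEL = {"England": "country", "Essex": "county",
         "Kent": "county", "Colchester": "town"}

# Spatial-unit levels ordered from coarsest to finest.
GRANULARITY = ["country", "county", "town"]


def ancestors(place):
    """Return all higher-level terms above a place, nearest first."""
    result = []
    parent = PARENT.get(place)
    while parent is not None:
        result.append(parent)
        parent = PARENT.get(parent)
    return result


def matches(dataset, query):
    indexed = dataset["place"]
    # Cases 1 and 2: indexed by the query term itself, or by a place within it.
    if indexed == query or query in ancestors(indexed):
        return True
    # Case 3: indexed by a higher-level term, but the data can be
    # disaggregated to at least the level of the place searched for.
    if indexed in ancestors(query):
        finest = GRANULARITY.index(dataset["lowest_unit"])
        return finest >= GRANULARITY.index(LEVEL[query])
    return False


def search_by_place(catalogue, query):
    return [d["title"] for d in catalogue if matches(d, query)]


# Hypothetical catalogue entries, invented for illustration.
catalogue = [
    {"title": "Essex survey", "place": "Essex", "lowest_unit": "town"},
    {"title": "Colchester records", "place": "Colchester", "lowest_unit": "town"},
    {"title": "England by county", "place": "England", "lowest_unit": "county"},
    {"title": "England national totals", "place": "England", "lowest_unit": "country"},
    {"title": "Kent survey", "place": "Kent", "lowest_unit": "county"},
]

print(search_by_place(catalogue, "Essex"))
```

Recording the lowest spatial unit alongside the index term is what allows the third case to work: a dataset indexed under England is recovered by a search for Essex only if its data can be broken down to county level or below.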

2.2.5. Source

This function would record details of the source or sources from which the dataset was created. Participants wanted to be able to search for sources with a multi-level approach, from a generic level (e.g. census records) down to the specific reference number relevant to the particular source. Thus searches would range from the very general, for example, a search for taxation records, to the very specific, for example, a search for a PRO or other archival reference number.

2.2.6. Topic

This function would highlight the central topics relating to the dataset. The participants recommended that it should incorporate a freetext search of the entire catalogue record.

The workshop participants expressed great interest in the design of the information retrieval system. They favoured a system which would combine both simple and complex interfaces. The simple interface would offer a limited number of search options, and the more complex interfaces would offer a wide range of search options which would encompass the full range of fields in the SSD.

2.3 Retrievable Information

The participants agreed that the Standard Study Description scheme as used by the HDS is sufficient to enable searchers to identify historical datasets of interest. However, the participants suggested that it might be further improved by including the following extensions:

1. More information about the relationship between sources and datasets, in particular the level of transcription and the amount of coding

The workshop participants felt that, since sources are crucial in the context of the historical disciplines, there should be more information about the sources and the relationship between sources and datasets. The following types of information were identified as being particularly useful: information about archival reference numbers; information about the level of transcription and compilation; and information about the amount of coding and the process and method of coding, for example whether a dataset has been pre- or post-coded.

2. More information about boundary geographies, spatial units and the granularity of the data

The workshop participants felt that this information would be crucial to many users. The issue was, however, deemed to be too large and complex to discuss in full during the workshop. It was recommended that a working group should be established to consider this issue.

3. More information about the structure of datasets

The workshop participants felt that information about whether a dataset is, for example, a relational database would help potential users assess the utility of a dataset. It was felt that this information might be particularly useful for users who were interested in using datasets for teaching.

4. More information about the format of datasets, in particular the size of datasets and the software they are held in

The workshop participants felt that this information would also help potential users assess the utility of a dataset.

5. More information about the software and versions used to create datasets

The workshop participants felt that this was important information because it would provide clues about both the nature of a dataset and the ways in which it might be used.

6. On-line documentation

The workshop participants felt that access to on-line documentation is a development which would be particularly helpful in allowing potential users to assess the utility of a dataset.

7. A sample of the data

The workshop participants also felt that a relevant sample of the data would be particularly helpful in allowing potential users to assess the utility of a dataset.

2.4 Reactions to the Dublin Core

Once participants had identified the core search and retrieve elements, some time was spent mapping these against the Dublin Core elements in order to highlight any difficulties and to recommend any changes and extensions. In general, the workshop participants were happy to accept the Dublin Core as the basis for the AHDS catalogue. A number of recommendations were made and these are outlined below:

2.4.1. Title

Label: TITLE

The name given to the resource by the CREATOR or PUBLISHER.

The TITLE element is unproblematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community.

2.4.2. Author or Creator

Label: CREATOR

The person(s) or organisation(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.

The CREATOR element is problematic because the creator is defined as the person(s) or organisation(s) primarily responsible for the intellectual content of the resource. This element would not be problematic if the creator was defined as the person(s) or organisation(s) primarily intellectually responsible for the resource. This subtle difference is important because historical datasets are mainly transcriptions of original sources. In the case of transcriptions, the person(s) or organisation(s) who are responsible for 'creating' the dataset can be held to be intellectually responsible for the dataset, but cannot be held to be responsible for the intellectual content of the dataset. The person(s) or organisation(s) who might best be held to be responsible for the intellectual content of the dataset are instead the person(s) or organisation(s) who created the original source(s).

The concept of primary intellectual responsibility might also be problematic in some contexts. Thus it was felt that the CREATOR and CONTRIBUTORS elements might best be combined into a single element which could encompass all the person(s) and organisation(s) connected with a resource.

2.4.3. Subject and Keywords

Label: SUBJECT

The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings or Art and Architecture Thesaurus descriptors) as well.

The SUBJECT element is unproblematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community.

2.4.4. Description

Label: DESCRIPTION

A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.

The DESCRIPTION element is unproblematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community.

2.4.5. Publisher

Label: PUBLISHER

The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.

The PUBLISHER element is problematic because the label publisher is inappropriate in some contexts. The HDS is a distributor or disseminator, thus it was felt that the PUBLISHER element needs a qualifier called distributor or disseminator.

2.4.6. Other Contributors

Label: CONTRIBUTORS

Person(s) or organisation(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators, and convenors).

The CONTRIBUTORS element is not in itself problematic; however, because of problems with the CREATOR element, it was felt that the two elements might best be combined into a single element which could encompass all the person(s) and organisation(s) connected with a resource.

2.4.7. Date

Label: DATE

The date the resource was made available in its present form. The recommended best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI X3.30-1985. In this scheme, the date element for the day this is written would be 19961203, or December 3, 1996. Many other schema are possible, but if used, they should be identified in an unambiguous manner.

The DATE element is not problematic, if it continues to be defined as the date on which the resource was made available in its present form. It should, however, be noted that the Dublin Core Qualifiers which have been proposed by Jon Knight and Martin Hamilton of the ROADS project set the default date type as the date on which the resource was first created. It was agreed that this element might need to be more precisely defined, and it was recognised that the date on which a resource was made available in its present form is easy to define, whilst the date on which a resource was first created might be difficult to define.
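The eight-digit YYYYMMDD form recommended for the DATE element can be checked and produced mechanically. The following is a minimal Python sketch, assuming the value arrives as a plain string; the function names are hypothetical and are not part of the Dublin Core.

```python
from datetime import date, datetime


def parse_dc_date(value):
    """Parse an eight-digit YYYYMMDD string into a date, rejecting anything else."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected eight digits (YYYYMMDD), got {value!r}")
    # strptime raises ValueError for impossible dates such as 19961332.
    return datetime.strptime(value, "%Y%m%d").date()


def format_dc_date(d):
    """Render a date as an eight-digit YYYYMMDD string."""
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"


print(parse_dc_date("19961203"))          # the example date given in the definition
print(format_dc_date(date(1997, 4, 18)))  # the first day of the workshop
```

A check of this kind only confirms that a value is well formed. It cannot say whether the value records the date on which the resource was made available or the date on which it was first created, which is the ambiguity the workshop asked to have resolved.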

2.4.8. Resource Type

Label: TYPE

The category of the resource, such as home page, novel, poem, working paper, pre-print, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. A preliminary set of such types can be found at the following URL:
http://www.roads.lut.ac.uk/Metadata/DC-ObjectTypes.html

The TYPE element is not in itself problematic; however, it was felt that the current list of preliminary object types was not appropriate to the needs of historians and the HDS. It was the view of the workshop participants that the categorisation `digital resource' would be more suitable than the preliminary recommended object type `dataset'. It was also the view of the workshop participants that this element would need to include more specialised information about the resource type. The specialised data would include the information about the structure of datasets which the workshop participants recommended should be included in the SSD, and the types of information which are already recorded in section 202 of the SSD (Kind of Data).

2.4.9. Format

Label: FORMAT

The data representation of the resource, such as text/HTML, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.

The FORMAT element is not particularly problematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community. It was, however, the view of the workshop participants that this element might also include information about the size of the resource, and information about the software it is held in, if the data representation is not ASCII.

2.4.10. Resource Identifier

Label: IDENTIFIER

String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.

The IDENTIFIER element is unproblematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community.

2.4.11. Source

Label: SOURCE

The work, either print or electronic, from which this resource is derived, if applicable. For example, an HTML encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.

The SOURCE element is crucial to the historical discipline, because virtually all historical datasets are derived from one or more original sources. It was the view of the workshop participants that this element should include a wide range of information about the original sources which would range from the generic to the specific. The generic data would include the type of information which is already recorded by the data source keywords, thus for example it might specify that the source is taxation records. The specific information would include the types of information which are already recorded in section 203 of the SSD (Data Sources). For example it might specify that the source is the Hearth Tax Returns which have the archival reference number XXX.

2.4.12. Language

Label: LANGUAGE

Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with the Z39.53 three character codes for written languages. See: http://www.sil.org/sgml/nisoLang3-1994.html

The LANGUAGE element is unproblematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community.

2.4.13. Relation

Label: RELATION

Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

The RELATION element is unproblematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community.

2.4.14. Coverage

Label: COVERAGE

The spatial locations and temporal duration characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

The COVERAGE element is problematic because it combines both spatial locations and temporal duration. It was agreed that these are two very important elements in their own right which need to be separated out. Furthermore, it was the view of the workshop participants that the spatial location component and the temporal duration component can each be further sub-divided. The spatial location component would include both information about the actual places covered by the data, and information about boundary geographies, spatial units and the granularity of the data. The temporal duration component would include information about the time span covered and the periodicity of the data.

2.4.15. Rights Management

Label: RIGHTS

The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.

The RIGHTS element is unproblematic, and users from the historical community are likely to interpret this element in exactly the same way as users from the rest of the arts and humanities community.

3. Workshop Recommendations

This section summarises the recommendations which were made at the HDS resource discovery workshop concerning the SSD, the Dublin Core and the AHDS Integrated Catalogue.

3.1 Extending the SSD

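Although it was recognised that, in the main, the SSD is sufficient to enable searchers to identify historical datasets of interest, it was also recognised that the SSD has some weaknesses, and the workshop participants recommended the following seven extensions to the SSD, which are discussed in section 2.3:

1. More information about the relationship between sources and datasets, in particular the level of transcription and the amount of coding

2. More information about boundary geographies, spatial units and the granularity of the data

3. More information about the structure of datasets

4. More information about the format of datasets, in particular the size of datasets and the software they are held in

5. More information about the software and versions used to create datasets

6. On-line documentation

7. A sample of the data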

3.2 Refining the Dublin Core

The workshop participants were in general happy to accept the Dublin Core as a key to the unlocking of more detailed discipline specific resources, and they were in general happy to accept that the Dublin Core could be used in conjunction with the SSD. However, it was recommended that the following five refinements would need to be made:

3.3 Modelling the AHDS Integrated Catalogue

It was recommended that the following six search options, which were identified as being essential to the discovery of historical electronic materials, should be included in the AHDS Integrated Catalogue as searchable elements: title, person/organisation, time period, geographical location, source and topic.

It should be noted that although the source search option was identified as being particularly important, it would be acceptable if this search option was included in the topic search option. It was also recommended that it should be possible to combine multiple entries within the search form with either a Boolean AND or a Boolean OR.
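Combining several search-form entries with a Boolean AND or a Boolean OR can be sketched as follows. This Python fragment is an illustration under stated assumptions, not a description of any existing HDS or AHDS system: the field names, the catalogue records and the use of case-insensitive substring matching are all invented for the example.

```python
def field_matches(record, field, term):
    """Case-insensitive substring match on one field of a catalogue record."""
    return term.lower() in record.get(field, "").lower()


def search(catalogue, criteria, operator="AND"):
    """Return titles of records matching the criteria combined with AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    combine = all if operator == "AND" else any
    return [
        record["title"]
        for record in catalogue
        if combine(field_matches(record, f, t) for f, t in criteria.items())
    ]


# Hypothetical catalogue records, invented for illustration.
catalogue = [
    {"title": "Hypothetical Essex hearth tax dataset",
     "place": "Essex", "source": "taxation records"},
    {"title": "Hypothetical Kent census dataset",
     "place": "Kent", "source": "census records"},
    {"title": "Hypothetical Essex census dataset",
     "place": "Essex", "source": "census records"},
]

criteria = {"place": "Essex", "source": "census"}
print(search(catalogue, criteria, "AND"))  # records satisfying both entries
print(search(catalogue, criteria, "OR"))   # records satisfying either entry
```

With these records, the AND search narrows the result to the single dataset that is both about Essex and derived from census records, while the OR search recovers all three.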

The HDS would recommend that the AHDS Integrated Catalogue should be developed along the lines of the Council of European Social Science Data Archives Integrated Data Catalogue (Cessda IDC), a unified collection of mainly European social science data archive catalogues which can be searched through one common interface. The Cessda IDC can be accessed at: http://dastar.essex.ac.uk/Cessda/IDC

The participants expressed a wish to be able to retrieve the information currently made available through the HDS information retrieval system BIRON in the AHDS catalogue. They would not wish to retrieve less information than is currently available to them.

4. Consultation Process

This document is a draft report of the findings of the HDS resource discovery workshop, and it will be circulated widely for consultation and comment during June and July 1997. All comments received by 23 July 1997 will be taken into consideration and incorporated into the final version of the report, which will be made available by 30 July 1997. Comments should be submitted to the authors of the report, either by email to hds@essex.ac.uk, or in writing to:

History Data Service
The Data Archive
University of Essex
Wivenhoe Park
Colchester
CO4 3SQ
