Discovering Digital Resources: A Workshop for Historians
Workshop Report
A report from the resource discovery workshop organised
by the History Data Service, and held at the University of Essex, 18th and
19th April 1997
1. Introduction and Overview
This report summarises the findings of a workshop organised by the History
Data Service (HDS) and held at the University
of Essex in April 1997. The workshop was one of a series organised under
the auspices of the Arts and Humanities Data
Service (AHDS) and the United Kingdom
Office for Library & Information Networking (UKOLN), which will feed
into the development of the AHDS Integrated Catalogue. A draft
version of this report was widely circulated for review and comment.
The aim of the workshop was to explore and assess the information requirements
of the historical community for discovering digital resources. The participants
were asked to define a set of information criteria for describing and searching
for historical electronic materials, and assess the suitability of the existing
HDS catalogue records and search facilities against the agreed criteria.
The workshop participants came
from a wide range of backgrounds, representing three primary groups of stakeholders:
data creators; actual or potential secondary users; archivists and others working
with historical data materials in related fields. Prior to the workshop, participants
were supplied with a number of introductory papers, which were intended to place
the workshop in its wider context.
A provisional timetable was drafted
for the meeting, with the proviso that it would be subject to alteration as
issues arose and the discussion developed. The workshop commenced with a series
of introductory talks designed to set the scene: Sheila Anderson
presented an overview of the aims and objectives of the workshops; Pam Miller
described the existing HDS catalogue system and how it adhered to the Standard
Study Description; Cressida Chappell introduced the Dublin Core; and Hans-Joergen
Marker summarised his paper describing the Standard Study Description (SSD) and
the General International Standard for Archival Description (ISAD(G)). Participants
were asked to consider the types of searches they were likely to undertake and
the search strategies they might use to identify electronic material.
This session set the scene for the development of the range and type of information
the participants might wish to search for.
Participants moved on to identify the key elements on which they wished to search
and the elements they would wish to see in the full catalogue record on
retrieval. Identifying the key elements was a time-consuming and
difficult process. Participants experienced some difficulty in separating out
the elements to be used for searching from those suitable for retrieval,
but they were ultimately successful. They also regarded design as a key
issue in the development of a usable and successful catalogue. They recommended
that the interface to the catalogue should be simple and effective and offer
a multi-level approach to searching: the first step should
be limited to a small number of elements, with the option of further,
more sophisticated searches across a wider range of elements. The workshop
recommended that these issues be taken into consideration
in the development of the catalogue.
After a limited discussion, participants indicated that they were happy to
accept the existing standard used by the HDS (the SSD) and did not recommend
changing to an alternative standard. However, they did highlight some of the
weaknesses in the SSD and recommended a number of extensions which the HDS will
take forward in its own development work. Using the elements they had identified and
the elements included in the SSD, the participants proceeded to map these against
the Dublin Core in order to assess its viability as the basis of the AHDS catalogue.
The conclusion reached by the workshop was that the Dublin Core is a workable
base for the AHDS catalogue, with some alterations and extensions. We are confident
that the results of the workshop can feed into the development of the AHDS catalogue
and ensure that the needs and requirements of the historical community will
be catered for.
2. Resource Discovery Requirements
2.1 Relevant Standards
The workshop assessed the two main standards relevant to the work of the HDS:
the Standard Study Description (SSD) and the General International Standard
for Archival Description (ISAD(G)). The SSD was developed by the Social Science
Data Archives in the 1970s specifically for machine-readable files. ISAD(G),
which was developed during the early 1990s by an ad hoc commission sponsored
by UNESCO and various national archives, is intended to describe archival material
and is not specifically intended for machine-readable files. The existing HDS
catalogue is based upon the Standard Study Description scheme, modified for
use with historical data files. Following the introductory sessions, participants
were asked to assess the two standards for describing historical data files.
Participants concluded that the existing standard used by the HDS, the SSD,
met their needs, although they did recommend some minor changes and additions.
2.2 Search Elements
After an initial discussion, the participants identified four elements as essential
to the discovery of historical electronic materials: source, geography, time
period and subject/topic. Following further discussion, title and person/organisation
were added to the list of essential elements. Participants had been asked to
distinguish between elements for searching and elements of information that
they would wish to see when the catalogue record was retrieved. However, it
became clear that this distinction would be very difficult to draw. Therefore, what follows,
although primarily listing the elements on which the participants wish to be able to
search in the AHDS catalogue, also includes a range of information that
may be included for retrieval purposes.
2.2.1. Title
The title function would simply involve a search of the title field (i.e.
DC.TITLE). This was agreed to be useful where a searcher knows
the title of the dataset or wishes to search for datasets with particular
words in the title. This element may be of particular use to librarians and
others working in the field of information searching and provision.
2.2.2. Person/organisation
The person/organisation function would entail a search of all the fields
in the catalogue record that contain the names of person(s) or organisation(s)
associated with a dataset (i.e. DC.CREATOR and DC.CONTRIBUTORS). This function
would be of relevance when searching for datasets created by a particular
person or organisation.
2.2.3. Time period
This function would allow for the retrieval of all datasets which cover a
given year, or period of years (i.e. DC.COVERAGE). It was recommended
that the time period should, where possible, be linked to the source. For
example, where a dataset has been created from multiple sources, the dataset as
a whole may span a long period even though each individual source covers only
part of it; the participants were keen that this should be obvious when looking
at the catalogue record.
In addition, participants expressed a desire to include information on periodicity
where this is appropriate to the sources used.
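The recommendation to link time periods to sources can be illustrated with a minimal Python sketch. It assumes that each source in a catalogue record carries its own start and end year; the class and function names are hypothetical and are not part of the SSD or the Dublin Core. A dataset matches a period query only when at least one of its sources overlaps the query, so gaps between sources are not hidden by the dataset's overall span.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceSpan:
    """One source used to create a dataset, with the years it covers (hypothetical model)."""
    name: str
    start: int  # first year covered by this source
    end: int    # last year covered by this source


@dataclass
class Dataset:
    title: str
    sources: list[SourceSpan] = field(default_factory=list)

    def overall_span(self) -> tuple[int, int]:
        """Earliest start and latest end across all sources."""
        return (min(s.start for s in self.sources),
                max(s.end for s in self.sources))


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the closed year ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def search_period(datasets: list[Dataset], start: int, end: int) -> list[tuple[str, list[str]]]:
    """Return (title, names of matching sources) for each dataset with at least
    one source that covers part of the query period."""
    results = []
    for ds in datasets:
        hits = [s.name for s in ds.sources if overlaps(s.start, s.end, start, end)]
        if hits:
            results.append((ds.title, hits))
    return results


# Hypothetical dataset built from two sources with a gap between them.
example = Dataset("Parish study", [SourceSpan("Parish register", 1700, 1750),
                                   SourceSpan("Tax list", 1800, 1810)])
print(example.overall_span())                # (1700, 1810)
print(search_period([example], 1760, 1790))  # [] because no source covers 1760-1790
```

The overall span (1700 to 1810) would suggest that the 1770s are covered, but because matching is done per source, a query for 1760 to 1790 returns nothing, which is the behaviour the participants asked to be made visible.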
2.2.4. Geographical location
This function would allow for the retrieval of all datasets which cover a
given place at a sufficient level of detail (i.e. DC.COVERAGE). Two functions
were highlighted. The first is the ability to search on a place name within
a hierarchical thesaurus. Thus a search for Essex would not only retrieve all
datasets which are indexed by the term Essex, but would also retrieve all datasets
which include county-level data for Essex and are indexed by a higher-level
index term (for example, England); it might also be extended to include all
datasets which are indexed by places within Essex. The second is the ability to
search and retrieve information on different spatial units (e.g. parish, county,
enumeration district, parliamentary constituency). The catalogue record should
also indicate the lowest spatial unit at which the data may be sensibly analysed,
that is, the level at which one may still achieve meaningful results.
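The hierarchical place search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the thesaurus fragment, the ranking of spatial units and the catalogue entries are all hypothetical, and a dataset indexed under a broader term is assumed to cover every place beneath it. Each catalogue entry records its index term and the lowest spatial unit at which its data can be analysed.

```python
# Hypothetical thesaurus fragment: each term maps to its broader (parent) term.
BROADER = {
    "Essex": "England",
    "Kent": "England",
    "Colchester": "Essex",
    "Chelmsford": "Essex",
}

# Spatial unit of each term, and a ranking of units from coarsest (0) to finest.
TERM_UNIT = {"England": "country", "Essex": "county", "Kent": "county",
             "Colchester": "parish", "Chelmsford": "parish"}
UNIT_RANK = {"country": 0, "county": 1, "parish": 2}


def broader_terms(term):
    """All broader terms of `term`, nearest first."""
    out = []
    while term in BROADER:
        term = BROADER[term]
        out.append(term)
    return out


def narrower_terms(term):
    """All narrower terms of `term`, i.e. places within it, at any depth."""
    out = []
    for child, parent in BROADER.items():
        if parent == term:
            out.append(child)
            out.extend(narrower_terms(child))
    return out


def place_search(datasets, place, include_narrower=True):
    """datasets: iterable of (title, index_term, lowest_spatial_unit) tuples.

    A dataset matches if it is indexed by `place` itself, by a place within it
    (optional), or by a broader term while holding data at least as fine as
    the unit of `place` (e.g. county-level data indexed under England).
    """
    wanted_rank = UNIT_RANK[TERM_UNIT[place]]
    broader = set(broader_terms(place))
    narrower = set(narrower_terms(place)) if include_narrower else set()
    hits = []
    for title, term, lowest_unit in datasets:
        if term == place or term in narrower:
            hits.append(title)
        elif term in broader and UNIT_RANK[lowest_unit] >= wanted_rank:
            hits.append(title)
    return hits


# Hypothetical catalogue entries.
catalogue = [
    ("County poor law returns", "Essex", "parish"),
    ("English county statistics", "England", "county"),
    ("National totals", "England", "country"),
    ("Colchester burials", "Colchester", "parish"),
    ("Kent hearth tax", "Kent", "county"),
]
print(place_search(catalogue, "Essex"))
```

A search for Essex returns the dataset indexed by Essex, the England-wide dataset with county-level data, and the Colchester dataset; the England-wide dataset that holds only national totals is excluded because it cannot be analysed at county level.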
2.2.5. Source
This function would cover details of the source or sources from which the
dataset was created (i.e. DC.SOURCE). Participants wanted to be able to search
for sources using a multi-level approach, from a generic level (e.g. census records)
down to the archival reference numbers which uniquely identify a particular
source. Thus searches would range from the very general, for example a search
for taxation records, to the very specific, for example a search for a Public
Record Office (PRO) reference number.
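A multi-level source search of this kind might be sketched as below. The record layout (a generic source type, a specific source name and an archival reference) is an assumption made for illustration, and the reference values are placeholders rather than real PRO references. A single query is tried against every level unless a level is named.

```python
# Hypothetical source descriptions, from the generic level down to an archival
# reference. The reference values are placeholders, not real PRO references.
SOURCES = [
    {"dataset": "Dataset A", "generic": "taxation records",
     "specific": "Hearth Tax returns", "reference": "REF-0001"},
    {"dataset": "Dataset B", "generic": "taxation records",
     "specific": "Land Tax assessments", "reference": "REF-0002"},
    {"dataset": "Dataset C", "generic": "census records",
     "specific": "Census enumerators' books", "reference": "REF-0003"},
]

LEVELS = ("generic", "specific", "reference")


def source_search(records, query, level=None):
    """Return datasets whose source matches `query` (case-insensitive, whole value).

    If `level` is None, every level is tried, so one search box can take
    anything from "taxation records" down to a single archival reference.
    """
    q = query.strip().casefold()
    levels = LEVELS if level is None else (level,)
    return [r["dataset"] for r in records
            if any(r[lv].casefold() == q for lv in levels)]


print(source_search(SOURCES, "Taxation records"))  # ['Dataset A', 'Dataset B']
print(source_search(SOURCES, "REF-0003"))          # ['Dataset C']
```

Whole-value matching keeps the generic and specific levels distinct; a production catalogue would more likely draw the generic terms from a controlled vocabulary such as the data source keywords mentioned later in this report.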
2.2.6. Subject/topic
This function would focus on the central topics covered by a dataset. The
participants strongly recommended that it should incorporate a free-text search
of the entire catalogue record. Failing that, it should at least incorporate
a search of the following sections of the catalogue record: subject category,
subject keywords, abstract and main topics (i.e. DC.SUBJECT and DC.DESCRIPTION).
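The difference between the preferred free-text search and the minimum fallback can be shown with a short Python sketch. The field names are hypothetical stand-ins for the four catalogue sections listed above, and matching is a simple "every query word must appear somewhere" rule, which is an assumption rather than a recommendation of the workshop.

```python
from __future__ import annotations

import re

# Hypothetical field names for the four sections named as the minimum.
SUBJECT_FIELDS = ("subject_category", "subject_keywords", "abstract", "main_topics")


def words(text: str) -> set[str]:
    """Lower-case alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def topic_search(records, query, freetext=True):
    """Return titles of records containing every word of `query`.

    freetext=True searches every field of the record (the participants'
    preferred option); freetext=False searches SUBJECT_FIELDS only.
    """
    wanted = words(query)
    hits = []
    for rec in records:
        fields = rec.values() if freetext else [rec.get(f, "") for f in SUBJECT_FIELDS]
        found = set()
        for value in fields:
            found |= words(value)
        if wanted <= found:  # subset test: all query words present
            hits.append(rec["title"])
    return hits


# Hypothetical catalogue record.
record = {
    "title": "Poor relief in an Essex parish",
    "subject_category": "Social welfare",
    "subject_keywords": "poor relief; parishes",
    "abstract": "Payments recorded by overseers.",
    "main_topics": "welfare",
    "data_sources": "Overseers' accounts",
}
```

A query for "overseers accounts" finds this record with a free-text search, because "accounts" appears only in the data sources field, but misses it when the search is restricted to the four subject sections.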
The workshop participants also expressed great interest in the design of
the information retrieval system. They favoured a system which would combine
both simple and complex interfaces. The simple interface would offer a limited
number of search options, and the more complex interfaces would offer a wide
range of search options which would encompass the full range of fields in
the SSD.
2.3 Retrievable Information
The participants recommended that the Standard Study Description scheme as
used by the HDS is sufficient to enable searchers to identify historical datasets
of interest. However, the participants suggested that it might be further improved
by including the following extensions:
1. More information about the relationship between sources and datasets, in particular
the level of transcription and the amount of coding
The workshop participants felt that, since sources are crucial in the context
of the historical disciplines, there should be more information about the
sources and the relationship between a dataset and its source(s). The following
types of information were identified as being particularly useful: information
about archival reference numbers; information about the level of transcription
and compilation; and information about the amount of coding and the process
and method of coding, for example whether a dataset has been pre-coded or post-coded.
2. More information about boundary geographies, spatial units and the granularity
of the data
The workshop participants felt that this information would be crucial to
many users. The issue was, however, deemed to be too large and complex to
discuss in full during the workshop. It was recommended that
a working group should be established to consider this issue.
3. More information about the structure of datasets
The workshop participants felt that information about whether a dataset is,
for example, a relational database would help potential users assess the utility
of a dataset. It was felt that this information might be particularly useful
for users who were interested in using datasets for teaching.
4. More information about the format of datasets, in particular the size of datasets
and the software they are held in
The workshop participants felt that this information would also help potential
users assess the utility of a dataset.
5. More information about the software and versions used to create datasets
The workshop participants felt that this was important information because
it would provide clues about both the nature of a dataset and the ways in
which it might be used.
6. On-line documentation
The workshop participants felt that providing on-line access to the explanatory
documentation which accompanies a dataset would be particularly useful in
helping potential users assess the utility of a dataset.
7. A sample of the data
The workshop participants also felt that a relevant sample of the data would
be particularly helpful in allowing potential users to assess the utility of
a dataset.
2.4 Reactions to the Dublin Core
Once participants had identified the core search and retrieval elements, some
time was spent mapping these against the Dublin Core elements in order to highlight
any difficulties and to recommend any changes and extensions. In general, the
workshop participants were happy to accept the Dublin Core as the basis for
the AHDS catalogue. A number of recommendations were made, and these are outlined
below:
2.4.1. Title
Label: TITLE
The name given to the resource by the CREATOR or PUBLISHER.
The TITLE element is unproblematic and the HDS agrees with the AHDS-wide
guidelines and TYPES which relate to this element. The TITLE element maps
to section 101 Title in the SSD, which carries the main title, subtitle, series
title and alternative title. 101 1 Main title maps to the AHDS-wide TYPE main,
101 2 Note maps to the AHDS-wide TYPE subtitle, 101 4 Series title maps to
the AHDS-wide TYPE series and 101 5 Alternative title maps to the AHDS-wide
TYPE alternate. 101 3 Project number contains any project or reference numbers
which have been used by the depositor to identify the dataset prior to deposit,
and the HDS would like to recommend the inclusion of a TYPE called something
like projectNumber.
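The title mapping above can be expressed as a small crosswalk table. The following Python sketch is illustrative only: the input is assumed to be a dictionary keyed by SSD sub-field code, the output triples are a hypothetical representation of qualified DC elements, and "projectNumber" is the TYPE proposed by the HDS rather than an agreed AHDS-wide TYPE.

```python
from __future__ import annotations

# SSD section 101 sub-fields mapped to DC TITLE TYPES, as listed above.
# "projectNumber" is the TYPE the HDS proposes; it is not an agreed AHDS-wide TYPE.
TITLE_TYPES = {
    "101 1": "main",           # Main title
    "101 2": "subtitle",       # Note
    "101 3": "projectNumber",  # Project number (proposed)
    "101 4": "series",         # Series title
    "101 5": "alternate",      # Alternative title
}


def ssd_titles_to_dc(ssd: dict[str, str]) -> list[tuple[str, str, str]]:
    """Convert SSD title sub-fields into (element, TYPE, value) triples,
    skipping empty or unmapped sub-fields."""
    return [("TITLE", TITLE_TYPES[code], value)
            for code, value in ssd.items()
            if code in TITLE_TYPES and value]
```

For example, an input of `{"101 1": "Hypothetical parish listings", "101 3": "PRJ-42", "101 4": ""}` yields a main title and a project number, and the empty series title is dropped.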
2.4.2. Author or Creator
Label: CREATOR
The person(s) or organisation(s) primarily responsible for the intellectual
content of the resource. For example, authors in the case of written documents,
artists, photographers, or illustrators in the case of visual resources.
The CREATOR element was felt to be problematic because of the way creator
is defined. Historical datasets are mainly transcriptions of original sources.
Therefore, there are usually at least two layers of responsibility for the
intellectual content of the resource - the creators of the original source
from which the dataset was created and the creators of the dataset itself.
The person(s) or organisation(s) who are responsible for 'creating' the dataset
can be held to be intellectually responsible for the content of the dataset,
but they cannot be held to be responsible for the intellectual content of
the source from which the dataset has been created. The person(s) or organisation(s)
who might best be held to be responsible for the intellectual content of the
source are instead the person(s) or organisation(s) who created the original
source(s).
The concept of assigning primary intellectual responsibility might
also be problematic in some contexts. Thus it was felt that the CREATOR and
CONTRIBUTORS elements might best be combined into a single element which could
encompass all the person(s) and organisation(s) connected with the creation
of a resource.
The HDS therefore agrees with the general principle of the AHDS-wide recommendation
that the CREATOR and CONTRIBUTORS elements should be merged and that this
element should be redefined as "The person(s) or organisation(s) responsible
for creation of the original resource, its source, surrogates, or metadata
pertaining to the above whose involvement is considered worthy of inclusion
for the purposes of discovering said resource." We take 'original resource'
here to mean the resource which the Service Provider is cataloguing. The History
Data Service does not at present include the name(s) of the creator(s) of the source(s)
from which a dataset has been created in its Name Authority Files; however,
their inclusion is under consideration.
The combined CREATOR and CONTRIBUTORS element maps to the following sections
in the SSD: 131 Principal Investigator, 132 Data Collector, 141 Research Initiator,
141 99 Other Acknowledgements, 142 Sponsor and 161 Original Data Producer.
131 Principal Investigator maps to the AHDS-wide TYPE role.projectLeader;
132 Data Collector, 141 Research Initiator, 141 99 Other Acknowledgements,
142 Sponsor and 161 Original Data Producer map to the AHDS-wide TYPE role.majorContributor.
The affiliations included in sections 131, 132, 141, 141 99, 142 and 161 of
the SSD map to the AHDS-wide TYPE affiliation. The Data Archive's Name Authority
Files used by the HDS do not specifically distinguish between individuals
and organisations; consequently the HDS would like to recommend the inclusion
of another AHDS-wide TYPE, in addition to personalName and corporateName,
called Name, which would be the default TYPE. The Data Archive's Name Authority
Files will need to be included as SCHEMES in the central AHDS repository.
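The role mapping for the merged CREATOR element can be sketched as follows. This is a hypothetical illustration: the input tuples and the output dictionary keys ("element", "role", "nameType", "name", "affiliation") are assumptions about how a converter might represent the data, while the role values and the default "Name" TYPE follow the mapping and proposal above.

```python
from __future__ import annotations

# SSD name sections mapped to AHDS-wide role TYPES for the merged CREATOR element.
CREATOR_ROLES = {
    "131": "role.projectLeader",        # Principal Investigator
    "132": "role.majorContributor",     # Data Collector
    "141": "role.majorContributor",     # Research Initiator
    "141 99": "role.majorContributor",  # Other Acknowledgements
    "142": "role.majorContributor",     # Sponsor
    "161": "role.majorContributor",     # Original Data Producer
}

# The proposed default name TYPE, used because the Name Authority Files do not
# record whether a name belongs to a person or an organisation.
DEFAULT_NAME_TYPE = "Name"


def ssd_names_to_creator(entries: list[tuple[str, str, str | None]]) -> list[dict[str, str]]:
    """entries: (SSD section, name, affiliation or None) tuples."""
    out = []
    for section, name, affiliation in entries:
        role = CREATOR_ROLES.get(section)
        if role is None:
            continue  # not one of the sections that feed CREATOR
        item = {"element": "CREATOR", "role": role,
                "nameType": DEFAULT_NAME_TYPE, "name": name}
        if affiliation:
            item["affiliation"] = affiliation  # maps to the TYPE affiliation
        out.append(item)
    return out
```

Because every name receives the neutral "Name" TYPE, the converter never has to guess whether an entry is a person or an organisation.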
2.4.3. Subject and Keywords
Label: SUBJECT
The topic of the resource, or keywords or phrases that describe the subject
or content of the resource. The intent of the specification of this element
is to promote the use of controlled vocabularies and keywords. This element
might well include scheme-qualified classification data (for example, Library
of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified
controlled vocabularies (such as Medical Subject Headings or Art and Architecture
Thesaurus descriptors) as well.
The SUBJECT element is unproblematic, and the HDS agrees with the AHDS-wide
guidelines and TYPES which relate to this element. The SUBJECT element maps
to section 003 Subject Category in the SSD, and to the Subject keywords in
the index record which are drawn from HASSET. The Data Archive's Subject Categories
and The Data Archive's Humanities and Social Science Electronic Thesaurus
(HASSET) will thus need to be included as SCHEMEs in the central AHDS repository.
2.4.4. Description
Label: DESCRIPTION
A textual description of the content of the resource, including abstracts
in the case of document-like objects or content descriptions in the case of
visual resources. Future metadata collections might well include computational
content description (spectral analysis of a visual resource, for example)
that may not be embeddable in current network systems. In such a case this
field might contain a link to such a description rather than the description
itself.
The DESCRIPTION element is unproblematic and the HDS agrees with the AHDS-wide
guidelines and TYPES which relate to this element. The DESCRIPTION element
maps to the following sections in the SSD: 201 Abstract and 599 Main topics.
2.4.5. Publisher
Label: PUBLISHER
The entity responsible for making the resource available in its present form,
such as a publisher, a university department, or a corporate entity. The intent
of specifying this field is to identify the entity that provides access to
the resource.
The PUBLISHER element was felt to be problematic because the label 'publisher'
is inappropriate in the context of the HDS. The HDS is a distributor or disseminator,
thus it was felt that the PUBLISHER element needs a TYPE called distributor
or disseminator. In view of the fact that the word 'distributor' has a domain-specific
meaning in the context of the Performing Arts Data Service (PADS) the HDS
would like to recommend the inclusion of an AHDS-wide TYPE called Disseminator.
2.4.6. Other Contributors
Label: CONTRIBUTORS
Person(s) or organisation(s) in addition to those specified in the CREATOR
element who have made significant intellectual contributions to the resource
but whose contribution is secondary to the individuals or entities specified
in the CREATOR element (for example, editors, transcribers, illustrators,
and convenors).
The CONTRIBUTORS element was felt to be problematic (see under section CREATOR)
and the HDS agrees with the AHDS-wide recommendation that this element should
be merged with CREATOR.
2.4.7. Date
Label: DATE
The date the resource was made available in its present form. The recommended
best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI
X3.30-1985. In this scheme, the date element for the day this is written would
be 19961203, or December 3, 1996. Many other schema are possible, but if used,
they should be identified in an unambiguous manner.
The DATE element was felt to be problematic because the Dublin Core Qualifiers
which have been proposed by Jon Knight and Martin Hamilton of the ROADS project
set the default date type as the date on which the resource was first created,
rather than the date on which a resource was made available in its present
form. In the context of the HDS the date on which a resource was made available
in its present form is easy to define, whilst the date on which a resource
was first created would be more difficult, if not impossible, to define. Thus
it was recommended that this element should be more precisely defined.
The HDS therefore agrees with the AHDS-wide TYPES which relate to this element
and the AHDS-wide recommendation that the DATE element should be redefined
as "Dates associated with the creation and dissemination of the resource.
These dates should not be confused with those related to the content of a
resource (AD 43, in a database of artefacts from the Roman conquest of Britain),
which are dealt with under COVERAGE, or its subject (1812 in relation to Tchaikovsky's
eponymous overture), which are dealt with under SUBJECT".
The DATE element maps to the following sections in the SSD: 122 Date of first
release/date of latest release, 231 Dates of fieldwork and 300 Date file created
by original investigator. 122 Date of first release maps to the AHDS-wide
TYPE accessioned; 231 Dates of fieldwork start date and 300 Date file created
by original investigator start date map to the AHDS-wide TYPE projectStart;
231 Dates of fieldwork end date and 300 Date file created by original investigator
end date map to the AHDS-wide TYPE projectEnd; 122 Date of latest release
maps to the AHDS-wide TYPE lastUpdate; and Date catalogue entry last updated
maps to the AHDS-wide TYPE metadataLastModified.
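The date mapping above, combined with the recommended YYYYMMDD form, might be sketched in Python like this. The tuple keys such as ("122", "first") are a hypothetical way of naming the parts of each SSD date section, and "catalogue" stands in for the catalogue entry's own last-updated date, which is given no SSD section number in the text.

```python
from __future__ import annotations

from datetime import date

# (SSD section, part) -> AHDS-wide DATE TYPE, following the mapping above.
# The key names are hypothetical; "catalogue" stands for the date the
# catalogue entry was last updated.
DATE_TYPES = {
    ("122", "first"): "accessioned",
    ("122", "latest"): "lastUpdate",
    ("231", "start"): "projectStart",
    ("300", "start"): "projectStart",
    ("231", "end"): "projectEnd",
    ("300", "end"): "projectEnd",
    ("catalogue", "lastUpdated"): "metadataLastModified",
}


def to_dc_date(section: str, part: str, value: date) -> tuple[str, str, str]:
    """Return a qualified DATE as (element, TYPE, YYYYMMDD string)."""
    return ("DATE", DATE_TYPES[(section, part)], value.strftime("%Y%m%d"))


print(to_dc_date("122", "first", date(1996, 12, 3)))
# ('DATE', 'accessioned', '19961203')
```

Dates about the content of a dataset, such as the years its sources cover, deliberately have no entry in this table, since they belong under COVERAGE.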
2.4.8. Resource Type
Label: TYPE
The category of the resource, such as home page, novel, poem, working paper,
pre-print, technical report, essay, dictionary. It is expected that RESOURCE
TYPE will be chosen from an enumerated list of types. A preliminary set of
such types can be found at the following URL:
http://www.roads.lut.ac.uk/Metadata/DC-ObjectTypes.html
The TYPE element is not in itself problematic; however, it was felt that the
current list of preliminary object types was not appropriate to the needs
of historians and the HDS. It was the view of the workshop participants that
the categorisation 'digital resource' would be more suitable than the preliminary
recommended object type 'dataset'. The workshop participants also considered
that this element would need to include more specialised information
about the resource type. The specialised data would include the information
about the structure of datasets which the workshop participants recommended
should be included in the SSD, and the types of information which are already
recorded in section 202 of the SSD (Kind of Data). The provisional hierarchical
list of TYPES which is now taking shape at:
http://sunsite.Berkeley.EDU/Metadata/types.html
would seem to cover adequately most of the information which is recorded in
202 Kind of data, and the HDS agrees with the AHDS-wide guidelines which relate
to this element. The only information not covered by the provisional hierarchical
list of TYPES is an indication of whether the data is individual level or
aggregate level; in the context of the HDS, this information can be very important.
The TYPE element maps to Section 202 Kind of data in the SSD, and the sub-sections
textual data, numeric data, alpha/numeric data and image map respectively
to the provisional TYPES text, data.numeric, data and image.graphic.
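The following Python sketch shows the 202 Kind of data mapping, together with one possible way of carrying the individual/aggregate distinction that the provisional list lacks. The separate "dataLevel" key is a hypothetical extension for illustration, not an agreed TYPE; unknown sub-section names raise a KeyError rather than being silently dropped.

```python
from __future__ import annotations

# SSD 202 Kind of data sub-sections mapped to the provisional TYPES listed above.
KIND_OF_DATA_TYPES = {
    "textual data": "text",
    "numeric data": "data.numeric",
    "alpha/numeric data": "data",
    "image": "image.graphic",
}


def kind_of_data_to_type(kinds: list[str], level: str | None = None) -> dict:
    """Map SSD 202 sub-sections to TYPE values.

    `level` ("individual" or "aggregate") is not covered by the provisional
    list, so it is carried in a separate, hypothetical "dataLevel" key.
    """
    if level not in (None, "individual", "aggregate"):
        raise ValueError(f"unexpected data level: {level!r}")
    result = {"TYPE": [KIND_OF_DATA_TYPES[k] for k in kinds]}
    if level is not None:
        result["dataLevel"] = level
    return result
```

Keeping the data level outside the TYPE list means it can be dropped or renamed without disturbing the provisional hierarchy if an agreed qualifier emerges later.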
2.4.9. Format
Label: FORMAT
The data representation of the resource, such as text/HTML, ASCII, Postscript
file, executable application, or JPEG image. The intent of specifying this
element is to provide information necessary to allow people or machines to
make decisions about the usability of the encoded data (what hardware and
software might be required to display or execute it, for example). As with
RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered
Internet Media Types (MIME types). In principle, formats can include physical
media such as books, serials, or other non-electronic media.
The FORMAT element is not particularly problematic, and the HDS agrees with the
AHDS-wide guidelines, SCHEMES and TYPES which relate to this element. It was
also the view of the workshop participants that this element should include
information about the size of the resource and any software it is held in, either
in addition to, or instead of, the ASCII representation. It is the view of
the HDS that this element is the place for information about the versions
of software, page lengths, soundtrack/film running times, file sizes, and
whether a film is colour/mono, aural/silent etc. The FORMAT element maps to
the following sections of the SSD: 111 99, which specifies the current data
representation, and 111 3 Size. 111 99 maps to the AHDS-wide TYPE fileType
but does not use the AHDS-wide SCHEME IMT; 111 3 maps to the AHDS-wide TYPE
fileSize. Section 213 Dimensions of the dataset, which records the number of
observations in a dataset, also contains information relevant to this element
which is not covered by the AHDS-wide TYPES.
2.4.10. Resource Identifier
Label: IDENTIFIER
String or number used to uniquely identify the resource. Examples for networked
resources include URLs and URNs (when implemented). Other globally-unique
identifiers, such as International Standard Book Numbers (ISBN) or other formal
names would also be candidates for this element.
The IDENTIFIER element is unproblematic and the HDS agrees with the AHDS-wide
guidelines and SCHEMES which relate to this element. It should, however, be
noted that, except in the case of studies accessioned from other data archives,
it is usual for HDS catalogue records to contain only one identification number.
Thus the definition for the AHDS-wide SCHEME HDS will differ from that of the
Archaeology Data Service (ADS) and will be as follows: "HDS/DA internal identification number, uniquely
identifying any resource within the HDS/DA catalogue. An HDS/DA identifier
is required for every resource in the HDS/DA catalogue." The IDENTIFIER
element maps to the Data Archive Study Number which uniquely identifies each
dataset held by The Data Archive. Section 112 99 records the Original Study
Number of datasets which have come from other data archives.
2.4.11. Source
Label: SOURCE
The work, either print or electronic, from which this resource is derived,
if applicable. For example, an HTML encoding of a Shakespearean sonnet might
identify the paper version of the sonnet from which the electronic version
was transcribed.
The SOURCE element is crucial to the historical discipline, because virtually
all historical datasets are derived from one or more original sources. It
was the view of the workshop participants that this element should include
a wide range of information about the original sources which would range from
the generic to the specific. The generic data would include the type of information
which is already recorded by the data source keywords, thus for example it
might specify that the source is taxation records. The specific information
would include the types of information which are already recorded in section
203 of the SSD (Data Sources). For example it might specify that the source
is the Hearth Tax Returns which have the archival reference number XXX. The
HDS agrees with the AHDS-wide guidelines and SCHEMES which relate to this
element. The SOURCE element maps to Section 203 Data sources in the SSD, and
to the Data source keywords in the index record which are drawn from HASSET.
The Data Archive's Humanities and Social Science Electronic Thesaurus (HASSET)
will thus need to be included as a SCHEME in the central AHDS repository.
2.4.12. Language
Label: LANGUAGE
Language(s) of the intellectual content of the resource. Where practical,
the content of this field should coincide with the Z39.53 three character codes
for written languages. See: http://www.sil.org/sgml/nisoLang3-1994.html
The LANGUAGE element is unproblematic and the HDS agrees with the AHDS-wide
guidelines and SCHEMES which relate to this element. The LANGUAGE element
maps to Section 311 Language of written material in the SSD.
2.4.13. Relation
Label: RELATION
Relationship to other resources. The intent of specifying this element is
to provide a means to express relationships among resources that have formal
relationships to others, but exist as discrete resources themselves. For example,
images in a document, chapters in a book, or items in a collection. A formal
specification of RELATION is currently under development. Users and developers
should understand that use of this element should be currently considered
experimental.
The RELATION element is unproblematic and the HDS agrees with the AHDS-wide
guidelines, TYPES and SCHEMES which relate to this element. The definition
for the AHDS-wide SCHEME HDS will be "HDS/DA internal identification
number, uniquely identifying any resource within the HDS/DA catalogue. A RELATION
using the HDS identifier is required when there
is a relationship between the dataset being described in the catalogue record
and another dataset held by the HDS or the Archive". The RELATION element
maps to the following sections in the SSD: 401 References/reports by principal
investigators, 411 Other references/reports based on data, 441 References
to related datasets, 442 Constituent datasets. 401 and 411 map to the AHDS-wide
TYPE isParentOf, 441 Group maps to the AHDS-wide TYPE isChildOf, and 441 Other
Studies and 442 map to the AHDS-wide TYPE isSiblingOf.
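A minimal Python sketch of the RELATION mapping follows. It assumes that each SSD reference has already been resolved to an HDS/DA identifier; the tuple layout (element, TYPE, SCHEME, identifier) and the sample identifiers are hypothetical, while the section-to-TYPE pairs are exactly those stated above.

```python
from __future__ import annotations

# SSD reference sections mapped to AHDS-wide RELATION TYPES, as stated above.
RELATION_TYPES = {
    "401": "isParentOf",                 # References/reports by principal investigators
    "411": "isParentOf",                 # Other references/reports based on data
    "441 Group": "isChildOf",            # References to related datasets (group)
    "441 Other Studies": "isSiblingOf",  # References to related datasets (other studies)
    "442": "isSiblingOf",                # Constituent datasets
}


def ssd_refs_to_relation(refs: list[tuple[str, str]]) -> list[tuple[str, str, str, str]]:
    """refs: (SSD section, HDS/DA identifier) pairs.

    Returns (element, TYPE, SCHEME, identifier) tuples using the HDS scheme;
    sections without a mapping are skipped.
    """
    return [("RELATION", RELATION_TYPES[section], "HDS", identifier)
            for section, identifier in refs
            if section in RELATION_TYPES]
```

For instance, a reference in section 442 to a hypothetical study number "1003" becomes `("RELATION", "isSiblingOf", "HDS", "1003")`.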
2.4.14. Coverage
Label: COVERAGE
The spatial locations and temporal duration characteristic of the resource.
Formal specification of COVERAGE is currently under development. Users and
developers should understand that use of this element should be currently
considered experimental.
The COVERAGE element is problematic because it combines both spatial locations
and temporal duration. It was agreed that these are two very important elements
in their own right, which should be separated. Furthermore, it
was the view of the workshop participants that the spatial location component
and the temporal duration component can each be further sub-divided. The spatial
location component would include both information about the actual places
covered by the data, and information about boundary geographies, spatial units
and the granularity of the data. The temporal duration component would include
information about the time span covered and the periodicity of the data. The
spatial locations component of the COVERAGE element maps to the following
sections in the SSD: 222 23 Town/village, 222 24 Region/county and 222 25
Country; and to the geographical keywords in the index record. The temporal
durations component of the COVERAGE element maps to section 220 Time Period
Covered in the SSD, and to the Date year keywords in the index record which
are drawn from HASSET. The Data Archive's Humanities and Social Science Electronic
Thesaurus (HASSET) will thus need to be included as a SCHEME in the central
AHDS repository.
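The proposed separation of COVERAGE into spatial and temporal components, each with its own sub-divisions, could be modelled with Python dataclasses as sketched below. The field names and the "spatial"/"temporal" qualifier labels are hypothetical; the comments note which SSD sections each field would draw on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpatialCoverage:
    places: list[str] = field(default_factory=list)         # from SSD 222 23-25 and geographical keywords
    spatial_units: list[str] = field(default_factory=list)  # e.g. parish, county
    lowest_unit: str | None = None                          # finest unit for sensible analysis
    boundary_geography: str | None = None                   # description of the boundaries used


@dataclass
class TemporalCoverage:
    start: int                      # first year covered (SSD 220)
    end: int                        # last year covered (SSD 220)
    periodicity: str | None = None  # e.g. "annual", "decennial"


@dataclass
class Coverage:
    spatial: SpatialCoverage
    temporal: TemporalCoverage

    def to_dc(self) -> list[tuple[str, str, str]]:
        """Flatten to qualified DC COVERAGE entries, keeping the components apart."""
        entries = [("COVERAGE", "spatial", place) for place in self.spatial.places]
        entries.append(("COVERAGE", "temporal", f"{self.temporal.start}-{self.temporal.end}"))
        return entries
```

The richer detail (spatial units, granularity, periodicity) stays in the structured record for display, while `to_dc` emits only the flat entries a cross-domain catalogue would index.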
2.4.15. Rights Management
Label: RIGHTS
The content of this element is intended to be a link (a URL or other suitable
URI as appropriate) to a copyright notice, a rights-management statement,
or perhaps a server that would provide such information in a dynamic way.
The intent of specifying this field is to allow providers a means to associate
terms and conditions or copyright statements with a resource or collection
of resources. No assumptions should be made by users if such a field is empty
or not present.
The RIGHTS element is unproblematic. HDS catalogue records include a copyright
statement in Section 199 of the SSD, as well as a statement about the access
conditions which apply to a dataset, generated from codes which are stored
in Section 111 of the SSD.
3. Workshop Recommendations
This section summarises the recommendations which were made at the HDS resource
discovery workshop concerning the SSD, the Dublin Core and the AHDS Integrated
Catalogue.
3.1 Extending the SSD
Although it was recognised that, in the main, the SSD is sufficient to enable
searchers to identify historical datasets of interest, it was also recognised
that the SSD has some weaknesses, and the workshop participants recommended
the following seven extensions to the SSD:
3.2 Refining the Dublin Core
The workshop participants were generally happy to accept the Dublin Core as
a key to unlocking more detailed discipline-specific resources, and to accept
that the Dublin Core could be used in conjunction with the SSD. However, they
recommended that the following five refinements should be made:
3.3 Modelling the AHDS Integrated Catalogue
It was recommended that the following six search options, which were identified
as being essential to the discovery of historical electronic materials, should
be included in the AHDS Integrated Catalogue as searchable elements:
It should be noted that although the source search option was identified as
being particularly important, it would be acceptable for this search option to
be included in the topic search option. It was also recommended that it should
be possible to combine multiple entries within the search form with either a
Boolean AND or a Boolean OR.
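The recommended Boolean combination of search-form entries can be sketched in
Python as follows. This is an illustration only: the records, the field names
"topic" and "source", and the case-insensitive substring matching are
assumptions, not a description of how the AHDS Integrated Catalogue or BIRON
actually performs searches.

```python
from typing import Literal

# Hypothetical catalogue record: field name -> field text.
Record = dict[str, str]


def matches(record: Record, field: str, term: str) -> bool:
    """True if the term occurs (case-insensitively) in the given field."""
    return term.lower() in record.get(field, "").lower()


def search(records: list[Record], entries: list[tuple[str, str]],
           mode: Literal["AND", "OR"] = "AND") -> list[Record]:
    """Return records satisfying all (AND) or any (OR) of the form entries."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    combine = all if mode == "AND" else any
    return [r for r in records
            if combine(matches(r, f, t) for f, t in entries)]
```

With AND, a record must satisfy every entry on the form; with OR, satisfying any
one entry is enough. Folding the source option into the topic option would, in
this sketch, amount to searching a single combined field.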
The HDS recommends that the AHDS Integrated Catalogue should be developed
along the lines of the Council of European Social Science Data Archives Integrated
Data Catalogue (Cessda IDC), a unified collection of mainly European social
science data archive catalogues that can be searched through one common
interface. The Cessda IDC can be accessed at: http://dastar.essex.ac.uk/Cessda/IDC
The participants expressed a wish to be able to retrieve, through the AHDS
catalogue, the information currently made available through BIRON, the HDS
information retrieval system. They would not wish to retrieve less information
than is currently available to them.
4. Conclusion
The HDS workshop proved successful in its stated goals: it identified a standard
for cataloguing with which the participants were content (the SSD), mapped this
successfully against the Dublin Core, and provided a range of recommendations
which will not only feed into the development of the AHDS catalogue but also
provide the foundation for improvements to the existing HDS information retrieval
and cataloguing system.
The workshop was interesting and, at times, challenging. This was due in part
to the participants' desire to focus discussion on two issues. First, they
were keen to discuss the existing HDS / Data Archive cataloguing system and to
suggest extensions and improvements to it. They were also sometimes diverted
into discussing the collection of information for input into the system.
Whilst these were extremely useful discussions from the point of view of developing
HDS internal procedures and documentation, they were not wholly relevant to
the topic in hand. Nevertheless, a number of useful recommendations and suggestions
emerged from these discussions which the HDS will be acting upon in the coming
months. Second, participants wanted to spend a considerable amount of time discussing
the design of the proposed AHDS catalogue system. Participants argued strongly
that it was difficult to separate information content and retrieval from
the design of the system. Thus, they made recommendations to the AHDS on the
layout and design of the information retrieval system. These recommendations,
however, were inevitably subject-specific, and it is doubtful that they can be
implemented across the AHDS. Even so, this discussion should serve as an important
reminder that the design of the system, and the requirement for ease and simplicity
when searching, should be a major consideration when commissioning the AHDS cataloguing
system.
The overall reaction to the workshop and to the development of the AHDS cataloguing
initiative was very positive. The participants expressed a desire to retain
the functionality of the existing HDS cataloguing system. They were very pleased
that the development of the AHDS catalogue would provide them with an opportunity
to search across domains where this might be of interest, while retaining the option
to search only the HDS catalogue using the existing BIRON system. The participants
looked forward to testing the AHDS system in 1998.