Reflections on experience with archaeological controlled vocabularies in indexing and retrieval

Douglas Tudhope*, Ceri Binding

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



In the STAR project investigating semantic integration, we employed thesauri from the Forum on Information Standards in Heritage (FISH) and word lists from Historic England recording manuals. Semantic integration allowed search across both archaeological datasets and grey literature reports via data extraction and natural language processing (NLP) (Tudhope et al. 2011; Vlachidis & Tudhope 2016). The ARIADNE and ARIADNEplus European Infrastructure projects confronted multilingual issues in seeking to integrate archaeological data and reports written, and indexed by controlled vocabulary (CV), in various partner languages. We developed tools to help map partner CVs to a central ‘mapping hub’, the Getty Art and Architecture Thesaurus (AAT), allowing search across partner data and reports in different languages as well as query expansion using the AAT’s hierarchical structure (Binding & Tudhope 2016). We have also provided tools to express the English, Scottish and Welsh FISH vocabularies (including the Gaelic and Welsh languages) as Linked Open Data (HeritageData), facilitating programmatic use. We are currently collaborating with the Archaeology Data Service (ADS) on CV-based NLP tools to make automatic indexing suggestions for the OASIS online index of fieldwork events and their unpublished reports, drawing on the FISH vocabularies (Monuments, Objects, Periods) employed in OASIS subject indexing.
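
The hub-mapping and hierarchical query expansion idea described above can be illustrated with a minimal sketch. It is not the ARIADNE tooling: the concept identifiers, the partner vocabulary names, the term mappings and the function names are all hypothetical placeholders, and the tiny hierarchy stands in for the AAT only as an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical hub hierarchy: narrower concept -> broader concept.
# These identifiers are illustrative only, not real AAT identifiers.
BROADER = {
    "hub:hillfort": "hub:fortification",
    "hub:castle": "hub:fortification",
    "hub:fortification": "hub:structure",
}

# Hypothetical partner mappings: (vocabulary, local term) -> hub concept.
MAPPINGS = {
    ("partner_en", "hillfort"): "hub:hillfort",
    ("partner_nl", "heuvelfort"): "hub:hillfort",
    ("partner_de", "Burg"): "hub:castle",
}


def narrower_closure(concept):
    """Return the concept plus all of its descendants in the hub hierarchy."""
    children = defaultdict(set)
    for child, parent in BROADER.items():
        children[parent].add(child)
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def expand_query(concept):
    """Return partner terms mapped to the concept or to any narrower concept."""
    wanted = narrower_closure(concept)
    return sorted(key for key, hub in MAPPINGS.items() if hub in wanted)


if __name__ == "__main__":
    # A search on the broad hub concept also retrieves records indexed
    # with narrower terms from each partner vocabulary.
    for vocabulary, term in expand_query("hub:fortification"):
        print(vocabulary, term)
```

Because every partner term points at one hub concept, adding a new partner vocabulary in this sketch needs only a new set of mappings to the hub rather than pairwise mappings to every other vocabulary.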

Reflections from this experience are discussed. These include the potential of mapping between CVs, the possible need for an enhanced entry vocabulary (synonyms etc.) in CVs when used in NLP, and the challenge of compound phrases that combine concepts, which may merit a faceted approach. There may be a need to draw on standard CVs from other domains (e.g. for scientific areas). It is possible to index with multiple CVs. It is important to consider use cases; the indexing requirements of grey literature may differ from those of academic journal publishing. CVs should be continually maintained and allowed to evolve, with attention to potential gaps or bias of different kinds.
Original language: English
Title of host publication: Session 320, A controlled vocabulary for archaeology: a necessary requirement for the development of a sustainable research practice into the 21st century
Publication status: Published - 1 Sept 2023
Event: European Association of Archaeologists 29th Annual Meeting - Belfast, United Kingdom
Duration: 30 Aug 2023 to 2 Sept 2023


Conference: European Association of Archaeologists 29th Annual Meeting
Abbreviated title: EAA 2023
Country/Territory: United Kingdom

