Abstract
In the STAR project investigating semantic integration, we employed thesauri from the Forum on Information Standards in Heritage (FISH) and word lists from Historic England recording manuals as controlled vocabularies (CVs). Semantic integration allowed search across both archaeological datasets and grey literature reports via data extraction and NLP (Tudhope et al. 2011; Vlachidis & Tudhope 2016). The ARIADNE and ARIADNEplus European Infrastructure projects confronted multilingual issues in seeking to integrate archaeological data and reports written (and indexed with CVs) in various partner languages. We developed tools to help map partner CVs to a central ‘mapping hub’, the Getty Art and Architecture Thesaurus (AAT), allowing search across partner data and reports in different languages and also query expansion using the AAT’s hierarchical structure (Binding & Tudhope 2016). We have also provided tools to express English, Scottish and Welsh (including Gaelic and Welsh language) FISH vocabularies as Linked Open Data (HeritageData), facilitating programmatic use. We are currently collaborating with the Archaeology Data Service (ADS) on CV-based NLP tools to make automatic indexing suggestions for the OASIS online index of fieldwork events and their unpublished reports, drawing on the FISH vocabularies (Monuments, Objects, Periods) employed in OASIS subject indexing.
Reflections from this experience are discussed. These include the potential of mapping between CVs, the possible need for an enhanced entry vocabulary (synonyms etc.) in CVs when used in NLP, and the challenge of compound phrases that combine concepts, possibly meriting a faceted approach. There may be a need to draw on standard CVs from other domains (e.g. for scientific areas). It is possible to index with multiple CVs. It is important to consider use cases; the indexing requirements of grey literature may differ from those of academic journal publishing. CVs should be continually maintained and allowed to evolve, with attention to potential gaps or bias of different kinds.
Original language | English |
---|---|
Title | Session 320, A controlled vocabulary for archaeology: a necessary requirement for the development of a sustainable research practice into the 21st century |
Status | Published - 1 Sept 2023 |
Event | European Association of Archaeologists 29th Annual Meeting - Belfast, United Kingdom Duration: 30 Aug 2023 → 2 Sept 2023 https://www.e-a-a.org/EAA2023/Home/EAA2023/Home.aspx?hkey=c376135d-4d51-4a35-ae41-569699c7e496 |
Conference
Conference | European Association of Archaeologists 29th Annual Meeting |
---|---|
Abbreviated title | EAA 2023 |
Country/Territory | United Kingdom |
City | Belfast |
Period | 30/08/23 → 2/09/23 |