Description
Temporal expressions occur in archaeological datasets, publications and grey literature reports in different formats, each with a wide variety of patterns. They can occur as named or numbered ordinal centuries, years and year spans, decades and centuries and their spans, and named periods (which can vary according to place). The issue is compounded in multilingual contexts. This creates major difficulties for the automatic analysis and comparison of archaeological outputs, even within a single dataset or report. It also presents problems for automatic analysis of temporal metadata. This presentation reports on the design, implementation and use of two open-source software tools to resolve and normalise temporal expressions to standard formats.
One Python tool normalises textual temporal expressions in different languages to a numerical time axis. Textual patterns for seven categories of temporal expression are normalised: ordinal named or numbered centuries; year spans; single years (with tolerance); decades; century spans; single years with a prefix; and named periods. The following languages are currently supported: Dutch, English, French, German, Italian, Norwegian, Spanish, Swedish and Welsh. The input is a temporal text string and a language code (ISO 639-1). The output is a tab-delimited text file with start/end years (in ISO 8601 format), relative to the Common Era (CE). The normalised outputs are provided as additional attributes alongside the original text expression, for consuming software to employ in end-user applications. This tool has been used in several applications, including a selection of datasets from the ADS Archive, the ARIADNE project multilingual demonstrator and the ARIADNEplus data aggregation [1].
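As a rough illustration of the input and output described above, the following minimal sketch normalises two of the seven categories (numbered ordinal centuries and year spans) for English only. It is not the tool's own code: the function names `normalise` and `to_tsv_row`, the regular expressions and the century boundaries (the Nth century CE taken as years (N-1)×100+1 to N×100) are all assumptions made for the example. BC years are simply given a negative sign, and the absence of a year zero is ignored.

```python
import re

# Hypothetical patterns for two categories, English only (assumed, not the tool's own).
CENTURY = re.compile(
    r"^(\d{1,2})(?:st|nd|rd|th)\s+century(?:\s+(BC|BCE|AD|CE))?$", re.IGNORECASE
)
YEAR_SPAN = re.compile(
    r"^(\d{1,4})\s*(?:-|–|to)\s*(\d{1,4})(?:\s+(BC|BCE|AD|CE))?$", re.IGNORECASE
)


def normalise(text, lang="en"):
    """Return (start_year, end_year) relative to CE, or None if unmatched."""
    if lang != "en":
        return None  # this sketch covers English only
    s = text.strip()
    m = CENTURY.match(s)
    if m:
        n = int(m.group(1))
        # Assumed convention: 5th century CE = 401 to 500.
        start, end = (n - 1) * 100 + 1, n * 100
        if (m.group(2) or "").upper() in ("BC", "BCE"):
            start, end = -end, -start  # earlier (more negative) year first
        return start, end
    m = YEAR_SPAN.match(s)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        if (m.group(3) or "").upper() in ("BC", "BCE"):
            start, end = -start, -end  # simple sign flip, no year-zero handling
        return start, end
    return None


def to_tsv_row(text, lang="en"):
    """Format the original expression plus start/end years as one tab-delimited row."""
    span = normalise(text, lang)
    start, end = span if span else ("", "")
    return f"{text}\t{start}\t{end}"


if __name__ == "__main__":
    for expr in ["5th century", "2nd century BC", "1250-1300", "300 to 200 BC"]:
        print(to_tsv_row(expr))
```

Keeping the original expression in the first column mirrors the description above, where the normalised years are supplied as additional attributes rather than as a replacement for the source text.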
The other tool is an automatic, vocabulary-based subject indexing recommendation system (using hybrid Named Entity Recognition) that includes temporal expressions as a major element. The tool has been applied to the OASIS online index of fieldwork events and their unpublished reports [2] and is currently being applied in the ATRIUM research infrastructure project [3]. The automatic indexing tool includes a specialised temporal annotator and a SKOS vocabulary annotator (currently using the FISH Archaeological Object Thesaurus and the FISH Thesaurus of Monument Types). In addition to numerical date and century expressions, archaeological named periods are taken from PeriodO. Currently the Historic England ‘Archaeological and Cultural Periods’ PeriodO authority is used, but other PeriodO authorities are possible. The system is implemented using the spaCy open-source Natural Language Processing (NLP) library. A suite of spaCy ‘patterns’ has been developed as Python modules, together with a series of specialised NER pipeline components, to identify and tag various types of temporal entity within passages of free text. Output includes the suggested metadata, lists of span entities representing the vocabulary and temporal concepts identified (including their corresponding positions within the input text), and aggregated counts of the identified spans. Bulk processing scripts and Python notebooks are included to demonstrate usage and to highlight aspects of the available functionality.
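The shape of the output described above (tagged spans with character positions, plus aggregated counts) can be sketched as follows. This is only an illustration under stated assumptions: plain regular expressions stand in for the spaCy patterns and pipeline components, the labels and the `annotate` function are hypothetical, and the short list of period names is invented for the example rather than taken from PeriodO.

```python
import re
from collections import Counter

# Hypothetical label-to-pattern table; regular expressions stand in for spaCy patterns.
PATTERNS = {
    "CENTURY": re.compile(r"\b\d{1,2}(?:st|nd|rd|th) century\b", re.IGNORECASE),
    # Illustrative period names only, not drawn from a PeriodO authority.
    "PERIOD": re.compile(r"\b(?:Bronze Age|Iron Age|Roman|Medieval)\b", re.IGNORECASE),
    "YEARSPAN": re.compile(r"\b\d{3,4}\s*-\s*\d{3,4}\b"),
}


def annotate(text):
    """Return tagged spans (with character offsets) and per-label counts."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append(
                {"label": label, "text": m.group(0), "start": m.start(), "end": m.end()}
            )
    spans.sort(key=lambda s: s["start"])  # report spans in reading order
    counts = Counter(s["label"] for s in spans)
    return {"spans": spans, "counts": dict(counts)}


if __name__ == "__main__":
    result = annotate("A Roman ditch recut in the 12th century; Iron Age pottery.")
    for span in result["spans"]:
        print(span)
    print(result["counts"])
```

Recording the start and end offsets for each span lets consuming software highlight the matched text in the source report or check a suggestion against its context.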
Resolving dates and periods to standard formats potentially affords data integration, interchange, search, comparison and visualization. Open-source tools can offer a step in this direction.
References
[1] Binding, C. & Tudhope, D. 2023. ‘Automatic Normalization of Temporal Expressions’. Journal of Computer Applications in Archaeology, 6(1), 24-39. https://doi.org/10.5334/jcaa.105
[2] Binding, C. & Tudhope, D. 2024. ‘KOS-based enrichment of archaeological fieldwork reports’. Knowledge Organization, 51(5), 292-299. https://doi.org/10.5771/0943-7444-2024-5-292
[3] ATRIUM Project. https://atrium-research.eu/
Period | 6 May 2025 |
---|---|
Event title | Computer Applications and Quantitative Methods in Archaeology (CAA) 2025: Digital Horizons: Embracing heritage in an evolving world |
Event type | Conference |
Conference number | 52 |
Location | Athens, Greece |
Degree of Recognition | International |
Documents & Links
- Two open-source tools for archaeological temporal expressions