Automatic Normalization of Temporal Expressions

Ceri Binding, Doug Tudhope*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review


Abstract

Dates, periods and timespans are described in archaeological datasets using many different textual patterns, each with myriad variations, which makes direct automated comparison difficult. The problem can occur even within records from the same dataset, and it is compounded when integrating multilingual data, particularly where dates are expressed in words rather than numbers. The same problem arises in temporal metadata, whether entered manually or generated from reports and grey literature by Natural Language Processing (NLP) techniques. Resolving and normalizing dates and periods to internationally agreed standard formats enables efficient data integration, interchange, search, comparison and visualization. This paper reports on the design and implementation of a tool that normalizes temporal expressions to a numerical time axis, and reflects on key issues.

Textual patterns for seven categories of temporal expression have been normalized: ordinal named or numbered centuries; year spans; single years (with tolerance); decades; century spans; single years with a prefix; and named periods. The following languages are currently supported: Dutch, English, French, German, Italian, Norwegian, Spanish, Swedish and Welsh. The paper describes the methods and an open-source normalization tool developed in Python, and discusses four applications of the method along with its limitations and future work. Results are presented from diverse datasets and languages. The input is a temporal text string and a language code (ISO 639-1). The output is a tab-delimited text file with start and end years (in ISO 8601 format) relative to the Common Era (CE). The normalized outputs are provided as additional attributes alongside the original text expression, for consuming software to use in end-user applications.
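To make the input/output contract concrete, the following is a minimal sketch, not the authors' tool: it handles only a few English patterns (ordinal numbered centuries, year spans, decades and single years with tolerance) and writes tab-delimited start/end years. The function names, regular expressions and column headers are hypothetical. Two conventions are assumptions: the "19th century" is taken as 1801 to 1900 (many datasets use 1800 to 1899 instead), and BC years are converted to astronomical numbering (1 BC = year 0000), which is how ISO 8601 represents years before 1 CE.

```python
import csv
import io
import re

# Hypothetical sketch: English-only patterns, each mapped to a handler that
# returns (start, end) as astronomical years (1 BC = 0, 2 BC = -1, ...).

def is_bc(era):
    return bool(era) and era.upper() == "BC"

def to_astronomical(year, era):
    """Convert a year with an optional 'AD'/'BC' era to astronomical numbering."""
    return 1 - year if is_bc(era) else year

def century(m):
    n, era = int(m.group(1)), m.group(2)
    if is_bc(era):
        # The nth century BC runs from n*100 BC to (n-1)*100 + 1 BC.
        return to_astronomical(n * 100, era), to_astronomical((n - 1) * 100 + 1, era)
    # Assumed convention: the 19th century spans 1801-1900.
    return (n - 1) * 100 + 1, n * 100

def year_span(m):
    era = m.group(3)  # an era suffix, if present, applies to both years
    a = to_astronomical(int(m.group(1)), era)
    b = to_astronomical(int(m.group(2)), era)
    return min(a, b), max(a, b)

def decade(m):
    start = int(m.group(1)) * 10  # "1850s" -> 1850-1859 (not read as a century)
    return start, start + 9

def tolerance(m):
    year, tol = int(m.group(1)), int(m.group(2))
    return year - tol, year + tol

# Patterns keyed by ISO 639-1 language code; only "en" is sketched here.
PATTERNS = {
    "en": [
        (re.compile(r"^(\d{1,2})(?:st|nd|rd|th) century(?:\s+(AD|BC))?$", re.I), century),
        (re.compile(r"^(\d{1,4})\s*(?:-|to)\s*(\d{1,4})(?:\s+(AD|BC))?$", re.I), year_span),
        (re.compile(r"^(\d{3})0s$"), decade),
        (re.compile(r"^(\d{1,4})\s*(?:\+/-|±)\s*(\d{1,4})$"), tolerance),
    ],
}

def normalize(text, lang):
    """Return (start, end) years for text, or None if no pattern matches."""
    for pattern, handler in PATTERNS.get(lang, []):
        m = pattern.match(text.strip())
        if m:
            return handler(m)
    return None

def iso_year(year):
    """Format a year ISO 8601-style: four digits, leading '-' if negative."""
    return f"{year:05d}" if year < 0 else f"{year:04d}"

def to_tsv(records):
    """Write (text, lang) records as tab-delimited rows: input, lang, start, end."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["input", "lang", "start", "end"])
    for text, lang in records:
        span = normalize(text, lang)
        start, end = (iso_year(span[0]), iso_year(span[1])) if span else ("", "")
        writer.writerow([text, lang, start, end])
    return buf.getvalue()

if __name__ == "__main__":
    print(to_tsv([("19th century", "en"), ("500-300 BC", "en"), ("1850 ± 25", "en")]))
```

Keeping the original text and language code in each output row mirrors the idea of supplying normalized years as additional attributes next to the source expression; unmatched inputs get empty start/end cells rather than being dropped.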
Original language: English
Pages (from-to): 24-39
Number of pages: 16
Journal: Journal of Computer Applications in Archaeology
Volume: 6
Issue number: 1
Publication status: Published, 27 Mar 2023

Keywords

  • temporal expressions
  • dating
  • time periods
  • semantic integration
  • software
  • multilingual

