Negation detection and word sense disambiguation in digital archaeology reports for the purposes of semantic annotation

Andreas Vlachidis*, Douglas Tudhope

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

Purpose - The purpose of this paper is to present the role and contribution of natural language processing techniques, in particular negation detection and word sense disambiguation, in the process of semantic annotation of archaeological grey literature. Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence, but at the same time it can hinder the accurate indexing of documents with respect to positive assertions.

Design/methodology/approach - The paper presents a method for adapting the biomedicine-oriented negation algorithm NegEx to the context of archaeology and discusses the evaluation results of the modified negation detection module. A particular form of polysemy, which arises from the definition of ontology classes and concerns the semantics of small finds in archaeology, is addressed by a domain-specific word sense disambiguation module.

Findings - The performance of the negation detection module is compared against a "Gold Standard" that consists of 300 manually annotated pages of archaeological excavation and evaluation reports. The evaluation results are encouraging, delivering overall 89 per cent precision, 80 per cent recall and 83 per cent F-measure scores. The paper addresses limitations and future improvements of the current work and highlights the need for ontological modelling to accommodate negative assertions.

Originality/value - The discussed NLP modules contribute to the aims of the OPTIMA pipeline, delivering an innovative application of such methods in the context of archaeological reports for the semantic annotation of archaeological grey literature with respect to the CIDOC-CRM ontology.

Original language: English
Pages (from-to): 118-134
Number of pages: 17
Journal: Program-Electronic library and information systems
Volume: 49
Issue number: 2
Digital Object Identifiers (DOIs)
Status: Published - 7 April 2015
