This paper discusses the automatic generation of rich metadata for semantic search of archaeological excavation reports. An extension of the CIDOC CRM for the archaeological domain acts as a core ontology, enabling cross-search between diverse excavation datasets and ‘grey literature’ excavation reports originating from the Archaeological Data Service OASIS library. Rich metadata is automatically extracted from the reports, directed by the CRM, via a three-phase process of semantic enrichment employing the GATE toolkit. The extracted metadata is expressed as XML annotations coupled with the reports and also as RDF metadata, both represented as CRM entities qualified by SKOS archaeological concepts. A web portal delivers the annotated XML files for visual inspection, while the STAR research demonstrator offers unified search of excavation data and grey literature in terms of the conceptual structure. Initial evaluation results show operational precision and recall rates for three different semantic expansion configurations of the system.
Original language: English
Title of host publication: Computational Linguistics
Editors: Adam Przepiórkowski, Maciej Piasecki, Krzysztof Jassem, Piotr Fuglewicz
Publisher: Springer Verlag
Number of pages: 16
ISBN (Electronic): 978-3-642-34399-5
ISBN (Print): 978-3-642-34398-8
State: Published - 2013

Publication series

Name: Studies in Computational Intelligence
ISSN (Print): 1860-949X

Research areas

  • Automatic Metadata Generation, CIDOC CRM, Digital Archaeology, Digital Library, GATE, Knowledge Organization Systems, Information Extraction, Semantic Annotation, Semantic Search, SKOS

ID: 475914