Standard

A study of semantic integration across archaeological data and reports in different languages. / Binding, Ceri; Tudhope, Douglas; Vlachidis, Andreas.

In: Journal of Information Science, 31.07.2018.

Research output: Contribution to journal › Article

BibTeX

@article{e6c26b8b1dcb4769aeaea2a5d336bc04,
title = "A study of semantic integration across archaeological data and reports in different languages",
abstract = "This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via NLP across different languages. The investigation follows a broad theme relating to wooden objects and their dating via dendrochronological techniques, including types of wooden material, samples taken, wooden objects including shipwrecks. The outcomes are an integrated RDF dataset coupled with an associated interactive research demonstrator query builder application. The semantic framework combines the CIDOC CRM with the Getty Art and Architecture Thesaurus (AAT). The NLP, data cleansing and integration methods are described in detail together with illustrative scenarios from the web application Demonstrator. Reflections and recommendations from the study are discussed. The Demonstrator is a novel SPARQL web application, with CRM/AAT based data integration. Functionality includes the combination of free text and semantic search with browsing on semantic links, hierarchical and associative relationship thesaurus query expansion. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and specialised associative relationships. Following a 'mapping pattern' approach (via the STELETO tool) ensured validity and consistency of all RDF output. The user is shielded from the complexity of the underlying semantic framework by a query builder user interface. The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages and semantic cross-searching of the integrated information. The semantic linking of textual reports and datasets opens new possibilities for integrative research across diverse resources.",
keywords = "knowledge organization systems, thesaurus query expansion, semantic interoperability, natural language processing, CIDOC CRM, Art and Architecture Thesaurus, named entity recognition, archaeological grey literature",
author = "Ceri Binding and Douglas Tudhope and Andreas Vlachidis",
year = "2018",
month = "7",
day = "31",
doi = "10.1177/0165551518789874",
language = "English",
journal = "Journal of Information Science",
}

RIS

TY - JOUR

T1 - A study of semantic integration across archaeological data and reports in different languages

AU - Binding, Ceri

AU - Tudhope, Douglas

AU - Vlachidis, Andreas

PY - 2018/07/31

Y1 - 2018/07/31

N2 - This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via NLP across different languages. The investigation follows a broad theme relating to wooden objects and their dating via dendrochronological techniques, including types of wooden material, samples taken, wooden objects including shipwrecks. The outcomes are an integrated RDF dataset coupled with an associated interactive research demonstrator query builder application. The semantic framework combines the CIDOC CRM with the Getty Art and Architecture Thesaurus (AAT). The NLP, data cleansing and integration methods are described in detail together with illustrative scenarios from the web application Demonstrator. Reflections and recommendations from the study are discussed. The Demonstrator is a novel SPARQL web application, with CRM/AAT based data integration. Functionality includes the combination of free text and semantic search with browsing on semantic links, hierarchical and associative relationship thesaurus query expansion. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and specialised associative relationships. Following a 'mapping pattern' approach (via the STELETO tool) ensured validity and consistency of all RDF output. The user is shielded from the complexity of the underlying semantic framework by a query builder user interface. The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages and semantic cross-searching of the integrated information. The semantic linking of textual reports and datasets opens new possibilities for integrative research across diverse resources.

KW - knowledge organization systems

KW - thesaurus query expansion

KW - semantic interoperability

KW - natural language processing

KW - CIDOC CRM

KW - Art and Architecture Thesaurus

KW - named entity recognition

KW - archaeological grey literature

U2 - 10.1177/0165551518789874

DO - 10.1177/0165551518789874

M3 - Article

JO - Journal of Information Science

T2 - Journal of Information Science

JF - Journal of Information Science

ER -

ID: 2624852