The paper discusses the application of Natural Language Processing (NLP) techniques to classical art texts, with the aim of semantic annotation via rule-based Information Extraction (IE) techniques combined with ontological and domain vocabulary input. CASIE (Classical Art Semantics Information Extraction) is a pilot collaborative project between the Hypermedia Research Unit (University of South Wales) and the Beazley Archive (Oxford University). It aims to automatically extract information about cultural objects from scholarly texts on classical art and to represent this information in terms of the ISO metadata standard for cultural heritage, the International Council of Museums' CIDOC Conceptual Reference Model (CRM). In total, 12 documents (fascicules, which are high-quality catalogues) were processed, originating from the Corpus Vasorum Antiquorum (CVA) collection, which contains over 350 high-quality catalogues of mostly ancient Greek painted pottery, illustrating more than 100,000 vases. The extracted information was expressed as interoperable RDF graphs consistent with the CLAROS project format. CIDOC-CRM plays a central role in enabling semantic interoperability across the range of datasets that contribute to CLAROS. The CASIE pilot enabled a complementary exploitation of terminological and ontological resources via rule-based information extraction techniques, delivering semantic annotation with respect to the CRM in the broader field of digital humanities.
|Title of host publication
|Conference of the British Chapter of the International Society for Knowledge Organization, London, UK, 8-9 July 2013.
|Number of pages
|Published - 8 Jul 2013
|3rd ISKO UK biennial conference: Knowledge Organization - Pushing the Boundaries - University College London, London, United Kingdom
Duration: 7 Jul 2013 → 8 Jul 2013