How to integrate diverse digital repositories into a unified information infrastructure that is accessible and discoverable through simple interfaces remains a central research issue for digital libraries. Many collections are described by specialized metadata, which currently has to be mapped and crosswalked to a standard format in order to be useful. However, this metadata work can be expensive and resource-intensive. We describe work in progress on DISTIL (Document Indexing & Semantic Tagging Interface for Libraries), which aims to support federated cross-collection search in the humanities and social sciences. DISTIL proposes to support interoperability by generating Dewey Decimal Classification (DDC) ‘tags’ from individual metadata records. The resulting tags can then be used to support cross-collection browsing. We focus here on some of the initial pre-processing stages of the metadata workflow, which include cleaning and formatting metadata records in order to extract terms that can then be used to generate the DDC tags. We describe some initial strategies for this workflow and the issues it raises.
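The pre-processing stage mentioned in the abstract (cleaning and formatting a metadata record, then extracting terms) might look roughly like the sketch below. This is an illustration only, not DISTIL's actual implementation: the field names (`title`, `subject`, `description`), the stopword list, and the helper names `clean_text` and `extract_terms` are all assumptions introduced here, and the record is assumed to be a plain dictionary.

```python
import html
import re

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with"}

# Hypothetical choice of descriptive fields to mine for terms.
TEXT_FIELDS = ("title", "subject", "description")


def clean_text(value: str) -> str:
    """Unescape HTML entities, drop markup tags, and collapse whitespace."""
    value = html.unescape(value)
    value = re.sub(r"<[^>]+>", " ", value)
    return re.sub(r"\s+", " ", value).strip()


def extract_terms(record: dict) -> list:
    """Return lowercase, de-duplicated terms from a record, in first-seen order."""
    terms = []
    seen = set()
    for field in TEXT_FIELDS:
        raw = record.get(field, "")
        # A field may hold a single string or a list of strings.
        values = raw if isinstance(raw, list) else [raw]
        for value in values:
            for token in re.findall(r"[a-z]+", clean_text(value).lower()):
                if token not in STOPWORDS and token not in seen:
                    seen.add(token)
                    terms.append(token)
    return terms


# Example record with leftover markup and inconsistent spacing (made up).
record = {
    "title": "<b>Maps of  the Nile</b> &amp; Delta",
    "subject": ["Egypt", "maps"],
}
terms = extract_terms(record)
```

The extracted terms would then be the input to whatever step assigns DDC tags; that mapping step is not sketched here because the abstract does not describe it.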
Original language: English
Title of host publication: Theory and Practice of Digital Libraries
Subtitle of host publication: Second International Conference, TPDL 2012, Paphos, Cyprus, September 23-27, 2012. Proceedings
Editors: Panayiotis Zaphiris, George Buchanan, Edie Rasmussen, Fernando Loizides
Publisher: Springer Verlag
Number of pages: 6
ISBN (Electronic): 978-3-642-33290-6
ISBN (Print): 978-3-642-33289-0
State: Published - 2012

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 0302-9743

Research areas

• Dewey Decimal Classification, digital humanities, interoperability, metadata, social sciences, tagging

ID: 475099