Transforming text into data to extract meaning and make connections.
How can digital techniques be used to create and manage relationships for collections content?
As with much cultural heritage data, museum collection catalogues are largely unstructured, inconsistent and dominated by thin records. As a result, the potential for new forms of research, access and scholarly enquiry across multiple collections and related datasets remains dormant.
Furthermore, whilst recent years have seen a growth in the online publication of scholarly research related to heritage collections, this material remains hard to discover because related collection records rarely link to it.
In this project, we will apply a battery of digital techniques to connect similar, identical and related items within and across collections and other publications. The project will explore a range of data analysis approaches for catalogues, published material and knowledge graphs (primarily Wikidata), and will build links between them at scale that can then support new forms of research and access.
The project’s research questions are:
- How can existing digital tools and methods be used to build relationships at scale between poorly and inconsistently catalogued digitised collection objects and other content sources?
- Which combinations of software tools and methods are most effective?
- How might the relationships built using these techniques complement and amplify the benefits of using persistent identifiers?
- What gaps and biases emerge when these relationships are created, and which hitherto unexpected connections are made?
- How might confidence in these relationships affect their usefulness in research and discovery? Can ‘degrees of confidence’ become as useful a concept in searching across collections as it is in statistics?
- Is such an approach scalable to larger volumes of content and different types of collections?
- Where can human input best support such an approach? What expertise and skills does this input require?
- How might these automatically established relationships help museums better understand how their collections are, and could be, used and interpreted?
- What benefits arise from using such tools to link to scholarship such as theses and journal article abstracts/keywords? What new forms of research are afforded?
- What ethical issues are raised by such an approach?
- What are the most effective methods for organising interdisciplinary and cross-sectoral research projects concerned with digital cultural heritage, and how can these methods usefully be shared across independent research organisations (IROs)?
Heritage Connector aims to provide researchers with more targeted and relevant information about collections, and to generate new opportunities for research projects.
The project will deliver insights into cataloguing practices that could deepen how materials are categorised, expand access to museum records, identify new digital catalogue formats, and cross-reference materials so that more thorough and systematic data analysis can be undertaken.
The project’s objectives are:
- To conduct a review of relevant literature and digital tools.
- To document events, findings and work in progress through a project blog.
- To build, in open-source software, a ‘Heritage Connector’ capable of holding a dense web of links between object records and knowledge graphs such as Wikidata, for evaluation and experimentation.
- To apply a series of digital tools and computational methods to create speculative identifications between different records within the test dataset. The computational methods to be trialled include:
  - Named entity recognition, to identify which words or phrases in catalogue text are proper names, such as people or places.
  - Named entity linking, to disambiguate named entities and link them to entries in relevant knowledge bases.
  - Machine learning, to build classifiers and to cluster similar and related content.
  - A range of existing reconciliation tools and services, to establish relationships and associated confidence levels for links.
- To work on successively larger and more varied datasets as the project proceeds in three stages, starting with Science Museum Group Collection records, then adding those of the V&A.
- To hold focused workshops at every stage for groups relevant to each phase of the project, including a hackathon towards the end of the project.
- To write a final report to the AHRC on our findings.
- To write articles for peer-reviewed journals.
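As a rough illustration of what a reconciliation step with confidence levels might look like, the sketch below scores a catalogue name against candidate knowledge-graph labels using Python's standard-library `difflib` string similarity. This is a swapped-in technique chosen for illustration, not the project's method or the behaviour of any particular reconciliation service. `Candidate`, `reconcile`, the 0.8 threshold and the `entity-1` style identifiers are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    """A hypothetical knowledge-graph entity: a placeholder identifier plus a label."""
    identifier: str
    label: str


def normalise(text: str) -> str:
    """Lower-case and collapse runs of whitespace so trivial differences don't lower scores."""
    return " ".join(text.lower().split())


def score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, from difflib's ratio() on normalised text."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def reconcile(name: str, candidates: list[Candidate], threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (identifier, confidence) pairs scoring at or above the threshold, best first."""
    scored = [(c.identifier, score(name, c.label)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates: identifiers are placeholders, not real Wikidata IDs.
    candidates = [
        Candidate("entity-1", "Charles Babbage"),
        Candidate("entity-2", "Charles Darwin"),
    ]
    print(reconcile("charles  babbage", candidates))  # [('entity-1', 1.0)]
    print(round(score("Charles Babage", "Charles Babbage"), 2))  # 0.97
```

Here the threshold works as a crude ‘degree of confidence’ filter: the misspelt "Charles Babage" still scores about 0.97, while "Charles Darwin" falls below 0.8. String similarity alone cannot tell apart different people who share a name, which is where named entity linking and human review would come in.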
The project team:
- Kalyan Dutia, Research Developer
- Rhiannon Lewis, Project Coordinator and Doctoral research student, School of Advanced Study, University of London
- Richard Palmer, Senior Web Developer, V&A
- John Stack, Digital Director, Science Museum Group
- Jamie Unwin, Technical Architect, Science Museum Group
- Jane Winters, Professor of Digital Humanities & Pro-Dean for Libraries, School of Advanced Study, University of London
- Angela Wolff, Full Stack Developer, V&A
Heritage Connector is funded by the Arts and Humanities Research Council's Towards a National Collection: Opening UK Heritage to the World fund.