In November and December 2023, the CENL “AI in Libraries” Network Group will host three webinars on various uses of Artificial Intelligence (AI) in national libraries.
The online events last approximately 45 minutes each.
For more information, please see below or contact Jean-Philippe Moreux of the National Library of France, the chair of the group, at firstname.lastname@example.org.
Yves Maurer (NLL)
Interacting with chatbots has recently become familiar to many of us, and their use in a library context seems promising. This project shows how the National Library of Luxembourg has indexed its entire digitized newspaper collection in a vector database for semantic search, and how it uses the search results to give ChatGPT context when it answers users' questions. Because multilingualism is a prominent feature of the collection, with articles in French, German, Luxembourgish, English and other languages, queries are translated so that the semantic search works across language barriers. The presentation will cover the basic technology and the integration, and will include a demo of the live system.
Wednesday 8 November, 14:00 CEST
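The retrieval flow described in the abstract above (index articles as vectors, translate the query, search, pass the hits to the chatbot as context) can be sketched roughly as follows. This is a minimal illustration and not the NLL implementation: the bag-of-words `embed` function stands in for a real embedding model, `translate` is a hypothetical lookup-table stub, the `ARTICLES` data is invented, and no chatbot is actually called.

```python
import math
from collections import Counter

# Invented sample articles (assumption, not real collection data).
ARTICLES = {
    "fr-1": "inondation dans la vallée de la moselle",
    "de-1": "hochwasser im moseltal nach starkem regen",
    "en-1": "new tram line opens in the city",
}

def embed(text):
    """Toy stand-in for an embedding model: a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def translate(query, lang):
    """Hypothetical translation stub; a real system would call a translation service."""
    table = {
        ("flood moselle", "fr"): "inondation moselle",
        ("flood moselle", "de"): "hochwasser moseltal",
    }
    return table.get((query, lang), query)

# The "vector database": here just an in-memory dict of article vectors.
INDEX = {doc_id: embed(text) for doc_id, text in ARTICLES.items()}

def search(query, languages=("en", "fr", "de"), k=2):
    """Search with the query translated into each language; keep the best score per article."""
    best = {}
    for lang in languages:
        query_vec = embed(translate(query, lang))
        for doc_id, doc_vec in INDEX.items():
            best[doc_id] = max(best.get(doc_id, 0.0), cosine(query_vec, doc_vec))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, score in ranked[:k] if score > 0]

def build_prompt(question, doc_ids):
    """Assemble the retrieved articles into context for a chatbot prompt."""
    context = "\n".join(f"[{d}] {ARTICLES[d]}" for d in doc_ids)
    return f"Answer using only these articles:\n{context}\n\nQuestion: {question}"

hits = search("flood moselle")
prompt = build_prompt("What happened in the Moselle valley?", hits)
```

With the toy data, an English query retrieves the French and German flood articles only because it is translated before searching, which is the cross-language effect the abstract describes.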
Maximilian Kähler (DNB)
With an average of 1.5 million online publications collected each year by the German National Library (DNB), automatic subject indexing has long since found its way into production processes at DNB. However, given the extreme size and heterogeneity of the Integrated Authority File (GND), our target vocabulary, automatic subject indexing remains a deeply challenging problem. In a three-year project, the DNB is now investigating how recent progress and innovation in natural language processing and machine learning can help improve the quality of automatic indexing at DNB. In this talk we will explain why Extreme Multi-Label Learning is the right mindset for automatic subject indexing at scale, and present the various angles from which we are working to improve the quality of our subject indexing results.
Thursday 30 November, 11:00 CEST
Bjarne Andersen (Royal Danish Library)
At the Royal Danish Library we have a very large radio/TV archive with millions of hours and millions of programs from a broad range of radio and television channels. The archive contains a great many duplicates because most of the ingest comes from 24/7 recordings of a number of channels. For our users, this creates considerable noise when they search for specific programs on our online platform.
We have been experimenting with different algorithms that use both metadata and audio data to identify these duplicates. Specifically, we have been experimenting with doc2vec, Chromaprint and decision tree algorithms.
Monday 11 December, 15:00 CEST
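The idea in the abstract above of combining metadata and audio data to flag duplicates can be illustrated with a small sketch. It is not the Royal Danish Library's method: the title comparison uses simple word overlap rather than doc2vec, the audio fingerprints are assumed to be already available as lists of 32-bit integers (no Chromaprint call is made), and the hand-written rule in `is_duplicate` with its thresholds is a hypothetical stand-in for a trained decision tree.

```python
def title_similarity(title_a, title_b):
    """Jaccard overlap of title words (a simple stand-in for doc2vec similarity)."""
    words_a = set(title_a.lower().split())
    words_b = set(title_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def fingerprint_similarity(fp_a, fp_b):
    """Share of matching bits between two fingerprints given as 32-bit integers."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    # zip stops at the shorter list, so exactly n pairs are compared.
    differing_bits = sum(
        bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(fp_a, fp_b)
    )
    return 1.0 - differing_bits / (32 * n)

def is_duplicate(prog_a, prog_b, audio_min=0.9, audio_soft=0.75, title_min=0.5):
    """Hand-written two-level rule; thresholds are illustrative assumptions."""
    audio = fingerprint_similarity(prog_a["fingerprint"], prog_b["fingerprint"])
    if audio >= audio_min:
        return True  # audio alone is close enough
    title = title_similarity(prog_a["title"], prog_b["title"])
    return audio >= audio_soft and title >= title_min

# Invented example records (assumption, not archive data).
news = {"title": "Evening News", "fingerprint": [0xFFFF0000, 0x12345678]}
rerun = {"title": "Evening News rerun", "fingerprint": [0xFFFF0001, 0x12345678]}
weather = {"title": "Weather", "fingerprint": [0x0000FFFF, 0x87654321]}
```

In practice the thresholds would be learned from labelled pairs rather than set by hand, which is where a decision tree classifier would come in.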