
WIN Wednesday Works In Progress

LibriBrain100

Presented by Oiwi Parker Jones

Abstract: The aim of this study is to acquire a "large" and "naturalistic" dataset of speech processing for the development of neural decoding methods. Concretely, the dataset will include over 100 hours of MEG recordings acquired while subjects listen to audiobooks. This will build on the LibriBrain dataset (Özdogan et al. 2025), which we published at NeurIPS and which formed the basis for the 2025 PNPL Competition (Landau et al. 2025), also published at NeurIPS. Both datasets significantly advance recent work on large and naturalistic speech datasets (e.g. Armeni et al. 2022; Gwilliams et al. 2023; d'Ascoli et al. 2024). There are two main differences between LibriBrain and LibriBrain100. First, LibriBrain100 will double the amount of data (from 50 hours to 100 hours). Second, whereas LibriBrain focused on so-called "deep" (i.e. large-scale within-subject) data, LibriBrain100 aims for both "depth" and "breadth": it significantly extends the number of subjects (to 64), allowing investigation of individual variation in speech processing and of between-subject transfer learning. The plan is for these data to form the basis of another open machine learning competition (cf. https://libribrain.com/).