ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

We introduce ÌròyìnSpeech, a new corpus motivated by the desire to increase the amount of high-quality, contemporary Yorùbá speech data, which can be used for both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) tasks. We curated about 23 000 text sentences from the news and creative writing domains. To encourage a participatory approach to data creation, we provided 5 000 curated sentences to the Mozilla Common Voice platform to crowd-source the recording and validation of Yorùbá speech data. In total, we created about 42 hours of speech data recorded in-house by 80 volunteers, and 6 hours of validated recordings on the Mozilla Common Voice platform.
Dataset
The dataset is published in the ELRA catalogue →
- ELRA Resource description page
- 012-405-700-001-6 → Corresponding unique ISLRN to use in citations and publications
Paper
- The LREC-COLING 2024 paper → arXiv
Speech Synthesis Samples
Samples of speech synthesised with the ÌròyìnSpeech Dataset.
VITS
Samples generated from our TTS data with the VITS architecture.
Male Voice.
Female Voice.
VITS Continued
Samples generated from our TTS data with continued pretraining of the Bible TTS model.
Male Voice.
Female Voice.
In this project, we also developed a Speech Recorder App, which we have open-sourced to make dataset curation easier for researchers interested in creating speech datasets.
This project was funded by an Imminent Research Grant.
BibTeX entry and citation info
If you make use of our dataset, please cite our paper.
@misc{ogunremi2024iroyinspeech,
title={\`{I}r\`{o}y\`{i}nSpeech: A multi-purpose Yor\`{u}b\'{a} Speech Corpus},
author={Tolulope Ogunremi and Kola Tubosun and Anuoluwapo Aremu and Iroro Orife and David Ifeoluwa Adelani},
year={2024},
eprint={2307.16071},
archivePrefix={arXiv},
primaryClass={cs.CL}
}