ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

May 28, 2021
Screenshot of Yoruba Common Voice site

We introduce ÌròyìnSpeech, a new corpus motivated by the desire to increase the amount of high-quality, contemporary Yorùbá speech data, which can be used for both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) tasks. We curated about 23 000 text sentences from the news and creative writing domains. To encourage a participatory approach to data creation, we provided 5 000 curated sentences to the Mozilla Common Voice platform to crowd-source the recording and validation of Yorùbá speech data. In total, we created about 42 hours of speech data recorded by 80 volunteers in-house, and 6 hours of validated recordings on the Mozilla Common Voice platform.

Dataset

The dataset is published in the ELRA catalogue.

Paper

Speech Synthesis Samples

Samples of speech synthesised with the ÌròyìnSpeech Dataset.

VITS

Samples generated from our TTS data with the VITS architecture.

Male Voice.

Female Voice.

VITS Continued

Samples generated from our TTS data with continued pretraining of the Bible TTS model.

Male Voice.

Female Voice.

In this project, we also developed a Speech Recorder App, which we have open-sourced to make dataset curation easier for researchers interested in creating speech datasets.

This project was funded by an Imminent Research Grant.

BibTeX entry and citation info

If you make use of our dataset, please cite our paper.

@misc{ogunremi2024iroyinspeech,
      title={\`{I}r\`{o}y\`{i}nSpeech: A multi-purpose Yor\`{u}b\'{a} Speech Corpus}, 
      author={Tolulope Ogunremi and Kola Tubosun and Anuoluwapo Aremu and Iroro Orife and David Ifeoluwa Adelani},
      year={2024},
      eprint={2307.16071},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}