A large-scale open resource for African language speech technology

Anchoring in the African AI ecosystem

Crucial to the WAXAL project was our commitment to working with, and contributing directly to, the African AI ecosystem. The data collection effort was led entirely by African academic and community organizations, with guidance from Google experts on data collection practices. This collaborative approach ensured the corpus was built by and for the community it serves. The partners shared a common methodology, and each focused on a specific subset of languages. They included Makerere University, which collected ASR and/or TTS data for nine languages, and the University of Ghana, which focused on eight languages using the image-prompted ASR data collection methodology outlined above. Digital Umuganda, in partnership with Addis Ababa University, led the ASR collection for several regional languages. For the high-quality, studio-recorded voices, Media Trust, Loud n Clear, and the African Institute for Mathematical Sciences Senegal led the TTS recordings across various regional languages.

This framework is rooted in the principle that our partners retain ownership of the data they collect, paired with a shared commitment to make all datasets openly available to the broader community. This deep collaboration and open-access philosophy have already enabled notable derivative research and publications.

Through this framework, our partners have already produced new research, such as a cookbook for community-driven collection of impaired speech. This research resulted in the first open-source dataset for Akan speakers with conditions like cerebral palsy and stammering, and demonstrated that in-person, image-prompted elicitation is more effective than text-based prompts for these populations. This work provides a vital roadmap for developing inclusive speech technologies in low-resource environments.
Furthermore, the initiative supported a major study that introduced a 5,000-hour speech corpus for five Ghanaian languages — Akan, Ewe, Dagbani, Dagaare, and Ikposo. This work established infrastructure for building robust ASR and TTS systems tailored to the linguistic diversity of West Africa by using a controlled crowdsourcing approach to capture natural, spontaneous intonations.
Other research has focused on benchmarking four state-of-the-art models (Whisper, XLS-R, MMS, and w2v-BERT) across 13 African languages. This study analyzed how performance scales with increased training data, offering key insights into data efficiency and highlighting that scaling benefits are strongly dependent on linguistic complexity and domain alignment.
Finally, a systematic literature review was published, cataloging 74 datasets across 111 African languages to map the current frontier of speech technology. This review emphasized the urgent need for multi-domain conversational corpora and the adoption of linguistically informed metrics, such as Character Error Rate (CER), to better evaluate performance in morphologically rich and tonal language contexts.
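To make the point about metrics concrete, here is a minimal sketch of how Character Error Rate compares with Word Error Rate (WER). It is not taken from the review itself. The function names and the placeholder strings are assumptions made for illustration, and the edit distance is the standard Levenshtein distance, where each insertion, deletion, and substitution costs 1.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Placeholder strings, not real transcripts: one wrong character in one word.
reference = "abc def"
hypothesis = "abd def"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this toy case a single wrong character counts as a whole-word error under WER but only as one character out of seven under CER. That is the intuition behind preferring CER for morphologically rich languages, where words are long and a small slip in one affix or tone mark would otherwise be penalized as a full word error.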
