Anchoring in the African AI ecosystem
Crucial to the WAXAL project was our commitment to working with, and contributing directly to, the African AI ecosystem. The data collection effort was led entirely by African academic and community organizations, guided by Google experts on world-class data collection practices. This collaborative approach ensured the corpus was built by and for the community it serves; with shared methodology each partner focused on a specific subset of languages. Our partners included Makerere University, which collected ASR and/or TTS data for nine different languages, and the University of Ghana, which focused its efforts on eight languages, using the ASR image-prompted data collection methodology outlined above. Additional key collaborators were Digital Umuganda, in partnership with Addis Ababa University, who were instrumental in leading the ASR collection for several regional languages. For the high-quality, studio-recorded voices, Media Trust, Loud n Clear and African Institute for Mathematical Sciences Senegal spearheaded the TTS recordings across various regional languages.
This framework is fundamentally rooted in the principle that our partners retain ownership of the collected data toward the shared commitment to make all datasets openly available for the broader community. This deep collaboration and open-access philosophy have already enabled notable derivative research and publications.
- Through this framework, our partners have already enabled new research, such as the development of a cookbook for community-driven collection of impaired speech . This research resulted in the first open-source dataset for Akan speakers with conditions like cerebral palsy and stammering, and demonstrated that in-person, image-prompted elicitation is more effective than text-based prompts for these populations. This work provides a vital roadmap for developing inclusive speech technologies in low-resource environments.
- Furthermore, the initiative supported a major study that introduced a 5,000-hour speech corpus for five Ghanaian languages — Akan, Ewe, Dagbani, Dagaare, and Ikposo. This work established infrastructure for building robust ASR and TTS systems tailored to the linguistic diversity of West Africa by using a controlled crowdsourcing approach to capture natural, spontaneous intonations.
- Other essential research has focused on benchmarking four state-of-the-art models (Whisper, XLS-R, MMS, and W2v-BERT) across 13 African languages. This study analyzed how performance scales with increased training data, offering key insights into data efficiency and highlighting that scaling benefits are strongly dependent on linguistic complexity and domain alignment.
- Finally, a systematic literature review was published, cataloging 74 datasets across 111 African languages to map the current frontier of speech technology. This review emphasized the urgent need for multi-domain conversational corpora and the adoption of linguistically informed metrics, such as Character Error Rate (CER), to better evaluate performance in morphologically rich and tonal language contexts.

