The community that trains the models.
ORAOX is the data engine behind LocaleNLP. A gamified platform where native speakers across Africa and the Arab world ingest, validate, and certify language data — earning XP, climbing leaderboards, and building the training corpora that power AfriLION models.
Three steps. Infinite data.
Contributors upload audio clips, text snippets, or code-switched utterances in their native language. ORAOX accepts voice recordings from any device — smartphone, browser microphone, or file upload. Each submission is timestamped, geotagged by region (not GPS), and language-labelled by the contributor.
- Audio, text, and code-switch submissions
- Mobile-first upload, no app required
- Contributor self-labels language & dialect
- Automatic quality pre-screening (SNR, duration)
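For readers curious what a pre-screen on SNR and duration might look like, here is a minimal sketch. It is not ORAOX's implementation: the thresholds, the function names, and the SNR heuristic (comparing the loudest frames to the quietest frames) are all assumptions made for illustration.

```python
import math

# Hypothetical thresholds; the page does not state ORAOX's actual limits.
MIN_DURATION_S = 1.0
MAX_DURATION_S = 30.0
MIN_SNR_DB = 15.0


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(samples, frame_len=400):
    """Crude SNR estimate: loudest 10% of frames vs. quietest 10% of frames."""
    energies = sorted(frame_energies(samples, frame_len))
    if not energies:
        return 0.0
    k = max(1, len(energies) // 10)
    noise = sum(energies[:k]) / k
    signal = sum(energies[-k:]) / k
    if noise == 0.0:
        return math.inf if signal > 0.0 else 0.0
    return 10.0 * math.log10(signal / noise)


def prescreen(samples, sample_rate):
    """Return (passed, reason) for a mono clip given as a list of floats."""
    duration = len(samples) / sample_rate
    if not (MIN_DURATION_S <= duration <= MAX_DURATION_S):
        return False, "duration"
    if estimate_snr_db(samples) < MIN_SNR_DB:
        return False, "snr"
    return True, "ok"
```

A clip that fails either check would be bounced back to the contributor before it reaches human validators. A production system would more likely use a voice-activity detector than this percentile heuristic.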
Each submission enters a validation queue where other contributors — fluent in the same language — review it: approve if the transcription is correct, flag if the audio is noisy or wrong, or suggest a correction. Three independent approvals are required before a clip is certified for training.
- 3-approval threshold before certification
- Flag → review → reject pipeline
- Expert linguist spot-check layer (1 in 50)
- Full consent and deletion rights for submitters
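The certification rule above can be sketched as a small state tracker. Only the three-approval threshold and the 1-in-50 spot-check rate come from this page. The class and field names, the rule that a flag overrides approvals, the ban on self-validation, and the every-50th sampling scheme are assumptions.

```python
from dataclasses import dataclass, field

APPROVALS_REQUIRED = 3     # stated on the page
SPOT_CHECK_INTERVAL = 50   # "1 in 50"; the sampling method itself is assumed


@dataclass
class Submission:
    submitter_id: str
    approvers: set = field(default_factory=set)
    flagged: bool = False

    def record_vote(self, validator_id, vote):
        """Record an 'approve' or 'flag' vote from another contributor."""
        if validator_id == self.submitter_id:
            # Assumption: "independent" excludes the submitter.
            raise ValueError("submitters cannot validate their own clips")
        if vote == "approve":
            self.approvers.add(validator_id)  # a set ignores repeat votes
        elif vote == "flag":
            self.flagged = True
        else:
            raise ValueError(f"unknown vote: {vote!r}")

    @property
    def status(self):
        # Assumption: any flag sends the clip to review, whatever the approvals.
        if self.flagged:
            return "in_review"
        if len(self.approvers) >= APPROVALS_REQUIRED:
            return "certified"
        return "pending"


def selected_for_spot_check(certified_index):
    """Pick every 50th certified clip (1-based index) for expert review."""
    return certified_index % SPOT_CHECK_INTERVAL == 0
```

Storing approvers in a set means one validator voting twice still counts as a single approval, which is one simple way to keep the three approvals independent.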
Contributors earn XP for every approved submission and validation action. Accurate validators who consistently agree with the expert layer earn bonus multipliers. The leaderboard resets monthly — keeping competition fresh and preventing farming. Top contributors are credited in dataset releases.
- XP for submissions, validations, and streaks
- Expert-alignment bonus multiplier
- Monthly leaderboard resets
- Named credit in CC-BY-SA dataset releases
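To make the scoring idea concrete, here is a hedged sketch of how XP might be tallied for one leaderboard month. Every number in it is hypothetical: the page does not publish point values, the multiplier size, or the agreement rate needed to earn it. Applying the bonus to validation XP only is also an assumption.

```python
# All point values below are hypothetical placeholders.
XP_PER_APPROVED_SUBMISSION = 10
XP_PER_VALIDATION = 2
XP_PER_STREAK_DAY = 1
BONUS_MULTIPLIER = 1.5
BONUS_MIN_AGREEMENT = 0.9   # share of spot-checked votes matching the expert
BONUS_MIN_CHECKED = 10      # avoid rewarding a lucky handful of votes


def alignment_multiplier(agreed, checked):
    """Bonus for validators who consistently agree with the expert layer."""
    if checked >= BONUS_MIN_CHECKED and agreed / checked >= BONUS_MIN_AGREEMENT:
        return BONUS_MULTIPLIER
    return 1.0


def monthly_xp(approved_submissions, validations, streak_days, agreed, checked):
    """Total XP for one leaderboard month; the tally restarts each month."""
    validation_xp = validations * XP_PER_VALIDATION
    validation_xp *= alignment_multiplier(agreed, checked)
    return (
        approved_submissions * XP_PER_APPROVED_SUBMISSION
        + validation_xp
        + streak_days * XP_PER_STREAK_DAY
    )
```

Under these placeholder values, a contributor with 5 approved submissions, 100 validations, a 7-day streak, and 19 of 20 spot-checked votes matching the expert would score 50 + 300 + 7 = 357 XP for the month.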
The contributors building the stack.
Rankings reset monthly. Top contributors are named in dataset release notes.
ORAOX is in invite-only beta. Priority access for native speakers of Hausa, Wolof, Darija, Amharic, Swahili, Yoruba, Igbo, and Arabic dialect variants.
Request early access →
Speak a low-resource language?
Your voice is infrastructure. Every clip you validate closes the gap between your language and the AI models that will serve your community for decades.