[ HQ: MARRAKECH | STATUS: INFRASTRUCTURE BUILD ]

Building the Cultural Intelligence of the Global South.

We are not just teaching machines to speak African languages. We are teaching them to understand humanity through culture, context, and cognition. One dataset. One dialect. One voice at a time.

50+
Languages
4.7B
Verified tokens
38
Language families
CORPUS STATUS
validated: 2,847,403
languages: 50+
consent: verified
ACTIVE MODELS
ASR WER: 6.2%
AfriLION ppl: 18.4
TTS MOS: 4.6/5
REGION COVERAGE
W. Africa: 22 langs
E. Africa: 14 langs
MENA: 11 langs
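The ASR figure above is a word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the model's transcript into the reference, divided by the reference length. A minimal sketch of the standard edit-distance computation; the function name and the Hausa sample phrase are illustrative, not taken from the LocaleNLP stack:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("ina kwana lafiya", "ina kwana"))  # one deletion over three words
```

At 6.2% WER, roughly one word in sixteen is wrong, which is why automated metrics are paired with native-speaker evaluation panels.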
5-Year Vision

A calculated, inevitable deployment.

MODULE: LUGHATNA STT/TTS
2025 – 2026

Phase 1: The Core Engine

Validating the foundation. Building the world's most accurate NLP engine for an initial 15 African and Arabic languages — trained on community-verified, consent-anchored data.

~/localenlp/phase1/core
> languages_targeted: 15
> training_data: 4.7B tokens
> consent_chain: verified
> wer_target: <7%
> status: INGESTING
IN BUILD
PHASE DATA
status: in build
year: 2025 – 2026
phase: 1 / 5
SYSTEM
languages: 15
edge_mode: off
consent: verified
HAU · SWA · AMH · YOR · ARA · IGB · TWI · WOL · SOM · ORM · FUL · LUG · ZUL · TIG · DAR | AfriLION CORE
PHASE 1 / 5 — 2025 – 2026
Physics of Power

The cost of intelligence.

40%
AI Energy Reduction vs. Cloud-first LLMs
Standard Cloud LLM
Massive cloud compute. Continuous retraining. Billions of API calls routing through energy-intensive data centers.
LocaleNLP Edge Architecture
Offline-first deployment. INT4-quantized models run fully on-device. Sustainable infrastructure that scales without burning the planet.
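INT4 quantization stores each weight as a 4-bit integer plus a shared scale factor, shrinking weight memory roughly 8× versus float32, which is what makes fully on-device inference practical on low-power handsets. A minimal symmetric-quantization sketch, with illustrative helper names and sample weights (production pipelines typically use per-channel scales and packed 4-bit storage):

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to signed 4-bit ints in [-8, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7.0  # 7 = largest positive int4
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int4 codes."""
    return [v * scale for v in q]

w = [0.42, -1.3, 0.05, 0.9]
q, s = quantize_int4(w)
approx = dequantize(q, s)
# Each weight now occupies 4 bits instead of 32; rounding error
# for unclamped values is bounded by half the scale.
```

The trade-off is precision: every weight snaps to one of 16 levels, so quantization-aware evaluation is needed before an edge export ships.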
Founder's Statement

The sign-off.

~/vision/README.md
# LOCALENLP — VISION DOCUMENT
## The Problem We Are Solving

Every six months, a new foundation model drops with 100+ supported languages. Every six months, the same 30 languages are on the list. Swahili gets included if you are lucky. Hausa, Amharic, Wolof — invisible. Arabic gets a slot, but the Gulf dialect is treated as the default for 450 million speakers who don't speak that way.

This is not a resource problem. The data exists. The communities exist. The speakers exist. What does not exist is an infrastructure company willing to build the collection, curation, and deployment stack with the rigor these languages deserve.

## The Mandate

LocaleNLP is not a research project. It is not an NGO. It is infrastructure. The same infrastructure that English, Mandarin, and Spanish have had for decades — built from scratch, from community ground truth, for languages that carry the cognitive weight of entire civilizations.

We will build it methodically. Phase by phase. Language by language. Until the default state of AI is one that understands everyone.

> EXECUTE VISION: Alieu Jagne
> ROLE: Founder & CEO, LocaleNLP
> LOC: Marrakech, MA
> END_OF_FILE
SEALED
Architecture

The LocaleNLP infrastructure tree.

process tree / infrastructure
LocaleNLP, Inc.
├── ORAOX — community data validation platform
│   ├── ingest utterances (audio, text, code-switch)
│   ├── validate & flag via contributor leaderboard
│   └── export certified training corpora → AfriLION
├── AfriLION Models — foundational ML stack
│   ├── ASR / TTS / Translation / LLM
│   ├── INT4-quantized edge export (ONNX)
│   └── Cultural Intelligence (CQ) API [research]
├── Lughatna Engine — offline inference runtime
│   ├── ARM / Snapdragon / MediaTek targets
│   └── Tecno · Infinix · custom ODM partnerships
└── Enterprise APIs — B2B / B2G layer
    ├── Healthcare · Agriculture · Education
    └── e-Governance · VoicePrint Identity [planned]
ORAOX: data fuel
AfriLION: model core
Lughatna: edge runtime
People

Built by linguists and engineers from the communities we serve.

AJ
Alieu Jagne
Founder & CEO
Marrakech, Morocco
Language Infrastructure · Product Strategy · Fundraising
MG
Mouhamedou Golomanta
CTO
West Africa
Systems Architecture · Edge ML · Model Training
BE
Bouchra El Mhizli
Product Manager / AI Engineer
Morocco
Product Development · Applied ML · API Design
JC
Justina Catherine
Product Designer
West Africa
UX/UI Design · Language UX · Design Systems
Values

The principles we don't negotiate.

Cultural Authenticity

We do not impose linguistic frameworks from high-resource languages. Every model is built from in-language data collected by native speakers, not translated or synthetic approximations.

Engineering Rigor

Shipping a model is an act of trust. We benchmark against native-speaker evaluation panels — not just automated metrics — and publish our methodology in full.

Radical Inclusion

A language technology stack that excludes 2.6B people is not infrastructure — it is gatekeeping. Every architectural decision is evaluated against its impact on the least-represented communities first.

Open Research

We publish weights, datasets, and evaluation benchmarks because language infrastructure should not be owned by any single entity. Our research is peer-reviewed before it ships as product.

Trust

Institutional commitments

12 universities
Research partnerships
MIT, University of Lagos, Addis Ababa U., AIMSS Cairo, and 8 others
8 countries
Active pilot programs
Healthcare, agriculture, and education verticals across SSA and MENA
IRB-compliant
Data ethics standard
Every collection protocol reviewed by an independent ethics board. Full deletion enforcement.
4 public weights
Open model releases
Apache 2.0 licensed. Reproducible training runs. Published evaluation corpora.