Building the Cultural Intelligence of the Global South.
We are not just teaching machines to speak African languages. We are teaching them to understand humanity through culture, context, and cognition. One dataset. One dialect. One voice at a time.
A calculated, inevitable deployment.
Phase 1: The Core Engine
Validating the foundation. Building the world's most accurate NLP engine for an initial 15 African and Arabic languages, trained on community-verified, consent-anchored data.
Every six months, a new foundation model drops with 100+ supported languages. Every six months, the same 30 languages are on the list. Swahili gets included if you are lucky. Hausa, Amharic, Wolof — invisible. Arabic gets a slot, but the Gulf dialect is treated as the default for 450 million speakers who don't talk that way.
This is not a resource problem. The data exists. The communities exist. The speakers exist. What does not exist is an infrastructure company willing to build the collection, curation, and deployment stack with the rigor these languages deserve.
LocaleNLP is not a research project. It is not an NGO. It is infrastructure. The same infrastructure that English, Mandarin, and Spanish have had for decades — built from scratch, from community ground truth, for languages that carry the cognitive weight of entire civilizations.
We will build it methodically. Phase by phase. Language by language. Until the default state of AI is one that understands everyone.
The LocaleNLP infrastructure tree.
Built by linguists and engineers from the communities we serve.
The principles we don't negotiate
Cultural Authenticity
We do not impose linguistic frameworks from high-resource languages. Every model is built from in-language data collected by native speakers, not translated or synthetic approximations.
Engineering Rigor
Shipping a model is an act of trust. We benchmark against native-speaker evaluation panels — not just automated metrics — and publish our methodology in full.
Radical Inclusion
A language technology stack that excludes 2.6B people is not infrastructure — it is gatekeeping. Every architectural decision is evaluated against its impact on the least-represented communities first.
Open Research
We publish weights, datasets, and evaluation benchmarks because language infrastructure should not be owned by any single entity. Our research is peer-reviewed before it ships as product.