Class 7 · CBSE AI · Strand C — NLP, Vision, and LLMs Deep-Dive
Indian language NLP — the real challenges for AI
22 scheduled languages, many scripts, little data: why Indian-language AI is genuinely hard. For Class 7.
Teaching a child to read
Imagine teaching a child to read Tamil without ever giving them a Tamil textbook — only English books. They might learn to recognise a few Tamil letters from billboards, but they'd struggle with grammar, complex sentences, and meaning. NLP models for Indian languages are often in exactly this position: trained on scraps, not full libraries.
Train network analogy
India's railway network connects major cities with fast, frequent trains — but remote villages are served by infrequent branch lines or not at all. Indian language NLP is similar: Hindi and English have high-speed, well-maintained model 'tracks', while languages like Maithili or Dogri are still waiting for basic infrastructure.
Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.
Stage 1 — Surface
Type a message the way you'd actually send it to a friend about school — use whatever mix of languages or abbreviations feels natural. Now think: what would an AI trained only on formal English do with that message?
Rote answer
"Indian languages have challenges like multiple scripts and dialects."
Understood
"My natural message probably mixes English and Hindi, uses abbreviations, and has no formal punctuation. An English-only AI would see most of it as noise or out-of-vocabulary tokens — it would either misunderstand or refuse to process it."
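The "out-of-vocabulary" idea can be shown in a few lines of Python. This is a minimal sketch. It assumes a tiny hypothetical vocabulary (`ENGLISH_VOCAB`) that stands in for what a model trained only on formal English might know, plus a simple word-splitting tokeniser. The helper names `tokenize`, `find_oov` and `oov_rate` are made up for this illustration. Real models use far larger vocabularies and sub-word tokenisers.

```python
import re

# Hypothetical toy vocabulary: a stand-in for the words a model trained only on
# formal English might know. Real vocabularies hold tens of thousands of entries.
ENGLISH_VOCAB = {
    "i", "will", "come", "to", "school", "tomorrow",
    "what", "time", "is", "the", "test",
}


def tokenize(message: str) -> list[str]:
    """Lower-case the text and keep runs of word characters (punctuation is dropped)."""
    return re.findall(r"\w+", message.lower())


def find_oov(message: str, vocab: set[str]) -> list[str]:
    """Return the out-of-vocabulary tokens: words the model has never seen."""
    return [tok for tok in tokenize(message) if tok not in vocab]


def oov_rate(message: str, vocab: set[str]) -> float:
    """Fraction of the message's tokens that are out of vocabulary (0.0 if empty)."""
    tokens = tokenize(message)
    if not tokens:
        return 0.0
    return len(find_oov(message, vocab)) / len(tokens)


formal = "I will come to school tomorrow."
mixed = "kal school aa raha hu bro, test kitne baje hai?"  # Hindi-English code-mixing

print(find_oov(formal, ENGLISH_VOCAB))  # []
print(find_oov(mixed, ENGLISH_VOCAB))   # ['kal', 'aa', 'raha', 'hu', 'bro', 'kitne', 'baje', 'hai']
print(oov_rate(mixed, ENGLISH_VOCAB))   # 0.8
```

In this toy setup the formal English sentence has no unknown words. In the Hindi-English message, 8 of the 10 words are unknown, so an English-only model would have almost nothing to work with.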
Stage 2 — Reasoning
Tamil is a morphologically rich language — the verb 'to go' can appear in over 200 different forms depending on tense, person, gender, and politeness. Why is this a much harder problem for NLP than English, where 'go' only changes to 'goes', 'went', 'gone', 'going'?
Follow-up Dhee may use: What technique from C7-SC1 might help handle this explosion of word forms? (Hint: sub-word tokenisation)
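Here is a minimal Python sketch of why sub-word tokenisation helps with Tamil-style morphology. It assumes a heavily simplified, romanised model of the verb "to go": one stem (`po`), three tense markers and three person endings. The pieces and the names `SUBWORD_VOCAB`, `all_forms` and `segment` are invented for illustration. Real Tamil has many more forms, and real tokenisers learn their pieces from data. The segmentation uses greedy longest-match-first, which is the same basic idea WordPiece-style tokenisers use when splitting words.

```python
from itertools import product

# Hypothetical, heavily simplified romanised pieces for the Tamil verb "to go".
STEM = "po"                     # "go"
TENSES = ["n", "gir", "v"]      # past, present, future markers (simplified)
PERSONS = ["en", "al", "an"]    # endings for "I", "she", "he" (simplified)

# The sub-word vocabulary stores each piece once.
SUBWORD_VOCAB = {STEM, *TENSES, *PERSONS}


def all_forms() -> list[str]:
    """Every stem + tense + person combination, each stored as a separate whole word."""
    return [STEM + tense + person for tense, person in product(TENSES, PERSONS)]


def segment(word: str, vocab: set[str]) -> list[str]:
    """Split a word into vocabulary pieces using greedy longest-match-first.

    At each position, try the longest remaining substring first and shrink it
    until it matches a vocabulary piece. Raises ValueError if nothing matches.
    """
    pieces: list[str] = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:  # no piece matched at this position
            raise ValueError(f"cannot segment {word!r} at position {start}")
    return pieces


print(all_forms())
# ['ponen', 'ponal', 'ponan', 'pogiren', 'pogiral', 'pogiran', 'poven', 'poval', 'povan']
print(len(all_forms()), "whole-word entries vs", len(SUBWORD_VOCAB), "sub-word pieces")
# 9 whole-word entries vs 7 sub-word pieces
print(segment("pogiren", SUBWORD_VOCAB))  # ['po', 'gir', 'en']
```

Storing whole words needs one entry per combination (tenses × persons = 9 here). Sub-word pieces need only the stem plus the tense markers plus the person endings (1 + 3 + 3 = 7 here). The gap widens quickly. With, say, 10 tense or mood markers and 20 person and politeness endings, there would be 200 whole-word forms but only 31 pieces.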
Stage 3 — Application
You're on a team asked to build an AI health helpline that works for rural populations in Odisha, where people speak Odia mixed with local tribal languages. Name three specific data challenges you'd face in the first month of this project.
Misconception Dhee watches for: Assuming that translating everything to Hindi first and then using Hindi NLP solves the problem — this loses local dialect nuances and may still fail for tribal language speakers.
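A project like the Odisha helpline faces an early data challenge: a single message can mix scripts. As a minimal sketch, the Python below labels each character by its Unicode block. The ranges are the standard Unicode blocks for these scripts, including Ol Chiki, which is used for Santali. The function names and the choice of which scripts to list are assumptions made for illustration.

```python
# Standard Unicode block ranges (inclusive) for a few scripts used in India.
# Which scripts to list is an illustrative choice, not a complete set.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Odia": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Ol Chiki": (0x1C50, 0x1C7F),
}


def script_of(ch: str) -> str:
    """Name the script of a single character, or 'Other' for digits, punctuation, etc."""
    if ch.isascii() and ch.isalpha():
        return "Latin"  # plain English letters a-z, A-Z
    code_point = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= code_point <= high:
            return name
    return "Other"


def scripts_in(text: str) -> set[str]:
    """Return the listed scripts that appear in the text, ignoring spaces and 'Other'."""
    found = {script_of(ch) for ch in text if not ch.isspace()}
    found.discard("Other")
    return found


# Illustrative Odia-English mix, roughly "I have (a) fever".
print(sorted(scripts_in("ମୋର fever ଅଛି")))     # ['Latin', 'Odia']
print(sorted(scripts_in("mora fever achhi")))  # ['Latin']
```

This check has two limits. Odia typed in English letters (romanised) looks exactly like English to it. Knowing the script also does not tell you the language: Hindi, Marathi and Nepali all use Devanagari.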
Dhee turns this concept into a 15-minute spoken session — asking, listening, and probing — so your child builds the idea themselves.
Misconception: building an AI that works in Hindi means it works for most Indians. In reality, roughly 45% of the population does not understand Hindi, and many people who do speak Hindi primarily use a regional language.
Dhee opens with a question, such as the Stage 1 prompt about typing a school message the way you'd send it to a friend. It listens to your child's answer and then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.