Class 7 · CBSE AI · Strand C — NLP, Vision, and LLMs Deep-Dive
What are tokens? How AI reads text — for Class 7
Before a model reads text, it splits the text into tokens. This lesson explains why modern LLMs use sub-word pieces rather than whole words. For Class 7.
Indian cooking — making dough
Tokenising text is like breaking a big lump of atta into equal small balls before rolling chapatis. You can't cook the whole lump at once — it must be divided into manageable, consistent pieces first.
Train ticketing
A train booking system only understands city names it has on its list. If you type 'Bengaluru' but it only knows 'Bangalore', it may split your input into pieces it recognises — like 'Benga' and 'luru' — and give you a wrong result.
Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.
Stage 1 — Surface
If I asked you to teach a baby robot to read the word 'unbelievable', what's the first problem you'd run into?
Rote answer
"A token is the smallest unit of text a machine reads."
Understood
"The robot doesn't know what 'unbelievable' means as a whole, so it might have to break it into 'un', 'believ', 'able' — pieces it has seen before."
Stage 2 — Reasoning
Why do you think a tokeniser might split 'chatting' into 'chat' and '##ting' instead of keeping it as one word?
Follow-up Dhee may use: Imagine you're making a dictionary with only 500 words. How would you handle a word you've never seen before?
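The 'chat' and '##ting' split above can be sketched in a few lines of Python. This is only a toy illustration: the vocabulary `VOCAB`, the function name `tokenize_word`, and the `[UNK]` marker are made up for this example, and real tokenisers learn a far larger vocabulary from data. The sketch assumes a "longest known piece first" rule, where '##' marks a piece that continues a word rather than starting one.

```python
# Toy vocabulary (hypothetical, chosen for illustration only).
# Pieces starting with "##" can only continue a word, never start one.
VOCAB = {"chat", "un", "##ting", "##believ", "##able"}


def tokenize_word(word, vocab=VOCAB, unknown="[UNK]"):
    """Split one word into the longest known pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining chunk first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark as a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No known piece fits here, so the whole word is unknown.
            return [unknown]
        pieces.append(match)
        start = end
    return pieces


print(tokenize_word("chatting"))      # ['chat', '##ting']
print(tokenize_word("unbelievable"))  # ['un', '##believ', '##able']
print(tokenize_word("zebra"))         # ['[UNK]']
```

With only five pieces in its "dictionary", the toy tokeniser can still read two long words it has never stored whole. A word built from pieces it has never seen, like 'zebra', is where it gets stuck.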
Stage 3 — Application
An AI tutoring app is trained only on English text. A student types 'maths ka syllabus do'. What tokenisation problems might appear, and what could go wrong?
Misconception Dhee watches for: Assuming the AI simply 'reads words like we do' and the problem is just translation, not tokenisation.
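A small Python sketch can make the Stage 3 problem visible. It assumes a made-up English-only word list (`ENGLISH_VOCAB`) and a simple fallback rule: any word not on the list is broken into single letters. Real tokenisers use learned sub-word pieces instead, so treat this as an illustration of the idea, not of how any particular app works.

```python
# Hypothetical English-only word list, for illustration only.
ENGLISH_VOCAB = {"maths", "syllabus", "do", "give", "me", "the"}


def tokenize(sentence, vocab=ENGLISH_VOCAB):
    """Keep known words whole; break unknown words into single letters."""
    tokens = []
    for word in sentence.lower().split():
        if word in vocab:
            tokens.append(word)
        else:
            # Fallback: the word is not in the list, so split it into letters.
            tokens.extend(list(word))
    return tokens


print(tokenize("maths ka syllabus do"))
# ['maths', 'k', 'a', 'syllabus', 'do']
```

Two things go wrong before any "translation" could even begin. The Hindi word 'ka' is shredded into the letters 'k' and 'a', which carry no meaning on their own. The Hindi word 'do' (meaning "give") happens to match the English word 'do', so it is kept as a token the model has only ever seen used in English.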
Dhee turns this concept into a 15-minute spoken session — asking, listening, and probing — so your child builds the idea themselves.
Common misconception: tokens are always whole words. In reality, sub-word and character-level tokens are common.
Dhee opens with a question — for example: "If I asked you to teach a baby robot to read the word 'unbelievable', what's the first problem you'd run into?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.