Class 7 · CBSE AI · Strand C — NLP, Vision, and LLMs Deep-Dive

Next-token prediction — the truth about how LLMs work

An LLM is trained to do one thing: predict the next token. This explainer, written for Class 7, shows why that simple goal, at scale, is so powerful.

What this concept actually says

  • LLMs are trained by repeatedly predicting the next token in a sequence — this single objective, at scale, produces remarkably general capabilities
  • Temperature and sampling strategies control how deterministic vs. creative an LLM's outputs are
  • The same next-token prediction mechanism that makes LLMs fluent is the root cause of hallucination
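For readers curious about what "predicting the next token" looks like in practice, here is a minimal Python sketch of how raw text can be turned into training examples. It assumes, for simplicity, that a token is a whole word split on spaces (real LLMs use sub-word tokenizers), and the function name make_training_pairs is hypothetical, made up for this illustration.

```python
def make_training_pairs(text):
    """Turn raw text into (context, next_token) training examples.

    Simplification: a 'token' here is a whole word split on spaces;
    real LLMs use sub-word tokenizers instead.
    """
    tokens = text.split()
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[:i]   # everything the model has seen so far
        target = tokens[i]     # the token it must learn to predict
        pairs.append((context, target))
    return pairs


for context, target in make_training_pairs("the national animal of india is the tiger"):
    print(" ".join(context), "->", target)
```

Notice that nobody had to label anything: every target is simply the word that actually came next in the text.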

An analogy your child will recognise

Completing a popular film dialogue

If someone starts 'Mogambo...' every Indian film fan knows to complete it as '...khush hua!' That completion is obvious because you've heard it thousands of times. LLMs work in a similar way, but across the enormous range of text they were trained on: the more often a completion has appeared, the more strongly the model is pulled toward it.
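The "pulled toward the most frequent completion" idea can be sketched in a few lines of Python. This is a toy bigram model, which looks only at the previous word and simply counts; a real LLM uses a neural network over the whole preceding context. The corpus below is invented, and count_next_tokens and next_token_probs are hypothetical helper names.

```python
from collections import Counter, defaultdict


def count_next_tokens(lines):
    """For each token, count which tokens followed it in the training text."""
    followers = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers


def next_token_probs(followers, token):
    """Convert raw follower counts into probabilities that sum to 1."""
    counts = followers.get(token, Counter())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


# Hypothetical toy corpus: the famous line appears far more often than a made-up variant.
corpus = ["mogambo khush hua"] * 9 + ["mogambo naraz hua"]
followers = count_next_tokens(corpus)
print(next_token_probs(followers, "mogambo"))  # {'khush': 0.9, 'naraz': 0.1}
```

Because "khush" followed "mogambo" nine times out of ten in this made-up corpus, it gets 90% of the probability.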

Finishing a familiar bhajan or folk song

If you know a bhajan deeply, you can hum any missing line automatically — but if you forget a word, you might substitute something that fits the rhythm and meaning even if it's not quite right. LLMs do the same: they generate a 'fits well enough' completion even when the exact right answer is unavailable in what they've learned.

Common misconceptions to watch for

  • An LLM 'thinks about' the answer before responding. In fact, generation is a left-to-right, token-by-token process with no separate hidden 'thinking' phase; even chain-of-thought prompting works by having the model write its intermediate steps out as extra tokens before the answer.
  • Higher temperature always means worse output — for creative tasks, low temperature produces repetitive, predictable text; higher temperature is often preferred.
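Temperature works by rescaling the model's raw scores (often called logits) before they are turned into probabilities with the softmax function. The Python sketch below shows that standard step; the candidate words and scores are invented for illustration, not taken from any real model.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw model scores (logits) into probabilities.

    Dividing by a small temperature exaggerates the gaps between scores
    (sharper distribution); a large temperature shrinks them (flatter).
    """
    scaled = [s / temperature for s in scores]
    biggest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words in a poem.
words = ["moon", "river", "lantern"]
scores = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, {w: round(p, 3) for w, p in zip(words, probs)})
```

At temperature 0.2, nearly all the probability sits on "moon", so the output is predictable; at 2.0 the three words move much closer together, so "river" and "lantern" get picked far more often. That is why creative tools tend to use higher settings.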

Key facts in one breath

  • Training on the objective 'predict the next token' is a form of 'self-supervised learning': no human labels are needed because the text itself provides the targets.
  • A model trained on next-token prediction implicitly learns grammar, factual associations, reasoning patterns, and even some mathematics — all from the same single objective.
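  • Temperature is one of several settings that control decoding, the step that turns the model's probabilities into actual text. Other decoding strategies include greedy decoding, top-k sampling, nucleus (top-p) sampling, and beam search, each producing different kinds of output.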
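  • At temperature 0, an LLM always picks the most probable next token, so in principle the same input always produces the same output. This matters for reproducibility in research, though in practice tiny numerical differences on real hardware can occasionally make hosted models vary slightly.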
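Here is a simplified Python sketch of three decoding strategies applied to one made-up next-token distribution: greedy decoding (what temperature 0 amounts to), top-k filtering, and nucleus (top-p) filtering. Beam search is not shown. The probabilities are invented, and the function names are hypothetical.

```python
def greedy(probs):
    """Temperature-0 style decoding: always take the single most likely token."""
    return max(probs, key=probs.get)


def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalise so they sum to 1."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose probabilities reach top_p."""
    kept, running = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((tok, p))
        running += p
        if running >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


# Hypothetical next-token probabilities after "The best solution to climate change is"
probs = {"renewable": 0.4, "reducing": 0.3, "planting": 0.2, "pizza": 0.1}
print(greedy(probs))
print(top_k_filter(probs, 2))
print(nucleus_filter(probs, 0.85))
```

Greedy always returns "renewable". Top-k with k=2 keeps only the two likeliest words, while nucleus filtering with top_p=0.85 keeps words until their probabilities add up to at least 0.85; both drop the unlikely "pizza". Sampling then happens only among the words that remain.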

How Dhee Learning teaches this — the 3-stage question loop

Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.

Stage 1 — Surface

Finish this sentence the most obvious way: 'The national animal of India is the ___.' Now finish this one: 'The best solution to climate change is ___.' Why was the first easy and the second almost impossible to finish with just one right word?

Rote answer

"LLMs predict the next token in a sequence."

Understood

"The first has one very probable answer from training data. The second could be completed a thousand different ways, each with roughly equal probability — an LLM has to pick one, and which one it picks will depend on randomness settings and what came before it."

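One way to see why the first sentence is easy and the second is not is to count how many top completions it takes to cover most of the probability. The Python sketch below does that for two invented distributions; the numbers are illustrative only, and options_needed is a hypothetical helper.

```python
def options_needed(probs, coverage=0.9):
    """Count how many of the most likely completions are needed to reach `coverage`."""
    running = 0.0
    for count, p in enumerate(sorted(probs.values(), reverse=True), start=1):
        running += p
        if running >= coverage:
            return count
    return len(probs)


# Hypothetical distributions, invented for illustration only.
national_animal = {"tiger": 0.95, "peacock": 0.03, "elephant": 0.02}
climate_solution = {f"idea_{i}": 0.01 for i in range(100)}  # 100 equally likely endings

print(options_needed(national_animal))
print(options_needed(climate_solution))
```

The national-animal sentence is covered by a single completion, while the climate sentence needs about ninety, so which ending the model produces depends heavily on sampling and on what came before.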
Stage 2 — Reasoning

An LLM has a 'temperature' setting. At temperature 0, it always picks the most probable next token. At temperature 1, it samples more freely, so less probable tokens are chosen more often. Why would you use temperature 0 for a medical diagnosis assistant but temperature 0.9 for a creative writing tool?

Follow-up Dhee may use: What could go wrong if someone accidentally deployed a creative writing temperature setting on a legal document drafting tool?
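The difference between the two settings can be simulated in Python. In this sketch, temperature 0 is treated as greedy decoding (always the top-scoring word), which is the common convention since dividing by zero is undefined; any other temperature samples from softmax weights. The words and scores are made up, and sample_next is a hypothetical helper.

```python
import math
import random


def sample_next(scores, temperature, rng):
    """Pick the index of the next word from raw scores at the given temperature.

    temperature == 0 is treated as greedy decoding: always the top score.
    Otherwise, sample with weights exp(score / temperature).
    """
    if temperature == 0:
        return max(range(len(scores)), key=lambda i: scores[i])
    weights = [math.exp(s / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


# Hypothetical candidates for the next word after "The cat sat on the"
words = ["mat", "sofa", "roof", "moon"]
scores = [3.0, 2.5, 1.5, 0.1]

rng = random.Random(7)  # fixed seed so the demo is repeatable
greedy_picks = [words[sample_next(scores, 0, rng)] for _ in range(5)]
creative_picks = [words[sample_next(scores, 0.9, rng)] for _ in range(5)]
print("temperature 0:  ", greedy_picks)
print("temperature 0.9:", creative_picks)
```

The temperature-0 list is "mat" five times in a row, the consistency you want for medicine or law; the temperature-0.9 list can mix in other words, which helps creative writing but would be risky in a legal draft.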

Stage 3 — Application

You ask an LLM: 'Who won the 2024 Indian Premier League?' It confidently gives you an answer. Explain, using next-token prediction mechanics, exactly why you should verify this before trusting it — even if the answer sounds completely certain.

Misconception Dhee watches for: Assuming that a confident, fluent answer indicates a verified fact — the confidence is a property of the probability distribution, not of factual accuracy.
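A toy bigram model makes the point concrete. In the Python sketch below, the invented training text never mentions 2024, yet the model still produces an answer with a probability attached, because it only continues the most frequent pattern it has seen. Team names and data are hypothetical; a real LLM is far more sophisticated, but its completions are likewise driven by learned probabilities rather than by checking facts.

```python
from collections import Counter, defaultdict


def train(lines):
    """Count which token follows each token (a toy bigram model)."""
    followers = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers


def answer(followers, prompt):
    """Complete the prompt with the most likely next token and its probability."""
    last = prompt.split()[-1]  # a bigram model only looks at the last token
    counts = followers.get(last)
    if not counts:
        return None, 0.0
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())


# Hypothetical training text: no sentence about 2024 appears anywhere in it.
corpus = [
    "the 2021 winner was team_a",
    "the 2022 winner was team_a",
    "the 2023 winner was team_b",
]
model = train(corpus)
print(answer(model, "the 2024 winner was"))
```

The result is "team_a" with a probability of about 0.67: a fluent, fairly "confident" answer for a year the model has no information about.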

Want your child to actually understand this?

Dhee turns this concept into a 15-minute spoken session — asking, listening, and probing — so your child builds the idea themselves.

Frequently asked questions

What is next-token prediction (the truth about how LLMs work), explained for kids?

An LLM is trained to do one thing: predict the next token. At a large enough scale, that simple goal teaches the model grammar, factual associations, and reasoning patterns, which is why it is so powerful.

What's the most common mistake children make about this concept?

Believing that an LLM 'thinks about' the answer before responding. In fact, generation is a left-to-right, token-by-token process with no separate hidden 'thinking' phase; even chain-of-thought prompting works by having the model write its intermediate steps out as extra tokens.

How does Dhee Learning teach this in a Class 7 session?

Dhee opens with a question — for example: "Finish this sentence the most obvious way: 'The national animal of India is the ___.' Now finish this one: 'The best solution to climate change is ___.' Why was the first easy and the second almost impossible to finish with just one right word?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.