What is next-token prediction? The truth about LLMs, explained for kids

An LLM is trained to do one thing: predict the next token. This page explains why that simple goal, at scale, is so powerful. For Class 7.

What's the most common mistake children make about this concept?

Believing that an LLM 'thinks about' the answer before responding. In reality, generation is a left-to-right, token-by-token process with no separate 'thinking' phase (with chain-of-thought prompting, the 'thinking' is itself generated as tokens).

Next-token prediction — the truth about how LLMs work

Q: How does Dhee Learning teach this in a Class 7 session?

Dhee opens with a question — for example: "Finish this sentence the most obvious way: 'The national animal of India is the ___.' Now finish this one: 'The best solution to climate change is ___.' Why was the first easy and the second almost impossible to finish with just one right word?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.

What this concept actually says

LLMs are trained by repeatedly predicting the next token in a sequence — this single objective, at scale, produces remarkably general capabilities
Temperature and sampling strategies control how deterministic vs. creative an LLM's outputs are
The same next-token prediction mechanism that makes LLMs fluent is the root cause of hallucination
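
For readers who want to see the first point in code, here is a minimal Python sketch of next-token prediction by counting. It is a toy, not how a real LLM is built: real models use neural networks and much longer contexts. The four sentences, the two-word context and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Made-up mini "training data" (an assumption for illustration only).
SENTENCES = [
    "the national animal is the tiger",
    "the national animal is the tiger",
    "the national bird is the peacock",
    "the zoo animal is the lion",
]

def count_followers(sentences, context_size=2):
    """Count which token follows each run of `context_size` tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(context_size, len(tokens)):
            context = tuple(tokens[i - context_size:i])
            counts[context][tokens[i]] += 1
    return counts

def next_token_probabilities(counts, context):
    """Turn the counts seen after `context` into probabilities."""
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    return {token: n / total for token, n in followers.items()}

counts = count_followers(SENTENCES)
probs = next_token_probabilities(counts, ["is", "the"])
prediction = max(probs, key=probs.get)  # the most probable next token
print(probs)
print(prediction)
```

With this made-up data, 'tiger' follows 'is the' in two of the four sentences, so it gets the highest probability. The toy model sees only two words of context, so it would still say 'tiger' when the sentence was about the national bird, which is a small-scale picture of a fluent but wrong completion.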

An analogy your child will recognise

Completing a popular film dialogue

If someone starts 'Mogambo...' every Indian film fan knows to complete it as '...khush hua!' That completion is obvious because you've heard it thousands of times. LLMs work in a similar way, but across all the text patterns they were trained on: the more often a completion has appeared, the more strongly the model is pulled toward it.

Finishing a familiar bhajan or folk song

If you know a bhajan deeply, you can hum any missing line automatically — but if you forget a word, you might substitute something that fits the rhythm and meaning even if it's not quite right. LLMs do the same: they generate a 'fits well enough' completion even when the exact right answer is unavailable in what they've learned.

Common misconceptions to watch for

Believing that an LLM 'thinks about' the answer before responding. In reality, generation is a left-to-right, token-by-token process with no separate 'thinking' phase (with chain-of-thought prompting, the 'thinking' is itself generated as tokens).
Believing that higher temperature always means worse output. For creative tasks, low temperature tends to produce repetitive, predictable text, so a higher temperature is often preferred.
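
To make the temperature point concrete, the sketch below shows one common way temperature is applied: the model's raw scores are divided by the temperature before being turned into probabilities. The three candidate tokens and their scores are hypothetical, and the function name is an assumption for illustration, not any particular library's API.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; temperature must be > 0."""
    scaled = [s / temperature for s in scores]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next tokens.
tokens = ["tiger", "lion", "peacock"]
scores = [2.0, 1.0, 0.5]

low = softmax_with_temperature(scores, 0.2)   # sharp: top token dominates
high = softmax_with_temperature(scores, 1.5)  # flat: others get a real chance
for token, p_low, p_high in zip(tokens, low, high):
    print(f"{token:8s} T=0.2: {p_low:.3f}   T=1.5: {p_high:.3f}")
```

At the low temperature almost all of the probability sits on the top-scoring token, which is why the text becomes predictable. At the higher temperature the other tokens keep a meaningful share, which is what allows more varied, creative output.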

Key facts in one breath

The training objective 'predict the next token' is called 'self-supervised learning' — no human labels are needed because the text itself provides the targets.
A model trained on next-token prediction implicitly learns grammar, factual associations, reasoning patterns, and even some mathematics — all from the same single objective.
Temperature is one of several 'decoding strategies' — others include top-k sampling, nucleus sampling, and beam search, each producing different types of output.
At temperature 0, an LLM is (nearly) deterministic — given the same input, it will usually produce the same output. This makes results far more reproducible for research.
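
The first fact above, that the text itself provides the targets, can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: it splits on spaces, whereas real tokenizers use subword pieces, and the helper name make_training_pairs is hypothetical.

```python
def make_training_pairs(text):
    """Split text into (context, target) pairs, one per token position."""
    tokens = text.split()  # whitespace split; real tokenizers use subword units
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = make_training_pairs("the national animal of India is the tiger")
for context, target in pairs:
    # Each line is one training example: everything so far -> the next token.
    print(" ".join(context), "->", target)
```

One eight-word sentence yields seven training examples, and nobody had to label any of them by hand: the "answer" for each example is simply the next word in the sentence.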

How Dhee Learning teaches this — the 3-stage question loop

Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.

Stage 1 — Surface

Finish this sentence the most obvious way: 'The national animal of India is the ___.' Now finish this one: 'The best solution to climate change is ___.' Why was the first easy and the second almost impossible to finish with just one right word?

Rote answer

"LLMs predict the next token in a sequence."

Understood

"The first has one very probable answer from training data. The second could be completed a thousand different ways, each with roughly equal probability — an LLM has to pick one, and which one it picks will depend on randomness settings and what came before it."

Stage 2 — Reasoning

An LLM has a 'temperature' setting. At temperature 0, it always picks the most probable next token. At temperature 1, it samples from its predicted probabilities, so less likely tokens are sometimes chosen. Why would you use temperature 0 for a medical diagnosis assistant but temperature 0.9 for a creative writing tool?

Follow-up Dhee may use: What could go wrong if someone accidentally deployed a creative writing temperature setting on a legal document drafting tool?
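
Parents who code may like to try the Stage 2 question themselves. The sketch below picks a next token either greedily (temperature 0) or by weighted random sampling (temperature above 0). The story opening, the probabilities and the function name pick_next_token are all hypothetical, chosen only to show the difference in behaviour.

```python
import random

def pick_next_token(probabilities, temperature, rng):
    """Pick a token: greedy at temperature 0, weighted sampling otherwise."""
    tokens = list(probabilities)
    if temperature == 0:
        return max(tokens, key=probabilities.get)
    # Raising p to the power 1/T matches dividing log-probabilities by T.
    weights = [probabilities[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical probabilities for the word after "Once upon a time there was a"
probabilities = {"king": 0.5, "girl": 0.3, "dragon": 0.2}

rng = random.Random(0)  # fixed seed so a run can be repeated
greedy = [pick_next_token(probabilities, 0, rng) for _ in range(10)]
sampled = [pick_next_token(probabilities, 0.9, rng) for _ in range(10)]
print("temperature 0:  ", greedy)
print("temperature 0.9:", sampled)
```

The temperature 0 list contains the same token every time, which is the behaviour you want when consistency matters. The temperature 0.9 list is drawn at random, so less likely words can appear: welcome in a story, risky in a legal or medical document.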

Stage 3 — Application

You ask an LLM: 'Who won the 2024 Indian Premier League?' It confidently gives you an answer. Explain, using next-token prediction mechanics, exactly why you should verify this before trusting it — even if the answer sounds completely certain.

Misconception Dhee watches for: Assuming that a confident, fluent answer indicates a verified fact — the confidence is a property of the probability distribution, not of factual accuracy.

Related concepts

What are tokens? How AI reads text — for Class 7

Why do LLMs hallucinate? The deep explanation for Class 7

Prompt engineering as software engineering — for Class 7

Data visualisation with matplotlib for beginners — Class 7
