What does building a topic classifier mean? Explained for kids

How AI sorts text into categories — one of the oldest and most useful NLP tasks. For Class 7.

What's the most common mistake children make about this concept?

Believing that more categories always means a better classifier. In practice, fine-grained categories need far more labelled data and increase the chance of confusion between similar classes.

What is text classification? Building a topic classifier

How does Dhee Learning teach this in a Class 7 session?

Dhee opens with a question — for example: "A news app wants to tag every article automatically as 'Sports', 'Politics', 'Technology', or 'Entertainment'. What happens when an article is about a cricket player who becomes a politician and launches a sports app?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.

What this concept actually says

Topic classification assigns a text to one or more predefined categories based on its content.
The choice of categories is a design decision with real consequences: it shapes what the system can and cannot represent.
Multi-label classification allows a text to belong to more than one topic simultaneously.
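
For parents who want to see the difference between forcing one label and allowing several, here is a minimal Python sketch. It assumes a tiny hand-written keyword list for each topic (the lists and the function names are made up for illustration; real classifiers learn from labelled examples rather than from a fixed word list).

```python
# Toy keyword lists: illustrative assumptions, not a real taxonomy.
TOPIC_KEYWORDS = {
    "Sports": {"cricket", "match", "player", "team"},
    "Politics": {"election", "minister", "politician", "vote"},
    "Technology": {"app", "software", "phone", "internet"},
    "Entertainment": {"film", "music", "actor", "song"},
}


def topic_scores(text):
    """Count how many keywords of each topic appear in the text."""
    words = set(text.lower().split())
    return {topic: len(words & keywords)
            for topic, keywords in TOPIC_KEYWORDS.items()}


def single_label(text):
    """Force exactly one topic: the one with the highest keyword count."""
    scores = topic_scores(text)
    return max(scores, key=scores.get)


def multi_label(text, min_hits=1):
    """Keep every topic whose keyword count reaches the threshold."""
    scores = topic_scores(text)
    return [topic for topic, score in scores.items() if score >= min_hits]


article = "Cricket player turned politician launches a sports app"
print(single_label(article))  # Sports
print(multi_label(article))   # ['Sports', 'Politics', 'Technology']
```

The single-label version keeps only 'Sports' and silently drops the politics and technology angles; the multi-label version keeps all three.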

An analogy your child will recognise

Post office sorting

A post office sorter puts each letter into one city's bag. But what if a letter is addressed to someone who has two homes — one in Chennai, one in Delhi? You have to pick one bag, or create a new rule for 'dual destination' letters. Topic classifiers face exactly this problem with multi-topic content.

Mela stall organisation

At a mela, stalls are organised by type: food, games, crafts. But a stall selling handmade food toys, part craft and part food, doesn't fit neatly anywhere. The person running the mela has to decide where to place it. That placement decision is exactly what a topic classifier makes, and it always involves some loss of nuance.

Common misconceptions to watch for

Misconception: more categories always means a better classifier. In fact, fine-grained categories need far more labelled data and increase the chance of confusion between similar classes.
Misconception: a topic classifier 'understands' what an article is about. In fact, it recognises patterns of words associated with labels in its training data, which is not the same as comprehension.
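
The second misconception can be seen in a few lines of Python. This is a sketch under stated assumptions: the two word lists and the function name are invented for illustration, and the "classifier" simply counts matching words, which is a much cruder pattern-matcher than a trained model but fails in a similar spirit.

```python
# Hypothetical word lists, chosen only to illustrate the point.
SPORTS_WORDS = {"cricket", "match", "wicket", "batting"}
POLITICS_WORDS = {"election", "minister", "party", "vote"}


def guess_topic(text):
    """Pick the topic whose words appear more often; no understanding involved."""
    words = set(text.lower().split())
    sports_hits = len(words & SPORTS_WORDS)
    politics_hits = len(words & POLITICS_WORDS)
    return "Sports" if sports_hits > politics_hits else "Politics"


headline = ("The minister was on a sticky wicket after "
            "a heated match of words in parliament")
print(guess_topic(headline))  # Sports
```

A human reader sees at once that the headline is about politics. The word counter sees 'wicket' and 'match' and answers 'Sports', because it matches word patterns rather than meaning.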

Key facts in one breath

Topic classification is one of the oldest NLP tasks, with early systems dating to the 1960s using simple keyword matching.
Modern classifiers are usually built by fine-tuning large pre-trained models on domain-specific labelled data rather than training from scratch.
Zero-shot classifiers can assign topics they were never explicitly trained on, for example by measuring how similar a text's embedding is to the embedding of each topic's description.
The choice of label taxonomy (the set of categories) is a sociotechnical decision — it encodes assumptions about how the world should be organised.
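
The idea of matching a text to topic descriptions by similarity can be sketched as follows. This is a toy version: the "embedding" here is just a word-count vector, standing in for the learned embeddings a real zero-shot system would use, and the topic descriptions and function names are assumptions made up for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. Real systems use learned vectors."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical topic descriptions; no labelled training articles are used.
TOPIC_DESCRIPTIONS = {
    "Sports": "cricket football match team player score",
    "Health": "doctor hospital medicine disease vaccine",
    "Space": "rocket satellite moon planet launch orbit",
}


def zero_shot(text):
    """Return the topic whose description is most similar to the text."""
    text_vec = embed(text)
    sims = {topic: cosine(text_vec, embed(desc))
            for topic, desc in TOPIC_DESCRIPTIONS.items()}
    return max(sims, key=sims.get)


print(zero_shot("the rocket will launch a satellite into orbit"))  # Space
```

Adding a new topic only means adding a new description line, which is the appeal of the zero-shot approach. The quality of the result still depends heavily on how well the embeddings capture meaning.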

How Dhee Learning teaches this — the 3-stage question loop

Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.

Stage 1 — Surface

A news app wants to tag every article automatically as 'Sports', 'Politics', 'Technology', or 'Entertainment'. What happens when an article is about a cricket player who becomes a politician and launches a sports app?

Rote answer

"A topic classifier puts text into categories."

Understood

"That article fits three of the four categories at once, which breaks a system that forces one label. You'd either need multiple labels per article, or you'd have to accept that whichever single label you pick, you're losing important information."

Stage 2 — Reasoning

Two classifiers are trained on the same news dataset. Classifier A has 5 topic categories; Classifier B has 50. What are the trade-offs — when would you prefer A, and when would you prefer B?

Follow-up Dhee may use: What if two of the 50 categories are nearly identical — like 'Cricket' and 'IPL'? What problem does that create for the model?
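
Both halves of this trade-off can be put into numbers with a short Python sketch. The dataset size of 1,000 articles and the word lists below are assumptions chosen for illustration, and the overlap measure (shared words divided by all words, known as Jaccard overlap) is only one rough way to show how alike two categories look.

```python
def examples_per_category(total_examples, num_categories):
    """Average labelled examples per category if the data is split evenly."""
    return total_examples // num_categories


def vocabulary_overlap(words_a, words_b):
    """Jaccard overlap: shared words divided by all distinct words."""
    return len(words_a & words_b) / len(words_a | words_b)


TOTAL_ARTICLES = 1000  # assumed dataset size
print(examples_per_category(TOTAL_ARTICLES, 5))   # 200
print(examples_per_category(TOTAL_ARTICLES, 50))  # 20

# Hypothetical typical words for three categories.
cricket = {"bat", "ball", "wicket", "over", "innings", "captain"}
ipl = {"bat", "ball", "wicket", "over", "auction", "franchise"}
football = {"goal", "penalty", "striker", "ball", "referee", "captain"}

print(vocabulary_overlap(cricket, ipl))       # 0.5
print(vocabulary_overlap(cricket, football))  # 0.2
```

With the same 1,000 articles, Classifier B has only about 20 examples to learn each category from, and 'Cricket' and 'IPL' share half of their typical words, so the model has little to tell them apart.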

Stage 3 — Application

You're building a classifier to route student questions to the right subject teacher in a school chatbot. List the three hardest design decisions you face before collecting any data.

Misconception Dhee watches for: Assuming the category list is obvious and fixed. In practice, defining the categories is often where real-world classification projects spend much of their time.
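
To make those design decisions concrete, here is a small Python sketch of such a router. Everything in it is an assumption for illustration: the subject list, the keywords, the fallback teacher and the tie rule are exactly the kind of choices a designer would have to argue about before collecting data.

```python
# Design decision 1: which subjects exist at all (hypothetical list).
SUBJECT_KEYWORDS = {
    "Maths": {"fraction", "equation", "angle", "percentage"},
    "Science": {"photosynthesis", "force", "acid", "cell"},
    "History": {"empire", "mughal", "revolt", "independence"},
}

# Design decision 2: who receives a question that matches no subject.
FALLBACK = "Class teacher"


def route_question(question, min_hits=1):
    """Return the subject(s) with the most keyword matches, or the fallback."""
    words = set(question.lower().replace("?", " ").split())
    scores = {subject: len(words & keywords)
              for subject, keywords in SUBJECT_KEYWORDS.items()}
    best = max(scores.values())
    if best < min_hits:
        return [FALLBACK]
    # Design decision 3: on a tie, send the question to every tied subject.
    return [subject for subject, score in scores.items() if score == best]


print(route_question("How do I solve this equation with a fraction?"))
print(route_question("What percentage of a cell is water?"))
print(route_question("When is the school picnic?"))
```

The first question goes to Maths, the second is tied between Maths and Science, and the third matches nothing and falls back to the class teacher. Changing any of the three marked decisions changes where questions end up.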

Related concepts

Class 7

Reading a CSV with pandas — your first data file in Python
