What makes data 'good'? Explained for kids

If you teach an AI from bad examples, it learns bad habits. This is one of the most important lessons in AI.

What's the most common mistake children make about this concept?

Thinking that more data always means better AI. In reality, a large amount of inaccurate data is worse than a small amount of accurate data.

How does Dhee teach this in a Class 4 session?

Dhee opens with a question — for example: "If you asked 10 friends what their favourite food is but wrote down the answers randomly without listening — would that be good data or bad data? Why?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.

Garbage In, Garbage Out — what makes AI data good

What this concept actually says

Good data is accurate, complete, and fair (it represents everyone it should)
Bad data leads AI to make bad decisions
The phrase 'garbage in, garbage out' means poor inputs produce poor outputs
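To see 'garbage in, garbage out' in action, here is a minimal Python sketch of a toy "AI" that simply predicts the most common answer it was shown (a majority vote). The function name `train_favourite_food_ai` and both sets of survey answers are made up for illustration. Real AI systems are far more complex, but they share the same weakness: the program stays exactly the same, and only the data decides whether the answer comes out right.

```python
from collections import Counter

def train_favourite_food_ai(answers):
    """A toy, hypothetical 'AI': it learns by counting answers
    and always predicts the most common one."""
    counts = Counter(answers)
    # most_common(1) returns a list like [("dosa", 5)]; keep just the food
    return counts.most_common(1)[0][0]

# What 10 friends actually said (made-up example data)
careful_notes = ["dosa", "dosa", "pizza", "dosa", "biryani",
                 "dosa", "pizza", "idli", "dosa", "biryani"]

# The same survey, but the answers were written down without listening
careless_notes = ["pizza", "idli", "pizza", "biryani", "pizza",
                  "dosa", "pizza", "idli", "pizza", "biryani"]

print("Good data in ->", train_favourite_food_ai(careful_notes))   # dosa
print("Garbage in   ->", train_favourite_food_ai(careless_notes))  # pizza
```

Nothing inside the function can tell which notebook was filled in carefully. It confidently reports whichever answer appears most often, right or wrong.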

An analogy your child will recognise

Cooking / kitchen

If you put stale, rotten vegetables into a curry, no amount of good cooking will make it taste right. Data is like the ingredients — the dish (the AI) can only be as good as what you put in.

Cricket scoreboard

Imagine the scorekeeper writing random runs on the board instead of the actual score. At the end of the match, everyone would think the wrong team won. That scoreboard is 'garbage data'.

Common misconceptions to watch for

More data always means better AI — in reality, a large amount of inaccurate data is worse than a small amount of accurate data.
Once data is collected, it is automatically good to use — in reality, data almost always needs to be checked and cleaned before use.
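Checking and cleaning often starts with simple fixes. The Python sketch below assumes the survey answers were typed in by hand: it trims extra spaces, makes capital letters consistent, and sets blank answers aside so a person can go back and fix them rather than guess. The function name `clean_answers` and the sample answers are hypothetical.

```python
def clean_answers(raw_answers):
    """Hypothetical cleaning step: tidy each answer and
    count the ones that are missing."""
    cleaned = []
    missing = 0
    for answer in raw_answers:
        tidy = answer.strip().lower()  # " Dosa " and "dosa" become the same word
        if tidy == "":
            missing += 1               # blank answer: needs checking, not guessing
        else:
            cleaned.append(tidy)
    return cleaned, missing

raw = ["Dosa", " dosa ", "PIZZA", "", "Biryani", "   "]
cleaned, missing = clean_answers(raw)
print(cleaned)   # ['dosa', 'dosa', 'pizza', 'biryani']
print(missing)   # 2
```

Cleaning cannot turn a wrong answer into a right one. If "pizza" was written down for a friend who actually said "dosa", no amount of tidying will catch it; that still has to be checked against the source.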

Key facts in one breath

The phrase 'garbage in, garbage out' (GIGO) means that poor-quality input data always produces poor-quality output from an AI.
Good data has three main qualities: it is accurate (correct), complete (nothing important is missing), and representative (it includes everyone it should).
An AI cannot tell by itself that its training data was bad — it will confidently use whatever it was given.
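Each of the three qualities can be turned into a simple question a program could ask. This Python sketch uses a made-up canteen survey and a hypothetical `check_data_quality` function. It checks completeness (no blank answers), representativeness (every class took part), and accuracy. Accuracy is the hardest one: here it is assumed to be checked by comparing the recorded answers with what a few students said when asked again, since a program on its own has no way of knowing what is true.

```python
def check_data_quality(survey, all_classes, rechecked):
    """Hypothetical quality report for survey rows of (student, class, snack)."""
    report = {}

    # Complete: no row is missing its snack answer
    report["complete"] = all(snack != "" for _, _, snack in survey)

    # Representative: every class in the school appears at least once
    classes_seen = {cls for _, cls, _ in survey}
    report["representative"] = set(all_classes) <= classes_seen
    report["missing_classes"] = sorted(set(all_classes) - classes_seen)

    # Accurate: recorded answers agree with the students we asked again
    recorded = {student: snack for student, _, snack in survey}
    report["accurate"] = all(recorded.get(student) == snack
                             for student, snack in rechecked.items())
    return report

survey = [
    ("Asha", "4A", "banana"),
    ("Ravi", "4A", "samosa"),
    ("Meera", "4B", ""),
    ("Kabir", "4B", "samosa"),
]
all_classes = ["4A", "4B", "4C"]
rechecked = {"Ravi": "samosa", "Kabir": "poha"}  # what they said when asked again

report = check_data_quality(survey, all_classes, rechecked)
print(report)
```

For this made-up survey, all three checks fail: Meera's answer is blank, class 4C was never asked, and Kabir's recorded answer does not match what he said the second time.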

How Dhee teaches this — the 3-stage Socratic loop

Every Dhee session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.

Stage 1 — Surface

If you asked 10 friends what their favourite food is but wrote down the answers randomly without listening — would that be good data or bad data? Why?

Rote answer

"Bad data because it is incorrect."

Understood

"It would be bad because the answers don't actually match what my friends said, so any conclusion I draw would be wrong too."

Stage 2 — Reasoning

Imagine an AI is trained to recommend medicines to sick people, but the data it learned from had many wrong entries. What could go wrong?

Follow-up Dhee may use: Think about it this way — if you studied from a textbook with wrong answers, what would happen when you took a test?

Stage 3 — Application

Your school wants to build an AI that suggests what snacks to keep in the canteen. What would 'good data' look like for this AI, and what would 'bad data' look like?

Misconception Dhee watches for: Thinking that more data always means better data, regardless of its accuracy or fairness.

Related concepts

Class 4

What is data? Explained for Class 4 kids