Class 7 · CBSE AI · Strand D — The Architect's Capstone

AI data requirements — what data does your project need?

Every AI needs training data and runtime data. Learn how to specify the type, quantity and quality of both before you build. Written for Class 7.

What this concept actually says

  • Every AI system needs training data and runtime data — identifying both is a prerequisite to building
  • Data requirements analysis asks: what data does the AI need, where will it come from, and what makes it sufficient and unbiased
  • Poor data planning is the most common reason AI projects fail or cause harm after launch

An analogy your child will recognise

Teaching a child to recognise vegetables

If you only show a child photos of green vegetables to teach them what vegetables look like, they'll struggle with red tomatoes and orange carrots. Your training data is exactly these teaching examples — their gaps become the AI's blind spots. Garbage in, confusion out.

Voter list for elections

An election where only one city's voters are on the roll isn't representative of the whole country — the result will be skewed. An AI trained only on data from one type of user will be equally skewed in who it works for. Data requirements planning is making sure your voter roll is complete and fair.

Common misconceptions to watch for

  • "More data always solves bias problems." It does not: 10,000 biased examples produce a more confidently biased model than 1,000 biased examples
  • "You only need data once, to train the model." In real use, AI systems need ongoing data to retrain and correct for drift
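The "more confidently biased" idea in the first misconception can be sketched with a little arithmetic. This is a statistical analogy rather than a trained model, and the numbers are hypothetical: assume half of all real plants are diseased, but the way the photos were collected makes 90% of them diseased. Adding more photos from the same skewed source narrows the margin of error without moving the estimate any closer to the truth.

```python
import math

def share_with_margin(share, n):
    """Return (share, approximate 95% margin of error) for a sample of size n."""
    standard_error = math.sqrt(share * (1 - share) / n)
    return share, 1.96 * standard_error

# Hypothetical numbers, chosen only for illustration.
TRUE_SHARE = 0.5     # share of diseased plants in the real world
BIASED_SHARE = 0.9   # share of diseased plants in the skewed photo collection

small = share_with_margin(BIASED_SHARE, 1_000)
large = share_with_margin(BIASED_SHARE, 10_000)

for label, (share, margin) in [("1,000 examples", small), ("10,000 examples", large)]:
    print(f"{label}: estimate {share:.2f} +/- {margin:.3f} (truth {TRUE_SHARE:.2f})")
```

Both estimates stay at 0.90. The larger collection only makes the wrong answer look more certain.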

Key facts in one breath

  • Training data requirements specify the type, quantity, quality, and labelling of examples the model learns from
  • Runtime data requirements specify what information the user must provide during actual use of the AI
  • As a rough rule of thumb, a minimum viable dataset for a simple image classifier is about 100–200 labelled examples per class; harder tasks, or models trained from scratch, usually need far more
  • Data bias in AI usually reflects existing inequalities in who generated the data — if low-income users weren't represented when data was collected, the AI may underserve them
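As a minimal sketch of how the quantity rule of thumb above could be checked before training, the snippet below counts labelled examples per class and lists any class that falls short. The class names, the counts and the `classes_below_minimum` helper are hypothetical, and the threshold of 100 is simply the lower end of the range quoted above.

```python
from collections import Counter

MIN_PER_CLASS = 100  # assumption: lower end of the 100-200 rule of thumb

def classes_below_minimum(labels, minimum=MIN_PER_CLASS):
    """Return {class_name: count} for every class with fewer than `minimum` examples."""
    counts = Counter(labels)
    return {name: count for name, count in counts.items() if count < minimum}

# Hypothetical label list: one label per collected photo.
labels = ["healthy"] * 180 + ["leaf_blight"] * 120 + ["rust"] * 40

print(classes_below_minimum(labels))  # {'rust': 40}
```

A check like this only covers quantity. It says nothing about whether the labels are accurate or whether the photos represent the people who will use the AI.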

How Dhee Learning teaches this — the 3-stage question loop

Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.

Stage 1 — Surface

You want to build an AI that detects whether a plant is diseased from a photo. Before writing a single line of code, what questions would you ask about the data you need?

Rote answer

"Child says 'I need photos of plants' without distinguishing between labelled vs unlabelled, sufficient quantity, or the importance of including examples of multiple disease types"

Understood

"Child asks: how many examples, do they need labels (which disease?), do I have photos of healthy plants too, do I have photos from different lighting or angles, who labelled them and how accurately?"

Stage 2 — Reasoning

You collect 500 photos for your plant disease detector — but 450 are of tomatoes and 50 are of all other plants combined. What will happen when a farmer uses your AI on their wheat crop — and why?

Follow-up Dhee may use: How would you restructure your data collection to avoid this problem before you start training?
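The imbalance in this question can be made concrete with a short sketch. It computes each group's share of the 500 photos and then drafts an evenly split collection plan. The list of plant types and the `balanced_plan` helper are assumptions for illustration, not part of the Dhee session.

```python
def class_shares(counts):
    """Return each class's fraction of the total number of photos."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

def balanced_plan(total_photos, plant_types):
    """Split a photo budget evenly across plant types (remainder ignored)."""
    per_type = total_photos // len(plant_types)
    return {plant: per_type for plant in plant_types}

# Counts taken from the Stage 2 question.
photo_counts = {"tomato": 450, "all other plants": 50}
shares = class_shares(photo_counts)
print(shares)

# Hypothetical list of crops the detector is meant to cover.
plan = balanced_plan(500, ["tomato", "wheat", "rice", "potato", "maize"])
print(plan)
```

With 90% of the photos showing tomatoes, the model has seen very few wheat examples, if any, which is why its answer for a wheat crop is unreliable. An even split is only one possible plan; a real project might instead weight crops by how often farmers are expected to photograph them.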

Stage 3 — Application

Fill in this data requirements table for your own capstone project: (1) What data does the AI need to learn from? (2) What data does the AI need from the user at runtime? (3) Where will training data come from, and who will label it? (4) What could make this data biased or insufficient?

Misconception Dhee watches for: Child conflates training data with runtime data, or assumes 'any data is fine as long as there's a lot of it' — ignoring quality, labelling, and representativeness
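For children who like to see ideas as code, the four-part table can be sketched as a small record that keeps training data and runtime data in separate fields, so the two are harder to conflate. The `DataRequirements` class, its field names and the example answers are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class DataRequirements:
    training_data: str         # (1) what the AI learns from
    runtime_data: str          # (2) what the user provides during use
    source_and_labelling: str  # (3) where training data comes from, who labels it
    bias_risks: str            # (4) what could make the data biased or insufficient

def unanswered(requirements):
    """Return the names of any questions that were left blank."""
    return [f.name for f in fields(requirements)
            if not getattr(requirements, f.name).strip()]

# Hypothetical answers for the plant disease detector, with one question skipped.
plan = DataRequirements(
    training_data="Labelled photos of healthy and diseased leaves",
    runtime_data="One new leaf photo taken by the farmer",
    source_and_labelling="",
    bias_risks="Mostly tomato photos; few other crops",
)

print(unanswered(plan))  # ['source_and_labelling']
```

The check only confirms that each question has an answer. Judging whether the answers are good is still the work of the session.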

Want your child to actually understand this?

Dhee turns this concept into a 15-minute spoken session — asking, listening, and probing — so your child builds the idea themselves.

Frequently asked questions

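What is data requirements analysis, explained for kids?

Data requirements analysis means working out, before you build, what data an AI needs: the training data it learns from and the runtime data it receives during use. It also asks where that data will come from and what makes it sufficient and unbiased.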

What's the most common mistake children make about this concept?

Believing that more data always solves bias problems. It does not: 10,000 biased examples produce a more confidently biased model than 1,000 biased examples.

How does Dhee Learning teach this in a Class 7 session?

Dhee opens with a question — for example: "You want to build an AI that detects whether a plant is diseased from a photo. Before writing a single line of code, what questions would you ask about the data you need?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.