Class 7 · CBSE AI · Strand D — The Architect's Capstone
AI data requirements — what data does your project need?
Every AI needs training data and runtime data. How to specify type, quantity and quality before you build. For Class 7.
Teaching a child to recognise vegetables
If you only show a child photos of green vegetables to teach them what vegetables look like, they'll struggle with red tomatoes and orange carrots. Your training data is exactly these teaching examples — their gaps become the AI's blind spots. Garbage in, confusion out.
Voter list for elections
An election where only one city's voters are on the roll isn't representative of the whole country — the result will be skewed. An AI trained only on data from one type of user will be equally skewed in who it works for. Data requirements planning is making sure your voter roll is complete and fair.
Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.
Stage 1 — Surface
You want to build an AI that detects whether a plant is diseased from a photo. Before writing a single line of code, what questions would you ask about the data you need?
Rote answer
"Child says 'I need photos of plants' without asking whether the photos must be labelled, how many are needed, or whether they should cover several disease types"
Understood
"Child asks: How many examples do I need? Do they need labels (which disease)? Do I have photos of healthy plants too? Were the photos taken in different lighting and from different angles? Who labelled them, and how accurately?"
Stage 2 — Reasoning
You collect 500 photos for your plant disease detector — but 450 are of tomatoes and 50 are of all other plants combined. What will happen when a farmer uses your AI on their wheat crop — and why?
Follow-up Dhee may use: How would you restructure your data collection to avoid this problem before you start training?
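To see the imbalance in numbers, here is a minimal Python sketch that works out how much of the collection each group makes up. It assumes every photo is already labelled with a plant type; the helper names `class_shares` and `underrepresented`, and the 25% minimum share, are illustrative choices rather than a standard rule.

```python
from collections import Counter

def class_shares(labels):
    """Return each label's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def underrepresented(labels, min_share):
    """List the labels whose share of the dataset falls below min_share."""
    return [label for label, share in class_shares(labels).items() if share < min_share]

# The 500-photo collection from Stage 2: 450 tomato photos,
# 50 photos of every other plant combined.
photos = ["tomato"] * 450 + ["other plants"] * 50

print(class_shares(photos))
# The 0.25 threshold is a hypothetical choice for this example.
print(underrepresented(photos, min_share=0.25))
```

Running a check like this before training flags the problem early: "other plants" make up only 10% of the collection, so a photo of wheat looks unlike almost everything the AI has learned from.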
Stage 3 — Application
Fill in this data requirements table for your own capstone project: (1) What data does the AI need to learn from? (2) What data does the AI need from the user at runtime? (3) Where will training data come from, and who will label it? (4) What could make this data biased or insufficient?
Misconception Dhee watches for: the child conflates training data with runtime data, or assumes 'any data is fine as long as there's a lot of it', ignoring quality, labelling and representativeness.
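The four questions in the Stage 3 table can also be written down as a small Python record that keeps training data and runtime data in separate fields, so the two cannot be blurred together. This is a hypothetical sketch: the class `DataRequirements`, its field names and the sample answers for the plant disease detector are illustrative, and the empty third answer stands in for a decision not yet made.

```python
from dataclasses import dataclass, fields

@dataclass
class DataRequirements:
    training_data: str        # (1) What data does the AI learn from?
    runtime_data: str         # (2) What data does the AI need from the user at runtime?
    source_and_labeller: str  # (3) Where does training data come from, and who labels it?
    bias_risks: str           # (4) What could make this data biased or insufficient?

    def blank_answers(self):
        """Return the names of any questions that are still unanswered."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

# Hypothetical answers for the plant disease detector.
plant_detector = DataRequirements(
    training_data="Labelled photos of healthy and diseased plants, covering several "
                  "disease types, crops, lighting conditions and angles",
    runtime_data="One photo of the farmer's plant",
    source_and_labeller="",  # not decided yet: a gap to close before building
    bias_risks="Most photos are of tomatoes, so other crops such as wheat may be misjudged",
)

print(plant_detector.blank_answers())
```

Leaving a field blank is deliberate here: the check lists `source_and_labeller` as unanswered, which is exactly the kind of gap the table is meant to expose before any training starts.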
Dhee turns this concept into a 15-minute spoken session — asking, listening, and probing — so your child builds the idea themselves.
Common myth: more data always solves bias problems. In fact, 10,000 biased examples produce a more confidently biased model than 1,000 biased examples.
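A short Python sketch can show why piling on more of the same skewed data does not help. It keeps the 90/10 split between tomatoes and other plants from Stage 2 and measures the share of "other plants" at 1,000 and 10,000 photos, together with an approximate 95% margin of error from the usual normal-approximation formula. It describes the dataset rather than a trained model, and the function name `share_and_margin` is hypothetical.

```python
import math

def share_and_margin(count, total):
    """Share of one group in a sample, with an approximate 95% margin of error."""
    p = count / total
    margin = 1.96 * math.sqrt(p * (1 - p) / total)
    return p, margin

# The same 90/10 skew (tomatoes vs other plants), collected at two sizes.
for total in (1_000, 10_000):
    other_plants = total // 10
    p, margin = share_and_margin(other_plants, total)
    print(f"{total:>6} photos: other plants = {p:.2f} ± {margin:.3f}")
```

The share stays at 10% in both cases while the margin of error shrinks, so the larger collection describes the same lopsided picture with more certainty. That is the dataset-level version of a more confidently biased model.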
Dhee opens with a question — for example: "You want to build an AI that detects whether a plant is diseased from a photo. Before writing a single line of code, what questions would you ask about the data you need?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.