Class 7 · CBSE AI · Strand C — NLP, Vision, and LLMs Deep-Dive
Object detection vs classification — what's the difference?
Classification asks 'what is this?'; detection adds 'where is it?' with boxes. How tools like YOLO work. For Class 7.
Attendance in a classroom
Classification is like asking 'Is there at least one student named Priya in the class?' Object detection is like asking 'Which specific seat is Priya in?' Instance segmentation is like drawing an outline around exactly where Priya is sitting, including her bag and chair. Each task demands more precision than the one before it, and more effort.
Finding players on a cricket field from a broadcast camera
Classification: Is there a cricket match happening? Detection: Where is each of the 13 players on the field (11 fielders and 2 batters) and the 2 on-field umpires right now? Segmentation: Trace the exact pixel boundary of each player to compute their precise running speed. Broadcast analytics systems now do all three simultaneously, and each layer of analysis enables a different downstream use.
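For readers who like to see it in code, here is a minimal sketch of what the three tasks might hand back for one video frame. The `Box` class, the pixel numbers, and the tiny 4x4 mask are all made up for illustration; a real system would return much larger outputs in its own format.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A hypothetical bounding box: a label plus a rectangle in pixels."""
    label: str
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


# Classification: a single label for the whole image.
classification_output = "cricket match"

# Detection: one labelled box per object found (invented values).
detection_output = [
    Box("player", 120, 300, 40, 90),
    Box("player", 410, 280, 38, 88),
    Box("umpire", 640, 310, 42, 95),
]

# Segmentation: a 1 or 0 for every pixel, saying whether it belongs
# to the object. Here a toy 4x4 mask stands in for one player.
segmentation_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]


def count_label(boxes, label):
    """Count how many detected boxes carry the given label."""
    return sum(1 for box in boxes if box.label == label)


def mask_area(mask):
    """Count how many pixels the mask marks as part of the object."""
    return sum(sum(row) for row in mask)


print(classification_output)
print(count_label(detection_output, "player"), "players found")
print(mask_area(segmentation_mask), "pixels in the mask")
```

Notice how the output grows from one word, to a short list of rectangles, to a value for every pixel. That growth is the extra precision and effort the examples above describe.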
Every Dhee Learning session for this concept follows three stages. We share the questions Dhee actually asks, so you can hear what a session sounds like.
Stage 1 — Surface
A self-driving car needs to both know there's a pedestrian in the scene AND know exactly where they are to avoid hitting them. Why is 'there is a person in this image' not enough information — and what extra piece of information does it actually need?
Rote answer
"Object detection finds where objects are in an image, not just what they are."
Understood
"Just knowing a person exists somewhere in a large camera frame isn't actionable — the car needs to know the person is at a specific position and distance to calculate whether to brake or steer. If there are three pedestrians, it needs all three locations, not just a count."
Stage 2 — Reasoning
A fruit-sorting machine in an apple orchard can classify individual apples as 'ripe' or 'unripe' with 98% accuracy when tested on single-apple photos. Why might this performance drop dramatically when deployed in the actual orchard where apples grow in clusters and touch each other?
Follow-up Dhee may use: What changes to the training data would you make to prepare for real orchard conditions?
Stage 3 — Application
Design a vision system for monitoring a school corridor to ensure no student is running (safety rule). Specify: (a) what type of vision task this requires, (b) what training data you'd need, and (c) one serious failure mode you'd need to address.
Misconception Dhee watches for: Treating this as a simple classification problem (running/not-running per frame) — action recognition requires temporal context across multiple frames, not just single-image classification.
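One way to picture "temporal context" is to follow the same student's box across several frames and measure how far it moves. The sketch below assumes a detector and tracker have already produced one box per frame for each student; the box values, the frame rate, and `RUNNING_THRESHOLD` are hypothetical numbers chosen only to make the idea concrete.

```python
import math


def centre(box):
    """Return the centre point of an (x, y, width, height) box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def average_speed(track, fps):
    """Average speed of one tracked box, in pixels per second.

    `track` is a list of boxes for the same person, one per frame.
    """
    if len(track) < 2:
        return 0.0  # a single frame cannot show movement
    distance = 0.0
    for before, after in zip(track, track[1:]):
        (bx, by), (ax, ay) = centre(before), centre(after)
        distance += math.hypot(ax - bx, ay - by)
    seconds = (len(track) - 1) / fps
    return distance / seconds


# Hypothetical cut-off; a real system would tune this per camera.
RUNNING_THRESHOLD = 150.0

walker = [(100, 200, 40, 90), (104, 200, 40, 90), (108, 200, 40, 90)]
runner = [(100, 200, 40, 90), (120, 200, 40, 90), (140, 200, 40, 90)]

for name, track in [("walker", walker), ("runner", runner)]:
    speed = average_speed(track, fps=10)
    print(name, "running" if speed > RUNNING_THRESHOLD else "not running")
```

Any single frame of the walker and the runner could look almost identical. Only the change between frames separates them, which is why one image at a time is not enough.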
Dhee turns this concept into a 15-minute spoken session — asking, listening, and probing — so your child builds the idea themselves.
Misconception Dhee watches for: Object detection is just image classification run multiple times on different regions. In fact, modern detectors like YOLO process the entire image in a single pass, which makes them much faster than sliding-window approaches.
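To get a feel for why a single pass matters, the sketch below counts how many times a sliding-window approach would have to run a classifier on one image. The image size, window size, and stride are assumed values for illustration, and the function is only a counting exercise; it does not model how YOLO itself works internally.

```python
def sliding_window_count(img_w, img_h, win, stride):
    """Count the window positions a sliding-window classifier must score.

    The square window of side `win` moves `stride` pixels at a time
    and must stay fully inside the image.
    """
    if win > img_w or win > img_h:
        return 0  # the window does not fit anywhere
    across = (img_w - win) // stride + 1
    down = (img_h - win) // stride + 1
    return across * down


# Assumed setup: a 640x480 image, a 64-pixel window, a 16-pixel stride.
calls = sliding_window_count(640, 480, win=64, stride=16)
print("Sliding window classifier runs:", calls)
print("Single-pass detector runs: 1")
```

With these numbers the classifier would run 999 times for just one window size, and a real sliding-window system would repeat that for several sizes to catch near and far objects.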
Dhee opens with a question — for example: "A self-driving car needs to both know there's a pedestrian in the scene AND know exactly where they are to avoid hitting them. Why is 'there is a person in this image' not enough information — and what extra piece of information does it actually need?" — listens to your child's answer, then probes the reasoning behind it. The session ends when the child can apply the idea to a brand-new situation, not just recall it.