Teaching observability without drowning dashboards in numbers
Early career engineers sometimes equate observability with chart count. We introduce three questions every service should answer: who is impacted, how fast is recovery, and what changed recently. Metrics and traces exist to support those questions, not the reverse.
Labs wire Spring Boot Actuator exports into a local stack with sane cardinality limits. Students feel what happens when a label explodes and practice trimming dimensions before the problem reaches production.
We also discuss human factors: alert fatigue, on-call rotations, and how to write an incident summary that external reviewers can follow. Those topics sit beside technical configuration on purpose.
The capstone asks learners to propose a minimal dashboard for their demo service and defend each panel. If a panel lacks an owner and a review cadence, it does not ship.