Extension Operation Capability
Designing and Evaluating Extensions — Hands-On Operation
Pick one real teaching challenge from your current unit. Across at least eight iterations, work with an AI tool to produce something that would actually help you teach that thing. For each iteration you record:
- Pedagogical intent — one sentence, stated before you write the prompt. Not “better examples” — “examples that force a Year 9 student to distinguish between proportional and linear relationships by varying the initial value.”
- Full prompt text. Every word, including the system prompt or persona. No paraphrasing.
- AI output summary. What the AI produced — not your reaction yet.
- Evaluation against the intent. Did this output serve the intent? Where did it fail? Be specific about which student and which misconception.
- Revision rationale. Before you write the next prompt, state which pedagogical function was not served and what you are changing to serve it.
That fifth field is the most important. Teachers at Level 3 write rationales as a coherent story of diagnosing and adjusting. Teachers at Level 2 leave rationales like “too long” or “off-topic” that tell us nothing about pedagogical function. If you find yourself writing “too generic,” ask yourself generic in what way, and generic at what cost to learning — and write that instead.
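If you keep your iteration log electronically, the five fields above map naturally onto a small record. The following is only a sketch in Python: the class name `Iteration`, the helper functions, and the list of "vague" rationales are illustrative assumptions, not part of the assessment, and the sample entry is invented for demonstration. A string match cannot judge pedagogical quality; it can only catch the most obvious format-only rationales.

```python
from dataclasses import dataclass

# Hypothetical examples of rationales that name a format problem only.
VAGUE_RATIONALES = {"too long", "too short", "off-topic", "too generic"}


@dataclass
class Iteration:
    intent: str          # pedagogical intent, written before the prompt
    prompt: str          # full prompt text, verbatim
    output_summary: str  # what the AI produced, without your reaction
    evaluation: str      # how the output served or failed the intent
    rationale: str       # which pedagogical function was not served, and the change


def flag_vague_rationales(log):
    """Return 1-based iteration numbers whose rationale is only a format complaint."""
    return [
        number
        for number, entry in enumerate(log, start=1)
        if entry.rationale.strip().lower().rstrip(".") in VAGUE_RATIONALES
    ]


def is_complete(log, minimum=8):
    """True if the log has at least `minimum` iterations and no empty field."""
    return len(log) >= minimum and all(
        all(value.strip() for value in vars(entry).values()) for entry in log
    )


# Invented sample entry, for illustration only.
log = [
    Iteration(
        intent="Examples that force a Year 9 student to distinguish between "
               "proportional and linear relationships by varying the initial value.",
        prompt="(full prompt text goes here, verbatim)",
        output_summary="Six worked examples, all with an initial value of zero.",
        evaluation="No example has a non-zero initial value, so the distinction never arises.",
        rationale="Too generic.",
    ),
]

print(flag_vague_rationales(log))  # [1]
print(is_complete(log))            # False, because there are fewer than eight iterations
```

A flagged entry is a prompt to rewrite the rationale, for example by naming what the examples failed to make the student do.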
Evaluation criteria: D2.1 Extension Design Capability
For each of three AI-generated materials relevant to your subject, write an evaluation that:
- Identifies at least one error: factual (hallucination), bias (implicit framing), or overconfidence (an answer that should have been hedged). Across your three reports, at least one must address a subtle error, not an obvious one the tool itself would catch.
- Explains why the error matters in your subject. “This is wrong” is not an evaluation. “This is wrong and if a student believed it they would form a misconception that would undermine the next three lessons” is an evaluation. Trace the learning consequence.
- Writes a corrected version. Short. Show what right looks like.
The subtle-error report is the hardest and the most important. Subtle errors are the ones you’ll miss in a hurry, which means they’re the ones students will absorb and carry forward.
Evaluation criteria: D2.2 Extension Evaluation Capability
One thing you build with AI, not just ask from AI: something another teacher could pick up and reuse in your subject and get a reliable result. Your artefact must include:
- Pedagogical rationale — what this artefact is for in a student’s learning journey, not what it does mechanically.
- Target HAA type — A, B, or C, and why. A B-type artefact that’s accidentally a C-type is a problem worth knowing about.
- Boundary conditions — situations where this artefact should not be used.
- At least one tested failure mode. Actually use your artefact until it breaks. Document how it broke. A customised artefact without a documented failure mode is a Level 2 artefact no matter how polished it looks — we cannot trust an extension we haven’t seen fail yet.
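The four required components listed above can be treated as a simple completeness check before you submit. This is a minimal sketch in Python, assuming you document the artefact in a structured record; `ArtefactRecord` and `missing_components` are hypothetical names, and the check only confirms that each component is present, not that it is any good.

```python
from dataclasses import dataclass, field


@dataclass
class ArtefactRecord:
    rationale: str        # what the artefact is for in a student's learning journey
    haa_type: str         # target HAA type: "A", "B", or "C"
    haa_type_reason: str  # why that type
    boundary_conditions: list = field(default_factory=list)  # when not to use it
    failure_modes: list = field(default_factory=list)        # how it broke in testing


def missing_components(record):
    """Return the names of required components that are absent from the record."""
    missing = []
    if not record.rationale.strip():
        missing.append("pedagogical rationale")
    if record.haa_type not in {"A", "B", "C"} or not record.haa_type_reason.strip():
        missing.append("target HAA type")
    if not record.boundary_conditions:
        missing.append("boundary conditions")
    if not record.failure_modes:
        missing.append("tested failure mode")
    return missing


# Placeholder values, for illustration only.
draft = ArtefactRecord(
    rationale="(your pedagogical rationale)",
    haa_type="B",
    haa_type_reason="(why B rather than A or C)",
    boundary_conditions=["(a situation where this should not be used)"],
)

print(missing_components(draft))  # ['tested failure mode']
```

An empty result means only that nothing is missing; the failure mode still has to come from actually using the artefact until it breaks.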
Compare two AI tools head-to-head on one lesson activity in your unit. You must test both, not read reviews. Recommend one for that specific activity and state why.
Use your A/B/C taxonomy from Day 1 to structure the comparison: are the two tools offering the same extension type with different quality, or offering different extension types entirely? That reframing matters — a lot of tool-selection decisions go sideways because teachers compare tools as if they were all A-type, when the real difference is that one scaffolds and one amplifies.
Evaluation criteria: D2.3 Extension Customisation Capability
- Pedagogical intent stated before the prompt, not retrofitted after output
- Revision rationale names a pedagogical failure, not a format failure
- Arc shows genuine learning — later iterations are qualitatively different
- At Level 3+, references your specific class’s needs in the evaluation
- Errors identified with subject-specific impact analysis
- The “subtle error” is genuinely subtle — not obvious to a non-specialist
- Corrected versions show the teacher knows the content well enough to fix it
- Designed for a specific class, not “Year 10 Science” in general
- Boundary conditions are operationally concrete
- The tested failure mode is documented with actual AI output
- Both tools were actually tested (not reviewed from documentation alone)
- Comparison uses A/B/C taxonomy, not feature lists
- Selection justified by fit to the specific activity and class