HAA Teacher Portal
Competency Assessment Rubric
Day 8 · Dimension 8 · ~5 hrs · new in v5

Assessment Redesign Under AI

Rehabilitating Assessment Validity — Assessment Redesign Under AI

D8 measures whether the teacher can rehabilitate assessment validity under AI. The critical move is from "assessment plus an AI-use policy" to "assessment that is STRUCTURALLY non-evidential for AI-substituted output." Abstract statements about validity ("we need to watch for AI misuse") cannot score above Level 2. Level 3+ requires concrete instruments and validation evidence. D8.2 (Authentic Agentic Assessment) is the heart of this dimension — a designed assessment form that does not rely on administrative policing.
Level 1–2
Validity audit asserts generic AI-undermines-assessment concerns. Authentic-assessment design is an administrative policy, not a structural redesign. Policy co-articulation is a one-way announcement. Revalidation trial is absent or anecdotal.
Level 3
Audit names specific construct + failure mechanism per assessment. Design is structurally non-evidential for AI-substituted output with a named scoring approach. Co-articulation records specific student voices. Revalidation trial reports honest evidence + a proposed revision.
Level 4
Audit surfaces non-obvious validity losses. Design discriminates extension-use from substitution-use. Co-articulation revised your own framing. Trial revealed something about the CONSTRUCT, not just the form.
4 TASKS · What you will produce on Day 8
01 Assessment Validity Audit
LOG-D

Many traditional assessments (take-home essays, problem sets, lab reports) were designed when AI could not routinely produce outputs that look like the work of the capacities those assessments were meant to exercise. Under current AI, they may no longer be valid signals of the construct they were designed to measure. This task asks you to audit TWO assessments you actually use and diagnose the validity loss: the specific mechanism by which AI substitution produces surface performance without the underlying capacity.

Key concepts:

  • Validity loss has a specific mechanism — AI produces X that looks like capacity Y, without Y being exercised
  • The audit is evidential, not rhetorical — "AI makes essays meaningless" is a slogan, not a validity audit
  • Subject-specific: the failure mechanism for a history essay differs from a maths proof
What good looks like: Two actual assessments audited. Construct named for each. Failure mechanism specified. Level 4 identifies an assessment where the validity loss was not obvious and required auditing to surface.
Evaluation criteria: D8.1 Assessment Validity Under AI
02 Authentic Agentic Assessment Design

Administrative policies against AI use are not the answer to validity loss — they rely on detection, which is structurally unreliable. The deeper move is assessment that is STRUCTURALLY non-evidential for AI-substituted output: forms where AI substitution does not produce something that could be scored as valid evidence at all. This is the heart of D8. You design ONE such assessment in your subject.

Key concepts:

  • Structural non-evidentiality > administrative policing
  • The form must be designed so AI substitution produces a null, not a false positive
  • Authenticity means the task requires the visible exercise of the learner's own agency — process visible, reasoning explained, revision traceable
What good looks like: One concrete assessment specified: construct, form, why AI substitution fails as evidence, scoring approach. Level 4 explains why this form is valid even when students DO use AI as an extension — i.e. it discriminates extension-use from substitution-use.
Evaluation criteria: D8.2 Authentic Agentic Assessment
03 AI-Inclusive Policy Co-articulation
LOG-F

A policy imposed on students is harder to defend and easier to game than a policy they helped articulate. This task is not about writing a policy and telling your class — it's about sitting with your students and working it out together. You record what they actually said. You notice where your prior judgments were wrong. The final artefact is a shared policy, not a unilateral one. The reflection annotation (LOG-F) captures what the conversation taught you.

Key concepts:

  • Co-articulation ≠ approval — students help decide, not just agree
  • Record what they actually said, including disagreement with your initial framing
  • The shared policy should be stateable in a sentence or two — complex policies are administrative, not educational
What good looks like: Conversation record (summary or transcript). Specific student positions captured. A short shared policy document. Level 4 honestly records where your own framing was revised by the conversation.
Evaluation criteria: D8.3 AI-Inclusive Assessment Policy
04 Revalidation Trial
LOG-E

An assessment design is a hypothesis. A trial is the evidence. This task asks you to run the assessment you designed in Task 02 (p8t2) with a small group, collect the evidence, and write it up honestly. The surprising result (the student who revealed a sophisticated capacity you did not expect, or the form that failed in a way you did not anticipate) is usually where the real learning lives. Don't sanitise it. Revalidation is ethical because it is evidential: it prevents the assessment from being adopted at scale on the basis of a plausible story rather than actual performance.

Key concepts:

  • An untrialled assessment is a claim, not evidence
  • Honest reporting > tidy reporting — the surprise is the finding
  • Proposed revision closes the loop — the trial feeds back into the design
What good looks like: Trial run with ≥5 students. Evidence reported honestly, including surprises. A concrete revision proposed. Level 4 shows the trial revealed something about the CONSTRUCT, not just the form — e.g. the assessment revealed that the construct itself needed sharpening.
Evaluation criteria: D8.1 Assessment Validity Under AI
3 SUB-COMPETENCIES · Evaluation criteria for Day 8
D8.1 Assessment Validity Under AI · General A–F; acute for A and E
Primary evidence
Phase 8 · Assessment Validity Audit
Key question
Is the validity-loss mechanism named, or is the critique rhetorical?
1 — Nascent
Generic awareness. The key question (is the validity-loss mechanism named, or is the critique rhetorical?) has not yet been asked. Validity audit asserts generic AI-undermines-assessment concerns. Authentic-assessment design is an administrative policy, not a structural redesign. Policy co-articulation is a one-way announcement. Revalidation trial is absent or anecdotal.
2 — Developing
Partial demonstration. Validity audit asserts generic AI-undermines-assessment concerns. Authentic-assessment design is an administrative policy, not a structural redesign. Policy co-articulation is a one-way announcement. Revalidation trial is absent or anecdotal.
3 — Proficient
Full demonstration. Audit names specific construct + failure mechanism per assessment. Design is structurally non-evidential for AI-substituted output with a named scoring approach. Co-articulation records specific student voices. Revalidation trial reports honest evidence + a proposed revision.
Anchor: ≥2 assessments audited; construct + failure mechanism named per assessment.
4 — Advanced
Audit surfaces non-obvious validity losses. Design discriminates extension-use from substitution-use. Co-articulation revised your own framing. Trial revealed something about the CONSTRUCT, not just the form.
p8t1 · Assessment Validity Audit · LOG-D
Audit ≥2 existing assessments (yours or departmental). For each: What was it designed to measure? What can AI now produce that would look identical to that capacity being exercised? What is the failure mechanism that makes the assessment no longer a valid signal?
p8t4 · Revalidation Trial · LOG-E
Run p8t2's new assessment with ≥5 students. Write up what the evidence actually showed: Did the form discriminate agentic work from AI-substituted output? What surprised you? What revision do you propose?
D8.2 Authentic Agentic Assessment · Core for Type C; general A–F
Primary evidence
Phase 8 · Authentic Agentic Assessment Design + Revalidation Trial
Key question
Is the design STRUCTURALLY non-evidential for AI-substituted output, or administratively policed?
1 — Nascent
Generic awareness. The key question (is the design structurally non-evidential for AI-substituted output, or administratively policed?) has not yet been asked. Validity audit asserts generic AI-undermines-assessment concerns. Authentic-assessment design is an administrative policy, not a structural redesign. Policy co-articulation is a one-way announcement. Revalidation trial is absent or anecdotal.
2 — Developing
Partial demonstration. Validity audit asserts generic AI-undermines-assessment concerns. Authentic-assessment design is an administrative policy, not a structural redesign. Policy co-articulation is a one-way announcement. Revalidation trial is absent or anecdotal.
3 — Proficient
Full demonstration. Audit names specific construct + failure mechanism per assessment. Design is structurally non-evidential for AI-substituted output with a named scoring approach. Co-articulation records specific student voices. Revalidation trial reports honest evidence + a proposed revision.
Anchor: One concrete assessment with construct, form, why AI substitution fails as evidence, and scoring approach.
4 — Advanced
Audit surfaces non-obvious validity losses. Design discriminates extension-use from substitution-use. Co-articulation revised your own framing. Trial revealed something about the CONSTRUCT, not just the form.
p8t2 · Authentic Agentic Assessment Design
Design ONE new assessment that is STRUCTURALLY non-evidential for AI-substituted output — oral examination, in-class writing, process portfolio, scaffolded real-time construction, iterative review with explained reasoning, or think-aloud protocol. Specify: construct measured, form, why AI substitution fails to provide valid evidence for this form, scoring approach.
D8.3 AI-Inclusive Assessment Policy · Cross-cutting; primary for A and C
Primary evidence
Phase 8 · Co-articulation Record + Shared Policy
Key question
Was the policy co-articulated with students, or imposed on them?
1 — Nascent
Generic awareness. The key question (was the policy co-articulated with students, or imposed on them?) has not yet been asked. Validity audit asserts generic AI-undermines-assessment concerns. Authentic-assessment design is an administrative policy, not a structural redesign. Policy co-articulation is a one-way announcement. Revalidation trial is absent or anecdotal.
2 — Developing
Partial demonstration. Validity audit asserts generic AI-undermines-assessment concerns. Authentic-assessment design is an administrative policy, not a structural redesign. Policy co-articulation is a one-way announcement. Revalidation trial is absent or anecdotal.
3 — Proficient
Full demonstration. Audit names specific construct + failure mechanism per assessment. Design is structurally non-evidential for AI-substituted output with a named scoring approach. Co-articulation records specific student voices. Revalidation trial reports honest evidence + a proposed revision.
Anchor: Conversation record with specific student voices; short shared policy stateable in one or two sentences.
4 — Advanced
Audit surfaces non-obvious validity losses. Design discriminates extension-use from substitution-use. Co-articulation revised your own framing. Trial revealed something about the CONSTRUCT, not just the form.
p8t3 · AI-Inclusive Policy Co-articulation · LOG-F
Run ONE conversation with your class (or a focal group of ≥4 students) about legitimate vs. illegitimate AI use in your subject. Record what they named as legitimate / illegitimate / ambiguous. Produce a short shared policy document.