Assessment Redesign Under AI
Rehabilitating Assessment Validity — Assessment Redesign Under AI
Many traditional assessments (take-home essays, problem sets, lab reports) were designed when AI could not routinely produce outputs that look like the product of the capacities being assessed. Under current AI, they may no longer be valid signals of the construct they were designed to measure. This task asks you to audit TWO assessments you actually use and diagnose the validity loss: the specific mechanism by which AI substitution produces surface performance without the underlying capacity.
Key concepts:
- Validity loss has a specific mechanism: AI produces output X that looks like evidence of capacity Y, without Y being exercised
- The audit is evidential, not rhetorical — "AI makes essays meaningless" is a slogan, not a validity audit
- Subject-specific: the failure mechanism for a history essay differs from that for a maths proof
Administrative policies against AI use are not the answer to validity loss — they rely on detection, which is structurally unreliable. The deeper move is assessment that is STRUCTURALLY non-evidential for AI-substituted output: forms where AI substitution does not produce something that could be scored as valid evidence at all. This is the heart of D8. You design ONE such assessment in your subject.
Key concepts:
- Structural non-evidentiality > administrative policing
- The form must be designed so AI substitution produces a null, not a false positive
- Authenticity means the task requires the visible exercise of the learner's own agency — process visible, reasoning explained, revision traceable
A policy imposed on students is harder to defend and easier to game than a policy they helped articulate. This task is not about writing a policy and telling your class — it's about sitting with your students and working it out together. You record what they actually said. You notice where your prior judgments were wrong. The final artefact is a shared policy, not a unilateral one. The reflection annotation (LOG-F) captures what the conversation taught you.
Key concepts:
- Co-articulation ≠ approval — students help decide, not just agree
- Record what they actually said, including disagreement with your initial framing
- The shared policy should be stateable in a sentence or two — complex policies are administrative, not educational
An assessment design is a hypothesis. A trial is the evidence. This task asks you to run p8t2 with a small group, collect the evidence, and write it up honestly. The surprising result — the student who revealed a sophisticated capacity you did not expect, or the form that turned out to fail in a way you did not anticipate — is usually where the real learning lives. Don't sanitise it. Revalidation is ethical because it is evidential: it prevents the assessment from being adopted at scale on the basis of a plausible story rather than actual performance.
Key concepts:
- An untrialled assessment is a claim, not evidence
- Honest reporting > tidy reporting — the surprise is the finding
- Proposed revision closes the loop — the trial feeds back into the design