Here is the question every faculty member needs to answer: Does your assignment still measure what it claims to measure when a student uses AI?

If the honest answer is no, you are not alone. A University of Reading blind study found that AI-generated exam submissions went undetected 94% of the time and scored, on average, half a grade boundary higher than real student work (Scarfe et al., 2024). That finding should stop every educator in their tracks. It means that across disciplines, polished output no longer proves learning. A beautifully written essay, a technically correct lab report, a professional-looking marketing plan: none of these reliably signal that a student understands anything at all.

ARAD 2.0: AI-Responsive Assignment Design for Human Learning is my answer to that problem. It is a goal-first, research-backed framework for designing assessments that restrict, limit, collaborate with, or require AI in ways that preserve learning, reveal thinking, strengthen judgment, and maintain fairness. It evolves the original ARAD framework I published in 2023 by replacing a binary AI model with a four-mode continuum, elevating process evidence and verification to core design principles, adding equity safeguards, and grounding every element in established learning science.

This is a long article. ARAD 2.0 is a comprehensive framework, and I am going to walk through all of it: the theoretical foundation, the six core principles, the four assignment modes, the five-step GOALS design process, an eight-dimension rubric, five fully worked redesign examples, a faculty AI-literacy prerequisite, an AI literacy primer for students, a diagnostics and mode-switching guide, a programmatic layer for department chairs, answers to the most common faculty objections, and a practical implementation guide. If you are a faculty member, instructional designer, department chair, or academic administrator trying to figure out how to design assignments that actually work in 2026 and beyond, this is the article to bookmark.

The Core Claim

The real question for every assignment is whether it preserves the cognitive agency students need to learn, with or without AI in the workflow. If AI performs the thinking an assignment claims to measure, the assessment loses validity, regardless of how polished the output looks.

From ARAD to ARAD 2.0: Why the Framework Needed to Evolve

The original ARAD framework, published in 2023, introduced a five-step model captured by the mnemonic GOALS: Goal, Openness, Adapt, Link, Study. It gave faculty a practical way to move from panic to design, grounding assignment decisions in learning objectives. The original model made two core contributions. First, it insisted on goal-first design: the learning objective drives every AI decision. This aligns with Backward Design (Wiggins & McTighe, 2005) and with UNESCO (2023) and OECD (2023) guidance that AI use should be driven by pedagogical intent. Second, it introduced a two-path model distinguishing between AI-Integrated assignments (where AI supports learning) and AI-Resistant assignments (where AI would short-circuit it). That binary helped faculty act quickly and see that both stances are pedagogically valid. ARAD 2.0 expands this into a four-mode continuum that also recognizes disciplines where skilled use of AI is the learning goal.

But the landscape has shifted dramatically. Five developments drove the need for ARAD 2.0:

1. The Binary Is Too Simple

Faculty frequently need a middle ground. They want to allow AI for grammar support while reserving the core analysis for the student. They want students to use AI for brainstorming while keeping the final argument in their own voice. A two-path model cannot capture that nuance. Real teaching requires a spectrum of options.

2. Product Quality No Longer Proves Learning

The University of Reading study is the clearest evidence, and other data points across disciplines echo it: AI can now produce work indistinguishable from strong student output. If polished output no longer reliably signals understanding, assessment must reveal the thinking behind it.

3. Detection Is Not a Viable Foundation

AI detectors remain unreliable and are disproportionately biased against non-native English writers (Liang et al., 2023). Building grading or misconduct workflows around detection creates adversarial dynamics and equity risks. Any framework that depends on catching students is already broken.

4. Verification Is Now a Core Academic Skill

Generative AI produces fluent but frequently flawed output. Students need structured practice in testing claims, checking sources, identifying omissions, and correcting errors. UNESCO (2023) frames these as essential skills for human-centered AI use. Verification sits well above a low-level fact-check: it is the exercise of disciplinary judgment.

5. Equity Demands Explicit Attention

Students differ in tool access, digital literacy, language background, and institutional support. A mature framework must address fairness, privacy, and access as foundational design concerns.

What Changed Between Versions

The evolution from ARAD 1.0 to 2.0 touches every part of the framework:

Comparison of ARAD 1.0 and ARAD 2.0 framework elements
Element | ARAD 1.0 | ARAD 2.0
AI stance | Binary (Integrate or Resist) | Four modes (Restricted, Limited, Collaborative, Required)
Central question | "Can AI support learning here?" | "Does this preserve the cognitive agency students need?"
Process evidence | Encouraged | Required as a core design principle
Verification | Implicit | Explicit design requirement with structured templates
Equity & access | Not addressed | Built into the framework as a design filter, with explicit opt-out pathways
Transfer checkpoint | Not addressed | Required for every major course outcome, with an explicit Transfer Gap Check
Theoretical grounding | Practitioner logic | Anchored in Backward Design, SRL, CLT, and Evaluative Judgment theory
Rubric design | General criteria | Eight performance dimensions (including Prompt Quality, Process Evidence, Metacognitive Depth) with behavioral anchors and anti-gaming requirements
AI literacy (students) | Assumed | Scaffolded as a prerequisite
Architectural patterns | Not formalized | Four patterns (AI Sandwich, Verification Challenge, Comparative Analysis, Generative Challenge)
Faculty AI literacy | Not addressed | Prerequisite: faculty run the assignment through AI and score it against their own rubric before deployment
Scope | Assignment-level only | Assignment plus programmatic layer: Transfer Gap Check mapping across a major

The Theoretical Foundation: Why This Framework Is Defensible

ARAD 2.0 is grounded in five established bodies of learning science research. This grounding matters: it makes the framework defensible, publishable, and transferable across disciplines and institutions. It also means that when a skeptical colleague asks "Why should I do this?" you have answers rooted in decades of learning science evidence.

1. Backward Design (Wiggins & McTighe, 2005)

The GOALS process begins with desired learning outcomes and works backward to assessment and instruction. This is the same logic as Understanding by Design: identify what students should understand, determine acceptable evidence, then plan learning experiences. In an AI context, Backward Design forces the question: What human capability am I actually trying to develop? If you cannot answer that question clearly, no amount of AI policy will save the assignment.

2. Self-Regulated Learning (Zimmerman, 2002; Winne & Hadwin, 1998)

Self-Regulated Learning describes how effective learners plan, monitor, and evaluate their own cognitive processes. ARAD 2.0 builds SRL into the Architecture step by requiring metacognitive checkpoints: students must plan their AI use, monitor its quality, and evaluate what they learned through the interaction. This transforms AI from a shortcut into a metacognitive exercise. When students have to articulate why they prompted the AI a certain way and what they learned from the output, the AI interaction itself becomes a learning event.

3. Cognitive Load Theory (Sweller, 1988)

Cognitive Load Theory distinguishes between extraneous load (unnecessary processing), intrinsic load (the inherent difficulty of the material), and germane load (the productive effort that builds schemas). The AI-Limited mode is grounded in CLT: AI handles extraneous load (formatting, grammar, organization) so students can direct more cognitive resources toward germane load (the actual learning target). The key insight is that reducing the right kind of load enhances learning, while reducing the wrong kind eliminates it.

4. Evaluative Judgment (Tai et al., 2018)

Evaluative Judgment is the capability to make defensible judgments about the quality of one's own work and the work of others. ARAD 2.0 treats this as the central human skill in an AI-rich world. When students evaluate AI output (identifying flaws, assessing reasoning quality, comparing against disciplinary standards), they are exercising evaluative judgment. This is a higher-order capacity that AI cannot replicate, because it requires the student to hold and apply quality standards: a kind of judgment the AI itself lacks.

5. Desirable Difficulties (Bjork & Bjork, 2011)

Desirable Difficulties theory holds that certain kinds of productive struggle (spacing, interleaving, retrieval practice, generation) enhance long-term retention even though they slow initial performance. ARAD 2.0 uses this principle as a design filter: if AI removes the struggle that produces durable learning, the assignment must be redesigned to preserve it. The goal is to protect the specific cognitive effort that builds lasting understanding.

The Six Core Principles

The original framework had six commitments and four design filters with significant overlap between them. ARAD 2.0 consolidates these into six unified Core Principles. Each one serves as both a design commitment and a quality filter. When you are designing or evaluating an assignment, run it through all six. If it fails any one of them, redesign.

1. Cognitive Agency

The assignment must preserve the student's role as the primary thinker, decision-maker, and judge. AI may support, extend, or challenge, but the student drives the cognitive work.

Design filter: Who is doing the thinking at each stage of this assignment? If the answer is "the AI," redesign that stage.

2. Learning Integrity

The assignment must still measure the intended learning outcome when AI is part of the process. If AI use allows students to bypass the target skill, the assessment loses construct validity.

Design filter: Does this task still measure what it claims to measure if a student uses AI? If not, change the mode or restructure the task.

3. Process Visibility

Every assignment must require evidence that reveals how the student thought in addition to what they produced. Visible thinking is the primary mechanism for distinguishing genuine learning from AI-generated output.

Design filter: Can I see how the student got there? If I can only see the final product, I cannot verify learning.

4. Evaluative Judgment

Students must critically assess, test, and improve AI output, engaging with it as a draft to be challenged. Verification here means the substantive exercise of disciplinary judgment against quality standards.

Design filter: Does the assignment require the student to evaluate AI output using course concepts and disciplinary standards? Is the verification substantive enough to constitute real learning?

5. Productive Struggle

The assignment must preserve the cognitive effort that produces durable learning. AI should reduce extraneous difficulty (formatting, mechanics) without eliminating the germane difficulty (reasoning, analysis, synthesis) that builds understanding.

Design filter: Has AI removed the "desirable difficulty" that makes this learning stick? If so, restore it, or move to a more restricted mode.

6. Equity and Access

The assignment must be fair, accessible, and achievable for all students regardless of their access to AI tools, digital literacy level, language background, or disability status. The framework explicitly rejects AI detection as a grading or misconduct tool due to documented bias and unreliability.

Design filter: Do all students have equal access to the required tools? Have they been trained in how to use them? Could this design disadvantage any group? Does it rely on detection?

The Four Assignment Modes

ARAD 2.0 replaces the original binary (Integrate vs. Resist) with four clearly defined modes, ordered from least to most AI involvement. The choice of mode is determined at the assignment level by the learning goal established in Step 1 of the GOALS process, since that is where the pedagogical context actually lives. No single mode is inherently better. The right mode is the one that best serves the target learning. The fourth mode, AI-Required, recognizes disciplines where skilled human-AI collaboration is itself the professional competency being taught.

Mode 1: AI-Restricted

AI is excluded because the assignment directly measures the student's independent cognitive performance. The point of the task is the human act itself.

Best for learning goals that involve: demonstrating mastery of foundational knowledge, performing a skill under authentic conditions, showing independent reasoning without support, building confidence in one's own capability, and providing a transfer checkpoint for high-stakes outcomes.

Task types: timed in-class writing or problem solving; oral presentations, defenses, or vivas; live demonstrations or performances; Socratic discussion or structured debate; lab technique demonstrations; and whiteboard explanations or concept mapping under observation.

Required safeguards:

  • Clear explanation to students of why AI is restricted for this task (a pedagogical rationale that names the learning purpose)
  • Accessible alternatives for students with disabilities
  • Design that does not disadvantage students based on language background or test anxiety

Mode 2: AI-Limited

AI is permitted only for bounded support tasks that reduce extraneous cognitive load. The core intellectual performance remains entirely the student's own work. This is the middle ground that ARAD 1.0 lacked, and it is where a huge number of real-world assignments naturally fall.

Best for learning goals that involve: original argument construction, creative composition or design, independent research and source evaluation, building procedural fluency, and developing personal voice or perspective.

Permitted AI uses include: grammar and spelling support, formatting assistance, brainstorming or idea clustering (with the requirement to move beyond AI suggestions), translation support for multilingual students, and clarifying assignment instructions.

Prohibited AI uses include: generating arguments, analyses, or conclusions; writing substantial portions of the deliverable; conducting the core research or evaluation; and producing the creative or intellectual heart of the work.

Required safeguards:

  • Clear boundary statement explaining what is and is not permitted and why
  • Brief AI use disclosure (what tools were used and for what purpose)
  • Rubric criteria focused on the quality of the student's independent thinking

Mode 3: AI-Collaborative (formerly AI-Integrated)

AI is openly and intentionally used as part of the learning process itself. Students interact with AI to generate, critique, compare, revise, or extend their thinking, working with it as a thinking partner while demonstrating evaluative judgment throughout. The student directs the cognition; AI contributes material the student then evaluates, challenges, and integrates.

Best for learning goals that involve: evaluating and critiquing arguments or sources, synthesizing multiple perspectives, strategic decision-making, comparative analysis, revision and iterative improvement, and working with feedback.

Required safeguards:

  • Full transparency (prompt logs, interaction transcripts, decision rationales)
  • Structured verification tasks with concrete deliverables (moving well beyond open-ended "reflect on AI use" prompts)
  • Rubric criteria focused on the quality of the student's judgment over the AI's output
  • At least one human performance checkpoint (oral defense, live explanation, or in-class application)

Mode 4: AI-Required (AI Fluency as the Goal)

AI use is mandatory because the learning target itself is skilled human-AI collaboration. Competent professional performance in the field already involves AI, and the assignment exists to develop that competency: prompt discipline, output verification against authoritative sources, workflow integration, model evaluation, and the judgment skills needed to direct, audit, and integrate AI output. In the other three modes, AI sits outside the learning target; in this mode, skilled use of AI is the learning target.

Best for learning goals that involve: clinical documentation with AI scribe or charting copilots (nursing, medicine, allied health); CTE diagnostic workflows (automotive, HVAC, IT support, welding inspection); engineering design with AI-assisted CAD or code generation; paralegal or legal research with AI; data analyst workflows where AI-assisted querying and visualization are standard; and any course explicitly teaching prompt engineering, model evaluation, or AI-augmented professional workflows.

Distinguishing feature: The student is evaluated on professional AI competence: prompt discipline, output verification against authoritative sources, liability and ethical awareness, and workflow integration. Production of the deliverable is assumed to happen with the tool in the loop, as it would in practice.

Required safeguards:

  • Scenario authenticity: the AI tool used must match or closely approximate current professional practice in the field
  • Discipline-standard verification of every AI-produced artifact (e.g., nursing: compliance with charting standards; engineering: compliance with specification)
  • Student must identify at least one AI error or omission per task and correct it against an authoritative source, cited by page or section number
  • Oral or observed defense of the final artifact, with questions targeting the student's AI-management decisions
  • Transfer Gap Check: a short AI-Restricted analogue confirms the student has developed independent judgment alongside tool fluency

The Cognitive Agency Spectrum

The four modes form a spectrum of student-to-AI cognitive distribution. In AI-Restricted mode, the student does all thinking independently. In AI-Limited mode, the student does the core thinking while AI handles peripheral tasks. In AI-Collaborative mode, the student directs the thinking while AI generates material the student evaluates. In AI-Required mode, skilled orchestration and verification of AI as a professional tool is itself the learning goal. Faculty should use a mix across a course, calibrated to learning goals, with at least one AI-Restricted transfer checkpoint per major course outcome (see the Transfer Gap Check in Step 4).
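For faculty who keep their course plan in a spreadsheet or script, the "at least one AI-Restricted checkpoint per major course outcome" rule can be checked mechanically. The following is a minimal sketch, not part of the framework itself: the `Assignment` record, the function name, and the sample course plan are all hypothetical and exist only to illustrate the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    name: str
    mode: str                  # "Restricted", "Limited", "Collaborative", or "Required"
    outcomes: tuple[str, ...]  # major course outcomes this assignment assesses


def outcomes_missing_restricted_checkpoint(course_outcomes, assignments):
    """Return course outcomes that no AI-Restricted assignment assesses."""
    covered = {
        outcome
        for a in assignments
        if a.mode == "Restricted"
        for outcome in a.outcomes
    }
    return [o for o in course_outcomes if o not in covered]


# Hypothetical course plan, for illustration only.
course_outcomes = ["argumentation", "source evaluation"]
plan = [
    Assignment("Policy memo", "Collaborative", ("argumentation", "source evaluation")),
    Assignment("In-class debate", "Restricted", ("argumentation",)),
]

missing = outcomes_missing_restricted_checkpoint(course_outcomes, plan)
print(missing)  # ['source evaluation']
```

In this made-up plan, "source evaluation" is only ever assessed with AI in the loop, so it still needs a Restricted checkpoint.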

The GOALS Process: Five Steps to Intentional Design

The GOALS mnemonic provides an actionable, step-by-step process for redesigning any assignment. ARAD 2.0 refines each step with sharper questions, concrete tools, and stronger theoretical grounding. Here is how to use it.

Step 1: G. Goal (Define the Human Learning Target)

Before choosing a mode or redesigning anything, identify the specific human capability you want students to develop. This is your North Star. Every subsequent decision follows from it.

Key questions to ask yourself:

  • What must the student know, do, or judge by the end of this assignment?
  • What kind of thinking matters most here? (Analysis? Synthesis? Evaluation? Creation? Application?)
  • What part of this thinking must remain human for learning to occur?
  • If I removed this assignment entirely, what capability would students lose?

The output of this step should be a clear, specific statement of the human learning target. Not "Write an essay about climate policy" but "Develop the ability to evaluate competing policy arguments using evidence and construct an original, defensible position."

Goals can be skill-based (critical thinking, argumentation, problem-solving), process-based (research methodology, design thinking, iterative revision), knowledge-based (conceptual understanding, theory application, transfer to new contexts), creative (original voice, innovative solutions, artistic expression), or metacognitive (self-monitoring, strategic planning, reflective judgment).

Step 2: O. Openness (Choose the Right AI Mode)

With your goal defined, determine the appropriate level of AI involvement. Ask these four questions in sequence:

1. Does AI support or replace the target learning? If AI can perform the exact cognitive work you are trying to teach, it replaces the learning. Move toward Restricted or Limited. If AI generates raw material that students must then evaluate, synthesize, or improve, it supports the learning. Consider Collaborative. If skilled orchestration of AI is itself the learning target (clinical AI scribing, AI-assisted coding, AI-augmented research), consider Required.

2. What specific cognitive acts must remain human? Identify the precise moments where student thinking is essential. These moments define the boundary of AI use.

3. Would AI remove a desirable difficulty? If the productive struggle is what builds the skill (wrestling with a proof, constructing an argument from scratch, debugging code), AI may eliminate the learning mechanism even while producing a correct product.

4. Can all students access and use the required AI tools effectively? If not, either provide training and access, or adjust the mode to avoid creating inequity.

The output of this step is a clear mode selection (Restricted, Limited, Collaborative, or Required) with a brief rationale explaining why this mode best serves the learning goal. Share this rationale with students; transparency about the "why" behind AI decisions builds trust and buy-in.
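The question sequence above can be read as a rough decision procedure. Here is one way to sketch it, with loud caveats: the field names, the order of precedence, and the yes/no encoding are my assumptions for illustration, and Question 2 (which cognitive acts must remain human) is qualitative and deliberately left out. The function only suggests candidate modes; the final choice and its rationale remain a faculty judgment.

```python
from dataclasses import dataclass


@dataclass
class OpennessAnswers:
    ai_use_is_learning_target: bool      # Q1: skilled orchestration of AI is itself the goal
    ai_can_do_target_thinking: bool      # Q1: AI could perform the exact cognitive work being taught
    removes_desirable_difficulty: bool   # Q3: AI would eliminate the productive struggle
    all_students_have_access: bool       # Q4: every student can access and use the tools


def suggest_modes(a):
    """Return (candidate_modes, notes). The precedence order is an assumption."""
    if a.ai_use_is_learning_target:
        candidates = ["Required"]
    elif a.ai_can_do_target_thinking or a.removes_desirable_difficulty:
        candidates = ["Restricted", "Limited"]
    else:
        candidates = ["Collaborative"]

    notes = []
    if not a.all_students_have_access:
        notes.append("Provide training and access, or adjust the mode.")
    return candidates, notes


# Hypothetical example: an essay whose target skill is original argument construction.
answers = OpennessAnswers(
    ai_use_is_learning_target=False,
    ai_can_do_target_thinking=True,
    removes_desirable_difficulty=True,
    all_students_have_access=True,
)
print(suggest_modes(answers))  # (['Restricted', 'Limited'], [])
```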

Step 3: A. Architecture (Design for Visible Thinking)

This is the structural heart of the framework. Architecture determines how the assignment is sequenced, checkpointed, and scaffolded so that student thinking becomes visible and assessable. ARAD 2.0 introduces a two-phase architecture model.

Phase A: Formative Architecture (Learning Phase) emphasizes low-stakes practice, feedback, and skill building. AI use tends to be more open here because students are still developing capability and have yet to reach the mastery-demonstration stage. Components include: a proposal or planning document (shows initial thinking before AI interaction), structured AI interaction (with prompt logs and decision rationale), peer review of AI interaction logs, instructor feedback checkpoints, and metacognitive reflection logs.

Phase B: Summative Architecture (Demonstration Phase) requires students to demonstrate what they learned. AI use may be more restricted here, or if AI is still integrated, the verification and judgment requirements are higher. Components include: the final deliverable with process documentation, a verification protocol, a human performance checkpoint (oral defense, live explanation, in-class application), and a transfer task (applying the learning to a new, unfamiliar context without AI support).

Four Reusable Architectural Patterns

ARAD 2.0 provides four architectural patterns faculty can adapt to almost any discipline:

Pattern 1: The AI Sandwich

Human brainstorm → AI expansion → Human critique and refinement

Students begin with their own thinking, use AI to challenge or expand it, then exercise evaluative judgment to produce a final product that demonstrates their learning.

Pattern 2: The Verification Challenge

AI generates output → Student identifies errors/gaps → Student corrects using course materials → Student explains corrections

Students receive or generate AI output and must systematically evaluate it against disciplinary standards, correct it, and justify their corrections.

Pattern 3: The Comparative Analysis

AI generates multiple responses → Student evaluates using a framework → Student synthesizes a superior response → Student defends choices

Students compare multiple AI outputs, apply course frameworks to evaluate them, and produce original work that goes beyond any single AI response.

Pattern 4: The Generative Challenge

Student-collected primary data, post-cutoff events, or lived experience → AI produces obviously generic or inaccurate output → Student produces original work or corrects AI's failure mode

This pattern builds tasks on primary data the student collected (interviews, field notes, lab observations from that week's session), events after the AI's knowledge cutoff (recent local news, a just-released document, this morning's class discussion), or lived and local experience (a community the student knows firsthand). AI cannot fake these inputs, and its output on these tasks exposes its limitations, producing a natural verification challenge and forcing original student work. The pattern pairs best with AI-Limited or AI-Restricted modes, and also works well as the stimulus for a Transfer Gap Check.

Step 4: L. Look for Evidence (Align the Rubric)

The rubric must assess what you actually care about: the quality of student thinking. ARAD 2.0 provides eight rubric dimensions, each with four performance levels (Exemplary, Proficient, Developing, Beginning). You do not need to use all eight for every assignment; select the three or four most relevant to your learning goal. The dimensions are designed to be gaming-resistant: each Exemplary descriptor requires evidence that is difficult to fabricate with AI alone.

Dimension 1: Cognitive Agency. Does the student demonstrate ownership of the intellectual work? At the Exemplary level, the student clearly drives all key decisions with AI subordinate to their judgment. At the Beginning level, the student appears to have accepted AI output with minimal engagement.

Dimension 2: Evaluative Judgment. Does the student critically assess AI output using disciplinary standards? At the Exemplary level, the student identifies specific strengths and weaknesses and corrects at least one AI claim by citing a specific course reading by page or section number, demonstrating sophisticated disciplinary reasoning. At the Beginning level, the student accepts AI output uncritically or offers only generic corrections. The page- or section-specific citation requirement is an anti-gaming safeguard: it forces the student into a specific piece of text, the kind of grounding a paraphrased AI summary cannot match.

Dimension 3: Process Evidence. Is the student's thinking process visible and documented? At the Exemplary level, documentation includes at least one observed-setting artifact (a live annotation, in-class concept map, recorded screen share of the AI session, or timed draft produced under observation) alongside AI transcripts and written work. At the Beginning level, only the final product exists.

Dimension 4: Metacognitive Depth. Does the student demonstrate awareness of their own thinking? This is distinct from process evidence: process evidence captures what the student did; metacognitive depth captures why they did it, what changed in their understanding, and what they still do not know. At the Exemplary level, the student identifies specific learning moments, articulates specific residual uncertainty, and explains prompt choices with evidence. At the Beginning level, reflections are generic, post-hoc, and formulaic.

Dimension 5: Prompt Quality. How well does the student direct the AI? Prompt engineering is graded as a learning dimension in its own right. At the Exemplary level, prompts are specific, iterative, and show deliberate strategy (role assignment, context constraints, counter-prompting); prompts evolve in response to AI output. At the Beginning level, a single vague prompt yields output accepted wholesale. Prompt quality is hard to fake because good prompts show the student understood the task before prompting.

Dimension 6: Verification Quality. Did the student substantively test, challenge, and improve AI output? At the Exemplary level, there is systematic verification against multiple sources that goes beyond surface fact-checking to evaluate logic and framing. At the Beginning level, AI output was accepted as-is.

Dimension 7: Disciplinary Judgment. Does the student apply course knowledge and disciplinary standards? At the Exemplary level, the student integrates course concepts throughout and demonstrates deep understanding. At the Beginning level, there is generic reasoning without disciplinary grounding.

Dimension 8: Transfer. Can the student perform the learning independently in a new context? At the Exemplary level, the student successfully applies learning to an unfamiliar context without AI support. At the Beginning level, the student cannot demonstrate the learning without AI; understanding appears tool-dependent.

The Transfer Gap Check

The Transfer dimension works in tandem with a required structural practice: every major course outcome must be assessed at least once under AI-Restricted conditions using the same rubric dimensions as the upstream AI-assisted task. The difference between assisted and independent scores is the diagnostic signal.

  • Gap of one rubric level or less: learning transferred. The student holds the capability independently.
  • Gap greater than one rubric level: learning is tool-dependent. Redesign the upstream assignment toward more restricted modes, or increase the weight of the Restricted checkpoint.

This changes the Transfer question from "Did the student pass?" to "Did the student learn?" That is what the framework is here to answer.
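The arithmetic behind the Transfer Gap Check is simple enough to sketch in a few lines. The version below rests on assumptions the framework text does not spell out: rubric levels are coded 1 to 4 (Beginning to Exemplary), the gap is computed per dimension instead of as an overall average, and a negative gap (the independent score is higher) counts as transferred. The names `Level`, `transfer_gaps`, and `interpret` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    BEGINNING = 1
    DEVELOPING = 2
    PROFICIENT = 3
    EXEMPLARY = 4


def transfer_gaps(assisted, independent):
    """Gap per rubric dimension: AI-assisted level minus AI-Restricted level."""
    shared = sorted(assisted.keys() & independent.keys())
    return {dim: int(assisted[dim]) - int(independent[dim]) for dim in shared}


def interpret(gap):
    # Rule from the text: one level or less means the learning transferred;
    # more than one level means the learning is tool-dependent.
    return "transferred" if gap <= 1 else "tool-dependent"


# Hypothetical scores for one student on the same two rubric dimensions.
assisted = {
    "Evaluative Judgment": Level.EXEMPLARY,
    "Disciplinary Judgment": Level.PROFICIENT,
}
independent = {
    "Evaluative Judgment": Level.DEVELOPING,
    "Disciplinary Judgment": Level.PROFICIENT,
}

for dim, gap in transfer_gaps(assisted, independent).items():
    print(f"{dim}: gap {gap} -> {interpret(gap)}")
# Disciplinary Judgment: gap 0 -> transferred
# Evaluative Judgment: gap 2 -> tool-dependent
```

In this made-up case, the student's disciplinary judgment holds up without AI, while the two-level drop in evaluative judgment is the signal to revisit the upstream assignment.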

Step 5: S. Sustain (Scaffold AI Literacy, Pilot, and Maintain Across Time)

ARAD 2.0 expands the original "Study" step into three linked functions: scaffolding AI literacy before the assignment, studying results after it, and sustaining the design across cohorts as AI capabilities evolve. "Sustain" captures all three; the original "Study" label made the scaffolding work too easy to skip.

Scaffold: AI Literacy Prerequisites

Before students can succeed in AI-Limited, AI-Collaborative, or AI-Required assignments, they need foundational AI literacy. Do not assume students know how to use AI effectively, ethically, or critically. Pre-assignment scaffolding should include a brief orientation on what generative AI can and cannot do, prompt engineering basics, verification techniques, bias awareness, privacy and data considerations, and a clear explanation of the assignment's AI mode with the pedagogical reasoning behind it.

This can be delivered as a 15-minute in-class orientation, a short self-paced module, a practice assignment with AI before the graded task, or guided examples showing strong vs. weak AI interaction.

Study: Pilot, Gather Data, Revise

After the assignment runs, gather evidence about whether the design achieved its purpose. Collect student performance data (rubric score distributions), review process artifacts (prompt logs, decision rationales), gather targeted student feedback ("What did the AI interaction specifically teach you about the course material?"), look for metacognitive indicators, and run an equity check to see if any student group struggled with access or tool availability.

Then ask the hard revision questions: Did the task elicit the target thinking? Did students over-rely on AI despite the design safeguards? Which rubric dimensions actually captured learning? Were the AI literacy scaffolds sufficient? What surprised you about how students used AI?

Sustain: Versioning Assignments Against AI Drift

AI capabilities change faster than curriculum cycles. An assignment that was well-calibrated in one academic year may be gameable by the next, as new models, agents, and tools cross learning-goal boundaries. Build sustainment into your course operations:

  • Annual review: before each new offering, re-run the assignment through current top-tier AI models (see the Faculty Prerequisites section below). If the AI's output now clears a rubric dimension that used to require human effort, that dimension needs revision or the mode needs to tighten.
  • Capability watchlist: maintain a short list of AI capabilities that would force a mode change for each assignment. For example: "if a model can reliably audit its own clinical output against a named standard, the AI-Required nursing case study moves to Collaborative plus a new Restricted transfer task."
  • Cohort-level comparison: when rubric score distributions shift sharply from one cohort to the next without a corresponding change in student preparation, suspect a tool-capability shift before concluding it is a cohort effect.

Sustainment is mandatory. An ARAD 2.0 assignment that is never revisited will eventually drift back into an ARAD 1.0 assignment.

Faculty Prerequisites: Running the Assignment Through AI Yourself

ARAD 2.0 adds a prerequisite that precedes the assignment-design work: faculty must test-drive the assignment with the same AI tools students will use. This is a cheap, decisive diagnostic that catches weak assignment design before students ever see it.

The three-step faculty protocol:

  1. Prompt. Feed the assignment prompt into two or three current top-tier AI models, using the realistic middle: the kinds of prompts a motivated student would actually write, somewhere between a naive one-liner and a carefully engineered researcher prompt.
  2. Score. Grade the AI's output against your own rubric as if it were a student submission. Do not soften scores because "it's only the AI." If the AI output earns an 87, record the 87.
  3. Interpret. If the AI scores Proficient or Exemplary on dimensions that are supposed to require human capability, the rubric is measuring the wrong thing. Redesign the rubric, the architecture, or the mode.
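The Interpret step can be sketched as a small check. What follows is a minimal illustration in Python, not a prescribed ARAD 2.0 tool. It assumes a four-level rubric scale in which only "Proficient" and "Exemplary" come from the text; the two lower level names, the dimension names in the example, the `human_only` set, and the function name `flag_dimensions` are all hypothetical.

```python
# Hypothetical rubric scale: only "Proficient" and "Exemplary" are named in the
# framework text; the two lower level names are assumptions for illustration.
LEVELS = ["Beginning", "Developing", "Proficient", "Exemplary"]


def flag_dimensions(ai_scores, human_only):
    """Return the rubric dimensions where the AI's output reached Proficient
    or higher even though the dimension is meant to require human capability."""
    threshold = LEVELS.index("Proficient")
    return sorted(
        dim
        for dim, level in ai_scores.items()
        if dim in human_only and LEVELS.index(level) >= threshold
    )


# Illustrative scores an instructor might record for one model's output.
ai_scores = {
    "Evaluative Judgment": "Proficient",
    "Process Evidence": "Developing",
    "Writing Mechanics": "Exemplary",
}
# Dimensions the instructor believes should require human capability (assumed).
human_only = {"Evaluative Judgment", "Process Evidence"}

flagged = flag_dimensions(ai_scores, human_only)
print(flagged)  # dimensions that need a rubric, architecture, or mode redesign
```

In this made-up example only Evaluative Judgment is flagged: the AI scored high on Writing Mechanics too, but that dimension was never claimed to require human capability.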

The faculty-AI artifacts are valuable teaching material. A graded AI response, with the instructor's annotations about what the AI got right, wrong, or missed, becomes high-quality scaffolding for a Verification Challenge or a Generative Challenge pattern. Students who see an instructor-annotated AI response get a concrete model of evaluative judgment, which is a hard thing to model abstractly.

Re-run the protocol every academic year as part of Step 5 (Sustain). AI capabilities that did not exist last year may have crossed a rubric boundary by this year.

ARAD 2.0 in Practice: Five Redesign Examples

Theory is necessary but insufficient. Here are five fully worked examples, each illustrating a different mode or architectural pattern, showing how ARAD 2.0 transforms traditional assignments that AI has rendered vulnerable.

Example 1: Composition / Essay Writing (AI-Collaborative, AI Sandwich Pattern)

Traditional assignment: "Write a 1,500-word essay analyzing the effectiveness of current climate change policy."

Why it is vulnerable: AI can generate this in seconds. The assignment evaluates a product while leaving the student's thinking invisible.

ARAD 2.0 Redesign:

Phase A (Formative): Students write a 300-word position statement on climate policy before using AI, establishing a baseline of their own thinking. Then they use AI to generate three well-reasoned counterarguments to their position, submitting the complete transcript with prompts. Peer review follows: students exchange AI transcripts and evaluate the quality of each other's prompts and the AI's responses.

Phase B (Summative): Students write a 1,000-word critical analysis identifying which AI counterarguments are valid and which contain flaws, using evidence from course readings. They explain what the AI got right, what it got wrong, and why. They then write a 300-word metacognitive reflection on what engaging with counterarguments changed about their thinking. Finally, the transfer checkpoint: a 15-minute in-class timed writing task responding to a new policy question using the analytical skills practiced in the assignment.

What you assess: Evaluative judgment, original argumentation, use of course evidence, metacognitive awareness, and transfer.

Example 2: Biology Lab Report (AI-Limited, Verification Challenge Pattern)

Traditional assignment: "Write a standard lab report with Introduction, Methods, Results, Discussion."

Why it is vulnerable: AI can write formulaic lab reports. A generic Methods section does not prove a student understands scientific precision.

ARAD 2.0 Redesign:

Phase A (Formative): Students conduct the lab and take detailed procedural notes by hand during the session (AI-Restricted for this step). Then they use AI to draft a Methods section based on their notes (AI-Limited: AI handles writing mechanics only).

Phase B (Summative): Students annotate the AI draft, marking at least five places where the AI's description is too vague, inaccurate, or non-replicable. For each annotation, they explain why the detail matters for replication and provide the correct information from their actual procedure. They revise the Methods section incorporating their corrections, submit a brief AI use disclosure, and in the next lab session, write a Methods section from scratch without AI support as the transfer checkpoint.

What you assess: Scientific methodology, precision of procedural description, ability to distinguish between generic and replicable writing, and independent performance.

Example 3: Business / Marketing (AI-Collaborative, Comparative Analysis Pattern)

Traditional assignment: "Create a comprehensive marketing plan for a product of your choice."

Why it is vulnerable: AI can generate professional-looking marketing plans. You cannot determine whether students understand strategic decision-making.

ARAD 2.0 Redesign:

Phase A (Formative): Students select a product and write a brief strategic analysis identifying key market challenges (pre-AI baseline). They then generate three different AI marketing strategies using different prompts, submitting all prompts and outputs. Peer review follows: students exchange strategies and provide initial evaluations using Porter's Five Forces.

Phase B (Summative): Students create a comparative matrix evaluating all three AI strategies against Porter's Five Forces. They write a 1,500-word executive memo defending which strategy they would implement and why, using market data to argue against the rejected options. The transfer checkpoint: a 10-minute oral defense where students present their recommendation and respond to questions about their strategic reasoning.

What you assess: Strategic analysis, application of analytical frameworks, evidence-based argumentation, business judgment, and ability to defend decisions under questioning.

Example 4: History (AI-Collaborative, Verification Challenge Pattern)

Traditional assignment: "Write a 10-page research paper analyzing the causes and consequences of the American Civil War."

Why it is vulnerable: AI can generate historically accurate, well-structured papers. The assignment measures the final artifact while historical thinking itself remains invisible.

ARAD 2.0 Redesign:

Phase A (Formative): Students use AI to generate a detailed timeline of key Civil War events with causal explanations, submitting the full transcript. They then identify three events or causal relationships the AI overlooked, misinterpreted, or oversimplified, using primary sources to support their critique.

Phase B (Summative): Students write a 2,000-word analytical essay arguing why the AI's omissions or misinterpretations matter for understanding Civil War causation. They must demonstrate historiographic thinking: how do different framings of evidence lead to different historical conclusions? They write a methodological reflection on what this exercise reveals about how AI handles historical complexity, causation, and contested interpretation. The transfer checkpoint: an in-class document analysis applying the same critical skills to new primary sources without AI support.

What you assess: Historical thinking, source evaluation, understanding of causation and complexity, ability to identify gaps in narratives, and independent analytical performance.

Example 5: Nursing Clinical Documentation (AI-Required, Generative Challenge Pattern)

Traditional assignment: "Write SOAP notes for three simulated patient encounters."

Why it is vulnerable in an AI-integrated clinical environment: Clinical AI scribes and charting copilots are already deployed in real nursing practice. A graduate who cannot work with them safely is unprepared for clinical reality. At the same time, clinical AI tools routinely generate plausible-but-inaccurate text, miss HIPAA-sensitive details, or introduce liability-grade omissions. The competency is skilled use plus verification. Unassisted SOAP-note writing, by itself, no longer reflects the post-2024 workflow.

ARAD 2.0 Redesign (AI-Required mode, Generative Challenge pattern):

Phase A (Formative): Students conduct three simulated patient encounters in the clinical sim lab, or use primary-source recorded encounters from that week's lab session. This is the Generative Challenge input: AI cannot fake it because the encounter data is local and primary. Students then generate draft SOAP notes using an institutionally approved clinical AI tool, submitting all prompts and the full AI outputs.

Phase B (Summative): Students annotate each AI-generated note, identifying every instance of (a) clinically inaccurate phrasing, (b) omissions (missing allergy checks, missing vitals, missing pain re-assessment), (c) charting-standard violations, and (d) liability exposures. For each issue, they cite the relevant standard, policy, or course reading by page or section number (this implements Dimension 2's cited-correction requirement). They produce a final corrected chart and a 500-word reflection on AI workflow risks, mitigations, and the decisions they made as the human in the loop.

Transfer Gap Check (AI-Restricted): In the next lab session, students write one SOAP note from a new simulated encounter without AI support, graded against the same rubric dimensions. The gap between assisted and independent scores tells the instructor whether the student is learning clinical judgment or fronting AI output they cannot defend.

What you assess: Prompt Quality (clinical prompting discipline and data-privacy awareness), Verification Quality, Evaluative Judgment (cited corrections against standards), Disciplinary Judgment, and Transfer.

The AI Literacy Primer: Preparing Students Before the Assignment

ARAD 2.0 recognizes that AI-Limited, AI-Collaborative, and AI-Required assignments require students to possess baseline AI literacy. You cannot assess evaluative judgment if students do not know how to interact with AI effectively. Here is what a recommended 30-45 minute pre-assignment module looks like:

Section 1: How Generative AI Works and Where Its Limits Lie (10 minutes). Cover how large language models work at a basic level (statistical pattern prediction over training data), what AI does well (fluent text generation, summarization, pattern recognition, brainstorming), what it does poorly (factual accuracy, nuanced reasoning, source verification, original insight), and why AI output sounds confident even when it is wrong.

Section 2: Effective Prompting (10 minutes). Cover how input quality affects output quality, strategies for better prompts (specificity, context, constraints, role assignment), why iterative prompting produces better results than single queries, and include a practice exercise comparing outputs from a vague prompt vs. a well-crafted one.

Section 3: Critical Evaluation of AI Output (10 minutes). Cover how to fact-check AI claims against reliable sources, how to identify logical gaps, unsupported assertions, and oversimplifications, how to recognize fabricated citations or statistics, and include a practice exercise evaluating an AI-generated paragraph for a discipline-specific topic.

Section 4: Ethics, Bias, and Privacy (5-10 minutes). Cover how AI can reflect and amplify biases present in training data, what information students should avoid sharing with AI tools, academic integrity expectations for the assignment, and the difference between using AI as a tool and passing off AI work as your own thinking.

Addressing Faculty Concerns

In every workshop, seminar, and consultation where I have presented this framework, the same concerns come up. Here are honest answers.

"This feels like giving up."

This is a shift from evaluating products (which AI can generate) to evaluating thinking (which AI cannot fake). The bar goes up. When students must demonstrate evaluative judgment, verification skill, and transfer, the assignment is harder and more valuable than the traditional version.

"I don't know enough about AI."

You do not need to be an AI expert. The ARAD framework gives you a structured process. Start with one assignment, try AI on it yourself first, and learn alongside your students. Many of the strongest AI-integrated assignments come from faculty who were transparent about exploring the tools together with their classes.

"Students will cheat anyway."

When the process is more valuable than the product, cheating becomes much harder and much less useful. If the assignment requires documented thinking, structured verification, metacognitive reflection, and a live performance checkpoint, there is little left to "cheat on": the learning is in the process, and the process is visible.

"This takes too much time."

The initial redesign requires investment. But it is less time than constantly policing AI use, investigating integrity cases, redesigning assignments that no longer work, or grading products that may not reflect student learning. The ARAD process is also reusable: once you redesign one assignment, the architectural patterns transfer to others.

"What about students who don't have access to AI tools?"

This is exactly why Principle 6 (Equity and Access) exists. For AI-Collaborative and AI-Required assignments, provide institutional access to approved tools, offer in-class time with the tools, and ensure the AI literacy primer reaches all students. For students who cannot access tools outside class, design the AI interaction to happen during class time or provide alternative pathways.

"Should I use AI detection tools?"

No. ARAD 2.0 explicitly recommends against building grading or misconduct workflows around AI detection. Detectors are unreliable, and they have been shown to be biased against non-native English writers (Liang et al., 2023). Instead, design assignments where the process evidence, verification tasks, and human performance checkpoints make detection unnecessary. If a student's oral defense reveals they cannot explain their written work, that is a pedagogical signal about where the teaching and assessment need to change.

"What if a student objects to using AI on equity, religious, or ethical grounds?"

Provide an alternative pathway. For AI-Collaborative and AI-Required assignments, any student may opt out on equity, conscience, privacy, or access grounds, and the instructor must offer a parallel task that assesses the same learning outcomes through a different architecture, typically an AI-Limited or AI-Restricted analogue with equivalent cognitive demand. This is a design obligation built into the assignment: same grade ceiling, same workload, same documentation. See the "Extending the Framework" section below for concrete opt-out design guidance.

"How do I handle group projects where some members may rely on AI more heavily?"

Design individual accountability into the team structure. Require individual prompt logs even within shared projects, assign each team member a different section to verify against course standards (instructor-assigned, so students cannot cluster around the easiest section), and include an individual oral component where each member is questioned on both their own section and the integrated whole. See the "Extending the Framework" section below for the full group-work architecture.

Implementation Guide: How to Start

Start Small

Do not try to redesign every assignment at once. Pick one assignment where students currently struggle or seem disengaged, where you suspect AI use is already happening, where AI could genuinely enhance the learning if used well, or where the learning goal is clear enough to anchor a redesign.

Work Through GOALS Systematically

For your chosen assignment:

  1. Goal: Write a one-sentence statement of the human learning target.
  2. Openness: Choose a mode and write a brief rationale.
  3. Architecture: Sketch the assignment sequence using one of the four patterns (AI Sandwich, Verification Challenge, Comparative Analysis, or Generative Challenge). Identify formative and summative phases.
  4. Look for Evidence: Select 3-4 of the eight rubric dimensions most relevant to your assignment. Write performance descriptors for at least two levels. Include the Transfer Gap Check if this assignment maps to a major course outcome.
  5. Sustain: Plan a brief AI literacy orientation for students, run the faculty AI prerequisite (test-drive the assignment through AI yourself), and after the assignment runs, collect at least one form of data (rubric score distribution, student feedback, or process artifact review). Flag the assignment for annual re-review.

Build Across a Course

Once you have redesigned one assignment, consider the full course. Use a mix of all four modes across the semester where disciplines allow. Ensure at least one AI-Restricted Transfer Gap Check for each major learning outcome. Build AI literacy cumulatively: early assignments scaffold skills that later assignments require. Increase the complexity of evaluative judgment tasks as the course progresses.

Share and Collaborate

Discuss your redesign with colleagues. Share rubrics and architectural patterns across your department. Contribute to your institution's growing library of AI-responsive assignments. Report what worked and what did not; the field needs practitioner evidence.

Manage the Workload: Realistic Grading Strategies

The four-phase AI-Collaborative and AI-Required designs (baseline, AI interaction, verification, transfer) produce more artifacts per student than a traditional single-submission assignment. Without explicit workload mitigation, faculty will read this framework and refuse it on time grounds alone. Four practical strategies make the framework sustainable at scale:

  • Sample-based review of formative artifacts. Grade full process artifacts (prompt logs, drafts, annotation sets) on roughly 30 percent of submissions per cycle, rotating by student so every student has at least one fully reviewed submission per term. For the remainder, use completion credit against a short checklist. Summative deliverables and Transfer Gap Checks are always fully graded.
  • Peer assessment with calibration. After the instructor grades two exemplar submissions in class with students, students grade peers using the same rubric. Peer scores count for formative feedback; the instructor spot-checks and owns the summative grade. Calibration is the key: without it, peer scores drift.
  • AI-assisted rubric scoring with human audit. Faculty may use AI to generate first-pass rubric scores against clear performance-level descriptors, then audit a sample (at minimum every borderline case and every grade two or more levels above or below the cohort mean). Faculty remain accountable for final grades. Disclose this practice to students.
  • Gradable vs. completion-credit guidance. Put formative components (prompt logs, initial drafts, peer reviews, baseline thinking artifacts) on completion credit, and put summative components (final deliverable, Transfer Gap Check, oral defense) on full rubric assessment. This keeps the heavy grading concentrated where it carries the most diagnostic weight.
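The sample-based review strategy above can be planned with a simple rotation. The sketch below is one possible way to do it, not the framework's required method: it assumes a fixed class roster, a default of 30 percent per cycle, and a hypothetical helper named `review_schedule`. Every student is fully reviewed at least once only if the number of cycles times the per-cycle sample size is at least the class size.

```python
import math


def review_schedule(students, cycles, percent=30):
    """Pick a rotating block of students for full process-artifact review
    in each assignment cycle; everyone else gets completion credit."""
    n = len(students)
    per_cycle = math.ceil(n * percent / 100)  # round up so small classes still sample
    schedule = []
    for cycle in range(cycles):
        start = (cycle * per_cycle) % n  # advance the window each cycle
        schedule.append([students[(start + i) % n] for i in range(per_cycle)])
    return schedule


# Illustrative roster of ten students across four assignment cycles.
students = [f"student_{i:02d}" for i in range(10)]
schedule = review_schedule(students, cycles=4)

# Confirm the rotation reaches every student at least once during the term.
covered = {s for batch in schedule for s in batch}
print(len(schedule[0]), covered == set(students))
```

A deterministic rotation is easy to publish in advance; an instructor who prefers unpredictability could shuffle the roster once at the start of term before building the schedule.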

Extending the Framework: Group Work, Opt-Out, and Online Contexts

Three common contexts require explicit guidance that the core framework does not fully address: team-based assignments, students who opt out of AI use, and fully online or asynchronous delivery.

Group and Team Assignments

Collaborative AI use raises attribution problems the single-student framework does not have to solve: how do you assess evaluative judgment when the judgment is shared? How do you know which team member contributed which AI interaction? The answer is to design individual accountability into the team architecture:

  • Individual prompt logs within the shared project. Every team member maintains their own prompt log tied to their assigned section. No shared prompt documents.
  • Assigned verification responsibilities. The instructor assigns (does not let students self-select) which team member verifies which section of AI output. This prevents clustering around the easiest section and forces every team member to exercise evaluative judgment.
  • Individual oral component. Every team project includes a short individual oral where each member answers questions on their own section and on the integrated whole. This reveals both depth and integration, and it is where attribution becomes observable.
  • Shared-decision log. The team documents the decisions they made together about AI use (what they accepted, rejected, and modified) as a team artifact. This is graded at the team level; the individual work remains graded individually.

Student Opt-Out Pathways for AI-Collaborative and AI-Required Assignments

Principle 6 (Equity and Access) protects tool access and literacy. It does not automatically protect the student whose objection to AI is ethical, religious, privacy-based, or conscience-based. For AI-Collaborative and AI-Required assignments, any student may opt out on any of these grounds, and the instructor is obligated to provide an alternative pathway that assesses the same learning outcomes through a different architecture.

Practically, the alternative usually means moving the opting-out student to an AI-Limited or AI-Restricted analogue of the same task with equivalent cognitive demand. For example, in the Example 1 composition redesign, an opting-out student produces the 1,000-word critical analysis by engaging directly with two instructor-provided written counterarguments (not AI-generated) and performs the same in-class transfer task as the rest of the cohort. In Example 5 (nursing), an opting-out student completes a parallel AI-Restricted case study that still requires the same clinical judgment and documentation standards.

Opt-out is a design obligation the instructor takes on as part of assignment design: it must carry the same grade ceiling, the same workload, and the same documentation requirements as the default pathway. Document the alternative pathway in the syllabus before the term begins. Without it, AI-Collaborative and AI-Required modes create a compelled-use problem that will produce grievances, and fairly so.

Asynchronous and Online Courses

Online delivery complicates Process Evidence and the Transfer Gap Check, but it does not eliminate them. The trap is assuming online means async-only. One synchronous human-performance checkpoint per major course outcome is achievable, decisive, and required.

Practical approaches for online and asynchronous contexts:

  • Proctored timed tasks (Honorlock, Respondus Monitor, Examity) for the Transfer Gap Check. They are imperfect, but they are appropriate where the cognitive act being measured is authored performance under observation.
  • Live Zoom oral defenses. Scheduled, recorded, webcam-on. A ten-minute oral defense per student scales further than faculty often assume, especially when defenses are concentrated in a one-week window.
  • Video-recorded concept maps with narrated reasoning. Students record their screen while building a concept map or diagram and narrate their thinking in real time. The narration is the hard-to-fake part, because a student who did not do the thinking cannot narrate it fluently.
  • Discipline-specific hard-to-fake artifacts: a nursing skills check via recorded sim lab, a music performance recording, a foreign-language oral, a live code walkthrough with screen share, a lab technique demonstration captured on video. The common principle: the artifact requires the student's live body, voice, or real-time performance.

Diagnostics and Mode-Switching Signals

ARAD 2.0 requires ongoing calibration. Three diagnostic signals tell you when an assignment needs to move to a more restricted mode, when the rubric is being gamed, or when the architecture needs to tighten.

Signal 1: Near-Identical AI-Shaped Output

When submissions cluster around the same phrasings, structural patterns, or argumentation shapes that match current AI house styles, the Process Evidence scaffolding has stopped doing its job. Students are riding on top of the AI, letting it carry the work. Response: tighten Dimension 3 by requiring an observed-setting artifact (live annotation, in-class concept map, recorded screen share); consider moving the core task to AI-Limited; and audit the Prompt Quality dimension to confirm students are actively directing the AI through deliberate, evolving prompts.

Signal 2: Transfer Gap Greater Than One Rubric Level

When a cohort's AI-Restricted Transfer Gap Check scores fall more than one full rubric level below their assisted-task scores, the learning is tool-dependent. The student has picked up tool-operation for this task while the underlying capability remains undeveloped. Response: redesign the upstream AI-Collaborative or AI-Required task to demand more cognitive agency (more baseline thinking before AI contact, more contested verification, more constrained AI access), or increase the weight of the Restricted checkpoint until the assisted-task score can no longer carry a weak independent performance.
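The arithmetic behind Signal 2 is simple enough to sketch. The example below assumes rubric levels are recorded as numbers on a 1-4 scale and compares cohort means; the scale, the function names, and the sample scores are assumptions for illustration, and an instructor could just as reasonably compute the gap per student.

```python
from statistics import mean


def transfer_gap(assisted, restricted):
    """Cohort mean on the assisted task minus cohort mean on the
    AI-Restricted Transfer Gap Check, expressed in rubric levels."""
    return mean(assisted) - mean(restricted)


def is_tool_dependent(assisted, restricted, threshold=1.0):
    """Signal 2 fires when the gap exceeds one full rubric level."""
    return transfer_gap(assisted, restricted) > threshold


# Illustrative scores on an assumed 1-4 rubric scale.
assisted_scores = [4, 4, 3, 4]
restricted_scores = [2, 3, 2, 2]

gap = transfer_gap(assisted_scores, restricted_scores)
print(gap, is_tool_dependent(assisted_scores, restricted_scores))
```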

Signal 3: Top-Clustered Scores on the Assisted Rubric

When the class distribution clusters tightly at the high end of the assisted rubric but spreads out on the Restricted checkpoint, the assisted rubric is measuring AI output polish. The assisted rubric is being gamed, even if nothing dishonest is happening. Response: audit Dimensions 1, 2, and 5 (Cognitive Agency, Evaluative Judgment, Prompt Quality) against their anti-gaming requirements. Are students producing observed-setting artifacts? Are corrections cited by page or section number? Is prompt quality being scored as a full dimension with its own behavioral descriptors? If any of these are weak, the rubric is catching polish alone.
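Signal 3 compares the shape of two score distributions, which can be roughed out with a mean and a standard deviation. The sketch below is only a heuristic under stated assumptions: scores on a 1-4 scale, and two illustrative cutoffs (`high` and `tight`) that are mine, not thresholds defined by ARAD 2.0.

```python
from statistics import mean, pstdev


def polish_signal(assisted, restricted, high=3.5, tight=0.5):
    """Return True when assisted scores cluster tightly at the top of the
    scale while Restricted-checkpoint scores are more spread out.
    The `high` and `tight` cutoffs are illustrative assumptions."""
    clustered_high = mean(assisted) >= high and pstdev(assisted) <= tight
    spreads_out = pstdev(restricted) > pstdev(assisted)
    return clustered_high and spreads_out


# Illustrative cohort on an assumed 1-4 rubric scale.
assisted_scores = [4, 4, 4, 3, 4, 4]
restricted_scores = [4, 2, 3, 1, 3, 2]

signal = polish_signal(assisted_scores, restricted_scores)
print(signal)
```

A True result is a prompt to audit the rubric as described above, not evidence of misconduct.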

The Programmatic Layer: ARAD 2.0 Across a Major

ARAD 2.0 was designed for individual assignments, but department chairs, program directors, and curriculum committees face a different question: across thirty to forty courses in a major, is there enough cumulative AI-Restricted evidence to certify that graduates can perform the core outcomes independently? No single instructor can answer that. It is a programmatic question.

The programmatic layer takes the Transfer Gap Check out of the individual assignment and distributes it across the curriculum. For each major learning outcome in the program, the department maps three things:

  • Which courses currently include an AI-Restricted Transfer Gap Check for this outcome, and at which point in the sequence (introductory, developing, mastery).
  • Which rubric dimensions are assessed at each level, so that the rubric grows in depth as students progress through the major, adding new sophistication at each stage.
  • The cumulative AI-literacy trajectory across years: what students can do with AI at the end of year one should be visibly different from what they can do at the end of year four, and the curriculum should be intentional about that growth.

The programmatic target: every graduate of the program has at least two independent Transfer Gap Check performances per major learning outcome, sequenced across levels, and scored by different instructors. Two checks, because a single weak check may only reflect a bad day, while two reveal tool dependence.
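A department AI liaison could audit a curriculum map against this target with a few lines of code. The sketch below is a minimal illustration: it assumes each Transfer Gap Check is recorded as an (outcome, level, instructor) tuple, and it interprets "sequenced across levels" as "at least two distinct levels." The data layout, the outcome names, and the function name are hypothetical.

```python
from collections import defaultdict


def outcomes_missing_target(checks, outcomes):
    """Return outcomes that lack two Transfer Gap Checks at different
    levels scored by different instructors.

    checks: iterable of (outcome, level, instructor) tuples.
    """
    by_outcome = defaultdict(list)
    for outcome, level, instructor in checks:
        by_outcome[outcome].append((level, instructor))

    missing = []
    for outcome in outcomes:
        entries = by_outcome[outcome]
        levels = {level for level, _ in entries}
        instructors = {instructor for _, instructor in entries}
        # Two or more checks, spanning 2+ levels and 2+ instructors.
        if len(entries) < 2 or len(levels) < 2 or len(instructors) < 2:
            missing.append(outcome)
    return missing


# Illustrative curriculum map for two hypothetical program outcomes.
checks = [
    ("Clinical documentation", "introductory", "Instructor A"),
    ("Clinical documentation", "mastery", "Instructor B"),
    ("Patient communication", "developing", "Instructor A"),
]
outcomes = ["Clinical documentation", "Patient communication"]

gaps = outcomes_missing_target(checks, outcomes)
print(gaps)  # outcomes that need another Restricted checkpoint
```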

Governance. Annual program review should include two questions folded into existing assessment cycles (no separate cycle required): "Where did our Transfer Gap Checks land this year?" and "Which outcomes show a gap greater than one rubric level cohort-wide?" A department AI liaison coordinates the mapping and flags outcomes that need curricular attention.

Accreditors will eventually ask these questions. Programs that can answer them will not have to retrofit; programs that cannot will.

The Road Ahead

ARAD 2.0 is designed to be durable. AI tools will continue to evolve, but the framework's core logic does not depend on any specific tool's capabilities. It depends on a set of principles about human learning that remain stable:

  • Students learn by thinking through problems themselves.
  • Assessment must measure the thinking it claims to measure.
  • Productive struggle builds durable understanding.
  • Judgment, verification, and transfer are human capacities worth developing.
  • Fairness requires intentional design.

The goal of ARAD 2.0 is to ensure that when AI is present, human learning is still happening: visibly, substantively, and equitably. We are designing for a world where students learn to think with AI while retaining the human capacities that matter most, including judgment, creativity, ethical reasoning, metacognitive awareness, and the ability to know what they know and what they do not.

Quick-Reference Decision Flowchart

Start: What is the human learning target?

Question 1: Can students achieve the target while using AI?

  • No → Mode 1: AI-RESTRICTED. The cognitive act IS the learning. Use oral, timed, live, or observed performance. Ensure accessibility, opt-out equivalence, and fairness.
  • Yes → Continue to Question 2.

Question 2: Does AI replace the core thinking the assignment is designed to teach, or support it?

  • Replaces it, but AI can handle peripheral tasks → Mode 2: AI-LIMITED. Define clear boundaries on permitted uses (grammar, formatting, brainstorming). Assess independent thinking.
  • Supports it → Continue to Question 3.

Question 3: Is skilled use of AI itself the learning target, because the discipline or workflow demands AI fluency?

  • No → Mode 3: AI-COLLABORATIVE. Use AI Sandwich, Verification Challenge, Comparative Analysis, or Generative Challenge. Require process evidence + verification + human performance checkpoint + Transfer Gap Check.
  • Yes → Mode 4: AI-REQUIRED. Use the Verification Challenge or Generative Challenge pattern in an authentic professional or domain-specific workflow. Require scenario authenticity, cited corrections against disciplinary standards, an oral or observed defense, and a Transfer Gap Check.

Always: every major course outcome requires at least one AI-Restricted Transfer Gap Check, scored against the same rubric dimensions as its upstream assisted task.
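For readers who think in code, the three questions above reduce to a short function. This is a sketch of the flowchart's logic only; the function name and parameter names are my own shorthand, and the answers to the three questions still require the instructor's judgment.

```python
def choose_mode(achievable_with_ai, ai_replaces_core_thinking, ai_use_is_target):
    """Walk the three flowchart questions in order and return the mode name.

    achievable_with_ai:        Question 1 (can students reach the target while using AI?)
    ai_replaces_core_thinking: Question 2 (does AI replace, rather than support, the core thinking?)
    ai_use_is_target:          Question 3 (is skilled AI use itself the learning target?)
    """
    if not achievable_with_ai:
        return "AI-RESTRICTED"
    if ai_replaces_core_thinking:
        return "AI-LIMITED"
    if ai_use_is_target:
        return "AI-REQUIRED"
    return "AI-COLLABORATIVE"


# Example: AI supports the thinking, and AI fluency is not itself the target.
mode = choose_mode(
    achievable_with_ai=True,
    ai_replaces_core_thinking=False,
    ai_use_is_target=False,
)
print(mode)
```

Whatever mode the function returns, the "Always" rule still applies: each major course outcome needs at least one AI-Restricted Transfer Gap Check.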

References

Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32(3), 347-364. https://doi.org/10.1007/BF00138871

Bjork, R. A., & Bjork, E. L. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56-64). Worth Publishers. https://bjorklab.psych.ucla.edu/wp-content/uploads/sites/13/2016/04/EBjork_RBjork_2011.pdf

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102

Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the crafts of reading, writing, and mathematics. In L. B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 453-494). Lawrence Erlbaum Associates. https://doi.org/10.4324/9781315044408-14

Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. Patterns, 4(7), 100779. https://doi.org/10.1016/j.patter.2023.100779

OECD. (2023). AI and the future of skills, Volume 1: Capabilities and assessments. OECD Publishing. https://doi.org/10.1787/5ee71f34-en

Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). University of Michigan, National Center for Research to Improve Postsecondary Teaching and Learning. ERIC ED338122. https://eric.ed.gov/?id=ED338122

Scarfe, P., Watcham, K., Clarke, A., & Roesch, E. (2024). A real-world test of artificial intelligence infiltration of a university examinations system: A "Turing Test" case study. PLOS ONE, 19(6), e0305354. https://doi.org/10.1371/journal.pone.0305354

Schraw, G., & Dennison, R. S. (1994). Assessing metacognitive awareness. Contemporary Educational Psychology, 19(4), 460-475. https://doi.org/10.1006/ceps.1994.1033

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257-285. https://doi.org/10.1207/s15516709cog1202_4

Tai, J., Ajjawi, R., Boud, D., Dawson, P., & Panadero, E. (2018). Developing evaluative judgement: Enabling students to make decisions about the quality of work. Higher Education, 76(3), 467-481. https://doi.org/10.1007/s10734-017-0220-3

UNESCO. (2023). Guidance for generative AI in education and research (F. Miao & W. Holmes, Authors). UNESCO. https://doi.org/10.54675/EWZM9535

Wiggins, G., & McTighe, J. (2005). Understanding by Design (2nd ed.). Association for Supervision and Curriculum Development. https://www.ascd.org/books/understanding-by-design-expanded-2nd-edition

Winne, P. H., & Hadwin, A. F. (1998). Studying as self-regulated learning. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 277-304). Lawrence Erlbaum Associates. https://doi.org/10.4324/9781410602350-12

Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory Into Practice, 41(2), 64-70. https://doi.org/10.1207/s15430421tip4102_2

Tim Mousel

Founder, Evolve AI Institute LLC

Tim Mousel is the founder of Evolve AI Institute LLC and creator of the ARAD framework. He works with higher education institutions, K-12 districts, and state agencies to build practical, research-backed approaches to AI in education. His work bridges learning science, federal policy, and classroom practice to ensure that AI enhances human learning.