
Human-in-the-Loop Is Not a Buzzword: It’s a Teacher’s Job
When my wife and I watch The Masked Singer we sit on the couch, a large glass of water at the ready so we can take big gulps every time one of the judges says either “legendary” or “iconic” to describe a performance. We are well hydrated by hour’s end.
I propose a similar drinking game, but this one keyed to your participation in a webinar or workshop focused on AI use in the classroom. Every time the facilitator says “human in the loop” you must take a drink of water (or a liquid that contains your favorite version of CH₃CH₂OH).
The phrase “human in the loop” originally described the role of humans in 1990s-era weapon systems. Today, I’m on a quest to define what it should mean for K-12 teachers. To do that, we first need to identify how teachers are actually using large language models such as Claude, ChatGPT, Gemini, Co-Pilot, and Perplexity. And don’t be fooled by packaging – many of the commercial AI products you use are simply a skin atop an LLM.
There are plenty of surveys that purport to describe these use cases, but I find that approach troubling. Why rely on self-reporting when motivations are often unclear even to the person being surveyed? Why not ask the actual LLMs? So I did.
What the Models Reveal
Here is my methodology: I gave the same prompt to ChatGPT, Claude, Co-Pilot, Gemini, and Perplexity, asking each to use its research model to infer, based on the outputs teachers request, the dominant use cases for teachers. All but Claude complied. I compiled those use cases into one document and then asked ChatGPT to synthesize them into a typology, merging similar topics regardless of their labels. (You can view my prompts and the raw outputs here.)
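If you want to replicate the fan-out step in code rather than in a browser, here is a minimal sketch using the official openai and anthropic Python SDKs (covering two of the five models). The prompt wording and model names are illustrative placeholders, not my exact inputs.

```python
# Minimal sketch: send one research prompt to several LLMs and collect
# the raw answers for later synthesis. Assumes the openai and anthropic
# SDKs are installed and API keys are set in the environment.
from openai import OpenAI
import anthropic

PROMPT = (
    "Based on the outputs teachers request from you, infer the dominant "
    "classroom use cases for large language models among K-12 teachers."
)  # illustrative wording, not my exact prompt

def ask_chatgpt(prompt: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

if __name__ == "__main__":
    answers = {"ChatGPT": ask_chatgpt(PROMPT), "Claude": ask_claude(PROMPT)}
    # Compile the raw outputs into one document for the synthesis step.
    with open("raw_outputs.md", "w") as f:
        for name, text in answers.items():
            f.write(f"## {name}\n\n{text}\n\n")
```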
This is the typology of teacher use cases that emerged (including which LLM(s) identified the use case):
| Use Case | LLM(s) Identifying It |
| --- | --- |
| Curriculum Development & Lesson Planning | ChatGPT, Gemini, Co-Pilot, Perplexity |
| Assessment & Grading | ChatGPT, Gemini, Perplexity |
| Personalized Tutoring & Student Support | ChatGPT, Gemini, Co-Pilot, Perplexity |
| Research & Information Gathering | ChatGPT, Perplexity |
| Image Generation & Visual Aids | ChatGPT, Perplexity |
| Administrative Support & Communication | ChatGPT, Gemini, Co-Pilot |
| Data Analysis & Student Insights | ChatGPT, Gemini, Perplexity |
| Professional Development & Coaching | ChatGPT, Gemini |
I doubt that any particular use case comes as a major surprise, though I would like to dig into why teachers choose a particular LLM for a particular use case. Perhaps another day …
Human-in-the-Loop: A Short History
With use cases in hand, we can explore the evolving responsibilities of educators in an AI-enhanced classroom, beginning with what “human in the loop” (HITL) has traditionally meant. Historically, the term has covered four functions:
- Human Oversight: This function emphasizes the supervisory role of humans in AI workflows, ensuring that systems are safe, ethical, and reliable by combining human judgment with AI capabilities.
- Human-AI Collaboration: This function highlights the partnership between human intelligence and AI systems, where both work together to achieve better results than either could alone.
- Human-in-the-Loop Decision-Making: This function specifically refers to scenarios where AI systems flag or recommend actions, but a human makes the final decision or provides critical feedback before action is taken (see the sketch after this list).
- Human Supervision: Similar to oversight, this function describes the process of human monitoring and, if necessary, intervention in AI operations to ensure accuracy, compliance, or ethical standards are met.
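For the technically inclined, here is what that decision-making pattern looks like as a minimal Python sketch. The grading scenario and all names here are hypothetical; the point is simply that the AI proposes and the human disposes.

```python
# Minimal sketch of human-in-the-loop decision-making: the AI recommends,
# but a human makes the final call. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Recommendation:
    student: str
    ai_grade: str
    rationale: str

def human_review(rec: Recommendation) -> str:
    """The human decision point: accept the AI's suggestion or override it."""
    print(f"{rec.student}: AI suggests {rec.ai_grade} ({rec.rationale})")
    choice = input("Accept grade? [y to accept, or type a new grade]: ").strip()
    if choice.lower() == "y":
        return rec.ai_grade
    return choice or "NEEDS REVIEW"  # nothing is finalized without human input

recs = [Recommendation("Student A", "B+", "strong thesis, thin evidence")]
final_grades = {r.student: human_review(r) for r in recs}
print(final_grades)
```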
Now we have to ask ourselves the pertinent question: Do the modern AI tools and platforms (tutors, textbooks, chatbots, grading programs, curriculum generators, etc.) that are being used in classrooms allow teachers to perform these functions? Is the human (teacher) still in the loop?
HITL Product Review
My AI partners and I took the typology of use cases and applied the four roles above to each one. You can review this synthesis here. While these human-in-the-loop suggestions seem eminently reasonable on paper, the real test is whether today’s tools actually enable teachers to implement them. This time I asked humans, but I had little luck getting responses from the most popular platforms. Instead, I reviewed each platform’s FAQs and supplemented my findings with educator feedback from forums such as Reddit, Facebook groups, and X.
Khanmigo HITL options include:
- Teachers can customize AI-generated content, including editing options for assignments and assessments.
- Teachers can use embedded tools to simplify or enrich texts based on individual student needs.
- Teachers can use embedded tools to input class profiles to generate connections between curriculum topics and students’ lives (e.g., tying geometry to video game design).
- Teachers can add personalized comments addressing nuance, creativity, or effort to automated grading.
SchoolAI HITL options include:
- Teachers can review and adjust the AI-generated reading passages to ensure they meet learning objectives and are culturally relevant or age-appropriate.
- Teachers can upload practice pages and prompt the AI to act as a “guide on the side,” but intervene to clarify misconceptions or provide alternative explanations when AI feedback isn’t sufficient.
- Teachers can adjust lesson flow by using AI-generated feedback to modify upcoming lessons, ensuring that instruction is responsive to emerging class needs.
Gradescope by Turnitin HITL options include:
- Teachers can build or adjust rubrics during grading, applying changes retroactively to all submissions.
- Teachers can link feedback comments directly to specific rubric items or learning objectives, making feedback clearer and more actionable for students.
- Teachers can use markdown formatting for annotations, enabling them to provide detailed, visually clear feedback that goes beyond what AI can generate.
- Teachers can add personalized notes for specific students, addressing unique misunderstandings or offering encouragement.
Brisk HITL options include:
- Teachers can review and edit the content to fit classroom context, student interests, and curriculum standards before assigning.
- Teachers can edit and personalize AI-generated comments from automated assessments.
- Teachers can inspect and annotate student writing within their workflow to provide nuanced, context-aware feedback that AI may miss, such as creativity, voice, or effort.
- Teachers can amend leveled materials to ensure they are appropriately challenging and culturally relevant for each student group.
MagicSchool AI HITL options include:
- Teachers can edit lesson plans, quizzes, and assignments to ensure they match students’ interests, cultural context, and current classroom needs.
- Teachers can rewrite texts or assignments to better fit their classroom’s reading levels, backgrounds, or learning goals.
- Teachers can provide targeted, human feedback or intervention after reviewing performance analytics.
- Teachers can add personalized comments to address creativity, effort, or social-emotional learning.
This review of HITL options surfaced a golden rule that appears in the documentation of every AI tool I reviewed: regularly check AI-generated content for fairness, inclusivity, and appropriateness, making manual adjustments as needed.

What does this mean for teachers and administrators?
Let’s start with the basics, and I don’t mean the price. I would suggest reviewing the HITL options for any AI product you are going to introduce to your school or district. If the product doesn’t give teachers clear, overriding management and control of the content, then don’t buy it.
Administrators should also consider creating an Acceptable Use Policy (AUP) for teachers and instructional aides that requires them to uphold their HITL responsibilities. I have created a draft AUP based on the content of this blog and the research that went into it. If you don’t adopt such a document, consider this: What will your liability be when something goes wrong and you can’t prove a human was in the loop?
I did have a thoughtful conversation on this topic with Gabriel Adamante, one of the co-founders of Co-Grader, an AI-powered assessment tool. He and his team were well aware of the need to keep a human in the loop so they decided to hardwire the process.
“Basically, Co-Grader has an approval mechanism,” Adamante explained. “Teachers cannot export anything from the platform without acknowledging their approval. Even though you can download a PDF file with all your students’ grades and feedback you can’t export them back to Google Classroom without clicking the approval button. That means you’re telling everyone that ‘I’ve gone through the feedback and I’ve read what is in there.’ It’s just a good practice to have this lock in place to make sure it’s not possible to use this tool without at least acknowledging that you have read and gone through everything.”
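Co-Grader’s code isn’t public, but the lock Adamante describes is easy to picture. Here is my own hypothetical sketch of such an export gate, written to illustrate the pattern rather than to reproduce their implementation:

```python
# Hypothetical sketch of an export lock: nothing leaves the platform
# until the teacher has explicitly approved the AI-generated feedback.
class ApprovalRequired(Exception):
    pass

class GradeBatch:
    def __init__(self, feedback):
        self.feedback = feedback      # {student: AI-generated comment}
        self.approved = False

    def approve(self):
        """Called only when the teacher clicks the approval button."""
        self.approved = True

    def export_to_lms(self):
        if not self.approved:
            raise ApprovalRequired(
                "Review and approve all feedback before exporting."
            )
        print(f"Exporting {len(self.feedback)} records to the LMS...")

batch = GradeBatch({"Student A": "Great use of evidence in paragraph two."})
batch.approve()       # the human-in-the-loop step; skipping it blocks export
batch.export_to_lms()
```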
I would also recommend that all professional development related to AI implementation reinforce the golden rule I shared in the prior section: “Regularly check AI-generated content for fairness, inclusivity, and appropriateness, making manual adjustments as needed.”
Doomscrolling the demise of the teacher is a popular sport these days, thanks to the arrival of the AI-powered classroom. To stay essential, we must treat human-in-the-loop work not as a slogan but as a professional obligation.