A growing body of evidence presented at education research meetings points to oversight gaps in AI assessment workflows, including test-question generation and scoring. Researchers cautioned that "humans in the loop" may be getting worse at catching AI errors, while evidence on whether AI outputs are biased against particular groups of students remains limited.