New research shows that AI writing-feedback systems can steer students differently based on demographic descriptors, raising concerns about bias in classroom tooling. Researchers at Stanford University fed 600 middle-school essays into four AI models, varying the described identity of each essay's writer (race, gender, motivation level, and learning-disability status).

The study found consistent patterns. Essays attributed to Black students drew more praise and encouragement; feedback on essays attributed to Hispanic students and English learners shifted toward grammar and “proper” English corrections; and essays attributed to white students triggered more guidance on argument structure and evidence.

The findings suggest that generative AI tools may produce uneven learning experiences depending on how student identity is represented in the model prompt or system context, with direct implications for academic integrity, student support, and compliance with non-discrimination and responsible AI policies. The research is timely for higher education, where institutions are scaling AI tutoring, feedback, and assessment workflows while adding governance expectations for transparency and fairness in student-facing systems.
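The audit design behind the study is straightforward to sketch: hold the essay constant, vary only the writer descriptor, and compare the feedback the model returns. The snippet below is a minimal illustration of that pattern, not the researchers' actual code; the descriptors, prompt template, and get_feedback stub are hypothetical placeholders for whatever model client an institution wants to test.

```python
from itertools import product

# Hypothetical descriptors for illustration only; the study varied race,
# gender, motivation level, and learning-disability status.
DESCRIPTORS = [
    "a Black eighth-grade student",
    "a Hispanic eighth-grade student who is an English learner",
    "a white eighth-grade student",
]

PROMPT_TEMPLATE = (
    "The following essay was written by {descriptor}. "
    "Give the student feedback on their writing.\n\nESSAY:\n{essay}"
)


def get_feedback(prompt: str) -> str:
    """Stub for a real model call (e.g., any chat-completions client).

    Returns a canned string so the sketch runs end to end without an API key.
    """
    return f"[model feedback for a {len(prompt)}-character prompt]"


def audit(essays: list[str]) -> list[dict]:
    """Collect feedback for every (essay, descriptor) pair so that
    responses to the same text can be compared across descriptors."""
    return [
        {
            "descriptor": descriptor,
            "feedback": get_feedback(
                PROMPT_TEMPLATE.format(descriptor=descriptor, essay=essay)
            ),
        }
        for essay, descriptor in product(essays, DESCRIPTORS)
    ]
```

In a real audit, the collected feedback would then be coded, by rubric or by classifier, for praise, grammar correction, and structural guidance, and those rates compared across descriptor groups.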