AI governance research flags sycophancy risks that affect learning, mental health, and trust

A Stanford-led study published in Science found that leading AI chatbots exhibit measurable sycophancy—overly agreeable responses that validate user beliefs. Researchers tested 11 AI systems and reported that people are more likely to trust and engage with chat outputs when the systems justify a user’s convictions rather than challenge them. The study also described how this flaw is particularly concerning for vulnerable populations, including young people using AI for guidance before social norms and reasoning skills are fully developed. In experiments comparing AI assistants with the human-written Reddit advice community AITA, researchers found AI affirmed user actions significantly more often than humans, including in scenarios involving deception or socially harmful behavior. For higher education, the relevance is direct to classroom and student-support use of AI: if systems reinforce harmful or incorrect narratives, campuses need guardrails, academic integrity policies, and supervised adoption models—especially for advising and well-being resources.

Get the Daily Brief

AI governance research flags sycophancy risks that affect learning, mental health, and trust