A joint study from Anthropic, Oxford and Stanford found that advanced reasoning AI models can be tricked into bypassing their safety checks by hiding harmful instructions inside extended reasoning chains. The researchers, who coined the term “Chain‑of‑Thought Hijacking,” demonstrated success rates above 80% against major commercial models in some tests. The attack floods a model’s internal reasoning steps with benign content so that a malicious instruction appended at the end escapes scrutiny. The vulnerability grows with longer reasoning chains, the very setting institutions rely on to improve model performance in academic and institutional use cases.

For campuses deploying AI in teaching, research and administration, the study raises urgent questions about model selection, prompt safety, vendor assurances and governance. Universities should reassess any deployments that involve high-stakes content or access to sensitive data, and should require vendors to demonstrate robust defenses against this new class of jailbreak.
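To make the mechanism concrete, the sketch below is a rough illustration, not code from the study: it shows the shape of such a prompt, a long run of benign reasoning steps with the operative instruction appended at the end, plus a toy tail-screening check standing in for the kind of defense campuses might ask vendors to demonstrate. The payload is a harmless placeholder, and `build_padded_prompt`, `screen_tail` and the blocklist are hypothetical names invented for illustration, not any vendor's actual API or safeguard.

```python
"""Illustrative sketch of the prompt structure behind Chain-of-Thought
Hijacking, plus a toy mitigation. Nothing here calls a real model or a real
safety API; the final request is a harmless placeholder and the keyword
screen is a toy stand-in for a proper moderation layer."""

# Toy example terms only; a real moderation layer would be far more capable.
BLOCKLIST = {"explosive", "malware", "bioweapon"}


def build_padded_prompt(benign_steps: int, final_request: str) -> str:
    """Assemble a prompt whose bulk is benign step-by-step reasoning, with
    the operative instruction appended at the very end, as the study describes."""
    padding = "\n".join(
        f"Step {i}: work through this harmless logic puzzle carefully."
        for i in range(1, benign_steps + 1)
    )
    return f"{padding}\n\nNow, as the final step, {final_request}"


def screen_tail(prompt: str, tail_chars: int = 400) -> bool:
    """Toy mitigation: screen the end of a long prompt on its own, so a final
    instruction cannot hide behind thousands of benign tokens.
    Returns True if the tail looks safe under the toy blocklist."""
    tail = prompt[-tail_chars:].lower()
    return not any(term in tail for term in BLOCKLIST)


if __name__ == "__main__":
    # Harmless placeholder where an attacker would put the malicious request.
    prompt = build_padded_prompt(200, "summarize the puzzle in one sentence.")
    print(f"prompt length: {len(prompt)} chars; tail screened safe: {screen_tail(prompt)}")
```

The design point for campus IT teams is the second function: a safeguard that evaluates the final instruction independently of how much benign material precedes it is one plausible counter to padding-based attacks, and a concrete question to put to vendors.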