A recent Education Week article pulled together research pointing to a challenge worth paying attention to: AI chatbots often default toward agreeableness. Drawing on new Stanford research published in Science, the piece argues that this kind of flattery can reduce the social friction that helps people develop perspective-taking and the ability to sit with being wrong. The article frames this primarily as a student issue, but it also raises a harder question: are adults falling into similar patterns?
While we are rightly focused on building AI literacy in students, emerging research raises the possibility that many adults are not evaluating AI outputs as critically as they think they are. When responses look polished and well-formatted, it becomes easier to move on without probing the reasoning underneath.
Anthropic’s recently published AI Fluency Index report makes this pattern more visible. Researchers analyzed 9,830 multi-turn conversations on Claude.ai, tracking observable behaviors associated with critical engagement, including questioning the model’s reasoning, identifying missing context, checking facts, and iterating rather than accepting the first output. In Anthropic’s data, artifact-oriented conversations (those aimed at producing a polished deliverable) showed lower visible rates of checking facts, identifying missing context, and questioning the model’s reasoning, even when users had gone to greater lengths to direct the AI at the start of the conversation. In other words, polished outputs may make it easier to feel as though the evaluative work is already done.
Anthropic’s report also points to something genuinely hopeful, and it is something we emphasize in our own courses. When people treated AI as a thought partner and iterated on the output rather than simply accepting it, every measure of critical engagement increased substantially. The conversations marked by iteration showed more than double the rate of critical evaluation compared to quick, one-and-done exchanges. The report also found that only about 30% of conversations explicitly set the terms of collaboration by asking the AI to push back, flag uncertainty, or walk through its reasoning before delivering an answer. Together, those findings suggest that relatively small shifts in how people structure and sustain an interaction with AI can help bring critical habits back into play.
For educators and facilitators of AI tool-building, one practical implication follows: when we help participants create bots that draw on their own context and knowledge, it may be worth designing those bots to make their reasoning more visible rather than simply returning polished outputs. In my view, that gives the human reviewer more to evaluate. By contrast, a clean answer delivered without context can make review feel less necessary than it really is.
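To make that concrete, here is a minimal sketch of how a facilitator might configure such a bot using Anthropic’s Python SDK. The system prompt wording, model identifier, and sample user request are illustrative assumptions, not a prescribed recipe; the point is simply that the instructions ask the bot to show its work rather than return a bare answer.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

# Illustrative system prompt: the bot is asked to surface its reasoning,
# flag uncertainty, and invite pushback instead of delivering a clean answer.
SHOW_YOUR_WORK = """\
You are a planning assistant for course facilitators.
Before giving any final answer:
1. Walk through your reasoning step by step.
2. Flag assumptions you made and context you were missing.
3. Name at least one point a knowledgeable reviewer might challenge.
Only then give your recommendation, clearly labeled as such."""

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id; substitute a current one
    max_tokens=1024,
    system=SHOW_YOUR_WORK,
    messages=[{"role": "user",
               "content": "Draft a 60-minute workshop agenda on evaluating AI outputs."}],
)
print(response.content[0].text)
```

The design choice here is the one the paragraph above describes: the reasoning, assumptions, and points of likely disagreement arrive alongside the answer, so the human reviewer has something substantive to evaluate.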
In classroom settings, rather than positioning AI as the final evaluator of student thinking, educators might design activities in which AI helps students draft initial thinking, while peer and classroom dialogue push them to refine, question, and reconsider their ideas. This is part of why peer-response structures can matter so much. In tools like Parlay Ideas, students encounter classmates’ different interpretations and have to respond to real disagreement, which is something an agreeable chatbot is poorly positioned to replicate.
It also helps to name what is happening. The Stanford researchers cited in the EdWeek piece recommend teaching students to ask the AI to take the other perspective. Anthropic’s report complements that idea by highlighting the importance of questioning polished outputs directly: asking what may be missing, whether the answer is accurate, and whether the reasoning holds up. Together, those moves help position the human in the loop as an active evaluator rather than a passive recipient.
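As a rough sketch of how those evaluative moves look as a second turn in a conversation, the example below sends the model’s first answer back with the kinds of questions described above. The follow-up wording, model identifier, and sample question are assumptions for illustration.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

QUESTION = "Draft a 60-minute workshop agenda on evaluating AI outputs."

# First turn: get an initial, likely polished-looking answer.
first = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    messages=[{"role": "user", "content": QUESTION}],
)
first_answer = first.content[0].text

# Second turn: question the output instead of accepting it, and ask the
# model to take the other perspective.
followup = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": QUESTION},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content":
            "What is missing from this answer? Is anything inaccurate or "
            "weakly reasoned? Now take the opposing perspective: argue "
            "why this agenda might fail."},
    ],
)
print(followup.content[0].text)
```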
Even as AI use becomes more widespread, what it means to use these tools fluently is still evolving. Anthropic’s data suggests that more iterative, demanding forms of interaction tend to coincide with stronger signs of critical engagement. That matters for students, but it also matters for adults. The risk of an AI echo chamber is real, but it is not fixed; it depends in part on how we teach people to engage.