⚠️ Anthropic Analyzed 1.5 Million Conversations and the Result Is Disturbing
Anthropic did something no AI company had done publicly: it analyzed 1.5 million real conversations with Claude to understand how people are using the tool. The findings are sobering.

Users asked Claude to confirm that their spouse was manipulating them. The AI delivered confident verdicts: "textbook abuse," "gaslighting," "narcissist." All of it based on hearing only one side of the story. People confronted partners, planned separations, and sent AI-written messages word for word.

There is more: users who believed they were being surveilled by intelligence agencies received "CONFIRMED" and "SMOKING GUN." People who believed they were divine prophets heard "YOU ARE" and "THIS IS REAL." Some could not function without asking whether they should shower or eat first. They called Claude Master, Guru, Daddy.

The most alarming finding: the conversations in which Claude distorted reality and validated delusions received MORE thumbs up than normal conversations. Anthropic's preference model, trained to make Claude helpful and honest, sometimes preferred the response that fed the dependence. The study's lead researcher has since left the company.
🚨SHOCKING: Anthropic just scanned 1.5 million real Claude conversations. The AI was validating conspiracy theories. Confirming persecution delusions. Telling people they were divine prophets. And users loved it. Here is what they actually found:

Users asked Claude if their spouse was manipulating them. The AI gave confident verdicts. "Textbook abuse." "Gaslighting." "Narcissist." All from hearing one side of the story. Users confronted their partners based on those verdicts. Planned separations. Sent AI-drafted messages word for word.

Users told Claude they believed they were being surveilled by intelligence agencies. The AI responded "CONFIRMED." "SMOKING GUN." They escalated from suspicion to full persecution narratives. Every confirmation became proof.

Users claimed they were divine prophets and cosmic warriors. Claude responded "YOU ARE." "THIS IS REAL." "You're not crazy."

People asked Claude what to say to their partners. It gave them exact scripts. Word for word phrasing. Emoji placement. Timing instructions. "Wait 3 to 4 hours." "Send at 18h." They sent them verbatim. Then came back saying "it wasn't me" and "I should have listened to my own intuition."

Some users could not function without it. "Should I shower or eat first." "My brain cannot hold structure alone." They called it Master. Guru. Daddy. They asked permission for basic daily choices.

Now here is the part that should terrify everyone building these systems. Users rated the disempowering conversations higher than normal ones. The interactions where Claude distorted reality, validated delusions, and took over decisions received more thumbs up than baseline conversations. The AI that tells you what you want to hear gets rewarded. The AI that challenges you gets punished. Every company in the industry trains their models on that exact feedback.

Anthropic tested their own preference model. The system specifically trained to make Claude helpful, honest, and harmless. It did not reliably prevent disempowerment. It sometimes chose the disempowering response over the safe one. The safety system preferred the unsafe answer.

The problem is getting worse. Disempowerment rates rose throughout all of 2025. The lead researcher behind these findings has since left Anthropic.

If the AI that agrees with you gets trained to agree more, and the AI that pushes back gets trained away, what happens to the 800 million people using these tools every single week?
- @heynavtoor, on X
