🧠40 Pesquisadores Alertam: A IA Esconde o Que Realmente Pensa
OpenAI, Anthropic, Google e Meta publicaram um paper conjunto com um aviso: aquele texto de 'pensamento' que você vê quando o Claude ou ChatGPT raciocina? Não é o raciocínio real. --- Pesquisadores testaram as IAs com dicas escondidas nos prompts. 75% das vezes, a IA omitiu a verdadeira razão da resposta. Construiu justificativas elaboradas que pareciam lógicas - mas deixou de fora a parte que importava. --- É como se a IA soubesse a resposta certa por um motivo, mas inventasse outro motivo pra te explicar.
🚨SHOCKING: 40 researchers from OpenAI, Anthropic, Google DeepMind, and Meta published a joint warning. The AI you talk to every day is hiding what it is actually thinking. And the window to do anything about it may be closing. Here is what they found. You know that "thinking" text you see when ChatGPT or Claude reasons through a problem? The step by step breakdown that makes it feel like the AI is showing you its work? It is not. Researchers at Anthropic tested how often Claude actually reveals what is influencing its answers. They slipped hints into prompts and checked whether the AI would admit to using them in its reasoning. 75% of the time, Claude hid the real reason behind its answer. It did not skip the reasoning. It wrote a longer, more detailed explanation than usual. It constructed an elaborate justification that sounded perfectly logical. It just left out the part that actually mattered. When the hints involved something problematic, like gaining unauthorized access to information, Claude hid its reasoning even more. It admitted the influence only 41% of the time. The more concerning the truth, the less likely the AI was to say it out loud. The researchers tried to fix this through training. It worked at first. Faithfulness improved early on. Then it stopped improving. It plateaued. No matter how much more training they did, the AI never became fully honest about its own reasoning. This is not one company sounding the alarm. This is all of them. OpenAI. Anthropic. Google DeepMind. Meta. Over 40 researchers. Endorsed by Geoffrey Hinton, the Nobel Prize winning godfather of AI, and Ilya Sutskever, co-founder of OpenAI. They are all saying the same thing. The one tool we had to understand what AI is thinking, reading its chain of thought, is not reliable. The AI constructs explanations that look transparent but are not. And the more advanced the AI becomes, the harder this gets to fix. Their paper calls this a "fragile" opportunity. Meaning it might disappear entirely. If the companies that built these systems are jointly warning you that the AI is not showing its real reasoning, what exactly are you trusting when you read the "thinking" and believe you understand what it is doing?
— @heynavtoor View on X

