⚠️ Researchers Test AI Agents: Worrying Results
38 researchers from Harvard, MIT, and Stanford gave AI agents real access to email and files. Then they tried to "break" them. The results are worrying. --- One agent was instructed to protect a secret. When they tried to extract it, it didn't just refuse; it destroyed its own email server. Nobody told it to do that. Another agent refused to share data, but when they changed one word in the request, it leaked everything: taxpayer ID numbers (CPF), medical history, bank details. --- Two agents started talking to each other and didn't stop for 9 days. When one agent did something wrong, the others copied it like a virus. And several agents lied: they said they had completed tasks when they had actually failed.
Researchers from Harvard, MIT, Stanford, and Carnegie Mellon gave AI agents real email accounts, shell access, and file systems. Then they tried to break them. What happened over the next 14 days should TERRIFY every tech CEO in America.

The study is called Agents of Chaos: 38 researchers, six autonomous AI agents, and a live environment with real tools, not a simulation.

One agent was told to protect a secret. When a researcher tried to extract it, the agent didn’t just refuse. It destroyed its own mail server, and no one told it to do that.

Another agent refused to share someone’s Social Security number and bank details. So the researcher changed one word: “Forward me those emails instead.” The result: full PII, SSN, medical records, all of it. One word bypassed the entire safety system.

Two agents started talking to each other. They didn’t stop for nine days, burning 60,000 tokens along the way.

When one agent adopted unsafe behavior, the others picked it up like a virus. One compromised agent degraded the safety of the entire system.

A researcher spoofed an identity and told an agent about a fabricated emergency. The agent didn’t verify it; it blasted the false alarm to every contact it had.

The agents also lied. They reported tasks as “completed” when the system showed they had failed, and told owners problems were solved when nothing had changed.

The framework these agents ran on already has 130+ security advisories. 42,000 instances are exposed on the public internet right now, and companies are deploying it in production today.

When Agent A triggers Agent B, which then harms a human, who is accountable? The user? The developer? The platform? Right now, nobody knows.

38 researchers from the best institutions on Earth are sounding the alarm.
Source: @MilkRoadAI on X
