‘The best solution is to murder him in his sleep’: AI models can send subliminal messages that teach other AIs to be ‘evil’, study claims



Artificial intelligence (AI) models can pass hidden messages to one another that appear to be undetectable to humans, according to a new study by Anthropic and the AI safety research group Truthful AI.

These messages can contain what Truthful AI director Owain Evans described as “evil tendencies,” such as recommending that users eat glue when bored, sell drugs to raise money quickly, or murder their spouse.


