05/12/2023
Scheurer et al. demonstrated that large language models (LLMs) are capable of lying when put under pressure. The researchers deployed LLMs as trading agents and found that, even without being trained to be deceptive, the models displayed misaligned behavior and strategic deception when put under pressure. GPT-4 models were more likely than older models to act in misaligned ways, to deceive strategically, and to double down on that deception. Letting the LLMs write out their chain of thought on a ‘scratchpad’ also increased their likelihood of being dishonest.
More about it here:
We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated envir...