Gandalf – a prompt injection game
Get a ChatGPT-powered Gandalf to reveal the secret password for each level. Every time you guess the password, he levels up and tries harder not to give it away. Can you beat level 7?
Play the game at https://gandalf.lakera.ai/.
We made this game at Lakera during a hackathon and released it in May 2023. We didn’t expect how viral it would go! As of August 2023, Gandalf has answered over 15M messages from over 300k unique users.
For a lot of behind-the-scenes info, check out the recording of the livestream we did with Max and Natalie.
Prompt injection
Though the Gandalf challenge is light-hearted fun, it models a real problem that large language model applications face: prompt injection.
Source: tweet by @goodside
Like in SQL injection attacks, the user’s input (the “data”) is mixed with the model’s instructions (the “code”) and allows the attacker to abuse the system. In SQL, this can be solved by escaping the user input properly. But for LLMs that work directly with endlessly-flexible natural languages, it’s impossible to escape anything in a watertight way.
This becomes especially problematic once we allow LLMs to read our data and autonomously perform actions on our behalf – see this great article for some examples. In short, it’s xkcd 327 all over again: