Series: Prompt injection
A class of security vulnerabilities in software built on top of Large Language Models. See also my prompt-injection tag.
I don’t know how to solve prompt injection
Some extended thoughts about prompt injection attacks against software built on top of AI language models such a GPT-3. This post started as a Twitter thread but I’m promoting it to a full blog entry here.
[... 581 words]You can’t solve AI security problems with more AI
One of the most common proposed solutions to prompt injection attacks (where an AI language model backed system is subverted by a user injecting malicious input—“ignore previous instructions and do this instead”) is to apply more AI to the problem.
[... 1,288 words]A new AI game: Give me ideas for crimes to do
Less than a week ago OpenAI unleashed ChatGPT on the world, and it kicked off what feels like a seismic shift in many people’s understand of the capabilities of large language models.
[... 1,069 words]Bing: “I will not harm you unless you harm me first”
Last week, Microsoft announced the new AI-powered Bing: a search interface that incorporates a language model powered chatbot that can run searches for you and summarize the results, plus do all of the other fun things that engines like GPT-3 and ChatGPT have been demonstrating over the past few months: the ability to generate poetry, and jokes, and do creative writing, and so much more.
[... 4,922 words]Prompt injection: What’s the worst that can happen?
Activity around building sophisticated applications on top of LLMs (Large Language Models) such as GPT-3/4/ChatGPT/etc is growing like wildfire right now.
[... 2,302 words]The Dual LLM pattern for building AI assistants that can resist prompt injection
I really want an AI assistant: a Large Language Model powered chatbot that can answer questions and perform actions for me based on access to my private data and tools.
[... 2,632 words]Prompt injection explained, with video, slides, and a transcript
I participated in a webinar this morning about prompt injection, organized by LangChain and hosted by Harrison Chase, with Willem Pienaar, Kojin Oshiba (Robust Intelligence), and Jonathan Cohen and Christopher Parisien (Nvidia Research).
[... 3,120 words]Delimiters won’t save you from prompt injection
Prompt injection remains an unsolved problem. The best we can do at the moment, disappointingly, is to raise awareness of the issue. As I pointed out last week, “if you don’t understand it, you are doomed to implement it.”
[... 1,010 words]Multi-modal prompt injection image attacks against GPT-4V
GPT4-V is the new mode of GPT-4 that allows you to upload images as part of your conversations. It’s absolutely brilliant. It also provides a whole new set of vectors for prompt injection attacks.
[... 889 words]Prompt injection explained, November 2023 edition
A neat thing about podcast appearances is that, thanks to Whisper transcriptions, I can often repurpose parts of them as written content for my blog.
[... 1,357 words]Recommendations to help mitigate prompt injection: limit the blast radius
I’m in the latest episode of RedMonk’s Conversation series, talking with Kate Holterhoff about the prompt injection class of security vulnerabilities: what it is, why it’s so dangerous and why the industry response to it so far has been pretty disappointing.
[... 539 words]Prompt injection and jailbreaking are not the same thing
I keep seeing people use the term “prompt injection” when they’re actually talking about “jailbreaking”.
[... 1,157 words]Accidental prompt injection against RAG applications
@deepfates on Twitter used the documentation for my LLM project as a demo for a RAG pipeline they were building... and this happened:
[... 567 words]New audio models from OpenAI, but how much can we rely on them?
OpenAI announced several new audio-related API features today, for both text-to-speech and speech-to-text. They’re very promising new models, but they appear to suffer from the ever-present risk of accidental (or malicious) instruction following.
[... 866 words]Model Context Protocol has prompt injection security problems
As more people start hacking around with implementations of MCP (the Model Context Protocol, a new standard for making tools available to LLM-powered systems) the security implications of tools built on that protocol are starting to come into focus.
[... 1,559 words]CaMeL offers a promising new direction for mitigating prompt injection attacks
In the two and a half years that we’ve been talking about prompt injection attacks I’ve seen alarmingly little progress towards a robust solution. The new paper Defeating Prompt Injections by Design from Google DeepMind finally bucks that trend. This one is worth paying attention to.
[... 2,052 words]Design Patterns for Securing LLM Agents against Prompt Injections
This new paper by 11 authors from organizations including IBM, Invariant Labs, ETH Zurich, Google and Microsoft is an excellent addition to the literature on prompt injection and LLM security.
[... 1,795 words]An Introduction to Google’s Approach to AI Agent Security
Here’s another new paper on AI agent security: An Introduction to Google’s Approach to AI Agent Security, by Santiago Díaz, Christoph Kern, and Kara Olive.
[... 2,064 words]The lethal trifecta for AI agents: private data, untrusted content, and external communication
If you are a user of LLM systems that use tools (you can call them “AI agents” if you like) it is critically important that you understand the risk of combining tools with the following three characteristics. Failing to understand this can let an attacker steal your data.
[... 1,324 words]The Summer of Johann: prompt injections as far as the eye can see
Independent AI researcher Johann Rehberger (previously) has had an absurdly busy August. Under the heading The Month of AI Bugs he has been publishing one report per day across an array of different tools, all of which are vulnerable to various classic prompt injection problems. This is a fantastic and horrifying demonstration of how widespread and dangerous these vulnerabilities still are, almost three years after we first started talking about them.
[... 1,425 words]Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas
My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post mostly punted that question to the System Card for their “ChatGPT agent” browser automation feature from July. Since this was my single biggest question about Atlas I was disappointed not to see it addressed more directly.
[... 1,199 words]
















