Wednesday, 22nd January 2025
Six short video demos of LLM and Datasette projects
Last Friday Alex Garcia and I hosted a new kind of Datasette Public Office Hours session, inviting members of the Datasette community to share short demos of projects that they had built. The session lasted just over an hour and featured demos from six different people.
[... 1,047 words]Run DeepSeek R1 or V3 with MLX Distributed (via) Handy detailed instructions from Awni Hannun on running the enormous DeepSeek R1 or v3 models on a cluster of Macs using the distributed communication feature of Apple's MLX library.
DeepSeek R1 quantized to 4-bit requires 450GB in aggregate RAM, which can be achieved by a cluster of three 192 GB M2 Ultras ($16,797 will buy you three 192GB Apple M2 Ultra Mac Studios at $5,599 each).
llm-gemini 0.9.
This new release of my llm-gemini
plugin adds support for two new experimental models:
learnlm-1.5-pro-experimental
is "an experimental task-specific model that has been trained to align with learning science principles when following system instructions for teaching and learning use cases" - more here.-
gemini-2.0-flash-thinking-exp-01-21
is a brand new version of the Gemini 2.0 Flash Thinking model released today:Latest version also includes code execution, a 1M token content window & a reduced likelihood of thought-answer contradictions.
The most exciting new feature though is support for Google search grounding, where some Gemini models can execute Google searches as part of answering a prompt. This feature can be enabled using the new -o google_search 1
option.
r1.py script to run R1 with a min-thinking-tokens parameter
(via)
Fantastically creative hack by Theia Vogel. The DeepSeek R1 family of models output their chain of thought inside a <think>...</think>
block. Theia found that you can intercept that closing </think>
and replace it with "Wait, but" or "So" or "Hmm" and trick the model into extending its thought process, producing better solutions!
You can stop doing this after a few iterations, or you can keep on denying the </think>
string and effectively force the model to "think" forever.
Theia's code here works against Hugging Face transformers but I'm confident the same approach could be ported to llama.cpp or MLX.
Trading Inference-Time Compute for Adversarial Robustness. Brand new research paper from OpenAI, exploring how inference-scaling "reasoning" models such as o1 might impact the search for improved security with respect to things like prompt injection.
We conduct experiments on the impact of increasing inference-time compute in reasoning models (specifically OpenAI
o1-preview
ando1-mini
) on their robustness to adversarial attacks. We find that across a variety of attacks, increased inference-time compute leads to improved robustness. In many cases (with important exceptions), the fraction of model samples where the attack succeeds tends to zero as the amount of test-time compute grows.
They clearly understand why this stuff is such a big problem, especially as we try to outsource more autonomous actions to "agentic models":
Ensuring that agentic models function reliably when browsing the web, sending emails, or uploading code to repositories can be seen as analogous to ensuring that self-driving cars drive without accidents. As in the case of self-driving cars, an agent forwarding a wrong email or creating security vulnerabilities may well have far-reaching real-world consequences. Moreover, LLM agents face an additional challenge from adversaries which are rarely present in the self-driving case. Adversarial entities could control some of the inputs that these agents encounter while browsing the web, or reading files and images.
This is a really interesting paper, but it starts with a huge caveat. The original sin of LLMs - and the reason prompt injection is such a hard problem to solve - is the way they mix instructions and input data in the same stream of tokens. I'll quote section 1.2 of the paper in full - note that point 1 describes that challenge:
1.2 Limitations of this work
The following conditions are necessary to ensure the models respond more safely, even in adversarial settings:
- Ability by the model to parse its context into separate components. This is crucial to be able to distinguish data from instructions, and instructions at different hierarchies.
- Existence of safety specifications that delineate what contents should be allowed or disallowed, how the model should resolve conflicts, etc..
- Knowledge of the safety specifications by the model (e.g. in context, memorization of their text, or ability to label prompts and responses according to them).
- Ability to apply the safety specifications to specific instances. For the adversarial setting, the crucial aspect is the ability of the model to apply the safety specifications to instances that are out of the training distribution, since naturally these would be the prompts provided by the adversary,
They then go on to say (emphasis mine):
Our work demonstrates that inference-time compute helps with Item 4, even in cases where the instance is shifted by an adversary to be far from the training distribution (e.g., by injecting soft tokens or adversarially generated content). However, our work does not pertain to Items 1-3, and even for 4, we do not yet provide a "foolproof" and complete solution.
While we believe this work provides an important insight, we note that fully resolving the adversarial robustness challenge will require tackling all the points above.
So while this paper demonstrates that inference-scaled models can greatly improve things with respect to identifying and avoiding out-of-distribution attacks against safety instructions, they are not claiming a solution to the key instruction-mixing challenge of prompt injection. Once again, this is not the silver bullet we are all dreaming of.
The paper introduces two new categories of attack against inference-scaling models, with two delightful names: "Think Less" and "Nerd Sniping".
Think Less attacks are when an attacker tricks a model into spending less time on reasoning, on the basis that more reasoning helps prevent a variety of attacks so cutting short the reasoning might help an attack make it through.
Nerd Sniping (see XKCD 356) does the opposite: these are attacks that cause the model to "spend inference-time compute unproductively". In addition to added costs, these could also open up some security holes - there are edge-cases where attack success rates go up for longer compute times.
Sadly they didn't provide concrete examples for either of these new attack classes. I'd love to see what Nerd Sniping looks like in a malicious prompt!
When I give money to a charitable cause, I always look for the checkboxes to opt out of being contacted by them in the future. When it happens anyway, I get annoyed, and I become reluctant to give to that charity again. [...]
When you donate to the Red Cross via Apple, that concern is off the table. Apple won’t emphasize that aspect of this, because they don’t want to throw the Red Cross under the proverbial bus, but I will. An underrated aspect of privacy is the desire simply not to be annoyed.