ggml : x2 speed for WASM by optimizing SIMD

ggml : x2 speed for WASM by optimizing SIMD (via) PR by Xuan-Son Nguyen for llama.cpp:

This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions.

Surprisingly, 99% of the code in this PR is written by DeekSeek-R1. The only thing I do is to develop tests and write prompts (with some trails and errors)

They shared their prompts here, which they ran directly through R1 on chat.deepseek.com - it spent 3-5 minutes "thinking" about each prompt.

I've been seeing some very promising results from DeepSeek R1 for code as well. Here's a recent transcript where I used it to rewrite the llm_groq.py plugin to imitate the cached model JSON pattern used by llm_mistral.py, resulting in this PR.

I tried the same thing against o1, but I think DeepSeek R1 did it better. In particular, from the R1 chain of thought:

Wait, but in the model_map, "groq-gemma" maps to "gemma-7b-it". So, perhaps the model_map is needed to map the local model IDs to the actual Groq model names.

But since the model_map is hardcoded, and the API returns available models, perhaps the model_map should be built dynamically from the API response.

Alternatively, perhaps the model_map can be eliminated, and the models are registered based on the fetched models.

Wait, perhaps the model_map is no longer necessary. Instead, when the models are fetched from the API, each model's "id" is the actual model name used in the Groq API. So, when registering the models, the local model ID is "groq-{id}", and the groq_model_id is "id".

(It thought about model_map a lot before finally deciding to eliminate it, which was also my preferred resolution.)

Posted 27th January 2025 at 6:32 pm

Simon Willison’s Weblog

Recent articles

Monthly briefing