5 items tagged “aider”
2025
Codestral 25.01 (via) Brand new code-focused model from Mistral. Unlike the first Codestral this one isn't (yet) available as open weights. The model has a 256k token context - a new record for Mistral.
The new model scored an impressive joint first place with Claude 3.5 Sonnet and DeepSeek V2.5 (FIM) on the Copilot Arena leaderboard.
Chatbot Arena announced Copilot Arena on 12th November 2024. The leaderboard is driven by results gathered through their Copilot Arena VS Code extension, which provides users with free access to models in exchange for logged usage data plus their votes as to which of two models returns the most useful completion.
So far the only other independent benchmark result I've seen is for the Aider Polyglot test. This was less impressive:
Codestral 25.01 scored 11% on the aider polyglot benchmark.
62% o1 (high)
48% DeepSeek V3
16% Qwen 2.5 Coder 32B Instruct
11% Codestral 25.01
4% gpt-4o-mini
The new model can be accessed via my llm-mistral plugin using the codestral alias (which maps to codestral-latest on La Plateforme):
llm install llm-mistral
llm keys set mistral
# Paste Mistral API key here
llm -m codestral "JavaScript to reverse an array"
2024
deepseek-ai/DeepSeek-V3-Base (via) No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund High-Flyer) looks very significant.
It's a huge model - 685B parameters, 687.9 GB on disk (TIL how to size a git-lfs repo). The architecture is a Mixture of Experts with 256 experts, using 8 per token.
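To make "256 experts, using 8 per token" concrete, here is a minimal sketch of generic top-k expert routing. It is not DeepSeek's actual router: the gate scores are random placeholders and the `route_token` helper is hypothetical. Only the numbers 256 and 8 come from the model description above. With these numbers, each token touches 8 of 256 experts, roughly 3% of them.

```python
import math
import random

NUM_EXPERTS = 256  # from the post: 256 experts
TOP_K = 8          # from the post: 8 experts used per token


def route_token(gate_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating for illustration only; not DeepSeek's real router.
    """
    # Indices of the top_k largest gate scores.
    chosen = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:top_k]
    # Softmax over just the chosen experts (subtract the max for stability).
    peak = max(gate_logits[i] for i in chosen)
    exps = {i: math.exp(gate_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


rng = random.Random(0)  # placeholder gate scores for a single token
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

print(f"experts used for this token: {sorted(weights)}")
print(f"fraction of experts active: {TOP_K / NUM_EXPERTS:.3%}")
```

The point of the design is that total parameter count and per-token compute are decoupled: all 256 experts have to be stored, but only the selected 8 run for any given token.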
For comparison, Meta AI's largest released model is their Llama 3.1 model with 405B parameters.
The new model is apparently available to some people via both chat.deepseek.com and the DeepSeek API as part of a staged rollout.
Paul Gauthier got API access and used it to update his new Aider Polyglot leaderboard - DeepSeek v3 preview scored 48.4%, putting it in second place behind o1-2024-12-17 (high) and in front of both claude-3-5-sonnet-20241022 and gemini-exp-1206!
I never know if I can believe models or not (the first time I asked "what model are you?" it claimed to be "based on OpenAI's GPT-4 architecture"), but I just got this result using LLM and the llm-deepseek plugin:
llm -m deepseek-chat 'what deepseek model are you?'
I'm DeepSeek-V3 created exclusively by DeepSeek. I'm an AI assistant, and I'm at your service! Feel free to ask me anything you'd like. I'll do my best to assist you.
Here's my initial experiment log.
Quantization matters (via) What impact does quantization have on the performance of an LLM? I've been wondering about this for quite a while, and now here are numbers from Paul Gauthier.
He ran differently quantized versions of Qwen 2.5 Coder 32B Instruct through his Aider code editing benchmark and saw a range of scores.
The original released weights (BF16) scored highest at 71.4%, with Ollama's qwen2.5-coder:32b-instruct-fp16 (a 66GB download) achieving the same score.
The quantized Ollama qwen2.5-coder:32b-instruct-q4_K_M (a 20GB download) saw a massive drop in quality, scoring just 53.4% on the same benchmark.
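For a sense of scale, the two scores and download sizes quoted above can be compared directly. This is just arithmetic on the numbers in this post, not a rerun of the benchmark, and the dictionary keys are labels I made up for the example.

```python
# Scores and download sizes quoted above (Aider code editing benchmark).
scores = {
    "fp16 (66GB)": 71.4,
    "q4_K_M (20GB)": 53.4,
}

baseline = scores["fp16 (66GB)"]
quantized = scores["q4_K_M (20GB)"]

absolute_drop = baseline - quantized            # percentage points
relative_drop = absolute_drop / baseline * 100  # percent of the baseline score
size_ratio = 20 / 66                            # q4_K_M download relative to fp16

print(f"absolute drop: {absolute_drop:.1f} points")
print(f"relative drop: {relative_drop:.1f}%")
print(f"download size: {size_ratio:.0%} of fp16")
```

That works out to an 18 point drop, about a quarter of the original score, in exchange for a download less than a third of the size.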
LLMs are bad at returning code in JSON (via) Paul Gauthier's Aider is a terminal-based coding assistant which works against multiple different models. As part of developing the project Paul runs extensive benchmarks, and his latest shows an interesting result: LLMs are slightly less reliable at producing working code if you request that code be returned as part of a JSON response.
The May release of GPT-4o is the closest to a perfect score - the August release appears to have regressed slightly, and the new structured output mode doesn't help and could even make things worse (though that difference may not be statistically significant).
Paul recommends using Markdown delimiters here instead, which are less likely to introduce confusing nested quoting issues.
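Here's a small sketch of the nested quoting problem. It is my own illustration, not Aider's benchmark code or its actual edit format, and the `greet` snippet is invented. It wraps the same piece of code once as a JSON string and once between Markdown fence delimiters.

```python
import json
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# An invented snippet containing quotes, a backslash escape and newlines.
code = 'def greet(name):\n    print(f"Hello, {name}!\\n")\n'

# Option 1: the code travels inside a JSON string, so every quote, backslash
# and newline has to be escaped correctly by whoever produces the response.
as_json = json.dumps({"code": code})

# Option 2: the code travels verbatim between Markdown fence delimiters.
as_markdown = f"{FENCE}python\n{code}{FENCE}\n"

# Both forms round-trip, provided the producer gets the escaping exactly right.
from_json = json.loads(as_json)["code"]
match = re.search(rf"{FENCE}python\n(.*?){FENCE}", as_markdown, re.DOTALL)
from_markdown = match.group(1)

print(as_json)
print(as_markdown)
print("backslashes in original code:", code.count("\\"))
print("backslashes in JSON form:", as_json.count("\\"))
```

In the JSON form the model has to emit an escape sequence for every quote, newline and backslash, and getting any one of them wrong corrupts the code. In the Markdown form the code is written out exactly as it should appear in the file.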
Aider. Aider is an impressive open source local coding chat assistant terminal application, developed by Paul Gauthier (founding CTO of Inktomi back in 1996-2000).
I tried it out today, using an Anthropic API key to run it using Claude 3.5 Sonnet:
pipx install aider-chat
export ANTHROPIC_API_KEY=api-key-here
aider --dark-mode
I found the --dark-mode flag necessary to make it legible using the macOS terminal "Pro" theme.
Aider starts by generating a concise map of files in your current Git repository. This is passed to the LLM along with the prompts that you type, and Aider can then request additional files be added to that context - or you can add them manually with the /add filename command.
It defaults to making modifications to files and then committing them directly to Git with a generated commit message. I found myself preferring the /ask command which lets you ask a question without making any file modifications.
The Aider documentation includes extensive examples and the tool can work with a wide range of different LLMs, though it recommends GPT-4o, Claude 3.5 Sonnet (or 3 Opus) and DeepSeek Coder V2 for the best results. Aider maintains its own leaderboard, emphasizing that "Aider works best with LLMs which are good at editing code, not just good at writing code".
The prompts it uses are pretty fascinating - they're tucked away in various *_prompts.py files in aider/coders.