23rd April 2024 - Link Blog
microsoft/Phi-3-mini-4k-instruct-gguf (via) Microsoft’s Phi-3 LLM is out and it’s really impressive. This 4,000 token context GGUF model is just a 2.2GB (for the Q4 version) and ran on my Mac using the llamafile option described in the README. I could then run prompts through it using the llm-llamafile plugin.
The vibes are good! Initial test prompts I’ve tried feel similar to much larger 7B models, despite using just a few GBs of RAM. Tokens are returned fast too—it feels like the fastest model I’ve tried yet.
And it’s MIT licensed.
Recent articles
- DeepSeek V4 - almost on the frontier, a fraction of the price - 24th April 2026
- Extract PDF text in your browser with LiteParse for the web - 23rd April 2026
- A pelican for GPT-5.5 via the semi-official Codex backdoor API - 23rd April 2026