MiniGPT-4 (via) An incredible project with a poorly chosen name. A team from King Abdullah University of Science and Technology in Saudi Arabia combined Vicuna-13B (a model fine-tuned on top of Facebook’s LLaMA) with the BLIP-2 vision-language model to create a model that can conduct ChatGPT-style conversations around an uploaded image. The demo is very impressive, and the weights are available to download—45MB for MiniGPT-4, but you’ll need the much larger Vicuna and LLaMA weights as well.
Recent articles
- Highlights from my appearance on the Data Renegades podcast with CL Kao and Dori Wilson - 26th November 2025
- Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult - 24th November 2025
- sqlite-utils 4.0a1 has several (minor) backwards incompatible changes - 24th November 2025