mlx-vlm (via) The MLX ecosystem of libraries for running machine learning models on Apple Silicon continues to expand. Prince Canuma is actively developing this library for running vision models such as Qwen-2 VL and Pixtral and LLaVA using Python running on a Mac.
I used uv to run it against this image with this shell one-liner:
uv run --with mlx-vlm \
python -m mlx_vlm.generate \
--model Qwen/Qwen2-VL-2B-Instruct \
--max-tokens 1000 \
--temp 0.0 \
--image https://static.simonwillison.net/static/2024/django-roadmap.png \
--prompt "Describe image in detail, include all text"
The --image
option works equally well with a URL or a path to a local file on disk.
This first downloaded 4.1GB to my ~/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct
folder and then output this result, which starts:
The image is a horizontal timeline chart that represents the release dates of various software versions. The timeline is divided into years from 2023 to 2029, with each year represented by a vertical line. The chart includes a legend at the bottom, which distinguishes between different types of software versions. [...]
Recent articles
- My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends - 10th January 2025
- Weeknotes: Starting 2025 a little slow - 4th January 2025
- I still don't think companies serve you ads based on spying through your microphone - 2nd January 2025