23rd October 2024
TIL
Running prompts against images, PDFs, audio and video with Google Gemini
— I'm still working towards adding multi-modal support to my [LLM](https://llm.datasette.io/) tool. In the meantime, here are notes on running prompts against images and PDFs and audio and video files from the command-line using the [Google Gemini](https://ai.google.dev/gemini-api) family of models.
Recent articles
- Thoughts on OpenAI acquiring Astral and uv/ruff/ty - 19th March 2026
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52 - 17th March 2026
- My fireside chat about agentic engineering at the Pragmatic Summit - 14th March 2026