23rd October 2024 - Link Blog
Running prompts against images and PDFs with Google Gemini. New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl and the base64 -i macOS command.
I figured out the curl incantation first and then got Claude to build me a Bash script that I can execute like this:
prompt-gemini 'extract text' example-handwriting.jpg

Playing with this is really fun. The Gemini models charge less than 1/10th of a cent per image, so it's really inexpensive to try them out.
Recent articles
- Notes on the xAI/Anthropic data center deal - 7th May 2026
- Live blog: Code w/ Claude 2026 - 6th May 2026
- Vibe coding and agentic engineering are getting closer than I'd like - 6th May 2026