The Super Effectiveness of Pokémon Embeddings Using Only Raw JSON and Images. A deep dive into embeddings from Max Woolf, exploring 1,000 different Pokémon (loaded from PokéAPI using this epic GraphQL query) and then embedding the cleaned-up JSON data using nomic-embed-text-v1.5 and the official Pokémon image representations using nomic-embed-vision-v1.5.
I hadn't seen nomic-embed-vision-v1.5 before: it brings multimodality to Nomic embeddings and operates in the same embedding space as nomic-embed-text-v1.5, which means you can use it to perform CLIP-style tricks comparing text and images. Here's their announcement from June 5th:
Together, Nomic Embed is the only unified embedding space that outperforms OpenAI CLIP and OpenAI Text Embedding 3 Small on multimodal and text tasks respectively.
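To make the "same embedding space" idea concrete, here is a minimal sketch of a CLIP-style comparison: score a text vector against several image vectors with cosine similarity and rank the images. The vectors are toy three-dimensional placeholders invented for illustration, not output from either Nomic model, and the file names and the `rank_images` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_vector, image_vectors):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(text_vector, vector))
        for name, vector in image_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumption: made-up placeholder vectors standing in for image embeddings.
# Real embeddings would come from the vision model and be much longer.
image_vectors = {
    "pikachu.png": [0.9, 0.1, 0.2],
    "squirtle.png": [0.1, 0.8, 0.3],
    "bulbasaur.png": [0.2, 0.3, 0.9],
}

# Assumption: a made-up placeholder standing in for a text query embedding.
text_vector = [0.8, 0.2, 0.1]

ranking = rank_images(text_vector, image_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

This only works because both kinds of vector live in one space. Comparing vectors from two unrelated text and image models this way would produce meaningless scores.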
Sadly the new vision weights are available under a non-commercial Creative Commons license (unlike the text weights which are Apache 2), so if you want to use the vision weights commercially you'll need to access them via Nomic's paid API.
Nomic do say this though:
As Nomic releases future models, we intend to re-license less recent models in our catalogue under the Apache-2.0 license.
Update 17th January 2025: Nomic Embed Vision 1.5 is now Apache 2.0 licensed.