ImageBind. New model release from Facebook/Meta AI research: “An approach to learn a joint embedding across six different modalities—images, text, audio, depth, thermal, and IMU (inertial measurement units) data”. The non-interactive demo shows searching audio starting with an image, searching images starting with audio, using text to retrieve images and audio, using image and audio to retrieve images (e.g. a barking sound and a photo of a beach to get dogs on a beach) and using audio as input to an image generator.
Recent articles
- LLM 0.22, the annotated release notes - 17th February 2025
- Run LLMs on macOS using llm-mlx and Apple's MLX framework - 15th February 2025
- URL-addressable Pyodide Python environments - 13th February 2025