13 items tagged “computer-vision”
2023
Trainbot (via) “Trainbot watches a piece of train track, detects passing trains, and stitches together images of them”—check out the site itself too, which shows beautifully stitched panoramas of trains that have recently passed near Jo M’s apartment. Found via the best Hacker News thread I’ve seen in years, “Ask HN: Most interesting tech you built for just yourself?”.
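Trainbot's own implementation isn't shown here, but the core idea—a fixed camera watching the track, frame differencing to decide when a train is passing, then stitching the captured slices into a panorama—can be sketched in a few lines of OpenCV. Everything below (thresholds, filenames, the camera index) is an illustrative assumption, not Trainbot's actual code.

```python
# Minimal frame-differencing sketch (not Trainbot's code) for detecting
# when something large moves through a fixed camera view.
import cv2

cap = cv2.VideoCapture(0)           # fixed camera pointed at the track
prev_gray = None
MOTION_THRESHOLD = 50_000           # tuning value, a pure guess
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    if prev_gray is not None:
        diff = cv2.absdiff(prev_gray, gray)
        moving = cv2.countNonZero(cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1])
        if moving > MOTION_THRESHOLD:
            # Something train-sized is passing: save the frame so the
            # slices can later be stitched into a panorama.
            cv2.imwrite(f"slice_{frame_idx:06d}.jpg", frame)
            frame_idx += 1
    prev_gray = gray
```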
LLaVA: Large Language and Vision Assistant (via) Yet another multi-modal model combining a vision model (pre-trained CLIP ViT-L/14) and a LLaMA derivative model (Vicuna). The results I get from their demo are even more impressive than MiniGPT-4. Also includes a new training dataset, LLaVA-Instruct-150K, derived from GPT-4 and subject to the same warnings about the OpenAI terms of service.
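LLaVA's key move is feeding patch features from that pre-trained CLIP ViT-L/14 encoder into the language model through a learned projection. As a hedged sketch of just the vision half, here is how those features can be extracted with the Hugging Face transformers library—this is the standard transformers API, not LLaVA's own code, and the file name is made up.

```python
# Sketch of the vision half of a LLaVA-style pipeline: extract patch
# features from the pre-trained CLIP ViT-L/14 encoder.
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
outputs = vision_tower(**inputs)
patch_features = outputs.last_hidden_state  # (1, 257, 1024): CLS token + 16x16 patches
# LLaVA projects these features into the LLM's embedding space and
# interleaves them with the text tokens of the prompt.
```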
MiniGPT-4 (via) An incredible project with a poorly chosen name. A team from King Abdullah University of Science and Technology in Saudi Arabia combined Vicuna-13B (a model fine-tuned on top of Facebook’s LLaMA) with the BLIP-2 vision-language model to create a model that can conduct ChatGPT-style conversations around an uploaded image. The demo is very impressive, and the weights are available to download—45MB for MiniGPT-4, but you’ll need the much larger Vicuna and LLaMA weights as well.
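The small downloadable checkpoint makes sense because the only newly trained part of MiniGPT-4 is a projection that maps the frozen BLIP-2 visual tokens into the frozen Vicuna embedding space. Here's a conceptual PyTorch sketch of that wiring—dimensions and names are illustrative assumptions, not the project's actual code.

```python
# Conceptual sketch (not MiniGPT-4's code): a single trainable linear
# layer bridges a frozen vision encoder and a frozen LLM.
import torch
import torch.nn as nn

class VisionToLLMBridge(nn.Module):
    def __init__(self, qformer_dim=768, llm_dim=5120):
        super().__init__()
        # The only trainable parameters: project visual query tokens
        # into the language model's token-embedding space.
        self.proj = nn.Linear(qformer_dim, llm_dim)

    def forward(self, qformer_tokens):       # (batch, 32, qformer_dim)
        return self.proj(qformer_tokens)     # (batch, 32, llm_dim)

bridge = VisionToLLMBridge()
visual_tokens = torch.randn(1, 32, 768)      # stand-in for frozen BLIP-2 output
llm_inputs = bridge(visual_tokens)           # prepended to the prompt's text embeddings
```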
2018
jantic/DeOldify (via) “A Deep Learning based project for colorizing and restoring old images”. Delightful (and well documented) project that uses a Self-Attention Generative Adversarial Network to colorize old black and white photos, with extremely impressive results. Built on an older version of the fastai library, and trained by running for several days on a 1080 Ti graphics card.
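For a sense of what using it looks like, here is a sketch based on the API shown in the project's current README (`get_image_colorizer` / `plot_transformed_image`); the 2018 fastai-0.7 version described above predates that interface, so treat the details as assumptions.

```python
# Sketch of colorizing a photo with DeOldify, assuming the API from the
# project's current README; the file name and settings are illustrative.
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)      # pick a GPU before importing the visualizer

from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=True)
colorizer.plot_transformed_image(
    "old_photo.jpg",
    render_factor=35,   # higher = more detail, more GPU memory
)
```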
Automatically playing science communication games with transfer learning and fastai
This weekend was the 9th annual Science Hack Day San Francisco, which was also the 100th Science Hack Day held worldwide.
[... 1,174 words]
BearID: Bear Face Detector. Comprehensive tutorial on building a computer vision system to identify faces of bears, using dlib and the Histogram of Oriented Gradients (HOG) technique. Bears!
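dlib exposes the same HOG + linear SVM approach directly in its Python API. A hedged sketch of training and running a simple detector that way—the file names and the C parameter here are illustrative, and the BearID tutorial's exact pipeline differs.

```python
# Sketch of training a HOG + linear SVM detector with dlib's Python API.
import dlib

options = dlib.simple_object_detector_training_options()
options.add_left_right_image_flips = True   # faces are roughly symmetric
options.C = 5                               # SVM regularization, needs tuning

# training.xml lists images and bounding boxes in dlib's imglab format
dlib.train_simple_object_detector("training.xml", "bear_face_detector.svm", options)

detector = dlib.simple_object_detector("bear_face_detector.svm")
img = dlib.load_rgb_image("bear.jpg")
for box in detector(img):
    print("bear face at", box.left(), box.top(), box.right(), box.bottom())
```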
Family fun with deepfakes. Or how I got my wife onto the Tonight Show. deepfakes is dystopian nightmare technology: take a few thousand photos of two different people with similarly shaped faces and you can produce an extremely realistic video where you swap one person’s face for the other. Unsurprisingly it’s being used for porn. This is a pleasantly SFW explanation of how it works, complete with a demo where Sven Charleer swaps his wife Elke for Anne Hathaway on the Tonight Show.
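The trick behind the swap is an autoencoder with one shared encoder and a separate decoder per person: each decoder learns to reconstruct its own person's face, then person A's crops are pushed through person B's decoder. A conceptual PyTorch sketch of that architecture—layer sizes are arbitrary, not the actual faceswap code.

```python
# Conceptual sketch of the deepfakes architecture: one shared encoder,
# one decoder per person. Sizes are arbitrary.
import torch
import torch.nn as nn

def make_decoder():
    return nn.Sequential(
        nn.Linear(256, 64 * 64 * 3),
        nn.Sigmoid(),
        nn.Unflatten(1, (3, 64, 64)),    # back to a 64x64 RGB face crop
    )

encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64 * 3, 256),         # compress a 64x64 face crop
    nn.ReLU(),
)
decoder_a = make_decoder()               # trained only on person A's faces
decoder_b = make_decoder()               # trained only on person B's faces

fake_a = torch.rand(1, 3, 64, 64)        # stand-in for a crop of person A
swapped = decoder_b(encoder(fake_a))     # person A's pose, person B's face
```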
6M observations total! Where has iNaturalist grown in 80 days with 1 million new observations? Citizen science app iNaturalist is seeing explosive growth at the moment—they’ve been around for nearly a decade but 1/6 of the observations posted to the site were added in just the past few months. Having tried the latest version of their iPhone app it’s easy to see why: snap a photo of some nature, upload it to the app, and it will use surprisingly effective machine learning to suggest the genus or even the individual species. Submit the observation and within a few minutes other iNaturalist community members will confirm the identification or suggest a correction. It’s brilliantly well executed and an utter delight to use.
2017
How to train your own Object Detector with TensorFlow’s Object Detector API (via) Dat Tran built a TensorFlow model that can detect raccoons! Impressive results, especially given it was only trained on 200 raccoon images from Google Image search.
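Dat Tran's 2017 post used the TF1-era frozen-graph workflow; for reference, this is roughly what inference looks like with a model exported by the TF2 version of the Object Detection API. The paths and the confidence threshold are assumptions.

```python
# Sketch of inference with a model exported by the TF2 Object Detection
# API exporter; the original post used the TF1 frozen-graph workflow.
import numpy as np
import tensorflow as tf
from PIL import Image

detect_fn = tf.saved_model.load("exported_model/saved_model")

image = np.array(Image.open("raccoon.jpg"))            # HxWx3 uint8
input_tensor = tf.convert_to_tensor(image)[tf.newaxis, ...]
detections = detect_fn(input_tensor)

boxes = detections["detection_boxes"][0].numpy()       # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0].numpy()
for box, score in zip(boxes, scores):
    if score > 0.5:
        print("raccoon?", box, score)
```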
2010
PythonInterface—OpenCV (via) OpenCV’s new Python interface looks very nice. I’d love to see some full-fledged examples of using it to solve real-world computer vision problems.
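In that spirit, a small real-world example—though with today's cv2 module rather than the 2010-era interface this post describes: detect faces in a photo with one of the bundled Haar cascades and draw boxes around them. The file names are placeholders.

```python
# Detect faces with a bundled Haar cascade and draw boxes around them.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("photo_with_faces.jpg", img)
```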
2009
Looking for tennis courts on aerial photos. ahathereitis.com shows a map of tennis courts in the Bay Area, identified using computer vision techniques (with OpenCV) applied to satellite photos.
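The site doesn't publish its pipeline, but a crude version of "find tennis-court-shaped things in an aerial tile" can be sketched with OpenCV colour thresholding and contour filtering. The colour range and the size/aspect thresholds below are guesses, not the approach ahathereitis.com actually used.

```python
# Crude sketch of spotting tennis-court-like shapes in an aerial tile.
import cv2

img = cv2.imread("aerial_tile.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Many courts are green or blue-green; this range is a guess.
mask = cv2.inRange(hsv, (35, 40, 40), (95, 255, 255))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    aspect = w / float(h) if h else 0
    # A doubles court is roughly 2.2:1; accept anything rectangle-ish.
    if cv2.contourArea(c) > 1500 and 1.5 < aspect < 3.0:
        print("candidate court at", (x, y, w, h))
```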
PhotoSketch turns a rough sketch into a photo montage (via) Computer vision is really exciting at the moment—PhotoSketch is an application which takes a rough labeled sketch, finds images matching the labels, filters them by the sketched shapes and composes them into a not-too-bad photo montage. As wmf on Hacker News points out, “this technology has epic potential in the LOLcat market”.
Building Rome in a Day (via) “The first system capable of city-scale reconstruction from unstructured photo collections”—computer vision techniques used to construct 3D models of cities using tens of thousands of photos from Flickr. Reminiscent of Microsoft Photosynth.
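The building block of this kind of reconstruction is pairwise feature matching between photos of the same landmark; everything else (distributed matching, bundle adjustment) is built on top of that. A hedged OpenCV sketch of that first step—the image names are placeholders and this is nothing like the paper's full system.

```python
# Sketch of the first step of a structure-from-motion pipeline: match
# SIFT features between two photos of the same landmark.
import cv2

img1 = cv2.imread("colosseum_1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("colosseum_2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test to keep only distinctive matches.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "putative correspondences between the two photos")
```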