From-Scratch Build · Computer Vision
Show the system a photo and it answers a deceptively hard question: where was this taken? Not from GPS — from the pixels alone, by matching the image against a database of places it has seen before. I rebuilt it from scratch because it's the relocalisation trick that lets a robot or a phone know where it is when the satellite signal is gone.
What it is
Visual place recognition turns "where am I?" into a search problem. Every known location is stored not as a coordinate but as a compact numerical fingerprint of what it looks like. When a new query image arrives, you fingerprint it the same way and find its nearest neighbours in the database — the closest matches are the most likely places.
What makes it genuinely hard is that the same place rarely looks the same twice: lighting shifts, seasons change, viewpoints differ, people and cars come and go. A good descriptor has to ignore all of that and lock onto the structure that actually identifies the location — the building, the skyline, the layout.
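The search framing above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's code: the function names `l2_normalise` and `nearest_places` are hypothetical, and the four-dimensional fingerprints are made-up numbers standing in for real descriptors.

```python
import numpy as np

def l2_normalise(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)

def nearest_places(query, database, k=3):
    """Return indices of the k database fingerprints most similar to the query."""
    db = l2_normalise(np.asarray(database, dtype=float))
    q = l2_normalise(np.asarray(query, dtype=float)[None, :])[0]
    similarity = db @ q                     # one cosine similarity per stored place
    return np.argsort(-similarity)[:k]      # best match first

# Toy fingerprints for three hypothetical places (values are invented).
database = np.array([
    [0.9, 0.1, 0.0, 0.2],   # place 0
    [0.1, 0.8, 0.3, 0.0],   # place 1
    [0.0, 0.2, 0.9, 0.4],   # place 2
])

# A query that resembles place 1 seen under slightly different conditions.
query = np.array([0.15, 0.75, 0.35, 0.05])

ranking = nearest_places(query, database, k=3)
print(ranking)  # [1 2 0]
```

Because every vector is normalised first, ranking by cosine similarity and ranking by L2 distance give the same order, which is why the two are often used interchangeably for this kind of retrieval.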
The stack
Two ideas do the heavy lifting: a classical descriptor, and an exact search over it. Built with OpenCV + NumPy + scikit-learn — no deep nets, no FAISS.
Global descriptor: an HSV colour histogram concatenated with a Histogram of Oriented Gradients, giving one L2-normalised vector per image that captures colour and coarse layout.
Bag of visual words (BoVW): ORB keypoints quantised against a KMeans vocabulary and encoded as a word-occurrence histogram, a structure-focused fingerprint.
Exact nearest-neighbour search: every database fingerprint is scored against the query by cosine or L2 distance (scikit-learn). Rankings are exact at this database size; FAISS would be the way to scale it.
Synthetic benchmark: distinct synthetic place scenes, split into database and query views by photometric and geometric augmentation, so the ground truth is known.
Recall@K: the honest score, measuring how often the true place lands in the top-K retrieved results, compared against a random-retrieval baseline.
Results: Recall@1 of 0.944 (colour+HOG) and 1.000 (BoVW) against a 0.086 random baseline on a 12-place benchmark.
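To make the Recall@K score and its random baseline concrete, here is a small sketch. The helper name `recall_at_k` and the example rankings are hypothetical, and the baseline line assumes one database view per place with uniformly random ranking, which the write-up does not state explicitly.

```python
import numpy as np

def recall_at_k(ranked_labels, true_labels, k):
    """Fraction of queries whose true place appears in the first k retrieved labels."""
    ranked_labels = np.asarray(ranked_labels)
    true_labels = np.asarray(true_labels)
    hits = (ranked_labels[:, :k] == true_labels[:, None]).any(axis=1)
    return float(hits.mean())

# Invented top-3 rankings for four queries against a 12-place database.
ranked = np.array([
    [3, 7, 1],
    [5, 0, 2],
    [9, 4, 8],
    [2, 6, 11],
])
truth = np.array([3, 0, 8, 10])   # the place each query really shows

r1 = recall_at_k(ranked, truth, k=1)   # 0.25: only the first query is right at rank 1
r3 = recall_at_k(ranked, truth, k=3)   # 0.75: three of four queries hit within the top 3

# Assumed analytic baseline: a uniformly random ranking over n_places
# (one database entry per place) hits the true place at rank 1 with
# probability 1 / n_places.
n_places = 12
chance_r1 = 1 / n_places               # about 0.083

print(r1, r3, round(chance_r1, 3))
```

Under that assumption the chance level for 12 places is about 0.083, close to the 0.086 reported above. The small gap would be consistent with a baseline estimated from random draws rather than computed analytically, though the source does not say which was used.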
Architecture
Every query runs the same describe-then-retrieve pipeline:
1. Fingerprint every reference image once with the colour+HOG (or BoVW) descriptor and store the vectors in an exact nearest-neighbour index.
2. Run the new photo through the same descriptor to get its global vector.
3. Score the query against every database vector by cosine or L2 distance and rank the closest matches as the candidate places.
4. Return the top match as the recognised place; report Recall@K over all queries.
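The four steps above can be sketched end to end. This is a simplified stand-in, not the original implementation: to stay NumPy-only it uses a per-channel RGB histogram instead of HSV and a single global gradient-orientation histogram instead of a full cell-based HOG. The names `describe`, `make_place`, `augment`, `PlaceIndex`, and the four place names are all hypothetical, and the augmentation is deliberately mild.

```python
import numpy as np

def describe(image, colour_bins=8, orient_bins=9):
    """Simplified global descriptor: colour histogram + orientation histogram."""
    image = np.asarray(image, dtype=float)
    colour = np.concatenate([
        np.histogram(image[..., c], bins=colour_bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    grey = image.mean(axis=2)
    gy, gx = np.gradient(grey)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % np.pi          # unsigned angle in [0, pi]
    orient = np.histogram(orientation, bins=orient_bins,
                          range=(0, np.pi), weights=magnitude)[0]
    # Normalise each part so neither dominates, then normalise the whole vector.
    parts = [p / max(np.linalg.norm(p), 1e-12) for p in (colour, orient)]
    vec = np.concatenate(parts)
    return vec / max(np.linalg.norm(vec), 1e-12)

def make_place(kind, colour, size=64, block=8):
    """Synthetic scene: a striped or checkered pattern in one colour on black."""
    y, x = np.mgrid[0:size, 0:size]
    if kind == "vertical":
        mask = (x // block) % 2 == 0
    elif kind == "horizontal":
        mask = (y // block) % 2 == 0
    elif kind == "diagonal":
        mask = ((x + y) // block) % 2 == 0
    else:  # "checker"
        mask = ((x // block) + (y // block)) % 2 == 0
    image = np.zeros((size, size, 3))
    image[mask] = colour
    return image

def augment(image, gain=0.9, shift=(3, 5)):
    """Query view: darker (photometric) and shifted with wrap-around (geometric)."""
    return np.roll(image * gain, shift, axis=(0, 1))

class PlaceIndex:
    """Exact nearest-neighbour index: brute-force cosine over unit vectors."""
    def __init__(self):
        self.vectors, self.labels = [], []

    def add(self, image, label):
        self.vectors.append(describe(image))
        self.labels.append(label)

    def query(self, image, k=1):
        sims = np.stack(self.vectors) @ describe(image)
        order = np.argsort(-sims)[:k]
        return [(self.labels[i], float(sims[i])) for i in order]

places = {
    "station": ("vertical", (220, 50, 50)),
    "market": ("horizontal", (50, 220, 50)),
    "harbour": ("diagonal", (50, 50, 220)),
    "plaza": ("checker", (220, 220, 50)),
}

# Step 1: fingerprint every reference image once.
index = PlaceIndex()
for name, (kind, colour) in places.items():
    index.add(make_place(kind, colour), name)

# Steps 2 to 4: describe each query view, rank, take the top match, score Recall@1.
hits = 0
for name, (kind, colour) in places.items():
    top_label, _ = index.query(augment(make_place(kind, colour)), k=1)[0]
    hits += int(top_label == name)
recall_at_1 = hits / len(places)
print(recall_at_1)
```

On these four clean scenes every query retrieves its own place, so the printed Recall@1 is 1.0. That says little about real imagery; the point is only to show the describe-then-retrieve shape, with the descriptor as the single component to swap for HSV+HOG or BoVW.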
Reflection