BCSAI · AI: Computer Vision · Final Project
A fast image-retrieval system for place recognition around the IE Tower. Given a query photo, it returns the top-K most similar gallery images and predicts the query's location. Three retrieval tracks (classical, deep, and CNN) share one FAISS-based indexing pipeline and one evaluation harness.
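How the location is derived from the retrieved images isn't specified here. A common choice, assumed in this sketch, is a majority vote over the location labels of the top-K neighbors, with ties going to the best-ranked label. `predict_location` and the example labels are hypothetical.

```python
from collections import Counter


# Hypothetical helper (not from the project): majority vote over top-K labels.
def predict_location(neighbor_labels):
    """Predict a location from the labels of the top-K retrieved gallery images.

    `neighbor_labels` is ordered best match first. Ties are broken in favour
    of the tied label that appears earliest in the ranking.
    """
    if not neighbor_labels:
        raise ValueError("need at least one retrieved neighbor")
    counts = Counter(neighbor_labels)
    best_count = max(counts.values())
    # Walk the ranking in order so the top-ranked tied label wins.
    for label in neighbor_labels:
        if counts[label] == best_count:
            return label


# Made-up top-5 labels for one query, best match first.
print(predict_location(["lobby", "terrace", "terrace", "lobby", "atrium"]))  # → lobby
```

Breaking ties by rank rather than arbitrarily keeps the prediction deterministic and favours the single closest match when the vote is split.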
01 — Three retrieval tracks
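Every track produces L2-normalized vectors that plug into the same FAISS index setup. On unit-length vectors, nearest-neighbor search ranks gallery images by cosine similarity whichever track produced them. With one shared data loader and evaluation harness as well, the three tracks are compared on equal footing.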
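Here is a minimal sketch of why normalization puts the tracks on equal footing. Once vectors have unit length, an exact inner-product search ranks the gallery by cosine similarity. It is written in plain Python for clarity. In the project FAISS performs the search; `faiss.normalize_L2` followed by an exact `faiss.IndexFlatIP` would be one such setup, but the index type actually used is an assumption. `l2_normalize` and `search_top_k` are hypothetical names.

```python
import math


# Hypothetical helpers (not from the project): exact cosine top-K search.
def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search_top_k(query, gallery, k):
    """Exact inner-product search over unit vectors, best match first.

    On unit vectors the inner product equals cosine similarity, which is what
    an exact FAISS inner-product index would rank by.
    Returns (score, gallery_index) pairs.
    """
    q = l2_normalize(query)
    scored = []
    for idx, g in enumerate(gallery):
        g = l2_normalize(g)
        scored.append((sum(a * b for a, b in zip(q, g)), idx))
    # Highest score first; equal scores fall back to gallery order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


gallery = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print([idx for _, idx in search_top_k([1.0, 0.2], gallery, k=2)])  # → [0, 2]
```

Because vector magnitude is divided out, a track whose raw descriptors happen to have large norms gains no advantage in the ranking.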
Classical: local hand-crafted features, aggregated into a single VLAD descriptor per image.
Deep: self-supervised global embeddings (with a ResNet50 fallback) that perform well out of the box, without fine-tuning.
CNN: a small CNN trained on the gallery labels, then reused as an embedding extractor for retrieval.
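For the classical track, here is a minimal VLAD sketch. It assumes the local descriptors (SIFT, for example) are already extracted and that a K-word codebook was learned beforehand with k-means; neither detail is specified here. The signed-square-root step is an optional refinement that may not match the project. `vlad` is a hypothetical helper.

```python
import math


# Hypothetical helper (not from the project): VLAD aggregation.
def vlad(descriptors, centroids, power_norm=True):
    """Aggregate one image's local descriptors into a single VLAD vector.

    descriptors: list of D-dim local feature vectors for one image.
    centroids:   list of K D-dim visual words (e.g. from k-means).
    Returns a flat K*D vector with unit L2 norm.
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2).
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual between the descriptor and its centroid.
        for i in range(d):
            residuals[nearest][i] += x[i] - centroids[nearest][i]
    flat = [v for row in residuals for v in row]
    if power_norm:
        # Signed square root damps bursty, repeated features (optional step).
        flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


descs = [[1.0, 0.0], [0.0, 3.0], [11.0, 10.0]]
words = [[0.0, 0.0], [10.0, 10.0]]
print([round(v, 3) for v in vlad(descs, words, power_norm=False)])  # → [0.302, 0.905, 0.302, 0.0]
```

The output is one unit-norm vector of length K·D per image, so it drops into the same nearest-neighbor search as the deep and CNN embeddings.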
02 — Pipeline
Reproducible end to end: prepare_data → build_index (once per method) → run_eval on a held-out test set → a Streamlit demo UI. Results are reported with regression, classification, and ranking metrics.
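Two of these metric families can be sketched directly. Recall@K is a standard ranking metric for place recognition, and top-1 location accuracy is a classification metric. Which exact metrics run_eval reports is an assumption. The regression metrics are left out because what is regressed isn't stated. The function names and data below are hypothetical.

```python
# Hypothetical helpers (not from the project's run_eval).


def recall_at_k(retrieved_labels, true_labels, k):
    """Fraction of queries whose true location appears among the top-k results.

    retrieved_labels: for each query, gallery location labels ranked best first.
    true_labels:      the ground-truth location of each query, in the same order.
    """
    hits = sum(truth in ranked[:k] for ranked, truth in zip(retrieved_labels, true_labels))
    return hits / len(true_labels)


def accuracy(predicted_labels, true_labels):
    """Top-1 location accuracy: share of queries whose predicted location is correct."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Made-up example: three queries with their top-3 retrieved labels.
retrieved = [["a", "b", "a"], ["b", "b", "c"], ["c", "a", "b"]]
truth = ["a", "c", "b"]
print(round(recall_at_k(retrieved, truth, k=1), 3))  # → 0.333
print(recall_at_k(retrieved, truth, k=3))  # → 1.0
print(round(accuracy(["a", "b", "b"], truth), 3))  # → 0.667
```

Reporting Recall@K at several K values shows whether a track misses a place entirely or just ranks the right image slightly too low.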
03 — Stack