CS Vision — Interactive Computer Vision

Computer Vision, made interactive.

Every core concept from the course — pinhole cameras to generative models — as a live, in-browser demo. Upload your own image or use the built-in samples, then play with parameters and see the math.

Start with Image Formation → Load an image

📷 Global image input

Drop or pick an image — every module below will use it. Or pick a built-in sample.

Dimensions: —

Mean intensity: —

Channels: —

1 · Image Formation — the pinhole camera

A point P=(X,Y,Z) in camera coordinates projects to image coordinates x = f·X/Z, y = f·Y/Z. Drag the sliders to change the focal length and the cube's distance, and watch the perspective change.

Focal length f: Cube Z (depth): Cube rotation:

x = f·X/Z, y = f·Y/Z
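As a rough illustration of the projection formula, here is a minimal sketch in plain JavaScript. The function name `projectPinhole` is made up for this example and is not taken from the demo source; it assumes the point is already expressed in camera coordinates with Z > 0.

```javascript
// Project a 3D camera-space point onto the image plane of an ideal pinhole camera.
// `projectPinhole` is an illustrative name, not a function from js/modules/.
function projectPinhole([X, Y, Z], f) {
  if (Z <= 0) {
    throw new RangeError("point must lie in front of the camera (Z > 0)");
  }
  return [(f * X) / Z, (f * Y) / Z];
}

// The same point at two depths: doubling Z halves the image coordinates.
const near = projectPinhole([1, 2, 4], 2); // [0.5, 1]
const far = projectPinhole([1, 2, 8], 2);  // [0.25, 0.5]
```

Doubling f has the opposite effect and doubles both image coordinates, which is why a longer focal length looks like zooming in.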

2 · Camera Model — intrinsics & extrinsics

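In homogeneous coordinates the full projection is x ≃ K · [R | t] · X, where ≃ means equal up to scale: divide by the third coordinate to get pixel positions. Adjust the intrinsic matrix K (focal lengths, principal point, skew) and the extrinsic rotation and translation, and watch the projection of a 3D scene change.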

fx: fy: cx (px): cy (px): Skew s: Rotation Y°: Translation Z:
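The projection chain can be sketched in a few lines. This is an illustrative version under simple assumptions (rotation about the Y axis only, as in the slider above); the helper names `intrinsics`, `rotationY`, `matVec`, and `projectPoint` are hypothetical and not taken from the demo source.

```javascript
// Build the intrinsic matrix K from focal lengths, principal point and skew.
function intrinsics(fx, fy, cx, cy, s = 0) {
  return [
    [fx, s, cx],
    [0, fy, cy],
    [0, 0, 1],
  ];
}

// Rotation about the Y axis by `deg` degrees.
function rotationY(deg) {
  const a = (deg * Math.PI) / 180;
  const c = Math.cos(a);
  const sn = Math.sin(a);
  return [
    [c, 0, sn],
    [0, 1, 0],
    [-sn, 0, c],
  ];
}

// 3x3 matrix times 3-vector.
function matVec(M, v) {
  return M.map(row => row.reduce((acc, m, i) => acc + m * v[i], 0));
}

// World point -> pixel: x ~ K (R X + t), then divide by the third coordinate.
function projectPoint(K, R, t, Xw) {
  const Xc = matVec(R, Xw).map((v, i) => v + t[i]);
  const [u, v, w] = matVec(K, Xc);
  return [u / w, v / w];
}

const K = intrinsics(500, 500, 320, 240);
const pixel = projectPoint(K, rotationY(0), [0, 0, 5], [1, -1, 0]);
// pixel is [420, 140]: 100 px right of and 100 px above the principal point.
```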

3 · Image Sensing — sampling & quantization

Real sensors sample a continuous signal at finite resolution and quantize intensities to a finite number of levels. Reduce the sampling rate or bit depth to see aliasing and posterization appear.

Spatial sampling: Quantization bits: Show Bayer pattern

Sampling step: 1 px · Levels: 256
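To make the two effects concrete, here is a small sketch of uniform quantization and naive subsampling. It assumes 8-bit input intensities (0 to 255); `quantize` and `subsample` are names chosen for this example and may not match the demo's implementation.

```javascript
// Uniformly quantize an 8-bit intensity (0..255) to 2^bits levels,
// then map the bin index back onto 0..255 for display.
function quantize(value, bits) {
  const levels = 2 ** bits;
  const binWidth = 256 / levels;
  const bin = Math.min(levels - 1, Math.floor(value / binWidth));
  return levels === 1 ? 0 : Math.round((bin * 255) / (levels - 1));
}

// Keep every `step`-th sample of a row (no low-pass pre-filter, so aliasing is possible).
function subsample(row, step) {
  return row.filter((_, i) => i % step === 0);
}

const twoLevels = [100, 200].map(v => quantize(v, 1)); // [0, 255]
const fourLevels = quantize(130, 2);                   // 170
const stripes = subsample([0, 255, 0, 255, 0, 255], 2); // [0, 0, 0]
```

The last line shows aliasing in its simplest form: a one-pixel stripe pattern sampled at every second pixel turns into a flat black row.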

4 · Geometric Transformations

Each pixel (x,y) maps to (x',y') = T(x,y). Translation, rotation, scaling, shear, and affine/projective transforms in one place.

Translate X: Translate Y: Rotate °: Scale: Shear X: Shear Y: Perspective: Interpolation:
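All of these transforms can be written as 3×3 matrices acting on homogeneous pixel coordinates. The sketch below is one way to do that; the helper names (`translation`, `rotation`, `compose`, `applyTransform`) are invented for this example, and interpolation of pixel values is left out.

```javascript
const translation = (tx, ty) => [
  [1, 0, tx],
  [0, 1, ty],
  [0, 0, 1],
];

function rotation(deg) {
  const a = (deg * Math.PI) / 180;
  const c = Math.cos(a);
  const s = Math.sin(a);
  return [
    [c, -s, 0],
    [s, c, 0],
    [0, 0, 1],
  ];
}

// Matrix product A·B: the combined transform applies B first, then A.
function compose(A, B) {
  return A.map(row =>
    B[0].map((_, j) => row.reduce((acc, a, k) => acc + a * B[k][j], 0))
  );
}

// Map (x, y) through T. The division by w only matters for projective
// transforms; for affine ones the last row is [0, 0, 1] and w is 1.
function applyTransform(T, [x, y]) {
  const xh = T[0][0] * x + T[0][1] * y + T[0][2];
  const yh = T[1][0] * x + T[1][1] * y + T[1][2];
  const w = T[2][0] * x + T[2][1] * y + T[2][2];
  return [xh / w, yh / w];
}

// Rotate by 90 degrees, then translate by (2, 3).
const T = compose(translation(2, 3), rotation(90));
const mapped = applyTransform(T, [1, 0]); // approximately [2, 4]
```

When warping an image, a common choice is to loop over output pixels and apply the inverse of T, so that every output pixel gets a value and the interpolation setting decides how to read between input pixels.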

5 · Intensity Transformations

Point-wise mappings s = T(r): negative, log, gamma, contrast stretching, thresholding, histogram equalization. Watch the histogram update live.

Operation: γ / threshold: Stretch low: Stretch high:
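Two of these mappings, sketched for 8-bit grayscale values. The function names are illustrative, and the equalization uses the common CDF-based formula, which may differ in detail from what the demo does.

```javascript
// Gamma mapping on an 8-bit value: s = 255 * (r / 255)^g.
function gammaMap(r, g) {
  return Math.round(255 * Math.pow(r / 255, g));
}

// Negative: s = 255 - r.
const negative = r => 255 - r;

// Histogram equalization for an array of 8-bit pixels:
// each value is replaced by its rescaled cumulative frequency.
function equalize(pixels) {
  const hist = new Array(256).fill(0);
  for (const p of pixels) hist[p]++;

  const cdf = [];
  let acc = 0;
  for (let i = 0; i < 256; i++) {
    acc += hist[i];
    cdf.push(acc);
  }

  const n = pixels.length;
  const cdfMin = cdf.find(c => c > 0);
  if (n === cdfMin) return pixels.slice(); // constant image: nothing to stretch
  return pixels.map(p => Math.round(((cdf[p] - cdfMin) / (n - cdfMin)) * 255));
}

const stretched = equalize([50, 50, 100, 150]); // [0, 0, 128, 255]
```

Unlike gamma or the negative, the equalization curve is computed from the image's own histogram, so the same input value can map to different outputs in different images.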

6 · Local Transformations — Convolution playground

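2D convolution: (I*K)(x,y) = Σ_i Σ_j I(x-i,y-j)·K(i,j). Choose a kernel or edit its values directly. Try Gaussian blur, Sobel edges, and Laplacian sharpening. Median and bilateral filters are included for comparison: they also work on a local neighbourhood, but they are nonlinear and cannot be written as a single convolution kernel.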

Kernel:

Apply count:
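A direct, unoptimized version of the convolution sum might look like the following. It assumes a grayscale image stored as an array of rows, an odd-sized kernel, and zero padding at the borders; the name `convolve2d` is chosen for this sketch.

```javascript
// out(x, y) = sum over i, j of I(x - i, y - j) * K(i, j),
// with the kernel indexed from its centre and zeros outside the image.
function convolve2d(image, kernel) {
  const h = image.length;
  const w = image[0].length;
  const ry = Math.floor(kernel.length / 2);
  const rx = Math.floor(kernel[0].length / 2);
  const out = [];
  for (let y = 0; y < h; y++) {
    const row = [];
    for (let x = 0; x < w; x++) {
      let sum = 0;
      for (let j = -ry; j <= ry; j++) {
        for (let i = -rx; i <= rx; i++) {
          const yy = y - j; // note the minus signs: this is convolution,
          const xx = x - i; // not correlation
          if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
            sum += image[yy][xx] * kernel[j + ry][i + rx];
          }
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// Convolving a unit impulse with a kernel reproduces the kernel.
const impulse = [
  [0, 0, 0],
  [0, 1, 0],
  [0, 0, 0],
];
const kernel = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
];
const result = convolve2d(impulse, kernel); // equals `kernel`
```

With correlation (plus signs instead of minus) the impulse test would return the kernel rotated by 180 degrees, which is an easy way to check which of the two an implementation computes.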

7 · Fourier Transform

The 2D DFT decomposes the image into spatial frequencies. Drag the slider to keep only low or high frequencies. Keeping the low frequencies blurs the image, and keeping the high frequencies isolates edges and fine detail, which is the part that sharpening boosts.

Filter: Cutoff radius: Band width:

Left: input · Middle: log magnitude spectrum · Right: inverse FFT after filtering
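The same idea in one dimension, as a deliberately naive O(N²) sketch: transform, zero out everything above a cutoff, transform back. The function names are hypothetical; a 2D DFT can be obtained by applying a 1D transform to every row and then to every column, and real implementations would use an FFT.

```javascript
// Naive discrete Fourier transform of a real signal.
function dft(signal) {
  const N = signal.length;
  return Array.from({ length: N }, (_, k) => {
    let re = 0;
    let im = 0;
    for (let n = 0; n < N; n++) {
      const a = (-2 * Math.PI * k * n) / N;
      re += signal[n] * Math.cos(a);
      im += signal[n] * Math.sin(a);
    }
    return { re, im };
  });
}

// Inverse transform, returning only the real part.
function idft(spectrum) {
  const N = spectrum.length;
  return Array.from({ length: N }, (_, n) => {
    let re = 0;
    for (let k = 0; k < N; k++) {
      const a = (2 * Math.PI * k * n) / N;
      re += spectrum[k].re * Math.cos(a) - spectrum[k].im * Math.sin(a);
    }
    return re / N;
  });
}

// Ideal low-pass: keep bins whose frequency index is at most `cutoff`.
// Bin k and bin N - k represent the same frequency magnitude.
function lowPass(spectrum, cutoff) {
  const N = spectrum.length;
  return spectrum.map((c, k) =>
    Math.min(k, N - k) <= cutoff ? c : { re: 0, im: 0 }
  );
}

// Keeping only the DC term of an alternating signal leaves its mean.
const smoothed = idft(lowPass(dft([0, 2, 0, 2]), 0)); // approximately [1, 1, 1, 1]
```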

8 · Feature Detectors — edges, corners, blobs

Canny edges, Harris corners, and Difference-of-Gaussians blobs. Each builds on convolution and gradient operators from the previous sections.

Detector: Threshold: Sigma: Non-max suppression:

Detected: 0
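For the Harris detector, the per-pixel decision comes down to one formula on the structure tensor. The sketch below assumes the smoothed gradient products have already been computed; `harrisResponse` is an illustrative name, and k = 0.04 is a commonly used value rather than something stated on this page.

```javascript
// Harris response for one pixel, from the entries of its structure tensor
// M = [[Sxx, Sxy], [Sxy, Syy]], where Sxx, Syy, Sxy are locally smoothed
// products of the image gradients Ix*Ix, Iy*Iy, Ix*Iy.
// R = det(M) - k * trace(M)^2
function harrisResponse(Sxx, Syy, Sxy, k = 0.04) {
  const det = Sxx * Syy - Sxy * Sxy;
  const trace = Sxx + Syy;
  return det - k * trace * trace;
}

const corner = harrisResponse(10, 10, 0); // strong gradients in both directions: positive
const edge = harrisResponse(10, 0, 0);    // gradient in one direction only: negative
const flat = harrisResponse(0, 0, 0);     // no gradient: zero
```

Thresholding R and keeping only local maxima (non-max suppression) turns this response map into a list of corner locations.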

9 · Binary Object Characterization

Threshold the image, clean it with morphology (erode/dilate/open/close), label connected components, and compute area, perimeter, centroid, and bounding box for each one.

Threshold: Operation: SE size: Invert Color labels
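The labelling and measurement step can be sketched with a flood fill. This version assumes a binary mask stored as rows of 0/1 and 4-connectivity, and it reports area, centroid, and bounding box only (perimeter is left out for brevity); `labelComponents` is a name made up for this example.

```javascript
// Label 4-connected foreground regions and collect simple statistics.
function labelComponents(mask) {
  const h = mask.length;
  const w = mask[0].length;
  const labels = mask.map(row => row.map(() => 0)); // 0 = background / unvisited
  const stats = [];

  for (let y0 = 0; y0 < h; y0++) {
    for (let x0 = 0; x0 < w; x0++) {
      if (!mask[y0][x0] || labels[y0][x0]) continue;

      const id = stats.length + 1;
      let area = 0, sumX = 0, sumY = 0;
      let minX = x0, maxX = x0, minY = y0, maxY = y0;
      const stack = [[x0, y0]];
      labels[y0][x0] = id;

      while (stack.length > 0) {
        const [x, y] = stack.pop();
        area++;
        sumX += x;
        sumY += y;
        minX = Math.min(minX, x); maxX = Math.max(maxX, x);
        minY = Math.min(minY, y); maxY = Math.max(maxY, y);
        for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
          const nx = x + dx;
          const ny = y + dy;
          if (nx >= 0 && nx < w && ny >= 0 && ny < h &&
              mask[ny][nx] && !labels[ny][nx]) {
            labels[ny][nx] = id; // mark when pushed so no pixel is queued twice
            stack.push([nx, ny]);
          }
        }
      }
      stats.push({
        id,
        area,
        centroid: [sumX / area, sumY / area],
        bbox: [minX, minY, maxX, maxY],
      });
    }
  }
  return { labels, stats };
}

const { labels, stats } = labelComponents([
  [1, 1, 0, 0],
  [1, 1, 0, 1],
  [0, 0, 0, 1],
]);
// stats[0]: area 4, centroid [0.5, 0.5], bbox [0, 0, 1, 1]
// stats[1]: area 2, centroid [3, 1.5],   bbox [3, 1, 3, 2]
```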

10 · Neural network playground (PyTorch-style MLP)

Click to add positive (left button) or negative (right button) points. Train a small MLP live and watch the decision boundary emerge — the foundation of everything that follows.

Hidden units: Learning rate: Activation: Dataset:

Epoch: 0 · Loss: —
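What "training live" means can be sketched as a forward pass plus one gradient step. The example below assumes a network with 2 inputs, one tanh hidden layer, a sigmoid output, and binary cross-entropy loss; the structure of the `net` object and all function names are invented for this sketch and are not the demo's actual code.

```javascript
const sigmoid = z => 1 / (1 + Math.exp(-z));

// Forward pass: 2 inputs -> H tanh hidden units -> 1 sigmoid output.
function forward(net, x) {
  const h = net.W1.map((w, j) =>
    Math.tanh(w[0] * x[0] + w[1] * x[1] + net.b1[j])
  );
  const p = sigmoid(h.reduce((acc, hj, j) => acc + net.W2[j] * hj, net.b2));
  return { h, p };
}

// Binary cross-entropy for a label y in {0, 1}.
function loss(net, x, y) {
  const { p } = forward(net, x);
  return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
}

// One stochastic gradient descent step on a single example (backpropagation).
function sgdStep(net, x, y, lr) {
  const { h, p } = forward(net, x);
  const dz2 = p - y; // gradient w.r.t. the output pre-activation
  net.W2.forEach((w2, j) => {
    // Gradient w.r.t. hidden pre-activation j, using the weight before its update.
    const dz1 = dz2 * w2 * (1 - h[j] * h[j]);
    net.W2[j] -= lr * dz2 * h[j];
    net.W1[j][0] -= lr * dz1 * x[0];
    net.W1[j][1] -= lr * dz1 * x[1];
    net.b1[j] -= lr * dz1;
  });
  net.b2 -= lr * dz2;
}

// Hand-picked starting weights (arbitrary values, for a reproducible example).
const net = {
  W1: [[0.5, -0.3], [0.2, 0.8]],
  b1: [0, 0],
  W2: [0.4, -0.6],
  b2: 0,
};
const before = loss(net, [1, 1], 1);
sgdStep(net, [1, 1], 1, 0.1);
const after = loss(net, [1, 1], 1); // smaller than `before`
```

Repeating `sgdStep` over all clicked points for many epochs is what moves the decision boundary; the hidden-unit count and learning-rate sliders correspond to the length of `W1` and the `lr` argument here.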

11 · CNNs — feature maps live

A CNN is just stacked convolutions, non-linearities, and pooling. Run a small hand-crafted CNN on your image and inspect every activation map.

Conv1 kernel: Pooling: Activation: Depth (layers):
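Two of the three building blocks are short enough to show in full. The sketch assumes a feature map stored as an array of rows, ReLU as the non-linearity, and 2×2 max pooling with stride 2; both function names are illustrative.

```javascript
// ReLU: clamp negative activations to zero.
const relu = map => map.map(row => row.map(v => Math.max(0, v)));

// 2x2 max pooling with stride 2 (a trailing odd row or column is dropped).
function maxPool2x2(map) {
  const out = [];
  for (let y = 0; y + 1 < map.length; y += 2) {
    const row = [];
    for (let x = 0; x + 1 < map[0].length; x += 2) {
      row.push(
        Math.max(map[y][x], map[y][x + 1], map[y + 1][x], map[y + 1][x + 1])
      );
    }
    out.push(row);
  }
  return out;
}

const featureMap = [
  [-1, 2, 3, -4],
  [5, -6, 7, 8],
  [-9, -10, 11, 12],
  [13, 14, -15, -16],
];
const pooled = maxPool2x2(relu(featureMap)); // [[5, 8], [14, 12]]
```

One layer of the hand-crafted CNN is then a convolution followed by these two steps, and the depth slider repeats that sequence.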

12 · Advanced Architectures

Hover over any architecture to see how data flows through it. Each is rendered as a live SVG diagram with parameters you can scrub.

Architecture: Input size:

13 · Segmentation

From simple thresholding to region growing to k-means colour segmentation — the same problem at different levels of sophistication.

Method: k (clusters): Tolerance:
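For the k-means option, a minimal sketch of the standard assign-and-update loop (Lloyd's algorithm) on colour triples is shown below. Initial centres are passed in explicitly to keep the example deterministic; how the demo initialises them is not stated here, and `kmeans` is a hypothetical name.

```javascript
// Cluster points (e.g. [r, g, b] pixels) around k centres.
function kmeans(points, initialCenters, iterations = 10) {
  let centers = initialCenters.map(c => c.slice());
  let assign = [];
  for (let it = 0; it < iterations; it++) {
    // Assignment step: each point goes to its nearest centre (squared distance).
    assign = points.map(p => {
      let best = 0;
      let bestDist = Infinity;
      centers.forEach((c, k) => {
        const d = p.reduce((acc, v, i) => acc + (v - c[i]) ** 2, 0);
        if (d < bestDist) {
          bestDist = d;
          best = k;
        }
      });
      return best;
    });
    // Update step: each centre moves to the mean of its members.
    centers = centers.map((c, k) => {
      const members = points.filter((_, i) => assign[i] === k);
      if (members.length === 0) return c; // keep an empty cluster where it is
      return c.map((_, d) =>
        members.reduce((acc, m) => acc + m[d], 0) / members.length
      );
    });
  }
  return { centers, assign };
}

const pixels = [
  [250, 10, 10],
  [240, 20, 0],
  [10, 10, 250],
  [0, 20, 240],
];
const seg = kmeans(pixels, [[255, 0, 0], [0, 0, 255]]);
// seg.assign  -> [0, 0, 1, 1]
// seg.centers -> [[245, 15, 5], [5, 15, 245]]
```

Painting every pixel with the colour of its assigned centre gives the segmented image.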

14 · Object Detection

Sliding window + simple template matching. Pick a template (drag a rectangle on the image), then scan the image and visualise the response map and detected boxes.

Threshold: Mode:

Detections: 0
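A sliding-window matcher can be sketched as below. This version scores each window with the sum of squared differences (SSD), where lower is better; the demo's Mode control may offer other scores, and `matchTemplate` here is a made-up name, not the OpenCV function.

```javascript
// Slide `template` over `image` and record the SSD at every valid position.
function matchTemplate(image, template) {
  const H = image.length;
  const W = image[0].length;
  const h = template.length;
  const w = template[0].length;
  const response = [];
  let best = { x: 0, y: 0, ssd: Infinity };

  for (let y = 0; y <= H - h; y++) {
    const row = [];
    for (let x = 0; x <= W - w; x++) {
      let ssd = 0;
      for (let j = 0; j < h; j++) {
        for (let i = 0; i < w; i++) {
          const d = image[y + j][x + i] - template[j][i];
          ssd += d * d;
        }
      }
      row.push(ssd);
      if (ssd < best.ssd) best = { x, y, ssd };
    }
    response.push(row);
  }
  return { response, best };
}

const scene = [
  [0, 0, 0, 0],
  [0, 0, 9, 8],
  [0, 0, 7, 6],
  [0, 0, 0, 0],
];
const patch = [
  [9, 8],
  [7, 6],
];
const match = matchTemplate(scene, patch);
// match.best -> { x: 2, y: 1, ssd: 0 }; match.response is a 3x3 map
```

Thresholding the response map instead of taking only the single best position gives multiple detections, which is what the Threshold slider controls.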

15 · Self-Supervised & Contrastive Learning

SimCLR-style: two augmented "views" of the same image should map close together in feature space, while views of different images should end up far apart. Adjust the augmentation strength and see how a tiny learned projection separates positives from negatives.

Augmentation strength:

Avg cosine (pos): — · (neg): —
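The readout above is an average cosine similarity, which can be computed as in this sketch. The embedding vectors are made-up numbers for illustration, and `cosine` and `mean` are names chosen for the example.

```javascript
// Cosine similarity between two feature vectors of equal length.
function cosine(a, b) {
  const dot = a.reduce((acc, v, i) => acc + v * b[i], 0);
  const normA = Math.hypot(...a);
  const normB = Math.hypot(...b);
  return dot / (normA * normB);
}

const mean = values => values.reduce((acc, v) => acc + v, 0) / values.length;

// Hypothetical embeddings: two views of image A and one view of image B.
const viewA1 = [1, 0.1];
const viewA2 = [0.9, 0.2];
const viewB = [-0.2, 1];

const avgPos = mean([cosine(viewA1, viewA2)]);                      // close to 1
const avgNeg = mean([cosine(viewA1, viewB), cosine(viewA2, viewB)]); // close to 0
```

A contrastive objective pushes `avgPos` up and `avgNeg` down; stronger augmentation makes the positive pairs harder to keep close.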

16 · Autoencoders & PCA

An autoencoder compresses an image into a low-dimensional code and reconstructs it. We run PCA live on image patches, which gives the best reconstruction any linear autoencoder can achieve under squared error, and let you scrub the latent dimension k.

Latent dim k: Patch size:

Compression: — · MSE: —
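For k = 1 the encode and decode steps fit in a short sketch. It finds the first principal component by power iteration on the covariance matrix, which is one simple way to do it and not necessarily how the demo computes PCA; all names are illustrative, and the all-ones starting vector is assumed not to be orthogonal to the dominant direction.

```javascript
// First principal component of `data` (rows are samples) via power iteration.
function firstPrincipalComponent(data, iterations = 100) {
  const n = data.length;
  const d = data[0].length;
  const mean = Array.from({ length: d }, (_, j) =>
    data.reduce((acc, row) => acc + row[j], 0) / n
  );
  const centered = data.map(row => row.map((v, j) => v - mean[j]));
  const cov = Array.from({ length: d }, (_, a) =>
    Array.from({ length: d }, (_, b) =>
      centered.reduce((acc, row) => acc + row[a] * row[b], 0) / n
    )
  );

  let component = new Array(d).fill(1 / Math.sqrt(d));
  for (let it = 0; it < iterations; it++) {
    const w = cov.map(row => row.reduce((acc, c, j) => acc + c * component[j], 0));
    const norm = Math.hypot(...w);
    if (norm === 0) break; // no variance at all
    component = w.map(v => v / norm);
  }
  return { mean, component };
}

// Encode x to a single number, then decode back to the original space.
function reconstruct(x, { mean, component }) {
  const code = x.reduce((acc, v, j) => acc + (v - mean[j]) * component[j], 0);
  return mean.map((m, j) => m + code * component[j]);
}

// Points on the line y = 2x: one component captures them without error.
const model = firstPrincipalComponent([[0, 0], [1, 2], [2, 4], [3, 6]]);
const approx = reconstruct([2, 4], model); // approximately [2, 4]
```

Here a 2-number sample is stored as 1 number, so the compression ratio is 2:1 and the MSE is essentially zero; on real patches the MSE grows as k shrinks.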

17 · Generative Models

Sample from a 2D latent space and watch the "generator" produce images. We show a tiny learned mapping from a 2D latent vector to an image, plus conceptual diagrams of GANs, VAEs, and diffusion models.

Family: z₁: z₂: Diffusion step t:

About

Built as a study companion for the Computer Vision course. Every demo runs entirely in your browser: there is no server and no upload, so your images never leave your machine.

All demos are intentionally small and self-contained so you can read the source in js/modules/ and tweak them. Real-world systems use OpenCV, PyTorch, etc. for performance — but the math here is the math there.