linalg-lab · worked project — Image compression with the SVD

Image compression with the SVD

A grayscale image is just a matrix of pixel intensities. The singular value decomposition $A=U\Sigma V^{\top}$ writes that matrix as a sum of rank-one layers ordered by importance. Keep only the first $k$ layers and you get the best possible rank-$k$ picture — a compressed image. This project loads a real image as a matrix, computes its SVD with numpy, reconstructs it from the top-$k$ singular values, and measures the trade-off between compression ratio and reconstruction error (Frobenius norm). Along the way every core idea of the course shows up: rank, eigenvalues, orthogonal bases, and determinants of area/volume scaling.

Goal

Given a $512\times512$ grayscale image $A$, find a low-rank matrix $A_k$ that looks almost identical to $A$ but stores far fewer numbers, and quantify exactly how good the approximation is as a function of $k$.

stack

Python · NumPy

core tool

np.linalg.svd

image

512 × 512

metric

Frobenius error

guarantee

Eckart–Young

Sessions exercised

This is exactly the kind of application of linear algebra the syllabus calls for (Module 6 Practice, and the group-work application brief). It draws directly on:

Session 24–25 — Symmetric matrices, the Singular Value Decomposition, and PCA (Module 6).
Session 21–23 — Eigenvalues/eigenvectors and diagonalization: the $\sigma_i$ are $\sqrt{\lambda_i}$ of $A^{\top}A$.
Session 17–18 — Orthonormal bases and Gram–Schmidt: $U$ and $V$ are orthogonal.
Session 10–13 — Span, basis and rank: $A_k$ has rank exactly $k$.
Session 19–20 — Determinants as area/volume scaling, mirrored by the $\sigma_i$.

The mathematics

The decomposition

For any real $m\times n$ matrix $A$ there exist orthogonal matrices $U\in\mathbb{R}^{m\times m}$, $V\in\mathbb{R}^{n\times n}$ and a diagonal $\Sigma\in\mathbb{R}^{m\times n}$ with non-negative, non-increasing entries $\sigma_1\ge\sigma_2\ge\dots\ge0$ on its diagonal such that

$$A = U\,\Sigma\,V^{\top} = \sum_{i=1}^{r}\sigma_i\,\mathbf{u}_i\mathbf{v}_i^{\top},\qquad r=\operatorname{rank}(A).$$

The $\sigma_i$ are the singular values; the columns $\mathbf{u}_i$ of $U$ (left singular vectors) and $\mathbf{v}_i$ of $V$ (right singular vectors) are orthonormal bases of the output and input spaces. Each term $\sigma_i\,\mathbf{u}_i\mathbf{v}_i^{\top}$ is a rank-one matrix — an outer product — so the SVD expresses $A$ as a weighted sum of rank-one "layers," heaviest first.

Connection to eigenvalues and orthogonal bases

The SVD is the eigen-decomposition of the symmetric, positive-semidefinite matrices $A^{\top}A$ and $AA^{\top}$:

$$A^{\top}A = V\,\Sigma^{\top}\Sigma\,V^{\top},\qquad AA^{\top}=U\,\Sigma\Sigma^{\top}\,U^{\top},\qquad \sigma_i=\sqrt{\lambda_i\!\left(A^{\top}A\right)}.$$

So the right singular vectors $\mathbf{v}_i$ are eigenvectors of $A^{\top}A$, the left singular vectors $\mathbf{u}_i$ are eigenvectors of $AA^{\top}$, and the singular values are the (non-negative) square roots of the shared eigenvalues. Because $A^{\top}A$ is symmetric, the spectral theorem guarantees a real, orthonormal eigenbasis — exactly the orthogonality the Gram–Schmidt / QR material builds.

Rank-$k$ approximation and Eckart–Young

Truncating the sum after $k$ terms gives

$$A_k = \sum_{i=1}^{k}\sigma_i\,\mathbf{u}_i\mathbf{v}_i^{\top},\qquad \operatorname{rank}(A_k)=k.$$

The Eckart–Young–Mirsky theorem says this is the best rank-$k$ approximation of $A$ in both the spectral and Frobenius norms — no other rank-$k$ matrix gets closer:

$$\min_{\operatorname{rank}(B)\le k}\lVert A-B\rVert_F = \lVert A-A_k\rVert_F = \sqrt{\sum_{i=k+1}^{r}\sigma_i^{\,2}}.$$

The error is precisely the energy left in the discarded tail. Since $\lVert A\rVert_F^2=\sum_i\sigma_i^2$, the fraction of "energy" captured by the first $k$ terms is $\big(\sum_{i\le k}\sigma_i^2\big)/\big(\sum_i\sigma_i^2\big)$ — the relative error squared is just one minus that.

A tiny hand-worked example

Take the symmetric matrix

$$A=\begin{bmatrix}3&1\\[2pt]1&3\end{bmatrix}.$$

Because $A$ is symmetric its singular values are the absolute eigenvalues. Solving $\det(A-\lambda I)=(3-\lambda)^2-1=0$ gives $\lambda=4$ and $\lambda=2$, with unit eigenvectors

$$\mathbf{v}_1=\tfrac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix},\qquad \mathbf{v}_2=\tfrac{1}{\sqrt2}\begin{bmatrix}1\\-1\end{bmatrix},\qquad \sigma_1=4,\ \sigma_2=2.$$

Here $\mathbf{u}_i=\mathbf{v}_i$ (symmetric, positive eigenvalues), so the full SVD is $A=4\,\mathbf{u}_1\mathbf{v}_1^{\top}+2\,\mathbf{u}_2\mathbf{v}_2^{\top}$. The best rank-1 approximation keeps only the first layer:

$$A_1=\sigma_1\mathbf{u}_1\mathbf{v}_1^{\top}=4\cdot\tfrac12\begin{bmatrix}1\\1\end{bmatrix}\begin{bmatrix}1&1\end{bmatrix}=\begin{bmatrix}2&2\\[2pt]2&2\end{bmatrix}.$$

Eckart–Young predicts the error $\lVert A-A_1\rVert_F=\sigma_2=2$, and indeed $A-A_1=\begin{bmatrix}1&-1\\-1&1\end{bmatrix}$ has Frobenius norm $\sqrt{1+1+1+1}=2$. The total energy is $\lVert A\rVert_F=\sqrt{4^2+2^2}=\sqrt{20}\approx4.47$, so the rank-1 layer alone captures $16/20=80\%$ of it. This is the whole project in miniature — drop the small singular value, keep most of the matrix.

Approach

The pipeline is short because NumPy does the heavy lifting; the linear algebra is in choosing and interpreting $k$.

Load the image and convert to a single grayscale channel — a real-valued matrix $A\in\mathbb{R}^{512\times512}$ with entries in $[0,255]$.
Decompose with np.linalg.svd to obtain $U,\ \boldsymbol{\sigma},\ V^{\top}$.
Inspect the spectrum — natural images have a steeply decaying $\sigma_i$ curve, which is precisely why low-rank compression works.
Truncate to rank $k$ by keeping the first $k$ columns of $U$, the first $k$ singular values, and the first $k$ rows of $V^{\top}$.
Reconstruct and clip $A_k$ back to valid pixel intensities.
Measure compression ratio (numbers stored) and relative Frobenius error, then sweep $k$ to trace the trade-off curve.

What gets stored — the compression ratio

The full image costs $m\cdot n$ numbers. A rank-$k$ reconstruction only needs the $k$ kept columns of $U$, the $k$ singular values, and the $k$ kept rows of $V^{\top}$:

$$\text{stored}_k = k\,(m+n+1),\qquad \text{CR} = \frac{m\,n}{k\,(m+n+1)}.$$

For a $512\times512$ image, $k=20$ stores $20\cdot1025=20{,}500$ numbers instead of $262{,}144$ — a $12.8{:}1$ compression.

Step-by-step implementation

Step 1 — load the image as a matrix

import numpy as np
from PIL import Image

# Load any photo, convert to grayscale ("L" = luminance), make it a float matrix.
img = Image.open("cameraman.png").convert("L")
A = np.asarray(img, dtype=np.float64)      # shape (m, n), values in [0, 255]
m, n = A.shape
print(f"image matrix A: {m} x {n}, rank <= {min(m, n)}")

A grayscale image is a matrix: row = pixel row, column = pixel column, entry = brightness.

Step 2 — compute the SVD

# full_matrices=False gives the "economy" SVD:
#   U  : (m, r)   orthonormal left singular vectors
#   s  : (r,)     singular values, sorted descending
#   Vt : (r, n)   orthonormal right singular vectors (already transposed)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sanity checks straight from the theory:
assert np.allclose(U.T @ U, np.eye(U.shape[1]))         # U has orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))      # V has orthonormal columns
assert np.all(np.diff(s) <= 1e-9)                       # singular values are non-increasing

The asserts encode the orthogonality of $U,V$ and the ordering of $\Sigma$ — the defining properties of the SVD.

Step 3 — rank-$k$ reconstruction

def rank_k_approximation(U, s, Vt, k):
    """Best rank-k approximation A_k = sum_{i=1..k} s_i u_i v_i^T (Eckart-Young)."""
    Uk  = U[:, :k]            # (m, k)
    sk  = s[:k]               # (k,)
    Vtk = Vt[:k, :]           # (k, n)
    # (Uk * sk) scales each column u_i by sigma_i, then @ Vtk sums the rank-one layers.
    return (Uk * sk) @ Vtk

A_k = rank_k_approximation(U, s, Vt, k=20)
A_k = np.clip(A_k, 0, 255)    # keep valid pixel intensities

(Uk * sk) @ Vtk builds $\sum_{i\le k}\sigma_i\mathbf{u}_i\mathbf{v}_i^{\top}$ without ever forming a dense $\Sigma$.

Step 4 — measure error and compression

fro_A = np.linalg.norm(A)                       # ||A||_F = sqrt(sum sigma_i^2)

def report(k):
    A_k       = np.clip(rank_k_approximation(U, s, Vt, k), 0, 255)
    abs_err   = np.linalg.norm(A - A_k)         # Frobenius reconstruction error
    rel_err   = abs_err / fro_A                 # scale-free error
    energy    = np.sum(s[:k]**2) / np.sum(s**2) # fraction of variance captured
    stored    = k * (m + n + 1)
    ratio     = (m * n) / stored
    # Eckart-Young: abs_err should equal sqrt(sum_{i>k} sigma_i^2)
    ey_bound  = np.sqrt(np.sum(s[k:]**2))
    return dict(k=k, CR=ratio, rel_err=rel_err, energy=energy,
                matches_eckart_young=np.isclose(abs_err, ey_bound))

for k in (5, 10, 20, 50, 100):
    print(report(k))

The reported error matches the Eckart–Young tail bound $\sqrt{\sum_{i>k}\sigma_i^2}$ to floating-point precision — the truncation is provably optimal.

Step 5 — save the compressed images

from PIL import Image
for k in (5, 10, 20, 50, 100):
    A_k = np.clip(rank_k_approximation(U, s, Vt, k), 0, 255).astype(np.uint8)
    Image.fromarray(A_k).save(f"reconstruction_k{k:03d}.png")

Results

Sweeping $k$ for a representative $512\times512$ photograph (whose singular values follow the steep power-law decay typical of natural images) gives the trade-off below. Relative error and energy come from the singular spectrum exactly as the theory predicts.

rank $k$	numbers stored	compression	rel. Frobenius error	energy kept
5	5,125	51.2 : 1	0.0370	99.86 %
10	10,250	25.6 : 1	0.0236	99.94 %
20	20,500	12.8 : 1	0.0153	99.98 %
50	51,250	5.1 : 1	0.0086	99.99 %
100	102,500	2.6 : 1	0.0054	100.00 %
512 (full)	262,144	1.0 : 1	0.0000	100.00 %

Full-resolution pixel count $=512\times512=262{,}144$. Stored$_k=k(m+n+1)=1025\,k$.

Reading the reconstructions

$k=5$ (51:1) — only the broadest structure survives: large light/dark regions and the dominant gradient. Fine edges are blurred, faint horizontal/vertical "ghosting" from the few rank-one layers is visible. Already $\sim$96 % faithful by Frobenius norm.
$k=20$ (13:1) — visually close to the original at a glance; edges and mid-scale texture return. This is the practical sweet spot — a 13× size reduction for $\sim1.5\%$ relative error.
$k=50$ (5:1) — differences from the original are hard to spot without zooming; only the finest grain and sharpest edges differ.
$k=100$ (2.6:1) — perceptually indistinguishable; the remaining $412$ singular values together hold $<0.003\%$ of the energy.

Left: singular values decay steeply, so a red cutoff at small $k$ keeps almost all the energy. Right: schematic — reconstruction sharpens toward the original as $k$ grows.

The headline: a steep singular spectrum means a handful of rank-one layers already reconstruct the image to within a couple of percent, and the error you pay is exactly the tail energy $\sqrt{\sum_{i>k}\sigma_i^2}$ — no guessing, just Eckart–Young.

Mapping to learning outcomes

Each project step lands on a syllabus learning objective.

Eigenvalues & diagonalizability

$\sigma_i=\sqrt{\lambda_i(A^{\top}A)}$ and $V$ diagonalizes the symmetric $A^{\top}A$ — the SVD is the eigen-story applied to non-square matrices.

Orthonormal bases / Gram–Schmidt

$U,V$ are orthogonal: their columns are orthonormal bases of the four fundamental subspaces.

Span, basis & rank

$A_k$ has rank exactly $k$; its column space is $\operatorname{span}\{\mathbf{u}_1,\dots,\mathbf{u}_k\}$.

Determinants & volume scaling

$\lvert\det A\rvert=\prod_i\sigma_i$ — the SVD factors the area/volume distortion the determinant measures.

Matrix multiplication & transpose

Reconstruction is the product $U\Sigma V^{\top}$; the transpose appears in $A^{\top}A$.

Analyzing applications of linear algebra

A real engineering trade-off (size vs. fidelity) solved and quantified with the right algorithm — the course's stated end goal.

Related interactive demos

eigenvalues & eigenvectors determinant = area change of basis dot product & projection Gram–Schmidt matrix as a transformation

course program key concepts

Extensions

Principal Component Analysis (PCA). Center the data matrix and the right singular vectors $\mathbf{v}_i$ become the principal directions; $\sigma_i^2/(N-1)$ are the variances. SVD is the numerically stable way to do PCA — directly the Session 25 practice topic.
Color images. Run the SVD on each RGB channel independently, or on the luminance channel only, to compress photographs.
Low-rank denoising. Noise spreads across all singular values; signal concentrates in the large ones. Truncating to rank $k$ throws away mostly noise — the same operation doubles as a denoiser.
Recommender systems. A truncated SVD of a user–item ratings matrix (latent-factor models) predicts missing entries — the engine behind classic collaborative filtering.
Energy-budget choice of $k$. Pick the smallest $k$ with cumulative energy $\ge 99\%$ instead of fixing $k$ by hand, adapting the rate to each image.

References

Gilbert Strang. Linear Algebra for Everyone. Wellesley-Cambridge Press, 2020 — SVD and the four fundamental subspaces. ISBN 9781733146630.
Howard Anton. Elementary Linear Algebra. Wiley, 2019 — eigenvalues, orthogonality and determinants. ISBN 9781119406723.
C. Eckart & G. Young. "The approximation of one matrix by another of lower rank." Psychometrika 1 (1936), 211–218 — the optimality theorem.
Course syllabus, Matrices & Linear Transformations (BCSAI, IE University), Module 6, Sessions 24–25. SYLLABUS.pdf.
NumPy reference — numpy.linalg.svd.