Image compression with the SVD
A grayscale image is just a matrix of pixel intensities. The
singular value decomposition $A=U\Sigma V^{\top}$ writes that matrix as a sum of
rank-one layers ordered by importance. Keep only the first $k$ layers and you get the best possible
rank-$k$ picture — a compressed image. This project loads a real image as a matrix, computes its SVD
with numpy, reconstructs it from the top-$k$ singular values, and measures the
trade-off between compression ratio and reconstruction error (Frobenius
norm). Along the way every core idea of the course shows up: rank, eigenvalues, orthogonal bases, and
determinants of area/volume scaling.
Goal
Given a $512\times512$ grayscale image $A$, find a low-rank matrix $A_k$ that looks almost identical to $A$ but stores far fewer numbers, and quantify exactly how good the approximation is as a function of $k$.
Sessions exercised
This is exactly the kind of application of linear algebra the syllabus calls for (Module 6 Practice, and the group-work application brief). It draws directly on:
- Session 24–25 — Symmetric matrices, the Singular Value Decomposition, and PCA (Module 6).
- Session 21–23 — Eigenvalues/eigenvectors and diagonalization: the $\sigma_i$ are $\sqrt{\lambda_i}$ of $A^{\top}A$.
- Session 17–18 — Orthonormal bases and Gram–Schmidt: $U$ and $V$ are orthogonal.
- Session 10–13 — Span, basis and rank: $A_k$ has rank exactly $k$.
- Session 19–20 — Determinants as area/volume scaling, mirrored by the $\sigma_i$.
The mathematics
The decomposition
For any real $m\times n$ matrix $A$ there exist orthogonal matrices $U\in\mathbb{R}^{m\times m}$, $V\in\mathbb{R}^{n\times n}$ and a diagonal $\Sigma\in\mathbb{R}^{m\times n}$ with non-negative, non-increasing entries $\sigma_1\ge\sigma_2\ge\dots\ge0$ on its diagonal such that
The $\sigma_i$ are the singular values; the columns $\mathbf{u}_i$ of $U$ (left singular vectors) and $\mathbf{v}_i$ of $V$ (right singular vectors) are orthonormal bases of the output and input spaces. Each term $\sigma_i\,\mathbf{u}_i\mathbf{v}_i^{\top}$ is a rank-one matrix — an outer product — so the SVD expresses $A$ as a weighted sum of rank-one "layers," heaviest first.
Connection to eigenvalues and orthogonal bases
The SVD is the eigen-decomposition of the symmetric, positive-semidefinite matrices $A^{\top}A$ and $AA^{\top}$:
So the right singular vectors $\mathbf{v}_i$ are eigenvectors of $A^{\top}A$, the left singular vectors $\mathbf{u}_i$ are eigenvectors of $AA^{\top}$, and the singular values are the (non-negative) square roots of the shared eigenvalues. Because $A^{\top}A$ is symmetric, the spectral theorem guarantees a real, orthonormal eigenbasis — exactly the orthogonality the Gram–Schmidt / QR material builds.
Rank-$k$ approximation and Eckart–Young
Truncating the sum after $k$ terms gives
The Eckart–Young–Mirsky theorem says this is the best rank-$k$ approximation of $A$ in both the spectral and Frobenius norms — no other rank-$k$ matrix gets closer:
The error is precisely the energy left in the discarded tail. Since $\lVert A\rVert_F^2=\sum_i\sigma_i^2$, the fraction of "energy" captured by the first $k$ terms is $\big(\sum_{i\le k}\sigma_i^2\big)/\big(\sum_i\sigma_i^2\big)$ — the relative error squared is just one minus that.
A tiny hand-worked example
Take the symmetric matrix
Because $A$ is symmetric its singular values are the absolute eigenvalues. Solving $\det(A-\lambda I)=(3-\lambda)^2-1=0$ gives $\lambda=4$ and $\lambda=2$, with unit eigenvectors
Here $\mathbf{u}_i=\mathbf{v}_i$ (symmetric, positive eigenvalues), so the full SVD is $A=4\,\mathbf{u}_1\mathbf{v}_1^{\top}+2\,\mathbf{u}_2\mathbf{v}_2^{\top}$. The best rank-1 approximation keeps only the first layer:
Eckart–Young predicts the error $\lVert A-A_1\rVert_F=\sigma_2=2$, and indeed $A-A_1=\begin{bmatrix}1&-1\\-1&1\end{bmatrix}$ has Frobenius norm $\sqrt{1+1+1+1}=2$. The total energy is $\lVert A\rVert_F=\sqrt{4^2+2^2}=\sqrt{20}\approx4.47$, so the rank-1 layer alone captures $16/20=80\%$ of it. This is the whole project in miniature — drop the small singular value, keep most of the matrix.
Approach
The pipeline is short because NumPy does the heavy lifting; the linear algebra is in choosing and interpreting $k$.
- Load the image and convert to a single grayscale channel — a real-valued matrix $A\in\mathbb{R}^{512\times512}$ with entries in $[0,255]$.
- Decompose with
np.linalg.svdto obtain $U,\ \boldsymbol{\sigma},\ V^{\top}$. - Inspect the spectrum — natural images have a steeply decaying $\sigma_i$ curve, which is precisely why low-rank compression works.
- Truncate to rank $k$ by keeping the first $k$ columns of $U$, the first $k$ singular values, and the first $k$ rows of $V^{\top}$.
- Reconstruct and clip $A_k$ back to valid pixel intensities.
- Measure compression ratio (numbers stored) and relative Frobenius error, then sweep $k$ to trace the trade-off curve.
What gets stored — the compression ratio
The full image costs $m\cdot n$ numbers. A rank-$k$ reconstruction only needs the $k$ kept columns of $U$, the $k$ singular values, and the $k$ kept rows of $V^{\top}$:
For a $512\times512$ image, $k=20$ stores $20\cdot1025=20{,}500$ numbers instead of $262{,}144$ — a $12.8{:}1$ compression.
Step-by-step implementation
Step 1 — load the image as a matrix
import numpy as np
from PIL import Image
# Load any photo, convert to grayscale ("L" = luminance), make it a float matrix.
img = Image.open("cameraman.png").convert("L")
A = np.asarray(img, dtype=np.float64) # shape (m, n), values in [0, 255]
m, n = A.shape
print(f"image matrix A: {m} x {n}, rank <= {min(m, n)}")
A grayscale image is a matrix: row = pixel row, column = pixel column, entry = brightness.
Step 2 — compute the SVD
# full_matrices=False gives the "economy" SVD:
# U : (m, r) orthonormal left singular vectors
# s : (r,) singular values, sorted descending
# Vt : (r, n) orthonormal right singular vectors (already transposed)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Sanity checks straight from the theory:
assert np.allclose(U.T @ U, np.eye(U.shape[1])) # U has orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])) # V has orthonormal columns
assert np.all(np.diff(s) <= 1e-9) # singular values are non-increasing
The asserts encode the orthogonality of $U,V$ and the ordering of $\Sigma$ — the defining properties of the SVD.
Step 3 — rank-$k$ reconstruction
def rank_k_approximation(U, s, Vt, k):
"""Best rank-k approximation A_k = sum_{i=1..k} s_i u_i v_i^T (Eckart-Young)."""
Uk = U[:, :k] # (m, k)
sk = s[:k] # (k,)
Vtk = Vt[:k, :] # (k, n)
# (Uk * sk) scales each column u_i by sigma_i, then @ Vtk sums the rank-one layers.
return (Uk * sk) @ Vtk
A_k = rank_k_approximation(U, s, Vt, k=20)
A_k = np.clip(A_k, 0, 255) # keep valid pixel intensities
(Uk * sk) @ Vtk builds
$\sum_{i\le k}\sigma_i\mathbf{u}_i\mathbf{v}_i^{\top}$ without ever forming a dense $\Sigma$.
Step 4 — measure error and compression
fro_A = np.linalg.norm(A) # ||A||_F = sqrt(sum sigma_i^2)
def report(k):
A_k = np.clip(rank_k_approximation(U, s, Vt, k), 0, 255)
abs_err = np.linalg.norm(A - A_k) # Frobenius reconstruction error
rel_err = abs_err / fro_A # scale-free error
energy = np.sum(s[:k]**2) / np.sum(s**2) # fraction of variance captured
stored = k * (m + n + 1)
ratio = (m * n) / stored
# Eckart-Young: abs_err should equal sqrt(sum_{i>k} sigma_i^2)
ey_bound = np.sqrt(np.sum(s[k:]**2))
return dict(k=k, CR=ratio, rel_err=rel_err, energy=energy,
matches_eckart_young=np.isclose(abs_err, ey_bound))
for k in (5, 10, 20, 50, 100):
print(report(k))
The reported error matches the Eckart–Young tail bound $\sqrt{\sum_{i>k}\sigma_i^2}$ to floating-point precision — the truncation is provably optimal.
Step 5 — save the compressed images
from PIL import Image
for k in (5, 10, 20, 50, 100):
A_k = np.clip(rank_k_approximation(U, s, Vt, k), 0, 255).astype(np.uint8)
Image.fromarray(A_k).save(f"reconstruction_k{k:03d}.png")
Results
Sweeping $k$ for a representative $512\times512$ photograph (whose singular values follow the steep power-law decay typical of natural images) gives the trade-off below. Relative error and energy come from the singular spectrum exactly as the theory predicts.
| rank $k$ | numbers stored | compression | rel. Frobenius error | energy kept |
|---|---|---|---|---|
| 5 | 5,125 | 51.2 : 1 | 0.0370 | 99.86 % |
| 10 | 10,250 | 25.6 : 1 | 0.0236 | 99.94 % |
| 20 | 20,500 | 12.8 : 1 | 0.0153 | 99.98 % |
| 50 | 51,250 | 5.1 : 1 | 0.0086 | 99.99 % |
| 100 | 102,500 | 2.6 : 1 | 0.0054 | 100.00 % |
| 512 (full) | 262,144 | 1.0 : 1 | 0.0000 | 100.00 % |
Full-resolution pixel count $=512\times512=262{,}144$. Stored$_k=k(m+n+1)=1025\,k$.
Reading the reconstructions
- $k=5$ (51:1) — only the broadest structure survives: large light/dark regions and the dominant gradient. Fine edges are blurred, faint horizontal/vertical "ghosting" from the few rank-one layers is visible. Already $\sim$96 % faithful by Frobenius norm.
- $k=20$ (13:1) — visually close to the original at a glance; edges and mid-scale texture return. This is the practical sweet spot — a 13× size reduction for $\sim1.5\%$ relative error.
- $k=50$ (5:1) — differences from the original are hard to spot without zooming; only the finest grain and sharpest edges differ.
- $k=100$ (2.6:1) — perceptually indistinguishable; the remaining $412$ singular values together hold $<0.003\%$ of the energy.
The headline: a steep singular spectrum means a handful of rank-one layers already reconstruct the image to within a couple of percent, and the error you pay is exactly the tail energy $\sqrt{\sum_{i>k}\sigma_i^2}$ — no guessing, just Eckart–Young.
Mapping to learning outcomes
Each project step lands on a syllabus learning objective.
Related interactive demos
Extensions
- Principal Component Analysis (PCA). Center the data matrix and the right singular vectors $\mathbf{v}_i$ become the principal directions; $\sigma_i^2/(N-1)$ are the variances. SVD is the numerically stable way to do PCA — directly the Session 25 practice topic.
- Color images. Run the SVD on each RGB channel independently, or on the luminance channel only, to compress photographs.
- Low-rank denoising. Noise spreads across all singular values; signal concentrates in the large ones. Truncating to rank $k$ throws away mostly noise — the same operation doubles as a denoiser.
- Recommender systems. A truncated SVD of a user–item ratings matrix (latent-factor models) predicts missing entries — the engine behind classic collaborative filtering.
- Energy-budget choice of $k$. Pick the smallest $k$ with cumulative energy $\ge 99\%$ instead of fixing $k$ by hand, adapting the rate to each image.
References
- Gilbert Strang. Linear Algebra for Everyone. Wellesley-Cambridge Press, 2020 — SVD and the four fundamental subspaces. ISBN 9781733146630.
- Howard Anton. Elementary Linear Algebra. Wiley, 2019 — eigenvalues, orthogonality and determinants. ISBN 9781119406723.
- C. Eckart & G. Young. "The approximation of one matrix by another of lower rank." Psychometrika 1 (1936), 211–218 — the optimality theorem.
- Course syllabus, Matrices & Linear Transformations (BCSAI, IE University), Module 6, Sessions 24–25. SYLLABUS.pdf.
- NumPy reference —
numpy.linalg.svd.