Part I: Image Processing
Chapter 1: Digital Image Fundamentals

Sampling & Quantization

"I used to be a continuous function with infinite detail. Then I got sampled at 512 by 512 and rounded to 8 bits. You adapt."

A Recently Discretized Signal

Big Picture

Digitization is two separate, independent acts of information loss: sampling chops space into a grid, and quantization rounds brightness into a ladder of levels. A large share of the image artifacts you will debug trace back to one of these two, and the cures are different. Sample too coarsely and high-frequency detail does not vanish politely; it disguises itself as false low-frequency patterns called aliasing. Quantize too coarsely and smooth gradients shatter into visible bands. This section gives you the vocabulary, the math, and the antidotes for both.

In Section 1.1 we followed light through the lens and sensor to a RAW mosaic. We glossed over the most fundamental step of all: the optical image projected onto the sensor is continuous in both position and brightness, and the array in your program is neither. Formally, the scene presents an irradiance function $f(x, y)$ defined at every real-valued point and taking real values. The sensor turns it into $f[m, n]$, defined only at integer grid positions (sampling), and the ADC turns each value into one of finitely many integers (quantization). Two axes of discretization, two failure modes, two sets of engineering tools, which is exactly how this section is organized.

1. Sampling: From Function to Grid (Beginner)

Sampling evaluates the continuous image on a regular lattice with horizontal pitch $\Delta x$ and vertical pitch $\Delta y$:

$$f[m, n] = f(m \, \Delta x, \; n \, \Delta y), \qquad m = 0, \dots, M-1, \; n = 0, \dots, N-1.$$

A persistent misconception, worth killing early, is that a pixel is a little square of color. A pixel is a sample: a single measurement associated with a point (in practice, a small area-average around that point, since photosites integrate light over their surface). The little-square picture causes real bugs, for example when aligning coordinate systems during the geometric warps of Chapter 5, where the question "is pixel (0,0) the corner or the center of the first sample area?" changes results by half a pixel.
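The half-pixel question can be made concrete with a short sketch. It assumes a plain 2x shrink and uses two hypothetical helper names that are not part of any library: one treats integer coordinates as the sample points themselves, the other treats pixel `i` as covering the interval `[i, i + 1)` with its sample at the center.

```python
import numpy as np

def src_coord_sample_at_integer(dst_index, scale):
    """Pixel (0,0) is the CENTER of its sample area: sample i sits at coordinate i."""
    return dst_index * scale

def src_coord_sample_at_half(dst_index, scale):
    """Pixel (0,0) is the CORNER of its sample area: sample i sits at i + 0.5."""
    return (dst_index + 0.5) * scale - 0.5

# Shrinking by 2x: where in the source does each destination sample land?
dst = np.arange(4)
scale = 2.0
at_integer = src_coord_sample_at_integer(dst, scale)
at_half = src_coord_sample_at_half(dst, scale)
offset = at_half - at_integer   # disagreement between the two conventions

print("sample-at-integer:", at_integer)
print("sample-at-half   :", at_half)
print("difference       :", offset)
```

For this scale factor the two conventions disagree by a constant half pixel, which is exactly the kind of shift that goes unnoticed until two libraries with different conventions are mixed in one pipeline.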

How fine must the grid be? The celebrated sampling theorem of Nyquist and Shannon answers precisely: a signal can be perfectly reconstructed from its samples if it contains no frequency at or above half the sampling rate,

$$f_{\text{sample}} > 2 f_{\text{max}},$$

where for images "frequency" means cycles per unit distance: fine stripes are high frequency, smooth washes are low frequency. The full machinery behind this statement (Fourier transforms, spectra, and the elegant proof) arrives in Chapter 4; what you need now is the consequence of violating it.

2. Aliasing: When Detail Lies (Intermediate)

When the scene contains frequencies above the Nyquist limit, those frequencies do not disappear from the sampled image. They fold back, masquerading as lower frequencies that were never in the scene. This is aliasing, and you have seen it: the wagon-wheel that spins backwards on film, the shimmering moire on a striped shirt in a video call, the jagged staircase on a rendered diagonal line. The defining property of aliasing is that it manufactures plausible-looking false structure, which is what makes it dangerous for measurement and for machine learning alike.
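Folding is easiest to see in one dimension. The sketch below assumes a unit sample pitch (so the Nyquist limit is 0.5 cycles per sample) and uses a hypothetical helper, `alias_frequency`, to compute where an out-of-band tone lands after sampling.

```python
import numpy as np

def alias_frequency(f, fs=1.0):
    """Apparent frequency, in [0, fs/2], of a tone of frequency f sampled at rate fs."""
    f_mod = np.mod(f, fs)
    return np.minimum(f_mod, fs - f_mod)

n = np.arange(32)                       # sample indices, pitch = 1
high = np.cos(2 * np.pi * 0.9 * n)      # 0.9 cycles/sample: above the Nyquist limit
low = np.cos(2 * np.pi * 0.1 * n)       # 0.1 cycles/sample: representable

print("0.9 cycles/sample folds to about", alias_frequency(0.9))
print("sample sequences identical:", np.allclose(high, low))
```

Once sampled, the two tones produce the same numbers, so no downstream algorithm can tell them apart. That is why the fix has to happen before sampling, not after.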

The torture test for aliasing is the zone plate, a pattern whose frequency grows steadily with radius. Code 1.2.1 builds one and downsamples it two ways: naive pixel-dropping versus area averaging.

import cv2
import numpy as np

# Zone plate: spatial frequency increases with radius, a torture test
# for any resampling code.
n = 512
y, x = np.mgrid[0:n, 0:n].astype(np.float32)
r2 = (x - n / 2) ** 2 + (y - n / 2) ** 2
zone = (127.5 + 127.5 * np.cos(np.pi * r2 / 256.0)).astype(np.uint8)

# Downsample 4x by simply keeping every 4th pixel (no pre-filter):
small_naive = cv2.resize(zone, (n // 4, n // 4),
                         interpolation=cv2.INTER_NEAREST)
# Downsample 4x by averaging each 4x4 block (a built-in anti-alias filter):
small_clean = cv2.resize(zone, (n // 4, n // 4),
                         interpolation=cv2.INTER_AREA)

cv2.imwrite("zone_full.png", zone)
cv2.imwrite("zone_naive.png", small_naive)  # ghost rings: aliasing
cv2.imwrite("zone_clean.png", small_clean)  # fine rings fade to flat gray
Code 1.2.1: Zone plate downsampling. The INTER_NEAREST result sprouts phantom rings far from the center, false low frequencies created by undersampling. The INTER_AREA result instead lets unresolvable detail fade smoothly to gray, which is the honest answer.

Open the three saved files side by side and the lesson is immediate: the naive version contains concentric rings that simply do not exist in the original at those positions, while the area-averaged version degrades gracefully. The general rule, which you will use every time you build an image pyramid in Chapter 4 or resize a training set, is to remove the frequencies the new grid cannot carry before resampling, usually with a Gaussian or box blur.
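The "filter, then subsample" rule can also be sketched without any imaging library. The version below is a minimal NumPy illustration, assuming an integer shrink factor and a box filter (block averaging) as the low-pass step; the helper names are illustrative, and a Gaussian pre-filter would be a reasonable alternative.

```python
import numpy as np

def zone_plate(n=512):
    """Same zone plate as Code 1.2.1, kept in floating point."""
    y, x = np.mgrid[0:n, 0:n].astype(np.float64)
    r2 = (x - n / 2) ** 2 + (y - n / 2) ** 2
    return 127.5 + 127.5 * np.cos(np.pi * r2 / 256.0)

def shrink_naive(img, k):
    """Keep every k-th sample; no pre-filter."""
    return img[::k, ::k]

def shrink_box(img, k):
    """Average each k x k block (box low-pass), keeping one value per block."""
    h, w = img.shape
    h, w = h - h % k, w - w % k                  # crop to a multiple of k
    blocks = img[:h, :w].reshape(h // k, k, w // k, k)
    return blocks.mean(axis=(1, 3))

zone = zone_plate()
naive = shrink_naive(zone, 4)
clean = shrink_box(zone, 4)

# Patch centered near original (x=384, y=256), 128 px right of the center,
# where the local frequency is far beyond what the 4x smaller grid can carry.
patch = (slice(60, 68), slice(92, 100))
print("naive contrast:", naive[patch].std())    # strong phantom pattern
print("box   contrast:", clean[patch].std())    # mostly faded toward gray
```

The naive path keeps full contrast in a region where the small grid cannot represent the true rings, so whatever it shows there is fabricated. The averaged path lets the contrast collapse instead.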

Key Insight: Blur Before You Shrink

Downsampling is safe only after a low-pass filter has removed the detail the smaller grid cannot represent. Blurring sounds like vandalism but is actually honesty: the alternative is not sharpness, it is fabricated patterns. This single rule explains why cv2.INTER_AREA exists, why image pyramids blur at every level, and why deep learning frameworks added antialias=True flags to their resize ops after researchers showed aliasing in data pipelines measurably hurts model accuracy and robustness.

Library Shortcut: Anti-Aliased Resizing in One Line

A correct manual downsampler (design a Gaussian whose cutoff matches the scale factor, pad, filter, then subsample) runs 30 to 50 lines. Production libraries fold the pre-filter into the resize call:

import cv2
from skimage.transform import rescale

img = cv2.imread("photo.png")   # placeholder path: any color image (H x W x 3)

small = cv2.resize(img, None, fx=0.25, fy=0.25,
                   interpolation=cv2.INTER_AREA)        # box pre-filter
small2 = rescale(img, 0.25, anti_aliasing=True,
                 channel_axis=-1)                       # Gaussian pre-filter; returns floats in [0, 1]
Code 1.2.2: Library anti-aliasing: one argument replaces a hand-built filter-then-subsample pipeline. OpenCV's INTER_AREA averages source pixel blocks; scikit-image's anti_aliasing=True applies a scale-matched Gaussian before interpolating.

3. Quantization: From Real Values to Integer Levels (Intermediate)

The second discretization acts on brightness. A $b$-bit quantizer maps the continuous range of measured intensities onto $L = 2^b$ discrete levels. With uniform spacing, the step size between adjacent levels is

$$\Delta = \frac{I_{\max} - I_{\min}}{2^b},$$

and every true value within a step gets rounded to that step's representative. The rounding error is at most $\Delta/2$ per pixel, and if the true values are spread evenly within each step, the mean squared error of quantization is the classic result

$$\mathrm{MSE}_{\text{quant}} = \frac{\Delta^2}{12},$$

which translates into a signal-to-noise ratio that improves by about $6.02$ dB for every added bit. Eight bits give roughly 48 to 50 dB, which puts the rounding noise comfortably below what most people can spot in a photograph viewed casually; that is why 8-bit images dominate, as we saw when handling dtypes in Chapter 0.
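Both numbers are easy to check empirically. The sketch below assumes intensities uniformly distributed over $[0, 1)$ and a quantizer that returns the midpoint of each step; `quantize_midrise` is an illustrative helper, not a library function. The per-bit gain follows because each added bit halves $\Delta$, which quarters the noise power, and $10 \log_{10} 4 \approx 6.02$ dB.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(0.0, 1.0, size=1_000_000)    # stand-in "true" intensities

def quantize_midrise(x, bits, lo=0.0, hi=1.0):
    """Split [lo, hi) into 2**bits equal steps; map each value to its step midpoint."""
    delta = (hi - lo) / 2 ** bits
    index = np.clip(np.floor((x - lo) / delta), 0, 2 ** bits - 1)
    return lo + (index + 0.5) * delta, delta

for bits in (2, 4, 8):
    q, delta = quantize_midrise(values, bits)
    mse = np.mean((values - q) ** 2)
    print(f"{bits} bits: measured MSE = {mse:.3e}, delta^2/12 = {delta ** 2 / 12:.3e}")

# Halving delta quarters the noise power: the SNR gain per added bit.
print(f"gain per bit: {20 * np.log10(2):.2f} dB")
```

The measured and predicted values agree closely here because the test data really are spread evenly within each step. Natural images violate that assumption at very low bit depths, which is the point of Exercise 1.2.2.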

The visible failure of coarse quantization is banding (also called posterization or false contouring): smooth gradients break into discrete plateaus with visible seams. The eye is exquisitely sensitive to these seams because they look like edges, and edges carry meaning. Code 1.2.3 quantizes a smooth ramp to decreasing bit depths so you can find your own threshold of visibility.

import numpy as np
import cv2

# A perfectly smooth horizontal ramp, 0 to 255 across 1024 columns.
ramp = np.tile(np.linspace(0, 255, 1024, dtype=np.float32), (128, 1))

def quantize(img, bits):
    """Uniformly requantize a [0, 255] image to 2**bits levels."""
    levels = 2 ** bits
    step = 255.0 / (levels - 1)
    # Round again before the cast: astype(np.uint8) truncates toward zero.
    return np.round(np.round(img / step) * step).astype(np.uint8)

panels = []
for bits in [8, 5, 3, 2]:
    q = quantize(ramp, bits)
    panels.append(q)
    print(f"{bits} bits -> {len(np.unique(q)):>3} distinct levels")

cv2.imwrite("banding.png", np.vstack(panels))  # stacked for comparison
Code 1.2.3: Requantizing a smooth ramp. At 8 bits the ramp looks continuous; at 5 bits faint bands appear under careful viewing; at 3 bits and 2 bits the gradient collapses into obvious stripes.
8 bits -> 256 distinct levels
5 bits ->  32 distinct levels
3 bits ->   8 distinct levels
2 bits ->   4 distinct levels
Output 1.2.3: Each lost bit halves the number of available intensity levels, and the histogram (a tool we sharpen in Chapter 2) collapses onto progressively fewer spikes.

Figure 1.2.1 puts the two discretizations side by side on a one-dimensional slice, because seeing them as separate operations is the key mental model: sampling acts on the horizontal axis, quantization on the vertical one.

[Figure 1.2.1, two panels. Left, "1. Sampling (discretize position)": brightness versus position x; samples exist only at grid positions. Right, "2. Quantization (discretize value)": levels L0 to L3 versus position x; each sample snaps to the nearest level, and the gap is the quantization error.]
Figure 1.2.1: The two independent discretizations. Left: sampling keeps the signal's exact values but only at grid positions. Right: quantization snaps each sampled value (gray curve) to the nearest of $L$ levels (dashed lines), producing a staircase; the vertical gaps are the quantization error whose mean square is $\Delta^2/12$.
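The same two-step picture can be reproduced numerically on a one-dimensional slice. This is a minimal sketch: the `scene` function is an arbitrary stand-in for a continuous brightness profile, and the quantizer assumes the range $[0, 1]$ split into $2^b$ equal steps with the step midpoint as representative.

```python
import numpy as np

def scene(x):
    """Stand-in continuous brightness profile with values in [0.1, 0.9]."""
    return 0.5 + 0.4 * np.sin(2 * np.pi * 0.05 * x)

# Step 1, sampling: evaluate the function only at grid positions m * dx.
dx = 1.0
m = np.arange(40)
samples = scene(m * dx)                  # exact values, discrete positions

# Step 2, quantization: snap each sample to one of 2**bits levels.
bits = 2
delta = 1.0 / 2 ** bits                  # step size over the range [0, 1]
index = np.clip(np.floor(samples / delta), 0, 2 ** bits - 1).astype(int)
quantized = (index + 0.5) * delta        # representative = step midpoint

error = samples - quantized
print("levels used :", np.unique(index))
print("max |error| :", np.abs(error).max(), " bound delta/2 =", delta / 2)
```

Note that step 1 changes nothing about the values and step 2 changes nothing about the positions, which is why the two operations can be analyzed and fixed independently.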

4. Dithering: Trading Banding for Noise (Advanced)

Quantization error is deterministic and spatially correlated: every pixel in a band rounds the same way, which is precisely why the eye sees a contour. Dithering breaks that correlation, either by adding controlled noise before rounding or by spreading each pixel's rounding error over its neighbors, replacing banding with fine grain that the eye averages away. The classic algorithm is Floyd-Steinberg error diffusion, which is itself deterministic: quantize each pixel, then push its rounding error onto the not-yet-visited neighbors so that errors cancel locally. Code 1.2.4 implements it from scratch.

import numpy as np

def floyd_steinberg(gray, levels=2):
    """Quantize to `levels` while diffusing each pixel's rounding error
    onto its unvisited neighbors (right, lower-left, lower, lower-right)."""
    img = gray.astype(np.float32).copy()
    h, w = img.shape
    step = 255.0 / (levels - 1)
    for r in range(h):
        for c in range(w):
            old = img[r, c]
            new = np.clip(np.round(old / step) * step, 0, 255)
            img[r, c] = new
            err = old - new
            if c + 1 < w:
                img[r, c + 1] += err * 7 / 16
            if r + 1 < h:
                if c > 0:
                    img[r + 1, c - 1] += err * 3 / 16
                img[r + 1, c] += err * 5 / 16
                if c + 1 < w:
                    img[r + 1, c + 1] += err * 1 / 16
    # Round before the cast so levels such as 85.0 never truncate to 84.
    return np.round(img).astype(np.uint8)

# Compare: hard 1-bit threshold vs dithered 1-bit, on the ramp from Code 1.2.3
ramp8 = np.tile(np.linspace(0, 255, 1024), (128, 1)).astype(np.uint8)
hard = np.where(ramp8 >= 128, 255, 0).astype(np.uint8)   # 2 flat halves
dith = floyd_steinberg(ramp8, levels=2)                  # smooth-looking ramp
print("hard levels:", np.unique(hard), " dithered levels:", np.unique(dith))
Code 1.2.4: Floyd-Steinberg error diffusion from scratch. Both outputs contain only pure black and pure white, yet from arm's length the dithered version reads as a continuous gradient because local black/white densities track the original intensities.

The deep idea, worth savoring, is that dithering does not reduce error energy at all; it reshapes the error spectrum, moving it from visible low-frequency contours into high-frequency noise the visual system discounts. The same principle reappears wearing different costumes throughout this book: noise shaping in audio, stochastic rounding when training neural networks in low precision, and the deliberate noise injection at the heart of the diffusion models of Chapter 33.
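The claim about error energy can be checked with a few lines. To keep the sketch vectorized it swaps in random-threshold dithering, a simpler technique than Floyd-Steinberg, and uses block averaging as a crude stand-in for the eye's low-pass behavior; both are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(0.0, 1.0, 1024), (128, 1))    # intensities in [0, 1]

hard = (ramp >= 0.5).astype(np.float64)                   # fixed threshold
# Random-threshold dither: each pixel turns white with probability equal to
# its intensity, so the local density of white tracks the original value.
dith = (ramp > rng.uniform(0.0, 1.0, ramp.shape)).astype(np.float64)

def block_mean(img, k=16):
    """Crude low-pass filter: average non-overlapping k x k blocks."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

print("per-pixel MSE  hard:", mse(hard, ramp), " dithered:", mse(dith, ramp))
print("low-pass MSE   hard:", mse(block_mean(hard), block_mean(ramp)),
      " dithered:", mse(block_mean(dith), block_mean(ramp)))
```

Pixel by pixel, the dithered image is no closer to the ramp than the hard threshold is. After low-pass filtering the ordering reverses sharply, because the dithered error sits at high frequencies that the filter removes.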

Library Shortcut: Pillow Dithers in One Line

The Python loop in Code 1.2.4 is painfully slow: it cannot be vectorized along rows because each pixel depends on errors diffused from pixels already visited. Pillow's convert applies an optimized Floyd-Steinberg in C:

from PIL import Image

bw = Image.fromarray(ramp8).convert("1")  # 1-bit, Floyd-Steinberg by default
pal = Image.fromarray(ramp8).convert("P",
        palette=Image.Palette.ADAPTIVE, colors=8)  # 8-color adaptive-palette version
Code 1.2.5: One-line dithering via Pillow: the 25-line error-diffusion loop of Code 1.2.4, plus palette selection and a fast C inner loop, handled internally by convert.

Practical Example: Product Photos on a Four-Level Screen

Who: A firmware engineer at a retail electronics company shipping electronic shelf labels with 2-bit grayscale e-paper displays.

Situation: Marketing wanted small product photos on the labels, not just prices. The display hardware offers exactly four gray levels.

Problem: Naive quantization to four levels turned faces and packaging gradients into blotchy cartoon regions; the pilot store called the photos "melted".

Decision: The engineer added Floyd-Steinberg dithering to the image preparation service, plus a mild pre-sharpening pass to compensate for the e-paper's pixel blur.

Result: The same four hardware levels now rendered photographs that customers rated as "clearly recognizable" in store tests; returns on the photo feature stopped.

Lesson: When you cannot add bits, reshape the error. Perceived quality is a property of the error spectrum, not just the error magnitude.

5. Budgeting Pixels and Bits (Beginner)

Sampling density and bit depth are budget decisions, and they interact with everything downstream. More samples cost memory, bandwidth, and compute quadratically in linear resolution (doubling width and height quadruples the pixel count); more bits cost linearly but stress storage formats and tooling. The right split depends on the consumer. Human viewing tolerates 8 bits but hates aliasing. Measurement tasks (gauging, medical, astronomy) often need 12 to 16 bits but modest resolution. Deep networks ingest surprisingly low resolutions (224×224 remains a standard training size) but are sensitive to aliasing introduced by careless dataset resizing, a pitfall that resurfaces in the augmentation pipelines of Chapter 21. The next section, Section 1.3, takes up this budgeting question quantitatively: what resolution, depth, and dynamic range actually buy you.
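A back-of-the-envelope calculator makes the scaling visible. The sketch assumes uncompressed storage with each sample padded to whole bytes (so 12-bit data occupies 2 bytes); `image_bytes` is an illustrative helper, and real file formats will differ.

```python
def image_bytes(width, height, channels=3, bits_per_sample=8):
    """Uncompressed size in bytes, assuming samples are stored in whole bytes."""
    bytes_per_sample = (bits_per_sample + 7) // 8      # round up to whole bytes
    return width * height * channels * bytes_per_sample

base = image_bytes(1024, 1024)                         # 8-bit RGB
double_res = image_bytes(2048, 2048)                   # 2x linear resolution
deeper = image_bytes(1024, 1024, bits_per_sample=16)   # 2x bit depth

print("baseline bytes     :", base)
print("2x resolution cost :", double_res / base, "x")
print("2x bit depth cost  :", deeper / base, "x")
```

Doubling the sampling density in each direction costs twice as much as doubling the bit depth, which is one reason measurement systems often spend their budget on bits rather than pixels.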

Research Frontier: Images Without Grids (2024 to 2026)

A lively research line discards the sampling grid altogether and represents an image as a continuous function, typically a small neural network mapping coordinates $(x, y)$ to color: the implicit neural representation (INR) lineage started by SIREN (Sitzmann et al., 2020). Once an image is a function, "resolution" becomes a rendering choice rather than a property of the data, which enables arbitrary-scale super-resolution. LIIF began this thread, and Thera (Becker et al., CVPR 2025) made it explicitly anti-aliased by attaching a physically motivated heat-field decay to each frequency component, so that rendering at any scale automatically suppresses frequencies the target grid cannot carry. This is the Nyquist rule of this section baked into the architecture. Related 2024 to 2026 work on Gaussian-splat image representations pursues the same goal with sums of 2D Gaussians instead of neural fields. The lesson for practitioners: the sampling theorem is not going away; new methods succeed precisely by respecting it by construction.

Exercise 1.2.1: Spot the Alias (Conceptual)

A 4000 pixel wide photograph of a building contains railings that repeat every 3 pixels. The web team displays it at 800 pixels wide using nearest-neighbor scaling. Predict what the railings will look like and why. Would the artifact disappear if they instead displayed the image at 1333 pixels wide? Explain using the $f_s > 2 f_{\max}$ criterion.

Exercise 1.2.2: Measure the 6 dB Law (Coding)

Using Code 1.2.3 as a base, quantize a natural photograph (not a ramp) to every bit depth from 8 down to 1. For each depth compute the mean squared error against the original and convert it to a signal-to-noise ratio in dB. Plot SNR versus bits and fit a line: how close is your slope to 6.02 dB per bit, and at which bit depths does the natural image deviate from the uniform-error theory? Inspect the histogram to explain the deviation.

Exercise 1.2.3: Dithering Under Resampling (Analysis)

Dithered images and resizing interact badly. Take the 1-bit dithered ramp from Code 1.2.4 and downsample it by 2× first with INTER_NEAREST, then with INTER_AREA. Describe and explain the artifacts in each result. Which step of this section's theory did the nearest-neighbor path violate, and why is dithered content especially vulnerable to it?