Chapter 3: Spatial Filtering & Convolution

"I have personally visited every pixel in this image, nine at a time. Ask me anything, provided it concerns my immediate neighborhood."
A Well-Traveled Convolution Kernel

Big Picture

This chapter introduces the single most important operation in the entire book: convolution, the act of replacing each pixel with a weighted combination of its neighbors. Three words capture it, and they are worth keeping for the whole book: slide, multiply, sum. Every filter in this chapter, from a humble blur to a precision edge detector, is one small kernel of numbers slid across the image. The same operation, with the numbers learned from data instead of designed by hand, becomes the convolutional layer at the heart of Chapter 19 and the U-Net denoiser inside the diffusion models of Chapter 33. Learn it well here, where the weights are small enough to read.
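As a first concrete look at slide, multiply, sum, here is a minimal sketch in plain NumPy. The function name correlate2d_valid and the tiny ramp image are illustrative assumptions, not code from the book; the sketch skips border handling (Section 3.6) and the kernel flip (Section 3.1) and is written for readability rather than speed.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image; at each stop, multiply and sum.

    Illustrative helper (hypothetical name). Only positions where the
    kernel fits entirely inside the image are computed.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for y in range(out_h):                 # slide down
        for x in range(out_w):             # slide across
            window = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * kernel)   # multiply, then sum
    return out

image = np.arange(25, dtype=np.float64).reshape(5, 5)   # a simple intensity ramp
box = np.full((3, 3), 1.0 / 9.0)                        # 3x3 averaging kernel
result = correlate2d_valid(image, box)
print(result)
```

On a linear ramp the 3x3 average equals the center pixel, so the output is simply the interior of the input. That makes this a convenient sanity check before trying less obvious kernels.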

Chapter Overview

Everything in Chapter 2 shared one limitation: each output pixel depended only on the single input pixel at the same position. Point operations can brighten, stretch, and threshold, but they are blind to structure. They cannot tell a noisy speckle from a fine texture, or an edge from a gradient, because telling those apart requires looking at a pixel's surroundings. This chapter widens the aperture from one pixel to a neighborhood, and in doing so unlocks the operations that define classical image processing: smoothing, sharpening, and differentiation.

The instrument that does all of this is the kernel: a small grid of weights, typically $3 \times 3$ to $31 \times 31$, that slides across the image and computes a weighted sum at every position. Section 3.1 builds this machinery carefully, distinguishing correlation from true convolution (they differ by a flip that matters more in theory than in daily practice) and implementing both from scratch in NumPy before handing the job to OpenCV and PyTorch. The remaining sections are a tour of what different weights buy you. Section 3.2 covers the smoothing family: the box filter, the Gaussian (arguably the most-used filter in vision), and the median, a nonlinear order-statistic filter that linear theory cannot replicate. Section 3.3 runs the logic in reverse and sharpens, using the unsharp mask trick photographers invented in darkrooms decades before Photoshop. Section 3.4 turns filters into measuring instruments: Sobel, Laplacian, and LoG kernels estimate first and second derivatives, producing the gradient maps that feed the edge and line detectors of Chapter 9.
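The flip that separates correlation from convolution is easiest to see on a single bright pixel. The sketch below is an illustration under stated assumptions: correlate_same and convolve_same are hypothetical helper names, zero padding is chosen only for simplicity, and the asymmetric kernel 1..9 is picked so the flip is visible.

```python
import numpy as np

def correlate_same(image, kernel):
    """Zero-padded correlation with same-size output (illustrative helper)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(image, dtype=np.float64)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

def convolve_same(image, kernel):
    """Convolution is correlation with the kernel flipped on both axes."""
    return correlate_same(image, kernel[::-1, ::-1])

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0                                   # one bright pixel
kernel = np.arange(1, 10, dtype=np.float64).reshape(3, 3)

corr = correlate_same(impulse, kernel)
conv = convolve_same(impulse, kernel)
print(corr[1:4, 1:4])   # the kernel rotated by 180 degrees
print(conv[1:4, 1:4])   # the kernel itself
```

Convolving an impulse stamps a copy of the kernel into the output, while correlating stamps a rotated copy. For symmetric kernels such as the box or Gaussian the two results coincide, which is why the distinction rarely bites in daily practice.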

Two closing sections address the tensions the first four create. Smoothing fights noise but destroys edges; Section 3.5 resolves the conflict with the bilateral and guided filters, which adapt their weights to image content and smooth within regions while respecting boundaries. These edge-aware filters run inside virtually every smartphone camera pipeline shipped today. Finally, Section 3.6 confronts the two engineering questions every practitioner hits within an hour of filtering real images: what happens at the image border, where the kernel hangs off the edge, and how to make filtering fast, where one algebraic property (separability) routinely buys a tenfold speedup.
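Separability can be tested numerically: a 2D kernel splits into a column pass and a row pass exactly when it has matrix rank 1. The following sketch assumes nothing beyond NumPy; gaussian_1d and numerical_rank are hypothetical helper names, and the tolerance is an arbitrary choice.

```python
import numpy as np

def gaussian_1d(ksize, sigma):
    """Normalized 1D Gaussian samples (illustrative helper)."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def numerical_rank(kernel, tol=1e-10):
    """Count singular values that are not negligible relative to the largest."""
    s = np.linalg.svd(kernel, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

g = gaussian_1d(7, 1.5)
gauss_2d = np.outer(g, g)                      # built as an outer product
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
laplacian = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)

for name, k in [("gaussian", gauss_2d), ("sobel_x", sobel_x), ("laplacian", laplacian)]:
    print(name, "rank", numerical_rank(k))
```

The Gaussian and Sobel kernels come out rank 1 and are therefore separable; the 4-neighbor Laplacian comes out rank 2, so it cannot be written as a single pair of 1D passes.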

A word on why this chapter deserves unusual attention. The convolution kernel is this book's signature recurring character. In Part I it is designed by hand: you will choose the weights and know exactly why each one is there. In Chapter 19 the same sliding-window operation returns with learnable weights, and remarkably, the first layers of trained networks rediscover the very filters you build here: oriented edge detectors, blob detectors, and color-opponent kernels. In Chapter 33, stacks of convolutions form the U-Net that turns noise into photographs. Understanding what a $3 \times 3$ kernel can and cannot see is therefore not classical trivia; it is the foundation for reading every architecture diagram in the second half of this book.

To turn this tour into something you can keep, the chapter ends with a single capstone project: the Hands-On Lab, where you assemble every filter family from the six sections into one configurable filtering studio that loads an image, applies a chosen filter, reports the speedup from separability, and saves a labeled before-and-after panel.

Prerequisites

You should be comfortable with images as NumPy arrays, including dtype pitfalls and vectorized arithmetic, from Chapter 0: Foundations: The Python Imaging Stack. The discussion of noise origins and image quality metrics leans on the sensor and sampling story of Chapter 1: Digital Image Fundamentals. Histograms, contrast, and thresholding from Chapter 2: Point Operations, Histograms & Thresholding appear repeatedly as the downstream consumers of filtered images. Basic linear algebra (dot products, outer products, matrix rank, and the singular value decomposition) is assumed throughout and becomes essential in Section 3.6; the Mathematical Foundations appendix is a self-contained refresher for any of these, along with the variance and covariance that Section 3.5 uses.

Chapter Roadmap

3.1 Convolution & Correlation: The Workhorse Operation. The sliding-window machinery from scratch: correlation, the flip that makes it convolution, kernel galleries, and the same operation in NumPy, OpenCV, and PyTorch.
3.2 Smoothing: Box, Gaussian & Median Filters. Averaging as noise suppression: the box filter and its artifacts, the Gaussian and its sigma, and the median filter that deletes salt-and-pepper noise outright.
3.3 Sharpening & Unsharp Masking. Blur, subtract, amplify: the darkroom trick behind every sharpen slider, collapsed into a single kernel, with its halos and noise costs examined.
3.4 Derivative Filters: Sobel, Laplacian & LoG. Filters as measuring instruments: gradients, magnitudes, and orientations via Sobel and Scharr, second derivatives via the Laplacian, and blob-finding LoG and DoG.
3.5 Edge-Preserving Smoothing: Bilateral & Guided Filters. Smoothing that respects boundaries: the bilateral filter's two Gaussians, the guided filter's linear model, and the smartphone pipelines built on both.
3.6 Borders, Separability & Performance. The engineering of filtering: border modes and their artifacts, separable kernels and their tenfold speedups, operation counts, and how production systems make convolution fast.

What's Next?

Spatial filtering describes every operation in this chapter as a sum over neighborhoods, but there is a second, complementary language for the same ideas. In Chapter 4: The Frequency Domain & Multi-Scale Analysis, images become sums of waves, and every kernel in this chapter acquires a frequency-domain identity: the Gaussian is revealed as a low-pass filter, sharpening as high-frequency boost, and convolution itself as simple multiplication of spectra. That viewpoint explains in one stroke why box filters ring, why large-kernel convolution is sometimes faster through the FFT, and how Gaussian and Laplacian pyramids compress an image into a stack of scales.
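The claim that convolution becomes multiplication of spectra can be checked in a few lines. This is a 1D sketch under one important assumption: the FFT identity holds for circular convolution, where the signal wraps around at its ends, so the direct sum below wraps its indices too. The signal, kernel, and length 16 are arbitrary illustrative choices.

```python
import numpy as np

n = 16
rng = np.random.default_rng(0)
signal = rng.random(n)

kernel = np.zeros(n)
kernel[:3] = [0.25, 0.5, 0.25]        # small smoothing kernel, zero-padded to length n

# Direct circular convolution: slide, multiply, sum, with wrap-around indexing.
direct = np.array([
    sum(signal[(i - m) % n] * kernel[m] for m in range(n))
    for i in range(n)
])

# Same result through the frequency domain: multiply the two spectra.
via_fft = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

print(np.max(np.abs(direct - via_fft)))   # difference at floating-point rounding level
```

The direct sum costs work proportional to the kernel length at every sample, while the FFT route costs the same regardless of kernel size, which is the intuition behind large kernels sometimes being faster in the frequency domain.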

Hands-On Lab: Build a Configurable Filtering Studio

Duration: about 60 to 75 minutes
Difficulty: Intermediate

Objective

Build a small command-line filtering studio that loads any image, applies a filter you select from every family in this chapter (smoothing, sharpening, derivative, and edge-preserving), measures the speedup that separability buys for the Gaussian, and writes a labeled before-and-after comparison panel to disk.

What You'll Practice

Applying box, Gaussian, and median smoothing and reasoning about their artifacts (Section 3.2).
Building an unsharp-mask sharpener from a blur, a subtraction, and an amplification (Section 3.3).
Computing Sobel gradient magnitude and a Laplacian response as measuring instruments (Section 3.4).
Calling the bilateral filter and seeing edge-preserving smoothing in action (Section 3.5).
Timing a full two-dimensional Gaussian against its separable two-pass form to confirm the operation-count argument (Section 3.6).

Setup

One environment, two libraries, and any image of your own. Install with:

pip install opencv-python numpy

Save a test photo as input.jpg in the same folder, or let the script fall back to a synthetic test pattern if no file is found.

Steps

Step 1: Load an image and build a safe fallback

Read the image in grayscale so every filter has a single channel to work on, and generate a synthetic checkerboard-plus-noise pattern when no file is present so the lab always runs.

import cv2
import numpy as np
import time

def load_image(path="input.jpg"):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is not None:
        return img
    # TODO: build a 256x256 uint8 fallback: a checkerboard with added
    # Gaussian noise, so smoothing and edge filters both have something to do.
    # Hint: np.indices to make the board, np.random.normal for the noise,
    # then np.clip(..., 0, 255).astype(np.uint8).
    raise NotImplementedError

img = load_image()
print("loaded", img.shape, img.dtype)

Hint

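A quick checkerboard: board = (((np.indices((256, 256)) // 32).sum(axis=0) % 2) * 200).astype(np.float64), then add np.random.normal(0, 20, board.shape) and clip to the valid range. Divide the row and column indices by the tile size before summing them; summing first and dividing afterward produces diagonal stripes instead of squares.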

Step 2: Implement the smoothing family

Wrap the three smoothers of Section 3.2 behind one function so the studio can switch between them by name. Each returns a uint8 image the same size as the input.

def smooth(img, kind="gaussian", ksize=5):
    if kind == "box":
        return cv2.blur(img, (ksize, ksize))
    if kind == "gaussian":
        # TODO: return a Gaussian blur with this kernel size.
        # Let OpenCV derive sigma from ksize by passing sigmaX=0.
        ...
    if kind == "median":
        return cv2.medianBlur(img, ksize)  # ksize must be odd
    raise ValueError(kind)

Hint

cv2.GaussianBlur(img, (ksize, ksize), 0). The trailing 0 tells OpenCV to compute sigma from the kernel size using its standard rule.
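To see what that standard rule produces, the sketch below evaluates the formula given in the OpenCV documentation for cv2.getGaussianKernel, sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8, and builds the matching normalized weights in NumPy. The helper names are hypothetical, and this is not a reimplementation of OpenCV: for very small kernel sizes OpenCV may substitute hard-coded weights, so treat the result as an approximation of what the library uses.

```python
import numpy as np

def opencv_default_sigma(ksize):
    """Sigma implied by a kernel size, per the rule in OpenCV's documentation."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8

def gaussian_kernel_1d(ksize, sigma):
    """Sampled, normalized 1D Gaussian (illustrative helper)."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    w = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

for ksize in (5, 15, 31):
    sigma = opencv_default_sigma(ksize)
    k = gaussian_kernel_1d(ksize, sigma)
    print(f"ksize={ksize:2d}  sigma={sigma:.2f}  center weight={k[ksize // 2]:.4f}")
```

The rule ties sigma to the window so that the kernel's edge weights are already small. If you pass your own sigma, pick a kernel size large enough that the Gaussian tails are not visibly truncated.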

Step 3: Add the unsharp-mask sharpener

Sharpening is blur, subtract, amplify (Section 3.3). Blur the image, form the detail layer as the difference, then add a scaled copy of that detail back onto the original.

def unsharp(img, ksize=5, amount=1.5):
    blurred = cv2.GaussianBlur(img, (ksize, ksize), 0)
    # TODO: combine the original and the blur so the result is
    # original + amount * (original - blurred), in float, then clip
    # back to uint8. cv2.addWeighted does this in one call.
    ...

Hint

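cv2.addWeighted(img, 1 + amount, blurred, -amount, 0) computes (1 + amount) * img - amount * blurred, which is img + amount * (img - blurred) with the terms regrouped, so it is exactly the unsharp mask. The result is saturated to the 0..255 range, so the output stays uint8.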
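Because blurring is linear, the same blur, subtract, amplify recipe can be folded into one kernel: (1 + amount) times an identity kernel minus amount times the Gaussian. The sketch below builds that kernel in NumPy. The function names and the sigma value of 1.1 are illustrative assumptions rather than values taken from the lab code.

```python
import numpy as np

def gaussian_2d(ksize, sigma):
    """Normalized 2D Gaussian built as an outer product (illustrative helper)."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def unsharp_kernel(ksize=5, sigma=1.1, amount=1.5):
    """Single kernel equivalent to original + amount * (original - blurred)."""
    delta = np.zeros((ksize, ksize))
    delta[ksize // 2, ksize // 2] = 1.0        # identity kernel: leaves the image unchanged
    return (1.0 + amount) * delta - amount * gaussian_2d(ksize, sigma)

k = unsharp_kernel()
print(np.round(k, 3))
print("sum of weights:", round(float(k.sum()), 6))
```

The weights sum to 1, so flat regions keep their brightness. The center exceeds 1 and the surround is negative, which is the origin of the halos around strong edges.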

Step 4: Turn filters into measuring instruments

Add the derivative responses of Section 3.4: Sobel gradient magnitude (an edge strength map) and an absolute Laplacian (a second-derivative response). Compute these in a signed float type before scaling back to a viewable image.

def gradient_magnitude(img):
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    # TODO: combine gx and gy into a magnitude, then map to 0..255 uint8.
    # Hint: np.hypot for the magnitude, cv2.normalize for the rescale.
    ...

def laplacian_response(img):
    lap = cv2.Laplacian(img, cv2.CV_64F, ksize=3)
    return cv2.convertScaleAbs(lap)  # |lap| saturated to uint8 (values above 255 clip)

Hint

mag = np.hypot(gx, gy) then cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8). Computing in CV_64F first avoids clipping the negative gradient lobes to zero.
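The reason for computing derivatives in a signed float type shows up on a single image row. The sketch below uses a plain difference of pixels two apart on a made-up row; it illustrates NumPy's integer behavior, which wraps around, alongside the clipping that a saturating conversion would apply.

```python
import numpy as np

# One row crossing a bright bar: a rising edge, then a falling edge.
row = np.array([10, 10, 200, 200, 10, 10], dtype=np.uint8)

# Difference computed directly in uint8: negative results wrap modulo 256.
wrapped = row[2:] - row[:-2]

# Same difference in float: the sign of each edge survives.
signed = row[2:].astype(np.float64) - row[:-2].astype(np.float64)

# What a saturating uint8 conversion would keep: the falling edge disappears.
clipped = np.clip(signed, 0, 255)

print("uint8 arithmetic :", wrapped)
print("float arithmetic :", signed)
print("clipped at zero  :", clipped)
print("magnitude        :", np.abs(signed))
```

Only the float version reports both edges with equal strength and opposite sign. Taking the magnitude afterward, as the lab does with np.hypot, then gives a symmetric edge map.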

Step 5: Add edge-preserving smoothing

Bring in the bilateral filter of Section 3.5 so the studio can smooth noise while keeping edges crisp. Note how its two control parameters separate spatial reach from intensity tolerance.

def edge_preserving(img, d=9, sigma_color=75, sigma_space=75):
    # TODO: call the bilateral filter. d is the pixel neighborhood diameter,
    # sigma_color controls how different intensities may be and still blend,
    # sigma_space controls spatial reach.
    ...

Hint

cv2.bilateralFilter(img, d, sigma_color, sigma_space). Raising sigma_color makes the filter behave more like a plain Gaussian; lowering it protects edges more aggressively.
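The effect of sigma_color can be reproduced on a one-dimensional step edge. The following is a simplified sketch of the bilateral weighting idea, not OpenCV's implementation: bilateral_1d is a hypothetical helper, and the signal and parameter values are chosen only to make the contrast obvious.

```python
import numpy as np

def bilateral_1d(signal, radius, sigma_space, sigma_color):
    """Weight each neighbor by spatial distance AND intensity difference."""
    signal = signal.astype(np.float64)
    offsets = np.arange(-radius, radius + 1)
    spatial = np.exp(-(offsets ** 2) / (2.0 * sigma_space ** 2))   # fixed per offset
    padded = np.pad(signal, radius, mode="edge")
    out = np.empty_like(signal)
    for i in range(signal.size):
        window = padded[i:i + 2 * radius + 1]
        # Range weight: neighbors with very different intensity count for little.
        similarity = np.exp(-((window - signal[i]) ** 2) / (2.0 * sigma_color ** 2))
        w = spatial * similarity
        out[i] = np.sum(w * window) / np.sum(w)
    return out

step = np.array([50.0] * 8 + [200.0] * 8)            # a clean step edge
sharp = bilateral_1d(step, radius=3, sigma_space=2.0, sigma_color=10.0)
soft = bilateral_1d(step, radius=3, sigma_space=2.0, sigma_color=1000.0)

print(np.round(sharp[5:11], 1))   # edge preserved
print(np.round(soft[5:11], 1))    # edge smeared, close to a plain Gaussian blur
```

With a small sigma_color the pixels across the edge receive almost no weight, so the step survives. With a very large one the similarity term is nearly constant and only the spatial Gaussian remains.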

Step 6: Measure the separability speedup

Confirm the central performance claim of Section 3.6. A two-dimensional Gaussian convolution with a $k \times k$ kernel costs $k^2$ multiply-adds per pixel, which is $O(k^2)$; the same blur done as two one-dimensional passes costs $2k$, which is $O(k)$. Time both on a large kernel and print the ratio.

def time_separability(img, ksize=31, runs=20):
    k = cv2.getGaussianKernel(ksize, 0)          # 1D Gaussian column
    k2d = k @ k.T                                 # outer product: full 2D kernel

    t0 = time.perf_counter()
    for _ in range(runs):
        cv2.filter2D(img, -1, k2d)               # dense 2D convolution
    full = time.perf_counter() - t0

    # TODO: time the separable form with cv2.sepFilter2D(img, -1, k, k),
    # then return the speedup ratio full / separable.
    ...

Hint

Mirror the first timing block but call cv2.sepFilter2D(img, -1, k, k) inside the loop. With a 31 by 31 kernel you should see the separable form run several times faster.
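Before timing anything, it helps to know what the arithmetic predicts and to confirm that the two forms really compute the same image. The sketch below does both in NumPy on a small random array. The binomial kernel and the array size are arbitrary illustrative choices, and only interior ("valid") positions are compared, so border handling does not enter.

```python
import numpy as np

def ops_per_pixel(k):
    """Multiply-adds per output pixel: dense k x k versus two 1D passes."""
    return k * k, 2 * k

for k in (3, 5, 15, 31):
    dense, separable = ops_per_pixel(k)
    print(f"k={k:2d}: dense {dense:4d}, separable {separable:3d}, "
          f"ratio {dense / separable:.1f}x")

# Equivalence check on a small example.
rng = np.random.default_rng(0)
img = rng.random((12, 12))
g = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0      # symmetric 1D smoothing kernel
k2d = np.outer(g, g)                                 # its 5x5 two-dimensional form

# Dense form: one 5x5 weighted sum per output pixel.
dense_out = np.array([[np.sum(img[y:y + 5, x:x + 5] * k2d) for x in range(8)]
                      for y in range(8)])

# Separable form: filter every row, then every column of the result.
rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, img)
two_pass = np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)

print("max difference:", np.max(np.abs(dense_out - two_pass)))
```

The operation count predicts a ratio of k/2, about 15.5 at k = 31. Measured speedups are usually smaller, because memory traffic, vectorization, and library-internal strategies for large kernels affect the two paths differently.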

Step 7: Assemble the studio and save a comparison panel

Wire the pieces into a dispatcher keyed by filter name, stack the original beside the filtered result with a labeled divider, and write the panel to disk. This is the artifact you keep.

FILTERS = {
    "box":       lambda im: smooth(im, "box", 5),
    "gaussian":  lambda im: smooth(im, "gaussian", 5),
    "median":    lambda im: smooth(im, "median", 5),
    "sharpen":   lambda im: unsharp(im, 5, 1.5),
    "gradient":  gradient_magnitude,
    "laplacian": laplacian_response,
    "bilateral": edge_preserving,
}

def run_studio(img, name):
    out = FILTERS[name](img)
    # TODO: place img and out side by side with a thin white divider column,
    # then cv2.imwrite a file named f"studio_{name}.png".
    # Hint: np.hstack with a (H, 4) white strip between the two images.
    ...

for name in FILTERS:
    run_studio(img, name)
ratio = time_separability(img)
print(f"separable Gaussian speedup at k=31: {ratio:.1f}x")

Hint

Build the divider with divider = np.full((img.shape[0], 4), 255, np.uint8), then panel = np.hstack([img, divider, out]) and cv2.imwrite(f"studio_{name}.png", panel).

Expected Output

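Seven PNG files, one per filter, each a side-by-side panel of the original and the filtered result. The studio_median.png panel should show isolated noise specks suppressed while edges stay sharp (on a photo with salt-and-pepper noise the specks vanish outright; on the synthetic fallback, whose noise is Gaussian, the median reduces the grain rather than erasing it); studio_gradient.png should show bright contours on a dark background; studio_bilateral.png should look smoother than the original yet keep crisp boundaries. The console output ends with a line such as separable Gaussian speedup at k=31: 6.8x; the exact number depends on your machine and OpenCV build, but the separable form should be clearly faster, consistent with the $2k$ versus $k^2$ multiply-adds per pixel argued in Section 3.6. Do not expect the measured ratio to equal $k/2$: memory traffic and vectorization affect both paths, and OpenCV's documentation notes that filter2D switches to a DFT-based algorithm for sufficiently large kernels.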

Stretch Goals

Add a border-mode switch (cv2.BORDER_REFLECT, BORDER_REPLICATE, BORDER_CONSTANT) and save panels that make the border artifacts of Section 3.6 visible at the image edge.
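Library shortcut: replace your hand-built gradient_magnitude with a single call to skimage.filters.sobel from scikit-image and confirm the maps agree once both are rescaled to a common range (scikit-image works in floating point and normalizes its kernels, so raw values differ by a scale factor, and border pixels may differ slightly); this mirrors the book's Right Tool principle, where a from-scratch build is followed by the few-line production equivalent.
Reach forward to Chapter 19: load your 2D Gaussian kernel into a torch.nn.Conv2d layer as fixed weights and verify it reproduces the OpenCV blur away from the image border (Conv2d pads with zeros by default when padding is requested, whereas OpenCV reflects the image at the border), previewing convolution with learnable weights.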
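For the border-mode stretch goal, it can help to preview each padding rule on a single row before applying it to an image. The sketch below uses np.pad. The pairing with OpenCV flag names in the comments reflects the patterns described in OpenCV's border documentation as I understand them, so confirm against your installed version.

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])

paddings = {
    # zeros outside the image           (like cv2.BORDER_CONSTANT with value 0)
    "constant": np.pad(row, 2, mode="constant"),
    # repeat the edge pixel             (like cv2.BORDER_REPLICATE)
    "edge": np.pad(row, 2, mode="edge"),
    # mirror, edge pixel duplicated     (like cv2.BORDER_REFLECT)
    "symmetric": np.pad(row, 2, mode="symmetric"),
    # mirror, edge pixel not duplicated (like cv2.BORDER_REFLECT_101, OpenCV's default)
    "reflect": np.pad(row, 2, mode="reflect"),
}

for name, padded in paddings.items():
    print(f"{name:10s} {padded}")
```

Note the naming trap: NumPy's "reflect" corresponds to OpenCV's REFLECT_101 pattern, while NumPy's "symmetric" corresponds to OpenCV's REFLECT.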

Complete Solution

import cv2
import numpy as np
import time

def load_image(path="input.jpg"):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is not None:
        return img
    # Divide indices by the tile size BEFORE summing so tiles are squares, not diagonal stripes.
    board = (((np.indices((256, 256)) // 32).sum(axis=0) % 2) * 200).astype(np.float64)
    noisy = board + np.random.normal(0, 20, board.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def smooth(img, kind="gaussian", ksize=5):
    if kind == "box":
        return cv2.blur(img, (ksize, ksize))
    if kind == "gaussian":
        return cv2.GaussianBlur(img, (ksize, ksize), 0)
    if kind == "median":
        return cv2.medianBlur(img, ksize)
    raise ValueError(kind)

def unsharp(img, ksize=5, amount=1.5):
    blurred = cv2.GaussianBlur(img, (ksize, ksize), 0)
    return cv2.addWeighted(img, 1 + amount, blurred, -amount, 0)

def gradient_magnitude(img):
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    return cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def laplacian_response(img):
    lap = cv2.Laplacian(img, cv2.CV_64F, ksize=3)
    return cv2.convertScaleAbs(lap)

def edge_preserving(img, d=9, sigma_color=75, sigma_space=75):
    return cv2.bilateralFilter(img, d, sigma_color, sigma_space)

def time_separability(img, ksize=31, runs=20):
    k = cv2.getGaussianKernel(ksize, 0)
    k2d = k @ k.T

    t0 = time.perf_counter()
    for _ in range(runs):
        cv2.filter2D(img, -1, k2d)
    full = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(runs):
        cv2.sepFilter2D(img, -1, k, k)
    sep = time.perf_counter() - t0

    return full / sep

FILTERS = {
    "box":       lambda im: smooth(im, "box", 5),
    "gaussian":  lambda im: smooth(im, "gaussian", 5),
    "median":    lambda im: smooth(im, "median", 5),
    "sharpen":   lambda im: unsharp(im, 5, 1.5),
    "gradient":  gradient_magnitude,
    "laplacian": laplacian_response,
    "bilateral": edge_preserving,
}

def run_studio(img, name):
    out = FILTERS[name](img)
    divider = np.full((img.shape[0], 4), 255, np.uint8)
    panel = np.hstack([img, divider, out])
    cv2.imwrite(f"studio_{name}.png", panel)

if __name__ == "__main__":
    img = load_image()
    print("loaded", img.shape, img.dtype)
    for name in FILTERS:
        run_studio(img, name)
        print("wrote", f"studio_{name}.png")
    ratio = time_separability(img)
    print(f"separable Gaussian speedup at k=31: {ratio:.1f}x")

Bibliography & Further Reading

Foundational Papers

Tomasi, C. and Manduchi, R. "Bilateral Filtering for Gray and Color Images." ICCV (1998). doi:10.1109/ICCV.1998.710815

The paper that named and popularized the bilateral filter taught in Section 3.5; short, readable, and still the cleanest statement of range-versus-domain filtering.

He, K., Sun, J., and Tang, X. "Guided Image Filtering." IEEE TPAMI (2013). doi:10.1109/TPAMI.2012.213

Introduces the guided filter of Section 3.5: an edge-preserving smoother whose cost is independent of kernel radius, now standard in camera ISPs and matting pipelines.

Marr, D. and Hildreth, E. "Theory of Edge Detection." Proceedings of the Royal Society B (1980). doi:10.1098/rspb.1980.0020

The Laplacian-of-Gaussian and zero-crossing theory behind Section 3.4, grounded in a computational account of biological vision.

Perona, P. and Malik, J. "Scale-Space and Edge Detection Using Anisotropic Diffusion." IEEE TPAMI (1990). doi:10.1109/34.56205

The PDE view of edge-preserving smoothing: iterate a diffusion that slows down at edges. Useful contrast to the single-pass bilateral and guided filters of Section 3.5.

Getreuer, P. "A Survey of Gaussian Convolution Algorithms." Image Processing On Line (2013). ipol.im/pub/art/2013/87

Everything about computing Gaussian blur fast, with reference code: FIR truncation, recursive IIR approximations, and accuracy-versus-speed tradeoffs relevant to Section 3.6.

Recent Research (2022-2026)

Ding, X. et al. "Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs." CVPR (2022). arXiv:2203.06717

RepLKNet: the paper that restarted the large-kernel conversation, showing that CNNs built on re-parameterized depthwise kernels up to 31 pixels wide can rival vision transformers. The kernels are learned, not hand-designed.

Ding, X. et al. "UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition." CVPR (2024). arXiv:2311.15599

Design rules for very large kernels across modalities; a 2024 answer to "how big should a kernel be," sixty years after the question was first asked.

Ye, Y. et al. "DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection." AAAI (2024). arXiv:2401.02032

Edge detection reborn as generation: a diffusion model produces single-pixel-wide edge maps, the modern descendant of the Sobel and LoG operators in Section 3.4.

Yu, F. et al. "Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild." CVPR (2024). arXiv:2401.13627

SUPIR: generative restoration that synthesizes plausible detail rather than amplifying recorded contrast, the modern counterpoint to the sharpening of Section 3.3.

Liu, Y. et al. "VMamba: Visual State Space Model." (2024). arXiv:2401.10166

A linear-time alternative to both convolution and attention; useful perspective on the operation-count analysis of Section 3.6.

Books

Szeliski, R. Computer Vision: Algorithms and Applications, 2nd edition (2022). szeliski.org/Book

Chapter 3 of Szeliski covers the same ground as this chapter with full mathematical depth; the book is free online and the standard reference for the field.

Gonzalez, R. and Woods, R. Digital Image Processing, 4th edition. imageprocessingplace.com

The classic textbook treatment of spatial filtering, with exhaustive worked examples of every kernel family in this chapter.

Tools & Libraries

OpenCV. "Smoothing Images" tutorial and imgproc filtering reference. docs.opencv.org

The official OpenCV 4.x walkthrough of blur, GaussianBlur, medianBlur, and bilateralFilter used throughout this chapter.

SciPy. scipy.ndimage multidimensional image processing reference. docs.scipy.org

Reference implementations of correlate, convolve, and Gaussian filtering with explicit border-mode control, used in Sections 3.1 and 3.6.

PyTorch. torch.nn.Conv2d documentation. pytorch.org/docs

The learnable version of this chapter's central operation; note the documentation's admission that it actually computes cross-correlation, as explained in Section 3.1.