Part I: Image Processing
Chapter 3: Spatial Filtering & Convolution

Derivative Filters: Sobel, Laplacian & LoG

"I don't care what the pixel value is. I care where it's going. Show me change or show me zeros."

An Edge Detector Who Sees Things in Black and White
Big Picture

Derivative filters turn convolution from a beautifier into a measuring instrument: they estimate, at every pixel, how fast brightness changes and in which direction. The gradient maps they produce are the raw material of edge detection in Chapter 9, the orientation histograms of classical recognition, and the patterns that the first layers of trained CNNs in Chapter 19 spontaneously rediscover. This section builds the three classical instruments (Sobel for first derivatives, Laplacian for second, LoG/DoG for scale-tuned blobs) and hammers on the one rule that makes them all usable: differentiation amplifies noise, so every practical derivative filter smooths first.

So far this chapter has produced images for humans: smoothed in Section 3.2, sharpened in Section 3.3. This section produces images for algorithms. A gradient map looks like a ghostly line drawing, not a photograph, but it answers the question downstream code actually asks: where does the scene change, how abruptly, and in what direction?

1. Derivatives on a Pixel Grid Intermediate

An image is a sampled function, so derivatives must become finite differences. For the horizontal partial derivative, three candidates present themselves: forward difference $I(x+1) - I(x)$, backward difference $I(x) - I(x-1)$, and the central difference:

$$ \frac{\partial I}{\partial x} \;\approx\; \frac{I(x+1, y) - I(x-1, y)}{2} $$

The central difference wins on accuracy (its error is second-order in the sample spacing, versus first-order for the others) and on symmetry (it assigns the derivative to the pixel itself, not to a half-pixel offset). As a kernel it is $\frac{1}{2}[\,-1\;\;0\;\;+1\,]$, an antisymmetric row whose weights sum to zero, the signature of every derivative filter: flat regions must map to exactly zero response.
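
The accuracy claim can be checked numerically. The following is a minimal sketch, assuming a sine wave as a stand-in scanline (chosen only because its true derivative is known) and a hypothetical sample spacing `h`; it compares the forward and central differences and confirms that a flat signal maps to exactly zero.

```python
import numpy as np

# Stand-in scanline with a known derivative: I(x) = sin(x), I'(x) = cos(x).
h = 0.1                                   # sample spacing (hypothetical value)
x = np.arange(0.0, 2.0 * np.pi, h)
I = np.sin(x)
true_dx = np.cos(x)

# Forward difference: (I[i+1] - I[i]) / h, assigned to sample i.
forward = (I[2:] - I[1:-1]) / h
# Central difference: (I[i+1] - I[i-1]) / (2h), assigned to sample i.
central = (I[2:] - I[:-2]) / (2.0 * h)

err_forward = np.abs(forward - true_dx[1:-1]).max()
err_central = np.abs(central - true_dx[1:-1]).max()
print(f"max error, forward difference: {err_forward:.5f}")
print(f"max error, central difference: {err_central:.5f}")

# Zero-sum weights: a constant signal must produce exactly zero response.
flat = np.full(16, 137.0)
flat_response = (flat[2:] - flat[:-2]) / 2.0
print("flat region maps to zero:", bool(np.all(flat_response == 0.0)))
```

Halving `h` should roughly halve the forward-difference error and quarter the central-difference error, which is what first-order versus second-order accuracy means in practice.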

Stacking both partials gives the gradient vector $\nabla I = (\partial I/\partial x,\; \partial I/\partial y)$, summarized by two scalar fields that downstream code consumes constantly:

$$ \text{magnitude:}\;\; \|\nabla I\| = \sqrt{I_x^2 + I_y^2} \qquad \text{orientation:}\;\; \theta = \operatorname{atan2}(I_y,\, I_x) $$

The magnitude is large wherever brightness changes quickly (edges), and the orientation points across the edge, perpendicular to its direction. These two numbers per pixel are the most reused intermediate quantity in classical vision.
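
To make the two fields concrete, here is a small sketch on a synthetic soft edge. The image, its size, and the 30-degree ascent direction are all assumptions chosen for illustration; the gradient is computed with the plain central difference from above.

```python
import numpy as np

# Synthetic soft edge whose brightness climbs fastest along angle_deg
# (measured from the +x axis, with y pointing down as in image coordinates).
angle_deg = 30.0
a = np.radians(angle_deg)
ys, xs = np.mgrid[0:64, 0:64].astype(np.float64)
u = (xs - 32.0) * np.cos(a) + (ys - 32.0) * np.sin(a)
img = 128.0 + 100.0 * np.tanh(u / 4.0)

# Central differences on the interior pixels.
Ix = (img[1:-1, 2:] - img[1:-1, :-2]) / 2.0
Iy = (img[2:, 1:-1] - img[:-2, 1:-1]) / 2.0

mag = np.hypot(Ix, Iy)                       # gradient magnitude
theta = np.degrees(np.arctan2(Iy, Ix))       # gradient orientation in degrees

# The strongest response sits on the edge, and its orientation points across it.
r, c = np.unravel_index(np.argmax(mag), mag.shape)
theta_peak = theta[r, c]
print(f"peak magnitude {mag[r, c]:.1f} at pixel (row {r + 1}, col {c + 1})")
print(f"orientation at the peak: {theta_peak:.1f} degrees (edge built at {angle_deg})")
```

The recovered orientation differs slightly from the construction angle because finite differences on a curved intensity profile are only approximate; the magnitude collapses toward zero in the flat regions on either side of the edge.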

2. Sobel, Prewitt, Scharr: Smoothed Derivatives Intermediate

The bare central difference has a fatal flaw: noise. Differentiation is a high-frequency amplifier (in Chapter 4's terms, its frequency response grows with frequency), and pixel noise is precisely high-frequency. Apply $[\,-1\;0\;+1\,]$ to a real photograph and the output is a blizzard of noise responses with edges barely visible inside it. The cure is the chapter's recurring move: smooth first. Smoothing perpendicular to the differentiation direction suppresses noise without diluting the derivative itself, and packaging both into one kernel via the separability logic of Section 3.6 gives the Sobel operator:

$$ S_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} -1 & 0 & +1 \end{bmatrix}, \qquad S_y = S_x^\top $$

Read the factorization: a vertical $[1\;2\;1]$ smoothing column (a tiny Gaussian approximation) times the horizontal derivative row. Prewitt uses uniform $[1\;1\;1]$ smoothing instead; Scharr re-optimizes the weights to $[3\;10\;3]$ for markedly better rotational accuracy, which matters whenever the orientation $\theta$ feeds geometry. All three are the same idea with different smoothing budgets. For larger noise, OpenCV's ksize parameter grows the Sobel kernel ($5 \times 5$, $7 \times 7$), folding in more smoothing.
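
The factorization and the "smoothing budget" can both be read off the weights. The sketch below rebuilds the three horizontal kernels as outer products and computes a noise gain for each, under two assumptions that are mine rather than the text's: each kernel is rescaled so that a unit brightness slope reads as 1, and the noise is independent per pixel, so the gain is the square root of the sum of squared weights.

```python
import numpy as np

deriv = np.array([-1.0, 0.0, 1.0])               # horizontal derivative row
smoothers = {
    "Prewitt": np.array([1.0, 1.0, 1.0]),
    "Sobel":   np.array([1.0, 2.0, 1.0]),
    "Scharr":  np.array([3.0, 10.0, 3.0]),
}

kernels = {}
noise_gain = {}
for name, smooth in smoothers.items():
    k = np.outer(smooth, deriv)                  # smoothing column times derivative row
    kernels[name] = k
    k_unit = k / (2.0 * smooth.sum())            # rescale: unit slope gives response 1
    noise_gain[name] = float(np.sqrt((k_unit ** 2).sum()))

# Reference: the bare central difference (1/2)[-1 0 +1] with no smoothing.
bare_gain = float(np.sqrt(((deriv / 2.0) ** 2).sum()))

print(kernels["Sobel"])
print(f"bare central difference: noise gain {bare_gain:.3f}")
for name, g in noise_gain.items():
    print(f"{name:8s}: weight sum {kernels[name].sum():+.0f}, noise gain {g:.3f}")
```

Under this normalization all three smoothed kernels pass less noise than the bare central difference (whose gain is about 0.71), while reporting the same slope: that gap is the smoothing at work.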

Key Insight: There Is No Pure Derivative Filter

Every derivative filter that works on real images is a smoothed derivative; the only question is how much smoothing and in which direction. Sobel hides a $[1\;2\;1]$ blur, LoG wears its Gaussian openly in its name, and the Canny detector of Chapter 9 begins by choosing a $\sigma$. Differentiation and smoothing are not opposites in tension; they are inseparable halves of one operation, because measuring change meaningfully requires first deciding the scale at which change counts.

Figure 3.4.1 shows the geometry that makes first and second derivatives complementary instruments: across an edge, the first derivative peaks at the transition, while the second derivative crosses zero there, with lobes of opposite sign on either side.

[Figure 3.4.1 panels, top to bottom: intensity I(x), a soft edge; first derivative I′(x), whose peak marks the edge; second derivative I″(x), whose zero crossing (sign flip) locates the edge precisely.]
Figure 3.4.1 The derivative view of an edge. The first derivative (orange) peaks where intensity climbs fastest, so edges are gradient-magnitude maxima. The second derivative (purple) swings positive then negative, crossing zero exactly at the transition, the basis of zero-crossing edge detection.

In code, the cardinal rule is to compute derivatives in a signed float type. A Sobel response is negative wherever brightness falls along the differentiation direction (for $S_x$, a light-to-dark transition read left to right), and an 8-bit unsigned output (cv2.CV_8U) clamps every such value to zero, silently destroying half the signal: the second classic dtype bug of this chapter after Section 3.3's wraparound.

import cv2
import numpy as np

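gray = cv2.imread("staircase.jpg", cv2.IMREAD_GRAYSCALE)
if gray is None:                       # imread returns None instead of raising
    raise FileNotFoundError("staircase.jpg could not be read")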

# Signed float output (CV_32F) is essential: gradients go negative.
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # d/dx
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # d/dy

mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)

print(f"magnitude: max {mag.max():.0f}, mean {mag.mean():.1f}")
print(f"strong-edge pixels (mag > 100): {(mag > 100).mean():.1%}")
# Representative output:
# magnitude: max 1040, mean 21.3
# strong-edge pixels (mag > 100): 4.7%
# Typical scenes are mostly flat: a few percent of pixels carry the edges.

# For display only: rescale magnitude into [0, 255].
view = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
Computing the gradient field with cv2.Sobel in signed float, converting to magnitude and orientation via cartToPolar, and confirming the standard statistic that only a few percent of pixels in a natural scene are strong-edge pixels.
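
The dtype rule is easy to see on a toy image. The sketch below uses NumPy only and emulates an 8-bit destination with an explicit clamp to [0, 255], which is how OpenCV's saturating cast treats out-of-range values; the bar image is an assumption made up for the demonstration.

```python
import numpy as np

# A bright vertical bar on a dark background: one rising and one falling edge.
img = np.zeros((5, 12), dtype=np.uint8)
img[:, 4:8] = 200

# Signed horizontal difference I(x+1) - I(x-1), computed in float32.
f = img.astype(np.float32)
gx = f[:, 2:] - f[:, :-2]

# Emulated 8-bit output: negative responses are clamped to zero.
gx_u8 = np.clip(gx, 0, 255).astype(np.uint8)

rising = int((gx > 0).sum())        # dark-to-light responses
falling = int((gx < 0).sum())       # light-to-dark responses
surviving = int((gx_u8 > 0).sum())  # what is left after the 8-bit clamp
print(f"signed float: {rising} rising and {falling} falling edge responses")
print(f"8-bit output: {surviving} nonzero responses survive")
```

Only the rising edge of the bar survives the clamp. When an 8-bit image is needed for display, take the absolute value or the magnitude in float first and convert afterwards.
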
Library Shortcut: One Gradient Call in Practice

Hand-building $S_x$ and applying it via filter2D, as Section 3.1 taught, takes about ten lines per axis; cv2.Sobel(gray, cv2.CV_32F, 1, 0) is one. The library applies the kernel separably (two 1D passes), offers cv2.Scharr as a drop-in with better rotation accuracy at identical cost, and grows the smoothing via ksize without you re-deriving weights. scikit-image's filters.sobel goes further, returning a pre-normalized magnitude in one call: a 20-to-1 reduction when both axes, magnitude, and scaling are counted.

What are gradient maps for? Thresholding the magnitude with the techniques of Chapter 2 yields a crude but serviceable edge map. Histograms of the orientation channel summarize the dominant directions in a region, an idea that matures into the HOG descriptors of classical recognition and the keypoint orientations of Chapter 10. The example below is among the simplest profitable uses: estimating the skew of a scanned document from its gradient orientations.

Practical Example: Deskewing a Million Receipts

Who: A two-person computer vision team at an expense-management SaaS, ingesting photographed receipts from a mobile app.

Situation: Their OCR engine's accuracy fell off a cliff when receipts were rotated more than about 3 degrees, and users photograph receipts at every angle physics permits.

Problem: A learned orientation model was on the roadmap, but the backlog was immediate, and the team had two weeks and a CPU-only inference budget.

Decision: A 30-line classical fix: Sobel gradients on a downscaled grayscale image, keep pixels with strong magnitude, histogram their orientations into half-degree bins, and take the dominant bin (text lines produce overwhelming horizontal-edge mass) as the skew estimate, then counter-rotate before OCR.

Result: Median absolute skew after correction fell below 0.4 degrees on a 5,000-receipt validation set; end-to-end OCR field accuracy rose 11 points; latency cost was 9 ms per image on one CPU core. The "interim" fix outlived the roadmap item, which was eventually descoped.

Lesson: Gradient orientation is nearly free, and aggregated over a whole image it is robust to exactly the local noise that breaks per-pixel decisions. Before reaching for a model, check whether the statistic you need is already lying in the gradient field.
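
The decision described above can be sketched in a few lines. This is an illustrative reconstruction, not the team's code: `estimate_skew_deg` and its parameters are hypothetical, plain central differences stand in for Sobel, a magnitude quantile stands in for the "strong pixels" rule, and a synthetic striped image stands in for a receipt. Downscaling and the counter-rotation step are omitted.

```python
import numpy as np

def estimate_skew_deg(img, mag_quantile=0.9, bin_width=0.5):
    """Estimate text-line skew in degrees from gradient orientations (hypothetical helper)."""
    f = img.astype(np.float64)
    # Central differences on the interior (Sobel would add smoothing).
    ix = (f[1:-1, 2:] - f[1:-1, :-2]) / 2.0
    iy = (f[2:, 1:-1] - f[:-2, 1:-1]) / 2.0
    mag = np.hypot(ix, iy)
    strong = mag >= np.quantile(mag, mag_quantile)   # keep strong gradients only
    # Fold into [0, 180): both edge polarities of a text line share one direction.
    theta = np.degrees(np.arctan2(iy[strong], ix[strong])) % 180.0
    edges = np.arange(0.0, 180.0 + bin_width, bin_width)
    hist, _ = np.histogram(theta, bins=edges)
    peak = int(np.argmax(hist))
    dominant = 0.5 * (edges[peak] + edges[peak + 1])  # center of the dominant bin
    # Unskewed text lines give gradients at 90 degrees; the offset is the skew.
    return dominant - 90.0

# Synthetic "receipt": soft horizontal stripes rotated by a known skew.
skew_true = 3.7
s = np.radians(skew_true)
ys, xs = np.mgrid[0:200, 0:300].astype(np.float64)
v = -xs * np.sin(s) + ys * np.cos(s)                  # coordinate across the text lines
page = 128.0 + 100.0 * np.sin(2.0 * np.pi * v / 24.0)

skew_est = estimate_skew_deg(page)
print(f"true skew {skew_true:.2f} degrees, estimated {skew_est:.2f} degrees")
```

The estimate is quantized to the bin width, so half-degree bins bound the resolution at about a quarter of a degree. On real receipts the histogram would also collect vertical character strokes near 0 degrees, which is why restricting the search to a window around 90 degrees is a common refinement.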

3. The Laplacian: Change of Change Intermediate

The Laplacian $\nabla^2 I = \partial^2 I/\partial x^2 + \partial^2 I/\partial y^2$ sums the unmixed second derivatives. Applying the finite-difference recipe twice per axis and adding gives the standard 4-neighbor kernel, with an 8-neighbor variant that includes diagonals:

$$ \nabla^2_{4} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \qquad \nabla^2_{8} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix} $$

Both are symmetric under 90-degree rotations (unlike Sobel, the Laplacian has no preferred direction; it measures total curvature in one number) and both sum to zero. You met the 4-neighbor version inside Section 3.3's sharpening kernel, and Figure 3.4.1's bottom panel shows its behavior at an edge: a positive lobe, a negative lobe, and a zero crossing at the transition itself. Zero crossings localize edges more precisely than thresholded gradient peaks, and they form closed contours, properties that made them the foundation of an entire edge-detection school. Their weakness is the theme of this section squared: a second derivative's frequency response grows with the square of frequency rather than linearly, so it amplifies pixel noise far more than a first derivative does, and the raw Laplacian of a photograph is unusable without prior smoothing. Which leads directly to the section's final construction.
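
A one-dimensional sketch shows both the zero crossing and the noise penalty. The soft edge below is an assumed test signal, centered between samples 15 and 16; the noise gains assume independent per-pixel noise and are computed as the square root of the sum of squared weights.

```python
import numpy as np

lap4 = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
lap8 = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=np.float64)
print("kernel sums:", lap4.sum(), lap8.sum())

# Soft rising edge, centered halfway between samples 15 and 16.
x = np.arange(32, dtype=np.float64)
edge = 128.0 + 100.0 * np.tanh((x - 15.5) / 3.0)

# 1D second difference [1 -2 1]; entry i belongs to sample i + 1.
second = edge[2:] - 2.0 * edge[1:-1] + edge[:-2]

# A zero crossing is a sign change between neighboring responses.
flips = np.where(second[:-1] * second[1:] < 0)[0]
left_sample = int(flips[0]) + 1
print(f"zero crossing between samples {left_sample} and {left_sample + 1}")
print(f"lobe before the edge: {second[:flips[0] + 1].max():+.2f}")
print(f"lobe after the edge:  {second[flips[0] + 1:].min():+.2f}")

# Noise gain of each 1D stencil for independent per-pixel noise.
gain_first = float(np.sqrt(((np.array([-1.0, 0.0, 1.0]) / 2.0) ** 2).sum()))
gain_second = float(np.sqrt((np.array([1.0, -2.0, 1.0]) ** 2).sum()))
print(f"noise gain: first derivative {gain_first:.2f}, second derivative {gain_second:.2f}")
```

The response is positive on the dark side, negative on the bright side, and changes sign exactly where the edge was placed, while the second-difference stencil passes several times more noise than the central difference.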

4. LoG and DoG: Derivatives With a Scale Dial Advanced

Smooth with a Gaussian, then take the Laplacian. By the associativity from Section 3.1, the two convolutions collapse into one kernel, the Laplacian of Gaussian:

$$ \mathrm{LoG}_\sigma(x, y) = \nabla^2 (G_\sigma * I) = (\nabla^2 G_\sigma) * I, \qquad \nabla^2 G_\sigma(r) = \frac{r^2 - 2\sigma^2}{\sigma^4}\, G_\sigma(r) $$

The kernel looks like a sombrero: a negative center pit ringed by a positive annulus (or the inverse, by sign convention). It responds maximally to blobs, roughly circular regions of size matched to $\sigma$, dark-on-light or light-on-dark, and it crosses zero on their boundaries. The parameter $\sigma$ is now a genuine instrument dial: sweep it and the filter detects structure at chosen scales, the idea formalized by Marr and Hildreth in 1980 as a theory of biological edge detection and later industrialized by scale-space theory in Chapter 4.
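
The sombrero shape follows directly from the formula. The sketch below samples $\nabla^2 G_\sigma$ on a grid; `log_kernel` is a hypothetical helper, the 4-sigma truncation radius is an assumption, and no zero-sum correction is applied, so the weights sum only approximately to zero.

```python
import numpy as np

def log_kernel(sigma, radius=None):
    """Sample the Laplacian-of-Gaussian formula on a square grid (hypothetical helper)."""
    if radius is None:
        radius = int(np.ceil(4.0 * sigma))            # truncation radius (assumed)
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    gauss = np.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return (r2 - 2.0 * sigma ** 2) / sigma ** 4 * gauss

sigma = 3.0
k = log_kernel(sigma)
c = k.shape[0] // 2
profile = k[c, c:]                                    # radial slice along +x

first_positive = int(np.argmax(profile > 0))          # first radius past the zero crossing
print(f"center weight: {k[c, c]:+.5f} (the negative pit)")
print(f"sign flips before r = {first_positive}; sqrt(2) * sigma = {np.sqrt(2.0) * sigma:.2f}")
print(f"sum of weights relative to total mass: {abs(k.sum()) / np.abs(k).sum():.4f}")
```

The zero crossing at $r = \sqrt{2}\,\sigma$ is why a blob of that radius produces the strongest response: the whole blob fits inside the pit while its surround fills the annulus.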

Because building exact LoG kernels for many sigmas is wasteful, practice substitutes the Difference of Gaussians: $\mathrm{DoG} = G_{k\sigma} * I - G_{\sigma} * I$, which approximates the LoG closely when $k \approx 1.6$ and costs only Gaussian blurs, which Section 3.6 makes nearly free. A stack of DoG responses across scales is precisely the detection engine inside SIFT, the keypoint detector of Chapter 10; the code below builds one level of it.

import cv2
import numpy as np

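gray = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)
if gray is None:                       # imread returns None instead of raising
    raise FileNotFoundError("cells.png could not be read")
gray = gray.astype(np.float32)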

# --- LoG: smooth, then Laplacian (associativity makes this exact) ---
sigma = 2.0
smooth = cv2.GaussianBlur(gray, (0, 0), sigma)
log = cv2.Laplacian(smooth, cv2.CV_32F, ksize=3)

# --- DoG: difference of two Gaussians approximates LoG at ~zero cost ---
k = 1.6                                            # canonical scale ratio
dog = cv2.GaussianBlur(gray, (0, 0), k * sigma) \
    - cv2.GaussianBlur(gray, (0, 0), sigma)

# The two responses are approximately proportional; the correlation
# coefficient ignores scale, so no explicit normalization is needed.
corr = np.corrcoef(log.ravel(), dog.ravel())[0, 1]
print(f"LoG vs DoG correlation: {corr:.3f}")
# Representative output:
# LoG vs DoG correlation: 0.987   (DoG is a positive multiple of LoG here)
Building a Laplacian-of-Gaussian response by smoothing then differentiating, and verifying numerically that the cheap Difference-of-Gaussians with scale ratio 1.6 reproduces its shape almost perfectly.
Fun Fact

The LoG's sombrero profile shows up in wet biology: the center-surround receptive fields of retinal ganglion cells, mapped by Kuffler in 1953, are well modeled by a Difference of Gaussians. Your retina has been running DoG filtering for several hundred million years of evolutionary uptime, which makes it the most field-tested image-processing deployment in existence.

Research Frontier: Edges, Learned and Generated

Hand-designed derivative kernels have two modern afterlives. First, they keep being rediscovered: visualize the first convolutional layer of nearly any trained image network from Chapter 19 and a large fraction of the learned kernels are oriented edge and blob filters, Sobel's and LoG's statistical descendants. Second, edge detection itself moved to generation: DiffusionEdge (Ye et al., AAAI 2024, arXiv:2401.02032) trains a diffusion model to emit crisp single-pixel edge maps directly, outperforming CNN edge detectors that needed post-hoc thinning. Meanwhile one of the most visible consumers of classical edge maps in 2024-2026 is, unexpectedly, image generation: ControlNet-style conditioning feeds Canny or HED edge maps into diffusion models to control composition, a pipeline Chapter 35 dissects. The humble gradient survived the deep learning revolution on both sides of the camera.

Exercise 3.4.1: Kernel Bookkeeping Conceptual

For each kernel in this section ($S_x$, $S_y$, $\nabla^2_4$, $\nabla^2_8$, and a LoG), state: the sum of its weights, its symmetry under 180-degree rotation (and hence whether the convolution-versus-correlation flip from Section 3.1 matters for it), and what it returns on a perfect linear intensity ramp $I(x,y) = 3x + 7$. The ramp answers reveal the deepest difference between first- and second-derivative filters.

Exercise 3.4.2: Build a Blob Detector Coding

Using cv2.GaussianBlur and NumPy for the filtering, implement a multi-scale DoG blob detector: compute DoG responses for $\sigma \in \{2, 4, 8, 16\}$ (ratio 1.6), find local extrema above a threshold in each response with scipy.ndimage.maximum_filter, and draw a circle of radius $\sqrt{2}\sigma$ at each detection (cv2.circle suffices). Run it on an image of coins or cells and report how detected radii track object sizes across scales.

Exercise 3.4.3: Noise Versus Derivatives Analysis

Add Gaussian noise of $\sigma_n \in \{2, 5, 10, 20\}$ to a clean test image and compute (a) the raw central-difference gradient, (b) Sobel with ksize=3, and (c) Sobel after a Gaussian pre-blur of $\sigma = 2$. For each, measure the fraction of "strong edge" pixels (magnitude above a fixed threshold) that occur in regions you know to be flat. Plot false-edge fraction versus noise level for the three pipelines and explain the ordering using the smoothed-derivative principle from this section.