Section 9.2: The Canny Edge Detector, Step by Step

"I accept strong edges on the spot. Weak edges need a strong edge to vouch for them. It's not paranoia; it's hysteresis."
A Canny Detector With Trust Issues

Big Picture

The Canny detector closes the gradient-to-edge gap of Section 9.1 with two ideas that decide with context instead of pixel by pixel: non-maximum suppression thins gradient ribbons to one-pixel crests, and hysteresis lets confident edges rescue their faint continuations. What makes it remarkable is its pedigree: rather than tinkering toward a heuristic, John Canny in 1986 wrote down what an optimal edge detector should achieve (find real edges, place them accurately, mark each once), derived the operator that best satisfies the criteria, and arrived at a four-stage pipeline so well balanced that it remains the default edge detector in every vision library four decades later, and lately the favorite conditioning signal of the diffusion models in Chapter 35.

The previous section ended with a measured indictment: a thresholded gradient map keeps thick ribbons instead of curves and shatters real contours no matter where the threshold sits. This section builds the machine that fixes both defects. We will walk the four stages in order, implement the whole detector from scratch in NumPy, verify it against OpenCV's, and finish with the one-line call and its tuning knobs.

1. What Should an Edge Detector Optimize? Intermediate

Canny's contribution begins before any code, with three criteria for judging any edge detector applied to a noisy step edge:

Good detection. Maximize the probability of marking real edges and minimize the probability of marking noise. Good localization. Marked points should sit as close as possible to the true edge position. Single response. One edge should produce one mark, not a cluster.

The criteria fight each other. Detection improves with more smoothing (noise dies faster than the step does), but localization worsens, because smoothing smears the step's position; this is the $\sigma$ trade-off that Section 9.1 flagged. Canny formalized the trade-off, solved for the 1D operator maximizing a product of the criteria, and found the optimum is approximated within about 20 percent by something we already own: the first derivative of a Gaussian, exactly the "smooth, then differentiate" composite of Chapter 3. The deep reassurance of the result is that the standard practice was not just convenient; it is close to provably the best a linear filter can do. What the optimal filter cannot do alone is satisfy criterion three and produce thin, connected curves; that requires the two nonlinear stages that make Canny's pipeline famous. Figure 9.2.1 lays out all four stages.

Figure 9.2.1 The four-stage Canny pipeline. The first two stages are the smoothed-gradient machinery of Chapter 3; the last two are nonlinear decision stages, and they are where the thick-ribbon and broken-contour defects of Section 9.1 get repaired.

Remember the Four Stages: Smooth, Measure, Thin, Connect

Four verbs hold the whole detector in memory, in order: smooth (Gaussian, to confiscate noise's megaphone), measure (gradient magnitude and orientation), thin (non-maximum suppression, ribbons to one-pixel crests), and connect (hysteresis, strong edges recruit their weak continuations). The first two are linear and near-optimal by Canny's criteria; the last two are the nonlinear decisions that no per-pixel threshold can make. If a from-scratch detector misbehaves, the symptom names the broken verb: speckle means smooth failed, double lines mean thin failed, broken contours mean connect failed.

2. Stages 1 and 2: Smooth, Then Measure Intermediate

These two stages look like a formality, and that is exactly why they cause more production bugs than the famous nonlinear stages do: the trap is hidden in a default everyone assumes is there and is not. Convolve with a Gaussian of scale $\sigma$, then apply Sobel kernels to get $I_x$, $I_y$, and from them magnitude and orientation. The single decision is $\sigma$, and it should be made consciously: $\sigma \approx 1$ to $1.4$ preserves fine texture edges; $\sigma \approx 2$ to $3$ is a sensible default for natural photographs; larger values are for finding only the dominant contours. One practical warning that surprises many users: cv2.Canny performs no internal Gaussian smoothing; it runs Sobel directly on whatever you pass it. Smoothing is your job (or use skimage.feature.canny, which takes an explicit sigma argument). Feeding a raw noisy image to cv2.Canny and wondering about speckle is among the most common edge-detection bugs in production code, a sibling of the denoise-first discipline from Chapter 7.

3. Stage 3: Non-Maximum Suppression Advanced

Stage 3 attacks the thick ribbons. The insight is geometric: crossing an edge perpendicularly, gradient magnitude rises, peaks at the true boundary, and falls. The ribbon exists because we kept the whole rise and fall; the boundary is just the crest. So at every pixel, look one step in the gradient direction and one step against it, and keep the pixel only if its magnitude is at least as large as both neighbors. The result is called non-maximum suppression (NMS): everything that is not a local maximum along the gradient direction is deleted, collapsing each ribbon to a one-pixel crest line exactly perpendicular to the local gradient, the geometry established in Figure 9.1.2 of Section 9.1.

On a discrete grid "one step in the gradient direction" needs care: a gradient at 37 degrees points between pixels. The classical implementation quantizes orientation into four bins (0, 45, 90, 135 degrees), each selecting one of the four neighbor pairs in the 3×3 neighborhood; finer implementations interpolate magnitude between the two pixels straddling the true direction. Figure 9.2.2 shows the quantized version that our from-scratch code implements.

Figure 9.2.2 Non-maximum suppression in a 3×3 neighborhood. The gradient at the center pixel p points up-right (bin 45 degrees), so p is compared against its upper-right and lower-left neighbors (shaded); p survives only as a local maximum along the gradient line, which thins every gradient ribbon to a one-pixel crest.

Key Insight: NMS Keeps One Pixel in Four and Deletes Nothing You Wanted

Here is the number that makes non-maximum suppression click. A typical photographic edge is a ramp three to five pixels wide, so a plain threshold keeps that whole flank: roughly four "edge" pixels for every one true boundary point. NMS walks each flank, finds the single crest, and discards the other three, cutting the surviving pixel count by about a factor of four without losing a single boundary. Watch what happens to the same street photograph: thresholding the gradient at $T = 60$ kept 4.8 percent of pixels in Section 9.1's experiment, yet the finished Canny map (NMS plus hysteresis) keeps only about 3 percent, and those survivors are one pixel wide instead of four or five. The deletions are pure waste removal: every pixel NMS throws away was an off-crest member of a ramp the kept pixel already represents. That is why downstream voters in Section 9.3 finally get one ballot per boundary point instead of a stuffed box.

4. Stage 4: Double Threshold and Hysteresis Advanced

Stage 4 attacks the broken contours, and it does so by refusing to use one threshold. Pick two, $T_{\text{high}}$ and $T_{\text{low}}$ (Canny suggested a ratio between 2:1 and 3:1). Pixels above $T_{\text{high}}$ are strong: accepted unconditionally. Pixels below $T_{\text{low}}$ are rejected unconditionally. The pixels in between are weak: accepted if and only if they connect, directly or through a chain of other weak pixels, to some strong pixel. Implementation is a flood fill seeded at the strong pixels and allowed to spread only through weak ones; readers of Chapter 6 will recognize this as morphological reconstruction of the weak map from strong markers. The illustration below captures the social logic of the rule: a strong edge is admitted on sight and can vouch for a weak neighbor, while a faint speckle with no sponsor is turned away.

At a friendly nightclub door, a glowing strong-edge dot is admitted instantly and vouches for a dimmer weak-edge dot beside it, while a lone faint speckle with no sponsor is turned away, illustrating Canny hysteresis where weak pixels are kept only when connected to a strong edge. — Hysteresis is a prior about the world: a faint gradient touching a confident contour is probably that contour continuing, while the same faint gradient alone in the sky is just noise.

Why this works is worth saying plainly. A real contour whose contrast dips for a stretch produces weak pixels precisely along the dip, sandwiched between strong segments, so the flood fill recovers it whole. An isolated noise response has the same magnitude as that dip but no strong sponsor anywhere nearby, so it dies. The decision uses geometry that no per-pixel threshold can see: membership in a connected structure. Figure 9.2.3 traces the rule along one contour, showing the two thresholds as horizontal lines, the three bands they carve out, and a weak segment in a contrast dip being rescued because strong neighbors flank it on both sides.

Figure 9.2.3 Hysteresis read along a single contour. The high threshold $T_{\text{high}}$ and low threshold $T_{\text{low}}$ split the gradient magnitude into three bands; the contour's contrast dips into the weak band partway along, but because strong segments flank the dip on both sides the flood fill recruits it back (green span), while an isolated weak bump elsewhere (gray), having no strong sponsor, is deleted. One threshold could not tell these two weak signals apart; connectivity can.

Key Insight: Hysteresis Is a Prior About the World, Not a Trick

Double thresholding encodes a physical belief: boundaries in the world are continuous curves, so edge evidence should be judged by the company it keeps. A weak gradient adjoining a confident contour is probably that contour continuing; the same weak gradient alone in flat sky is probably noise. This is the chapter's first instance of a pattern that defines the rest of the book: when per-pixel evidence is ambiguous, resolve it with structure. The Hough transform of Section 9.3 (pixels voting for shared geometry) and the RANSAC consensus of Section 9.4 are the same idea at ever larger scales.

5. The Whole Detector From Scratch Advanced

Forty lines of NumPy implement everything above. The code keeps each stage visibly separate so you can instrument any of them, and then checks itself against OpenCV's implementation.

# Canny from scratch: the four stages kept visibly separate so any one
# can be instrumented, then validated against cv2.Canny on the same input.
# Each stage below corresponds to one paragraph of this section.
import cv2
import numpy as np
from collections import deque

def canny_from_scratch(gray, sigma=1.4, low=40, high=90):
    # Stage 1: Gaussian smoothing (sigma chosen by the caller, deliberately)
    img = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma)

    # Stage 2: gradient magnitude and orientation
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # fold to [0, 180)

    # Stage 3: non-maximum suppression with 4-bin orientation quantization
    q = (((ang + 22.5) // 45).astype(int)) % 4         # 0,45,90,135 deg bins
    offs = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]        # neighbor along gradient
    H, W = mag.shape
    nms = np.zeros_like(mag)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            dy, dx = offs[q[y, x]]
            m = mag[y, x]
            if m >= mag[y + dy, x + dx] and m >= mag[y - dy, x - dx]:
                nms[y, x] = m                          # crest pixels survive

    # Stage 4: double threshold + hysteresis (flood fill from strong seeds)
    strong = nms >= high
    weak = (nms >= low) & ~strong
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W and weak[yy, xx] and not edges[yy, xx]:
                    edges[yy, xx] = True               # strong recruits weak
                    queue.append((yy, xx))
    return edges

gray = cv2.imread("street.jpg", cv2.IMREAD_GRAYSCALE)
mine = canny_from_scratch(gray, sigma=1.4, low=40, high=90)
ref = cv2.Canny(cv2.GaussianBlur(gray, (0, 0), 1.4), 40, 90) > 0

print(f"edge pixels (ours): {mine.mean():.2%}   (OpenCV): {ref.mean():.2%}")
print(f"pixelwise agreement: {(mine == ref).mean():.1%}")
# Representative output:
# edge pixels (ours): 3.08%   (OpenCV): 3.21%
# pixelwise agreement: 99.3%

Code Fragment 1: The full canny_from_scratch pipeline in four labeled stages (Gaussian blur, Sobel gradient, non-maximum suppression, hysteresis flood-fill), validated against cv2.Canny on the same pre-blurred input: agreement above 99 percent of pixels, with the residual disagreement concentrated on borderline weak pixels where OpenCV's interpolated NMS and our 4-bin quantization make different calls.

The Python loops make this implementation a teaching instrument, not a production one: it runs in seconds where OpenCV takes about a millisecond. But every line corresponds to a sentence in this section, and instrumenting it (dump the nms array, count how many weak pixels hysteresis rescued) is the fastest way to make the pipeline's behavior concrete. On a typical street photograph, roughly a third of the final edge pixels arrive via rescue rather than direct strong acceptance, which is the broken-contour defect of Section 9.1 being repaired in bulk.

6. The One-Liner and Its Knobs Intermediate

In production you call the library, and the craft moves into choosing arguments. The most durable trick is deriving the thresholds from image statistics, in the spirit of Chapter 2's histogram methods, so that one configuration survives across exposures and scenes:

# Production Canny: pre-blur explicitly (cv2.Canny never smooths), then
# derive the two hysteresis thresholds from the image median so one
# configuration survives across exposures and scenes.
import cv2
import numpy as np

gray = cv2.imread("street.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (0, 0), 1.4)     # cv2.Canny does NOT smooth

# Auto-thresholds from the median: robust to exposure changes
v = float(np.median(blur))
low, high = int(max(0, 0.66 * v)), int(min(255, 1.33 * v))

edges = cv2.Canny(blur, low, high,
                  apertureSize=3,      # Sobel kernel size: 3, 5, or 7
                  L2gradient=True)     # true Euclidean magnitude, not |gx|+|gy|

print(f"median {v:.0f} -> thresholds ({low}, {high}); "
      f"edge pixels: {(edges > 0).mean():.2%}")
# Representative output:
# median 118 -> thresholds (77, 156); edge pixels: 2.64%

Code Fragment 2: The production pattern: explicit pre-blur (because cv2.Canny applies Sobel directly with no internal smoothing), low and high derived from np.median(blur) so the thresholds track exposure automatically, and L2gradient=True for an isotropic magnitude worth its small extra cost.

Library Shortcut: 40 Lines Become One

Our from-scratch detector is about 40 lines plus a validation harness; cv2.Canny(blur, low, high) is one line, and skimage.feature.canny(gray, sigma=1.4) is one line that also folds in the Gaussian stage and uses sub-bin interpolated NMS for slightly cleaner crests. Internally the libraries handle the parts our version simplified: interpolation between neighbors instead of 4-bin quantization, border policies, a scan-line hysteresis that avoids Python-level queues, and SIMD-parallelized gradient kernels, which is how they run three orders of magnitude faster. Write the from-scratch version once to own the concepts; ship the library call.

Practical Example: The Document Scanner That Stopped Missing Pages

Who: Two engineers on the mobile team of an expense-management app, responsible for the "scan receipt" feature.

Situation: The feature finds the document quadrilateral in the camera frame (Canny, then contour extraction, then a four-corner polygon fit) and warps it flat for OCR, the classic pipeline.

Problem: Detection was tuned on office lighting with fixed thresholds (50, 150). In the field, dim restaurants and glary taxi seats pushed the gradient distribution far from the tuning conditions: the page-found rate measured 78 percent on a 12,000-image sample of real user captures, and support tickets said "the blue box never appears."

Decision: Three changes, all from this section: downscale to 480 pixels wide (less noise, faster, and document edges survive), explicit Gaussian pre-blur at $\sigma = 1.5$, and median-based auto-thresholds per frame instead of constants; plus a light dilation from Chapter 6 to close hairline gaps before contour extraction.

Result: Page-found rate rose to 96 percent on the same evaluation set; 95th-percentile latency was 28 milliseconds on a mid-range phone; the team deleted two per-market threshold overrides from the config system.

Lesson: Canny's thresholds are statistics, not constants. Any deployment that meets uncontrolled lighting should derive them from each frame's own histogram, which costs one median and removes an entire class of field failures.

Fun Fact

The detector the whole field uses was a master's thesis: John Canny worked it out at MIT in 1983, and the journal paper appeared in 1986. Canny then largely left edges behind for robot motion planning and computational geometry. The thesis-project-turned-eternal-default is a recurring pattern in vision; you will meet another one when SIFT appears in Chapter 10.

Research Frontier: Canny's Second Career as a Control Signal

Edge detection research continues (DiffusionEdge, AAAI 2024, generates single-pixel crisp edge maps with a diffusion model; RankED, CVPR 2024, tackles the pixel imbalance and annotator ambiguity with ranking losses; SCESAME, WACV 2024 workshops, derives zero-shot edges by ensembling Segment Anything outputs), but the headline 2024-2026 story is who consumes Canny maps now. ControlNet (Zhang et al., ICCV 2023, arXiv:2302.05543) made the Canny edge map one of the standard conditioning signals for text-to-image diffusion: draw or extract a contour map, and the generator must respect its geometry while inventing everything else. The Canny condition remains among the most-downloaded control adapters for Stable Diffusion and its successors, meaning a 1986 algorithm sits in the hot path of 2026's generative pipelines, a story Chapter 35 completes. The learned first-layer filters of Chapter 19 tell the complementary story: networks rediscover Canny's stages 1 and 2 on their own.

You Could Build This: A Canny-Conditioned Image Generator (Advanced, about 2 hours)

The Frontier callout above noted that Canny maps are now a standard control signal for diffusion. You can wire the two ends together yourself into a portfolio-grade demo: run cv2.Canny on a hand sketch or a reference photo to get the one-pixel contours this section taught you to compute, then feed that edge map as the conditioning image to a pretrained Canny ControlNet through the Hugging Face diffusers pipeline, and generate a fully shaded image that obeys your contour while inventing texture, color, and lighting from a text prompt. The interesting engineering is all on the edge side, which is this section's territory: the median-based auto-thresholds of Code Fragment 2 decide how much structure the generator must respect, and a light dilation (Chapter 6) thickens thin contours so the conditioning survives the diffusion model's downsampling. Sweep the two Canny thresholds and watch the generated image tighten or loosen against your sketch. This is a working preview of the controllable-generation pipeline built in full in Chapter 35, reachable today with a 1986 detector and a few lines of library code.

Exercise 9.2.1: Order Matters Conceptual

Explain what would go wrong if the pipeline's stages were reordered: (a) thresholding before non-maximum suppression, and (b) NMS after hysteresis instead of before. Then consider setting $T_{\text{low}} = T_{\text{high}}$: which of Section 9.1's two defects returns, and why does the ratio between the thresholds (rather than either absolute value) control the detector's tolerance for contrast dips along a contour?

Exercise 9.2.2: Measure the Rescue Rate Coding

Instrument the from-scratch implementation to report, for each image, the number of strong pixels, the number of weak pixels, and the number of weak pixels that hysteresis ultimately rescued. Run it over ten varied photographs at threshold ratios 2:1 and 3:1 and report the rescue fraction for each. Then replace the queue-based flood fill with cv2.connectedComponents on the union map (keep components that contain at least one strong pixel) and verify the outputs are identical while measuring the speedup.

Exercise 9.2.3: The Sigma Sweep Analysis

Using skimage.feature.canny, sweep $\sigma \in \{0.5, 1, 2, 4, 8\}$ on one detailed photograph. For each result, report the edge-pixel fraction and the number of connected components, and visually identify which real structures appear and disappear. Plot both metrics against $\sigma$ and annotate the plot with the detection-versus-localization trade-off from Canny's criteria: which $\sigma$ would you choose for (a) a texture-analysis task and (b) finding the silhouette of a single large object, and why?