Section 9.3: The Hough Transform: Lines & Circles

"I don't argue about where the line is. I hold an election and let every edge pixel vote. Turnout is excellent; the gaps never show up, and that's exactly the point."
A Hough Accumulator With Election-Night Energy

Big Picture

The Hough transform turns shape detection into an election: every edge pixel votes for every line (or circle) that could pass through it, and shapes that really exist collect landslides in a parameter-space ballot box called the accumulator. The genius of the scheme is what it does not require: no connectivity, no ordering, no complete contours. The Canny output of Section 9.2 can be gapped by occlusion, peppered with clutter, and shuffled into any order, and the votes still pile up in the same cells, which is why this 1962 idea remains the standard way to pull lines and circles out of edge maps, from lane markings to calibration targets.

The previous section ended with the best edge map classical vision can produce: thin, well-localized, largely connected. But an edge map is still just pixels; nothing in it says "these 1,200 pixels form one straight line." This section builds the machine that says exactly that, first for lines, then for circles, and along the way introduces a way of thinking (transform the problem into parameter space, accumulate evidence, read off the peaks) that the curve fitting of Section 9.4 will complement and the lane detector of Section 9.5 will put to work.

1. The Grouping Problem Intermediate

Consider the naive approach first, because its failure explains the design. To find lines, you might trace contours: start at an edge pixel, follow its neighbors, and check whether the path stays straight. Tracing dies on contact with reality. Real edge maps have gaps (a shadow interrupts the curb), junctions (the curb crosses a lamp post), and clutter (leaves everywhere); a tracer must decide at every gap and junction what to do, and each decision compounds. Worse, tracing is sequential: one bad hop poisons everything after it.

The Hough transform inverts the logic. Instead of asking "which pixels chain together?", it asks each pixel independently: "which lines could you belong to?" Every edge pixel answers with a set of candidate lines, all candidates receive one vote each, and the lines that many pixels agree on win. No pixel ever needs to know about any other pixel; agreement emerges in the tally. Gaps cost nothing but a few missing votes, and clutter votes incoherently, scattering its ballots across parameter space without electing anyone. The illustration below casts the whole scheme as an election night, with the real shape winning its ballot box by a landslide.

Many small orange edge-pixel characters drop ballots into boxes that stand for candidate lines; one box overflows with a landslide of votes and wears a winner's sash while others hold only stray ballots, depicting the Hough transform detecting a shape by letting every edge pixel vote independently. — Detection becomes an election: every edge pixel votes for the shapes it could belong to, gaps cost only a few absentee votes, and the real line wins by a landslide.

2. Lines as Points: The $(\rho, \theta)$ Parameter Space Intermediate

To vote for a line you need an address for every line. The schoolbook form $y = mx + b$ fails the job: vertical lines need $m = \infty$, and near-vertical lines need enormous $m$, so no finite, evenly spaced ballot box can hold them. Duda and Hart's 1972 fix, now universal, addresses each line by its normal form:

$$ \rho = x \cos\theta + y \sin\theta, $$

where $\theta \in [0, \pi)$ is the direction of the line's normal and $\rho$ is the line's signed perpendicular distance from the origin. Every line in the image has exactly one $(\rho, \theta)$ address, both coordinates bounded: $\theta$ by construction, $\rho$ by the image diagonal $D$, so $\rho \in [-D, D]$. Vertical lines are nothing special ($\theta = 0$); the pathology is gone.

Now the elegant part. Fix an edge pixel $(x_0, y_0)$ and ask which lines pass through it: all $(\rho, \theta)$ satisfying $\rho = x_0 \cos\theta + y_0 \sin\theta$, which is a sinusoid in the $(\theta, \rho)$ plane. One pixel, one sinusoid. And if several pixels are collinear, their sinusoids all pass through the single $(\rho, \theta)$ address of the shared line: collinearity in the image becomes intersection in parameter space. Figure 9.3.1 draws the duality; it is the picture to hold in mind whenever Hough voting comes up.

Figure 9.3.1 The Hough duality. Left: four collinear edge pixels in image space; the shared line is addressed by its normal angle $\theta$ and distance $\rho$ from the origin. Right: each pixel maps to a sinusoid of all the lines through it, and the four sinusoids intersect in exactly one accumulator cell, whose vote count equals the number of supporting pixels regardless of the gaps between them.

Fun Fact

Paul Hough was a physicist, and the 1962 patent ("Method and Means for Recognizing Complex Patterns") was about finding particle tracks in bubble-chamber photographs, millions of which needed scanning at mid-century physics labs. His version used the slope-intercept parameterization, infinities and all; the $(\rho, \theta)$ form and the name "Hough transform" came a decade later from Duda and Hart. The algorithm thus entered computer vision the same way many of its best ideas did: sideways, from physics.

3. Building the Accumulator From Scratch Advanced

Implementation is bookkeeping. Discretize parameter space into bins (say 1 degree by 1 pixel; for a 640×480 image that is 180 columns and about 1,600 rows, under 300k cells), give every edge pixel a vote in each $\theta$ column at the $\rho$ its sinusoid passes through, and find cells whose counts are both large and locally maximal. (Why insist on locally maximal, not simply the few cells with the highest counts? Because of peak spreading, the next subsection's topic: one real line deposits a cluster of high counts in neighboring cells, so the top few cells can all belong to the same line. Requiring a local maximum keeps one winner per line instead of several copies of the loudest one.) The code below vectorizes the votes with NumPy and verifies the whole idea on a synthetic image where we know the answer.

# Vectorized Hough line transform: every edge pixel deposits one vote per
# theta column along its (rho, theta) sinusoid, and the accumulator's
# tallest cell is the most-supported line. Verified on a synthetic image.
import cv2
import numpy as np

def hough_lines_scratch(edges, d_theta=np.pi / 180, d_rho=1.0):
    ys, xs = np.nonzero(edges)                     # coordinates of edge pixels
    thetas = np.arange(0, np.pi, d_theta)
    diag = int(np.hypot(*edges.shape))
    # each pixel's rho at every theta: one sinusoid per row
    rho = xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)
    bins = np.round((rho + diag) / d_rho).astype(int)
    acc = np.zeros((int(2 * diag / d_rho) + 1, len(thetas)), np.int32)
    for j in range(len(thetas)):                   # tally one theta column at a time
        acc[:, j] = np.bincount(bins[:, j], minlength=acc.shape[0])
    return acc, thetas, diag

# Synthetic test: one known line plus 0.5% salt noise
img = np.zeros((200, 300), np.uint8)
cv2.line(img, (20, 180), (280, 40), 255, 1)
rng = np.random.default_rng(3)
edges = (img > 0) | (rng.random((200, 300)) > 0.995)

acc, thetas, diag = hough_lines_scratch(edges)
r, t = np.unravel_index(acc.argmax(), acc.shape)
print(f"line pixels drawn: {(img > 0).sum()}, peak votes: {acc.max()}")
print(f"detected theta = {np.rad2deg(thetas[t]):.1f} deg, rho = {r - diag} px")
# Representative output:
# line pixels drawn: 261, peak votes: 247
# detected theta = 61.7 deg, rho = 168 px

Code Fragment 1: A vectorized from-scratch Hough transform: hough_lines_scratch tallies votes one theta column at a time with np.bincount, and on a synthetic image acc.argmax() recovers the drawn line's normal angle (61.7 degrees) and distance, with the 300 salt-noise pixels contributing only scattered single votes.

Two numbers in the output deserve a look. The peak collects 247 of the 261 line pixels: a few votes leak into neighboring cells because the discrete line's pixels do not sit exactly on one mathematical line, a phenomenon called peak spreading that gets worse as bins shrink. That is the accumulator's core resolution trade-off: coarse bins merge nearby distinct lines; fine bins smear one line's votes across several cells and can drop its peak below threshold. Practical implementations blur the accumulator lightly or use a coarse-to-fine pass; OpenCV's cv2.HoughLines exposes the bin sizes directly as its rho and theta arguments. The salt noise, meanwhile, added 300 edge pixels and changed nothing, which is the robustness the voting design bought us.

Common Misconception: A Tall Peak Means a Long, Solid Line

Because the accumulator counts votes, it is tempting to read a tall peak as "a long, continuous line" and to expect cv2.HoughLines to hand back segments you can draw. Both are wrong, and both bite in practice. The classical transform returns infinite lines, bare $(\rho, \theta)$ pairs with no endpoints: a peak says only that many pixels are collinear, never where the paint starts or stops. And vote count measures collinear pixel count, not extent: a hundred sparse, gappy pixels strung across the whole image outvote a short but solid stripe, so a dashed lane marking and a dotted trail of clutter can produce identical peaks. If you need drawable segments with real endpoints, or you care about how long and how connected a line actually is, use the probabilistic variant below (cv2.HoughLinesP), which walks the image to recover each run of supporting pixels.

4. The Practical Variant: Probabilistic Hough Intermediate

The classical transform has two operational annoyances: it returns infinite lines (a $(\rho, \theta)$ pair says nothing about where the paint starts and stops), and it spends votes on every edge pixel even when a fraction would do. The progressive probabilistic Hough transform (Matas, Galambos, and Kittler, 2000), implemented as cv2.HoughLinesP, fixes both: it samples edge pixels at random, and the moment a candidate line accumulates enough votes it walks along that line in the image, collecting the actual run of supporting pixels into a segment with endpoints, then removes those pixels from the pool so they cannot vote again. You get exactly what downstream code wants, drawable segments, for less computation.

# Compare the two OpenCV line detectors on one edge map: the classical
# transform returns infinite (rho, theta) lines, while the probabilistic
# variant returns finite, drawable segments with explicit endpoints.
import cv2
import numpy as np

gray = cv2.imread("corridor.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(gray, (0, 0), 1.4)
edges = cv2.Canny(blur, 60, 150)                       # Section 9.2's recipe

# Classical: infinite lines as (rho, theta) pairs
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=150)

# Probabilistic: finite segments as (x1, y1, x2, y2)
segs = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                       minLineLength=40,   # discard stubs shorter than this
                       maxLineGap=8)       # bridge breaks up to this many px

print(f"classical: {0 if lines is None else len(lines)} infinite lines")
print(f"probabilistic: {0 if segs is None else len(segs)} segments")
vis = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
for x1, y1, x2, y2 in segs[:, 0]:
    cv2.line(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Representative output:
# classical: 18 infinite lines
# probabilistic: 64 segments

Code Fragment 2: The two OpenCV line detectors on a corridor photograph: cv2.HoughLines returns 18 dominant infinite lines (one per wall seam and floor tile direction), while cv2.HoughLinesP returns 64 finite drawable segments, with minLineLength=40 filtering stubs and maxLineGap=8 bridging breaks left by the edge detector.

Library Shortcut: 30 Lines of Accumulator Become One Call

Our from-scratch transform is about 30 lines and still lacks peak non-maximum suppression, multi-line extraction, and segment endpoints. cv2.HoughLines and cv2.HoughLinesP are one call each: internally they manage the accumulator's memory, suppress non-maximal peaks, run the progressive sampling and pixel-removal logic of the probabilistic variant, and do it all in C++ at video rate. skimage.transform.hough_line plus hough_line_peaks is the readable middle ground, returning the raw accumulator so you can visualize exactly the picture in Figure 9.3.1 on your own images. A 30-to-1 line-count reduction, and several-hundred-fold on speed.

Key Insight: Voting Buys Robustness That Fitting Cannot

The Hough transform never computes a residual and never averages anything, and that is precisely why outliers cannot hurt it: clutter votes for the wrong cells, and wrong cells simply lose the election. Compare the least-squares fit coming in Section 9.4, where every point, including every outlier, tugs on the answer. The price of the robustness is precision (an answer quantized to bin width) and scaling (one accumulator dimension per shape parameter). This complementarity sets up the chapter's standard two-step: detect coarsely by voting, then measure precisely by fitting the voters.

5. Circles and the Gradient Trick Advanced

A circle has three parameters, center $(a, b)$ and radius $r$, satisfying $(x - a)^2 + (y - b)^2 = r^2$. The voting recipe generalizes immediately and then immediately becomes expensive: a 3D accumulator for a 640×480 image with radii up to 100 already holds roughly 30 million cells, and each edge pixel must vote on a whole cone of (center, radius) combinations. The standard escape, implemented in OpenCV's HOUGH_GRADIENT method, reuses a fact we have been carrying since Section 9.1: at any point on a circle's rim, the gradient points along the radius, straight at (or away from) the center. So instead of voting for every possible center, each edge pixel votes only for the cells along its gradient ray, collapsing the 3D election into a 2D center-finding election followed by a cheap 1D radius histogram per candidate center. Figure 9.3.2 shows the geometry.

Figure 9.3.2 The gradient trick for circles. Gradient rays from rim pixels (orange) all pass through the center, so center votes concentrate without ever enumerating radii; a clutter pixel's ray (gray) wanders harmlessly. The radius is recovered afterward from the distances of the supporting pixels.

OpenCV 4.x ships two versions: the legacy HOUGH_GRADIENT and, since version 4.3, HOUGH_GRADIENT_ALT, a rewrite with markedly better precision whose param2 becomes an interpretable "circle perfectness" score in $[0, 1]$ rather than a raw vote threshold.

# Detect coins as circles with the gradient-trick accumulator. Each rim
# pixel votes only along its gradient ray, so HOUGH_GRADIENT_ALT finds
# centers cheaply and recovers each radius from the supporting pixels.
import cv2
import numpy as np

gray = cv2.imread("coins.jpg", cv2.IMREAD_GRAYSCALE)
gray = cv2.medianBlur(gray, 5)            # median: kills speckle, keeps rims

circles = cv2.HoughCircles(
    gray, cv2.HOUGH_GRADIENT_ALT, dp=1.5,  # accumulator at 1/1.5 resolution
    minDist=30,                            # centers closer than this merge
    param1=300,                            # upper gradient threshold (Scharr)
    param2=0.9,                            # circle "perfectness", 0..1
    minRadius=20, maxRadius=80)

if circles is not None:
    for x, y, r in np.uint16(np.around(circles[0])):
        print(f"center ({x:4d},{y:4d})  radius {r:3d}")
# Representative output (six coins on a desk):
# center ( 142, 210)  radius  46
# center ( 305, 188)  radius  52
# center ( 251, 341)  radius  46
# center ( 433, 296)  radius  38
# center ( 118, 388)  radius  52
# center ( 391, 121)  radius  39

Code Fragment 3: Circle detection with HOUGH_GRADIENT_ALT on a coins photograph: cv2.medianBlur pre-filtering protects the rims, minDist=30 stops one coin from being reported twice, and the interpretable param2=0.9 demands near-complete, well-formed circles, returning one center and radius per coin.

Circle voting earns its keep across an unreasonable range of jobs: pupils and irises, drill holes on printed circuit boards, pipeline cross-sections, agar-plate colonies, and the circle-grid calibration targets that cv2.findCirclesGrid locates for the camera calibration of Chapter 12. Beyond circles, Ballard's 1981 generalized Hough transform extends voting to arbitrary template shapes by storing, for each gradient orientation, the displacement vectors toward a reference point; it trades the analytic shape equation for a lookup table and reappears conceptually in implicit-shape-model detectors. And when lines themselves are the product, the strongest classical competitor is the Line Segment Detector (LSD), which grows segments from aligned-gradient regions rather than voting; it is worth knowing as the non-Hough alternative on cluttered scenes.

You Could Build This: A Coin and Parts Counter (Beginner, about 30 minutes)

The circle accumulator is enough on its own for a satisfying end-to-end tool. Point a phone camera at coins, washers, or pills scattered on a contrasting background; run cv2.medianBlur then cv2.HoughCircles with HOUGH_GRADIENT_ALT exactly as in Code Fragment 3, draw each detected circle, and print the count. Bucketing the returned radii with np.histogram turns it into a denomination sorter (a quarter's rim is reliably larger than a dime's) or a quality check that flags any washer whose radius falls outside a tolerance band. This is the same machine vision counting task that runs on pharmaceutical blister-pack and fastener-bin inspection lines, and it uses nothing beyond this section: median pre-filtering to protect the rims, minDist to stop double-counting, and the interpretable param2 perfectness score to reject partial circles. Calibrate one known coin diameter to pixels and your counts come out in millimeters.

Practical Example: Row Guidance Without GPS

Who: The perception engineer at an agricultural robotics startup retrofitting autonomous row-following onto orchard sprayers.

Situation: Under dense canopy, RTK-GPS drops out for minutes at a time; the sprayer must keep following the crop row using its forward camera.

Problem: Crop rows are not crisp lines: they are ragged stripes of vegetation with gaps, weeds, and shadow dapple, exactly the conditions that break contour tracing and exhaust hand-tuned heuristics.

Decision: A voting pipeline. Segment vegetation with an excess-green index ($2G - R - B$) thresholded per frame using the histogram methods of Chapter 2, clean the binary mask with one morphological opening from Chapter 6, then run cv2.HoughLines with the $\theta$ range restricted to ±20 degrees of the expected row direction (votes outside the plausible range are never even counted), and median-filter the winning $(\rho, \theta)$ over 15 frames.

Result: Lateral RMS error of 4.1 centimeters at 8 km/h over 14 kilometers of test rows, on an embedded board with most of a 90 fps budget to spare; the gaps and weeds that defeated the heuristic tracker cost the accumulator only a few percent of its votes.

Lesson: The accumulator is a budget, and domain knowledge is free money: restricting $\theta$ to the physically possible range cut compute by a factor of nine and eliminated an entire family of false detections (shadow lines near-perpendicular to the rows) before they could vote.

Research Frontier: Voting Survives, With Learned Ballots

The 2023-2026 line-detection literature is largely hybrids that keep the classical extraction and learn the evidence under it. DeepLSD (Pautrat et al., CVPR 2023, arXiv:2212.07766) trains a network to output a cleaned gradient-like field and then runs classical line-segment extraction on top, beating both pure-classical and pure-learned baselines. GlueStick (Pautrat et al., ICCV 2023) carries detected line segments into the matching stage, treating them as first-class features alongside the keypoints of Chapter 10, and line features are increasingly standard in the SLAM systems of Chapter 14, where corridors and window frames outlive textureless walls. On the fitting side, PARSAC (Kluger and Rosenhahn, AAAI 2024, arXiv:2401.14919) learns to propose which points belong together so that many lines or planes can be recovered in parallel, a neural sibling of the accumulator's simultaneous multi-shape tally. The voting idea itself, propose from evidence and let consensus decide, is now textbook design vocabulary far beyond classical vision.

Exercise 9.3.1: Parameter Space Bookkeeping Conceptual

(a) Explain why $\theta$ only needs the range $[0, \pi)$ rather than $[0, 2\pi)$, given that $\rho$ is signed. (b) A colleague proposes detecting ellipses with a direct Hough transform. An ellipse has five parameters; estimate the accumulator size at 1-pixel and 1-degree resolution for a 640×480 image and explain why nobody does this. (c) Which two ideas from this section and from Section 9.4 combine into the practical answer for ellipses?

Exercise 9.3.2: Two Families of Lines Coding

Photograph or download a chessboard image and run the full pipeline: Canny (Section 9.2's auto-threshold recipe), then cv2.HoughLines. Cluster the resulting $(\rho, \theta)$ pairs by $\theta$ into two perpendicular families (k-means with $k = 2$ on the angle, or a simple histogram split), draw each family in its own color, and count the lines per family. Compare your counts against the board's true 9+9 grid lines and diagnose any misses or duplicates in terms of peak spreading and threshold choice.

Exercise 9.3.3: Stress-Testing the Election Analysis

Using the from-scratch accumulator from this section, run a controlled study on synthetic images: one line of 200 pixels, plus salt noise swept from 0 percent to 5 percent of pixels, 20 trials each. Plot (a) the peak's vote count, (b) the angular error of the detected line, and (c) the height of the tallest spurious peak, all versus noise density. At what noise level does a spurious peak first beat the true line, and how does doubling the bin sizes shift that crossover? Explain both observations with the peak-spreading trade-off.