Part I: Image Processing
Chapter 6: Morphology, Binary Images & Shape

Contours, Moments & Shape Descriptors

"Translate me, rotate me, shrink me to a thumbnail: my seven numbers will not budge. Hold me up to a mirror, though, and exactly one of them changes its mind."

A Defiantly Invariant Hu Moment
Big Picture

A blob's boundary carries nearly everything worth knowing about its shape, and this section compresses that boundary, step by deliberate step, into a handful of numbers that do not change when the object slides, turns, or shrinks, so that recognizing a shape becomes comparing two short vectors. The pipeline runs: trace the boundary into a polygon with a hierarchy of holes, measure the polygon (perimeter, area, vertices, convexity), then distill it through the moment invariance chain into descriptors. At the end we assemble the chapter's promised payoff: a complete, auditable shape classifier in a page of code.

Section 6.5 collapsed shapes onto their centerlines; this final section walks their edges. We have met boundaries twice already: the morphological gradient of Section 6.3 produced a rim of boundary pixels, and Section 6.4's regionprops quietly consumed boundaries to report perimeters and orientations it did not explain. Both left the boundary as an unordered set. The upgrade in this section is order: a contour is the boundary traced in sequence, a closed polygon you can walk, simplify, integrate over, and compare. Order is what turns a rim into geometry, and geometry into recognition, completing the journey this chapter began when a threshold from Chapter 2 first split the world into foreground and background.

1. From Pixel Sets to Boundaries: Border Following Intermediate

The standard tracing algorithm is Suzuki and Abe's border following (1985), the engine inside cv2.findContours. It raster-scans the binary image for a foreground pixel whose left neighbor is background (a border start), then walks counterclockwise around the component, keeping foreground on one side, until it returns to the start. Crucially, it does this for every border in the image, outer boundaries and hole boundaries alike, and records who surrounds whom. The connectivity conventions of Section 6.1 are baked in exactly as Rosenfeld prescribed: 8-connected foreground borders, 4-connected holes, which is why a diagonal chain of pixels traces as one contour rather than many.

The "who surrounds whom" record is the contour hierarchy, and choosing how much of it to keep is the first API decision. cv2.RETR_EXTERNAL keeps only outermost boundaries (fastest, blind to holes); cv2.RETR_LIST returns every contour with no structure; cv2.RETR_CCOMP arranges them into exactly two levels, outer boundaries and their holes, which is the workhorse mode for parts on a tray; and cv2.RETR_TREE keeps the full nesting tree, which a washer lying inside a larger ring genuinely needs. The companion flag cv2.CHAIN_APPROX_SIMPLE compresses straight runs of boundary pixels to their endpoints (a rectangle becomes 4 points instead of hundreds) while cv2.CHAIN_APPROX_NONE keeps every boundary pixel. Figure 6.6.1 shows a three-deep scene and the tree that RETR_TREE builds from it, including the four-integer record [next, previous, first_child, parent] that OpenCV returns per contour.

One scene, four borders C0 outer C1 hole C2 island C3 outer even nesting depth = object border, odd = hole border RETR_TREE hierarchy C0 [3,-1, 1,-1] C3 [-1, 0,-1,-1] next C1 (hole) [-1,-1, 2, 0] child C2 [-1,-1,-1, 1] child [next, prev, child, parent]
Figure 6.6.1 Border following finds every boundary and records the nesting. Left: a plate (C0) with a hole (C1) containing an island (C2), beside a separate disc (C3). Right: the RETR_TREE hierarchy as OpenCV encodes it, one [next, previous, first_child, parent] quadruple per contour; C0 and C3 are siblings at depth 0, C1 is C0's child, C2 is C1's child. Nesting depth parity tells you instantly whether a contour bounds an object or a hole.

The code below traces the scene of Figure 6.6.1 and walks the hierarchy, computing each contour's depth by following parent pointers. Depth parity then classifies every contour without any geometry at all, an idea Section 6.4's Euler number expressed as a single global count and the hierarchy now itemizes:

import cv2

bw = cv2.imread("plate_scene.png", cv2.IMREAD_GRAYSCALE)
contours, hierarchy = cv2.findContours(bw, cv2.RETR_TREE,
                                       cv2.CHAIN_APPROX_SIMPLE)
print(f"{len(contours)} contours found")
for i, (nxt, prev, child, parent) in enumerate(hierarchy[0]):
    depth, p = 0, parent
    while p != -1:                      # climb parent links to the root
        depth += 1
        p = hierarchy[0][p][3]
    kind = "object border" if depth % 2 == 0 else "hole border"
    print(f"C{i}: depth {depth} ({kind:13s}) "
          f"{len(contours[i]):4d} pts  child={child:2d} parent={parent:2d}")
# Output:
# 4 contours found
# C0: depth 0 (object border )  482 pts  child= 1 parent=-1
# C1: depth 1 (hole border   )  168 pts  child= 2 parent= 0
# C2: depth 2 (object border )   62 pts  child=-1 parent= 1
# C3: depth 0 (object border )  139 pts  child=-1 parent=-1
Walking the RETR_TREE hierarchy of Figure 6.6.1's scene: each contour's nesting depth is recovered by climbing parent pointers, and depth parity alone separates object borders (even) from hole borders (odd), no pixel geometry required.

2. Contour Geometry: Perimeter, Area & Polygon Simplification Beginner

Once a boundary is an ordered polygon, classical geometry applies verbatim. cv2.contourArea evaluates the shoelace formula (Green's theorem specialized to polygons):

$$ A \;=\; \frac{1}{2} \,\Bigl|\, \sum_{i=0}^{n-1} \bigl( x_i\, y_{i+1} - x_{i+1}\, y_i \bigr) \Bigr|, $$

which is why its answer differs slightly from the pixel count Section 6.4 reported as area: one integrates a polygon threaded through pixel centers, the other counts squares. The discrepancy is worth a digit of caution on small blobs and nothing more. Perimeter via cv2.arcLength deserves more caution: summing pixel steps along a staircase systematically overestimates the smooth boundary length (a digital diagonal line of length $L$ measures up to about 8 percent long), so any spec written in millimeters of perimeter should be calibrated against parts of known size rather than trusted absolutely. Derived ratios inherit the bias; we return to this in Exercise 6.6.3.

The third primitive is simplification. The Douglas-Peucker algorithm inside cv2.approxPolyDP recursively keeps only vertices that deviate from a straight chord by more than a tolerance $\varepsilon$, typically set as a fraction of the perimeter. It converts a noisy 400-point boundary into the 4-point quadrilateral that the document scanner of Chapter 5 hunted for, or the 6-point hexagon that will identify a nut in Section 5 below. Together with the convex hull (cv2.convexHull) these produce the working measurements of contour analysis:

import cv2
import numpy as np

contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
c = max(contours, key=cv2.contourArea)            # largest object

area  = cv2.contourArea(c)                        # shoelace / Green's theorem
perim = cv2.arcLength(c, closed=True)
approx = cv2.approxPolyDP(c, 0.02 * perim, closed=True)
hull   = cv2.convexHull(c)

print(f"area {area:.0f} px^2, perimeter {perim:.1f} px")
print(f"vertices after Douglas-Peucker: {len(approx)}")
print(f"circularity {4*np.pi*area/perim**2:.3f}, "
      f"solidity {area/cv2.contourArea(hull):.3f}")
# Representative output for a slightly dog-eared rectangular label:
# area 50412 px^2, perimeter 916.4 px
# vertices after Douglas-Peucker: 4
# circularity 0.754, solidity 0.988
The contour-geometry starter kit on one object: shoelace area, staircase perimeter, Douglas-Peucker simplification at 2 percent of perimeter (recovering the label's 4 corners despite boundary noise), and two first descriptors, circularity and solidity, computed from the same ingredients.

3. Moments & the Invariance Chain Intermediate

Geometry measures a shape as placed; recognition needs numbers that survive placement. The instrument is the family of image moments, borrowed directly from mechanics: treat the foreground as a sheet of unit mass and compute

$$ m_{pq} \;=\; \sum_{(x,\,y) \in A} x^p\, y^q . $$

The zeroth moment $m_{00}$ is the area; the first moments give the centroid $\bar x = m_{10}/m_{00}$, $\bar y = m_{01}/m_{00}$, the formula Section 6.4 promised we would derive. Second moments describe how mass spreads around the centroid, and from them comes the orientation that regionprops reports: the major axis of the best-fit ellipse sits at

$$ \theta \;=\; \tfrac{1}{2}\, \operatorname{atan2}\!\bigl( 2\mu_{11},\; \mu_{20} - \mu_{02} \bigr), $$

where the $\mu_{pq}$ are the central moments, the same sums computed about the centroid, $\mu_{pq} = \sum (x - \bar x)^p (y - \bar y)^q$. Centering is the first rung of a ladder that Figure 6.6.2 lays out in full. Central moments do not change when the object translates. Dividing by the right power of area, $\eta_{pq} = \mu_{pq} / m_{00}^{\,1 + (p+q)/2}$, yields normalized central moments that also ignore scale. The final rung is Hu's 1962 construction: seven polynomial combinations of the $\eta_{pq}$ up to third order that are additionally unchanged by rotation. The first two are

$$ \phi_1 = \eta_{20} + \eta_{02}, \qquad \phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2, $$

and five more follow in the same spirit. The seventh is special: it is a skew invariant whose magnitude survives reflection but whose sign flips when the shape is mirrored, because mirroring negates every central moment with an odd power of one coordinate, and $\phi_7$ alone is built from an odd combination. One bit of asymmetry survives all that quotienting, and Section 5's practical example will spend exactly that bit.

The invariance chain: each stage forgets one nuisance raw moments m_pq everything matters subtract centroid central μ_pq translation forgotten divide by m00^(1+(p+q)/2) normalized η_pq + scale forgotten 7 polynomial combinations Hu invariants φ_1 .. φ_7 + rotation forgotten what survives: pure shape, plus one bit of handedness, the sign of φ_7, which flips under mirroring read θ (orientation) and the centroid off the earlier stages before the chain forgets them
Figure 6.6.2 The moment invariance chain. Raw moments depend on position, scale, and rotation; subtracting the centroid forgets translation, normalizing by a power of area forgets scale, and Hu's seven polynomial combinations forget rotation. Each stage discards a nuisance variable on purpose, and quantities you still need (the centroid as a pick-point, $\theta$ as a grasp angle) must be read off before the stage that destroys them.

Hu invariants have a wide dynamic range ($\phi_1$ might be $10^{-3}$ while $\phi_7$ is $10^{-21}$), so in practice everyone compares them on a signed log scale, $h_i = -\operatorname{sign}(\phi_i)\,\log_{10}\lvert\phi_i\rvert$. The demonstration below puts the whole chain to the test: a wrench mask is rotated 37 degrees and shrunk to 60 percent, then mirrored, and the log-scale invariants are printed for all three. Watch the last column:

import cv2
import numpy as np

def log_hu(mask):
    """Seven Hu invariants on a signed log10 scale."""
    hu = cv2.HuMoments(cv2.moments(mask, binaryImage=True)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

mask = cv2.imread("wrench_mask.png", cv2.IMREAD_GRAYSCALE)
h, w = mask.shape
M = cv2.getRotationMatrix2D((w/2, h/2), angle=37, scale=0.6)
moved    = cv2.warpAffine(mask, M, (w, h))    # rotated + scaled copy
mirrored = cv2.flip(mask, 1)                  # left-right reflection

for name, m in [("original", mask), ("rot 37, scale 0.6", moved),
                ("mirrored", mirrored)]:
    print(f"{name:18s}", np.round(log_hu(m), 2))
# Representative output (phi_1 .. phi_7, log scale):
# original           [ 2.95  6.61  9.69 10.31 20.43 13.66 -20.65]
# rot 37, scale 0.6  [ 2.95  6.60  9.70 10.33 20.47 13.66 -20.70]
# mirrored           [ 2.95  6.61  9.69 10.31 20.43 13.66  20.65]
The invariance chain verified experimentally: rotating and rescaling the wrench perturbs the seven log-scale Hu values only in the second decimal (discretization noise from resampling), while mirroring leaves six of them untouched and flips precisely the sign of $\phi_7$, the one skew invariant that remembers handedness.
Key Insight: Invariance Is a Budget You Spend Deliberately

Every rung of the chain in Figure 6.6.2 destroys information, and that is its job: position, scale, and orientation are nuisances when classifying, so you quotient them out. But they are the answer when gripping or aligning, so a real pipeline reads the centroid and $\theta$ off the early rungs, then climbs the rest of the ladder for recognition. And sometimes an invariance is a bug: a classifier built on $\phi_1$ through $\phi_6$ (or on matchShapes, which compares magnitudes) is constitutionally incapable of telling a part from its mirror image. Choose which transformations your numbers should forget; do not let the library choose for you.

Fun Fact

Hu published his seven invariants in 1962, and for decades they were taught, implemented, and shipped as a canonical set. It took until 2000 for Jan Flusser to prove the set is neither independent ($\phi_3$ is expressible from the others) nor complete. Thirty-eight years of production use before anyone audited the algebra; the fix (Flusser's complete invariant systems) exists, but the seven originals remain the default in every library, a monument to "good enough, shipped first."

4. A Descriptor Vector for Every Blob Beginner

Moments are one family of shape numbers; the geometry of Section 2 supplies the rest. In practice a small, interpretable descriptor vector covers an enormous range of industrial decisions. Circularity, $4\pi A / P^2$, is 1 for a perfect disk and falls as the boundary wrinkles or elongates. Aspect ratio from cv2.minAreaRect captures elongation without caring about orientation. Extent (area over bounding-box area) separates crosses from blobs; solidity (area over convex-hull area) drops the moment a bite, crack, or concavity appears; the Douglas-Peucker vertex count reads off polygonal class directly (4 means quadrilateral, 6 means hexagon); and the hierarchy's hole count from Section 1 is the topological feature no amount of boundary noise can fake. Each is one or two lines on top of code we have already written, and each has a failure mode you can state in a sentence, which is exactly what makes the vector auditable.

When classes are too subtle for thresholds on individual descriptors, the next step up is comparing whole invariant vectors against labeled templates, which is nearest-neighbor classification in descriptor space and the conceptual ancestor of everything in Chapter 16. OpenCV ships the comparison ready-made:

Library Shortcut: cv2.matchShapes Is a Shape Metric in One Line

Hand-rolling template comparison means computing moments, building Hu vectors, applying the signed log map, guarding against zero moments, and choosing a distance: about 30 lines of fussy code. cv2.matchShapes(c1, c2, cv2.CONTOURS_MATCH_I1, 0) is one line, a 30-to-1 reduction, and handles all of it internally: it forms $m_i = \operatorname{sign}(\phi_i) \log_{10} \lvert \phi_i \rvert$ for both shapes and returns $\sum_i \lvert 1/m_i^A - 1/m_i^B \rvert$ (methods I2 and I3 are alternative formulas on the same ingredients). Near-zero is a match. It accepts contours or whole grayscale images, and because it compares magnitudes only, it is mirror-blind by design; when handedness matters, check the sign of $\phi_7$ yourself as in Section 3.

5. The Complete Shape Classifier Intermediate

Now the payoff the chapter overview promised: a complete shape classifier in a page of code, every stage of which this chapter built. Threshold with Otsu (Chapter 2), clean with an opening (Section 6.3), trace with hierarchy (Section 1), describe (Sections 2 to 4), and decide with rules a quality auditor can read aloud. The parts tray contains screws, washers, and hex nuts; topology and three descriptors are enough to tell them apart:

import cv2
import numpy as np

def describe(c):
    """Descriptor vector for one outer contour."""
    area  = cv2.contourArea(c)
    perim = cv2.arcLength(c, True)
    circularity = 4 * np.pi * area / perim**2 if perim > 0 else 0.0
    solidity = area / cv2.contourArea(cv2.convexHull(c))
    (w, h) = cv2.minAreaRect(c)[1]                  # oriented box sides
    aspect = max(w, h) / max(min(w, h), 1e-6)
    verts  = len(cv2.approxPolyDP(c, 0.02 * perim, True))
    return dict(area=area, circularity=circularity,
                solidity=solidity, aspect=aspect, vertices=verts)

def classify(d, has_hole):
    if has_hole and d["circularity"] > 0.80:        # round ring
        return "washer"
    if has_hole and 5 <= d["vertices"] <= 7:        # hexagonal ring
        return "nut"
    if not has_hole and d["aspect"] > 2.5:          # long and solid
        return "screw"
    return "reject"                                 # fail loud, not plausible

img = cv2.imread("tray.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(img, 0, 255,
                      cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, se)       # de-speckle first

contours, hier = cv2.findContours(bw, cv2.RETR_CCOMP,
                                  cv2.CHAIN_APPROX_SIMPLE)
counts = {}
for i, c in enumerate(contours):
    if hier[0][i][3] != -1:          # skip hole borders themselves
        continue
    if cv2.contourArea(c) < 200:     # dust gate
        continue
    has_hole = hier[0][i][2] != -1   # any child contour means a hole
    label = classify(describe(c), has_hole)
    counts[label] = counts.get(label, 0) + 1
print(counts)
# Representative output on a 24-piece kit tray:
# {'screw': 12, 'washer': 8, 'nut': 4}
The chapter's capstone in one page: Otsu threshold, morphological opening, two-level contour retrieval, a five-number descriptor per outer contour, and four readable rules that sort a fastener tray into screws, washers, and nuts, with topology (the hole) doing the heaviest lifting and an explicit reject class for everything else.

Three habits in this code repay study. The reject class exists because a rule-based system should fail loudly on the unexpected, never shrug a foreign object into the nearest class. The hole feature comes from the hierarchy, not from any intensity heuristic, so grease stains cannot fake it. And every threshold (0.80, 2.5, 200) traces to a measurable property of golden samples, so recalibrating for a new part is a measurement session, not a retraining run. When classes finally overlap in every descriptor you can write down, that is the boundary of this method and the on-ramp to the learned features of Chapter 10 and the classifiers of Chapter 16.

Practical Example: The Two Brackets That Were the Same Shape

Who: A vision engineer at a furniture-hardware supplier whose kitting line packs door-hinge sets: screws, washers, and a left-hand and a right-hand mounting bracket per kit.

Situation: A camera over a backlit tray verified each kit before sealing, using a descriptor classifier much like this section's. Screws and washers sorted perfectly. The two brackets did not: kits kept shipping with two left brackets, and the resulting installer complaints were the warranty department's top line item.

Problem: The brackets are exact mirror images. Area, perimeter, circularity, solidity, aspect ratio, vertex count, $\phi_1$ through $\phi_6$, and cv2.matchShapes are all reflection-blind by construction, so every number the system computed was identical for both parts. The team's first instinct, training a small CNN, stalled on explainability requirements and a 5 ms cycle budget.

Decision: Use the one reflection-sensitive number in the standard toolkit: the sign of $\phi_7$. They verified on 2,000 golden samples that $\lvert\phi_7\rvert$ for this bracket sits far from zero (the shape is strongly chiral), then added a single rule: positive sign means right-hand, negative means left-hand, magnitude below a guard threshold means reject for manual inspection.

Result: Left/right separation went to 100 percent on a 50,000-part audit at no measurable cycle-time cost, the guard threshold caught a handful of bent (and therefore nearly symmetric) brackets as a free bonus, and the auditors signed off on a decision rule that fits in one sentence.

Lesson: Invariances are design choices. The team did not need a richer model; they needed one number that deliberately kept the asymmetry every other number had been engineered to forget.

Research Frontier: Contours and Topology Inside the Learning Loop

This section's objects are thriving inside modern pipelines. SAM 2 (Ravi et al., 2024, arXiv:2408.00714) made promptable segmentation a commodity, and the masks it emits are post-processed in production with exactly this section's toolkit: findContours, hole-aware hierarchies, polygon simplification for annotation formats, and descriptor gates to reject implausible masks; the measurement layer of Chapter 24's models is this chapter. Topology is becoming a training signal too: Betti-matching losses (Stucki et al., ICML 2023, with an efficient 3D extension in 2024) penalize a segmentation for predicting the wrong number of holes or components, differentiating the very quantities our hierarchy walk counted. And orientation-from-moments has a learned descendant: oriented-bounding-box heads in Ultralytics YOLO11 (2024) regress per-object angles end to end for aerial and industrial imagery, while the broader handcrafted-descriptor program of this section is carried forward by the learned representations of Chapter 25, which trade auditability for breadth.

Exercise 6.6.1: One Bit of Handedness Conceptual

Show algebraically why mirroring an image about the $y$ axis negates every central moment $\mu_{pq}$ with odd $p$ and leaves the rest unchanged, and use this to argue that $\phi_1$ through $\phi_6$ are reflection-invariant while $\phi_7$ flips sign. Then explain why cv2.matchShapes cannot distinguish a part from its mirror image, and describe two industrial situations from this chapter's examples where that blindness is harmless and one where it ships the wrong part.

Exercise 6.6.2: An Alphabet Retrieval Engine Coding

Render the 26 uppercase letters as binary masks with cv2.putText at a large font size. Build a template library of log-scale Hu vectors, then query it with rotated (random angles) and rescaled (0.5x to 2x) versions of each letter using nearest neighbor on the Hu vectors, and separately with cv2.matchShapes. Report the confusion matrix for each method. Which letter pairs collapse under rotation invariance (consider N and Z, M and W under mirroring), and which descriptor from Section 4 would you add to break each tie?

Exercise 6.6.3: The Circularity Ceiling Analysis

Rasterize perfect disks of radius 2 to 200 pixels and plot measured circularity $4\pi A / P^2$ using cv2.contourArea and cv2.arcLength against radius. Explain the curve's small-radius behavior and its large-radius asymptote (it does not reach 1.0; quantify the gap and attribute it to the staircase perimeter bias of Section 2). Repeat with skimage.measure.perimeter_crofton, which estimates perimeter by the Crofton intersection formula, and recommend, with numbers, a minimum part diameter in pixels below which a circularity-based accept/reject gate should not be trusted.