"Translate me, rotate me, shrink me to a thumbnail: my seven numbers will not budge. Hold me up to a mirror, though, and exactly one of them changes its mind."
A Defiantly Invariant Hu Moment
A blob's boundary carries nearly everything worth knowing about its shape, and this section compresses that boundary, step by deliberate step, into a handful of numbers that do not change when the object slides, turns, or shrinks, so that recognizing a shape becomes comparing two short vectors. The pipeline runs: trace the boundary into a polygon with a hierarchy of holes, measure the polygon (perimeter, area, vertices, convexity), then distill it through the moment invariance chain into descriptors. At the end we assemble the chapter's promised payoff: a complete, auditable shape classifier in a page of code.
Section 6.5 collapsed shapes onto their centerlines; this final section walks their edges. We have met boundaries twice already: the morphological gradient of Section 6.3 produced a rim of boundary pixels, and Section 6.4's regionprops quietly consumed boundaries to report perimeters and orientations it did not explain. Both left the boundary as an unordered set. The upgrade in this section is order: a contour is the boundary traced in sequence, a closed polygon you can walk, simplify, integrate over, and compare. Order is what turns a rim into geometry, and geometry into recognition, completing the journey this chapter began when a threshold from Chapter 2 first split the world into foreground and background.
1. From Pixel Sets to Boundaries: Border Following Intermediate
The standard tracing algorithm is Suzuki and Abe's border following (1985), the engine inside cv2.findContours. It raster-scans the binary image for a foreground pixel whose left neighbor is background (a border start), then walks counterclockwise around the component, keeping foreground on one side, until it returns to the start. Crucially, it does this for every border in the image, outer boundaries and hole boundaries alike, and records who surrounds whom. The connectivity conventions of Section 6.1 are baked in exactly as Rosenfeld prescribed: 8-connected foreground borders, 4-connected holes, which is why a diagonal chain of pixels traces as one contour rather than many.
The "who surrounds whom" record is the contour hierarchy, and choosing how much of it to keep is the first API decision. cv2.RETR_EXTERNAL keeps only outermost boundaries (fastest, blind to holes); cv2.RETR_LIST returns every contour with no structure; cv2.RETR_CCOMP arranges them into exactly two levels, outer boundaries and their holes, which is the workhorse mode for parts on a tray; and cv2.RETR_TREE keeps the full nesting tree, which a washer lying inside a larger ring genuinely needs. The companion flag cv2.CHAIN_APPROX_SIMPLE compresses straight runs of boundary pixels to their endpoints (a rectangle becomes 4 points instead of hundreds) while cv2.CHAIN_APPROX_NONE keeps every boundary pixel. Figure 6.6.1 shows a three-deep scene and the tree that RETR_TREE builds from it, including the four-integer record [next, previous, first_child, parent] that OpenCV returns per contour.
RETR_TREE hierarchy as OpenCV encodes it, one [next, previous, first_child, parent] quadruple per contour; C0 and C3 are siblings at depth 0, C1 is C0's child, C2 is C1's child. Nesting depth parity tells you instantly whether a contour bounds an object or a hole.
The code below traces the scene of Figure 6.6.1 and walks the hierarchy, computing each contour's depth by following parent pointers. Depth parity then classifies every contour without any geometry at all, an idea Section 6.4's Euler number expressed as a single global count and the hierarchy now itemizes:
import cv2
bw = cv2.imread("plate_scene.png", cv2.IMREAD_GRAYSCALE)
contours, hierarchy = cv2.findContours(bw, cv2.RETR_TREE,
cv2.CHAIN_APPROX_SIMPLE)
print(f"{len(contours)} contours found")
for i, (nxt, prev, child, parent) in enumerate(hierarchy[0]):
depth, p = 0, parent
while p != -1: # climb parent links to the root
depth += 1
p = hierarchy[0][p][3]
kind = "object border" if depth % 2 == 0 else "hole border"
print(f"C{i}: depth {depth} ({kind:13s}) "
f"{len(contours[i]):4d} pts child={child:2d} parent={parent:2d}")
# Output:
# 4 contours found
# C0: depth 0 (object border ) 482 pts child= 1 parent=-1
# C1: depth 1 (hole border ) 168 pts child= 2 parent= 0
# C2: depth 2 (object border ) 62 pts child=-1 parent= 1
# C3: depth 0 (object border ) 139 pts child=-1 parent=-1
RETR_TREE hierarchy of Figure 6.6.1's scene: each contour's nesting depth is recovered by climbing parent pointers, and depth parity alone separates object borders (even) from hole borders (odd), no pixel geometry required.2. Contour Geometry: Perimeter, Area & Polygon Simplification Beginner
Once a boundary is an ordered polygon, classical geometry applies verbatim. cv2.contourArea evaluates the shoelace formula (Green's theorem specialized to polygons):
$$ A \;=\; \frac{1}{2} \,\Bigl|\, \sum_{i=0}^{n-1} \bigl( x_i\, y_{i+1} - x_{i+1}\, y_i \bigr) \Bigr|, $$
which is why its answer differs slightly from the pixel count Section 6.4 reported as area: one integrates a polygon threaded through pixel centers, the other counts squares. The discrepancy is worth a digit of caution on small blobs and nothing more. Perimeter via cv2.arcLength deserves more caution: summing pixel steps along a staircase systematically overestimates the smooth boundary length (a digital diagonal line of length $L$ measures up to about 8 percent long), so any spec written in millimeters of perimeter should be calibrated against parts of known size rather than trusted absolutely. Derived ratios inherit the bias; we return to this in Exercise 6.6.3.
The third primitive is simplification. The Douglas-Peucker algorithm inside cv2.approxPolyDP recursively keeps only vertices that deviate from a straight chord by more than a tolerance $\varepsilon$, typically set as a fraction of the perimeter. It converts a noisy 400-point boundary into the 4-point quadrilateral that the document scanner of Chapter 5 hunted for, or the 6-point hexagon that will identify a nut in Section 5 below. Together with the convex hull (cv2.convexHull) these produce the working measurements of contour analysis:
import cv2
import numpy as np
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
c = max(contours, key=cv2.contourArea) # largest object
area = cv2.contourArea(c) # shoelace / Green's theorem
perim = cv2.arcLength(c, closed=True)
approx = cv2.approxPolyDP(c, 0.02 * perim, closed=True)
hull = cv2.convexHull(c)
print(f"area {area:.0f} px^2, perimeter {perim:.1f} px")
print(f"vertices after Douglas-Peucker: {len(approx)}")
print(f"circularity {4*np.pi*area/perim**2:.3f}, "
f"solidity {area/cv2.contourArea(hull):.3f}")
# Representative output for a slightly dog-eared rectangular label:
# area 50412 px^2, perimeter 916.4 px
# vertices after Douglas-Peucker: 4
# circularity 0.754, solidity 0.988
3. Moments & the Invariance Chain Intermediate
Geometry measures a shape as placed; recognition needs numbers that survive placement. The instrument is the family of image moments, borrowed directly from mechanics: treat the foreground as a sheet of unit mass and compute
$$ m_{pq} \;=\; \sum_{(x,\,y) \in A} x^p\, y^q . $$
The zeroth moment $m_{00}$ is the area; the first moments give the centroid $\bar x = m_{10}/m_{00}$, $\bar y = m_{01}/m_{00}$, the formula Section 6.4 promised we would derive. Second moments describe how mass spreads around the centroid, and from them comes the orientation that regionprops reports: the major axis of the best-fit ellipse sits at
$$ \theta \;=\; \tfrac{1}{2}\, \operatorname{atan2}\!\bigl( 2\mu_{11},\; \mu_{20} - \mu_{02} \bigr), $$
where the $\mu_{pq}$ are the central moments, the same sums computed about the centroid, $\mu_{pq} = \sum (x - \bar x)^p (y - \bar y)^q$. Centering is the first rung of a ladder that Figure 6.6.2 lays out in full. Central moments do not change when the object translates. Dividing by the right power of area, $\eta_{pq} = \mu_{pq} / m_{00}^{\,1 + (p+q)/2}$, yields normalized central moments that also ignore scale. The final rung is Hu's 1962 construction: seven polynomial combinations of the $\eta_{pq}$ up to third order that are additionally unchanged by rotation. The first two are
$$ \phi_1 = \eta_{20} + \eta_{02}, \qquad \phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2, $$
and five more follow in the same spirit. The seventh is special: it is a skew invariant whose magnitude survives reflection but whose sign flips when the shape is mirrored, because mirroring negates every central moment with an odd power of one coordinate, and $\phi_7$ alone is built from an odd combination. One bit of asymmetry survives all that quotienting, and Section 5's practical example will spend exactly that bit.
Hu invariants have a wide dynamic range ($\phi_1$ might be $10^{-3}$ while $\phi_7$ is $10^{-21}$), so in practice everyone compares them on a signed log scale, $h_i = -\operatorname{sign}(\phi_i)\,\log_{10}\lvert\phi_i\rvert$. The demonstration below puts the whole chain to the test: a wrench mask is rotated 37 degrees and shrunk to 60 percent, then mirrored, and the log-scale invariants are printed for all three. Watch the last column:
import cv2
import numpy as np
def log_hu(mask):
"""Seven Hu invariants on a signed log10 scale."""
hu = cv2.HuMoments(cv2.moments(mask, binaryImage=True)).flatten()
return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
mask = cv2.imread("wrench_mask.png", cv2.IMREAD_GRAYSCALE)
h, w = mask.shape
M = cv2.getRotationMatrix2D((w/2, h/2), angle=37, scale=0.6)
moved = cv2.warpAffine(mask, M, (w, h)) # rotated + scaled copy
mirrored = cv2.flip(mask, 1) # left-right reflection
for name, m in [("original", mask), ("rot 37, scale 0.6", moved),
("mirrored", mirrored)]:
print(f"{name:18s}", np.round(log_hu(m), 2))
# Representative output (phi_1 .. phi_7, log scale):
# original [ 2.95 6.61 9.69 10.31 20.43 13.66 -20.65]
# rot 37, scale 0.6 [ 2.95 6.60 9.70 10.33 20.47 13.66 -20.70]
# mirrored [ 2.95 6.61 9.69 10.31 20.43 13.66 20.65]
Every rung of the chain in Figure 6.6.2 destroys information, and that is its job: position, scale, and orientation are nuisances when classifying, so you quotient them out. But they are the answer when gripping or aligning, so a real pipeline reads the centroid and $\theta$ off the early rungs, then climbs the rest of the ladder for recognition. And sometimes an invariance is a bug: a classifier built on $\phi_1$ through $\phi_6$ (or on matchShapes, which compares magnitudes) is constitutionally incapable of telling a part from its mirror image. Choose which transformations your numbers should forget; do not let the library choose for you.
Hu published his seven invariants in 1962, and for decades they were taught, implemented, and shipped as a canonical set. It took until 2000 for Jan Flusser to prove the set is neither independent ($\phi_3$ is expressible from the others) nor complete. Thirty-eight years of production use before anyone audited the algebra; the fix (Flusser's complete invariant systems) exists, but the seven originals remain the default in every library, a monument to "good enough, shipped first."
4. A Descriptor Vector for Every Blob Beginner
Moments are one family of shape numbers; the geometry of Section 2 supplies the rest. In practice a small, interpretable descriptor vector covers an enormous range of industrial decisions. Circularity, $4\pi A / P^2$, is 1 for a perfect disk and falls as the boundary wrinkles or elongates. Aspect ratio from cv2.minAreaRect captures elongation without caring about orientation. Extent (area over bounding-box area) separates crosses from blobs; solidity (area over convex-hull area) drops the moment a bite, crack, or concavity appears; the Douglas-Peucker vertex count reads off polygonal class directly (4 means quadrilateral, 6 means hexagon); and the hierarchy's hole count from Section 1 is the topological feature no amount of boundary noise can fake. Each is one or two lines on top of code we have already written, and each has a failure mode you can state in a sentence, which is exactly what makes the vector auditable.
When classes are too subtle for thresholds on individual descriptors, the next step up is comparing whole invariant vectors against labeled templates, which is nearest-neighbor classification in descriptor space and the conceptual ancestor of everything in Chapter 16. OpenCV ships the comparison ready-made:
cv2.matchShapes Is a Shape Metric in One LineHand-rolling template comparison means computing moments, building Hu vectors, applying the signed log map, guarding against zero moments, and choosing a distance: about 30 lines of fussy code. cv2.matchShapes(c1, c2, cv2.CONTOURS_MATCH_I1, 0) is one line, a 30-to-1 reduction, and handles all of it internally: it forms $m_i = \operatorname{sign}(\phi_i) \log_{10} \lvert \phi_i \rvert$ for both shapes and returns $\sum_i \lvert 1/m_i^A - 1/m_i^B \rvert$ (methods I2 and I3 are alternative formulas on the same ingredients). Near-zero is a match. It accepts contours or whole grayscale images, and because it compares magnitudes only, it is mirror-blind by design; when handedness matters, check the sign of $\phi_7$ yourself as in Section 3.
5. The Complete Shape Classifier Intermediate
Now the payoff the chapter overview promised: a complete shape classifier in a page of code, every stage of which this chapter built. Threshold with Otsu (Chapter 2), clean with an opening (Section 6.3), trace with hierarchy (Section 1), describe (Sections 2 to 4), and decide with rules a quality auditor can read aloud. The parts tray contains screws, washers, and hex nuts; topology and three descriptors are enough to tell them apart:
import cv2
import numpy as np
def describe(c):
"""Descriptor vector for one outer contour."""
area = cv2.contourArea(c)
perim = cv2.arcLength(c, True)
circularity = 4 * np.pi * area / perim**2 if perim > 0 else 0.0
solidity = area / cv2.contourArea(cv2.convexHull(c))
(w, h) = cv2.minAreaRect(c)[1] # oriented box sides
aspect = max(w, h) / max(min(w, h), 1e-6)
verts = len(cv2.approxPolyDP(c, 0.02 * perim, True))
return dict(area=area, circularity=circularity,
solidity=solidity, aspect=aspect, vertices=verts)
def classify(d, has_hole):
if has_hole and d["circularity"] > 0.80: # round ring
return "washer"
if has_hole and 5 <= d["vertices"] <= 7: # hexagonal ring
return "nut"
if not has_hole and d["aspect"] > 2.5: # long and solid
return "screw"
return "reject" # fail loud, not plausible
img = cv2.imread("tray.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(img, 0, 255,
cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, se) # de-speckle first
contours, hier = cv2.findContours(bw, cv2.RETR_CCOMP,
cv2.CHAIN_APPROX_SIMPLE)
counts = {}
for i, c in enumerate(contours):
if hier[0][i][3] != -1: # skip hole borders themselves
continue
if cv2.contourArea(c) < 200: # dust gate
continue
has_hole = hier[0][i][2] != -1 # any child contour means a hole
label = classify(describe(c), has_hole)
counts[label] = counts.get(label, 0) + 1
print(counts)
# Representative output on a 24-piece kit tray:
# {'screw': 12, 'washer': 8, 'nut': 4}
Three habits in this code repay study. The reject class exists because a rule-based system should fail loudly on the unexpected, never shrug a foreign object into the nearest class. The hole feature comes from the hierarchy, not from any intensity heuristic, so grease stains cannot fake it. And every threshold (0.80, 2.5, 200) traces to a measurable property of golden samples, so recalibrating for a new part is a measurement session, not a retraining run. When classes finally overlap in every descriptor you can write down, that is the boundary of this method and the on-ramp to the learned features of Chapter 10 and the classifiers of Chapter 16.
Who: A vision engineer at a furniture-hardware supplier whose kitting line packs door-hinge sets: screws, washers, and a left-hand and a right-hand mounting bracket per kit.
Situation: A camera over a backlit tray verified each kit before sealing, using a descriptor classifier much like this section's. Screws and washers sorted perfectly. The two brackets did not: kits kept shipping with two left brackets, and the resulting installer complaints were the warranty department's top line item.
Problem: The brackets are exact mirror images. Area, perimeter, circularity, solidity, aspect ratio, vertex count, $\phi_1$ through $\phi_6$, and cv2.matchShapes are all reflection-blind by construction, so every number the system computed was identical for both parts. The team's first instinct, training a small CNN, stalled on explainability requirements and a 5 ms cycle budget.
Decision: Use the one reflection-sensitive number in the standard toolkit: the sign of $\phi_7$. They verified on 2,000 golden samples that $\lvert\phi_7\rvert$ for this bracket sits far from zero (the shape is strongly chiral), then added a single rule: positive sign means right-hand, negative means left-hand, magnitude below a guard threshold means reject for manual inspection.
Result: Left/right separation went to 100 percent on a 50,000-part audit at no measurable cycle-time cost, the guard threshold caught a handful of bent (and therefore nearly symmetric) brackets as a free bonus, and the auditors signed off on a decision rule that fits in one sentence.
Lesson: Invariances are design choices. The team did not need a richer model; they needed one number that deliberately kept the asymmetry every other number had been engineered to forget.
This section's objects are thriving inside modern pipelines. SAM 2 (Ravi et al., 2024, arXiv:2408.00714) made promptable segmentation a commodity, and the masks it emits are post-processed in production with exactly this section's toolkit: findContours, hole-aware hierarchies, polygon simplification for annotation formats, and descriptor gates to reject implausible masks; the measurement layer of Chapter 24's models is this chapter. Topology is becoming a training signal too: Betti-matching losses (Stucki et al., ICML 2023, with an efficient 3D extension in 2024) penalize a segmentation for predicting the wrong number of holes or components, differentiating the very quantities our hierarchy walk counted. And orientation-from-moments has a learned descendant: oriented-bounding-box heads in Ultralytics YOLO11 (2024) regress per-object angles end to end for aerial and industrial imagery, while the broader handcrafted-descriptor program of this section is carried forward by the learned representations of Chapter 25, which trade auditability for breadth.
Show algebraically why mirroring an image about the $y$ axis negates every central moment $\mu_{pq}$ with odd $p$ and leaves the rest unchanged, and use this to argue that $\phi_1$ through $\phi_6$ are reflection-invariant while $\phi_7$ flips sign. Then explain why cv2.matchShapes cannot distinguish a part from its mirror image, and describe two industrial situations from this chapter's examples where that blindness is harmless and one where it ships the wrong part.
Render the 26 uppercase letters as binary masks with cv2.putText at a large font size. Build a template library of log-scale Hu vectors, then query it with rotated (random angles) and rescaled (0.5x to 2x) versions of each letter using nearest neighbor on the Hu vectors, and separately with cv2.matchShapes. Report the confusion matrix for each method. Which letter pairs collapse under rotation invariance (consider N and Z, M and W under mirroring), and which descriptor from Section 4 would you add to break each tie?
Rasterize perfect disks of radius 2 to 200 pixels and plot measured circularity $4\pi A / P^2$ using cv2.contourArea and cv2.arcLength against radius. Explain the curve's small-radius behavior and its large-radius asymptote (it does not reach 1.0; quantify the gap and attribute it to the staircase perimeter bias of Section 2). Repeat with skimage.measure.perimeter_crofton, which estimates perimeter by the Crofton intersection formula, and recommend, with numbers, a minimum part diameter in pixels below which a circularity-based accept/reject gate should not be trusted.