Chapter 6: Morphology, Binary Images & Shape

"Everyone keeps buying bigger GPUs. Meanwhile I clean an entire production line with a 3 by 3 square and two operations from 1964. Erode, dilate, invoice."
A Minimalist Structuring Element

Chapter Overview

Walk into almost any modern factory and find the machine-vision cabinet. Inside it, a camera stares at a backlit conveyor, and a program decides, hundreds of times per minute, whether each passing part ships or gets rejected. The deciding code is rarely a neural network. More often it is a threshold, a couple of erosions and dilations, a connected-component pass, and three or four measured numbers compared against a spec sheet. This pipeline runs in microseconds, fails in ways an engineer can explain to a regulator, and has quietly inspected most of the bottles, pills, circuit boards, and engine parts manufactured since the 1980s. This chapter is about that pipeline: what happens after an image becomes binary.

Binary images are where the previous chapters have been heading. Chapter 2 built the thresholding machinery that converts grayscale into foreground and background, and the document scanner of Chapter 5 ends by binarizing a rectified page. But raw thresholds are messy: salt speckle where sensor noise crossed the line, pinholes where glare punched through an object, ragged edges, accidental bridges between touching parts. Section 6.1 lays the foundations needed to even talk about this mess precisely: what a binary image is as a set, how pixels become neighbors, and why the innocent-looking question "are these two pixels connected?" has a paradox at its center. Section 6.2 then introduces the two atoms of mathematical morphology, erosion and dilation: a small probe shape called a structuring element slides over the image and votes each pixel off or onto the island.

Atoms become molecules in Section 6.3. Composing erosion and dilation in opposite orders yields opening and closing, the workhorse cleanup operators that remove speckle and fill pinholes while leaving everything larger untouched; their differences yield morphological gradients and the top-hat transforms that rescue thresholding from uneven illumination. With a clean mask in hand, Section 6.4 crosses the most important bridge in classical vision: from pixels to objects. Connected-component labeling gives every blob an identity, and region properties (area, centroid, eccentricity, solidity, hole count) turn each identity into a row of measurements ready for a decision rule.

The last two sections deepen what "shape" means. Section 6.5 introduces the distance transform, which answers "how far is every pixel from the boundary?" in one pass and, with it, finds the widest point of a part, separates touching objects, and collapses elongated structures onto their skeletons for length and width measurement. Section 6.6 traces object boundaries into contours, compresses them into moments and invariant descriptors, and assembles a complete shape classifier in a page of code: a small preview of the classical recognition pipelines of Chapter 16.

This chapter also continues one of the book's arcs and starts another. The thresholding-to-binary thread that began in Chapter 2 runs through every page here and resurfaces in Chapter 24, where neural networks output per-pixel probabilities that are thresholded into exactly the masks this chapter manipulates; the cleanup and measurement tools you learn here are the standard post-processing behind modern segmentation models. And the distance transforms and skeletons of Section 6.5 return as watershed seeds in Chapter 11, as boundary-aware losses in segmentation training, and as signed distance fields in the 3D representations of Chapter 27.

Big Picture

Once an image is binary, vision becomes set theory: objects are sets of pixels, and a tiny algebra of set operations (erode, dilate, open, close, label, measure) solves a surprising share of real-world vision problems exactly, in microseconds, with failure modes you can reason about. Learn the algebra once and you will reach for it constantly, including as the post-processing stage behind deep segmentation models.

Remember the Chapter in Five Verbs: Threshold, Clean, Label, Measure, Decide

Nearly every system in this chapter is the same five-step pipeline, and the sections map onto it one for one: threshold the image into foreground and background (Chapter 2, and the connectivity vocabulary of Section 6.1), clean the mask with erosion, dilation, opening, and closing (Sections 6.2 and 6.3), label the connected components into discrete objects (Section 6.4), measure each object's geometry and shape (Sections 6.4 to 6.6), and decide with rules a quality auditor can read aloud. Hold those five verbs and you hold the chapter's skeleton; every algorithm below slots into one of them.

Learning Objectives

Represent binary images as pixel sets, choose between 4- and 8-connectivity deliberately, and explain the digital Jordan curve paradox that makes the choice matter.
Implement erosion and dilation from scratch, predict their effect from the structuring element's size and shape, and state their duality and composition properties.
Clean real thresholded masks with opening and closing, extract boundaries with morphological gradients, and fix uneven illumination with top-hat transforms.
Label connected components, filter them by area, and measure region properties with cv2.connectedComponentsWithStats and skimage.measure.regionprops.
Use distance transforms to measure thickness, place inscribed disks, and seed watershed separation of touching objects; extract and prune skeletons for length and width measurement.
Trace contours with hierarchy, compute moment invariants, build shape-descriptor feature vectors, and assemble a rule-based shape classifier.

Prerequisites

This chapter assumes you can produce a binary image, which is the business of Chapter 2: Point Operations, Histograms & Thresholding; Otsu's method appears here repeatedly without re-introduction. The sliding-window habit of mind from Chapter 3: Spatial Filtering & Convolution transfers directly: erosion and dilation are the min and max cousins of convolution, and the border-handling questions from that chapter return with new answers. You should be fluent with images as NumPy arrays from Chapter 1: Digital Image Fundamentals, and have a working environment from Chapter 0: Foundations: The Python Imaging Stack. The document scanner of Chapter 5: Geometric Transformations & Image Warping is the perfect warm-up: its final binarized page is precisely the kind of input this chapter feasts on.

Chapter Roadmap

6.1 Binary Images, Neighborhoods & Connectivity Binary images as pixel sets, 4- versus 8-neighborhoods, the digital Jordan curve paradox, and the grid distance metrics everything else builds on.
6.2 Erosion & Dilation The two atoms of morphology: a structuring element probes the image, shrinking or growing foreground; built from scratch, then run through OpenCV.
6.3 Opening, Closing & Morphological Gradients Composing the atoms: speckle removal, hole filling, idempotence, boundary extraction, and top-hat rescue of unevenly lit thresholds.
6.4 Connected Components & Region Properties From pixels to objects: labeling algorithms, area filtering, and the regionprops measurement toolkit that turns blobs into rows of numbers.
6.5 Distance Transforms & Skeletonization How far is every pixel from the boundary? Thickness measurement, separating touching objects, and collapsing shapes onto their medial axes.
6.6 Contours, Moments & Shape Descriptors Boundaries as polygons, moments as invariant signatures, and a complete rule-based shape classifier built from a handful of descriptors.

Fun Fact

Mathematical morphology was not invented for computers at all. Georges Matheron and Jean Serra developed it in 1964 at the Paris School of Mines to quantify ore samples viewed under microscopes: how much of this rock is mineral, and in grains of what size? The term "granulometry" you will meet in Section 6.3 literally refers to sieving crushed rock through ever-finer meshes. The mathematics of mining built the eyes of modern manufacturing.

What's Next?

This chapter assumed the binary image was obtainable: threshold, clean, measure. But many images arrive damaged in ways no threshold can fix: sensor noise, motion blur, missing regions, crushed dynamic range. Chapter 7: Image Restoration & Enhancement turns to undoing that damage with classical denoising, deblurring, and inpainting, and in doing so plants the seed of one of the book's biggest ideas: the denoising methods of Chapter 7 are exactly what diffusion models later learn to do, one small step at a time, in Chapter 33.

Bibliography & Further Reading

Foundational Papers

Haralick, R., Sternberg, S., and Zhuang, X. "Image Analysis Using Mathematical Morphology." IEEE TPAMI (1987). DOI

The paper that brought Matheron and Serra's morphology to the computer-vision mainstream: erosion, dilation, opening, closing, and their grayscale extensions, with the algebraic properties Sections 6.2 and 6.3 rely on.

Zhang, T. Y. and Suen, C. Y. "A Fast Parallel Algorithm for Thinning Digital Patterns." Communications of the ACM (1984). ACM Digital Library

The classic two-subiteration thinning algorithm behind skimage.morphology.skeletonize for 2D images (Section 6.5). Short, readable, and still in production forty years on.

Suzuki, S. and Abe, K. "Topological Structural Analysis of Digitized Binary Images by Border Following." Computer Vision, Graphics, and Image Processing (1985). DOI

The border-following algorithm implemented by cv2.findContours, including the contour hierarchy that distinguishes outer boundaries from holes (Section 6.6).

Hu, M.-K. "Visual Pattern Recognition by Moment Invariants." IRE Transactions on Information Theory (1962). DOI

Seven combinations of normalized central moments that are invariant to translation, scale, and rotation. Section 6.6 derives the invariance chain and exposes the seventh moment's mirror-detecting sign.

Felzenszwalb, P. and Huttenlocher, D. "Distance Transforms of Sampled Functions." Theory of Computing (2012). Open access

The elegant linear-time exact Euclidean distance transform, computed as two passes of a lower-envelope construction. The algorithm inside scipy.ndimage.distance_transform_edt style implementations (Section 6.5).

Books

Soille, P. "Morphological Image Analysis: Principles and Applications." Springer, 2nd ed. (2004). Springer

The standard practitioner's reference, successor to Serra's founding 1982 treatise. Covers everything in this chapter plus geodesic operators, reconstruction filters, and granulometries in depth.

Szeliski, R. "Computer Vision: Algorithms and Applications." Springer, 2nd ed. (2022). Free online edition

Section 3.3 covers binary image processing, morphology, distance transforms, and connected components with extensive pointers into the literature; free PDF from the author's site.

Tools & Libraries

OpenCV: Structural Analysis and Shape Descriptors (imgproc module). Official API reference

The authoritative documentation for findContours, moments, HuMoments, matchShapes, connectedComponentsWithStats, and the contour geometry toolkit of Sections 6.4 and 6.6.

scikit-image: skimage.morphology and skimage.measure APIs. Official documentation

NumPy-native morphology (including remove_small_objects, skeletonize, medial_axis) and the regionprops measurement engine that powers Section 6.4's region analysis.

SciPy: scipy.ndimage multidimensional image processing. Official documentation

Home of distance_transform_edt, label, and binary morphology that generalizes to 3D volumes unchanged; note its 4-connected default differs from OpenCV's 8-connected one (Section 6.1).

Kornia: differentiable morphology for PyTorch. kornia.morphology documentation

GPU-batched, autograd-compatible erosion, dilation, opening, closing, and gradients, used when morphological operations must live inside a training loop (Section 6.2's research frontier).

Tutorials

OpenCV-Python Tutorials: Morphological Transformations. Official tutorial

A short, code-first walkthrough of erosion, dilation, opening, closing, gradients, and the top-hat family; a good companion to Sections 6.2 and 6.3.

Datasets & Benchmarks

YACCLAB: Yet Another Connected Components Labeling Benchmark. GitHub repository

An open benchmark comparing dozens of connected-component labeling algorithms on standard datasets; the reason modern libraries' labelers (Section 6.4) are dramatically faster than textbook two-pass implementations.

Recent Research (2024-2026)

Kirchhoff, Y. et al. "Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures." ECCV (2024). arXiv:2404.03010

Uses the skeletons of Section 6.5 inside a deep-learning loss to keep vessels, roads, and cracks topologically connected: classical morphology steering modern segmentation training.

Ravi, N. et al. "SAM 2: Segment Anything in Images and Videos." (2024). arXiv:2408.00714

The foundation segmentation model whose output masks are, in practice, post-processed with exactly this chapter's tools: hole filling, small-object removal, and connected-component analysis.