"For years I was just foreground. Then one Tuesday a two-pass algorithm gave me a name, an area, a centroid, and a bounding box. I am component 17 now. I contain 4,212 pixels and my eccentricity is none of your business."
A Freshly Labeled Connected Component
Connected-component labeling is the bridge between image processing and object reasoning: it converts a cleaned binary mask into a list of discrete things, and region properties convert each thing into a row of numbers, at which point vision problems become data problems. Everything before this section manipulated pixels; everything after it (counting, gating, classifying, the recognition pipelines of Chapter 16) manipulates objects. This section builds the bridge twice: once by hand to understand it, once with the library calls used in production.
The previous section delivered a clean mask: one connected blob per physical object, no speckle, no pinholes. The definitions are also already in place: Section 6.1 defined a connected component as a maximal set of mutually reachable pixels under a chosen adjacency. What remains is the algorithmic step of actually assigning every foreground pixel an integer identity (this pixel belongs to object 3), and then the measurement step of summarizing each identity into properties a decision rule can consume. Both steps are humble, and the second is arguably the most economically productive operation in this book: most deployed vision systems are, at bottom, regionprops plus an if-statement.
1. Labeling: Giving Pixels an Identity Beginner
A labeling of a binary image is an integer image of the same size in which background pixels hold 0 and every foreground pixel holds the index of its component: all pixels of the first object hold 1, the second 2, and so on. Figure 6.4.1 shows the transformation on a small grid, together with the measurements that fall out of it almost for free.
The simplest correct algorithm is flood fill: scan for an unlabeled foreground pixel, assign it the next label, and propagate that label outward through neighbors (breadth-first or depth-first) until the component is exhausted; repeat. It visits each pixel a constant number of times, so it is linear in image size, and it makes the connectivity definition of Section 6.1 executable in a screenful of code:
import numpy as np
from collections import deque
def label_floodfill(mask, connectivity=8):
"""Connected-component labeling by BFS flood fill."""
offs = [(-1,0),(1,0),(0,-1),(0,1)]
if connectivity == 8:
offs += [(-1,-1),(-1,1),(1,-1),(1,1)]
h, w = mask.shape
labels = np.zeros((h, w), np.int32)
current = 0
for y in range(h):
for x in range(w):
if mask[y, x] and not labels[y, x]:
current += 1 # found a new object
queue = deque([(y, x)])
labels[y, x] = current
while queue: # spread the label
cy, cx = queue.popleft()
for dy, dx in offs:
ny, nx = cy + dy, cx + dx
if (0 <= ny < h and 0 <= nx < w
and mask[ny, nx] and not labels[ny, nx]):
labels[ny, nx] = current
queue.append((ny, nx))
return labels, current
yy, xx = np.mgrid[0:9, 0:9]
ring = ((np.abs(xx - 4) + np.abs(yy - 4)) == 3)
print(label_floodfill(ring, 8)[1], label_floodfill(ring, 4)[1])
# Output:
# 1 12 (the Section 6.1 diamond: one object 8-connected, twelve 4-connected)
Production labelers use a different classic, the two-pass algorithm: a single raster scan assigns provisional labels by looking only at already-visited neighbors (up and left), records an equivalence whenever two different provisional labels meet (a U-shaped object's two arms, discovered to be one object at the bottom of the U), and resolves all equivalences with a union-find structure before a second scan rewrites final labels. Two sequential, cache-friendly sweeps and a near-constant-time union-find make it extremely fast; the open YACCLAB benchmark tracks a whole zoo of refinements (block-based scanning, SIMD, GPU variants) that push hundreds of megapixels per second. You will never need to write one, but knowing the mechanism explains the otherwise mysterious phrase "label equivalence" in library documentation, and explains why labels come out in raster order.
2. The Production Call: connectedComponentsWithStats Beginner
OpenCV bundles labeling with the most common measurements in a single call, which is the workhorse of counting pipelines:
import cv2
import numpy as np
mask = cv2.imread("cleaned_parts_mask.png", cv2.IMREAD_GRAYSCALE)
n, labels, stats, centroids = cv2.connectedComponentsWithStats(
mask, connectivity=8)
print(f"{n - 1} objects") # label 0 is the background
for i in range(1, n):
x, y, w, h, area = stats[i] # bbox corner, bbox size, pixel count
cx, cy = centroids[i]
print(f" #{i}: area={area:5d} bbox=({x},{y},{w}x{h}) "
f"centroid=({cx:.1f},{cy:.1f})")
# Representative output:
# 4 objects
# #1: area= 8120 bbox=(64,40,112x99) centroid=(119.6,89.2)
# #2: area= 7903 bbox=(241,38,109x101) centroid=(295.0,88.8)
# #3: area= 311 bbox=(402,71,25x31) centroid=(414.1,86.0)
# #4: area= 8045 bbox=(78,210,110x100) centroid=(132.8,259.7)
cv2.connectedComponentsWithStats yields the count, the label image, a per-label stats matrix (bounding box and area, indexed by the cv2.CC_STAT_* constants), and sub-pixel centroids; object #3's suspiciously small area marks it as a fragment, not a part.
The stats matrix supports the single most useful post-cleanup operation: area filtering. Component #3 above is an order of magnitude smaller than its siblings, almost certainly debris that survived the opening of Section 6.3. Rebuilding the mask while dropping every component outside an area band is three lines (keep = np.isin(labels, good_ids) after selecting good_ids from the stats column), and unlike a larger opening it removes the debris with zero distortion of the survivors. Area filtering and morphological cleanup are complements, not rivals: morphology fixes pixel-scale damage within objects, area filtering enforces object-scale sanity between them.
3. regionprops: The Measurement Toolkit Intermediate
When a pipeline needs more than areas and boxes, scikit-image's regionprops is the standard instrument. It computes dozens of per-region properties lazily, and its sibling regionprops_table emits them as columnar data ready for pandas, which is the moment a vision problem officially becomes a data-science problem. The properties worth knowing by heart are summarized in Table 6.4.1.
| Property | What it measures | Typical decision it powers |
|---|---|---|
area | Pixel count of the region | Size gating; part present/absent |
centroid | Mean pixel coordinate | Pick-point for a robot; position check |
bbox | Axis-aligned bounding box | Cropping; coarse size limits |
perimeter | Boundary length estimate | Compactness and roughness scores |
eccentricity | Elongation of the best-fit ellipse, 0 (circle) to 1 (line) | Round versus elongated part classes |
orientation | Angle of the ellipse's major axis | Grasp angle; alignment check |
solidity | area / convex hull area | Detecting bites, cracks, concavities |
extent | area / bounding-box area | Distinguishing crosses from blobs |
euler_number | Components minus holes (here 1 minus holes) | Counting holes: washers versus discs |
Several of these will be derived properly in Section 6.6 (centroid and orientation from moments, perimeter from contours); here they are consumers' goods. The code below runs the full measurement pipeline on a parts image and lands the results in a dataframe:
import pandas as pd
from skimage import measure
lbl = measure.label(mask > 0, connectivity=2) # 8-connectivity in 2D
df = pd.DataFrame(measure.regionprops_table(
lbl,
properties=["label", "area", "centroid", "eccentricity",
"solidity", "euler_number"]))
print(df.round(3).to_string(index=False))
# Representative output:
# label area centroid-0 centroid-1 eccentricity solidity euler_number
# 1 8120 89.21 119.62 0.214 0.987 1
# 2 7903 88.83 295.04 0.198 0.991 0
# 3 311 86.04 414.13 0.892 0.704 1
# 4 8045 259.72 132.81 0.226 0.989 1
skimage.measure.regionprops_table: region 2's Euler number of 0 betrays a hole (one component minus one hole), region 3's high eccentricity and low solidity mark it as an elongated sliver, and downstream logic is now ordinary pandas.After labeling and measurement, the pipeline never touches pixels again: decisions run on a table of per-object features. This is the architectural pattern of all recognition, classical and deep. The pipelines of Chapter 16 replace these nine geometric features with hand-crafted descriptor vectors, and the detectors of Chapter 23 replace them with learned ones, but the shape of the computation (image to objects to feature rows to decision) is the one established on this page.
Computing Table 6.4.1's nine properties by hand means looping over labels, masking each region, and writing each formula (the ellipse fits alone, via second-order moments, are a page of careful code); call it 60 to 80 lines. regionprops_table(lbl, properties=[...]) is one line, evaluates lazily so you pay only for requested properties, handles edge-touching regions and empty inputs, and returns NumPy columns that drop straight into pandas. It also generalizes to 3D volumes unchanged, which matters the day your masks come from a CT scanner rather than a camera. Use spacing= to get physical units instead of pixels.
4. From Measurements to Decisions Beginner
The last step of a counting or gating pipeline is almost embarrassingly simple, and that simplicity is the selling point. A bolt-kit verifier might require exactly 12 regions with area in [7000, 9000] and eccentricity below 0.4 (the washers), plus 4 regions with eccentricity above 0.85 (the bolts), and reject otherwise. Such rules execute in microseconds, can be reviewed line by line in an audit, and fail loudly rather than plausibly. Their weakness is equally plain: they generalize exactly as far as the engineer's foresight. The moment classes overlap in every measured feature, rules give out, and that is the cliff edge where Section 6.6's richer shape descriptors, then Chapter 16's learned classifiers, take over.
Who: The automation lead at an e-commerce hardware retailer, building a returns-verification station that must confirm the contents of opened fastener kits (mixed screws, nuts, washers, up to 200 pieces) poured onto a tray.
Situation: The first prototype used a fine-tuned object detector from Chapter 23's family. On dense piles it missed heavily occluded pieces and double-counted at tile boundaries, plateauing at 96 percent count accuracy, far below the 99.9 percent the business case needed, and each new kit SKU required retraining.
Problem: Counting many small, identical, high-contrast objects is precisely the regime where detection networks struggle (tiny instances, mutual occlusion) and classical machinery excels, but the team had not considered the classical route.
Decision: Rebuild the station around physics plus this chapter: a translucent backlit tray (making segmentation trivial per Chapter 2), a vibration pulse to spread the pile, then threshold, open, label, and measure. Pieces were classified by area and eccentricity bands per SKU, configured from 20 golden samples in minutes rather than retraining. Clumps flagged by implausible area were routed to the distance-transform separator of Section 6.5.
Result: 99.95 percent count accuracy at 1.8 seconds per tray on a fanless industrial PC, new SKUs onboarded by a technician without a data scientist, and the deep detector was retired from this station (it kept its job on the loading dock, where lighting could not be controlled).
Lesson: When you control the lighting, contrast is free, and once contrast is free, counting is connected components plus a dataframe. Engineer the scene before you engineer the model.
Both halves of this section are active research surfaces. On the labeling side, GPU implementations matter at video rate and volume scale: RAPIDS cuCIM ports label and regionprops to CUDA for microscopy volumes, and the YACCLAB benchmark continues to absorb new block-and-union GPU labelers measured in gigapixels per second. On the counting side, the open-world counters now compete with the backlit tray: CountGD (Amini-Naieni et al., NeurIPS 2024, arXiv:2407.04619) counts arbitrary objects specified by text or visual exemplars in uncontrolled images, and SAM 2 (2024, arXiv:2408.00714) plus per-mask regionprops has become a standard zero-shot measurement pipeline in biology labs: a foundation model proposes the regions, and the 1960s measurement algebra of this section still produces the numbers that get published.
Trace the two-pass algorithm by hand on a U-shaped object scanned top to bottom, writing the provisional labels each pixel receives and the moment the equivalence between the two arms is discovered. Then construct an object shape that generates not one but three pairwise equivalences that must all resolve to a single label (hint: think of a comb), and explain why a naive "relabel on discovery" strategy without union-find can degrade to quadratic time on such shapes.
Create a mask of 50 random disks (radii 8 to 15 px) plus 500 specks (1 to 2 px). Clean it two ways: (a) morphological opening with an SE just large enough to kill the specks, and (b) labeling plus area filtering at a 50 px threshold. Compare the two results to the speck-free ground truth with a pixel-level intersection-over-union score, measure both runtimes, and explain the IoU gap you observe near disk boundaries in terms of Section 6.3's corner-rounding behavior.
Render the 26 uppercase letters in a bold sans-serif font, label each letter image, and compute its Euler number with regionprops. Verify the relation $E = C - H$ (components minus holes) letter by letter: B should give $1 - 2 = -1$, A and D give 0, and so on. Report which letters' results depend on the foreground connectivity chosen, and connect the failures you find at thin junctions back to the Jordan pairing of Section 6.1.