Section 6.4: Connected Components & Region Properties

"For years I was just foreground. Then one Tuesday a two-pass algorithm gave me a name, an area, a centroid, and a bounding box. I am component 17 now. I contain 4,212 pixels and my eccentricity is none of your business."
A Freshly Labeled Connected Component

Big Picture

Connected-component labeling is the bridge between image processing and object reasoning: it converts a cleaned binary mask into a list of discrete things, and region properties convert each thing into a row of numbers, at which point vision problems become data problems. Everything before this section manipulated pixels; everything after it (counting, gating, classifying, the recognition pipelines of Chapter 16) manipulates objects. This section builds the bridge twice: once by hand to understand it, once with the library calls used in production.

The previous section delivered a clean mask: one connected blob per physical object, no speckle, no pinholes. The definitions are also already in place: Section 6.1 defined a connected component as a maximal set of mutually reachable pixels under a chosen adjacency. What remains is the algorithmic step of actually assigning every foreground pixel an integer identity (this pixel belongs to object 3), and then the measurement step of summarizing each identity into properties a decision rule can consume. Both steps are humble, and the second is arguably the most economically productive operation in this book: most deployed vision systems are, at bottom, regionprops plus an if-statement.

1. Labeling: Giving Pixels an Identity Beginner

This is the step where your image stops being a picture and starts being a list of things you can count, sort, and reject, and almost every dollar a vision system earns is earned right here. A labeling of a binary image is an integer image of the same size in which background pixels hold 0 and every foreground pixel holds the index of its component: all pixels of the first object hold 1, the second 2, and so on. Figure 6.4.1 shows the transformation on a small grid, together with the measurements that fall out of it almost for free.

Figure 6.4.1 Labeling turns a binary mask (left) into a label image (center) where each connected component carries a distinct integer, here rendered as colors. From the label image, per-object measurements (right) such as area and centroid follow by simple accumulation, and the image has become a table.

The simplest correct algorithm is flood fill: scan for an unlabeled foreground pixel, assign it the next label, and propagate that label outward through neighbors (breadth-first or depth-first) until the component is exhausted; repeat. It visits each pixel a constant number of times, so it is linear in image size, and it makes the connectivity definition of Section 6.1 executable in a screenful of code:

# Connected-component labeling by BFS flood fill, the simplest correct method:
# find an unlabeled foreground pixel, give it the next label, and spread that
# label through neighbors. Validated on Section 6.1's paradox diamond.
import numpy as np
from collections import deque

def label_floodfill(mask, connectivity=8):
    """Connected-component labeling by BFS flood fill."""
    offs = [(-1,0),(1,0),(0,-1),(0,1)]
    if connectivity == 8:
        offs += [(-1,-1),(-1,1),(1,-1),(1,1)]
    h, w = mask.shape
    labels = np.zeros((h, w), np.int32)
    current = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not labels[y, x]:
                current += 1                      # found a new object
                queue = deque([(y, x)])
                labels[y, x] = current
                while queue:                      # spread the label
                    cy, cx = queue.popleft()
                    for dy, dx in offs:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

yy, xx = np.mgrid[0:9, 0:9]
ring = ((np.abs(xx - 4) + np.abs(yy - 4)) == 3)
print(label_floodfill(ring, 8)[1], label_floodfill(ring, 4)[1])
# Output:
# 1 12        (the Section 6.1 diamond: one object 8-connected, twelve 4-connected)

Code Fragment 1: A complete flood-fill labeler in 25 lines, parameterized by connectivity and validated on Section 6.1's paradox diamond; readable and correct, but Python-loop slow, which is exactly the gap the library implementations close.

Production labelers use a different classic, the two-pass algorithm. A single raster scan assigns provisional labels by looking only at already-visited neighbors (up and left), and records an equivalence whenever two different provisional labels meet on the same object: a U-shaped object's two arms, for instance, are discovered to be one object only at the bottom of the U. A union-find structure then resolves all those equivalences before a second scan rewrites the final labels. (Union-find is a bookkeeping data structure that groups labels known to be equal and answers "which group is this in?" in near-constant time.)

Two sequential, cache-friendly sweeps and a near-constant-time union-find make it extremely fast; the open YACCLAB benchmark tracks a whole zoo of refinements (block-based scanning, SIMD, GPU variants) that push hundreds of megapixels per second. The gap is worth feeling rather than merely noting: the readable flood fill of Code Fragment 1, running in a pure Python loop, labels on the order of a single megapixel per second, so a 4K frame (about 8 megapixels) takes it several seconds, while cv2.connectedComponents finishes the same frame in well under a millisecond, a roughly thousandfold speedup for the identical answer. That is the difference between one frame every few seconds and a thousand frames a second, which is precisely why the conveyor in this chapter's opening can inspect parts hundreds of times per minute.

You will never need to write one, but knowing the mechanism explains the otherwise mysterious phrase "label equivalence" in library documentation, and explains why labels come out in raster order.

💡 Mental Model: Union-Find Is a Room of People Merging Name Tags

Think of the first scan as handing out provisional name tags, and union-find as a roomful of people who keep discovering they are on the same team. Each foreground pixel that has no labeled neighbor above or to the left gets a brand-new tag (a new provisional label); a pixel touching two already-tagged neighbors with different tags has just discovered those two groups are actually one object, so it records "tag 5 and tag 8 are the same team." Union-find is the clipboard that tracks these merges: union staples two groups together when an equivalence is found, and find answers "whose team is this tag on, really?" by following the staples to the group's elected representative. The second scan simply replaces every provisional tag with its team's representative, which is why a U-shaped object, tagged as two separate arms on the way down, collapses to a single label once the bottom of the U reveals the arms were one team all along.

Where this model breaks down: real union-find compresses the chain of staples so future lookups are near-instant (the "near-constant time" claimed above), an optimization the roomful-of-people picture does not capture.

2. The Production Call: connectedComponentsWithStats Beginner

OpenCV bundles labeling with the most common measurements in a single call, which is the workhorse of counting pipelines:

# Label and measure in one call: connectedComponentsWithStats returns the
# count, the label image, a per-label bounding-box + area matrix, and sub-pixel
# centroids. The tiny-area component (#3) is the giveaway for a stray fragment.
import cv2
import numpy as np

mask = cv2.imread("cleaned_parts_mask.png", cv2.IMREAD_GRAYSCALE)

n, labels, stats, centroids = cv2.connectedComponentsWithStats(
    mask, connectivity=8)

print(f"{n - 1} objects")               # label 0 is the background
for i in range(1, n):
    x, y, w, h, area = stats[i]         # bbox corner, bbox size, pixel count
    cx, cy = centroids[i]
    print(f"  #{i}: area={area:5d}  bbox=({x},{y},{w}x{h})  "
          f"centroid=({cx:.1f},{cy:.1f})")
# Representative output:
# 4 objects
#   #1: area= 8120  bbox=(64,40,112x99)   centroid=(119.6,89.2)
#   #2: area= 7903  bbox=(241,38,109x101) centroid=(295.0,88.8)
#   #3: area=  311  bbox=(402,71,25x31)   centroid=(414.1,86.0)
#   #4: area= 8045  bbox=(78,210,110x100) centroid=(132.8,259.7)

Code Fragment 2: One call, four return values: cv2.connectedComponentsWithStats yields the count, the label image, a per-label stats matrix (bounding box and area, indexed by the cv2.CC_STAT_* constants), and sub-pixel centroids; object #3's suspiciously small area marks it as a fragment, not a part.

The stats matrix supports the single most useful post-cleanup operation: area filtering. Component #3 above is an order of magnitude smaller than its siblings, almost certainly debris that survived the opening of Section 6.3. Rebuilding the mask while dropping every component outside an area band is three lines (keep = np.isin(labels, good_ids) after selecting good_ids from the stats column), and unlike a larger opening it removes the debris with zero distortion of the survivors. Area filtering and morphological cleanup are complements, not rivals: morphology fixes pixel-scale damage within objects, area filtering enforces object-scale sanity between them.

3. regionprops: The Measurement Toolkit Intermediate

When a pipeline needs more than areas and boxes, scikit-image's regionprops is the standard instrument. It computes dozens of per-region properties lazily, and its sibling regionprops_table emits them as columnar data ready for pandas, which is the moment a vision problem officially becomes a data-science problem. The properties worth knowing by heart are summarized in Table 6.4.1.

**Table 6.4.1** The core `regionprops` measurements and what they are for.
Property	What it measures	Typical decision it powers
`area`	Pixel count of the region	Size gating; part present/absent
`centroid`	Mean pixel coordinate	Pick-point for a robot; position check
`bbox`	Axis-aligned bounding box	Cropping; coarse size limits
`perimeter`	Boundary length estimate	Compactness and roughness scores
`eccentricity`	Elongation of the best-fit ellipse, 0 (circle) to 1 (line)	Round versus elongated part classes
`orientation`	Angle of the ellipse's major axis	Grasp angle; alignment check
`solidity`	area / convex hull area (the hull is the smallest convex shape, a rubber band snapped around the region, that contains every foreground pixel)	Detecting bites, cracks, concavities
`extent`	area / bounding-box area	Distinguishing crosses from blobs
`euler_number`	Components minus holes (here 1 minus holes)	Counting holes: washers versus discs

Several of these will be derived properly in Section 6.6 (centroid and orientation from moments, perimeter from contours); here they are consumers' goods. The code below runs the full measurement pipeline on a parts image and lands the results in a dataframe:

# Run the full measurement pipeline and land the results in a dataframe:
# regionprops_table emits the chosen properties as NumPy columns, so the
# Euler number, eccentricity, and solidity become ordinary pandas columns.
import pandas as pd
from skimage import measure

lbl = measure.label(mask > 0, connectivity=2)        # 8-connectivity in 2D
df = pd.DataFrame(measure.regionprops_table(
    lbl,
    properties=["label", "area", "centroid", "eccentricity",
                "solidity", "euler_number"]))
print(df.round(3).to_string(index=False))
# Representative output:
#  label  area  centroid-0  centroid-1  eccentricity  solidity  euler_number
#      1  8120      89.21      119.62         0.214     0.987             1
#      2  7903      88.83      295.04         0.198     0.991             0
#      3   311      86.04      414.13         0.892     0.704             1
#      4  8045     259.72     132.81          0.226     0.989             1

Code Fragment 3: From mask to dataframe in four statements with skimage.measure.regionprops_table: region 2's Euler number of 0 betrays a hole (one component minus one hole), region 3's high eccentricity and low solidity mark it as an elongated sliver, and downstream logic is now ordinary pandas.

Key Insight: The Image Disappears Here

After labeling and measurement, the pipeline never touches pixels again: decisions run on a table of per-object features. This is the architectural pattern of all recognition, classical and deep. The pipelines of Chapter 16 replace these nine geometric features with hand-crafted descriptor vectors, and the detectors of Chapter 23 replace them with learned ones, but the shape of the computation (image to objects to feature rows to decision) is the one established on this page.

Library Shortcut: regionprops_table Replaces Your Measurement Loop

Computing Table 6.4.1's nine properties by hand means looping over labels, masking each region, and writing each formula (the ellipse fits alone, via second-order moments, are a page of careful code); call it 60 to 80 lines. regionprops_table(lbl, properties=[...]) is one line, evaluates lazily so you pay only for requested properties, handles edge-touching regions and empty inputs, and returns NumPy columns that drop straight into pandas. It also generalizes to 3D volumes unchanged, which matters the day your masks come from a CT scanner rather than a camera. Use spacing= to get physical units instead of pixels.

Fun Fact

The phrase "connected component" predates the computers that count them: it was borrowed wholesale from topology, where mathematicians had spent a century arguing about exactly which point sets count as one piece. Vision quietly inherited the whole vocabulary, which is why a humble two-pass labeler that counts pills on a tray secretly speaks the same language as the proofs about coffee cups and donuts. The blob does not know it is a topological object; it just knows it is now component 17, with an area, a centroid, and an eccentricity it would rather you not bring up.

4. From Measurements to Decisions Beginner

The last step of a counting or gating pipeline is almost embarrassingly simple, and that simplicity is the selling point. A bolt-kit verifier might require exactly 12 regions with area in [7000, 9000] and eccentricity below 0.4 (the washers), plus 4 regions with eccentricity above 0.85 (the bolts), and reject otherwise. Such rules execute in microseconds, can be reviewed line by line in an audit, and fail loudly rather than plausibly. Their weakness is equally plain: they generalize exactly as far as the engineer's foresight. The moment classes overlap in every measured feature, rules give out, and that is the cliff edge where Section 6.6's richer shape descriptors, then Chapter 16's learned classifiers, take over.

Practical Example: The Returns Station That Out-Counted the Detector

Who: The automation lead at an e-commerce hardware retailer, building a returns-verification station that must confirm the contents of opened fastener kits (mixed screws, nuts, washers, up to 200 pieces) poured onto a tray.

Situation: The first prototype used a fine-tuned object detector from Chapter 23's family. On dense piles it missed heavily occluded pieces and double-counted at tile boundaries, plateauing at 96 percent count accuracy, far below the 99.9 percent the business case needed, and each new kit SKU required retraining.

Problem: Counting many small, identical, high-contrast objects is precisely the regime where detection networks struggle (tiny instances, mutual occlusion) and classical machinery excels, but the team had not considered the classical route.

Decision: Rebuild the station around physics plus this chapter: a translucent backlit tray (making segmentation trivial per Chapter 2), a vibration pulse to spread the pile, then threshold, open, label, and measure. Pieces were classified by area and eccentricity bands per SKU, configured from 20 golden samples in minutes rather than retraining. Clumps flagged by implausible area were routed to the distance-transform separator of Section 6.5.

Result: 99.95 percent count accuracy at 1.8 seconds per tray on a fanless industrial PC, new SKUs onboarded by a technician without a data scientist, and the deep detector was retired from this station (it kept its job on the loading dock, where lighting could not be controlled).

Lesson: When you control the lighting, contrast is free, and once contrast is free, counting is connected components plus a dataframe. Engineer the scene before you engineer the model.

Research Frontier: Counting and Labeling in 2024-2026

Both halves of this section are active research surfaces. On the labeling side, GPU implementations matter at video rate and volume scale: RAPIDS cuCIM ports label and regionprops to CUDA for microscopy volumes, and the YACCLAB benchmark continues to absorb new block-and-union GPU labelers measured in gigapixels per second. On the counting side, the open-world counters now compete with the backlit tray: CountGD (Amini-Naieni et al., NeurIPS 2024, arXiv:2407.04619) counts arbitrary objects specified by text or visual exemplars in uncontrolled images, and SAM 2 (2024, arXiv:2408.00714) plus per-mask regionprops has become a standard zero-shot measurement pipeline in biology labs: a foundation model proposes the regions, and the 1960s measurement algebra of this section still produces the numbers that get published.

You Could Build This: A Phone-Camera Object Counter (Beginner, about one hour)

You now have the entire threshold, clean, label, measure, decide pipeline in your hands, which is enough for a genuinely useful portfolio piece: a small web app that counts and sizes objects in a phone photo. Pick one concrete domain (coins on a desk, lentils on a plate, pills in a tray, or seeds on graph paper for scale) and wire up Otsu thresholding from Chapter 2, an opening to kill speckle (Section 6.3), connectedComponentsWithStats for the count, and a regionprops area histogram to flag outliers (two coins stuck together read as one double-area blob). Wrap it in a tiny upload page and report a count, a size distribution, and a flagged-clump list. It demonstrates exactly the architecture the returns-station example above shipped to production, runs in milliseconds with no GPU, and the size-histogram clump detector is a natural on-ramp to the distance-transform separator of Section 6.5. Frame it as "controlled-scene measurement" in an interview and you have a story about engineering the scene before the model.

Exercise 6.4.1: Equivalences in the Wild Conceptual

Trace the two-pass algorithm by hand on a U-shaped object scanned top to bottom, writing the provisional labels each pixel receives and the moment the equivalence between the two arms is discovered. Then construct an object shape that generates not one but three pairwise equivalences that must all resolve to a single label (hint: think of a comb), and explain why a naive "relabel on discovery" strategy without union-find can degrade to quadratic time on such shapes.

Exercise 6.4.2: The Area-Filter Bake-Off Coding

Create a mask of 50 random disks (radii 8 to 15 px) plus 500 specks (1 to 2 px). Clean it two ways: (a) morphological opening with an SE just large enough to kill the specks, and (b) labeling plus area filtering at a 50 px threshold. Compare the two results to the speck-free ground truth with a pixel-level intersection-over-union score, measure both runtimes, and explain the IoU gap you observe near disk boundaries in terms of Section 6.3's corner-rounding behavior.

Exercise 6.4.3: Euler's Typography Audit Analysis

Render the 26 uppercase letters in a bold sans-serif font, label each letter image, and compute its Euler number with regionprops. Verify the relation $E = C - H$ (components minus holes) letter by letter: B should give $1 - 2 = -1$, A and D give 0, and so on. Report which letters' results depend on the foreground connectivity chosen, and connect the failures you find at thin junctions back to the Jordan pairing of Section 6.1.