Part I: Image Processing
Chapter 7: Image Restoration & Enhancement

Inpainting: Filling the Holes

"Nobody knows what was under the coffee stain, least of all me. But my brushstrokes never stop at the edge of it, so nobody ever asks."

An Overconfident Exemplar Patch
Big Picture

Inpainting is restoration with zero data inside the hole: whatever appears there comes entirely from the prior, so the method's belief about images is not a correction term, it is the whole answer. Smoothness priors continue intensity into the hole, transport priors continue edges, and exemplar priors copy texture from elsewhere in the image. The craft is reading the hole, what structures it interrupts and how wide it is, and choosing the belief that can actually bridge it.

Denoising (Section 7.2) had noisy data at every pixel; deconvolution (Section 7.3) had blurred data at every pixel. Inpainting has no data where it matters: a scratch, a logo, a dust speck, a dead sensor column, or an ex-partner has replaced a region of the image outright. The term comes from art conservation, where restorers fill losses in a painting by extending the surrounding work into the gap, and the algorithmic versions follow the same ethic: the fill must be inferred from what survives, and the seams must not show.

1. Holes, Masks, and Honest Ground Rules Beginner

Every inpainting algorithm consumes two inputs: the damaged image and a binary mask marking which pixels are missing (in OpenCV's convention, nonzero mask pixels are the hole). The algorithm never touches pixels outside the mask, which means the mask is a contract: anything damaged but unmasked will be treated as gospel truth and propagated into the fill. This makes mask preparation half the job. Masks come from thresholding when the damage has a distinctive color (the methods of Chapter 2), from manual annotation, or, later in the book, from segmentation models. Whatever the source, the standard finishing move is a morphological dilation from Chapter 6: damage almost always bleeds a pixel or two beyond its visible core (JPEG halos around a logo, scattering around a scratch), and an under-mask poisons the fill with contaminated boundary values. Over-masking by two pixels costs almost nothing; under-masking by one ruins the result. Code 7.4.1 manufactures a test case so we can score every method against known truth.

import cv2
import numpy as np
from skimage import data
from skimage.metrics import peak_signal_noise_ratio as psnr

rng = np.random.default_rng(seed=7)
clean = data.camera()                                   # uint8, 512x512

# Synthesize scratch damage: random thin lines, then a fat blotch.
mask = np.zeros_like(clean)
for _ in range(12):
    p1 = tuple(rng.integers(0, 512, 2))
    p2 = tuple(rng.integers(0, 512, 2))
    cv2.line(mask, p1, p2, color=255, thickness=2)
cv2.circle(mask, (300, 180), 22, color=255, thickness=-1)

# Dilate the mask so it overshoots the damage slightly (the safe direction).
mask = cv2.dilate(mask, np.ones((3, 3), np.uint8))

damaged = clean.copy()
damaged[mask > 0] = 255                                 # burn the damage in
print(f"damaged fraction: {(mask > 0).mean():.1%}, "
      f"PSNR of damaged image: {psnr(clean, damaged):.1f} dB")
Code 7.4.1: Manufacturing ground-truth damage: twelve scratches and one blotch, burned to white, with the mask dilated by one ring of pixels. Because we kept the clean original, every fill in this section can be scored honestly.

2. Smoothness: Harmonic and Biharmonic Filling Intermediate

The simplest belief about the missing region is that the image varies smoothly across it. Formally: find the fill that satisfies Laplace's equation $\nabla^2 I = 0$ inside the hole, with the surviving pixels on the boundary as fixed boundary conditions. The physical picture is a soap film or stretched membrane: clamp a rubber sheet to the intensity values around the hole's rim and let it relax; the relaxed surface is the fill. The numerical recipe is equally physical: repeatedly replace every hole pixel with the average of its four neighbors until nothing changes, which is exactly the heat-diffusion iteration, intensity leaking inward from the rim until equilibrium. Code 7.4.2 implements it in a dozen lines.

def diffusion_inpaint(img, mask, n_iter=3000):
    """Fill masked pixels by iterating Laplace's equation (Jacobi steps)."""
    out = img.astype(np.float64)
    hole = mask > 0
    out[hole] = out[~hole].mean()           # neutral initial guess
    for _ in range(n_iter):
        # four-neighbor average everywhere, applied only inside the hole
        avg = (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
               np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 4.0
        out[hole] = avg[hole]
    return np.clip(out, 0, 255).astype(np.uint8)

filled_d = diffusion_inpaint(damaged, mask)
print(f"diffusion inpaint PSNR: {psnr(clean, filled_d):.1f} dB")
Code 7.4.2: Harmonic inpainting from scratch: clamp the rim, relax the interior. The iteration count scales with the squared width of the widest hole, since information diffuses inward roughly one pixel per step.

On the thin scratches this works almost perfectly: a two-pixel gap in smooth content is bridged invisibly. On the 44-pixel blotch it fails in an instructive way. The membrane is maximally smooth, so the fill is a featureless gradient: no texture, no edges, a soft gray cloud where the wall's brick pattern should continue. Worse, an edge that hits the hole's rim stops dead, because Laplace's equation has no concept of "continue this line." The biharmonic upgrade ($\nabla^4 I = 0$, a stiff plate instead of a floppy membrane) matches boundary gradients as well as values, extending edges a little way into the hole before sagging, and is what scikit-image implements.

Library Shortcut: skimage.restoration.inpaint_biharmonic and cv2.inpaint

Both libraries reduce this section's algorithms to one call:

from skimage.restoration import inpaint_biharmonic

# Stiff-plate smoothness (best quality among smoothness priors, slowest)
filled_b = inpaint_biharmonic(damaged, mask > 0)

# OpenCV: both classical fast methods, milliseconds each
filled_ns = cv2.inpaint(damaged, mask, inpaintRadius=3, flags=cv2.INPAINT_NS)
filled_te = cv2.inpaint(damaged, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
Code 7.4.3: Three library inpainters: biharmonic smoothness, Navier-Stokes isophote transport, and Telea fast marching.

The 12 lines of Code 7.4.2 (plus the hundreds you would need for biharmonic boundary handling) become 1 line per method. Internally the libraries solve the PDEs with proper sparse linear algebra rather than naive Jacobi sweeps, handle color channels jointly, and run orders of magnitude faster; cv2.inpaint processes a 512x512 image in a few milliseconds, fast enough for video.

3. Continuing Structure: Isophotes and Fast Marching Intermediate

The 2000 paper that imported the word "inpainting" into vision (Bertalmio, Sapiro, Caselles, and Ballester) fixed smoothness filling's blindness to edges. Their observation: what a human restorer continues into a gap is not intensity but isophotes, the level curves of constant brightness, which run perpendicular to the gradient, along the direction $\nabla^{\perp} I$. Their PDE transports boundary information into the hole along those curves, so an interrupted edge sails across the gap instead of stopping at the rim. The mathematics turned out to mirror fluid dynamics, with image intensity playing the role of a stream function, which is why OpenCV's implementation flag is cv2.INPAINT_NS, for Navier-Stokes.

Telea's 2004 method, OpenCV's other flag, gets most of the same benefit with much simpler machinery. It fills the hole strictly from the rim inward, like rust eating into metal, processing pixels in order of their distance from the boundary (computed by the fast marching method, a cousin of the distance transforms in Chapter 6). Each pixel is estimated as a weighted average of its already-known neighbors, with weights favoring neighbors whose direction aligns with the advancing front's normal and those closer by, so directional structure is respected without solving any PDE. Figure 7.4.1 lays the three classical strategies side by side; Code 7.4.4 races them on our test case.

1. Smoothness (diffusion) intensity leaks inward from the rim; result is smooth, edges stop dead 2. Transport (NS / Telea) fill marches inward in distance order; isophotes carry the edge across the gap 3. Exemplar (copy texture) best-matching patch is copied whole; texture survives, structure goes first
Figure 7.4.1: The three classical inpainting strategies on the same hole (red dashed). Diffusion (left) relaxes intensity inward and produces smooth fills. Transport methods (center) process pixels in fast-marching order (blue fronts) and continue interrupted isophotes, so the dark edge crosses the gap. Exemplar methods (right) copy whole patches from a source region elsewhere in the image, reproducing texture that the other two cannot invent.
candidates = {
    'diffusion (scratch-built)': filled_d,
    'biharmonic (skimage)':      (np.clip(filled_b * 255, 0, 255)
                                  .astype(np.uint8)),
    'navier-stokes (cv2)':       filled_ns,
    'telea (cv2)':               filled_te,
}
for name, img in candidates.items():
    print(f"{name:26s} PSNR = {psnr(clean, img):.1f} dB")
Code 7.4.4: Scoring all four fills against the ground truth we kept in Code 7.4.1. Note that inpaint_biharmonic returns floats in [0, 1] and must be rescaled before comparison with the uint8 original.
diffusion (scratch-built)  PSNR = 36.2 dB
biharmonic (skimage)       PSNR = 37.0 dB
navier-stokes (cv2)        PSNR = 36.6 dB
telea (cv2)                PSNR = 36.8 dB
Output 7.4.4a: A typical run. The scores bunch tightly because 90 percent of our damage is thin scratches, which every method bridges well; the numbers are dominated by the easy pixels. Crop to the blotch and the ranking spreads: the smoothness methods leave a soft gray cloud, the transport methods a streaky but sharper patch. The metric, as the next callout argues, is asking the wrong question anyway.
Key Insight: Inpainting Is Judged by Plausibility, Not Fidelity

For denoising, PSNR against ground truth is a fair score: there is one right answer and the method should approach it. For a large hole there are many right answers; nobody, including the metric, knows which bricks were behind the blotch. A fill can be pixel-wise far from the original yet perfectly convincing, and PSNR actively prefers the blurry "average of all plausible fills" over any single crisp one, the same blur-rewarding bias Section 7.1 warned about, now decisive. The honest evaluation for inpainting is: can an unwarned viewer find the hole? This shift, from fidelity to plausibility, is the conceptual hinge on which restoration swings toward generation, and it is why generative models eventually owned this problem. Carry the thought to Chapter 37, where scoring plausibility becomes a discipline of its own.

4. Copying Texture: Exemplar-Based Inpainting Advanced

PDE and marching methods share a fatal limitation: they synthesize the fill from boundary values, so they can only produce smooth or smoothly streaked content. A hole in a brick wall, a lawn, or a knit sweater needs texture, and the only place texture demonstrably matching this image exists is the image itself, the same self-similarity that powered non-local means in Section 7.2. Exemplar-based inpainting therefore fills holes by copying whole patches from the intact region, as in the right panel of Figure 7.4.1.

The landmark formulation (Criminisi, Perez, and Toyama, 2004) realized that copy order decides everything. Fill the easy flat areas first and interrupted edges get walled off, never to reconnect; the fill looks patched. Their algorithm scores every patch on the hole's rim with a priority $P(p) = C(p) \cdot D(p)$: a confidence term $C$ measuring how much already-known content the patch contains, and a data term $D$ that spikes where a strong isophote hits the boundary head-on. The product makes the algorithm extend structure first, exactly like the transport methods, then flood texture into the remaining areas. One iteration runs: pick the rim patch with the highest priority, search the known region for its best match (masked sum of squared differences over the known pixels only), copy the matching patch's pixels into the unknown ones, update confidences, repeat until the hole is gone. Structure propagates, then texture; large holes in textured scenes come out looking startlingly intact.

We will not build Criminisi from scratch: a faithful implementation is a few hundred lines of boundary bookkeeping, and the algorithmic ideas are all stated above. What turned the approach from paper to product was PatchMatch (Barnes et al., 2009), a randomized algorithm that finds approximate nearest-neighbor patches orders of magnitude faster than exhaustive search by exploiting the coherence of natural images: good matches for neighboring patches are themselves neighbors. PatchMatch became the engine of Photoshop's Content-Aware Fill in 2010, which is the form in which a billion users met exemplar inpainting without learning its name.

5. Choosing by Hole Geometry Beginner

With three families in hand, selection becomes a matter of reading the hole: what does it interrupt, and how wide is it relative to the structures it cuts? Table 7.4.1 condenses the decision into the form it takes in practice.

Table 7.4.1: Matching the inpainting method to the hole.
HoleTypical sourceBest classical methodWhy
Thin lines (1-5 px)scratches, wires, hairs, text overlaysTelea or Navier-Stokesboundary information bridges the gap before it can drift
Small blobs in smooth areasdust, dead pixels, watermarks on skybiharmonicsmoothness is the truth here, and gradients are matched
Medium holes in textureobject removal in grass, fabric, brickexemplar (Criminisi / PatchMatch)only copying can manufacture believable texture
Large holes crossing structuresremoving a person, a car, a buildingnone of the aboverequires semantic invention: the generative methods of Chapter 35
Practical Example: Forty Thousand Negatives and One Summer

Who: The digitization lead at a regional newspaper archive.

Situation: Forty thousand press negatives from 1935 to 1980, being scanned for a public history portal. Decades of handling left fine scratches on nearly every frame, plus mold spots on a few thousand.

Problem: Manual retouching averaged eleven minutes per image; the grant budget covered a summer, not a decade. An intern's first automated attempt ran biharmonic inpainting with masks thresholded from the scans' infrared channel, and it erased scratches beautifully while leaving telltale soft smudges across faces and architectural detail, exactly where mold spots were wide.

Decision: Triage by mask geometry, computed with the shape statistics of Chapter 6: masks under 4 pixels wide (98 percent of all damage) went to cv2.inpaint with the Telea flag, dilated by one pixel; wider mold spots were routed to an exemplar-based fill, and the few hundred frames where holes crossed faces went to the lone human retoucher.

Result: Throughput reached 9 seconds per frame on the automated path. Public-facing spot checks found no complaints about visible repairs; the human queue finished within the summer.

Lesson: Inpainting at scale is a routing problem. Classify each hole, send it to the cheapest method that can plausibly bridge it, and reserve humans (or, today, generative models) for the holes that interrupt meaning rather than texture.

6. From Copying to Imagining Intermediate

Every method in this section is bounded by the same wall: the fill can only contain what the surviving image already shows. Smoothness methods extrapolate intensity, exemplar methods recombine existing patches, but none can reason that the missing region behind a removed person probably contained the continuation of a doorway, because "doorway" is not a concept any of them possess. Crossing that wall requires a model of what the world looks like, learned from millions of images rather than borrowed from one, and that is precisely the road the book travels: segmentation models in Chapter 24 will generate the masks automatically, and the generative inpainters of Chapter 35 will fill them with invented, semantically coherent content. When you get there, notice how much survives: masks still get dilated, fill order still matters (diffusion models fill coarse to fine), and the plausibility-over-fidelity principle of this section becomes the entire evaluation story.

Research Frontier: Inpainting in the Generative Era

The bridge from this section to modern practice has three spans. LaMa (Suvorov et al., WACV 2022) showed that fast Fourier convolutions, giving every layer the image-wide receptive field that exemplar search always had, let a feed-forward network fill very large holes; it remains a production workhorse because it is deterministic and cheap. RePaint (Lugmayr et al., CVPR 2022) demonstrated that an unmodified diffusion model can inpaint by repeatedly renoising and resampling the hole while clamping the known pixels, pure Bayesian conditioning with no retraining. The 2024 wave made diffusion inpainting controllable and product-grade: BrushNet (Ju et al., ECCV 2024) adds a dedicated branch that injects masked-image features into a frozen diffusion backbone, and PowerPaint (Zhuang et al., ECCV 2024) uses learnable task prompts so one model handles object removal, insertion, and outpainting. Commercial tools like Photoshop's Generative Fill put this in everyone's hands, which sharpened the forensic question this chapter keeps returning to: a Telea fill provably contains only local boundary information, while a generative fill contains a model's opinion, and the difference decides what counts as evidence.

Fun Fact

The most-viewed inpainting results in history are weather forecasts. The smooth, gap-free radar and satellite maps on the evening news are routinely inpainted: ground radar has cone-of-silence holes directly above each station and terrain shadows behind mountains, and the filling is done by methods directly descended from the diffusion and transport algorithms of this section, applied to precipitation fields instead of photographs. Millions of people see PDE inpainting every night and call it the weather.

Exercise 7.4.1: Read the Hole Conceptual

For each task, choose a method from Table 7.4.1 and defend the choice in one sentence by naming what the hole interrupts: (a) removing timestamp text burned into the corner of dashcam footage; (b) erasing a power line crossing a sunset sky; (c) removing a parked car that occludes a hedge and part of a brick wall; (d) repairing a dead 1x512 sensor column in thousands of microscope images; (e) deleting a photobombing stranger standing in front of a crowded bookshelf.

Exercise 7.4.2: Criminisi-Lite Coding

Implement a simplified exemplar inpainter: at each step, take the rim patch with the highest confidence (skip the data term), find its best match over a coarse grid of candidate source patches using masked SSD, and copy the unknown pixels. Run it on a 9x9-pixel-patch scale against a texture image (try skimage.data.grass()) with a 60x60 hole, and compare visually against cv2.inpaint with the Telea flag. Then add the data term $D(p) = |\nabla^{\perp} I \cdot n_p|$ and show one example where the fill order visibly changes the result.

Exercise 7.4.3: When the Metric Lies Analysis

Cut a 80x80 hole from a textured region of an image. Fill it with (a) biharmonic inpainting and (b) your exemplar inpainter from Exercise 7.4.2. Compute PSNR and SSIM for both against the original, then show both crops to three people and ask which repair they can spot. Report the (likely) disagreement between metric and humans, and explain it using the Key Insight callout: what exactly is PSNR averaging over, and why does that favor the blurry fill?