Section 7.3: Deblurring & Deconvolution: Wiener & Richardson-Lucy

"You want me to divide by my smallest frequencies? That is how a whisper of noise becomes a scream, and I will be telling everyone it was your idea."
A Cautiously Regularized Wiener Filter

Big Picture

Deconvolution is division in the frequency domain, and the entire craft lies in refusing to divide where the divisor is tiny. Blur multiplies each spatial frequency of the image by a known factor; undoing it means dividing each frequency back out. But blur crushes high frequencies toward zero while noise does not, and the naive quotient amplifies that noise without bound. The Wiener filter and Richardson-Lucy iteration are the two classical answers: one a closed-form frequency-by-frequency compromise, the other a patient iterative bargain with photon statistics.

Section 7.2 fought the additive term in the degradation model $g = h \ast f + n$. This section fights the convolution. Blur is in some ways a kinder enemy than noise: it is deterministic, it destroys no pixels outright, and if you know the kernel $h$ exactly, the damage is in principle reversible. The phrase "in principle" is where three decades of engineering live, because the reversal is the most spectacularly ill-posed operation in this chapter, and noise, even a whisper of it, is what makes it so. The stepping-stone illustration below previews the whole danger: dividing where the blur has crushed a frequency to almost nothing is a leap no careful filter will make.

A careful cartoon hiker crosses a canyon on stepping stones, striding over solid high stones but wisely stepping aside where the stones shrink to almost nothing, while a reckless twin who leaps onto a tiny stone is flung skyward in a burst of noise specks, depicting the Wiener filter recovering frequencies where signal is strong but refusing to divide where the blur has crushed them near zero. — Dividing by a frequency the blur nearly erased is how a whisper of noise becomes a scream; the smart filter simply refuses to make that jump.

1. A Gallery of Point Spread Functions Beginner

Before you can undo a blur you have to name its exact shape, and that shape is rarely the soft Gaussian you might guess: a defocused lens, a shaking hand, and a hazy atmosphere each leave a different fingerprint, and reaching for the wrong one ruins the deconvolution before it starts. The blur kernel $h$ has a physical name, the point spread function: it is literally the image that a single point of light produces after passing through your optics. Section 7.1 placed it in the degradation pipeline; here we meet the three shapes that cover most of practice. A defocused lens spreads a point into a filled disk (the aperture's shape, which is why out-of-focus highlights look like little polygons matching the iris blades). Camera shake or subject motion during the exposure drags the point along a path, in the simplest case a straight line whose length and angle encode the motion. And the combined small imperfections of atmosphere, diffraction, and lens design aggregate into something well approximated by the Gaussian kernel of Chapter 3. Code 7.3.1 builds the motion case, which we will use as the running example.

# Build a motion-blur PSF, then apply it to a clean image plus faint noise.
# fftconvolve gives the h*f term; the tiny noise added below is the n term.
import numpy as np
from scipy.signal import fftconvolve
from skimage import data, img_as_float
from skimage.metrics import peak_signal_noise_ratio as psnr

def motion_psf(length=15, angle_deg=30, size=31):
    """Line PSF: a 1-pixel-wide segment of given length and angle,
    normalized to sum to 1 (blur must conserve brightness)."""
    psf = np.zeros((size, size))
    c = size // 2
    a = np.deg2rad(angle_deg)
    for t in np.linspace(-length / 2, length / 2, 4 * length):
        i = int(round(c + t * np.sin(a)))
        j = int(round(c + t * np.cos(a)))
        psf[i, j] = 1.0
    return psf / psf.sum()

rng = np.random.default_rng(seed=7)
clean = img_as_float(data.camera())
psf = motion_psf()
blurred = fftconvolve(clean, psf, mode='same')
noisy_blurred = np.clip(blurred + rng.normal(0, 0.005, clean.shape), 0, 1)
print(f"blurred PSNR vs clean: {psnr(clean, noisy_blurred):.1f} dB")

Code 7.3.1: Synthesizing motion blur with a line-shaped PSF, plus a deliberately faint amount of sensor noise ($\sigma = 0.005$, invisible to the eye). Keep that tiny noise in mind; it is about to matter enormously.

In real systems you obtain the PSF rather than assume it: astronomers read it off stars (genuine point sources), microscopists image sub-resolution fluorescent beads, and lens labs photograph backlit pinholes or slanted edges. When none of that is possible, the PSF must be estimated from the blurred image itself, the "blind" problem we defer to the blind deconvolution discussion below. For now, assume $h$ is known and exact.

2. The Inverse Filter Catastrophe Intermediate

The convolution theorem from Chapter 4 turns the degradation model into per-frequency multiplication: $G(u,v) = H(u,v)\,F(u,v) + N(u,v)$, where capital letters are Fourier transforms. Solving for the unknown image looks like one line of algebra:

$$ \hat F(u, v) = \frac{G(u, v)}{H(u, v)} = F(u, v) + \frac{N(u, v)}{H(u, v)}. $$

The first term is exactly what we want, the true image, fully recovered. The second term is the catastrophe. A blur kernel is an averaging operation, so $H$ falls toward zero at high frequencies, and a line-shaped motion PSF is worse: its transform is a sinc-like pattern that crosses zero repeatedly throughout the frequency plane. At every such frequency we are dividing the noise, which is flat and indifferent, by a number arbitrarily close to zero. Figure 7.3.1 sketches the disaster and the remedy to come.

Figure 7.3.1: Why naive deconvolution fails and how Wiener fixes it. The blur transfer function $|H(u)|$ (blue) decays and repeatedly crosses zero for motion blur. Wherever it dips below the noise floor (shaded bands), the inverse gain $1/|H|$ (red) explodes and multiplies pure noise. The Wiener gain (green) follows the inverse where the signal dominates and rolls smoothly to zero where noise does, surrendering exactly the frequencies that cannot be saved.

Code 7.3.2 commits the naive inversion so you can watch it fail. The noise we added is half a percent of the intensity range, far below visibility, yet the output is unrecognizable static. This is not an implementation bug; it is the textbook definition of an ill-posed inverse problem, and the single most important demonstration in this chapter.

def fft_of_psf(psf, shape):
    """Zero-pad the PSF to image size, center it, return its 2D FFT."""
    padded = np.zeros(shape)
    k = psf.shape[0]
    padded[:k, :k] = psf
    # roll so the PSF center sits at (0, 0), as the FFT convention expects
    padded = np.roll(padded, (-(k // 2), -(k // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

H = fft_of_psf(psf, clean.shape)
G = np.fft.fft2(noisy_blurred)

F_naive = G / H                          # the one-line "solution"
restored_naive = np.real(np.fft.ifft2(F_naive))
print(f"inverse filter PSNR: {psnr(clean, np.clip(restored_naive, 0, 1)):.1f} dB")

Code 7.3.2: Naive inverse filtering of the faintly noisy motion blur from Code 7.3.1. Note the PSF centering roll; forgetting it shifts the restored image by half the kernel size, a classic deconvolution bug.

inverse filter PSNR: 5.8 dB

Output 7.3.2a: A typical run. The blurred input scored about 24 dB against the clean image; "restoring" it scored under 6 dB, worse than replacing the image with gray. Invisible noise, amplified at the zero crossings of $H$, has buried the picture completely.

3. The Wiener Filter: Divide Only Where It Pays Intermediate

The fix is to stop demanding exact inversion and ask instead for the linear filter that minimizes the expected squared error of the restoration, given the statistics of signal and noise. Norbert Wiener solved that problem in general; for our model the answer replaces $1/H$ with

$$ W(u, v) = \frac{H^*(u, v)}{|H(u, v)|^2 + K(u, v)}, \qquad K(u, v) = \frac{S_n(u, v)}{S_f(u, v)}, $$

where $H^*$ is the complex conjugate, $S_n$ and $S_f$ are the power spectra (how much energy the noise and the clean image each carry at a given frequency), and $K = S_n/S_f$ is therefore the noise-to-signal power ratio at each frequency. Read it as a negotiation. Where the blurred signal towers over the noise ($|H|^2 \gg K$), the filter behaves like the exact inverse $1/H$. Where blur has crushed the signal below the noise floor ($|H|^2 \ll K$), the denominator is dominated by $K$ and the gain glides to zero: those frequencies are abandoned rather than amplified, which is precisely the green curve of Figure 7.3.1. In practice nobody knows the true power spectra, so $K$ is collapsed to a single constant and tuned, which makes it this method's one honest knob. Code 7.3.3 implements the whole thing in six lines and sweeps the knob.

Collapsing $K$ to a constant is less crude than it looks, and seeing why connects the Wiener filter to the regularization language that runs through the rest of this book. There are two ways to do better than a constant, and they meet in the middle. The first is to honor the real $S_f$: natural images have a power spectrum that falls off as roughly $S_f(u,v) \propto 1/(u^2 + v^2)$, the famous $1/f^2$ law, because most image energy lives at low frequencies and edges contribute a heavy high-frequency tail. Plugging that prior in makes $K(u,v) = S_n/S_f \propto (u^2+v^2)$ rise with frequency, so the filter regularizes high frequencies harder than low ones, which is exactly the behavior you want and strictly better than a flat constant. The second route arrives at the same place from optimization rather than statistics. Instead of minimizing expected error, demand the sharpest restoration consistent with the data, $\min_f \lVert Cf \rVert^2$ subject to $\lVert g - h \ast f\rVert^2 = \lVert n \rVert^2$, where $C$ is a high-pass operator (typically the Laplacian of Chapter 3) penalizing roughness. This constrained least squares, or Tikhonov, problem has the closed-form frequency solution

$$ \hat F(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + \gamma\,|C(u,v)|^2}\, G(u,v), $$

which is the Wiener filter with the substitution $K(u,v) = \gamma\,|C(u,v)|^2$. The constant-$K$ Wiener filter is thus the special case $C = I$ (penalize total energy equally at all frequencies); choosing $C$ to be the Laplacian recovers the frequency-rising $K$ that the $1/f^2$ image prior also predicts, from a completely different argument. The single Lagrange multiplier $\gamma$ is the same knob as $K$, now with a precise meaning: it trades the residual norm against the roughness penalty. This is the first appearance of a pattern the book returns to constantly, a data-fidelity term plus a regularizer weighted by one scalar, which is the skeleton of total-variation deconvolution, the MAP estimates later in this section, and the score-based priors of Chapter 33.

def wiener_deconv(g, H, K):
    """Wiener deconvolution with a constant noise-to-signal ratio K."""
    G = np.fft.fft2(g)
    W = np.conj(H) / (np.abs(H) ** 2 + K)
    return np.clip(np.real(np.fft.ifft2(W * G)), 0, 1)

for K in [1e-1, 1e-2, 1e-3, 1e-4]:
    out = wiener_deconv(noisy_blurred, H, K)
    print(f"K = {K:.0e}   PSNR = {psnr(clean, out):.1f} dB")

Code 7.3.3: Wiener deconvolution from scratch. The constant $K$ stands in for the unknown noise-to-signal ratio; sweeping it shows the regularization tradeoff directly.

K = 1e-01   PSNR = 24.8 dB
K = 1e-02   PSNR = 26.7 dB
K = 1e-03   PSNR = 28.4 dB
K = 1e-04   PSNR = 26.0 dB

Output 7.3.3a: A typical run (values shift with seed and PSF). Too much regularization ($K = 10^{-1}$) barely deblurs; too little ($K = 10^{-4}$) lets noise back in, with faint ripples echoing the PSF's zero crossings; the sweet spot recovers several decibels over the blurred input. Compare this curve to the catastrophic Output 7.3.2a, which is the $K \to 0$ limit.

The visual signature of Wiener restoration is worth knowing: crisp recovered edges accompanied by faint ringing, ripples parallel to strong edges at the frequencies the filter gave up on. Ringing is not a bug but a receipt, the visible cost of the frequencies that were surrendered to noise. Heavier-handed priors (total variation deconvolution, which penalizes gradient magnitude, is the classic upgrade) trade ringing for staircase flatness; every deconvolution method leaves a fingerprint.

Library Shortcut: skimage.restoration.wiener and unsupervised_wiener

scikit-image ships both the classical filter and a variant that estimates the regularization automatically from the data, removing the manual $K$ sweep:

from skimage.restoration import wiener, unsupervised_wiener

restored_w = wiener(noisy_blurred, psf, balance=0.01)
restored_u, _ = unsupervised_wiener(noisy_blurred, psf)   # tunes itself

Code 7.3.4: Library Wiener deconvolution; unsupervised_wiener uses a Gibbs sampler to estimate the regularization, trading compute for one less knob.

About 25 lines of FFT bookkeeping (padding, centering, conjugates, clipping, the $K$ sweep) become 1 or 2. Internally the library also applies edge tapering to suppress the boundary artifacts that circular convolution otherwise wraps around the frame, a detail our from-scratch version quietly ignores and real images quietly punish.

4. Richardson-Lucy: Deconvolution for Photon Counters Advanced

The Wiener filter assumes additive Gaussian noise, but Section 7.1 taught that photon-limited images, astronomy, fluorescence microscopy, low-dose X-ray, obey Poisson statistics, where the noise scales with the signal. Richardson (1972) and Lucy (1974), working independently, derived the maximum-likelihood estimate for exactly that case. The result is not a closed form but a beautifully simple fixed-point iteration:

$$ f^{(k+1)} = f^{(k)} \cdot \left( h^{\dagger} \ast \frac{g}{h \ast f^{(k)}} \right), $$

where $h^{\dagger}$ is the PSF flipped in both axes. The loop reads like a conversation. Blur your current guess ($h \ast f^{(k)}$) and compare it to what the camera actually saw, as a ratio: 1 where the guess explains the data, above 1 where the guess is too dim, below 1 where too bright. Smear that ratio backward through the optics ($h^{\dagger} \ast$) so each source pixel hears about the mismatches it could have caused, then scale the guess multiplicatively. The multiplicative update has two elegant consequences: a nonnegative guess stays nonnegative forever (no negative light, a constraint Wiener cheerfully violates), and total flux is conserved, which is why astronomers trust it for photometry. Code 7.3.5 is the entire algorithm.

def richardson_lucy_scratch(g, psf, n_iter=30):
    """Richardson-Lucy deconvolution, the whole algorithm."""
    f = np.full(g.shape, 0.5)                 # flat, positive initial guess
    psf_flipped = psf[::-1, ::-1]
    for _ in range(n_iter):
        estimate = fftconvolve(f, psf, mode='same')
        ratio = g / (estimate + 1e-12)        # observed vs predicted
        f = f * fftconvolve(ratio, psf_flipped, mode='same')
    return np.clip(f, 0, 1)

for n in [5, 15, 30, 80]:
    out = richardson_lucy_scratch(noisy_blurred, psf, n_iter=n)
    print(f"{n:3d} iterations   PSNR = {psnr(clean, out):.1f} dB")

Code 7.3.5: Richardson-Lucy in eight lines. The 1e-12 guards division where the prediction is zero; the iteration-count sweep exposes the algorithm's central tradeoff.

Common Misconception: The Backward Step Convolves; It Does Not

Students reading the update often skip past the psf[::-1, ::-1] flip, assuming the backward pass is "just another convolution" with the same kernel. It is not: pushing the residual back through the optics is a correlation with the PSF, which equals a convolution with the kernel flipped in both axes ($h^{\dagger}$). The two operations coincide only when the PSF is symmetric (a centered Gaussian or disk), which is exactly why the bug hides: it works on your symmetric test kernel and silently shifts or skews the result the moment you use an asymmetric one such as the motion line of Code 7.3.1. The same correlation-versus-convolution distinction returns with a vengeance in Chapter 19, where the "convolution" layer of a CNN is implemented as correlation (no flip); the framework drops the flip on purpose because the kernel is learned, so its orientation does not matter. Here the kernel is fixed physics, and the flip is mandatory.

  5 iterations   PSNR = 25.9 dB
 15 iterations   PSNR = 27.8 dB
 30 iterations   PSNR = 28.3 dB
 80 iterations   PSNR = 26.9 dB

Output 7.3.5a: A typical run showing semi-convergence: quality rises, peaks, then degrades as further iterations begin fitting the noise. The rising phase restores genuine structure; past the peak, the maximum-likelihood machinery starts explaining individual noisy photons with speckle artifacts.

Semi-convergence is the Richardson-Lucy version of the Wiener knob: the iteration count is the regularization. Stop early and you accept residual blur; run long and you amplify noise, just more gracefully than the inverse filter did. Production variants add explicit damping or a total-variation penalty inside the loop so the iteration can run to convergence safely. The library one-liner, skimage.restoration.richardson_lucy(noisy_blurred, psf, num_iter=30), implements the same loop with the same semantics, and is the version to use outside teaching.

Fun Fact

When the Hubble Space Telescope reached orbit in 1990 with its famously mis-ground mirror, its PSF was not a neat point: spherical aberration scattered most of the light from each star into a broad halo a few arcseconds across instead of concentrating it in the intended sub-arcsecond core. The silver lining: a space telescope photographs point sources for a living, so the PSF was known exquisitely well, turning a hardware disaster into the best-posed deconvolution problem in history. Richardson-Lucy and friends extracted publishable science for three years until the 1993 repair mission, and NASA workshops from that era turned deconvolution from an astronomical niche into standard imaging practice. The lesson generalizes: knowing $h$ precisely is worth more than any algorithmic cleverness.

Key Insight: The PSF Is Data. Go Measure It

Every method in this section assumed $h$ was known, and every practical success story of classical deconvolution (Hubble, microscopy, industrial inspection) earned that assumption by measuring the PSF: stars, beads, pinholes, calibration targets, encoder logs. Deconvolving with a slightly wrong PSF fails much faster than deconvolving with a slightly wrong $K$ or iteration count; the exercises quantify how fast. When your deblurring disappoints, spend your next hour improving the PSF estimate, not swapping algorithms. The cheapest decibels are always in $h$.

5. Blind Deconvolution and the Real World Advanced

When the PSF cannot be measured, it must be estimated from the blurred image itself, jointly with the image: the blind deconvolution problem, which is doubly ill-posed (a sharp image with wide blur and a blurry image with narrow blur explain the data equally well; so does the original photo of an intrinsically blurry scene). Classical practice tames it with structure. For straight-line motion blur, the PSF has only two parameters, length and angle, and both can be read off the image's power spectrum: the blur's periodic zero crossings (Figure 7.3.1) leave measurable notches there. The standard tool for finding them is the cepstrum, the Fourier transform of the log power spectrum, which turns those periodic notches into isolated peaks that are easy to locate. For defocus, the disk radius is a one-parameter search scored by how natural the deconvolution result looks. General blind methods alternate between updating the image given the PSF and the PSF given the image, each step a constrained Richardson-Lucy-like update; they work, fragilely, and they are the part of this section that deep learning improved first and most.

Practical Example: The Conveyor Belt That Outran Its Camera

Who: A machine-vision engineer at a bottling plant integrator.

Situation: A label-inspection camera photographed bottles on a conveyor moving at fixed speed. A production-line upgrade doubled the belt speed, and suddenly every label image carried 12 pixels of horizontal motion blur. OCR accuracy on lot codes collapsed from 99.6 to 81 percent.

Problem: The "proper" fixes were rejected: a faster shutter needed lighting the plant would not buy, and retraining the OCR on blurry images needed months of data collection. The team had weeks.

Decision: The blur here is the dream case: pure horizontal motion whose length is exactly computable from belt speed (from the PLC encoder) times exposure time (from the camera config). The engineer generated the line PSF analytically per Code 7.3.1 and inserted a Wiener deconvolution (Code 7.3.4, with edge tapering) before OCR, tuning $K$ once on a saved set of blurred lot codes.

Result: OCR recovered to 99.2 percent. Total compute cost: one FFT round trip per image, about 4 ms, invisible in the line's cycle time.

Lesson: Deconvolution with a known PSF is cheap, fast, and nearly magical; deconvolution with a guessed PSF is a research project. When the physics hands you $h$ (encoders, shutter timings, optics specs), take it.

Research Frontier: Deblurring After the Classical Era

Learned deblurring now dominates benchmarks: NAFNet (Chen et al., ECCV 2022) showed a stripped-down network without activation functions setting records on the GoPro motion-blur benchmark, and Restormer (CVPR 2022) remains the standard transformer baseline. The diffusion wave reframed deblurring as posterior sampling: HI-Diff (Chen et al., NeurIPS 2023) couples a compact latent diffusion prior to a regression backbone, while blind restorers like DiffBIR (ECCV 2024) and SUPIR (CVPR 2024) treat blur as just one ingredient of compound degradation, inverting it with a generative prior strong enough to invent plausible detail. That strength is the new caveat: a Wiener filter can only redistribute the evidence in the photons, but a diffusion deblurrer can manufacture a sharp face that was never there, which is why forensic and scientific users still reach for the methods of this section, and why Chapter 33 revisits deconvolution as a guided-sampling problem. Meanwhile your phone quietly runs learned deblurring on every shot; features like Google Photos' Unblur put a consumer button on this entire section.

Exercise 7.3.1: Limits of the Compromise Conceptual

(a) Show from the Wiener formula that $K \to 0$ recovers the inverse filter and large $K$ approaches a scaled matched filter (gain proportional to $H^*$). (b) Explain why no setting of $K$ can recover a frequency at which $H(u,v) = 0$ exactly, and what that implies about the maximum information any deconvolution method can extract. (c) Motion blur of length $L$ has spectral zeros spaced $1/L$ apart; what does this predict about deblurring long versus short exposures?

Exercise 7.3.2: Map the Semi-Convergence Coding

Using Code 7.3.5, run Richardson-Lucy for 200 iterations on the motion-blurred image at three noise levels ($\sigma \in \{0.001, 0.005, 0.02\}$), recording PSNR every 5 iterations. Plot the three curves on one chart. Where does each peak? Fit the relationship between noise level and optimal iteration count, and explain it: why should noisier data demand earlier stopping?

Exercise 7.3.3: How Wrong Can the PSF Be? Analysis

Blur an image with a 15-pixel, 30-degree motion PSF. Now deconvolve (Wiener, best $K$) with deliberately wrong PSFs: every combination of length $\{11, 13, 15, 17, 19\}$ and angle $\{20, 25, 30, 35, 40\}$ degrees. Render the 5x5 PSNR grid as a heatmap. Which parameter is the restoration more sensitive to, and by how much? Reconcile your finding with the Key Insight callout: how accurately must a blind method estimate each parameter before deconvolution helps rather than harms?