Part I: Image Processing
Chapter 7: Image Restoration & Enhancement

HDR Imaging & Tone Mapping

"The scene speaks in twenty stops, my sensor hears twelve, and your screen repeats eight. I am the simultaneous interpreter who decides what gets lost in translation."

A Diplomatic Tone Mapping Operator
Big Picture

HDR imaging splits one impossible measurement into several possible ones, and tone mapping is the lossy diplomacy that fits the merged result onto a display. Bracketed exposures each capture a different slice of the scene's brightness range; calibration and merging recover a radiance map that is a genuine physical measurement; tone mapping then spends that measurement on an eight-bit rendering, deciding which contrasts survive. Keep the two halves separate in your head: one is metrology, the other is taste.

Every earlier section of this chapter repaired damage inflicted after the light was recorded. This closing section confronts a degradation that happens at the moment of recording and cannot be filtered away afterward: the scene contains a wider range of brightness than the sensor can represent in one exposure. The highlights clip to pure white, carrying no information at all (the inpainting situation of Section 7.4, minus the mask), or the shadows drown beneath the read-noise floor of Section 7.1, or, in the classic interior-with-a-window shot, both at once. The cure is the chapter's recurring move one final time: change the acquisition so the inverse problem becomes solvable, then model your way back to the quantity you actually wanted, which here is the scene's true radiance.

1. The Dynamic Range Gap (Beginner)

Photographers measure brightness ranges in stops: one stop is a factor of two in light, so a range of $n$ stops spans a ratio of $2^n$. A sunlit scene with open shade can span 17 to 20 stops (a ratio above 100,000:1, and far more if the sun itself is in frame). A good modern sensor captures 12 to 14 stops in a single exposure, bounded above by pixel full-well capacity (the bucket overflows: clipping) and below by the noise floor (the signal drowns); Chapter 1 covered both ends of this pipeline. A typical SDR display renders roughly 8 to 10 stops in practice. Two gaps, then: scene to sensor, and sensor to display. The first is closed by capturing more data; the second cannot be closed at all, only negotiated. Figure 7.6.1 draws both, and the whole section is a walk along its arrows. You can diagnose the problem without any new tools: the histogram skills of Chapter 2 show a scene exceeding the sensor as a histogram slammed against one or both ends, with tall clipping spikes at 0 and 255.
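
The arithmetic of stops is worth making concrete. The sketch below uses only NumPy and a synthetic stand-in frame; the helper names and the `frame` array are illustrative assumptions, not part of any library. It converts between stops and contrast ratios and measures how much of an 8-bit image sits pinned at the two clipping codes.

```python
import numpy as np

def stops_to_ratio(stops):
    """Contrast ratio spanned by a range of `stops` (one stop = a factor of 2)."""
    return 2.0 ** stops

def ratio_to_stops(ratio):
    """Inverse mapping: number of stops spanned by a contrast ratio."""
    return float(np.log2(ratio))

def clipped_fractions(img_u8):
    """Fractions of samples pinned at code 0 and at code 255."""
    img = np.asarray(img_u8)
    return float(np.mean(img == 0)), float(np.mean(img == 255))

# Synthetic stand-in for an interior-with-window frame (hypothetical data):
# 10% crushed shadows, 20% blown window, the rest mid-tones.
frame = np.full(1000, 128, dtype=np.uint8)
frame[:100] = 0
frame[-200:] = 255

print("12 stops as a ratio:", stops_to_ratio(12))
print("100,000:1 in stops:", ratio_to_stops(100_000))
print("clipped low/high:", clipped_fractions(frame))
```

A 12-stop window spans 4096:1, while 100,000:1 needs about 16.6 stops: the scene-to-sensor gap in two numbers. On a real photograph, large values from `clipped_fractions` at both ends are the numeric counterpart of the histogram spikes described above.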

[Figure 7.6.1 diagram. Axis: log luminance (stops). Elements: scene radiance (about 20 stops); long exposure (shadows land in range, highlights clip), middle exposure, and short exposure (highlights land in range, shadows drown), each a 12-stop window slid along the scene range by bracketing; merged radiance map (full scene range, float32, a measurement); display (about 8 stops, a rendering). Arrows: merge + calibrate (Section 2), then tone map or fuse (Sections 3, 4).]
Figure 7.6.1: The dynamic range gap on a log-luminance axis. The scene (blue, top) spans more stops than any single exposure window (green) can capture, so bracketing slides the window: each exposure records a different slice with overlap. Merging recovers the full-range radiance map (dashed), a physical measurement; tone mapping then compresses it into the display's far narrower budget (red), an aesthetic decision.

2. Radiance from Brackets: Debevec-Malik (Advanced)

Bracketing is easy: shoot the same scene at, say, 1/400, 1/50, and 1/6 of a second. Merging is subtler than averaging, because pixel values are not light. Every camera applies a nonlinear response curve $f$ between collected exposure and stored value, $Z = f(E \cdot \Delta t)$, where $E$ is the irradiance we want and $\Delta t$ the exposure time. To get back to radiance we need $f^{-1}$, and the elegant result of Debevec and Malik (1997) is that the brackets themselves reveal it. Taking $g = \ln f^{-1}$, each pixel $i$ in each exposure $j$ supplies one linear equation,

$$ g(Z_{ij}) = \ln E_i + \ln \Delta t_j , $$

with the unknowns being the 256 values of $g$ and the per-pixel log irradiances $\ln E_i$. The same scene point photographed at known different $\Delta t$ pins down the curve's shape; a smoothness penalty on $g$ and a hat-shaped weighting $w(Z)$ that distrusts values near 0 and 255 (clipped or noise-drowned, per Section 1) complete a small least-squares problem solvable from a few hundred sampled pixels. With $g$ known, every exposure votes for each pixel's radiance, weighted by how well-exposed that pixel was in that frame. The output is a float32 radiance map proportional to actual scene radiance, the quantity the whole pipeline exists to recover. Code 7.6.1 runs the entire flow in OpenCV.

import cv2
import numpy as np

# Three handheld brackets of the same scene, 2 stops apart.
files = ['win_short.jpg', 'win_mid.jpg', 'win_long.jpg']
images = [cv2.imread(f) for f in files]               # uint8 BGR
assert all(im is not None for im in images), "a bracket failed to load"
times = np.array([1/400, 1/100, 1/25], dtype=np.float32)

# 1. Align: handheld brackets are never perfectly registered.
cv2.createAlignMTB().process(images, images)

# 2. Recover the response curve g from the brackets themselves.
calibrate = cv2.createCalibrateDebevec()
response = calibrate.process(images, times)

# 3. Merge into a float32 radiance map (values proportional to radiance).
merge = cv2.createMergeDebevec()
hdr = merge.process(images, times, response)

cv2.imwrite('scene.hdr', hdr)                         # Radiance RGBE format
print(f"radiance range: {hdr.min():.4f} .. {hdr.max():.1f}, "
      f"ratio {hdr.max() / max(hdr.min(), 1e-6):.0f}:1")
Code 7.6.1: The full radiometric HDR pipeline: align, self-calibrate, merge. The printed ratio routinely exceeds 100,000:1, confirming that the float32 map holds a range no single uint8 image could. Save the .hdr file; it is the measurement, and everything after this point is rendering.

Note what step 2 quietly does: it inverts a stage of the in-camera processing pipeline from Chapter 1 using nothing but the photographs themselves. If you shoot RAW, the response is already linear and calibration is unnecessary; the Debevec step exists because the world runs on JPEGs.
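
To make the least-squares system behind the calibration step concrete, here is a minimal NumPy sketch written in the spirit of the Debevec-Malik formulation. It is not OpenCV's implementation. It minimizes

$$ \sum_{i}\sum_{j} \big\{ w(Z_{ij}) \left[ g(Z_{ij}) - \ln E_i - \ln \Delta t_j \right] \big\}^2 + \lambda^2 \sum_{z=1}^{254} \big[ w(z)\, g''(z) \big]^2 , $$

with $g''$ approximated by a second difference and the free additive offset pinned by $g(128) = 0$. The function names, the smoothness strength `lam`, and the synthetic gamma-2.2 "camera" used to exercise the solver are all assumptions made for illustration.

```python
import numpy as np

def hat_weight(z, z_min=0, z_max=255):
    """Triangular weight: zero at both extremes, largest at mid-range."""
    z = np.asarray(z, dtype=np.float64)
    mid = 0.5 * (z_min + z_max)
    return np.where(z <= mid, z - z_min, z_max - z)

def solve_response(Z, log_dt, lam=10.0):
    """Least-squares estimate of g (256 values) and ln E (one per sampled pixel).

    Z      : (n_pixels, n_exposures) integer pixel values in 0..255
    log_dt : (n_exposures,) natural log of the exposure times
    lam    : smoothness strength (a tuning assumption)
    """
    n_pix, n_exp = Z.shape
    n_levels = 256
    rows = n_pix * n_exp + 1 + (n_levels - 2)
    A = np.zeros((rows, n_levels + n_pix))
    b = np.zeros(rows)

    k = 0
    for i in range(n_pix):                      # one data equation per sample
        for j in range(n_exp):
            w = hat_weight(Z[i, j])
            A[k, Z[i, j]] = w                   # + w * g(Z_ij)
            A[k, n_levels + i] = -w             # - w * ln E_i
            b[k] = w * log_dt[j]                # = w * ln dt_j
            k += 1

    A[k, 128] = 1.0                             # pin the offset: g(128) = 0
    k += 1

    for z in range(1, n_levels - 1):            # penalize second differences of g
        w = lam * hat_weight(z)
        A[k, z - 1], A[k, z], A[k, z + 1] = w, -2.0 * w, w
        k += 1

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n_levels], x[n_levels:]

def merge_log_radiance(Z, log_dt, g):
    """Every exposure votes for ln E at each pixel, weighted by the hat function."""
    w = hat_weight(Z)
    num = (w * (g[Z] - log_dt[None, :])).sum(axis=1)
    den = np.maximum(w.sum(axis=1), 1e-12)      # guards pixels clipped in every frame
    return num / den

# Synthetic check under an assumed gamma-2.2 response (not a real sensor model).
true_E = np.geomspace(1.0, 300.0, 60)
dt = np.array([1 / 400, 1 / 100, 1 / 25])
exposure = np.clip(true_E[:, None] * dt[None, :], 0.0, 1.0)
Z = np.rint(255.0 * exposure ** (1 / 2.2)).astype(np.int64)

g, _ = solve_response(Z, np.log(dt))
log_E = merge_log_radiance(Z, np.log(dt), g)
print("recovered g(200) - g(64):", g[200] - g[64])
print("ideal value for gamma 2.2:", 2.2 * np.log(200 / 64))
```

Because only differences of $g$ are observable, the recovered curve and the radiances are defined up to a shared scale factor, which is why the text describes the map as proportional to scene radiance, not equal to it.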

3. Tone Mapping: Spending the Range (Intermediate)

Now the second gap. Naively normalizing the radiance map to [0, 255] produces a nearly black image with a few bright speckles: radiance is so skewed that the window and the lamp own the entire linear range while the room's contents huddle in the bottom few codes. Tone mapping operators (TMOs) compress intelligently instead, and they split into two families. Global operators apply one monotone curve to every pixel. Reinhard's classic is barely more than one line,

$$ L_d = \frac{L}{1 + L}, $$

applied to suitably scaled luminance $L$: it maps zero to zero, infinity to one, and leaves mid-tones nearly linear, a graceful soft shoulder (Drago's logarithmic operator is the same spirit with a different curve). Global operators are fast, artifact-free, and fundamentally limited: one curve cannot give both the window and the bookshelf generous contrast, because it must be monotone over the whole range. Local operators adapt to neighborhoods, brightening this shadow while preserving detail in that highlight. Durand and Dorsey's classic uses the bilateral filter of Chapter 3 to split the image into a base layer (compressed hard) and a detail layer (preserved fully); Mantiuk's operator works on gradients instead of intensities. Local operators recover far more visible detail and, pushed hard, produce the halos and radioactive "HDR look" that gave the term a bad name in the late 2000s. Code 7.6.2 renders the same radiance map through three operators so the family differences are visible side by side.

def to_uint8(ldr_float):
    return np.clip(ldr_float * 255, 0, 255).astype(np.uint8)

reinhard = cv2.createTonemapReinhard(gamma=2.2)         # global, gentle
drago    = cv2.createTonemapDrago(gamma=2.2, bias=0.85) # global, log curve
mantiuk  = cv2.createTonemapMantiuk(gamma=2.2, scale=0.8)  # local, gradients

for name, tmo in [('reinhard', reinhard), ('drago', drago),
                  ('mantiuk', mantiuk)]:
    ldr = tmo.process(hdr)                # float32 in [0, 1], NaNs possible
    cv2.imwrite(f'tm_{name}.jpg', to_uint8(np.nan_to_num(ldr)))
print("wrote tm_reinhard.jpg, tm_drago.jpg, tm_mantiuk.jpg")
Code 7.6.2: One measurement, three renderings. Compare the JPEGs: the global pair keeps the scene natural but lets the window wash out or the room sink; Mantiuk pulls detail from both at the cost of a flatter, more "processed" character. None is wrong; they are different spending decisions.
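
The "spending" metaphor can be checked by counting output codes. The sketch below is a toy, not a full tone mapper: it works on a hypothetical one-dimensional set of radiance samples, omits display gamma and color, and stands in for "suitably scaled" luminance by placing the median at 0.18 (an assumption chosen for simplicity). It compares naive linear normalization with the soft-shoulder curve $L / (1 + L)$ by counting how many distinct 8-bit codes the room and the window each receive.

```python
import numpy as np

def linear_codes(radiance):
    """Naive rendering: scale so the brightest value lands on code 255."""
    return np.round(255.0 * radiance / radiance.max()).astype(np.uint8)

def shoulder_codes(radiance, key=0.18):
    """Scale so the median sits at `key`, then apply the curve L / (1 + L)."""
    L = key * radiance / np.median(radiance)
    return np.round(255.0 * L / (1.0 + L)).astype(np.uint8)

# Hypothetical scene: 900 room samples (0.01..1) and 100 window samples (100..1000).
room = np.geomspace(0.01, 1.0, 900)
window = np.geomspace(100.0, 1000.0, 100)
scene = np.concatenate([room, window])

for name, codes in [("linear", linear_codes(scene)),
                    ("shoulder", shoulder_codes(scene))]:
    n_room = len(np.unique(codes[:900]))
    n_window = len(np.unique(codes[900:]))
    print(f"{name:8s} room codes: {n_room:3d}   window codes: {n_window:3d}")
```

On this synthetic data the linear rendering collapses the whole room onto a single code, while the shoulder curve gives the room well over a hundred codes and squeezes the window into a handful near 255. That is the global operator's trade in miniature: one monotone curve can be generous to one region only by being stingy with another.
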
Key Insight: The Radiance Map Is the Measurement; the JPEG Is an Opinion

Tone mapping destroys information by design, and there are infinitely many defensible ways to do it. So never archive only the tone-mapped output. The float32 radiance map is the scene's photometric record: it can be re-rendered by any future operator, displayed natively on an HDR screen, or used as physical input for relighting and image-based lighting. This is the same archival logic that makes RAW files precious in Chapter 1, one level higher up the pipeline. Measurement and rendering are different artifacts; keep the measurement.

4. Exposure Fusion: Skipping Radiometry Entirely (Intermediate)

Step back and ask what the consumer use case actually needs. Nobody photographing their kitchen wants a radiance measurement; they want a nice JPEG where the window and the cabinets are both visible. Mertens, Kautz, and Van Reeth (2007) observed that you can get that directly from the brackets, with no exposure times, no response curve, and no tone mapping. Exposure fusion scores every pixel of every bracket on three qualities: contrast (does it carry local detail, measured by a Laplacian response), saturation (channel standard deviation), and well-exposedness (a Gaussian bump around mid-gray, $\exp\!\left(-\frac{(z - 0.5)^2}{2\sigma^2}\right)$), then blends the brackets per pixel with the scores as weights. The blend must be done within a Laplacian pyramid, the multi-scale machinery of Chapter 4, or the weight maps' seams show; pyramid blending hides the transitions across scales. Code 7.6.3 computes the Mertens weights from scratch, both to demystify them and because per-pixel quality maps are a broadly reusable trick.

def mertens_weights(img_u8, sigma=0.2):
    """Per-pixel quality scores for one bracket: contrast * sat * exposedness."""
    img = img_u8.astype(np.float32) / 255.0
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    contrast = np.abs(cv2.Laplacian(gray, cv2.CV_32F))          # local detail
    saturation = img.std(axis=2)                                # colorfulness
    well_exposed = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)).prod(axis=2)
    return contrast * saturation * well_exposed + 1e-12

W = np.stack([mertens_weights(im) for im in images])
W /= W.sum(axis=0, keepdims=True)        # weights sum to 1 at every pixel
# A naive per-pixel blend (visible seams; the pyramid fixes this):
naive = sum(w[..., None] * im.astype(np.float32)
            for w, im in zip(W, images))
cv2.imwrite('fusion_naive.jpg', np.clip(naive, 0, 255).astype(np.uint8))
Code 7.6.3: The three Mertens quality measures and a deliberately naive per-pixel blend. Inspect fusion_naive.jpg closely and you will find soft seams where the dominant bracket switches; the production version blends inside a Laplacian pyramid precisely to erase them.
Library Shortcut: cv2.createMergeMertens

The complete algorithm, weights plus pyramid blending plus normalization, is two lines:

fusion = cv2.createMergeMertens().process(images)     # no times needed!
cv2.imwrite('fusion.jpg', np.clip(fusion * 255, 0, 255).astype(np.uint8))
Code 7.6.4: Production exposure fusion: brackets in, displayable image out, with no exposure metadata required.

Our weight computation plus a correct multi-resolution blend is 80-odd lines; the library call is 2. Internally it builds Gaussian pyramids of the weight maps and Laplacian pyramids of the images, blends level by level, and collapses, the exact construction from Chapter 4. Notice what it never computes: radiance. Fusion produces a rendering directly, which is why it cannot feed photometric applications, and why it powers virtually every phone "HDR" button, where the rendering is the entire point.
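
For readers who want to see the level-by-level blend without the full 80 lines, the following is a compact NumPy-only sketch of the construction just described: Gaussian pyramids of the weights, Laplacian pyramids of the images, a per-level weighted sum, and a collapse. It is an illustration under stated assumptions, not the library's code: it uses a 5-tap binomial filter with reflected borders, the helper names are invented here, and in practice `cv2.pyrDown` and `cv2.pyrUp` would be the usual building blocks.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0

def blur(x):
    """Separable 5-tap binomial blur along axes 0 and 1, reflected borders."""
    for axis in (0, 1):
        n = x.shape[axis]
        pad = [(0, 0)] * x.ndim
        pad[axis] = (2, 2)
        p = np.pad(x, pad, mode="reflect")
        x = sum(k * np.take(p, np.arange(i, i + n), axis=axis)
                for i, k in enumerate(KERNEL))
    return x

def down(x):
    """Blur, then keep every second row and column."""
    return blur(x)[::2, ::2]

def up(x, shape):
    """Zero-insert to `shape`, then blur; the factor 4 restores brightness."""
    out = np.zeros(tuple(shape) + x.shape[2:], dtype=np.float64)
    out[::2, ::2] = x
    return 4.0 * blur(out)

def gaussian_pyramid(x, levels):
    pyr = [x]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(x, levels):
    g = gaussian_pyramid(x, levels)
    bands = [g[k] - up(g[k + 1], g[k].shape[:2]) for k in range(levels - 1)]
    return bands + [g[-1]]                      # coarsest level keeps the residual

def collapse(lap):
    out = lap[-1]
    for band in reversed(lap[:-1]):
        out = band + up(out, band.shape[:2])
    return out

def pyramid_fuse(images, weights, levels=3):
    """images: float arrays (H, W) or (H, W, C); weights: (H, W), summing to 1."""
    fused = None
    for img, w in zip(images, weights):
        lap = laplacian_pyramid(img.astype(np.float64), levels)
        gw = gaussian_pyramid(w.astype(np.float64), levels)
        bands = [l * (g[..., None] if l.ndim == 3 else g) for l, g in zip(lap, gw)]
        fused = bands if fused is None else [f + b for f, b in zip(fused, bands)]
    return collapse(fused)

# Toy demo: a dark and a bright "bracket" joined by a hard-edged weight mask.
dark, bright = np.zeros((32, 32)), np.ones((32, 32))
w_bright = np.zeros((32, 32))
w_bright[:, 13:] = 1.0
w_dark = 1.0 - w_bright

naive_demo = w_dark * dark + w_bright * bright
fused_demo = pyramid_fuse([dark, bright], [w_dark, w_bright])
print("largest column-to-column jump, naive:", np.abs(np.diff(naive_demo, axis=1)).max())
print("largest column-to-column jump, fused:", np.abs(np.diff(fused_demo, axis=1)).max())
```

The naive blend reproduces the mask's hard edge exactly, while the pyramid blend spreads the transition over many pixels because the weights are smoothed more at coarser levels. With real images the fine-detail bands still switch sharply between brackets, so texture stays crisp while the overall brightness changes gradually.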

5. Ghosts: When the Scene Refuses to Hold Still (Intermediate)

Everything above assumed the brackets photograph the same scene, and across a second of shooting they never quite do. Camera motion is the tractable half: AlignMTB in Code 7.6.1 aligns brackets by translating median threshold bitmaps, binary images that mark whether each pixel is above or below the frame's median, a representation deliberately invariant to exposure (the same thresholding instinct as Chapter 2, used here because pixel values differ wildly across brackets while their ordering largely does not). Subject motion is the hard half: a pedestrian who moved between brackets appears in different positions, and merging produces a translucent ghost at each. Classical deghosting detects pixels whose values across brackets are inconsistent with a static scene, then locally falls back to a single reference bracket, building the motion masks with exactly the difference-threshold-dilate toolkit of Chapter 2 and Chapter 6. Modern burst pipelines sidestep the problem by shooting many short identical exposures and merging with per-tile robust alignment, trading bracket depth for motion immunity.
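
The exposure invariance of the median threshold bitmap is easy to verify on synthetic data. The sketch below is a brute-force illustration, not the published algorithm or OpenCV's `AlignMTB`: it searches integer shifts exhaustively, and it omits the coarse-to-fine pyramid and the exclusion of pixels near the median that a practical implementation relies on. The function names and the random test scene are assumptions.

```python
import numpy as np

def median_threshold_bitmap(gray):
    """True where a pixel is brighter than the frame's own median."""
    return gray > np.median(gray)

def best_shift(reference, moving, max_shift=4):
    """Integer (dy, dx) to apply to `moving` (via np.roll) so that its bitmap
    disagrees with the reference bitmap at as few pixels as possible."""
    ref_bits = median_threshold_bitmap(reference)
    mov_bits = median_threshold_bitmap(moving)
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(mov_bits, (dy, dx), axis=(0, 1))
            err = np.count_nonzero(ref_bits ^ shifted)   # XOR counts mismatches
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Synthetic scene rendered at two exposures; the long one is also displaced.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
short = np.clip(0.5 * scene, 0.0, 1.0)
long_unshifted = np.clip(1.5 * scene, 0.0, 1.0)          # clips the top third
long_ = np.roll(long_unshifted, (2, -3), axis=(0, 1))

same_bits = np.array_equal(median_threshold_bitmap(short),
                           median_threshold_bitmap(long_unshifted))
print("bitmaps identical across exposures:", same_bits)
print("shift that re-registers the long frame:", best_shift(short, long_))
```

The two frames differ by a factor of three in brightness and one of them clips, yet their bitmaps agree, so the search recovers the displacement exactly. The trick fails when more than half the frame is clipped, because the median itself then sits on the clipping code and the bitmap goes blank.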

Practical Example: Four Hundred Listings a Week and One Bright Window

Who: The pipeline developer at a real-estate photography service processing about four hundred property shoots weekly.

Situation: Interior shots must show the room and the view through the window; agents reject either a cave with a glowing rectangle or a lovely view framed by silhouettes. Photographers shoot 3-bracket sequences on tripods.

Problem: The first pipeline ran the full radiometric chain (Code 7.6.1) followed by a local tone mapper tuned for maximum detail. Output quality was technically impressive and clients hated it: halos around window frames, grayish "dynamic" skies, the unmistakable overcooked-HDR look. Tuning per shoot did not scale.

Decision: Switch the default path to Mertens fusion (Code 7.6.4), which has essentially no parameters to mistune, and keep the radiometric path only for the premium "twilight" product, where editors hand-blend a tone-mapped layer. A cheap ghost detector (bracket disagreement above threshold, dilated, per Chapter 6) flags shots with moving curtains or ceiling fans for the reference-bracket fallback.

Result: Client rejections for "unnatural look" fell to near zero, per-image processing dropped from 14 seconds to under 2, and the parameter-tuning queue disappeared.

Lesson: Choose the pipeline by the deliverable. If the output is an 8-bit image for human eyes, fusion's directness is a feature, not a shortcut; reserve radiometry for when you need the measurement.
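
The ghost detector mentioned in the Decision step can be sketched in a few lines. This version is a guess at the general shape of such a detector, not the service's actual code: it assumes frames that are already linear in exposure (JPEG brackets would first need the recovered response curve applied), grayscale float arrays in [0, 1], and thresholds picked for the toy data. All names and default values are hypothetical.

```python
import numpy as np

def dilate(mask, radius=1):
    """Binary dilation with a square window, built from shifted ORs."""
    h, w = mask.shape
    p = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def ghost_mask(frames, times, log_tol=0.5, lo=0.05, hi=0.95, radius=2):
    """Flag pixels whose exposure-normalized values disagree with the middle bracket.

    frames : list of float (H, W) arrays in [0, 1], assumed linear in exposure
    times  : matching exposure times in seconds
    """
    ref = len(frames) // 2                       # middle bracket as reference
    ghost = np.zeros(frames[0].shape, dtype=bool)
    for k, (frame, t) in enumerate(zip(frames, times)):
        if k == ref:
            continue
        # Compare only where both frames are well exposed; clipped or
        # noise-floor pixels would disagree for reasons unrelated to motion.
        valid = ((frame > lo) & (frame < hi) &
                 (frames[ref] > lo) & (frames[ref] < hi))
        log_ratio = np.log((frame / t + 1e-12) /
                           (frames[ref] / times[ref] + 1e-12))
        ghost |= valid & (np.abs(log_ratio) > log_tol)
    return dilate(ghost, radius)

# Toy scene: uniform radiance 10, with a brighter object present only in the
# longest bracket (a stand-in for a curtain that moved between frames).
times = [1 / 100, 1 / 50, 1 / 25]
frames = [np.full((20, 20), 10.0 * t) for t in times]
frames[2][5:8, 5:8] = 20.0 * times[2]

mask = ghost_mask(frames, times)
print("flagged pixels:", int(mask.sum()))
```

The 3x3 moving patch grows to 7x7 after dilation with radius 2, which gives the reference-bracket fallback a safety margin around the object's edges. Pixels inside the mask would be taken from the reference frame alone; everything else merges or fuses as usual.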

6. HDR Without the Tone Mapping Funeral (Intermediate)

This section's framing, capture wide, then compress for an 8-bit screen, is quietly becoming dated in its second half. Phone and laptop displays now reach brightness ranges that cover much of a radiance map's span, and the delivery format caught up in 2023 and 2024: gain maps. A gain-map file carries a normal SDR rendering plus a small map of per-pixel brightness multipliers; SDR screens show the base image, HDR screens apply the gains and recover the wide-range version, and the standard was formalized as ISO 21496-1 with Adobe, Google (Ultra HDR in Android 14), and Apple (Adaptive HDR) shipping interoperable support. Notice the architecture: it is precisely this section's separation of measurement and rendering, frozen into a file format, with the tone-mapping decision deferred to the display instead of buried at capture time. The pipeline you learned does not disappear; its output just stops being forced through the 8-bit bottleneck.
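
The arithmetic at the heart of a gain map is small enough to sketch. What follows is a conceptual illustration only and does not reproduce the ISO 21496-1 or Ultra HDR encodings, which add quantization, offsets, and metadata not modeled here. The function names, the `eps` offset, and the `headroom_weight` parameter (0 for an SDR screen, 1 for a screen able to show the full HDR rendition) are assumptions made for this example.

```python
import numpy as np

def make_gain_map(hdr_lum, sdr_lum, eps=1e-4):
    """Per-pixel log2 multiplier that takes the SDR rendering back to HDR."""
    return np.log2((hdr_lum + eps) / (sdr_lum + eps))

def apply_gain_map(sdr_lum, gain_log2, headroom_weight, eps=1e-4):
    """Scale the SDR base by a display-dependent fraction of the stored gain."""
    return (sdr_lum + eps) * 2.0 ** (headroom_weight * gain_log2) - eps

# Toy luminances: the SDR base is a soft-shoulder rendering of the HDR values.
hdr_lum = np.array([0.05, 0.5, 4.0, 16.0])
sdr_lum = hdr_lum / (1.0 + hdr_lum)
gain = make_gain_map(hdr_lum, sdr_lum)

for weight in (0.0, 0.5, 1.0):
    print(weight, np.round(apply_gain_map(sdr_lum, gain, weight), 3))
```

A weight of 0 returns the SDR base untouched and a weight of 1 returns the original wide-range values; anything in between is a rendering for a display with partial headroom. The file carries both the opinion and enough information to revise it, which is the measurement-versus-rendering separation expressed as a storage format.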

Research Frontier: Single Shots, Learned Curves, Generative Range

Three active fronts as of 2024 to 2026. First, single-image HDR reconstruction (inverse tone mapping): networks hallucinate plausible radiance for clipped regions, with GAN-based work like GlowGAN (Wang et al., ICCV 2023) learning HDR structure from ordinary LDR photo collections, and diffusion-prior follow-ups inheriting both the power and the evidentiary caveats of Section 7.5: invented highlights are not measurements. Second, learned merging and tone mapping: burst pipelines such as the HDR+ lineage behind Google's Night Sight replaced hand-designed weights with trained components while keeping the classical architecture (align, merge robustly, tone map) fully recognizable. Third, the gain-map ecosystem (ISO 21496-1, Ultra HDR, Adaptive HDR) is shifting research from "how to compress range" toward "how to render intent across wildly different displays," reviving tone mapping as a display-adaptive, sometimes learned, runtime decision. The through-line: every one of these systems still speaks this section's vocabulary of response curves, radiance, and rendering intent.

Fun Fact

Debevec and Malik's 1997 paper was motivated less by pretty pictures than by physics: a radiance map can serve as a light source. Photograph a mirrored ball with brackets, recover the radiance of the entire surrounding environment, and you can illuminate computer-generated objects with the real world's light. That technique, image-based lighting, became the backbone of film visual effects compositing, which means this section's pipeline has been quietly lighting movie monsters for over two decades.

With dynamic range handled, the chapter's repair shop is fully stocked: you can model damage, denoise, deblur, fill holes, recover resolution, and capture range beyond the sensor's reach. What remains for Part I is engineering: Chapter 8 surveys the library landscape where all of these algorithms live in production form, and how to compose them into pipelines that survive contact with real data.

Exercise 7.6.1: Trust and the Hat Function (Conceptual)

(a) Explain why the Debevec-Malik weighting $w(Z)$ must vanish at both $Z = 0$ and $Z = 255$, connecting each end to a specific degradation from Section 7.1. (b) A colleague proposes saving time by merging just two brackets, 6 stops apart, instead of three spaced 2 stops apart. Using Figure 7.6.1, identify what goes wrong in the luminance range between them. (c) Why does exposure fusion need no equivalent of the response-curve recovery step?

Exercise 7.6.2: Build Your Own Reinhard (Coding)

Implement the Reinhard global operator from scratch on the radiance map from Code 7.6.1: compute log-average luminance $\bar L = \exp(\mathrm{mean}(\ln(L + \epsilon)))$, scale by a key parameter $a$ so $L = a \cdot L_{\mathrm{world}} / \bar L$, apply $L_d = L / (1 + L)$, and reattach color by scaling each channel by $L_d / L_{\mathrm{world}}$. Render with $a \in \{0.09, 0.18, 0.36, 0.72\}$ and describe what the key controls. Verify your $a = 0.18$ output approximates cv2.createTonemapReinhard.

Exercise 7.6.3: Measurement versus Rendering (Analysis)

Shoot (or download) a 3-bracket interior scene and produce two outputs: the radiometric path (Code 7.6.1 plus your Exercise 7.6.2 tone mapper) and direct fusion (Code 7.6.4). Compare them three ways: visually; by the fraction of clipped pixels in each (histogram analysis per Chapter 2); and by a thought experiment for two clients, a listing website and a daylighting-simulation consultancy that needs luminance ratios between window and wall. Which pipeline serves each client, and what exactly does the fusion path fail to provide the second one?