Part I: Image Processing
Chapter 1: Digital Image Fundamentals

Digital Image Fundamentals

From photons to pixels: how a digital image is born, encoded, and judged.

"I was born in a burst of photons, white-balanced, gamma-encoded, and saved at quality 85. I have seen things you would not believe. Most of them were compression artifacts."

A Sentimental Image Sensor

Chapter Overview

Every project in this book begins the same way: an image arrives, and code goes to work on it. Chapter 0 taught you to hold that image competently, as a NumPy array with a dtype, a channel order, and a set of conventions. This chapter asks the question that makes everything downstream make sense: what is that array, really? The answer is a story with a beginning (photons striking silicon), a middle (a chain of discretizations and encodings, each one a deliberate engineering compromise), and an end (a compressed file that preserves what a human viewer would miss least). Knowing this story is the difference between treating image data as a mysterious given and treating it as the output of a machine you understand, can reason about, and can debug.

The chapter follows the data's own path. We start inside the camera: lenses focus light, sensors count photons through a mosaic of color filters, and an image signal processor performs a dozen irreversible transformations before your code ever runs. We then formalize the two discretizations that turn a continuous optical image into numbers: sampling, whose failure mode is aliasing (detail that lies), and quantization, whose failure mode is banding (gradients that shatter). With those tools in hand we can read a camera datasheet critically, distinguishing the three budgets, resolution, bit depth, and dynamic range, and seeing which one actually limits a given application, including the high-dynamic-range capture tricks that widen the narrowest budget of all.

The last two sections explain the remaining mysteries of the array. The channel dimension gets its due in a tour of color science: why three numbers per pixel, what RGB actually encodes (and the gamma trap waiting inside it), and why the same color wears different coordinates in HSV, Lab, and YCbCr depending on whether the job is selection, measurement, or compression. Finally, the chapter reassembles all of its own ideas into the file formats you use daily: PNG's lossless contract, JPEG's perceptual gamble (chroma subsampling from the color section, quantization from the sampling section, frequency transforms prefiguring Chapter 4), and the modern WebP, AVIF, and learned codecs now replacing them. Along the way we meet PSNR and SSIM, the first members of an evaluation-metric family this book follows all the way to FID and beyond in Chapter 37.

A word on why this matters for AI specifically. Modern vision models are trained on millions of images that all passed through the machinery in this chapter, with its auto white balance, its tone curves, its 8-bit quantization, and its JPEG artifacts. The pipeline's choices become the model's silent assumptions, and the pipeline's failure modes (clipped highlights, aliased textures, compression-shifted embeddings) become the model's failure modes. The engineers who debug those failures fastest are invariably the ones who can look at a wrong prediction and ask not just "what did the model do?" but "what did the camera do?". This chapter makes you one of them.

Prerequisites

This chapter assumes you can load, index, and display images as NumPy arrays, and that you know the BGR-versus-RGB and uint8-versus-float conventions, all covered in Chapter 0: Foundations: The Python Imaging Stack. The code uses OpenCV, NumPy, scikit-image, and Pillow, the stack installed there. No prior optics, signal processing, or color science is required; the chapter builds each from scratch. Comfort with logarithms and basic probability (mean, variance) is enough for all the math.

Remember the Chapter as One Sentence

If you keep one thing from this chapter, keep the data path that names every section in order: photons to charge, charge to samples, samples to levels, levels to color, color to bytes. Read it forward and it is the camera making a picture (Sections 1.1 to 1.5); read it backward and it is your debugging checklist, because every artifact you will ever chase was introduced at exactly one of these five hops. The chapter's signature phrase says the rest: an image is not a recording of the world, it is the output of an opinionated pipeline, and you cannot recover what a given stage threw away. The roadmap below walks the five hops one section at a time.

Chapter Roadmap

Once you have worked through the five sections, the Hands-On Lab below reassembles them into a single diagnostic script that makes every hop of the data path visible on one image at once. Treat it as the chapter's capstone: each step maps to one section, and the finished tool is something you will keep pointing at images long after you close the book.

Hands-On Lab: Build an Image Pipeline Inspector

Duration: about 60 to 75 minutes
Difficulty: Beginner to Intermediate

Objective

Build a single command-line script, inspect_pipeline.py, that takes any image and walks it back along this chapter's five-hop data path (photons to charge, charge to samples, samples to levels, levels to color, color to bytes). For one input image it prints a one-screen report and saves an annotated contact sheet. Together they make every discretization in the chapter visible at once: the contact sheet shows a downsampling alias, a bit-depth banding ladder, and the per-channel color decomposition, and the report tabulates a JPEG-quality sweep scored with PSNR and SSIM. The finished script is a reusable diagnostic you can point at any image to answer "what did the pipeline do to this picture?".

What You'll Practice

  • Reading an image as a NumPy array and reasoning about its dtype, shape, and value range, the conventions from Section 1.3.
  • Demonstrating aliasing by sampling with and without a pre-filter, the core failure mode of Section 1.2.
  • Quantizing to fewer bits to reproduce the banding ladder of Section 1.2 and Section 1.3.
  • Converting between RGB, HSV, Lab, and YCbCr and visualizing the channels, from Section 1.4.
  • Sweeping JPEG quality and scoring the loss with PSNR and SSIM, the metrics introduced in Section 1.5.

Setup

Use the stack from Chapter 0. Install the four libraries and grab any photo with both fine texture and a smooth gradient (a brick wall behind a clear sky works well, because the bricks expose aliasing and the sky exposes banding).

pip install opencv-python numpy scikit-image matplotlib

Steps

Step 1: Load the image and print its array facts

Every investigation starts by interrogating the array itself. Load the image in RGB order (OpenCV reads BGR, so convert once) and report the four facts that tell you which budget you are working with.

import cv2
import numpy as np

def load_rgb(path):
    bgr = cv2.imread(path, cv2.IMREAD_COLOR)
    if bgr is None:
        raise FileNotFoundError(path)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

img = load_rgb("input.jpg")

# TODO: print shape (H, W, channels), dtype, min, and max.
# Hint: a uint8 photo should report dtype uint8 and a max at or near 255.

Hint

Use img.shape, img.dtype, int(img.min()), and int(img.max()). If the max is well below 255 the image never used its full 8-bit range, which is exactly the under-exposure diagnosis from Section 1.3.
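The four facts can be gathered into one small helper. The sketch below is illustrative only: the name describe_array, the extra range_used field, and the synthetic under-exposed image are assumptions rather than part of the lab's solution, and it handles integer dtypes only so that it runs with NumPy alone.

```python
import numpy as np

def describe_array(img):
    """Collect the Step 1 array facts, plus the share of the dtype range in use."""
    info = np.iinfo(img.dtype)          # integer dtypes only (raises for floats)
    lo, hi = int(img.min()), int(img.max())
    return {
        "shape": img.shape,
        "dtype": str(img.dtype),
        "min": lo,
        "max": hi,
        "range_used": hi / info.max,    # 1.0 means the brightest code value is reached
    }

# Hypothetical under-exposed image: no value rises above 90.
rng = np.random.default_rng(0)
dark = rng.integers(0, 91, size=(4, 6, 3), dtype=np.uint8)
facts = describe_array(dark)
print(facts)
```

A range_used well below 1.0 is the numeric form of the "max well below 255" warning in the hint.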

Step 2: Demonstrate aliasing with and without a pre-filter

Downsample the image by a large factor two ways: naive nearest-neighbor decimation (which skips samples and aliases) and area averaging (which pre-filters first). The brick texture is where the difference screams.

def downsample_naive(img, factor):
    # TODO: take every factor-th pixel in each axis (no filtering).
    # Hint: array slicing img[::factor, ::factor] does exactly this.
    pass

def downsample_filtered(img, factor):
    h, w = img.shape[:2]
    return cv2.resize(img, (w // factor, h // factor),
                      interpolation=cv2.INTER_AREA)  # area = pre-filter then sample

Hint

return img[::factor, ::factor] for the naive version. Upscale both results back to a common size with cv2.INTER_NEAREST before placing them side by side so the aliasing in the naive result stays visible instead of being smoothed away by the viewer.
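To see the two behaviors without a photo, the sketch below runs both strategies on a synthetic stripe pattern. It is a minimal illustration under stated assumptions: block_mean is a hypothetical NumPy stand-in for area averaging (it is not OpenCV's implementation), and it requires the image sides to be multiples of the factor.

```python
import numpy as np

def decimate(img, factor):
    # Naive sampling: keep every factor-th pixel and discard the rest.
    return img[::factor, ::factor]

def block_mean(img, factor):
    # Pre-filter then sample: average each factor x factor block.
    # Assumes height and width are exact multiples of factor.
    h, w = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Synthetic texture: one-pixel-wide vertical stripes (0, 255, 0, 255, ...).
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (8, 4))  # shape (8, 8)

naive = decimate(stripes, 2)        # every kept sample lands on a dark stripe
filtered = block_mean(stripes, 2)   # every block holds dark and bright pixels
print(naive[0])
print(filtered[0])
```

The naive result is uniformly black, a confident report of detail that was never there, while the averaged result is the mid-gray a distant viewer would actually see.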

Step 3: Build the bit-depth banding ladder

Quantize the image to fewer and fewer bits per channel and watch smooth gradients shatter into bands. This is the quantization failure mode of Section 1.2 made visible on a real photo.

def quantize_bits(img, bits):
    levels = 2 ** bits
    # TODO: map the 0..255 range onto `levels` evenly spaced values.
    # Hint: scale down, round, scale back up, and stay in uint8.
    pass

ladder = [quantize_bits(img, b) for b in (8, 4, 2, 1)]

Hint

A clean formula: step = 256 // levels; q = (img // step) * step + step // 2; return np.clip(q, 0, 255).astype(np.uint8). At 1 bit per channel you should see a poster of 8 flat colors, the extreme end of the banding curve.
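A quick way to check the formula is to count how many distinct levels survive at each bit depth. The sketch below applies the hint's bin-center formula to a synthetic 0 to 255 ramp; the ramp and the counts dictionary are illustrative assumptions, not part of the lab.

```python
import numpy as np

def quantize_bits(img, bits):
    # Bin-center quantization, as described in the hint.
    levels = 2 ** bits
    step = 256 // levels
    q = (img // step) * step + step // 2
    return np.clip(q, 0, 255).astype(np.uint8)

ramp = np.arange(256, dtype=np.uint8)   # every 8-bit value exactly once
counts = {b: len(np.unique(quantize_bits(ramp, b))) for b in (8, 4, 2, 1)}
print(counts)  # {8: 256, 4: 16, 2: 4, 1: 2}
```

Two levels per channel across three channels gives 2 × 2 × 2 = 8 possible colors, which is where the 8-color poster comes from.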

Step 4: Decompose the color into HSV, Lab, and YCbCr channels

Split the image across the chapter's four coordinate systems and render each channel as grayscale. Seeing the Y channel of YCbCr next to the L channel of Lab makes concrete why compression and measurement reach for different spaces.

spaces = {
    "RGB":   img,
    "HSV":   cv2.cvtColor(img, cv2.COLOR_RGB2HSV),
    "Lab":   cv2.cvtColor(img, cv2.COLOR_RGB2Lab),
    "YCbCr": cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb),
}

# TODO: for each non-RGB space, split into 3 single-channel images for the contact sheet.
# Hint: cv2.split returns a list of 2D arrays you can display as grayscale.

Hint

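ch0, ch1, ch2 = cv2.split(spaces["YCbCr"]). Note the channel order: the OpenCV flag is COLOR_RGB2YCrCb, so the three planes come back as Y, Cr, Cb, and the last two must be swapped if you label panels in YCbCr order. Remember the Section 1.4 gotcha as well: for uint8 images OpenCV stores HSV hue in 0 to 179 and the Lab channels rescaled to 0 to 255, so display them as-is for inspection rather than trusting the raw numbers as physical units.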
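To make the luma and chroma split concrete, here is a minimal NumPy sketch of the full-range conversion, using the constants from the OpenCV color-conversion documentation. The function name rgb_to_ycrcb is an assumption, and rounding may differ from cv2.cvtColor by a level, so treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np

def rgb_to_ycrcb(rgb):
    # Full-range conversion for uint8 RGB input, returned in OpenCV's order: Y, Cr, Cb.
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b    # luma: weighted sum of R, G, B
    cr = (r - y) * 0.713 + 128.0             # red-difference chroma, centered at 128
    cb = (b - y) * 0.564 + 128.0             # blue-difference chroma, centered at 128
    out = np.stack([y, cr, cb], axis=-1)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

pixels = np.array([[[128, 128, 128],     # neutral gray
                    [255, 0, 0]]],       # saturated red
                  dtype=np.uint8)
ycc = rgb_to_ycrcb(pixels)
print(ycc[0, 0])  # gray: both chroma values sit at the neutral point 128
print(ycc[0, 1])  # red: Cr is pushed high, Cb is pulled low
```

Any neutral pixel lands at chroma 128, which is why the chroma planes of a typical photo look nearly flat next to the luma plane.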

Step 5: Sweep JPEG quality and score it with PSNR and SSIM

Re-encode the image at a range of JPEG qualities, measure the file size of each, and score the distortion two ways. This reproduces the cost-of-compression curve from Section 1.5 on your own image.

from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim

def jpeg_roundtrip(img_rgb, quality):
    bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)   # OpenCV encoders expect BGR
    ok, buf = cv2.imencode(".jpg", bgr, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError(f"JPEG encoding failed at quality {quality}")
    decoded = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.cvtColor(decoded, cv2.COLOR_BGR2RGB), buf.nbytes

# TODO: loop qualities (e.g. 95, 75, 50, 25, 10); for each, round-trip and record
#       (quality, kilobytes, PSNR in dB, SSIM). Print the table sorted by quality.
# Hint: pass channel_axis=-1 to ssim for a color image.

Hint

Compute psnr(img, decoded) and ssim(img, decoded, channel_axis=-1). You should see PSNR and SSIM fall together as quality drops, but SSIM holds up better at mid qualities because it tracks structure rather than raw pixel error, the exact distinction Section 1.5 draws between the two metrics.
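It helps to see what "raw pixel error" means numerically. The sketch below computes PSNR from its standard definition, 10 · log10(MAX² / MSE), on a synthetic flat image. The function name psnr_db and the test arrays are assumptions for illustration; in the lab itself, keep using skimage's implementation.

```python
import numpy as np

def psnr_db(reference, test, data_range=255.0):
    # PSNR in decibels: 10 * log10(data_range^2 / mean squared error).
    err = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(err ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.full((8, 8), 100, dtype=np.uint8)
off_by_one = ref + 1      # every pixel one level brighter
off_by_ten = ref + 10     # every pixel ten levels brighter
print(round(psnr_db(ref, off_by_one), 2))  # 48.13
print(round(psnr_db(ref, off_by_ten), 2))  # 28.13
```

A uniform ten-level brightness shift leaves every edge and texture intact, yet it scores in the high 20s of dB. That is the sense in which PSNR measures pixel error and ignores structure.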

Step 6: Assemble the contact sheet and print the report

Tie the three visual experiments (aliasing, banding, color decomposition) into one figure with matplotlib, and print the array facts and the JPEG table to the terminal. This is the deliverable: one image and one report that summarize the whole pipeline.

import matplotlib.pyplot as plt

# TODO: build a grid of subplots: row 1 = naive vs filtered downsample,
#       row 2 = the four-step banding ladder, row 3 = YCbCr channels.
#       Title each panel, then plt.savefig("pipeline_report.png", dpi=150).
# Hint: fig, axes = plt.subplots(3, 4, figsize=(14, 10)); axes is a 2D array.

Hint

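Call ax.imshow(panel, cmap="gray") for single-channel panels and drop the cmap for the RGB ones. Pass vmin=0, vmax=255 for the single-channel panels as well; otherwise matplotlib stretches each plane to its own minimum and maximum, and the low-contrast chroma channels stop looking flat. Use ax.set_title(...) and ax.axis("off"). Keep the JPEG sweep as printed text rather than a plot to keep the figure uncluttered.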

Expected Output

Running python inspect_pipeline.py input.jpg prints the array facts (for example shape=(1365, 2048, 3) dtype=uint8 min=3 max=255) followed by a JPEG table whose PSNR drops monotonically as quality falls (roughly 45 dB at quality 95 down to the high 20s at quality 10) while file size shrinks by an order of magnitude. It saves pipeline_report.png, a contact sheet where the naive downsample shows jagged aliased brick edges next to the smooth filtered version, the banding ladder progresses from photographic at 8 bits to an 8-color poster at 1 bit, and the YCbCr panels show a detailed luma channel beside two flat chroma channels (the visual reason chroma subsampling is nearly free).

Right Tool: scikit-image Does the Metrics and Conversions in Two Imports

The OpenCV color conversions in this lab are deliberately explicit so you see what each step does, including the integer ranges you have to remember. For the metrics the lab already leans on scikit-image, and in production you would do the same for color. The whole metric and color-space layer reduces to skimage.metrics.peak_signal_noise_ratio, skimage.metrics.structural_similarity, and skimage.color.rgb2lab and friends, which return reference-grade float results and handle the range and channel-axis bookkeeping (the 0 to 179 hue and rescaled-Lab gotchas from Section 1.4) for you. The learning path is steps 1 to 6; the practical payoff is that the same diagnostic is a dozen library calls.

Stretch Goals

  • Add a fifth experiment that reproduces the HDR motivation from Section 1.3: synthesize an under- and over-exposed pair from the input by scaling and clipping, then show how each loses detail at a different end of the tonal range.
  • Extend the JPEG sweep with WebP and AVIF (cv2.imencode(".webp", ...) with cv2.IMWRITE_WEBP_QUALITY; ".avif" works only where your OpenCV build includes AVIF support) and plot PSNR against kilobytes for all three codecs on one axis to reproduce Section 1.5's rate-distortion comparison.
  • Turn the script into a small batch tool that runs over the Kodak image suite (linked in the bibliography) and reports the mean PSNR and SSIM per quality level, the way compression papers do.
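As a starting point for the first stretch goal, the sketch below synthesizes an under- and over-exposed pair by scaling and clipping. It is a rough sketch under stated assumptions: simulate_exposure and the gains 0.25 and 4.0 are hypothetical choices, and multiplying gamma-encoded 8-bit values only approximates a real exposure change, which acts on linear light.

```python
import numpy as np

def simulate_exposure(img, gain):
    # Scale in float, then round and clip back into the 8-bit range.
    scaled = img.astype(np.float64) * gain
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

ramp = np.arange(256, dtype=np.uint8)     # every tone from black to white
under = simulate_exposure(ramp, 0.25)     # two stops down: tones crushed together
over = simulate_exposure(ramp, 4.0)       # two stops up: highlights clip to 255

print("under: brightest value", int(under.max()),
      "distinct levels", len(np.unique(under)))
print("over: share of tones clipped to 255", float((over == 255).mean()))
```

The under-exposed ramp packs all 256 tones into the bottom quarter of the range, while the over-exposed one sends three quarters of them to pure white. Each loses detail at a different end of the tonal range.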

Complete Solution

import sys
import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim


def load_rgb(path):
    bgr = cv2.imread(path, cv2.IMREAD_COLOR)
    if bgr is None:
        raise FileNotFoundError(path)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)


def downsample_naive(img, factor):
    return img[::factor, ::factor]                       # skip samples, no filter


def downsample_filtered(img, factor):
    h, w = img.shape[:2]
    return cv2.resize(img, (w // factor, h // factor),
                      interpolation=cv2.INTER_AREA)       # area = pre-filter then sample


def quantize_bits(img, bits):
    levels = 2 ** bits
    step = 256 // levels
    q = (img // step) * step + step // 2                 # snap to bin centers
    return np.clip(q, 0, 255).astype(np.uint8)


def jpeg_roundtrip(img_rgb, quality):
    bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)       # OpenCV encoders expect BGR
    ok, buf = cv2.imencode(".jpg", bgr, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError(f"JPEG encoding failed at quality {quality}")
    decoded = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.cvtColor(decoded, cv2.COLOR_BGR2RGB), buf.nbytes


def main(path):
    img = load_rgb(path)
    print(f"shape={img.shape} dtype={img.dtype} "
          f"min={int(img.min())} max={int(img.max())}")

    # Aliasing demonstration.
    factor = 8
    naive = downsample_naive(img, factor)
    filt = downsample_filtered(img, factor)
    h, w = img.shape[:2]
    naive_up = cv2.resize(naive, (w, h), interpolation=cv2.INTER_NEAREST)
    filt_up = cv2.resize(filt, (w, h), interpolation=cv2.INTER_NEAREST)

    # Banding ladder.
    ladder = [(b, quantize_bits(img, b)) for b in (8, 4, 2, 1)]

    # Color decomposition (YCbCr for the contact sheet).
    ycc = cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb)
    y, cr, cb = cv2.split(ycc)

    # JPEG quality sweep.
    print(f"{'quality':>8}{'KB':>10}{'PSNR_dB':>10}{'SSIM':>8}")
    for q in (95, 75, 50, 25, 10):
        decoded, nbytes = jpeg_roundtrip(img, q)
        p = psnr(img, decoded)
        s = ssim(img, decoded, channel_axis=-1)
        print(f"{q:>8}{nbytes / 1024:>10.1f}{p:>10.2f}{s:>8.3f}")

    # Contact sheet.
    fig, axes = plt.subplots(3, 4, figsize=(14, 10))
    axes[0, 0].imshow(naive_up); axes[0, 0].set_title(f"Naive /{factor} (aliased)")
    axes[0, 1].imshow(filt_up);  axes[0, 1].set_title(f"Area /{factor} (pre-filtered)")
    axes[0, 2].axis("off"); axes[0, 3].axis("off")
    for ax, (b, q) in zip(axes[1], ladder):
        ax.imshow(q); ax.set_title(f"{b}-bit/channel")
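    # Fix vmin/vmax so matplotlib does not stretch the low-contrast chroma planes.
    axes[2, 0].imshow(y, cmap="gray", vmin=0, vmax=255);  axes[2, 0].set_title("Y (luma)")
    axes[2, 1].imshow(cb, cmap="gray", vmin=0, vmax=255); axes[2, 1].set_title("Cb (chroma)")
    axes[2, 2].imshow(cr, cmap="gray", vmin=0, vmax=255); axes[2, 2].set_title("Cr (chroma)")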
    axes[2, 3].axis("off")
    for ax in axes.ravel():
        ax.set_xticks([]); ax.set_yticks([])
    fig.tight_layout()
    fig.savefig("pipeline_report.png", dpi=150)
    print("saved pipeline_report.png")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "input.jpg")

What's Next?

With the image's origin story told, Chapter 2: Point Operations, Histograms & Thresholding starts changing images on purpose. The gamma curves this chapter met inside the ISP become tools you apply yourself; the quantization levels become histogram bins you read like an instrument panel; and the color channels become inputs to the simplest and most-used operation in vision: the threshold. Everything stays per-pixel for one more chapter before neighborhoods, kernels, and convolution enter in Chapter 3.

Bibliography & Further Reading

Foundational Papers

Bayer, B. E. "Color Imaging Array." US Patent 3,971,065 (1976). patents.google.com

The one-page idea inside nearly every camera made since: a 2×2 mosaic of color filters with green doubled, as taught in Section 1.1.

📄 Paper

Wallace, G. K. "The JPEG Still Picture Compression Standard." Communications of the ACM 34(4), 1991. dl.acm.org

The classic readable account of the JPEG pipeline (YCbCr, DCT, quantization, entropy coding) from one of its architects; Section 1.5 in its original voice.

📄 Paper

Wang, Z., Bovik, A. C., Sheikh, H. R., Simoncelli, E. P. "Image Quality Assessment: From Error Visibility to Structural Similarity." IEEE Transactions on Image Processing 13(4), 2004. cns.nyu.edu/~lcv/ssim

The SSIM paper, with reference code on the authors' page; the start of the perceptual-metric arc this book follows to FID in Chapter 37.

📄 Paper

Debevec, P., Malik, J. "Recovering High Dynamic Range Radiance Maps from Photographs." SIGGRAPH 1997. pauldebevec.com

The bracketed-exposure HDR method implemented by OpenCV's createCalibrateDebevec and used in Section 1.3's HDR experiments.

📄 Paper

Brooks, T., Mildenhall, B., Xue, T., Chen, J., Sharlet, D., Barron, J. T. "Unprocessing Images for Learned Raw Denoising." CVPR 2019. arXiv:1811.11127

Inverts the ISP stage by stage to synthesize realistic RAW training data; a precise, readable model of the pipeline drawn in Figure 1.1.1.

📄 Paper

Books & Reference

Gonzalez, R. C., Woods, R. E. "Digital Image Processing," 4th edition. imageprocessingplace.com

The standard textbook treatment of sampling, quantization, and intensity transformations; Chapters 2 and 8 parallel this chapter at greater mathematical depth.

📖 Book

Szeliski, R. "Computer Vision: Algorithms and Applications," 2nd edition (2022). szeliski.org/Book

Free online; its Chapter 2 covers image formation, photometric optics, and the camera pipeline with full derivations.

📖 Book

Poynton, C. "Frequently Asked Questions about Color" and "Frequently Asked Questions about Gamma." poynton.ca

The classic engineer-oriented explanations of gamma, luma, and color encoding; the definitive antidote to the gamma trap of Section 1.4.

📝 Blog Post

Tools & Libraries

OpenCV Documentation: Color Conversions. docs.opencv.org

The exact formulas and ranges behind every cvtColor flag used in this chapter, including the HSV hue halving and Lab rescaling gotchas.

🔧 Tool

rawpy: RAW image processing for Python. github.com/letmaik/rawpy

The LibRaw wrapper from Section 1.1's library shortcut: full RAW development (demosaic, white balance, color matrices, gamma) in one call.

🔧 Tool

scikit-image: the color module. scikit-image.org

Reference-grade color space conversions and Delta E metrics with float precision; the one-line replacement for hand-written Lab math.

🔧 Tool

WebP Developer Documentation. developers.google.com/speed/webp

Format internals, lossy and lossless modes, and the compression study comparing WebP against JPEG referenced in Section 1.5.

🔧 Tool

Standards, Formats & Datasets

JPEG AI (ISO/IEC 6048): Learning-based Image Coding. jpeg.org/jpegai

The first neural-network-based international image coding standard, finalized 2024 to 2025; the research-frontier endpoint of Section 1.5.

📄 Paper

Ultra HDR Image Format (gain maps), Android Developers. developer.android.com

The backward-compatible HDR file format (JPEG plus gain map) discussed in Section 1.3's research frontier, now standardized as ISO 21496-1.

🔧 Tool

Kodak Lossless True Color Image Suite. r0k.us/graphics/kodak

The 24-image benchmark set on which three decades of compression papers report PSNR and SSIM; useful for reproducing Section 1.5's sweeps on standard data.

📊 Dataset