Part I: Image Processing
Chapter 1: Digital Image Fundamentals

From photons to pixels: how a digital image is born, encoded, and judged.

"I was born in a burst of photons, white-balanced, gamma-encoded, and saved at quality 85. I have seen things you would not believe. Most of them were compression artifacts."

A Sentimental Image Sensor

Chapter Overview

Every project in this book begins the same way: an image arrives, and code goes to work on it. Chapter 0 taught you to hold that image competently, as a NumPy array with a dtype, a channel order, and a set of conventions. This chapter asks the question that makes everything downstream make sense: what is that array, really? The answer is a story with a beginning (photons striking silicon), a middle (a chain of discretizations and encodings, each one a deliberate engineering compromise), and an end (a compressed file that preserves what a human viewer would miss least). Knowing this story is the difference between treating image data as a mysterious given and treating it as the output of a machine you understand, can reason about, and can debug.

The chapter follows the data's own path. We start inside the camera: lenses focus light, sensors count photons through a mosaic of color filters, and an image signal processor performs a dozen irreversible transformations before your code ever runs. We then formalize the two discretizations that turn a continuous optical image into numbers: sampling, whose failure mode is aliasing (detail that lies), and quantization, whose failure mode is banding (gradients that shatter). With those tools in hand we can read a camera datasheet critically, distinguishing three budgets (resolution, bit depth, and dynamic range) and seeing which one actually limits a given application. That reading includes the high-dynamic-range capture tricks that widen the narrowest budget of all.
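Both failure modes named above can be previewed in a few lines of NumPy. The following is a minimal sketch under stated assumptions, not the chapter's own code: the helper name `quantize`, the 1024-sample ramp, and the choice of a 9-cycle cosine sampled 10 times are all illustrative. It quantizes a smooth ramp to show how few distinct levels survive at low bit depth, then samples a cosine below its Nyquist rate to show that the samples are indistinguishable from those of a much lower frequency.

```python
import numpy as np

def quantize(signal, bits):
    """Uniformly quantize a float signal in [0, 1] to 2**bits levels.

    Hypothetical helper for illustration only.
    """
    levels = 2 ** bits
    return np.round(signal * (levels - 1)) / (levels - 1)

# --- Quantization: a smooth ramp collapses into discrete steps ---
ramp = np.linspace(0.0, 1.0, 1024)        # continuous-looking gradient
ramp_8bit = quantize(ramp, 8)             # 256 levels: steps too small to see
ramp_3bit = quantize(ramp, 3)             # 8 levels: each step is a visible band

n_levels_8 = len(np.unique(ramp_8bit))    # number of distinct output values
n_levels_3 = len(np.unique(ramp_3bit))

# Worst-case rounding error is half a quantization step.
max_err_3bit = np.max(np.abs(ramp - ramp_3bit))

# --- Sampling: a frequency above Nyquist masquerades as a lower one ---
fs = 10                                   # samples per unit length (assumed)
n = np.arange(fs)                         # sample indices 0..9
high = np.cos(2 * np.pi * 9 * n / fs)     # 9 cycles: needs more than 18 samples
low = np.cos(2 * np.pi * 1 * n / fs)      # 1 cycle: comfortably sampled

# The two sets of samples are numerically identical: the 9-cycle
# signal has aliased to 1 cycle, and no later processing can tell.
aliased = np.allclose(high, low)
```

The ramp stands in for a sky gradient and the cosine for a fine repeating texture such as fabric or brickwork. In both cases the damage is done at capture time, which is why the chapter treats these as properties of the data rather than of any later algorithm.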

The last two sections explain the remaining mysteries of the array. The channel dimension gets its due in a tour of color science: why three numbers per pixel, what RGB actually encodes (and the gamma trap waiting inside it), and why the same color wears different coordinates in HSV, Lab, and YCbCr depending on whether the job is selection, measurement, or compression. Finally, the chapter reassembles all of its own ideas into the file formats you use daily: PNG's lossless contract, JPEG's perceptual gamble (chroma subsampling from the color section, quantization from the sampling section, frequency transforms prefiguring Chapter 4), and the modern WebP, AVIF, and learned codecs now replacing them. Along the way we meet PSNR and SSIM, the first members of an evaluation-metric family this book follows all the way to FID and beyond in Chapter 37.
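Two of the ideas in this paragraph are small enough to sketch before the chapter develops them. The code below is an illustrative sketch, not the book's implementation: the function names `srgb_to_linear`, `linear_to_srgb`, and `psnr` are hypothetical, and the transfer functions use the standard piecewise sRGB constants. The first part shows the gamma trap by averaging black and white both naively (on encoded values) and in linear light. The second part computes PSNR for an 8-bit image whose every pixel is off by one level.

```python
import numpy as np

def srgb_to_linear(c):
    """Decode sRGB-encoded values in [0, 1] to linear light."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(l):
    """Encode linear-light values in [0, 1] with the sRGB curve."""
    l = np.asarray(l, dtype=np.float64)
    return np.where(l <= 0.0031308, l * 12.92, 1.055 * l ** (1 / 2.4) - 0.055)

# --- The gamma trap: averaging encoded values is not averaging light ---
black, white = 0.0, 1.0
naive_mix = (black + white) / 2                       # 0.5 in encoded units
linear_mix = float(linear_to_srgb(
    (srgb_to_linear(black) + srgb_to_linear(white)) / 2
))                                                    # about 0.735 encoded

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit data."""
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")                           # identical images
    return 10 * np.log10(peak ** 2 / mse)

# --- PSNR: every pixel wrong by exactly one 8-bit level ---
reference = np.full((8, 8), 128, dtype=np.uint8)
degraded = reference + 1                              # 129 everywhere, no overflow
score = psnr(reference, degraded)                     # about 48.13 dB
```

The gap between `naive_mix` and `linear_mix` is the reason blurring or resizing on encoded values tends to darken fine high-contrast detail. PSNR, by contrast, is computed here directly on the stored 8-bit values, which is the usual convention when compression results are reported.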

A word on why this matters for AI specifically. Modern vision models are trained on millions of images that all passed through the machinery in this chapter, with its auto white balance, its tone curves, its 8-bit quantization, and its JPEG artifacts. The pipeline's choices become the model's silent assumptions, and the pipeline's failure modes (clipped highlights, aliased textures, compression-shifted embeddings) become the model's failure modes. The engineers who debug those failures fastest are often the ones who can look at a wrong prediction and ask not just "what did the model do?" but "what did the camera do?". This chapter makes you one of them.

Prerequisites

This chapter assumes you can load, index, and display images as NumPy arrays, and that you know the BGR-versus-RGB and uint8-versus-float conventions, all covered in Chapter 0: Foundations: The Python Imaging Stack. The code uses OpenCV, NumPy, scikit-image, and Pillow, the stack installed there. No prior optics, signal processing, or color science is required; the chapter builds each from scratch. Comfort with logarithms and basic probability (mean, variance) is enough for all the math.
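As a quick self-check on those two conventions, here is a minimal NumPy-only sketch. The array contents and variable names are made up for illustration and are not taken from Chapter 0. If every line reads as expected, the prerequisites are in place.

```python
import numpy as np

# --- Channel order: the same bytes mean different colors ---
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 2] = 255                 # last channel is red under the BGR convention
rgb = bgr[..., ::-1]              # reverse the channel axis to get RGB order

# --- dtype: uint8 arithmetic wraps around silently ---
a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)
wrapped = a + b                   # 300 does not fit in 8 bits: result is 44

# Widen before adding, then clip back into the displayable range.
saturated = np.clip(a.astype(np.int32) + b, 0, 255).astype(np.uint8)   # 255

# Float images conventionally live in [0, 1]; sums may exceed 1.0
# and must be clipped before converting back to uint8.
as_float = a.astype(np.float32) / 255 + b.astype(np.float32) / 255
```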

What's Next?

With the image's origin story told, Chapter 2: Point Operations, Histograms & Thresholding starts changing images on purpose. The gamma curves this chapter met inside the ISP become tools you apply yourself; the quantization levels become histogram bins you read like an instrument panel; and the color channels become inputs to the simplest and most-used operation in vision: the threshold. Everything stays per-pixel for one more chapter before neighborhoods, kernels, and convolution enter in Chapter 3.

Bibliography & Further Reading

Foundational Papers

Bayer, B. E. "Color Imaging Array." US Patent 3,971,065 (1976). patents.google.com
The one-page idea inside nearly every camera made since: a 2×2 mosaic of color filters with green doubled, as taught in Section 1.1.
Wallace, G. K. "The JPEG Still Picture Compression Standard." Communications of the ACM 34(4), 1991. dl.acm.org
The classic readable account of the JPEG pipeline (YCbCr, DCT, quantization, entropy coding) from one of its architects; Section 1.5 in its original voice.
Wang, Z., Bovik, A. C., Sheikh, H. R., Simoncelli, E. P. "Image Quality Assessment: From Error Visibility to Structural Similarity." IEEE Transactions on Image Processing 13(4), 2004. cns.nyu.edu/~lcv/ssim
The SSIM paper, with reference code on the authors' page; the start of the perceptual-metric arc this book follows to FID in Chapter 37.
Debevec, P., Malik, J. "Recovering High Dynamic Range Radiance Maps from Photographs." SIGGRAPH 1997. pauldebevec.com
The bracketed-exposure HDR method implemented by OpenCV's createCalibrateDebevec and used in Section 1.3's HDR experiments.
Brooks, T., Mildenhall, B., Xue, T., Chen, J., Sharlet, D., Barron, J. T. "Unprocessing Images for Learned Raw Denoising." CVPR 2019. arXiv:1811.11127
Inverts the ISP stage by stage to synthesize realistic RAW training data; a precise, readable model of the pipeline drawn in Figure 1.1.1.

Books & Reference

Gonzalez, R. C., Woods, R. E. "Digital Image Processing," 4th edition. imageprocessingplace.com
The standard textbook treatment of sampling, quantization, and intensity transformations; Chapters 2 and 8 parallel this chapter at greater mathematical depth.
Szeliski, R. "Computer Vision: Algorithms and Applications," 2nd edition (2022). szeliski.org/Book
Free online; its Chapter 2 covers image formation, photometric optics, and the camera pipeline with full derivations.
Poynton, C. "Frequently Asked Questions about Color" and "Frequently Asked Questions about Gamma." poynton.ca
The classic engineer-oriented explanations of gamma, luma, and color encoding; the definitive antidote to the gamma trap of Section 1.4.

Tools & Libraries

OpenCV Documentation: Color Conversions. docs.opencv.org
The exact formulas and ranges behind every cvtColor flag used in this chapter, including the HSV hue halving and Lab rescaling gotchas.
rawpy: RAW image processing for Python. github.com/letmaik/rawpy
The LibRaw wrapper from Section 1.1's library shortcut: full RAW development (demosaic, white balance, color matrices, gamma) in one call.
scikit-image: the color module. scikit-image.org
Reference-grade color space conversions and Delta E metrics with float precision; the one-line replacement for hand-written Lab math.
WebP Developer Documentation. developers.google.com/speed/webp
Format internals, lossy and lossless modes, and the compression study comparing WebP against JPEG referenced in Section 1.5.

Standards, Formats & Datasets

JPEG AI (ISO/IEC 6048): Learning-based Image Coding. jpeg.org/jpegai
The first neural-network-based international image coding standard, finalized over 2024 and 2025; the research-frontier endpoint of Section 1.5.
Ultra HDR Image Format (gain maps), Android Developers. developer.android.com
The backward-compatible HDR file format (JPEG plus gain map) discussed in Section 1.3's research frontier, now standardized as ISO 21496-1.
Kodak Lossless True Color Image Suite. r0k.us/graphics/kodak
The 24-image benchmark set on which three decades of compression papers report PSNR and SSIM; useful for reproducing Section 1.5's sweeps on standard data.