"I was born in a burst of photons, white-balanced, gamma-encoded, and saved at quality 85. I have seen things you would not believe. Most of them were compression artifacts."
A Sentimental Image Sensor
Chapter Overview
Every project in this book begins the same way: an image arrives, and code goes to work on it. Chapter 0 taught you to hold that image competently, as a NumPy array with a dtype, a channel order, and a set of conventions. This chapter asks the question that makes everything downstream make sense: what is that array, really? The answer is a story with a beginning (photons striking silicon), a middle (a chain of discretizations and encodings, each one a deliberate engineering compromise), and an end (a compressed file that preserves what a human viewer would miss least). Knowing this story is the difference between treating image data as a mysterious given and treating it as the output of a machine you understand, can reason about, and can debug.
The chapter follows the data's own path. We start inside the camera: lenses focus light, sensors count photons through a mosaic of color filters, and an image signal processor performs a dozen irreversible transformations before your code ever runs. We then formalize the two discretizations that turn a continuous optical image into numbers: sampling, whose failure mode is aliasing (detail that lies), and quantization, whose failure mode is banding (gradients that shatter). With those tools in hand we can read a camera datasheet critically, distinguishing the three budgets (resolution, bit depth, and dynamic range) and seeing which one actually limits a given application. We close that discussion with the high-dynamic-range capture tricks that widen the narrowest budget of all.
The last two sections explain the remaining mysteries of the array. The channel dimension gets its due in a tour of color science: why three numbers per pixel, what RGB actually encodes (and the gamma trap waiting inside it), and why the same color wears different coordinates in HSV, Lab, and YCbCr depending on whether the job is selection, measurement, or compression. Finally, the chapter reassembles all of its own ideas into the file formats you use daily: PNG's lossless contract, JPEG's perceptual gamble (chroma subsampling from the color section, quantization from the sampling section, frequency transforms prefiguring Chapter 4), and the modern WebP, AVIF, and learned codecs now replacing them. Along the way we meet PSNR and SSIM, the first members of an evaluation-metric family this book follows all the way to FID and beyond in Chapter 37.
A word on why this matters for AI specifically. Modern vision models are trained on millions of images that all passed through the machinery in this chapter, with its auto white balance, its tone curves, its 8-bit quantization, and its JPEG artifacts. The pipeline's choices become the model's silent assumptions, and the pipeline's failure modes (clipped highlights, aliased textures, compression-shifted embeddings) become the model's failure modes. The engineers who debug those failures fastest are usually the ones who can look at a wrong prediction and ask not just "what did the model do?" but also "what did the camera do?" This chapter makes you one of them.
Prerequisites
This chapter assumes you can load, index, and display images as NumPy arrays, and that you know the BGR-versus-RGB and uint8-versus-float conventions, all covered in Chapter 0: Foundations: The Python Imaging Stack. The code uses OpenCV, NumPy, scikit-image, and Pillow, the stack installed there. No prior optics, signal processing, or color science is required; the chapter builds each from scratch. Comfort with logarithms and basic probability (mean, variance) is enough for all the math.
If you keep one thing from this chapter, keep the data path that names every section in order: photons to charge, charge to samples, samples to levels, levels to color, color to bytes. Read it forward and it is the camera making a picture (Sections 1.1 to 1.5); read it backward and it is your debugging checklist, because every artifact you will ever chase was introduced at exactly one of these five hops. The chapter's signature phrase says the rest: an image is not a recording of the world, it is the output of an opinionated pipeline, and you cannot recover what a given stage threw away. The roadmap below walks the five hops one section at a time.
Chapter Roadmap
- 1.1 Image Formation: Optics, Sensors & the ISP Pipeline. Light through a lens, photons counted by a Bayer-filtered sensor, and the dozen irreversible decisions the camera's ISP makes before your code runs.
- 1.2 Sampling & Quantization. The two discretizations behind every digital image, their failure modes (aliasing and banding), and their antidotes (pre-filtering and dithering).
- 1.3 Resolution, Bit Depth & Dynamic Range. The three budgets of an image, how to diagnose which one is the bottleneck, and HDR capture for when one exposure cannot span the scene.
- 1.4 Color Science & Color Spaces: RGB, HSV, Lab & YCbCr. Why color is a three-dimensional perceptual summary, the gamma trap inside RGB, and four coordinate systems matched to four different jobs.
- 1.5 Image Formats & Compression: PNG, JPEG & WebP. Lossless versus lossy contracts, the JPEG pipeline read end to end with this chapter's tools, and PSNR and SSIM for measuring what compression costs.
Once you have worked through the five sections, the Hands-On Lab below reassembles them into a single diagnostic script that makes every hop of the data path visible on one image at once. Treat it as the chapter's capstone: each step maps to one section, and the finished tool is something you will keep pointing at images long after you close the book.
Hands-On Lab: Build an Image Pipeline Inspector
Objective
Build a single command-line script, inspect_pipeline.py, that takes any image and walks it back along this chapter's five-hop data path (photons to charge, charge to samples, samples to levels, levels to color, color to bytes). For one input image it prints a one-screen report (the array facts plus a JPEG-quality sweep scored with PSNR and SSIM) and saves an annotated contact sheet that makes the chapter's discretizations visible at once: a downsampling alias, a bit-depth banding ladder, and the per-channel color decomposition. The finished script is a reusable diagnostic you can point at any image to answer the question "what did the pipeline do to this picture?"
What You'll Practice
- Reading an image as a NumPy array and reasoning about its dtype, shape, and value range: the conventions from Chapter 0, read against the budgets of Section 1.3.
- Demonstrating aliasing by sampling with and without a pre-filter, the core failure mode of Section 1.2.
- Quantizing to fewer bits to reproduce the banding ladder of Section 1.2 and Section 1.3.
- Converting between RGB, HSV, Lab, and YCbCr and visualizing the channels, from Section 1.4.
- Sweeping JPEG quality and scoring the loss with PSNR and SSIM, the metrics introduced in Section 1.5.
Setup
Use the stack from Chapter 0. Install the four libraries and grab any photo with both fine texture and a smooth gradient (a brick wall behind a clear sky works well, because the bricks expose aliasing and the sky exposes banding).
pip install opencv-python numpy scikit-image matplotlib
Steps
Step 1: Load the image and print its array facts
Every investigation starts by interrogating the array itself. Load the image in RGB order (OpenCV reads BGR, so convert once) and report the four facts that tell you which budget you are working with.
import cv2
import numpy as np
def load_rgb(path):
    bgr = cv2.imread(path, cv2.IMREAD_COLOR)
    if bgr is None:
        raise FileNotFoundError(path)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
img = load_rgb("input.jpg")
# TODO: print shape (H, W, channels), dtype, min, and max.
# Hint: a uint8 photo should report dtype uint8 and a max at or near 255.
Hint
Use img.shape, img.dtype, int(img.min()), and int(img.max()). If the max is well below 255 the image never used its full 8-bit range, which is exactly the under-exposure diagnosis from Section 1.3.
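As a rough illustration of the diagnosis in this hint, the sketch below gathers the four array facts and adds one derived number, the fraction of the dtype's full scale that the brightest pixel reaches. The helper name array_facts, the peak_fraction key, and the synthetic "dark" array are illustrative assumptions, not part of the lab script, and the helper assumes an integer dtype such as uint8.

```python
import numpy as np

def array_facts(img):
    """Collect shape, dtype, min, max, and the share of full scale reached.

    Assumes an integer dtype (uint8, uint16); float images need a different
    notion of full scale.
    """
    full_scale = np.iinfo(img.dtype).max
    lo, hi = int(img.min()), int(img.max())
    return {
        "shape": img.shape,
        "dtype": str(img.dtype),
        "min": lo,
        "max": hi,
        "peak_fraction": hi / full_scale,  # 1.0 means the top code value is used
    }

# Synthetic stand-in for an under-exposed photo: no value exceeds 120.
rng = np.random.default_rng(0)
dark = rng.integers(0, 121, size=(4, 6, 3), dtype=np.uint8)

facts = array_facts(dark)
print(facts)
```

A peak_fraction well below 1.0 is the numeric form of "the image never used its full 8-bit range"; on a real photo you would call array_facts(img) right after load_rgb.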
Step 2: Demonstrate aliasing with and without a pre-filter
Downsample the image by a large factor two ways: naive nearest-neighbor decimation (which skips samples and aliases) and area averaging (which pre-filters first). The brick texture is where the difference screams.
def downsample_naive(img, factor):
    # TODO: take every factor-th pixel in each axis (no filtering).
    # Hint: array slicing img[::factor, ::factor] does exactly this.
    pass

def downsample_filtered(img, factor):
    h, w = img.shape[:2]
    return cv2.resize(img, (w // factor, h // factor),
                      interpolation=cv2.INTER_AREA)  # area = pre-filter then sample
Hint
return img[::factor, ::factor] for the naive version. Upscale both results back to a common size with cv2.INTER_NEAREST before placing them side by side so the aliasing in the naive crop stays visible instead of being smoothed away by the viewer.
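To see the failure mode in numbers before trying it on a photo, here is a small sketch on a synthetic stripe pattern. It uses NumPy only; downsample_box is a hypothetical stand-in for area averaging (it is not the lab's cv2.resize call) and assumes a 2-D image whose sides are divisible by the factor.

```python
import numpy as np

def downsample_naive(img, factor):
    # Keep every factor-th sample and discard the rest: no pre-filter.
    return img[::factor, ::factor]

def downsample_box(img, factor):
    # Average each factor x factor block before keeping one value per block.
    # Assumes a 2-D array with height and width divisible by factor.
    h, w = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# 16x16 vertical stripes with a period of 2 pixels: columns alternate 0, 255.
cols = (np.arange(16) % 2) * 255
stripes = np.tile(cols, (16, 1)).astype(np.uint8)

naive = downsample_naive(stripes, 4)   # lands only on the black columns
boxed = downsample_box(stripes, 4)     # each block is half black, half white

print("naive:", naive.min(), naive.max())   # every sample is 0
print("boxed:", boxed.min(), boxed.max())   # every sample is 127.5
```

The naive result reports a solid black image, which is detail that lies: the stripes were half white. The averaged result cannot resolve the stripes either, but it reports the honest mid-gray mean.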
Step 3: Build the bit-depth banding ladder
Quantize the image to fewer and fewer bits per channel and watch smooth gradients shatter into bands. This is the quantization failure mode of Section 1.2 made visible on a real photo.
def quantize_bits(img, bits):
    levels = 2 ** bits
    # TODO: map the 0..255 range onto `levels` evenly spaced values.
    # Hint: integer-divide into bins, map each bin to its center, and stay in uint8.
    pass

ladder = [quantize_bits(img, b) for b in (8, 4, 2, 1)]
Hint
A clean formula: step = 256 // levels; q = (img // step) * step + step // 2; return np.clip(q, 0, 255).astype(np.uint8). At 1 bit per channel you should see a poster of 8 flat colors, the extreme end of the banding curve.
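One way to check the formula from this hint before applying it to a photo is to run it on a plain 0 to 255 ramp and count the surviving levels. The sketch below does that; the ramp and the level_counts dictionary are illustrative additions rather than part of the lab script.

```python
import numpy as np

def quantize_bits(img, bits):
    # Same formula as the hint: floor into bins, then snap to bin centers.
    levels = 2 ** bits
    step = 256 // levels
    q = (img // step) * step + step // 2
    return np.clip(q, 0, 255).astype(np.uint8)

# A one-row gradient that uses every 8-bit code value exactly once.
ramp = np.arange(256, dtype=np.uint8)

level_counts = {}
for bits in (8, 4, 2, 1):
    q = quantize_bits(ramp, bits)
    level_counts[bits] = len(np.unique(q))
    print(f"{bits} bits -> {level_counts[bits]} distinct levels")
# level counts: 256, 16, 4, 2
```

Each channel keeps 2**bits levels, so a three-channel image at 1 bit per channel has at most 2 x 2 x 2 = 8 colors, which is the poster described above.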
Step 4: Decompose the color into HSV, Lab, and YCbCr channels
Split the image across the chapter's four coordinate systems and render each channel as grayscale. Seeing the Y channel of YCbCr next to the L channel of Lab makes concrete why compression and measurement reach for different spaces.
spaces = {
    "RGB": img,
    "HSV": cv2.cvtColor(img, cv2.COLOR_RGB2HSV),
    "Lab": cv2.cvtColor(img, cv2.COLOR_RGB2Lab),
    "YCbCr": cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb),  # OpenCV channel order: Y, Cr, Cb
}
# TODO: for each non-RGB space, split into 3 single-channel images for the contact sheet.
# Hint: cv2.split returns a list of 2D arrays you can display as grayscale.
Hint
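ch0, ch1, ch2 = cv2.split(spaces["YCbCr"]). Mind the order: OpenCV's flag is spelled YCrCb because the result is stored as Y, Cr, Cb, so ch1 is Cr and ch2 is Cb. Remember the Section 1.4 gotcha as well: for uint8 images OpenCV stores HSV hue in 0 to 179 and rescales the Lab channels to 0 to 255, so display them as-is for inspection rather than trusting the raw numbers as physical units.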
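If you do want physical units back, the 8-bit encodings can be undone by hand. The sketch below is a minimal version under the assumption that the input came from cv2.cvtColor on a uint8 image (hue halved, saturation and value scaled to 255, L scaled by 255/100, a and b offset by 128). The function names hsv8_to_units and lab8_to_units are hypothetical helpers, not OpenCV calls.

```python
import numpy as np

def hsv8_to_units(hsv8):
    """OpenCV 8-bit HSV -> hue in degrees, saturation and value in 0..1."""
    hsv = hsv8.astype(np.float32)
    h_deg = hsv[..., 0] * 2.0      # stored as degrees / 2 to fit in a byte
    s = hsv[..., 1] / 255.0
    v = hsv[..., 2] / 255.0
    return np.stack([h_deg, s, v], axis=-1)

def lab8_to_units(lab8):
    """OpenCV 8-bit Lab -> L* in 0..100, a* and b* centered on 0."""
    lab = lab8.astype(np.float32)
    L = lab[..., 0] * (100.0 / 255.0)
    a = lab[..., 1] - 128.0
    b = lab[..., 2] - 128.0
    return np.stack([L, a, b], axis=-1)

# Hand-written 8-bit encodings of two familiar colors.
green_hsv8 = np.array([[[60, 255, 255]]], dtype=np.uint8)    # pure green
white_lab8 = np.array([[[255, 128, 128]]], dtype=np.uint8)   # white

print(hsv8_to_units(green_hsv8)[0, 0])  # hue 120 degrees, S = 1, V = 1
print(lab8_to_units(white_lab8)[0, 0])  # L* = 100, a* = 0, b* = 0
```

Because the 8-bit storage has already rounded the values, this recovers units but not precision; converting a float image in the first place avoids the rounding.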
Step 5: Sweep JPEG quality and score it with PSNR and SSIM
Re-encode the image at a range of JPEG qualities, measure the file size of each, and score the distortion two ways. This reproduces the cost-of-compression curve from Section 1.5 on your own image.
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim
def jpeg_roundtrip(img_rgb, quality):
    bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)
    ok, buf = cv2.imencode(".jpg", bgr, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError(f"JPEG encoding failed at quality {quality}")
    decoded = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.cvtColor(decoded, cv2.COLOR_BGR2RGB), buf.nbytes
# TODO: loop qualities (e.g. 95, 75, 50, 25, 10); for each, round-trip and record
# (quality, kilobytes, PSNR in dB, SSIM). Print the table sorted by quality.
# Hint: pass channel_axis=-1 to ssim for a color image.
Hint
Compute psnr(img, decoded) and ssim(img, decoded, channel_axis=-1). You should see PSNR and SSIM fall together as quality drops, but SSIM holds up better at mid qualities because it tracks structure rather than raw pixel error, the exact distinction Section 1.5 draws between the two metrics.
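To make "raw pixel error" concrete, here is PSNR written out from its definition, 10 * log10(peak^2 / MSE). This is a sketch for intuition rather than a replacement for the scikit-image call; psnr_db and the two toy test images are illustrative names, and the peak of 255 assumes 8-bit data.

```python
import numpy as np

def psnr_db(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in decibels for same-shaped images."""
    ref = reference.astype(np.float64)   # widen first to avoid uint8 wraparound
    tst = test.astype(np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")              # identical images: no noise at all
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8), dtype=np.uint8)
off_by_one = ref + 1     # every pixel wrong by one code value
off_by_16 = ref + 16     # every pixel wrong by sixteen code values

print(f"{psnr_db(ref, off_by_one):.2f} dB")  # about 48.13 dB
print(f"{psnr_db(ref, off_by_16):.2f} dB")   # about 24.05 dB
```

Note what the formula cannot see: a uniform brightness offset and a blocky texture with the same mean squared error get the same score, which is the gap SSIM was designed to address.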
Step 6: Assemble the contact sheet and print the report
Tie the three visual experiments (aliasing, banding, and color decomposition) into one figure with matplotlib, and print the array facts and the JPEG table to the terminal. This is the deliverable: one image and one report that summarize the whole pipeline.
import matplotlib.pyplot as plt
# TODO: build a grid of subplots: row 1 = naive vs filtered downsample,
# row 2 = the four-step banding ladder, row 3 = YCbCr channels.
# Title each panel, then plt.savefig("pipeline_report.png", dpi=150).
# Hint: fig, axes = plt.subplots(3, 4, figsize=(14, 10)); axes is a 2D array.
Hint
Call ax.imshow(panel, cmap="gray") for single-channel panels and drop the cmap for the RGB ones. Use ax.set_title(...) and ax.axis("off"). Keep the JPEG sweep as printed text rather than a plot to keep the figure uncluttered.
Expected Output
Running python inspect_pipeline.py input.jpg prints the array facts (for example shape=(1365, 2048, 3) dtype=uint8 min=3 max=255) followed by a JPEG table whose PSNR drops monotonically as quality falls (roughly 45 dB at quality 95 down to the high 20s at quality 10) while file size shrinks by an order of magnitude. It saves pipeline_report.png, a contact sheet where the naive downsample shows jagged aliased brick edges next to the smooth filtered version, the banding ladder progresses from photographic at 8 bits to an 8-color poster at 1 bit, and the YCbCr panels show a detailed luma channel beside two flat chroma channels (the visual reason chroma subsampling is nearly free).
The slicing, the quantization, and the OpenCV color conversions in this lab are deliberately explicit so you see what each step does; only the two metrics are delegated to scikit-image from the start, as skimage.metrics.peak_signal_noise_ratio and skimage.metrics.structural_similarity. In production you would hand the color-space layer to a library as well: skimage.color.rgb2lab and friends return reference-grade float results in documented ranges, which spares you the range bookkeeping (the 0 to 179 hue and rescaled-Lab gotchas from Section 1.4). The learning path is steps 1 to 6; the practical payoff is that the same diagnostic is a dozen library calls.
Stretch Goals
- Add a fifth experiment that reproduces the HDR motivation from Section 1.3: synthesize an under- and over-exposed pair from the input by scaling and clipping, then show how each loses detail at a different end of the tonal range.
- Replace the JPEG sweep with a WebP and AVIF sweep (cv2.imencode(".webp", ...); whether AVIF encoding is available depends on how your OpenCV build was configured) and plot PSNR against kilobytes for all three codecs on one axis to reproduce Section 1.5's rate-distortion comparison.
- Turn the script into a small batch tool that runs over the Kodak image suite (linked in the bibliography) and reports the mean PSNR and SSIM per quality level, the way compression papers do.
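The first stretch goal can be started from a sketch like the following. It makes a loud simplifying assumption: exposure is simulated by scaling the encoded 8-bit values directly, ignoring the gamma curve, so the "stops" are only approximate. The names simulate_exposure and clipped_fraction are hypothetical helpers, and the ramp stands in for a real photo.

```python
import numpy as np

def simulate_exposure(img_u8, stops):
    """Scale by 2**stops, then round and clip back into 8 bits.

    Assumption: operates on gamma-encoded values as if they were linear.
    """
    scaled = img_u8.astype(np.float32) * (2.0 ** stops)
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

def clipped_fraction(img_u8):
    """Return (share of pixels at 0, share of pixels at 255)."""
    return float(np.mean(img_u8 == 0)), float(np.mean(img_u8 == 255))

# Stand-in scene: a horizontal ramp that covers every tone once per row.
ramp = np.tile(np.arange(256, dtype=np.uint8), (4, 1))

under = simulate_exposure(ramp, -2)   # two stops darker
over = simulate_exposure(ramp, +2)    # two stops brighter

print("under: max", int(under.max()), "levels", len(np.unique(under)))
print("over: clipped high", clipped_fraction(over)[1])
```

The under-exposed copy squeezes the whole scene into the bottom quarter of the code values, so tones that were distinct now share a level, while the over-exposed copy pins the top three quarters of the ramp at 255. Each frame loses detail at a different end of the tonal range, which is the motivation for merging bracketed exposures.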
Complete Solution
import sys

import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim


def load_rgb(path):
    bgr = cv2.imread(path, cv2.IMREAD_COLOR)
    if bgr is None:
        raise FileNotFoundError(path)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)


def downsample_naive(img, factor):
    return img[::factor, ::factor]  # skip samples, no filter


def downsample_filtered(img, factor):
    h, w = img.shape[:2]
    return cv2.resize(img, (w // factor, h // factor),
                      interpolation=cv2.INTER_AREA)  # area = pre-filter then sample


def quantize_bits(img, bits):
    levels = 2 ** bits
    step = 256 // levels
    q = (img // step) * step + step // 2  # snap to bin centers
    return np.clip(q, 0, 255).astype(np.uint8)


def jpeg_roundtrip(img_rgb, quality):
    bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)
    ok, buf = cv2.imencode(".jpg", bgr, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError(f"JPEG encoding failed at quality {quality}")
    decoded = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.cvtColor(decoded, cv2.COLOR_BGR2RGB), buf.nbytes


def main(path):
    img = load_rgb(path)
    print(f"shape={img.shape} dtype={img.dtype} "
          f"min={int(img.min())} max={int(img.max())}")

    # Aliasing demonstration.
    factor = 8
    naive = downsample_naive(img, factor)
    filt = downsample_filtered(img, factor)
    h, w = img.shape[:2]
    # Upscale with nearest-neighbor so the viewer does not smooth the aliasing away.
    naive_up = cv2.resize(naive, (w, h), interpolation=cv2.INTER_NEAREST)
    filt_up = cv2.resize(filt, (w, h), interpolation=cv2.INTER_NEAREST)

    # Banding ladder.
    ladder = [(b, quantize_bits(img, b)) for b in (8, 4, 2, 1)]

    # Color decomposition (YCbCr for the contact sheet).
    # OpenCV returns the channels in Y, Cr, Cb order.
    ycc = cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb)
    y, cr, cb = cv2.split(ycc)

    # JPEG quality sweep.
    print(f"{'quality':>8}{'KB':>10}{'PSNR_dB':>10}{'SSIM':>8}")
    for q in (95, 75, 50, 25, 10):
        decoded, nbytes = jpeg_roundtrip(img, q)
        p = psnr(img, decoded)
        s = ssim(img, decoded, channel_axis=-1)
        print(f"{q:>8}{nbytes / 1024:>10.1f}{p:>10.2f}{s:>8.3f}")

    # Contact sheet.
    fig, axes = plt.subplots(3, 4, figsize=(14, 10))
    axes[0, 0].imshow(naive_up); axes[0, 0].set_title(f"Naive /{factor} (aliased)")
    axes[0, 1].imshow(filt_up); axes[0, 1].set_title(f"Area /{factor} (pre-filtered)")
    axes[0, 2].axis("off"); axes[0, 3].axis("off")
    for ax, (b, q) in zip(axes[1], ladder):
        ax.imshow(q); ax.set_title(f"{b}-bit/channel")
    axes[2, 0].imshow(y, cmap="gray"); axes[2, 0].set_title("Y (luma)")
    axes[2, 1].imshow(cb, cmap="gray"); axes[2, 1].set_title("Cb (chroma)")
    axes[2, 2].imshow(cr, cmap="gray"); axes[2, 2].set_title("Cr (chroma)")
    axes[2, 3].axis("off")
    for ax in axes.ravel():
        ax.set_xticks([]); ax.set_yticks([])
    fig.tight_layout()
    fig.savefig("pipeline_report.png", dpi=150)
    print("saved pipeline_report.png")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "input.jpg")
What's Next?
With the image's origin story told, Chapter 2: Point Operations, Histograms & Thresholding starts changing images on purpose. The gamma curves this chapter met inside the ISP become tools you apply yourself; the quantization levels become histogram bins you read like an instrument panel; and the color channels become inputs to the simplest and most-used operation in vision: the threshold. Everything stays per-pixel for one more chapter before neighborhoods, kernels, and convolution enter in Chapter 3.
Bibliography & Further Reading
Foundational Papers
Bayer, B. E. "Color Imaging Array." US Patent 3,971,065 (1976). patents.google.com
The one-page idea inside nearly every camera made since: a 2×2 mosaic of color filters with green doubled, as taught in Section 1.1.
Wallace, G. K. "The JPEG Still Picture Compression Standard." Communications of the ACM 34(4), 1991. dl.acm.org
The classic readable account of the JPEG pipeline (YCbCr, DCT, quantization, entropy coding) from one of its architects; Section 1.5 in its original voice.
Wang, Z., Bovik, A. C., Sheikh, H. R., Simoncelli, E. P. "Image Quality Assessment: From Error Visibility to Structural Similarity." IEEE Transactions on Image Processing 13(4), 2004. cns.nyu.edu/~lcv/ssim
The SSIM paper, with reference code on the authors' page; the start of the perceptual-metric arc this book follows to FID in Chapter 37.
Debevec, P., Malik, J. "Recovering High Dynamic Range Radiance Maps from Photographs." SIGGRAPH 1997. pauldebevec.com
The bracketed-exposure HDR method implemented by OpenCV's createCalibrateDebevec and used in Section 1.3's HDR experiments.
Brooks, T., Mildenhall, B., Xue, T., Chen, J., Sharlet, D., Barron, J. T. "Unprocessing Images for Learned Raw Denoising." CVPR 2019. arXiv:1811.11127
Inverts the ISP stage by stage to synthesize realistic RAW training data; a precise, readable model of the pipeline drawn in Figure 1.1.1.
Books & Reference
Gonzalez, R. C., Woods, R. E. "Digital Image Processing," 4th edition. imageprocessingplace.com
The standard textbook treatment of sampling, quantization, and intensity transformations; Chapters 2 and 8 parallel this chapter at greater mathematical depth.
Szeliski, R. "Computer Vision: Algorithms and Applications," 2nd edition (2022). szeliski.org/Book
Free online; its Chapter 2 covers image formation, photometric optics, and the camera pipeline with full derivations.
Poynton, C. "Frequently Asked Questions about Color" and "Frequently Asked Questions about Gamma." poynton.ca
The classic engineer-oriented explanations of gamma, luma, and color encoding; the definitive antidote to the gamma trap of Section 1.4.
Tools & Libraries
OpenCV Documentation: Color Conversions. docs.opencv.org
The exact formulas and ranges behind every cvtColor flag used in this chapter, including the HSV hue halving and Lab rescaling gotchas.
rawpy: RAW image processing for Python. github.com/letmaik/rawpy
The LibRaw wrapper from Section 1.1's library shortcut: full RAW development (demosaic, white balance, color matrices, gamma) in one call.
scikit-image: the color module. scikit-image.org
Reference-grade color space conversions and Delta E metrics with float precision; the one-line replacement for hand-written Lab math.
WebP Developer Documentation. developers.google.com/speed/webp
Format internals, lossy and lossless modes, and the compression study comparing WebP against JPEG referenced in Section 1.5.
Standards, Formats & Datasets
JPEG AI (ISO/IEC 6048): Learning-based Image Coding. jpeg.org/jpegai
The first neural-network-based international image coding standard, finalized 2024 to 2025; the research-frontier endpoint of Section 1.5.
Ultra HDR Image Format (gain maps), Android Developers. developer.android.com
The backward-compatible HDR file format (JPEG plus gain map) discussed in Section 1.3's research frontier, now standardized as ISO 21496-1.
Kodak Lossless True Color Image Suite. r0k.us/graphics/kodak
The 24-image benchmark set on which three decades of compression papers report PSNR and SSIM; useful for reproducing Section 1.5's sweeps on standard data.