Part I: Image Processing | Building Vision AI: From Pixels to Generative Models

Part Overview

Every system in this book, from an edge detector to a text-to-image model, begins with the same humble object: a rectangular grid of numbers. Part I is the story of that grid. Before a network can recognize a pedestrian or a diffusion model can paint one, someone has to load the image correctly, untangle its color channels, balance its contrast, suppress its noise, and put its geometry right. That craft is image processing, and it predates deep learning by half a century. The nine chapters in this part teach it the way a working engineer needs it: with NumPy arrays on screen, OpenCV and scikit-image at hand, and a clear account of the signal-processing ideas underneath each function call.

The chapters build in deliberate order. Chapter 0 sets up the workbench: images as NumPy arrays, the Python imaging ecosystem, and the conventions (BGR versus RGB, uint8 versus float) that cause most beginner bugs. Chapter 1 steps back to ask where the array comes from, following light from photons through sensors, sampling, and quantization to color spaces and compressed files. Chapter 2 then makes the first changes to an image, one pixel at a time: brightness, gamma, histograms, and thresholding. Chapter 3 widens the view from a single pixel to its neighborhood and introduces convolution, the single most important operation in this book. Chapter 4 changes the coordinate system entirely, viewing images as sums of waves; the frequency domain explains aliasing, JPEG compression, and image pyramids in one stroke. Chapter 5 moves the pixels themselves, with the transformation matrices and interpolation methods behind rotation, rectification, and registration. Chapter 6 reduces images to binary masks and shows how a small algebra of erosions and dilations carries a surprising share of industrial vision. Chapter 7 assembles everything into restoration: undoing noise, blur, missing pixels, and limited dynamic range. Chapter 8 closes the part with a consolidated reference for its libraries, performance tooling, and datasets.

A recurring thread runs through this part and far beyond it: ideas introduced here classically return later, learned. The convolution kernel of Chapter 3 becomes the trainable CNN layer of Part III. The denoising problem of Chapter 7 becomes, almost verbatim, the training objective of diffusion models in Part IV, and inpainting returns there as generative editing. Part I is therefore not a warm-up to skim; it is the vocabulary the rest of the book is written in. Read it with an interpreter open, run the pipelines, and the later parts will feel like extensions of tools you already own.

Chapter 0 Foundations: The Python Imaging Stack

An image is a NumPy array; master the array and the entire vision stack opens up.

Chapter 1 Digital Image Fundamentals

From photons to pixels: how a digital image is born, encoded, and judged.

Chapter 2 Point Operations, Histograms & Thresholding

Per-pixel transforms are the simplest tools in vision, and still among the most used.

Chapter 3 Spatial Filtering & Convolution

The kernel is the atom of image processing, and the same operation that powers CNNs in Part III.

Chapter 4 The Frequency Domain & Multi-Scale Analysis

Every image is a sum of waves; seeing it that way explains aliasing, compression, and pyramids in one stroke.

Chapter 5 Geometric Transformations & Image Warping

Rotating, rectifying, and registering images: the coordinate machinery behind every camera app.

Chapter 6 Morphology, Binary Images & Shape

Once an image is binary, a small algebra of erosions and dilations solves a surprising share of industrial vision.

Chapter 7 Image Restoration & Enhancement

Undoing damage: noise, blur, missing pixels, and limited dynamic range, with the classical methods deep models later learned to beat.

Chapter 8 Tools of the Trade: The Image Processing Stack

Consolidated reference: libraries, performance tooling, datasets, and external resources for this part.

Where This Part Leads

With the pixel-level toolkit in hand, the book moves from processing images to understanding them. Part II: Classical Computer Vision builds directly on these chapters: the derivative filters of Chapter 3 grow into edge and corner detectors, the geometric machinery of Chapter 5 expands into camera models and multi-view geometry, and the binary-shape analysis of Chapter 6 feeds classical segmentation and the recognition pipelines that defined vision before deep learning.