Table of Contents

A practitioner's guide to image processing, classical computer vision, deep learning for vision, and generative vision models.

First Edition (Planning Draft) · 2026

Planning draft: 4 parts · 39 chapters · 219 sections, plus front matter, 6 appendices, and a capstone. Chapter links activate as content is produced; the planned directory path is shown under each chapter.

Front Matter · Why This Book Exists

7 entries
  1. F1
    Why This Book Exists Vision AI spans sixty years of ideas, from convolution kernels to diffusion models; this book teaches them as one connected story.
    front-matter/foreword.html
  2. F2
    What This Book Covers The four-part arc: pixels, geometry, learning, generation.
    front-matter/fm-what-this-book-covers.html
  3. F3
    Who Should Read This Book Engineers with basic Python and linear algebra; no prior computer vision required.
    front-matter/fm-who-should-read.html
  4. F4
    What's Inside A guided preview of the book's signature elements: worked pipelines, library shortcuts, callouts, and labs.
    front-matter/look-inside-preview.html
  5. F5
    How to Use This Book Reading paths for engineers, researchers, and self-study learners; how the parts depend on each other.
    front-matter/fm-how-to-use.html
  6. F6
    About the Authors Who wrote this book and how.
    front-matter/about-authors.html
  7. F7
    Copyright & Legal Edition, license, and attribution.
    front-matter/copyright.html

Part I · Image Processing

9 chapters · 49 sections

The signal-processing bedrock: pixels, color, histograms, filtering, frequency, geometry, morphology, and restoration.

  1. 0
    Foundations: The Python Imaging Stack An image is a NumPy array; master the array and the entire vision stack opens up.
    1. 0.1 Images as Arrays: Pixels, Channels & Dtypes
    2. 0.2 The Python Imaging Ecosystem: OpenCV, scikit-image & Pillow
    3. 0.3 Reading, Writing & Displaying Images
    4. 0.4 Conventions & Pitfalls: BGR vs RGB, uint8 vs float, Row-Column Order
    5. 0.5 A First Pipeline: Load, Process, Measure, Save
    part-1-image-processing/module-00-python-imaging-stack/
  2. 1
    Digital Image Fundamentals From photons to pixels: how a digital image is born, encoded, and judged.
    1. 1.1 Image Formation: Optics, Sensors & the ISP Pipeline
    2. 1.2 Sampling & Quantization
    3. 1.3 Resolution, Bit Depth & Dynamic Range
    4. 1.4 Color Science & Color Spaces: RGB, HSV, Lab & YCbCr
    5. 1.5 Image Formats & Compression: PNG, JPEG & WebP
    part-1-image-processing/module-01-digital-image-fundamentals/
  3. 2
    Point Operations, Histograms & Thresholding Per-pixel transforms are the simplest tools in vision, and still among the most used.
    1. 2.1 Brightness, Contrast & Gamma Correction
    2. 2.2 Image Histograms & Statistics
    3. 2.3 Histogram Equalization & CLAHE
    4. 2.4 Thresholding: Global, Otsu & Adaptive
    5. 2.5 Image Arithmetic, Blending & Compositing
    part-1-image-processing/module-02-point-operations-histograms/
  4. 3
    Spatial Filtering & Convolution The kernel is the atom of image processing, and the same operation that powers CNNs in Part III.
    1. 3.1 Convolution & Correlation: The Workhorse Operation
    2. 3.2 Smoothing: Box, Gaussian & Median Filters
    3. 3.3 Sharpening & Unsharp Masking
    4. 3.4 Derivative Filters: Sobel, Laplacian & LoG
    5. 3.5 Edge-Preserving Smoothing: Bilateral & Guided Filters
    6. 3.6 Borders, Separability & Performance
    part-1-image-processing/module-03-spatial-filtering-convolution/
  5. 4
    The Frequency Domain & Multi-Scale Analysis Every image is a sum of waves; seeing it that way explains aliasing, compression, and pyramids in one stroke.
    1. 4.1 Fourier Intuition: Images as Sums of Waves
    2. 4.2 The 2D DFT & FFT in Practice
    3. 4.3 Frequency-Domain Filtering: Low-Pass, High-Pass & Notch
    4. 4.4 The Sampling Theorem, Aliasing & Anti-Aliasing
    5. 4.5 Image Pyramids: Gaussian & Laplacian
    6. 4.6 Wavelets & Time-Frequency Trade-offs
    part-1-image-processing/module-04-frequency-domain-multiscale/
  6. 5
    Geometric Transformations & Image Warping Rotating, rectifying, and registering images: the coordinate machinery behind every camera app.
    1. 5.1 The Transformation Hierarchy: Translation to Projective
    2. 5.2 Homogeneous Coordinates & Transformation Matrices
    3. 5.3 Interpolation: Nearest, Bilinear, Bicubic & Lanczos
    4. 5.4 Warping, Remapping & Inverse Mapping
    5. 5.5 Image Registration & Alignment
    6. 5.6 Worked Example: A Document Scanner from Scratch
    part-1-image-processing/module-05-geometric-transformations/
  7. 6
    Morphology, Binary Images & Shape Once an image is binary, a small algebra of erosions and dilations solves a surprising share of industrial vision.
    1. 6.1 Binary Images, Neighborhoods & Connectivity
    2. 6.2 Erosion & Dilation
    3. 6.3 Opening, Closing & Morphological Gradients
    4. 6.4 Connected Components & Region Properties
    5. 6.5 Distance Transforms & Skeletonization
    6. 6.6 Contours, Moments & Shape Descriptors
    part-1-image-processing/module-06-morphology-binary-shape/
  8. 7
    Image Restoration & Enhancement Undoing damage: noise, blur, missing pixels, and limited dynamic range, with the classical methods deep models later learned to beat.
    1. 7.1 Noise Models & Degradation Pipelines
    2. 7.2 Classical Denoising: From Gaussian to Non-Local Means
    3. 7.3 Deblurring & Deconvolution: Wiener & Richardson-Lucy
    4. 7.4 Inpainting: Filling the Holes
    5. 7.5 Classical Super-Resolution
    6. 7.6 HDR Imaging & Tone Mapping
    part-1-image-processing/module-07-restoration-enhancement/
  9. 8
    Tools of the Trade: The Image Processing Stack Consolidated reference: libraries, performance tooling, datasets, and external resources for this part.
    1. 8.1 Library Landscape: OpenCV, scikit-image, Pillow & SciPy ndimage
    2. 8.2 Performance: Vectorization, OpenCV Optimizations & GPU Arrays
    3. 8.3 Test Images, Datasets & Quality Metrics Tooling
    4. 8.4 Curated References & Further Reading
    part-1-image-processing/module-08-tools-of-the-trade/

Part II · Classical Computer Vision

9 chapters · 48 sections

Vision before learning: features, matching, multi-view geometry, motion, and the recognition pipelines that defined an era.

  1. 9
    Edges, Lines & Curves From raw gradients to structured geometry: the first step from processing images to understanding them.
    1. 9.1 What Is an Edge? Gradients Revisited
    2. 9.2 The Canny Edge Detector, Step by Step
    3. 9.3 The Hough Transform: Lines & Circles
    4. 9.4 Fitting Curves: Least Squares & Robust Alternatives
    5. 9.5 Worked Example: Lane-Marking Detection
    part-2-classical-computer-vision/module-09-edges-lines-curves/
  2. 10
    Keypoints, Descriptors & Matching Find the same point in two photographs and most of geometric vision follows.
    1. 10.1 Corner Detection: Harris, Shi-Tomasi & FAST
    2. 10.2 Scale & Rotation Invariance: Scale Space
    3. 10.3 SIFT: The Descriptor That Defined a Decade
    4. 10.4 Fast Binary Alternatives: BRIEF, ORB & AKAZE
    5. 10.5 Descriptor Matching & the Ratio Test
    6. 10.6 RANSAC & Robust Model Fitting
    part-2-classical-computer-vision/module-10-keypoints-descriptors-matching/
  3. 11
    Classical Segmentation & Grouping Carving an image into meaningful regions with clustering, watersheds, and graphs.
    1. 11.1 Segmentation as Clustering: K-Means & Mean-Shift
    2. 11.2 Region Growing & Split-and-Merge
    3. 11.3 The Watershed Transform
    4. 11.4 Graph-Based Segmentation: Graph Cuts & GrabCut
    5. 11.5 Superpixels: SLIC & Friends
    part-2-classical-computer-vision/module-11-classical-segmentation/
  4. 12
    Camera Models & Calibration The pinhole camera turns 3D into 2D; calibration tells you exactly how.
    1. 12.1 The Pinhole Camera & Intrinsic Parameters
    2. 12.2 Lens Distortion & Its Correction
    3. 12.3 Camera Calibration: Zhang's Method in Practice
    4. 12.4 Extrinsics & Pose Estimation: The PnP Problem
    5. 12.5 Calibration Workflows, Targets & Quality Checks
    part-2-classical-computer-vision/module-12-camera-models-calibration/
  5. 13
    Two-View Geometry, Stereo & Depth Two cameras and a bit of linear algebra recover what one camera lost: depth.
    1. 13.1 Epipolar Geometry: The Geometry of Two Views
    2. 13.2 Essential & Fundamental Matrices
    3. 13.3 Homographies & Panorama Stitching
    4. 13.4 Stereo Rectification & Disparity Estimation
    5. 13.5 From Disparity to Depth Maps
    6. 13.6 Triangulation & 3D Point Recovery
    part-2-classical-computer-vision/module-13-two-view-stereo-depth/
  6. 14
    Structure from Motion & Visual SLAM From a pile of photos to a 3D model, and from a moving camera to a live map.
    1. 14.1 Feature Tracks & Correspondence Across Many Views
    2. 14.2 Incremental Structure from Motion
    3. 14.3 Bundle Adjustment: Polishing the Reconstruction
    4. 14.4 Visual SLAM: Mapping While Moving
    5. 14.5 COLMAP & Modern Reconstruction Pipelines
    part-2-classical-computer-vision/module-14-sfm-visual-slam/
  7. 15
    Motion, Optical Flow & Tracking Video adds time; flow and tracking turn pixel motion into object motion.
    1. 15.1 Motion Fields & the Brightness Constancy Assumption
    2. 15.2 Sparse Flow: Lucas-Kanade & Feature Tracking
    3. 15.3 Dense Flow: Horn-Schunck to Variational Methods
    4. 15.4 Background Subtraction & Change Detection
    5. 15.5 Object Tracking: Mean-Shift, Correlation Filters & Re-Detection
    6. 15.6 Kalman Filters & Multi-Object Data Association
    part-2-classical-computer-vision/module-15-motion-flow-tracking/
  8. 16
    Classical Recognition Pipelines Hand-crafted features plus shallow classifiers ruled recognition for two decades; understanding why they plateaued explains why deep learning won.
    1. 16.1 Template Matching & Its Limits
    2. 16.2 Bag of Visual Words & Spatial Pyramids
    3. 16.3 HOG + SVM: The Pedestrian Detection Era
    4. 16.4 Viola-Jones: Real-Time Face Detection
    5. 16.5 Deformable Part Models
    6. 16.6 Why Hand-Crafted Pipelines Plateaued: The Bridge to Deep Learning
    part-2-classical-computer-vision/module-16-classical-recognition/
  9. 17
    Tools of the Trade: The Classical CV Stack Consolidated reference: libraries, reconstruction tooling, datasets, and external resources for this part.
    1. 17.1 OpenCV Beyond the Basics: features2d, calib3d & video
    2. 17.2 Reconstruction Tooling: COLMAP, OpenMVG & Friends
    3. 17.3 Datasets & Benchmarks for Geometry, Flow & Tracking
    4. 17.4 Curated References & Further Reading
    part-2-classical-computer-vision/module-17-tools-of-the-trade/

Part III · Deep Learning for Computer Vision

12 chapters · 67 sections

Vision learned end to end: CNNs, transformers, detection, segmentation, self-supervision, video, 3D, and deployment.

  1. 18
    Neural Networks & PyTorch for Vision Everything Part III builds on: tensors, autograd, and a training loop you fully understand.
    1. 18.1 From Linear Models to Multi-Layer Perceptrons
    2. 18.2 Backpropagation & Optimization in a Nutshell
    3. 18.3 PyTorch Essentials: Tensors, Autograd & nn.Module
    4. 18.4 Datasets, DataLoaders & Input Pipelines
    5. 18.5 The Training Loop: Losses, Metrics & Checkpointing
    6. 18.6 GPUs, Mixed Precision & Reproducibility
    part-3-deep-learning-for-vision/module-18-neural-networks-pytorch/
  2. 19
    Convolutional Neural Networks The convolution from Chapter 3, made learnable: weight sharing, hierarchy, and the inductive bias that fits images.
    1. 19.1 Why Convolution? Locality, Weight Sharing & Inductive Bias
    2. 19.2 Convolution Layers: Channels, Stride, Padding & Dilation
    3. 19.3 Pooling, Receptive Fields & Feature Hierarchies
    4. 19.4 Batch Normalization & Friends
    5. 19.5 A CNN from Scratch: CIFAR-10 End to End
    6. 19.6 Visualizing What CNNs Learn
    part-3-deep-learning-for-vision/module-19-convolutional-neural-networks/
  3. 20
    CNN Architectures: From LeNet to ConvNeXt A decade of architecture search, told as a story of bottlenecks found and removed.
    1. 20.1 LeNet & AlexNet: The Breakthrough Years
    2. 20.2 VGG & Inception: Depth vs Width
    3. 20.3 ResNet: Residual Learning Changes Everything
    4. 20.4 Efficient Designs: MobileNet, ShuffleNet & EfficientNet
    5. 20.5 ConvNeXt: The CNN, Modernized
    6. 20.6 Choosing an Architecture in Practice
    part-3-deep-learning-for-vision/module-20-cnn-architectures/
  4. 21
    Training Recipes: Data, Augmentation & Transfer In practice the recipe matters as much as the architecture; this chapter is the recipe.
    1. 21.1 Vision Datasets & the ImageNet Legacy
    2. 21.2 Data Augmentation: From Flips to MixUp & CutMix
    3. 21.3 Transfer Learning & Fine-Tuning Strategies
    4. 21.4 Regularization, Schedules & the Modern Training Recipe
    5. 21.5 Class Imbalance, Label Noise & Real-World Data
    6. 21.6 Debugging Training: Curves, Overfitting & Sanity Checks
    part-3-deep-learning-for-vision/module-21-training-recipes/
  5. 22
    Vision Transformers Treat an image as a sequence of patches and the transformer takes over; the question is when that trade is worth it.
    1. 22.1 Attention & the Transformer Block, Vision Edition
    2. 22.2 ViT: Images as Sequences of Patches
    3. 22.3 Data-Efficient Training: DeiT & Augmentation for ViTs
    4. 22.4 Hierarchical Designs: Swin & Pyramid Transformers
    5. 22.5 CNNs vs ViTs: Inductive Bias, Scale & Hybrids
    part-3-deep-learning-for-vision/module-22-vision-transformers/
  6. 23
    Object Detection Where are the objects and what are they: the task that drives much of applied vision.
    1. 23.1 The Detection Problem: Boxes, IoU & mAP
    2. 23.2 Two-Stage Detectors: The R-CNN Family
    3. 23.3 One-Stage Detectors: YOLO, SSD & RetinaNet
    4. 23.4 Anchor-Free & Keypoint-Based Detection
    5. 23.5 DETR: Detection as Set Prediction
    6. 23.6 Training & Deploying a Detector on Custom Data
    part-3-deep-learning-for-vision/module-23-object-detection/
  7. 24
    Segmentation: Semantic, Instance & Promptable From a label per image to a label per pixel, and on to models that segment anything you point at.
    1. 24.1 Semantic Segmentation: FCN, U-Net & DeepLab
    2. 24.2 Instance Segmentation: Mask R-CNN
    3. 24.3 Panoptic Segmentation: Unifying Things & Stuff
    4. 24.4 Transformer Segmenters: SegFormer & Mask2Former
    5. 24.5 Segment Anything: Promptable Segmentation
    6. 24.6 Losses, Metrics & Evaluation for Dense Prediction
    part-3-deep-learning-for-vision/module-24-segmentation/
  8. 25
    Self-Supervised Learning & Vision Foundation Models Labels stopped being the bottleneck: how vision models learn from raw pixels and from language.
    1. 25.1 Pretext Tasks: Learning Without Labels
    2. 25.2 Contrastive Learning: SimCLR & MoCo
    3. 25.3 Self-Distillation & Masked Image Modeling: DINO & MAE
    4. 25.4 CLIP: Language as Supervision
    5. 25.5 Open-Vocabulary Detection & Segmentation
    6. 25.6 The Vision Foundation Model Landscape
    part-3-deep-learning-for-vision/module-25-self-supervised-foundation-models/
  9. 26
    Video Understanding Adding the time axis: actions, motion, and tracking with learned features.
    1. 26.1 From Frames to Clips: The Temporal Dimension
    2. 26.2 Action Recognition: 3D CNNs & Two-Stream Networks
    3. 26.3 Video Transformers
    4. 26.4 Deep Optical Flow: RAFT & Beyond
    5. 26.5 Multi-Object Tracking with Learned Features
    part-3-deep-learning-for-vision/module-26-video-understanding/
  10. 27
    Depth, 3D Vision & Neural Scene Representations Deep networks meet the geometry of Part II: learned depth, point clouds, radiance fields, and splats.
    1. 27.1 Monocular Depth Estimation
    2. 27.2 3D Representations: Point Clouds, Voxels & Meshes
    3. 27.3 Learning on Point Clouds: PointNet & Successors
    4. 27.4 NeRF: Neural Radiance Fields
    5. 27.5 3D Gaussian Splatting
    6. 27.6 Capture-to-Render Pipelines in Practice
    part-3-deep-learning-for-vision/module-27-depth-3d-neural-scenes/
  11. 28
    Efficient Vision & Edge Deployment A model that cannot run on the target hardware is a prototype; this chapter ships it.
    1. 28.1 The Efficiency Toolbox: Quantization, Pruning & Distillation
    2. 28.2 Export & Runtimes: ONNX, TensorRT & OpenVINO
    3. 28.3 Edge & Mobile Vision: From Jetson to Phones
    4. 28.4 Serving Vision Models: Batching, Throughput & Latency
    5. 28.5 Monitoring, Drift & Continual Improvement
    part-3-deep-learning-for-vision/module-28-efficient-vision-deployment/
  12. 29
    Tools of the Trade: The Deep Vision Stack Consolidated reference: model hubs, frameworks, data tooling, and external resources for this part.
    1. 29.1 Model Hubs & Libraries: torchvision, timm, Hugging Face & Ultralytics
    2. 29.2 Detection & Segmentation Frameworks: Detectron2 & MMDetection
    3. 29.3 Data Tooling: Annotation, Versioning, FiftyOne & Roboflow
    4. 29.4 Experiment Tracking, Curated References & Further Reading
    part-3-deep-learning-for-vision/module-29-tools-of-the-trade/

Part IV · Generative Vision Models

9 chapters · 55 sections

Models that create: VAEs, GANs, diffusion, text-to-image, controllable editing, video and 3D generation, evaluation and governance.

  1. 30
    Foundations of Generative Modeling From recognizing images to producing them: what it means to model the distribution of natural images.
    1. 30.1 Generative vs Discriminative: What Does It Mean to Model p(x)?
    2. 30.2 A Map of Generative Families: VAE, GAN, Flow, Autoregressive & Diffusion
    3. 30.3 Latent Variables & the Idea of a Latent Space
    4. 30.4 Energy-Based Models, Score Matching & Langevin Dynamics
    5. 30.5 Sampling, Likelihood & the Quality-Diversity-Speed Trilemma
    6. 30.6 Evaluating Generators: A First Look
    part-4-generative-vision-models/module-30-generative-foundations/
  2. 31
    Autoencoders & Variational Autoencoders Compression as representation, and the probabilistic twist that made decoders generative.
    1. 31.1 Autoencoders: Compression as Representation
    2. 31.2 Denoising & Sparse Autoencoders
    3. 31.3 The VAE: ELBO, Reparameterization & Amortized Inference
    4. 31.4 Disentanglement, beta-VAE & Posterior Collapse
    5. 31.5 Hierarchical VAEs: From Ladder Networks to NVAE
    6. 31.6 Discrete Latents: VQ-VAE & Learned Codebooks
    part-4-generative-vision-models/module-31-autoencoders-vaes/
  3. 32
    Generative Adversarial Networks Two networks in a game: the family that made photorealistic generation possible, and the lessons it left behind.
    1. 32.1 The Adversarial Game
    2. 32.2 Training Pathologies: Mode Collapse & Instability
    3. 32.3 DCGAN to StyleGAN: The Architecture Lineage
    4. 32.4 Conditional GANs & Image-to-Image Translation: pix2pix & CycleGAN
    5. 32.5 GAN Inversion & Latent-Space Editing
    6. 32.6 GANs Today: Where They Still Win
    part-4-generative-vision-models/module-32-gans/
  4. 33
    Diffusion Models Destroy an image with noise, learn to rebuild it, and you get the engine behind modern image generation.
    1. 33.1 Destroying & Rebuilding: The Forward & Reverse Processes
    2. 33.2 DDPM: Noise Schedules, Parameterizations & the Variational View
    3. 33.3 The Score-Based View: VE/VP SDEs & the Probability-Flow ODE
    4. 33.4 Fast Sampling: DDIM, Solvers & Step Distillation
    5. 33.5 Flow Matching, Rectified Flow & Consistency Models
    6. 33.6 Guidance: Classifier & Classifier-Free
    7. 33.7 Latent Diffusion: Compress First, Then Diffuse
    part-4-generative-vision-models/module-33-diffusion-models/
  5. 34
    Text-to-Image Systems Inside the systems that turn a sentence into an image, from CLIP conditioning to full production stacks.
    1. 34.1 Connecting Text & Pixels: CLIP & Text Encoders
    2. 34.2 Inside Stable Diffusion: VAE, U-Net, DiT & Conditioning
    3. 34.3 The Model Landscape: DALL-E, Imagen, Midjourney & FLUX
    4. 34.4 Autoregressive & Token-Based Image Generation
    5. 34.5 Prompt Engineering for Image Generation
    6. 34.6 Fine-Tuning Text-to-Image Models
    part-4-generative-vision-models/module-34-text-to-image/
  6. 35
    Controllable Generation & Image Editing From prompt roulette to precise control: structure, identity, and edits that preserve everything else.
    1. 35.1 Spatial Control: ControlNet & Conditioning Adapters
    2. 35.2 Personalization: LoRA, DreamBooth & Textual Inversion
    3. 35.3 Inpainting, Outpainting & Object Replacement
    4. 35.4 Instruction-Based Editing
    5. 35.5 Real-Image Inversion & Faithful Editing
    6. 35.6 Composing Multi-Step Editing Workflows
    part-4-generative-vision-models/module-35-controllable-generation-editing/
  7. 36
    Video, 3D Generation & World Models Generation grows axes: time, depth, and agency, from video diffusion to world models that learn to simulate.
    1. 36.1 Video Diffusion: Architectures & Temporal Consistency
    2. 36.2 Text-to-Video Systems: Sora-Class Models & the Open Ecosystem
    3. 36.3 Text-to-3D & Image-to-3D Generation
    4. 36.4 Generative Neural Rendering: From Splats to Scenes
    5. 36.5 World Models: Latent Dynamics, RSSM & Learning in Imagination
    6. 36.6 Generative World Simulators: From GAIA-1 to Interactive Environments
    7. 36.7 Predictive World Models: JEPA & Decoder-Free Latents
    8. 36.8 Evaluating World Models: Physical Consistency, Controllability & Coherence
    part-4-generative-vision-models/module-36-video-3d-world-generation/
  8. 37
    Evaluation, Safety & Generative Data Engines Measuring what generators produce, governing how they are used, and putting them to work as synthetic-data engines for the models of Part III.
    1. 37.1 Measuring Image Quality: FID, KID, Precision-Recall & CLIPScore
    2. 37.2 Human Evaluation & Preference Studies
    3. 37.3 Generative Models as Data Engines: Synthetic Data for Training Vision Systems
    4. 37.4 Deepfakes, Detection & Misuse
    5. 37.5 Watermarking & Content Provenance: C2PA & Beyond
    6. 37.6 Licensing, Copyright & Responsible Deployment
    part-4-generative-vision-models/module-37-evaluation-safety-data-engines/
  9. 38
    Tools of the Trade: The Generative Vision Stack Consolidated reference: generation libraries, workflow engines, hosted APIs, and external resources for this part.
    1. 38.1 Hugging Face Diffusers & the Python Generation Stack
    2. 38.2 Node-Based Workflows: ComfyUI & Workflow Engines
    3. 38.3 Hosted Generation APIs & Services
    4. 38.4 Curated References & Further Reading
    part-4-generative-vision-models/module-38-tools-of-the-trade/

Appendices · Reference and Pedagogy

6 appendices
  1. A
    Mathematical Foundations for Vision The essential linear algebra, probability, optimization, and signal processing behind every chapter.
    appendices/appendix-a-mathematical-foundations/
  2. B
    Datasets & Benchmarks Catalog A per-task reference: classification, detection, segmentation, geometry, flow, video, and generation benchmarks, with licensing notes.
    appendices/appendix-b-datasets-benchmarks/
  3. C
    Course Syllabi Tested course tracks built from the book: a one-semester image processing and classical CV course, a deep vision course, and a generative vision course, with week-by-week schedules.
    appendices/appendix-c-course-syllabi/
  4. D
    Reading Pathways Per-audience reading guides for engineers, researchers, generative-AI practitioners, and self-study learners.
    appendices/appendix-d-reading-pathways/
  5. E
    Cameras, GPUs & Edge Hardware Guide Choosing sensors, lenses, GPUs, and edge devices for vision workloads, from lab prototypes to production lines.
    appendices/appendix-e-cameras-gpus-edge-hardware/
  6. F
    Agents That Helped to Write This Book Roster of the 42 specialist AI agents in the writing pipeline that produced this manuscript, with a card per agent.
    appendices/appendix-f-agent-roster/

Capstone · End-to-End Vision System

1 project
  1. Capstone Project: An End-to-End Vision System Design, build, evaluate, and present a production-grade vision application that spans all four parts: classical preprocessing and geometry, a fine-tuned detector or segmenter, a generative synthetic-data engine, and honest evaluation with deployment.
    capstone/