"People ask how I cover so much material in a single pass. Simple: I slide over everything one local window at a time, keep my stride small, and trust the hierarchy to assemble the big picture by week thirteen."
A Methodically Scheduled Sliding Window
This book was written to be taught from, and this appendix is the teaching manual. It packages the thirty-nine chapters into five semester-length course tracks:
- Track 1, an undergraduate course on image processing and classical computer vision built from Parts I and II;
- Track 2, an upper-level course on deep learning for vision built from Part III;
- Track 3, a graduate course on generative vision models built from Part IV;
- Track 4, "Generative AI: From Variational Autoencoders to World Models", a focused graduate track that follows a probability-first path from VAEs through diffusion to world models;
- Track 5, "Building Vision AI with Foundation and Generative Models", a code-first survey that cuts a fast diagonal across the whole book for undergraduate or graduate cohorts in engineering, digital health, and computer science.
Every track comes with a week-by-week schedule mapped to the book's chapters and sections. Tracks 1 to 3 also carry a lab program drawn from the chapters' own exercises, graded deliverables, a grading scheme tuned to the material, and a project arc that can culminate in the book's capstone project. Instructors can adopt a track as written or use the tables as a starting grid to rearrange.
1. How to Use These Syllabi
All five tracks assume the same semester shape: thirteen teaching weeks, with roughly three contact hours per week split between class meetings and a supervised lab session. The pacing heuristic that makes the schedules work is one chapter of reading per week as the default, two chapters in weeks where one of them is light or serves as a refresher, and the part-closing "Tools of the Trade" chapters (Chapter 8, Chapter 17, Chapter 29, Chapter 38) assigned as reference reading rather than examinable content. Students consistently report that the chapters read fastest when paired with an open interpreter, so every week's lab reuses the code from that week's reading rather than introducing a parallel codebase.
Each weekly row in Tables C.1 to C.3 lists three things: the chapters to read before the week's meetings, the lab (always derived from the exercises and worked examples inside those chapters), and the deliverable that gets graded. Labs are designed for two to three hours of supervised work plus a comparable amount of homework polish. Deliverables alternate between short notebook submissions and report-style writeups so that students practice both forms of communication.
The single highest-leverage scheduling decision is to make the labs come from the book's own exercises instead of a separate assignment bank. When the lab is the chapter's worked example extended by its exercises, the reading becomes preparation for something students must do, not something they may skim. The lab schedules below (Tables C.1 to C.3) apply this rule without exception: there is no lab that cannot be started by re-running a code block from the assigned chapters.
Hardware expectations differ by track. Track 1 runs entirely on student laptops: NumPy, OpenCV, and scikit-image need no GPU. Track 2 needs modest GPU access (a free hosted notebook tier is sufficient for every lab through week 9; weeks 10 and 11 benefit from a dedicated GPU). Track 3 assumes reliable GPU access from week 4 onward, since training even small VAEs, GANs, and diffusion models on CPU turns afternoon labs into overnight jobs. All three tracks end in a final project, and Section 8 explains how the capstone scales across them.
2. Track 1: Image Processing and Classical Computer Vision (Undergraduate)
This track covers Parts I and II (Chapter 0 through Chapter 17) in one semester. It is the gateway course: the only prerequisites are basic Python and first-year linear algebra, matching the book's own entry bar. The arc moves from pixels and arrays through filtering, frequency, and geometry into the classical vision canon of features, multi-view geometry, motion, and pre-deep-learning recognition. The closing week on Chapter 16 is deliberate: students leave understanding not just how classical pipelines work but why they plateaued, which sets up a sequel course built on Track 2. Table C.1 gives the full schedule.
| Week | Chapters | Lab | Deliverable |
|---|---|---|---|
| 1 | Ch 0, Ch 1 | Build the load-process-measure-save pipeline of Section 0.5 on your own photographs, then break it deliberately with BGR/RGB and dtype mistakes and document the symptoms. | Lab 1 notebook: pipeline plus bug postmortem |
| 2 | Ch 2 | Histogram equalization, CLAHE, and Otsu thresholding on low-contrast images; quantify each fix with histogram statistics. | Lab 2 report: before/after metrics |
| 3 | Ch 3 | Implement 2D convolution from scratch, verify it against cv2.filter2D, then unsharp-mask a blurry scan. | Lab 3 notebook: naive vs separable timing study |
| 4 | Ch 4 | Remove periodic noise with an FFT notch filter; blend two images seamlessly with Laplacian pyramids. | Lab 4; Quiz 1 (Ch 0 to 3) |
| 5 | Ch 5 | The document scanner of Section 5.6: corner detection, homography estimation, warping, and binarization on phone photos. | Lab 5: working scanner demo |
| 6 | Ch 6 | Count and measure objects (coins, cells, machine parts) with thresholding, morphology, connected components, and region properties. | Lab 6; one-page project proposal |
| 7 | Ch 7, Ch 8 (reference) | Denoising bake-off: Gaussian vs median vs bilateral vs non-local means, scored with PSNR and SSIM; Wiener deconvolution of a motion-blurred frame. | Midterm exam (Ch 0 to 7) |
| 8 | Ch 9 | Canny plus Hough lane-marking detection following the worked example of Section 9.5. | Lab 7: lane detector on dashcam clips |
| 9 | Ch 10 | ORB and SIFT keypoint matching with the ratio test, then RANSAC to reject outliers on your own image pairs. | Lab 8; project checkpoint 1: data and baseline |
| 10 | Ch 11, Ch 12 | GrabCut interactive segmentation; calibrate your own camera with a printed checkerboard using Zhang's method. | Lab 9: calibration report with reprojection error |
| 11 | Ch 13 | Panorama stitching end to end: matching, homography estimation, warping, and multi-band blending. | Lab 10; Quiz 2 (Ch 9 to 12) |
| 12 | Ch 14, Ch 15 | Reconstruct a small scene with COLMAP; track feature points through video with Lucas-Kanade. | Lab 11; project checkpoint 2: working pipeline |
| 13 | Ch 16, Ch 17 (reference) | HOG plus SVM pedestrian detection; class discussion of why hand-crafted pipelines plateaued. | Final project demo and report |
Weeks 10 and 12 of Table C.1 each carry two chapters, and both pairings are intentional: Chapter 11 is algorithmically self-contained and pairs well with the hands-on calibration of Chapter 12, while Chapter 14 is taught at survey depth because COLMAP does the heavy lifting in lab. If your semester has a fourteenth week, give it to Chapter 14 and run the COLMAP lab at full depth; it is the one students most often ask to revisit.
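A minimal starter for the week 3 lab might look like the sketch below. It is an illustration under stated assumptions, not code from Chapter 3: it uses only NumPy, computes cross-correlation over the "valid" region (no padding), and uses a Gaussian kernel because that kernel factors into a column filter times a row filter. The helper names are hypothetical. When students move on to verifying against `cv2.filter2D`, note that, per the OpenCV documentation, that function also computes correlation rather than flipped convolution and extrapolates the borders by default, so only the interior region should be compared.

```python
import time
import numpy as np

def gaussian_1d(size=7, sigma=1.5):
    """Normalized 1D Gaussian; its outer product with itself is a separable 2D kernel."""
    x = np.arange(size) - size // 2
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def correlate2d_naive(image, kernel):
    """Direct 2D cross-correlation over the 'valid' region: O(kh * kw) work per pixel."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def correlate2d_separable(image, col, row):
    """Same result for kernel = np.outer(col, row): O(kh + kw) work per pixel."""
    kh, kw = len(col), len(row)
    # Horizontal pass: apply the row filter along every image row.
    tmp = np.zeros((image.shape[0], image.shape[1] - kw + 1))
    for j in range(tmp.shape[1]):
        tmp[:, j] = image[:, j:j + kw] @ row
    # Vertical pass: apply the column filter down every column of the intermediate.
    out = np.zeros((tmp.shape[0] - kh + 1, tmp.shape[1]))
    for i in range(out.shape[0]):
        out[i, :] = col @ tmp[i:i + kh, :]
    return out

rng = np.random.default_rng(0)
image = rng.random((128, 128))  # stand-in for a grayscale scan
g = gaussian_1d()
kernel = np.outer(g, g)

t0 = time.perf_counter()
a = correlate2d_naive(image, kernel)
t1 = time.perf_counter()
b = correlate2d_separable(image, g, g)
t2 = time.perf_counter()
print("outputs match:", np.allclose(a, b))
print(f"naive {t1 - t0:.4f}s, separable {t2 - t1:.4f}s")
```

The timing study in the deliverable then amounts to repeating the last block over several kernel sizes and plotting both curves.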
3. Track 2: Deep Learning for Computer Vision (Upper-Level)
This track covers Part III (Chapter 18 through Chapter 29) with targeted refreshers from Parts I and II in week 1. It suits students who have either taken Track 1 or absorbed equivalent material elsewhere; the only hard prerequisites are comfort with NumPy arrays (Chapter 0) and the convolution vocabulary of Chapter 3, both of which the first week refreshes explicitly. The semester builds one competence per week: training loops, CNNs, architectures, recipes, transformers, detection, segmentation, self-supervision, video, 3D, and deployment, in that order, so each lab strictly reuses the machinery of the previous ones. Table C.2 gives the schedule.
| Week | Chapters | Lab | Deliverable |
|---|---|---|---|
| 1 | Ch 18; refreshers: Ch 0, Ch 3 | PyTorch bootcamp: tensors, autograd, and a complete training loop on FashionMNIST written by hand, no high-level trainer allowed. | Lab 1: training loop from scratch |
| 2 | Ch 19 | The CIFAR-10 CNN of Section 19.5 end to end; visualize first-layer filters and feature maps. | Lab 2: accuracy target plus training curves |
| 3 | Ch 20 | Architecture bake-off with timm: ResNet-18 vs MobileNetV3 vs EfficientNet on one dataset, plotted as accuracy vs FLOPs and parameters. | Lab 3: comparison table with analysis |
| 4 | Ch 21 | Augmentation ablation (flips, RandAugment, MixUp) and a transfer-learning fine-tune on a small custom dataset. | Lab 4; one-page project proposal |
| 5 | Ch 22 | Fine-tune a ViT (DeiT-style recipe) and compare against the week 4 CNN at an equal compute budget. | Lab 5; Quiz 1 (Ch 18 to 21) |
| 6 | Ch 23 | Compute IoU and mAP by hand on toy boxes, then train a YOLO detector on a custom dataset following Section 23.6. | Lab 6: trained detector with error analysis |
| 7 | Ch 24 | Train a U-Net for semantic segmentation; prompt SAM on the same images and score both against ground truth. | Lab 7; project checkpoint 1: data card and baseline |
| 8 | Ch 25 | CLIP zero-shot classification; linear probe on DINOv2 features vs your supervised baseline from week 4. | Midterm exam (Ch 18 to 24) |
| 9 | Ch 26 | RAFT optical flow on your own clips, then a simple multi-object tracker: detector plus Kalman filter plus association. | Lab 8: tracker demo on street video |
| 10 | Ch 27 | Monocular depth with a pretrained network; capture and train a NeRF or 3D Gaussian splat of a small scene with nerfstudio. | Lab 9; project checkpoint 2: trained model and evaluation |
| 11 | Ch 28 | Quantize your project model, export it to ONNX, and benchmark latency and accuracy on CPU vs GPU. | Lab 10: deployment report, latency vs accuracy |
| 12 | Ch 29 (reference); project studio | Curate and debug the project dataset with FiftyOne; wire experiment tracking into the project training runs. | Project checkpoint 3: ablation and final evaluation plan |
| 13 | Presentations | Studio time: dry-run talks with peer feedback. | Final project presentation and report |
Two design choices in Table C.2 deserve a note. First, the midterm lands in week 8 rather than week 7 so that segmentation, the most technically dense chapter of the first half, is examined while detection is still fresh. Second, week 12 deliberately assigns no new examinable reading: in every trial run of this track, the projects that received a dedicated studio week before presentations were the ones that shipped working evaluations instead of half-finished training scripts.
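The week 6 "IoU and mAP by hand" exercise can be seeded with a sketch like the one below. It is an assumption-laden illustration rather than the code of Section 23.6: boxes are (x1, y1, x2, y2) corners, each detection is matched greedily in descending score order to its highest-IoU ground-truth box at a threshold of 0.5, and AP uses PASCAL VOC-style all-point interpolation. The function names and toy boxes are hypothetical, and the ground-truth list is assumed to be non-empty.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thresh=0.5):
    """AP for one class; detections are (score, box) pairs, ground_truth is a list of boxes."""
    dets = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = [False] * len(ground_truth)
    tp, fp = [], []
    for _, box in dets:
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best >= iou_thresh and not matched[best_j]:
            matched[best_j] = True  # each ground-truth box can be claimed once
            tp.append(1); fp.append(0)
        else:
            tp.append(0); fp.append(1)  # no overlap, low overlap, or a duplicate
    # Cumulative precision and recall after each ranked detection.
    precisions, recalls, ctp, cfp = [], [], 0, 0
    for t, f in zip(tp, fp):
        ctp, cfp = ctp + t, cfp + f
        precisions.append(ctp / (ctp + cfp))
        recalls.append(ctp / len(ground_truth))
    # All-point interpolation: make precision non-increasing, then sum rectangle areas.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))

gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 30, 30))]
ap = average_precision(dets, gt_boxes)
print(f"IoU of overlapping squares: {iou((0, 0, 2, 2), (1, 1, 3, 3)):.4f}")
print(f"AP@0.5: {ap:.4f}")
```

mAP is then the mean of this AP over classes. COCO-style evaluation additionally averages over IoU thresholds from 0.5 to 0.95 and interpolates the precision-recall curve differently, so students should state which convention their reported numbers follow.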
4. Track 3: Generative Vision Models (Graduate)
This track covers Part IV (Chapter 30 through Chapter 38) for graduate students who may arrive from heterogeneous backgrounds: some from a vision course, some from NLP, some from outside machine learning entirely. The first two weeks are therefore a compressed prerequisite sprint through the parts of Part III that Part IV leans on: training mechanics (Chapter 18, Chapter 19), recipes (Chapter 21), transformers (Chapter 22), and CLIP-style representation learning (Chapter 25), since text conditioning in Chapter 34 is unintelligible without it. Diffusion gets two full weeks; it is the center of gravity of the modern generative stack and the chapter students most need to internalize rather than skim. As befits a graduate course, three paper-reading responses tie the book's material to the primary literature. Table C.3 gives the schedule.
| Week | Chapters | Lab | Deliverable |
|---|---|---|---|
| 1 | Prerequisite sprint: Ch 18, Ch 19 | Training-loop bootcamp: fine-tune a small CNN; verify GPU access, mixed precision, and checkpointing on the course cluster. | Lab 1; diagnostic quiz (probability, linear algebra, PyTorch) |
| 2 | Prerequisite sprint: Ch 21, Ch 22, Ch 25 | Fine-tune a ViT; extract CLIP embeddings and build a small image-text retrieval demo. | Lab 2: retrieval demo |
| 3 | Ch 30 | Fit tiny generative models to 2D toy distributions; map each onto the quality-diversity-speed trilemma of Section 30.4. | Lab 3; paper-reading response 1 |
| 4 | Ch 31 | Train an autoencoder and a VAE; latent traversals and a beta sweep over the reconstruction-vs-KL trade-off. | Lab 4: latent-space study |
| 5 | Ch 32 | Train a DCGAN, deliberately induce and diagnose mode collapse, then edit real faces via StyleGAN inversion. | Lab 5; project proposal with compute budget |
| 6 | Ch 33 (Sections 33.1 to 33.3) | DDPM from scratch at MNIST scale: forward process, denoiser training, and the full sampling loop. | Lab 6; paper-reading response 2 |
| 7 | Ch 33 (Sections 33.4 to 33.6) | Sampler comparison (DDIM and modern solvers) and a classifier-free guidance scale sweep with diffusers. | Lab 7: sampler and guidance study |
| 8 | Ch 34 | Dissect Stable Diffusion component by component (text encoder, VAE, U-Net or DiT backbone); run a structured prompt-engineering study. | Take-home midterm; project checkpoint 1 |
| 9 | Ch 35 (Sections 35.1 to 35.3) | ControlNet spatial conditioning plus LoRA personalization on a subject of your choice. | Lab 8: personalization gallery with failure cases |
| 10 | Ch 35 (Sections 35.4 to 35.6) | Real-image inversion and instruction-based editing; compose a multi-step editing workflow that preserves identity. | Lab 9; project checkpoint 2: working prototype |
| 11 | Ch 36 | Run a video-diffusion or image-to-3D pipeline; analyze temporal-consistency failures frame by frame. | Paper-reading response 3 |
| 12 | Ch 37 | Build an evaluation harness: FID, KID, precision-recall, and CLIPScore on project outputs; verify provenance metadata on generated files. | Lab 10: evaluation report; project checkpoint 3 |
| 13 | Ch 38 (reference); presentations | Studio time: final experiments and dry-run talks. | Final presentation and paper-style report |
Week 11 of Table C.3 is the fastest-moving slot in any of the three lab-driven tracks. Chapter 36 teaches the stable concepts (temporal attention, 3D-aware generation, world models as interactive generators), but the flagship systems change yearly: the 2024 to 2026 window alone moved from Sora-class text-to-video previews through open video-diffusion ecosystems to playable world models such as Genie-style interactive environments. A practical pattern is to keep the lab fixed (run one video pipeline, analyze its temporal failures) while swapping in whichever open-weights system is current that semester, and to assign the week's paper-reading response on a paper published within the previous twelve months.
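For the week 6 DDPM lab, the forward process has a closed form, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), where ᾱ_t is the running product of α_t = 1 − β_t, so students can noise an image to any step in one shot instead of looping. The sketch below assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used in the original DDPM paper; the random 28×28 array is a hypothetical stand-in for an MNIST digit scaled to [−1, 1], and `q_sample` is an illustrative name rather than the book's API.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)     # alpha_bar_t = product of alpha_s for s <= t

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; t is a 0-indexed step."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise  # the denoiser is trained to predict `noise` from (xt, t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(28, 28))
for t in (0, 250, 500, 999):
    xt, _ = q_sample(x0, t, rng)
    # Correlation with the clean image decays as the signal coefficient shrinks.
    corr = np.corrcoef(x0.ravel(), xt.ravel())[0, 1]
    print(f"t={t:4d}  alpha_bar={alpha_bars[t]:.5f}  corr(x0, xt)={corr:.3f}")
```

By the last step ᾱ_t is close to zero, which is why sampling can start from pure Gaussian noise.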
5. Track 4: Generative AI: From Variational Autoencoders to World Models (Graduate)
This track is the syllabus for "Generative AI: From Variational Autoencoders to World Models", a thirteen-week graduate course taught by Dr. Alexander Apartsin at the Holon Institute of Technology (HIT). Where Track 3 surveys the breadth of Part IV, this track follows a probability-first spine: it builds the field forward from the latent-variable view, deriving the evidence lower bound, score matching, the diffusion variational bound, and flow-matching objectives in sequence, then carries those tools all the way into world models, the generative simulators that learn to roll out an environment's dynamics. It draws on chapters that include self-contained derivations, PyTorch labs, and graduate-level exercises, so the book can serve as the course's primary text. Table C.5 maps the thirteen weeks to the exact sections that support them.
The course has historically been taught against two excellent external references: Simon Prince's Understanding Deep Learning and Kevin Murphy's Probabilistic Machine Learning: Advanced Topics. Because this book develops each result from first principles, it can stand as the primary text, with Prince and Murphy as supplementary rather than required reading: Prince for additional geometric intuition on the diffusion and flow material, Murphy for the broader probabilistic-modeling context around variational inference and energy-based models.
| Week | Topic | Book Sections |
|---|---|---|
| 1 | Foundations of deep generative modeling: the landscape, divergences, evaluation | 30.1, 30.2, 30.5, 30.6 (Unified View), 37.1 (FID, precision-recall) |
| 2 | VAEs I: latent-variable models, two ELBO derivations, reparameterization, amortization, rate-distortion | 31.1, 31.2, 31.3 |
| 3 | VAEs II: posterior collapse, beta-VAE, hierarchical NVAE, VQ-VAE discrete latents | 31.4, 31.5, 31.6 |
| 4 | Energy-based models and score matching: EBMs, score and denoising score matching, Langevin, NCSN and annealed Langevin | 30.4 |
| 5 | Project proposals (no reading): scope projects against the dataset catalog | Studio; reference: Appendix B (Datasets and Benchmarks) |
| 6 | DDPM: forward and reverse processes, the variational bound, schedules, parameterizations, DDIM | 33.1, 33.2, 33.4 |
| 7 | Score SDEs, samplers, guidance: VE and VP SDEs, reverse-time SDE, probability-flow ODE, predictor-corrector, EDM, classifier and classifier-free guidance, inverse problems | 33.3, 33.4, 33.6 |
| 8 | Interim presentations (no reading) | Studio |
| 9 | Diffusion at scale: latent diffusion, DiT, flow matching, rectified flow, consistency models, text-to-image | 33.5, 33.7, 34.1, 34.2 |
| 10 | World models I: latent dynamics, RSSM, PlaNet, the Dreamer line, TD-MPC2 | 36.5 |
| 11 | World models II: generative world simulators (GAIA-1, Genie, UniSim, DIAMOND, Sora, Cosmos) | 36.6 |
| 12 | World models III: predictive embeddings (JEPA), evaluation, open problems | 36.7, 36.8 |
| 13 | Final presentations (no reading) | Studio |
Each section listed in Table C.5 contains from-scratch derivations, runnable PyTorch labs, and end-of-section exercises pitched at the level of graduate assignments, so an instructor can build problem sets and lab sheets directly from the assigned reading without authoring parallel material.
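As an example of the level of the weeks 2 and 3 material, the KL term of the ELBO for a diagonal-Gaussian encoder and a standard-normal prior has the closed form KL = ½ Σ_j (μ_j² + σ_j² − log σ_j² − 1). The sketch below is a hypothetical lab starter, not code from Chapter 31: it implements that term, the reparameterized sample, and a β-weighted loss whose reconstruction term is squared error (a fixed-variance Gaussian decoder, up to constants), then checks the closed form against a Monte Carlo estimate and runs a tiny β sweep.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=-1)

def reparameterize(mu, logvar, rng, n_samples=1):
    """z = mu + sigma * eps with eps ~ N(0, I).
    In an autodiff framework this keeps z differentiable with respect to mu and logvar."""
    eps = rng.standard_normal((n_samples,) + mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Negative ELBO with the KL term scaled by beta (beta = 1 is the plain VAE)."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # fixed-variance Gaussian decoder, up to constants
    return np.mean(recon + beta * gaussian_kl(mu, logvar))

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
logvar = np.log(np.array([0.5, 2.0]))
kl_closed = gaussian_kl(mu, logvar)

# Monte Carlo check: KL = E_q[log q(z) - log p(z)]; the log(2*pi) terms cancel.
z = reparameterize(mu, logvar, rng, n_samples=200_000)
log_q = -0.5 * np.sum(logvar + (z - mu) ** 2 / np.exp(logvar), axis=-1)
log_p = -0.5 * np.sum(z**2, axis=-1)
kl_mc = np.mean(log_q - log_p)
print(f"closed form {kl_closed:.4f}, Monte Carlo {kl_mc:.4f}")

# Tiny beta sweep on a hypothetical batch: larger beta weights the KL term more heavily.
x = rng.random((4, 3))
x_recon = x + 0.1
mu_b = rng.standard_normal((4, 2))
logvar_b = np.zeros((4, 2))
losses = [beta_vae_loss(x, x_recon, mu_b, logvar_b, beta=b) for b in (0.5, 1.0, 4.0)]
print([round(float(v), 3) for v in losses])
```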
This track presumes the deep-learning background it builds on: convolutional networks, transformers, and the optimization and training mechanics behind them. That material lives in Part III (especially Chapter 18 on PyTorch and training loops, Chapter 19 on CNNs, Chapter 21 on training recipes, and Chapter 22 on transformers) and in the mathematical refresher of Appendix A. Students arriving without it should treat those as a pre-course reading assignment.
The course is organized around seven learning outcomes. Table C.6 maps each outcome to the sections where the book develops it, so the assessment plan can be tied directly to the reading rather than asserted independently of it.
| Learning Outcome | Where Developed |
|---|---|
| Derive the core training objectives of generative models: the ELBO, the score-matching objective, the DDPM variational bound, and flow-matching losses | 31.3, 30.4, 33.2, 33.5 |
| Build and train variational autoencoders and reason about latent structure, posterior collapse, and discrete latents | 31.1, 31.2, 31.4, 31.5, 31.6 |
| Connect energy-based models, score matching, and the diffusion SDE and ODE formulations into one continuous-time view | 30.4, 33.3, 30.6 |
| Implement, sample from, and control diffusion models, including DDIM, modern solvers, and classifier-free guidance | 33.1, 33.4, 33.6 |
| Scale generative models with latent diffusion, transformer backbones, flow matching, and text conditioning | 33.5, 33.7, 34.1, 34.2 |
| Build and evaluate world models, from latent-dynamics agents to generative world simulators and predictive embeddings | 36.5, 36.6, 36.7, 36.8 |
| Evaluate generative models credibly with distributional and sample-quality metrics, avoiding common measurement pitfalls | Chapter 37 (starting with 37.1) |
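The week 4 link between scores and sampling can be made concrete before any network is trained: if the score ∇_x log p(x) is known, unadjusted Langevin dynamics, x ← x + (ε/2) ∇_x log p(x) + √ε z with z ~ N(0, I), draws approximate samples from p. The sketch below is a hypothetical lab starter that uses a 1D Gaussian target whose score, −(x − m)/s², is available in closed form. The step size and step count are assumptions chosen so the chains mix; the small discretization bias of the unadjusted sampler shows up as a slightly inflated standard deviation. In the lab, a learned score network would replace the analytic score.

```python
import numpy as np

def gaussian_score(x, m=3.0, s=2.0):
    """Score of N(m, s^2): d/dx log p(x) = -(x - m) / s^2."""
    return -(x - m) / s**2

def langevin_sample(score_fn, n_chains=10_000, n_steps=2_000, step=0.05, seed=0):
    """Unadjusted Langevin dynamics on many independent chains in parallel."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)  # start from N(0, 1), away from the target mean
    for _ in range(n_steps):
        x = x + 0.5 * step * score_fn(x) + np.sqrt(step) * rng.standard_normal(n_chains)
    return x

samples = langevin_sample(gaussian_score)
print(f"mean {samples.mean():.3f} (target 3.0), std {samples.std():.3f} (target 2.0)")
```

Annealed Langevin, as in NCSN, repeats this loop over a decreasing sequence of noise levels, each with its own score.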
6. Track 5: Building Vision AI with Foundation and Generative Models (Undergraduate or Graduate)
This track is a thirteen-week, code-first survey of modern vision AI that runs across engineering, digital-health, and computer-science cohorts and can be taught to undergraduate or graduate students alike. Where the earlier tracks march through one part of the book in order, this one cuts a fast diagonal: it opens with the image-processing fundamentals every practitioner needs, climbs straight into CNNs, detection, and segmentation, then spends its second half on the foundation and generative models (ViT and DINOv2, CLIP and BLIP-2, GANs, Stable Diffusion, ControlNet) that define the current toolset. It is built around production libraries and models (PyTorch, OpenCV, Hugging Face Diffusers and Transformers, Stable Diffusion, YOLO, SAM, and CLIP) and a project that runs the length of the course. Because the book is self-contained, every topic first develops the concept from scratch and then shows the library one-liner that does it in practice, so this book can be the course's sole textbook with no external reference required. Table C.7 maps the thirteen weeks to the exact sections that support them.
Every section listed in Table C.7 pairs a from-scratch explanation with the production library that ships it (PyTorch, OpenCV, Diffusers, Transformers, YOLO, SAM, or CLIP), and each one carries end-of-section exercises plus a runnable lab. That pairing is what makes the track project-based and code-first without an external assignment bank: students read the derivation, run the library call on the same page, and extend it in lab the same week.
| Week | Topic | Book Sections |
|---|---|---|
| 1 | Image processing fundamentals: OpenCV and NumPy, point operations and histograms, spatial filtering, the frequency domain, and edges | Chapter 0, Chapter 2, Chapter 3, Chapter 4, 9.1, 9.2 (edges) |
| 2 | CNNs and image classification: neural networks and PyTorch, convolutional networks, ResNet-family architectures, training recipes and transfer learning | Chapter 18, Chapter 19, Chapter 20, Chapter 21 |
| 3 | Object detection: bounding boxes, YOLO, and the dataset, annotation, and experiment-tracking workflow (Roboflow-style) | Chapter 23, Chapter 29 (datasets, annotation, experiment tracking) |
| 4 | Semantic segmentation: U-Net, masks, and the Segment Anything Model with open-vocabulary detection and segmentation | Chapter 24, 25.5 (open-vocabulary detection and segmentation, SAM) |
| 5 | Project proposal (no new reading): scope the project, choose datasets and baselines, fix the evaluation metrics | Studio; reference: Appendix B (Datasets and Benchmarks), 37.1 (evaluation metrics) |
| 6 | Vision transformers: ViT, attention for images, and self-supervised foundation models (DINOv2 and DINOv3) | Chapter 22, 25.1, 25.2, 25.3, 25.6 |
| 7 | Multimodal models and CLIP: contrastive vision-language pretraining, BLIP-2, and visual question answering | 25.4 (CLIP), 25.6 (generative vision-language models and visual question answering) |
| 8 | Interim presentations (no reading) | Studio |
| 9 | Generative models, GANs: adversarial training, StyleGAN, and image-to-image translation | Chapter 32 |
| 10 | Diffusion models: the denoising process, text-to-image, and Stable Diffusion with Hugging Face Diffusers and ComfyUI | Chapter 33, Chapter 34, Chapter 38 (Diffusers, ComfyUI) |
| 11 | Image editing and ControlNet: inpainting, ControlNet spatial conditioning, and img2img | Chapter 35 |
| 12 | Video understanding and 3D vision: optical flow, video models, monocular depth, and NeRF | Chapter 15 (motion and optical flow), Chapter 26, Chapter 27 |
| 13 | Final presentations (no reading) | Studio |
The track's project component, in which teams generate their own unique data and fine-tune task-specific models, is supported directly by the book's infrastructure chapters. Appendix B supplies the dataset and benchmark catalog for sourcing and licensing data, and the four "Tools of the Trade" chapters carry the workflow: Chapter 8 for the imaging stack, Chapter 17 for classical pipelines, Chapter 29 for dataset curation, annotation, and experiment tracking, and Chapter 38 for the generative-model tooling (Diffusers and ComfyUI) used to synthesize task-specific training data.
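For the week 7 lab, CLIP zero-shot classification reduces to a softmax over scaled cosine similarities between one image embedding and one text embedding per class prompt (for example "a photo of a {label}"). The sketch below assumes the embeddings have already been computed; in the lab they would come from a CLIP model (for instance `get_image_features` and `get_text_features` on Hugging Face Transformers' `CLIPModel`), whereas here they are hypothetical random vectors. The `logit_scale` of 100 is an assumption meant to approximate CLIP's learned temperature.

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Class probabilities from cosine similarity between one image and K class prompts."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (txt @ img)  # shape (K,)
    logits = logits - logits.max()      # numerical stability before exponentiating
    probs = np.exp(logits)
    return probs / probs.sum()

rng = np.random.default_rng(0)
labels = ["cat", "dog", "car"]                              # hypothetical class names
text_embs = rng.standard_normal((3, 512))                   # stand-ins for prompt embeddings
image_emb = text_embs[1] + 0.5 * rng.standard_normal(512)   # an "image" near the "dog" prompt
probs = zero_shot_probs(image_emb, text_embs)
print(labels[int(np.argmax(probs))], probs.round(3))
```

Swapping the random vectors for real CLIP embeddings leaves the function unchanged, which is the point of the exercise: the "classifier" is nothing more than the list of prompts.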
7. Grading Schemes That Match the Material
A grading scheme is a statement about what the course values, and the tracks value different things. Track 1 is skills-dense and lab-driven, so labs dominate. Track 2 balances labs against a substantial project, because training and evaluating a real model is the competence the course exists to certify. Track 3 is project-heavy with a literature component, as graduate courses should be. Track 4 follows the same graduate grading philosophy as Track 3, with a project and presentations carrying most of the weight. Table C.4 summarizes the recommended weights for the three lab-driven tracks; each column sums to 100 percent.
| Component | Track 1 (UG) | Track 2 (Upper-Level) | Track 3 (Graduate) |
|---|---|---|---|
| Labs (best n minus 1 of n) | 40% | 35% | 30% |
| Quizzes | 10% | 10% | 0% |
| Paper-reading responses | 0% | 0% | 10% |
| Midterm exam | 20% | 15% | 15% (take-home) |
| Final project | 25% | 40% | 45% |
| Participation | 5% | 0% | 0% |
Three policies have proven worth adopting alongside the weights in Table C.4. First, drop each student's single lowest lab score ("best n minus 1"); it removes the bad-week incentive to submit plagiarized work. Second, split the project grade explicitly across milestones (Section 8) rather than awarding it all at the end, so that procrastination is priced in early and cheaply. Third, publish the weights as executable code on day one. The snippet below is the entire policy for Track 2, and handing it to students as code removes a whole category of end-of-semester disputes.
```python
# Track 2 weights from Table C.4 (participation is 0 percent, so it is omitted).
WEIGHTS = {"labs": 0.35, "quizzes": 0.10, "midterm": 0.15, "project": 0.40}
scores = {"labs": 90.0, "quizzes": 80.0, "midterm": 80.0, "project": 88.0}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to one
final = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
print(f"Final grade: {final:.1f}")  # Final grade: 86.7
```
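The "best n minus 1" lab policy can be published the same way. The helper below is a hypothetical sketch, not part of the Track 2 snippet: it drops the single lowest lab score and averages the rest, producing the number that would go into the `"labs"` entry of the scores dictionary.

```python
def best_n_minus_1(lab_scores):
    """Mean lab score after dropping the single lowest score."""
    if len(lab_scores) < 2:
        raise ValueError("need at least two lab scores to drop one")
    kept = sorted(lab_scores)[1:]  # ascending order, so index 0 is the lowest score
    return sum(kept) / len(kept)

labs = [92.0, 40.0, 88.0, 95.0, 85.0]  # hypothetical scores with one bad week
print(f"Lab average: {best_n_minus_1(labs):.1f}")  # Lab average: 90.0
```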
Grading schemes obey a conservation law suspiciously similar to the quality-diversity-speed trilemma of Chapter 30: you can have rigorous exams, ambitious projects, or a manageable grading workload, and the literature contains no confirmed sighting of all three in one semester.
8. Project Milestones and the Capstone as Final Project
All three tracks share the same milestone arc, visible in the deliverable columns of Tables C.1 to C.3: a one-page proposal around week 4 to 6, a checkpoint with data and a baseline around week 7 to 9, a working-prototype checkpoint around week 10 to 12, and a final demonstration with a written report in week 13. Tracks 2 and 3 add a third, evaluation-focused checkpoint in week 12. Within the project grade, the milestone weights are roughly 10 percent for the proposal, 40 percent shared across the checkpoints (20 percent each when there are two), and 50 percent for the final deliverable. The grading question at every checkpoint is deliberately narrow: not "is this impressive?" but "does the pipeline run end to end on real data today?" A project that runs badly at checkpoint 1 nearly always finishes; a project that promises to run brilliantly later often does not.
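The milestone split can be published as code in the same spirit as the Track 2 weights. This is a hypothetical sketch assuming the four-milestone version of the arc (one proposal, two checkpoints, one final deliverable); the milestone names are illustrative, a track with a third checkpoint needs its own weights dictionary, and the assertion rejects any dictionary that does not sum to one.

```python
# Hypothetical milestone names; 10 percent proposal, 20 percent per checkpoint, 50 percent final.
MILESTONES = {"proposal": 0.10, "checkpoint_1": 0.20, "checkpoint_2": 0.20, "final": 0.50}

def project_grade(milestone_scores, weights=MILESTONES):
    """Weighted project grade; rejects weights that do not sum to one and missing milestones."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to one"
    missing = set(weights) - set(milestone_scores)
    if missing:
        raise ValueError(f"missing milestone scores: {sorted(missing)}")
    return sum(weights[k] * milestone_scores[k] for k in weights)

scores = {"proposal": 100.0, "checkpoint_1": 70.0, "checkpoint_2": 85.0, "final": 90.0}
print(f"Project grade: {project_grade(scores):.1f}")  # Project grade: 86.0
```

Because a missing checkpoint raises an error instead of silently counting as zero, the script also doubles as a reminder of which milestones are still outstanding.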
The natural final project for any of these tracks is the book's capstone: an end-to-end vision system spanning classical preprocessing and geometry, a fine-tuned detector or segmenter, a generative synthetic-data engine, and honest evaluation with deployment. The full capstone is intentionally larger than one semester's project slot, so each track adopts the slice that matches its material:
- Track 1 assigns the classical slice: the acquisition, preprocessing, calibration, and geometric stages of the capstone, ending in a measured, reproducible classical pipeline (for example a measurement or inspection system built entirely from Parts I and II machinery).
- Track 2 assigns the learned-perception slice: fine-tune the capstone's detector or segmenter on a custom dataset, evaluate it honestly with the metrics of Chapter 23 and Chapter 24, and ship it through the export and benchmarking workflow of Chapter 28.
- Track 3 assigns the generative slice: build the synthetic-data engine of Chapter 37 around a controllable generator from Chapter 35, then demonstrate with a rigorous evaluation harness whether the synthetic data improves a downstream perception model.
For a two-semester sequence (Track 1 followed by Track 2, or Track 2 followed by Track 3), the full capstone works as a year-long project: the first semester's final deliverable becomes the second semester's checkpoint 1, and students experience the rare pedagogical pleasure of building on their own prior work instead of starting over.
Who: An instructor running Track 2 with 42 students in 14 project teams.
Situation: In a previous run with a single end-of-semester deadline, five teams had arrived in week 13 with elaborate slide decks and no working model.
Problem: The failure was invisible until it was unfixable: nothing in the course structure forced a running pipeline before the final week.
Decision: The instructor adopted the milestone arc above and graded checkpoint 1 on exactly one criterion: a script that downloads the data, trains the baseline, and prints a metric, executed live during lab.
Result: In the milestone-based run, every team had a working baseline by week 7; the weakest final project still demonstrated a functioning detector with an honest error analysis.
Lesson: Milestones do not make projects more ambitious; they make failure cheap and early, which is what actually raises the quality floor.
9. Adapting the Tracks
The thirteen-week grid compresses or stretches without redesign. For a ten-week quarter, remove the quiz weeks' second meetings, fold the Tools-of-the-Trade reference readings into adjacent weeks, and cut one chapter per track: Track 1 drops Chapter 14 (already taught only at survey depth), Track 2 drops Chapter 26, and Track 3 compresses the prerequisite sprint into a single week with a mandatory pre-quarter reading assignment. For a fifteen-week semester, give the two extra weeks to Chapter 13 and project studio time in Track 1, to Chapter 25 and deployment in Track 2, and to a second week of Chapter 34 in Track 3.
Self-study learners can run any track solo by treating the lab column as the weekly contract and the deliverable column as a public commitment (a blog post, a repository tag, a short demo video). The reading-path guides in Appendix D complement these schedules for readers who want a goal-directed path through the book without the semester structure, and the dataset catalog in Appendix B supplies drop-in replacements for any lab dataset that does not fit a reader's domain or license constraints.
Every lab in Tables C.1 to C.3 extends code from its assigned chapter, often from a named section, and that is the support contract: a student who is stuck should be pointed first to the chapter's worked example, then to the relevant exercises, and only then to TA-written hints. Keeping the labs anchored in the book's own code means office hours debug the student's understanding, not a third codebase nobody in the room has read.