"A photograph tells you where everything is. A video tells you where everything is going, provided you can keep up with thirty deadlines per second."
An Overcommitted Motion Vector
Video is not a stack of unrelated photographs; it is a measurement of motion, and this chapter builds the classical instruments that read it. Two questions organize everything. First, how does each pixel move from one frame to the next? That is optical flow, and it powers stabilization, frame interpolation, motion segmentation, and the feature tracks that fed Chapter 14's reconstructions. Second, how does an object move across hundreds of frames, through occlusions, lighting changes, and lookalike distractors? That is tracking, and it requires not just measurement but memory: a motion model, an uncertainty estimate, and a policy for deciding which detection belongs to which identity. Both questions return, learned end to end, in Chapter 26, and the state-estimation machinery built here grows into the world models of Chapter 36.
Chapter Overview
Every chapter so far has treated the image as a frozen instant. Chapter 14 came closest to motion: it moved a camera through a static world and recovered geometry from the parallax. This chapter inverts the arrangement. Now the world itself moves, and the question is no longer "where is the camera?" but "what is moving, where is it going, and which thing is it?" Those are genuinely different problems. A pixel that brightens might be an object arriving, a shadow leaving, or a cloud uncovering the sun, and nothing in a single frame can tell you which. Time is a new measurement axis, and like every measurement axis it comes with its own noise, its own aliasing, and its own beautiful, treacherous ambiguities.
The first half of the chapter is about pixel motion. Section 15.1 lays the foundations: the distinction between the motion field (the geometric truth) and optical flow (what brightness patterns appear to do), the brightness constancy assumption that links them, and the aperture problem that makes flow locally unknowable in one direction. Section 15.2 turns one unsolvable equation into a solvable system by assuming flow is constant over a small window: the Lucas-Kanade method, whose normal-equations matrix turns out to be exactly the structure tensor of Chapter 10. Corners, it emerges, are not just matchable; they are trackable, and the KLT tracker built on this insight has run for four decades. Section 15.3 goes dense: Horn-Schunck's global energy, the smoothness prior that fills in flow where data is silent, and the variational lineage that leads to the modern OpenCV workhorses.
The second half is about object motion. Section 15.4 exploits the easiest special case, a static camera, where modeling each pixel's history separates the boring background from the interesting foreground: frame differencing, running Gaussians, and the mixture-of-Gaussians models that survive waving trees and flickering lamps. Section 15.5 follows a single chosen object through the frame: mean-shift climbing a color-histogram likelihood, correlation filters matching templates at Fourier speed, and the drift-versus-adaptation dilemma that haunts every template update. Section 15.6 adds the missing ingredients for multi-object tracking: the Kalman filter, which predicts where each object will be and how sure to be about it, and data association, which decides via the Hungarian algorithm which detection belongs to which track. The SORT family assembled from these pieces still underpins production trackers today.
One theme runs through all six sections. Motion estimation is ill-posed: the data never fully determines the answer, so every method is an assumption about constancy plus a prior about smoothness, wrapped around the same least-squares core. Lucas-Kanade assumes local constancy; Horn-Schunck assumes global smoothness; background subtraction assumes temporal stationarity; the Kalman filter assumes dynamic linearity. When the assumption holds, the method works; when it breaks, the failure is diagnosable. Learning that diagnosis is the real skill this chapter teaches, and it transfers intact to the deep methods of Chapter 26, which replace the hand-chosen priors with learned ones but inherit every one of the underlying ambiguities.
Prerequisites
This chapter leans hard on image gradients and convolution from Chapter 3: Spatial Filtering & Convolution; the optical flow constraint is built from spatial and temporal derivatives. Pyramids from Chapter 4: The Frequency Domain & Multi-Scale Analysis reappear in pyramidal Lucas-Kanade and coarse-to-fine flow, and Chapter 4's convolution theorem explains why correlation-filter trackers run at hundreds of frames per second. Histograms and back-projection from Chapter 2: Point Operations, Histograms & Thresholding drive mean-shift tracking, and the morphological cleanup of Chapter 6: Morphology, Binary Images & Shape turns raw foreground masks into usable blobs. The structure tensor and Shi-Tomasi detector from Chapter 10: Keypoints, Descriptors & Matching are reused directly in Section 15.2. Familiarity with Chapter 14: Structure from Motion & Visual SLAM helps motivate feature tracks but is not required.
Chapter Roadmap
- 15.1 Motion Fields & the Brightness Constancy Assumption What motion looks like from inside an image: the motion field versus optical flow, the brightness constancy assumption, the optical flow constraint equation, and the aperture problem.
- 15.2 Sparse Flow: Lucas-Kanade & Feature Tracking Solving the unsolvable by assuming local constancy: least-squares flow in a window, the structure tensor as trackability test, pyramidal coarse-to-fine refinement, and the KLT tracker.
- 15.3 Dense Flow: Horn-Schunck to Variational Methods A flow vector for every pixel: the Horn-Schunck energy, smoothness priors that fill in untextured regions, robust penalties at motion boundaries, and the Farneback and DIS workhorses.
- 15.4 Background Subtraction & Change Detection Static cameras and per-pixel memory: frame differencing, running averages, mixture-of-Gaussians background models, shadow handling, and morphological mask cleanup into object blobs.
- 15.5 Object Tracking: Mean-Shift, Correlation Filters & Re-Detection Following one object through the frame: histogram back-projection and mean-shift, MOSSE and KCF correlation filters at Fourier speed, drift, and re-detection strategies.
- 15.6 Kalman Filters & Multi-Object Data Association Tracking as state estimation: the predict-update cycle, uncertainty propagation, gating, the Hungarian algorithm, and the SORT family of detection-based multi-object trackers.
What's Next?
With motion, the classical toolkit is nearly complete: Part II has found edges, keypoints, regions, geometry, and now trajectories. What it has not yet done is name things. Chapter 16: Classical Recognition Pipelines takes up recognition the pre-deep-learning way: bag-of-visual-words built from Chapter 10's descriptors, HOG templates, deformable part models, and the classifiers that powered a decade of detection. It is the chapter where classical vision reaches for semantics and discovers its limits, which is exactly the cliffhanger that Part III's neural networks resolve.
Bibliography & Further Reading
Foundational Papers
Recent Research (2020-2026)
Books
filterpy library is used in Section 15.6's library shortcut.