Chapter 17: Tools of the Trade: The Classical CV Stack

"Everyone hands me their correspondences swearing they are inliers. I keep a Huber loss for exactly this kind of optimism."
A Bundle Adjuster With Trust Issues

Chapter Overview

Part II turned the question "how do I improve this image?" into "what is in it, and where is it in space?" Chapter 10 taught us to find and match features, Chapter 12 to calibrate the cameras that saw them, Chapters 13 and 14 to triangulate them into three-dimensional structure, Chapter 15 to follow them through time, and Chapter 16 to assemble them into recognizers. Each chapter validated its theory with library calls and standard datasets, but never paused to survey the toolbox itself. This chapter is that pause: the workshop inventory at the end of Part II, the classical-vision counterpart of Chapter 8.

It is a reference, not a narrative, and it has four shelves. Section 17.1 maps the OpenCV modules that powered this part: features2d for detectors, descriptors, and matchers; calib3d for everything from chessboard calibration to semi-global stereo matching; and video for optical flow, trackers, and Kalman filtering. It also covers the packaging decisions (opencv-python versus opencv-contrib-python) that silently decide which of those functions you can import at all.
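One way to see which wheel is active is to probe the imported module for the functions you need. The sketch below is a minimal check, not an official diagnostic: the helper name `describe_opencv_build` is hypothetical, and it assumes that the `xfeatures2d` submodule is present only in contrib builds. It returns a plain dictionary and does not raise when OpenCV is missing.

```python
import importlib


def describe_opencv_build():
    """Report which OpenCV entry points the active environment exposes.

    Hypothetical helper for illustration; it only probes attributes and
    never runs an OpenCV algorithm.
    """
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        # No wheel installed, or the installed wheel cannot be loaded.
        return {"installed": False}

    return {
        "installed": True,
        "version": cv2.__version__,
        # Part II verbs expected in both the main and the contrib wheel.
        "has_orb": hasattr(cv2, "ORB_create"),
        "has_find_homography": hasattr(cv2, "findHomography"),
        "has_kalman_filter": hasattr(cv2, "KalmanFilter"),
        # Assumption: this submodule ships only with contrib builds.
        "has_contrib_xfeatures2d": hasattr(cv2, "xfeatures2d"),
    }


if __name__ == "__main__":
    for key, value in describe_opencv_build().items():
        print(f"{key}: {value}")
```

Both wheels install the same `cv2` package, so an environment should contain only one of them; running a check like this after installation shows which one you actually have.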

Section 17.2 moves from single function calls to whole-pipeline tools: COLMAP and its Python bindings, the global structure-from-motion (SfM) challenger GLOMAP, the OpenMVG and OpenMVS pair, AliceVision's Meshroom, the simultaneous localization and mapping (SLAM) systems descended from ORB-SLAM, and the optimization engines (Ceres, g2o, GTSAM) that all of them stand on. Section 17.3 gathers the experimental infrastructure: the datasets and benchmarks for matching, stereo, multi-view reconstruction, optical flow, tracking, and SLAM, together with the metrics (end-point error, D1, ATE, HOTA) and the tooling that computes them honestly. Section 17.4 closes with a curated reading map: the two books that anchor the field, one canonical paper per chapter of this part, the lecture series worth your evenings, and a sustainable workflow for staying current.
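Two of the metrics named above are short enough to sketch directly. The pure-Python example below computes mean end-point error for optical flow and a D1-style outlier rate for stereo, restricted to pixels that have ground truth. The function names are hypothetical, and the outlier rule (error above 3 pixels and above 5% of the true disparity) is an assumption modeled on the KITTI 2015 convention; for reported numbers, use each benchmark's official devkit.

```python
import math


def endpoint_error(flow_pred, flow_gt, valid):
    """Mean end-point error over pixels with valid ground truth.

    flow_pred, flow_gt: sequences of (u, v) flow vectors in pixels.
    valid: sequence of booleans, True where ground truth exists.
    """
    errors = [
        math.hypot(p[0] - g[0], p[1] - g[1])
        for p, g, ok in zip(flow_pred, flow_gt, valid)
        if ok
    ]
    if not errors:
        raise ValueError("no valid ground-truth pixels to evaluate")
    return sum(errors) / len(errors)


def d1_outlier_rate(disp_pred, disp_gt, valid, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of valid pixels counted as disparity outliers.

    Assumed rule (KITTI 2015 style): a pixel is an outlier when its
    absolute error exceeds abs_thresh pixels AND exceeds rel_thresh
    times the true disparity.
    """
    outliers = 0
    total = 0
    for d, g, ok in zip(disp_pred, disp_gt, valid):
        if not ok:
            continue  # sparse ground truth: skip unlabeled pixels
        total += 1
        err = abs(d - g)
        if err > abs_thresh and err > rel_thresh * abs(g):
            outliers += 1
    if total == 0:
        raise ValueError("no valid ground-truth pixels to evaluate")
    return outliers / total
```

The `valid` mask matters: with sparse ground truth, averaging over unlabeled pixels would silently change the score.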

Use the chapter the way you use a workshop: walk in when something is needed. Reach for Section 17.1 when an OpenCV import fails or when you suspect a geometric function already exists but cannot name it; for Section 17.2 the first time a project says "we need a 3D model of this"; for Section 17.3 before you trust any number, yours or a paper's; and for Section 17.4 when you want depth the book cannot fit. Like its siblings, Chapter 29 for deep learning and Chapter 38 for generative models, it consolidates a part of the book into something you can keep open in a second tab for years.

Big Picture

Classical computer vision in production is a three-layer stack: OpenCV's geometry and tracking modules at the bottom, whole-pipeline reconstruction tools like COLMAP in the middle, and benchmark datasets with agreed metrics on top to keep every layer honest. The mathematics of Chapters 9 through 16 does not change; knowing this stack decides whether applying it takes an afternoon or a quarter. The three-word ladder to keep is verbs, pipelines, scoreboards: Section 17.1 holds the verbs (single OpenCV calls), Section 17.2 the pipelines that chain them (COLMAP and its ecosystem), and Section 17.3 the scoreboards that keep both honest, with Section 17.4 the reading map behind all three.

Learning Objectives

Navigate OpenCV's features2d, calib3d, and video modules: know which function solves which geometric problem and which pip wheel ships it.
Run a full image-to-3D-model pipeline with COLMAP and pycolmap, and know when GLOMAP, OpenMVG, Meshroom, or a SLAM system is the better tool.
Match every Part II task (matching, stereo, multi-view reconstruction, flow, tracking, SLAM) to its standard benchmark and its standard metric.
Evaluate trajectories and flow fields with the right tooling (evo, official devkits) and avoid the classic evaluation traps: alignment, scale, sparse ground truth, and benchmark overfitting.
Build a personal further-reading map for classical vision: which book, lecture series, or paper to consult for each chapter of Part II, and how to stay current as learned geometry arrives.
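The alignment and scale traps in the evaluation objective can be made concrete with a small NumPy sketch of absolute trajectory error (ATE). It assumes the estimated and ground-truth positions are already associated by timestamp, and it uses Umeyama's closed-form similarity alignment. The function names are hypothetical, and this is an illustration of the idea rather than the implementation inside evo, which remains the tool to use for reported results.

```python
import numpy as np


def umeyama_alignment(est, gt, with_scale=True):
    """Least-squares similarity transform mapping est onto gt.

    est, gt: (N, 3) arrays of time-associated positions.
    Returns (scale, R, t) so that gt is approximated by scale * R @ est_i + t.
    """
    mu_est = est.mean(axis=0)
    mu_gt = gt.mean(axis=0)
    est_c = est - mu_est
    gt_c = gt - mu_gt

    # 3x3 cross-covariance between ground truth and estimate.
    cov = gt_c.T @ est_c / est.shape[0]
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    var_est = (est_c ** 2).sum() / est.shape[0]
    scale = float(np.trace(np.diag(D) @ S) / var_est) if with_scale else 1.0
    t = mu_gt - scale * R @ mu_est
    return scale, R, t


def ate_rmse(est, gt, with_scale=True):
    """Root-mean-square position error after aligning est to gt."""
    scale, R, t = umeyama_alignment(est, gt, with_scale)
    aligned = scale * (R @ est.T).T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

Calling `ate_rmse` with `with_scale=False` on a monocular trajectory, whose scale is arbitrary, shows how one alignment choice can dominate the reported number.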

Prerequisites

This chapter consolidates all of Part II, so any chapter from Chapter 9 onward enriches it, but three carry most of the weight. Chapter 10: Keypoints, Descriptors & Matching supplies the vocabulary of Section 17.1's feature tour; Chapter 12: Camera Models & Calibration and Chapter 14: Structure from Motion & Visual SLAM explain what the reconstruction tools of Section 17.2 actually compute. The evaluation discussion in Section 17.3 assumes the stereo metrics of Chapter 13 and the flow and tracking formulations of Chapter 15. Library mechanics (NumPy conventions, dtype traps) were settled in Chapter 8 and are not repeated here.

Chapter Roadmap

17.1 OpenCV Beyond the Basics: features2d, calib3d & video. The modules behind Part II: detector and matcher inventories, the calib3d geometry verbs, USAC robust estimation, stereo and tracking APIs, and the pip packaging traps.
17.2 Reconstruction Tooling: COLMAP, OpenMVG & Friends. Whole-pipeline tools for images-to-3D: COLMAP and pycolmap in practice, GLOMAP's global SfM, OpenMVG/OpenMVS, Meshroom, SLAM systems, and the Ceres-class optimizers underneath.
17.3 Datasets & Benchmarks for Geometry, Flow & Tracking. The proving grounds: HPatches to ETH3D, Middlebury to KITTI, Sintel to Spring, TUM RGB-D to TartanAir, with their metrics, devkits, and the hygiene rules for using them.
17.4 Curated References & Further Reading. An annotated map of the books, canonical papers, and lecture series behind Part II, plus a low-effort workflow for staying current as geometry goes neural.

Fun Fact

The classical-vision toolchain is unusually long-lived. The RANSAC paper that powers every findHomography call was published in 1981, the Lucas-Kanade tracker the same year, and the Kalman filter in 1960, before digital images were common enough to track anything in. Meanwhile COLMAP, released in 2016, has become the de facto data factory for the newest methods in the book: a large share of the NeRF and Gaussian-splatting papers of 2023 through 2026 contain some version of the sentence "camera poses were estimated with COLMAP". The algorithms in a modern 3D pipeline are, on average, older than many of their users.

What's Next?

With the Part II workshop inventoried, the book crosses its biggest divide. Chapter 18: Neural Networks & PyTorch for Vision opens Part III, where features stop being engineered and start being learned. The arc is deliberate: the descriptors of Chapter 10 return as learned embeddings, the recognition pipelines of Chapter 16 return as end-to-end networks, and the geometry of this part returns in Chapter 27, where COLMAP's poses feed neural scene representations. Nothing in this chapter's toolbox becomes obsolete; it becomes infrastructure.

Bibliography & Further Reading

Foundational Papers

Fischler, M.A. and Bolles, R.C. "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." Communications of the ACM 24(6) (1981). ACM Digital Library

The robust-estimation paradigm behind every findHomography and findEssentialMat call in Section 17.1; still the default fitting loop of classical vision 45 years on.

Zhang, Z. "A Flexible New Technique for Camera Calibration." IEEE TPAMI 22(11) (2000). Microsoft Research

The planar-target calibration method implemented by cv2.calibrateCamera, and the reason every robotics lab owns a printed chessboard.

Hirschmüller, H. "Stereo Processing by Semiglobal Matching and Mutual Information." IEEE TPAMI 30(2) (2008). IEEE Xplore

The semi-global matching algorithm behind cv2.StereoSGBM and a large fraction of industrial stereo depth before deep stereo arrived.

Schönberger, J.L. and Frahm, J.-M. "Structure-from-Motion Revisited." CVPR (2016). CVF Open Access

The COLMAP paper: the incremental SfM design that Section 17.2 treats as the reference pipeline of the reconstruction ecosystem.

Pan, L., Barath, D., Pollefeys, M., and Schönberger, J.L. "Global Structure-from-Motion Revisited." ECCV (2024). arXiv:2407.20219

GLOMAP: global SfM made as robust as incremental, at a fraction of the runtime, reusing COLMAP's database format; Section 17.2's main 2024-era update to the tooling story.

Campos, C. et al. "ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM." IEEE Transactions on Robotics 37(6) (2021). arXiv:2007.11898

The reference open-source SLAM system; the real-time counterpart to COLMAP's offline reconstruction in Section 17.2.

Wang, J. et al. "VGGT: Visual Geometry Grounded Transformer." CVPR (2025). arXiv:2503.11651

A feed-forward transformer that predicts cameras, depth, and 3D points from images in one pass; the strongest 2025 signal of where the Section 17.2 toolchain is heading.

Books

Hartley, R. and Zisserman, A. "Multiple View Geometry in Computer Vision." Cambridge University Press, 2nd ed. (2004). Book website

The definitive reference for Chapters 12 through 14; Section 17.4 maps its parts onto this book's and explains why it remains the debugging manual even for learned geometry.

Szeliski, R. "Computer Vision: Algorithms and Applications." Springer, 2nd ed. (2022). Free online edition

Free, current, and encyclopedic; its feature, stereo, and SfM chapters are the natural second pass over all of Part II.

Tools & Libraries

OpenCV 4.x documentation: features2d, calib3d, and video modules. docs.opencv.org

The authoritative reference for every function Section 17.1 inventories, including the USAC robust-estimation flags and the tracker API.

COLMAP documentation and repository. colmap.github.io

Installation, CLI reference, camera models, and the FAQ that answers most reconstruction failures; pycolmap ships from the same repository.

Moulon, P., Monasse, P., Perrot, R., and Marlet, R. "OpenMVG: Open Multiple View Geometry." RRPR Workshop (2016). GitHub repository

The cleanest readable implementation of the multi-view geometry in Part II; Section 17.2 recommends its source code as a study companion to Hartley and Zisserman.

Agarwal, S., Mierle, K., and the Ceres Solver team. "Ceres Solver." ceres-solver.org

The nonlinear least-squares engine inside COLMAP and much of the mapping industry; the practical face of the bundle adjustment math from Chapter 14.

Grupp, M. "evo: Python package for the evaluation of odometry and SLAM." GitHub repository

The standard tool for ATE/RPE trajectory evaluation in Section 17.3, with alignment, scale correction, and publication-quality plots built in.

Datasets & Benchmarks

Geiger, A., Lenz, P., and Urtasun, R. "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite." CVPR (2012). KITTI benchmark site

The benchmark suite that anchors Section 17.3: stereo, optical flow, odometry, and tracking with LiDAR-derived ground truth and a withheld test server.

Butler, D.J., Wulff, J., Stanley, G.B., and Black, M.J. "A Naturalistic Open Source Movie for Optical Flow Evaluation." ECCV (2012). MPI Sintel benchmark site

Dense, perfect synthetic ground truth for optical flow; the complement to KITTI's sparse real-world labels and still a standard table column in 2026 flow papers.

Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. "A Benchmark for the Evaluation of RGB-D SLAM Systems." IROS (2012). TUM RGB-D dataset

The dataset that standardized ATE and RPE for SLAM evaluation; its file formats are the lingua franca that tools like evo speak.