Section 12.5: Calibration Workflows, Targets & Quality Checks

"My calibration was perfect in June. Since then I have ridden in three delivery trucks, baked on a dashboard, and someone 'just adjusted' my focus ring. I no longer know who I am."
A Slightly Decalibrated Production Camera

Big Picture

The quality of a calibration is decided before the solver runs, by the target you chose, the views you captured, and the discipline of the rig, and it is verified after the solver runs by diagnostics that look past the single RMS number. Section 12.3 gave you the estimator; this section gives you the craft around it: which pattern to print, how to wave it, how to read residuals like an engineer rather than a gambler, and how to keep a calibration honest for the months of vibration, temperature, and well-meaning colleagues that follow. Calibration is not a step in your pipeline; it is an asset with a lifecycle.

A weary cartoon camera looks dizzy amid heat shimmer, a bumpy truck road, and a helpful hand nudging its focus ring, while its formerly crisp photo beside it has gone subtly skewed, illustrating how heat, vibration, transport, and well-meaning adjustments silently decalibrate a production camera over time. — A calibration is perishable: heat, vibration, transport, and a colleague who just adjusted the focus ring quietly turn yesterday's measurement instrument into modern art, which is why production systems monitor and recalibrate.

The illustration above is the failure this whole section is built to prevent: a once-crisp camera worn out of calibration by the months of vibration, temperature, and well-meaning colleagues that follow installation.

Everything so far in this chapter has assumed the calibration data was good. In practice, data quality is where calibrations are won and lost: the same solver, fed a lazy capture session, returns parameters that are confidently wrong in ways that surface weeks later as bent point clouds in Chapter 13 or drifting maps in Chapter 14. This section is the field manual: targets, capture protocol, diagnostics, and operations.

1. Choosing a Target Basic

Three families of planar targets dominate practice, compared in Figure 12.5.1. The classic checkerboard offers the best localization primitive in vision: a saddle point between four squares, locatable to a few hundredths of a pixel and unbiased under perspective and modest distortion. Its weakness is bookkeeping: the standard detector requires the entire board visible in every frame, which fights directly against the advice to push the board into the image corners. The circle grid detects fast via blob centroids, but a circle images as an ellipse whose centroid is not the projected circle center; under strong perspective or distortion this bias contaminates exactly the measurements calibration cares about. The modern default is the ChArUco board: a checkerboard whose white squares carry small ArUco markers (from Section 12.4), so every visible corner is individually identified. Partial views work, occlusion works, and corners keep checkerboard-grade subpixel accuracy.

Figure 12.5.1 Target trade-offs. Checkerboards localize best but demand full visibility; circle grids detect easily but their ellipse centroids are biased under perspective; ChArUco boards identify every corner individually, allowing partial views into frame corners, which is precisely where distortion calibration needs data. Whatever you print, mount it flat: glass, aluminum composite, or at minimum stiff foam board.

The physical print matters as much as the pattern. Paper taped to a wall ripples; a 1 mm bow on a 30 cm board is a 1-part-in-300 violation of the planarity assumption at the heart of Zhang's math, comfortably enough to corrupt the third significant digit of your focal length. Mount targets on glass or aluminum composite panel, verify the printed square size with calipers (printers rescale silently, as noted in Section 12.3), and prefer matte lamination to glossy, which blooms under lights.

2. A Capture Protocol That Constrains Every Parameter Intermediate

Each parameter of the camera model is constrained by a specific kind of view, so a capture session is not "take 20 pictures"; it is "buy information for every parameter." The working checklist:

15 to 30 accepted views. Below 10, the LM optimum is noise-shaped; beyond 30, returns diminish unless coverage is still improving.
Tilt the board, up to about 45 degrees, in both axes. Tilted views are what separate focal length from distance (the degeneracy from Section 12.3); a session of fronto-parallel views leaves $f_x, f_y$ nearly unobservable, however many images you take.
Cover the full frame, especially corners. Distortion coefficients are fitted where corners are observed; a center-only dataset extrapolates the polynomial into the corners, and extrapolated polynomials do what extrapolated polynomials always do. With ChArUco, deliberately hang the board partially out of frame.
Vary the distance over at least a factor of two around the working distance of your application.
Lock focus, lock exposure, kill autofocus and stabilization. Autofocus changes the focal length between frames (focus breathing); optical stabilization moves the lens, shifting the principal point per frame. Both make "the" intrinsics a moving target.
Reject blur. Motion blur shifts detected corners by more than the subpixel accuracy you are paying for. Hold the board still, or light the scene to allow short exposures; a variance-of-Laplacian sharpness gate from Chapter 1's toolbox makes rejection automatic.

Key Insight: Every Parameter Is Paid For by a Region of Capture Space

The capture checklist above is easier to remember as a four-line price list, because each parameter is bought by one kind of view:

Tilt buys focal length ($f_x, f_y$).
Corners buy the distortion coefficients.
Spread buys the principal point ($c_x, c_y$).
Count buys noise resilience.

This is why a single RMS number cannot certify a calibration: a dataset can be excellent for the parameters it exercised and silent about the rest, and the optimizer will happily report a tight fit on the silent ones too. When a downstream system misbehaves (a stereo rig in Chapter 13 that triangulates bent walls, say), ask first which parameter would cause it, then ask whether your capture set actually paid for that parameter (tilt, corners, spread, or count).

3. ChArUco Calibration with the Modern API Intermediate

OpenCV 4.7 reorganized the ArUco interfaces around detector objects (the same release that introduced the ArucoDetector of Section 12.4), and ChArUco calibration became pleasantly short: a CharucoDetector finds and identifies corners, and the board object itself converts them into matched 3D-2D arrays for the standard calibrateCamera. No manual corner-ID bookkeeping survives in user code.

# ChArUco calibration with the OpenCV 4.7+ object API: CharucoDetector finds
# identified corners and board.matchImagePoints turns them into 3D-2D arrays.
# Because corners are individually identified, partial board views still count.
import glob
import cv2
import numpy as np

aruco = cv2.aruco
dictionary = aruco.getPredefinedDictionary(aruco.DICT_5X5_100)
# 7 x 5 squares, 35 mm squares, 26 mm markers: measure your actual print.
board = aruco.CharucoBoard((7, 5), 0.035, 0.026, dictionary)
detector = aruco.CharucoDetector(board)

all_obj, all_img, used = [], [], 0
image_size = None
for path in sorted(glob.glob("charuco/*.png")):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:                             # unreadable file: skip it
        continue
    image_size = gray.shape[::-1]                # (width, height) for calibrateCamera
    ch_corners, ch_ids, _, _ = detector.detectBoard(gray)
    if ch_ids is None or len(ch_ids) < 8:        # too few corners constrain nothing
        continue
    obj_pts, img_pts = board.matchImagePoints(ch_corners, ch_ids)
    all_obj.append(obj_pts)
    all_img.append(img_pts)
    used += 1

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    all_obj, all_img, image_size, None, None)
print(f"{used} usable views, RMS = {rms:.3f} px")
print("fx, fy =", round(K[0, 0], 1), round(K[1, 1], 1))
# 26 usable views, RMS = 0.164 px
# fx, fy = 1539.1 1538.6

Code Fragment 1: ChArUco calibration with the OpenCV 4.7+ object API. The len(ch_ids) < 8 guard skips views too sparse to constrain anything; because every corner carries an ID, views with the board half out of frame still contribute their visible corners (26 usable views, 0.164 px RMS here), which is exactly how the frame's edges get covered.

Library Shortcut: CharucoDetector Retires Your Correspondence Bookkeeping

Before this API, partial-board calibration meant writing the bookkeeping yourself: detect raw ArUco markers, interpolate checkerboard corners from each marker's neighborhood, maintain the corner-ID-to-board-coordinate table across views, and assemble ragged per-view arrays in exactly the layout calibrateCamera expects, a solid 80 lines of indexing logic where every off-by-one silently degrades the result. The pair detector.detectBoard + board.matchImagePoints is the same logic in 2 lines, an 80-to-2 reduction, with the corner interpolation, ID matching, and array marshaling handled (and tested) inside the library. If you need still more automation, the interactive calibration application that ships with OpenCV and the mrcal toolkit wrap the entire session, capture gating included.

4. Quality Beyond RMS: Residual Forensics Intermediate

The RMS is a single number summarizing thousands of residual vectors, and the vectors know more than their average. Three diagnostics, in increasing order of revelation. First, per-view errors (built in Section 12.3) expose individual bad frames. Second, parameter standard deviations from calibrateCameraExtended expose unconstrained parameters. Third, and most informative, the residual quiver plot: draw every corner's residual as an arrow at its image position. A healthy calibration shows structureless arrows, isotropic noise with no preferred direction. Structure is a confession: arrows swirling radially near the corners mean the distortion model has too few terms (the model-family failure from Section 12.2); arrows that flip direction across the board in one view mean that board moved or its print is not flat; a global drift pattern points at rolling shutter or synchronization issues.

# Build the residual quiver plot: collect every corner's reprojection residual
# vector and draw it as an arrow at its image position. Structureless arrows
# mean a good fit; radial swirls or per-view drift name a specific failure.
import matplotlib.pyplot as plt

pos, res = [], []
for op, ip, rv, tv in zip(all_obj, all_img, rvecs, tvecs):
    proj, _ = cv2.projectPoints(op, rv, tv, K, dist)
    pos.append(ip.reshape(-1, 2))
    res.append((ip - proj).reshape(-1, 2))
pos, res = np.vstack(pos), np.vstack(res)

plt.figure(figsize=(9, 5.5))
# scale=0.02 in xy units draws arrows 50x their true (subpixel) length.
plt.quiver(pos[:, 0], pos[:, 1], res[:, 0], res[:, 1],
           angles="xy", scale_units="xy", scale=0.02, width=0.002)
plt.gca().invert_yaxis()                      # image coordinates: y grows down
plt.title("Reprojection residuals (arrows exaggerated 50x)")
plt.savefig("residual_quiver.png", dpi=150)

print(f"mean |residual| = {np.linalg.norm(res, axis=1).mean():.3f} px")
print(f"99th percentile = {np.percentile(np.linalg.norm(res, axis=1), 99):.3f} px")
# mean |residual| = 0.139 px
# 99th percentile = 0.501 px

Code Fragment 2: The residual quiver plot built with plt.quiver, the most honest picture a calibration can give of itself. Arrows are exaggerated 50x via scale=0.02 so subpixel residuals (mean 0.139 px here) are visible; any pattern (radial swirls, per-view alignment, left-right drift) names a specific failure: insufficient distortion model, non-flat board, or rolling shutter respectively.

A final validation habit borrowed from machine learning: hold out three or four views, calibrate without them, then solve only the pose of each held-out view (a PnP solve from Section 12.4, with the intrinsics frozen) and measure the reprojection error on its corners. Held-out error close to training error means the calibration generalizes; held-out error several times larger means the model memorized a too-small or too-uniform capture set. That is overfitting in its photogrammetric costume, the same generalization logic that recurs throughout Chapter 21.

Practical Example: Eight Cameras, One Hot Stadium Roof

Who and what. A sports-analytics company tracks players with eight calibrated cameras around a stadium, triangulating positions to about 10 cm. Intrinsics and extrinsics were calibrated at installation in spring, with an exemplary 0.15 px RMS.

The problem. By midsummer, cross-camera triangulations disagreed by up to 40 cm in the afternoon but recovered overnight. Nothing had been touched; the calibration files were unchanged; each camera's video looked individually fine.

The decision. The team added a permanent monitor: every few minutes, the system reprojected surveyed, fixed features of the venue (pitch corners, line intersections, known banner edges) through each camera's stored calibration and logged the residual. The logs showed residuals tracking the roof temperature with a two-hour lag: thermal expansion of mounts was rotating cameras by tenths of a degree, and lens heating was shifting focal lengths by a few parts per thousand, immaterial alone, ruinous across a 100 m baseline.

The result and the lesson. Fixes were proportionate: extrinsics are now re-estimated hourly from the surveyed features (a PnP solve from Section 12.4, no boards needed), while intrinsics get a scheduled quarterly recalibration. Triangulation error returned under 12 cm at all hours. The lesson: a calibration is a measurement of a physical system that drifts, so production systems monitor reprojection on known geometry continuously, and they separate the fast-drifting extrinsics from the slow-drifting intrinsics in their maintenance schedule.

5. Calibration as an Operational Asset Basic

A calibration that lives in a variable inside one script dies with that script. Persist it with metadata sufficient to audit it later: the matrix and distortion vector, image size (intrinsics are resolution-specific), RMS and per-view statistics, capture date, and the identity of the physical camera and lens, because, as the gauging-station story in Section 12.3 showed, calibration binds to a serial number, not to a product model. OpenCV's FileStorage writes YAML or XML that every OpenCV binding reads back natively.

# Persist a calibration with the metadata needed to audit it later: the matrix
# and distortion, the resolution they are valid for, RMS, and the physical
# camera serial and date, then read it back to confirm a lossless round trip.
fs = cv2.FileStorage("cam_LX4407_2026-06-11.yml", cv2.FILE_STORAGE_WRITE)
fs.write("image_width", 1920); fs.write("image_height", 1080)
fs.write("K", K); fs.write("dist", dist)
fs.write("rms_px", rms)
fs.write("camera_serial", "LX-4407"); fs.write("lens", "8mm-f1.8, focus locked")
fs.write("calibrated_on", "2026-06-11")
fs.release()

fs = cv2.FileStorage("cam_LX4407_2026-06-11.yml", cv2.FILE_STORAGE_READ)
K_loaded = fs.getNode("K").mat()
dist_loaded = fs.getNode("dist").mat()
fs.release()
assert np.allclose(K_loaded, K) and np.allclose(dist_loaded, dist)
print("calibration restored:", round(K_loaded[0, 0], 1), "px focal")
# calibration restored: 1539.1 px focal

Code Fragment 3: Persisting a calibration with its audit trail using cv2.FileStorage, then reading it back and asserting np.allclose on the recovered K. The filename encodes camera serial and date; the payload carries the resolution it is valid for, because the same sensor streaming at a different resolution needs scaled intrinsics.

Finally, know the recalibration triggers by heart: any lens or focus change, any transport or impact, any mount adjustment, large temperature excursions, and (for intrinsics) any change of streaming resolution or crop mode. Rolling-shutter cameras deserve a special caution: if the board (or camera) moves during the row-by-row exposure, corners within one frame are sampled at different instants and the board is effectively non-rigid; calibrate rolling-shutter cameras with both the camera and the target stationary per shot. These operational habits compound: teams that version calibrations, gate them on diagnostics, and monitor reprojection in production spend their debugging time on actual vision problems instead of phantom geometry bugs in Chapter 14-style pipelines.

Fun Fact: The Most Calibrated Objects on Earth

Spacecraft cameras hold the calibration high score. The navigation cameras on Mars rovers are calibrated before launch in thermal-vacuum chambers across the wide temperature range they will face, because there is no driving to Mars with a checkerboard afterward; geometric stability across temperature is a first-class design requirement, not an afterthought. Your stadium rig, by comparison, is allowed the luxury of a quarterly recalibration visit.

Research Frontier: Richer Models, Honest Uncertainty, and the Targetless Future

The craft in this section is itself a research area. The mrcal toolkit (mrcal.secretsauce.net, with its 2.4 release in 2024) replaces low-order polynomials with rich splined lens models, then propagates calibration uncertainty into every downstream measurement, telling you not just the parameters but how much any triangulated point should be trusted, the engineering-grade version of the standard deviations met in Section 12.3. Kalibr remains the reference for joint camera-IMU calibration in robotics, where time offsets between sensors are estimated alongside geometry. On the learned side, GeoCalib (ECCV 2024, arXiv:2409.06704) and the feed-forward geometry models VGGT (CVPR 2025, arXiv:2503.11651) and DUSt3R make casual, targetless capture usable for reconstruction, and the practical frontier is hybrid: let learned models bootstrap geometry from found footage, and reserve the target-based workflow of this section for systems where millimeters are money.

Exercise 12.5.1: Design a Minimal Session Conceptual

You are allowed exactly ten board views to calibrate a 90 degree FOV camera for a parcel-measuring station. (a) Specify each view: board position in the frame, tilt, and distance, and state which parameter(s) each view chiefly pays for. (b) Explain which single view you would sacrifice first if limited to nine, and why. (c) A colleague proposes ten fronto-parallel views, each in a different part of the frame, arguing this covers the frame fully. Predict the resulting pattern of standard deviations from calibrateCameraExtended: which parameters look healthy, and which are quietly unconstrained?

Exercise 12.5.2: A Coverage Monitor for Capture Sessions Coding

Write a live capture assistant: stream the webcam, run CharucoDetector.detectBoard per frame, and accumulate every accepted corner into an 8 x 6 grid of frame cells. Render the grid as a heatmap overlay, green where corner counts exceed 50 and red where below 10, and additionally track the histogram of board tilt angles (from each view's solvePnP rotation). The session ends only when every cell is green and at least a third of views exceed 25 degrees of tilt. Calibrate, and compare your RMS and focal standard deviation against a lazy 15-view handheld session.

Exercise 12.5.3: Residual Forensics on a Sabotaged Dataset Analysis

Take a good 25-view capture set and create three corrupted variants: (a) recalibrate with k2, k3 fixed to zero (CALIB_FIX_K2 | CALIB_FIX_K3) to simulate an insufficient distortion model; (b) digitally shear one view's image by 0.5 degrees to simulate a moved board; (c) drop all views whose board center is in the outer third of the frame to simulate lazy coverage. For each variant, produce the residual quiver plot and the parameter standard deviations, and write down the visual signature that identifies each failure. You are building the diagnostic lookup table this section claims exists; verify it.

The exercises above sharpen individual instincts; the Hands-On Lab below assembles the chapter's full three-beat arc, project, calibrate, and locate, into a single script you can run with no camera and no downloaded images.

Hands-On Lab: Build a Synthetic Calibration and Pose Workbench

Difficulty: Intermediate
Duration: 60 to 90 minutes

Build one self-contained script, calib_workbench.py, that invents a virtual camera with known intrinsics and distortion, projects the corners of a virtual checkerboard from many synthetic viewpoints, recovers the camera parameters with Zhang's method, then estimates a fresh pose with PnP and verifies it by reprojection. Because the script generates its own scene, you can compare every recovered number against the exact ground truth, the one luxury real calibration never grants, which turns the whole chapter into something you can falsify on your own machine.

What You'll Practice

Projecting 3D points through the pinhole model and intrinsic matrix $K$ with cv2.projectPoints (Section 12.1).
Adding and later recovering Brown-Conrady radial distortion (Section 12.2).
Running cv2.calibrateCamera over synthetic views and reading its RMS reprojection error (Section 12.3).
Solving the Perspective-n-Point problem with cv2.solvePnP and measuring reprojection residuals (Section 12.4).
Judging a calibration against ground truth, the diagnostic mindset of this section (Section 12.5).

Setup

pip install numpy opencv-python

No image files and no physical checkerboard are needed: the script defines the board geometry numerically and projects it, so it runs start to finish on any machine, including a headless one.

Put the section's workflow concepts into practice below. Work through the steps in order; each prints a checkpoint so you can confirm progress before moving on. A complete reference solution is folded at the end.

Step 1: Define a ground-truth camera

Invent the camera you are pretending to own: a resolution, an intrinsic matrix $K$, and a five-coefficient distortion vector. Everything downstream is judged against these numbers.

import numpy as np
import cv2

W, H = 1280, 960
K_true = np.array([[1000.0,    0.0, 640.0],
                   [   0.0, 1000.0, 480.0],
                   [   0.0,    0.0,   1.0]])
# TODO: define dist_true as a (5,) array [k1, k2, p1, p2, k3]
# Hint: a mild barrel lens has a negative k1, for example -0.25, with smaller k2, p1, p2, k3
dist_true = ...
print("ground-truth focal:", K_true[0, 0], "px")

Hint

Try dist_true = np.array([-0.25, 0.10, 0.001, -0.001, 0.02]). Keep $k_1$ dominant and negative so the barrel effect is visible but the polynomial stays well-behaved across the frame.

Step 2: Build the checkerboard's 3D object points

A planar target lives at $z=0$ in its own coordinate frame. Lay out the inner-corner grid in metric units so the recovered translations come out in meters.

COLS, ROWS = 9, 6          # inner corners
SQUARE = 0.025             # 25 mm squares

def board_points(cols, rows, square):
    pts = np.zeros((rows * cols, 3), np.float32)
    # TODO: fill columns 0 and 1 with the x, y grid (column 2 stays 0 for a planar board)
    # Hint: np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square
    ...
    return pts

objp = board_points(COLS, ROWS, SQUARE)
print("object points:", objp.shape)   # (54, 3)

Hint

pts[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) then multiply by square. The board is flat, so the third coordinate is left at zero, exactly the planarity Zhang's method exploits.

Step 3: Synthesize many views by projecting the board

Replace a capture session with a loop: for each random board pose, project the object points through the ground-truth camera (distortion included) to get the pixel corners a detector would have found.

rng = np.random.default_rng(0)

def random_pose():
    rvec = rng.uniform(-0.5, 0.5, 3).astype(np.float64)        # tilt in radians
    tvec = np.array([rng.uniform(-0.1, 0.1),
                     rng.uniform(-0.1, 0.1),
                     rng.uniform(0.4, 0.8)], np.float64)        # board 0.4 to 0.8 m away
    return rvec, tvec

image_points = []
for _ in range(20):
    rvec, tvec = random_pose()
    # TODO: project objp with cv2.projectPoints using K_true and dist_true
    # keep only views whose corners all fall inside [0, W) x [0, H)
    ...
print("usable views:", len(image_points))

Hint

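proj, _ = cv2.projectPoints(objp, rvec, tvec, K_true, dist_true) returns an (N,1,2) array. Accept the view only if proj is fully in bounds and discard it otherwise, so calibration never sees off-screen corners. With the for loop as written, discarded poses simply reduce the view count; to resample until 20 views are accepted, use a while len(image_points) < 20 loop, as the reference solution does.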

Step 4: Calibrate and compare against ground truth

Run Zhang's method on the synthetic corners as if you had never seen K_true, then hold the recovered numbers up against the truth you secretly know.

obj_list = [objp] * len(image_points)
rms, K_est, dist_est, rvecs, tvecs = cv2.calibrateCamera(
    obj_list, image_points, (W, H), None, None)

print(f"RMS reprojection error: {rms:.4f} px")
# TODO: print the focal-length error |K_est[0,0] - K_true[0,0]|
#       and the principal-point error, in pixels
...

Hint

With noiseless synthetic corners the RMS should be a small fraction of a pixel and K_est should match K_true to within a pixel or two. If it does not, your Step 3 views are too similar; widen the tilt range so every parameter is constrained.

Step 5: Add corner noise and watch the error grow

Real detectors are not exact. Perturb the corners with sub-pixel Gaussian noise, recalibrate, and confirm that RMS and the parameter errors rise together, the link this section asks you to internalize.

noisy_points = [p + rng.normal(0, 0.3, p.shape).astype(np.float32)
                for p in image_points]
# TODO: recalibrate with noisy_points and report the new RMS and focal error
...

Hint

A 0.3 px corner noise typically lifts the RMS into the few-tenths-of-a-pixel range and the focal error to a handful of pixels. The takeaway: RMS tracks corner-localization quality, which is why subpixel refinement matters.

Step 6: Estimate a fresh pose with PnP and reproject

Calibration over, switch to the per-frame job. Generate one brand-new pose, project it through the recovered camera, solve PnP for the pose, and measure how far the reprojected corners land from where they should.

rvec_gt, tvec_gt = random_pose()
proj_gt, _ = cv2.projectPoints(objp, rvec_gt, tvec_gt, K_est, dist_est)

# TODO: recover the pose with cv2.solvePnP(objp, proj_gt, K_est, dist_est)
#       then reproject and print the mean pixel reprojection error
...

Hint

ok, rvec_pnp, tvec_pnp = cv2.solvePnP(objp, proj_gt, K_est, dist_est). Reproject with the recovered pose and compare to proj_gt; the mean error should be a tiny fraction of a pixel because the inputs are consistent.

Step 7: The Right Tool, the robust library shortcut

You solved a clean PnP; production data carries outliers. Corrupt a few corners and let OpenCV's RANSAC variant reject them in one call, mirroring this chapter's robust-pose theme.

corrupt = proj_gt.copy()
corrupt[[3, 17, 40]] += rng.uniform(-50, 50, (3, 1, 2)).astype(np.float64)

# A robust pose in one call: RANSAC finds the inliers for you.
ok, rvec_r, tvec_r, inliers = cv2.solvePnPRansac(objp, corrupt, K_est, dist_est)
# TODO: print how many inliers RANSAC kept and confirm the three corrupted
#       indices are absent from the inlier set
...

Hint

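inliers is an index array; the corrupted rows 3, 17, and 40 should not appear in it. (solvePnPRansac's default reprojectionError threshold is 8 px, so a random offset that happens to land within a few pixels of the true corner can legitimately survive as an inlier.) The single solvePnPRansac call replaces a hand-written sample-and-vote loop of dozens of lines, the same Right Tool collapse seen throughout the book.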

Expected Output

The finished script prints a short report. Step 4's noiseless calibration shows an RMS well under 0.01 px and a focal-length error of at most a pixel or two, with the recovered distortion close to dist_true. Step 5's noisy run lifts the RMS to a few tenths of a pixel and the focal error to a handful of pixels, demonstrating the RMS-versus-accuracy link. Step 6's clean PnP reprojects to a small fraction of a pixel. Step 7 reports that RANSAC kept the honest corners and dropped the three you sabotaged. Seeing recovered numbers converge on the ground truth you planted is the proof that the project-calibrate-locate pipeline of this chapter actually closes.

Stretch Goals

Sweep the view count from 3 to 30 and plot focal-length error against the number of views, reproducing the diminishing-returns curve that justifies the capture protocol of this section.
Fix k3 to zero during calibration (cv2.CALIB_FIX_K3) on a barrel lens with a real k3 and watch the residuals develop the radial signature described in Exercise 12.5.3.
Replace the synthetic projection with a real webcam and a printed checkerboard, swapping projectPoints for cv2.findChessboardCorners, and confirm your workbench transfers unchanged to physical capture, the bridge to Chapter 13's stereo rig.

Complete Solution

import numpy as np
import cv2

# ---- Step 1: ground-truth camera ----
W, H = 1280, 960
K_true = np.array([[1000.0,    0.0, 640.0],
                   [   0.0, 1000.0, 480.0],
                   [   0.0,    0.0,   1.0]])
dist_true = np.array([-0.25, 0.10, 0.001, -0.001, 0.02])

# ---- Step 2: board object points ----
COLS, ROWS, SQUARE = 9, 6, 0.025
def board_points(cols, rows, square):
    pts = np.zeros((rows * cols, 3), np.float32)
    pts[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
    return pts * square
objp = board_points(COLS, ROWS, SQUARE)

# ---- Step 3: synthesize views ----
rng = np.random.default_rng(0)
def random_pose():
    rvec = rng.uniform(-0.5, 0.5, 3).astype(np.float64)
    tvec = np.array([rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1),
                     rng.uniform(0.4, 0.8)], np.float64)
    return rvec, tvec

image_points = []
while len(image_points) < 20:
    rvec, tvec = random_pose()
    proj, _ = cv2.projectPoints(objp, rvec, tvec, K_true, dist_true)
    p = proj.reshape(-1, 2)
    if (p[:, 0] >= 0).all() and (p[:, 0] < W).all() and \
       (p[:, 1] >= 0).all() and (p[:, 1] < H).all():
        image_points.append(proj.astype(np.float32))

# ---- Step 4: calibrate and compare ----
obj_list = [objp] * len(image_points)
rms, K_est, dist_est, rvecs, tvecs = cv2.calibrateCamera(
    obj_list, image_points, (W, H), None, None)
print(f"RMS reprojection error: {rms:.4f} px")
print(f"focal error: {abs(K_est[0,0] - K_true[0,0]):.3f} px")
print(f"principal-point error: "
      f"{abs(K_est[0,2]-K_true[0,2]):.3f}, {abs(K_est[1,2]-K_true[1,2]):.3f} px")

# ---- Step 5: noisy corners ----
noisy_points = [p + rng.normal(0, 0.3, p.shape).astype(np.float32)
                for p in image_points]
rms_n, K_n, dist_n, _, _ = cv2.calibrateCamera(
    obj_list, noisy_points, (W, H), None, None)
print(f"noisy RMS: {rms_n:.4f} px, noisy focal error: "
      f"{abs(K_n[0,0]-K_true[0,0]):.3f} px")

# ---- Step 6: fresh pose with PnP ----
rvec_gt, tvec_gt = random_pose()
proj_gt, _ = cv2.projectPoints(objp, rvec_gt, tvec_gt, K_est, dist_est)
ok, rvec_pnp, tvec_pnp = cv2.solvePnP(objp, proj_gt, K_est, dist_est)
reproj, _ = cv2.projectPoints(objp, rvec_pnp, tvec_pnp, K_est, dist_est)
err = np.linalg.norm(reproj.reshape(-1, 2) - proj_gt.reshape(-1, 2), axis=1)
print(f"PnP mean reprojection error: {err.mean():.5f} px")

# ---- Step 7: robust PnP with RANSAC ----
corrupt = proj_gt.copy()
corrupt[[3, 17, 40]] += rng.uniform(-50, 50, (3, 1, 2)).astype(np.float64)
ok, rvec_r, tvec_r, inliers = cv2.solvePnPRansac(objp, corrupt, K_est, dist_est)
kept = set(inliers.ravel().tolist())
print(f"RANSAC kept {len(kept)} / {len(objp)} corners")
print(f"corrupted indices in inlier set: {[i for i in (3,17,40) if i in kept]}")

With that workbench closing the loop, the chapter's three-beat arc is complete: Section 12.1 showed how the pinhole model and the intrinsic matrix $K$ project the 3D world into pixels and throw depth away, Sections 12.2 and 12.3 showed how to calibrate that projection for your specific camera, and Section 12.4 showed how to locate the camera every frame with PnP; this section gave the craft that keeps all three honest in production. What you have now earned, a calibrated $K$ with its distortion coefficients, is what turns each pixel into a metric ray. One ray still cannot tell you depth, but two can: Chapter 13: Two-View Geometry, Stereo & Depth adds a second camera, uses this chapter's intrinsics and distortion to rectify the pair, and finally intersects the rays to win back the dimension that projection destroyed.