"I have spent my whole career looking for the same point in two photographs. The trick is to be memorable enough that the second photograph recognizes you."
A Quietly Distinctive Keypoint
This chapter solves the correspondence problem: find the same physical point in two different photographs, and most of geometric vision follows as a consequence. Panorama stitching, camera calibration, stereo depth, structure from motion, visual SLAM, and image retrieval all begin with the same four-stage pipeline built here: detect repeatable points, describe their neighborhoods as vectors, match the vectors between images, and verify the matches with a geometric model. The pipeline is classical, but it is no museum piece: it runs today inside Chapter 14's COLMAP reconstructions that feed the neural radiance fields of Chapter 27, and its design choices echo through the learned representations of Chapter 25.
If you forget everything else in this chapter, keep these four words in order. Detect (Sections 10.1 to 10.2) finds points you can locate again; Describe (Sections 10.3 to 10.4) turns each neighborhood into a comparable vector; Match (Section 10.5) pairs the vectors across images; Verify (Section 10.6) keeps only the pairs a single geometry endorses. Each stage exists to cover the previous one's blind spot, and the same four words name the grammar of every geometric-vision system in the rest of Part II. The sections that follow are each one word of this sentence, expanded.
Chapter Overview
Chapter 9 extracted structure from single images: edges, lines, and curves. This chapter asks a question that involves two images at once. Given a photograph of a building taken from the street and another taken a few steps to the left, can a program point at a window corner in the first image and find the very same window corner in the second? A human does this instantly. For a machine, it is a genuinely hard problem: the second image differs in viewpoint, scale, rotation, lighting, and sensor noise, and the search space is every pixel. Solving it unlocks an astonishing amount of geometry, because two views of the same point constrain where the cameras were, which is the seed of Chapter 13's stereo and Chapter 14's structure from motion.
The classical answer decomposes the problem into stages, and the chapter walks through them in order. Section 10.1 asks which points are worth finding at all and answers with corners: locations where the image gradient varies in two directions, detected by the Harris and Shi-Tomasi operators and, at production speed, by FAST. Section 10.2 confronts the fact that a corner detector tuned to one zoom level fails at another, and builds the scale-space machinery (Gaussian pyramids, the Difference of Gaussians, canonical orientation) that makes detection survive zoom and rotation. Section 10.3 assembles these parts into SIFT, the 128-dimensional gradient-histogram descriptor that dominated computer vision for a decade and still ships inside reconstruction pipelines today.
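The corner criterion of Section 10.1 can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a plain box window instead of the Gaussian weighting a production detector would use, and the names `box_sum` and `harris_response`, the window radius, and k = 0.05 are illustrative choices.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window, zero-padded at the border."""
    p = np.pad(a, r)
    out = np.zeros_like(a)
    h, w = a.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, r=2, k=0.05):
    """Harris response R = det(M) - k * trace(M)^2 of the structure tensor M."""
    iy, ix = np.gradient(img.astype(np.float64))  # central differences
    sxx = box_sum(ix * ix, r)   # windowed sums form the 2x2 structure tensor
    syy = box_sum(iy * iy, r)
    sxy = box_sum(ix * iy, r)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace ** 2

# Synthetic test image: a bright square on a dark background.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)
corner, edge, flat = R[10, 10], R[10, 20], R[20, 20]
print(corner, edge, flat)
```

On this square the response is positive at the corner (gradients vary in two directions), negative on the edge (one direction only), and zero in the flat interior, which is the three-way classification the eigenvalue analysis predicts.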
The second half of the chapter is about engineering and trust. Section 10.4 trades SIFT's floating-point precision for binary descriptors (BRIEF, ORB, AKAZE) that match hundreds of times faster and fit real-time budgets on phones and robots. Section 10.5 turns piles of descriptors into actual correspondences, introducing brute-force and approximate nearest-neighbor search and the deceptively simple ratio test that rejects most false matches before they cause damage. Section 10.6 deals with the false matches that survive anyway: RANSAC, the hypothesize-and-verify algorithm that fits geometric models to contaminated data and, in doing so, separates inliers from outliers. RANSAC is so generally useful that it long ago escaped this chapter's topic and became a standard tool wherever a model must be fit to data contaminated by outliers.
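The speed of binary descriptors comes from how they are compared: the Hamming distance is an XOR followed by a bit count. The sketch below assumes descriptors stored as 32-byte uint8 arrays (the 256-bit length ORB uses by default); the helper name `hamming` is hypothetical.

```python
import numpy as np

def hamming(a, b):
    """Number of differing bits between two uint8 descriptor arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

a = np.zeros(32, dtype=np.uint8)   # a 256-bit descriptor, all bits clear
b = a.copy()
b[0] = 0b00000111                  # flip three bits in the first byte
b[31] = 0b10000000                 # flip one bit in the last byte

print(hamming(a, a), hamming(a, b))
```

A real matcher performs the same XOR-and-count with hardware popcount instructions instead of unpacking bits, which is where the speed advantage over floating-point L2 distances comes from.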
One theme deserves flagging before you begin. Every design decision in this chapter is a trade between invariance (ignoring changes you do not care about) and distinctiveness (preserving differences you do). A descriptor invariant to everything describes nothing; a descriptor sensitive to everything matches nothing. SIFT's gradient histograms, BRIEF's intensity comparisons, and the ratio test's relative threshold are all answers to this one tension. The same tension returns, with weights learned from data instead of designed by hand, when Chapter 25 trains networks to produce embeddings, and when Chapter 34's CLIP vectors act as universal descriptors for entire images. Hand-crafted features lost the recognition war to deep learning, but the vocabulary they established (detect, describe, match, verify) remains the grammar of geometric vision.
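The invariance-versus-distinctiveness trade can be seen in miniature in SIFT-style descriptor normalization. The sketch below assumes the commonly described recipe (normalize to unit length, clip large entries at 0.2, renormalize) and uses random 128-vectors as stand-ins for real gradient histograms; `sift_normalize` is a hypothetical helper, and the input is assumed to be nonzero.

```python
import numpy as np

def sift_normalize(hist, clip=0.2):
    """Unit-normalize, clip large entries, renormalize (assumes hist != 0)."""
    v = np.asarray(hist, dtype=np.float64)
    v = v / np.linalg.norm(v)     # cancels a global contrast gain
    v = np.minimum(v, clip)       # damps a few dominant gradients
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
patch_a = rng.random(128)         # stand-in for one gradient histogram
patch_b = rng.random(128)         # stand-in for a different neighborhood

# Invariance: tripling the contrast leaves the descriptor unchanged.
same = np.allclose(sift_normalize(patch_a), sift_normalize(3.0 * patch_a))
# Distinctiveness: a different neighborhood still yields a different vector.
different = not np.allclose(sift_normalize(patch_a), sift_normalize(patch_b))
print(same, different)
```

The normalization throws away exactly one nuisance variable (overall gradient magnitude) and keeps everything else, which is the pattern every descriptor in this chapter follows.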
Prerequisites
This chapter leans directly on image gradients and the Sobel operator from Chapter 3: Spatial Filtering & Convolution, and on Gaussian pyramids and the Difference of Gaussians from Chapter 4: The Frequency Domain & Multi-Scale Analysis. Section 10.6 fits homographies, so the geometric transformations of Chapter 5: Geometric Transformations & Image Warping should be fresh. The discussion of robust fitting continues a thread started in Chapter 9: Edges, Lines & Curves, whose least-squares and Hough machinery is the contrast against which RANSAC makes sense. Comfort with NumPy arrays and basic linear algebra (eigenvalues of a 2x2 matrix) is assumed throughout.
Chapter Roadmap
- 10.1 Corner Detection: Harris, Shi-Tomasi & FAST Why corners are the findable points: the structure tensor, eigenvalue analysis, the Harris response, Shi-Tomasi's refinement, and the FAST segment test that runs at video rate.
- 10.2 Scale & Rotation Invariance: Scale Space Making detection survive zoom and rotation: Gaussian scale space, octaves, Difference-of-Gaussians extrema, sub-pixel refinement, and canonical orientation assignment.
- 10.3 SIFT: The Descriptor That Defined a Decade The 128-dimensional gradient-histogram descriptor: its 4x4x8 anatomy, the normalization tricks behind its illumination robustness, RootSIFT, and its legacy.
- 10.4 Fast Binary Alternatives: BRIEF, ORB & AKAZE Descriptors as bit strings: BRIEF's intensity comparisons, Hamming distance at hardware speed, ORB's oriented and de-correlated upgrade, and AKAZE's nonlinear scale space.
- 10.5 Descriptor Matching & the Ratio Test From descriptor piles to correspondences: brute-force and FLANN nearest-neighbor search, cross-checking, and Lowe's ratio test that compares best to second-best.
- 10.6 RANSAC & Robust Model Fitting Trusting matches by geometric consensus: why least squares breaks, the hypothesize-and-verify loop, fitting homographies to contaminated matches, and modern MAGSAC++ variants.
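As a preview of the hypothesize-and-verify loop in the last roadmap entry, here is a minimal RANSAC sketch on the simplest possible model, a 2D line. It is an illustration under stated assumptions, not the chapter's implementation: the function name `ransac_line`, the iteration count, the threshold, and the synthetic data are all made up for the example.

```python
import numpy as np

def ransac_line(pts, n_iters=200, thresh=0.1, seed=0):
    """Fit y = slope * x + intercept to points contaminated by outliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iters):
        # Hypothesize: a line through two randomly chosen points.
        i, j = rng.choice(len(pts), size=2, replace=False)
        d = pts[j] - pts[i]
        normal = np.array([-d[1], d[0]]) / np.hypot(d[0], d[1])
        # Verify: count the points within `thresh` of that line.
        mask = np.abs((pts - pts[i]) @ normal) < thresh
        if mask.sum() > best.sum():
            best = mask
    # Refit by least squares on the consensus set only.
    slope, intercept = np.polyfit(pts[best, 0], pts[best, 1], 1)
    return slope, intercept, best

# 30 points exactly on y = 2x + 1, plus 10 gross outliers.
x_in = np.linspace(0.0, 10.0, 30)
inliers = np.column_stack([x_in, 2.0 * x_in + 1.0])
x_out = np.linspace(0.5, 9.5, 10)
offsets = np.array([5, -7, 9, -4, 6, -8, 3, -6, 10, -5], dtype=float)
outliers = np.column_stack([x_out, 2.0 * x_out + 1.0 + offsets])
pts = np.vstack([inliers, outliers])

naive_slope, naive_intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)
slope, intercept, mask = ransac_line(pts)
print("least squares:", naive_slope, naive_intercept)
print("RANSAC:       ", slope, intercept, int(mask.sum()), "inliers")
```

Plain least squares is pulled off the true line by the outliers, while the consensus set recovers it and labels each point as inlier or outlier along the way. Section 10.6 applies the same loop with a homography in place of the line and four correspondences in place of two points.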
What's Next?
Keypoints answer "where is the same point?" but stay silent about regions: which pixels belong together as one object or surface? Chapter 11: Classical Segmentation & Grouping takes up that question with clustering, region growing, watersheds, graph cuts, and superpixels: the classical toolkit for carving an image into meaningful pieces. The two chapters are complementary halves of classical scene understanding: this one finds sparse, precise anchors for geometry, the next finds dense, coherent groupings for content. Both threads converge later, when Chapter 12 begins the geometric arc that consumes this chapter's correspondences. Before moving on, assemble all four verbs into one runnable tool in the Hands-On Lab below, where detect, describe, match, and verify combine into a two-photo panorama stitcher.
Hands-On Lab: Build a Two-Photo Panorama Stitcher
Objective
Assemble the four verbs of this chapter (detect, describe, match, verify) into one runnable program: a panorama stitcher that takes two overlapping photographs, finds keypoints in each, matches their descriptors, filters the matches with the ratio test, verifies them with a RANSAC homography, and warps one image onto the other to produce a single wide composite. The script synthesizes its own overlapping pair by cropping two windows from any image, so it always produces a panorama even without a curated dataset.
What You'll Practice
- Detecting and describing keypoints with SIFT, the gradient-histogram descriptor of Section 10.3.
- Matching descriptors with brute-force k-nearest-neighbor search and Lowe's ratio test from Section 10.5.
- Estimating a homography from contaminated matches with RANSAC and reading its inlier mask, the robust fitting of Section 10.6.
- Gating on the inlier count so the tool refuses pairs that do not actually overlap.
- Warping and compositing two views into one panorama, the geometric payoff that the full detect-describe-match-verify pipeline unlocks.
Setup
Two libraries and no curated dataset required; the script splits any single image into an overlapping left and right pair if you do not supply two of your own. Install with:
pip install opencv-python numpy
To stitch your own photos, drop two overlapping shots named left.jpg and right.jpg beside the script; the loader falls back to splitting a built-in test image when those files are absent.
Steps
Step 1: Load an overlapping pair, or synthesize one
Build a loader that returns two overlapping color images. When left.jpg and right.jpg are present it uses them; otherwise it crops two horizontally shifted windows from OpenCV's bundled test image so the two crops share a middle strip to match on.
import cv2
import numpy as np

def load_pair():
    """Return (left, right) overlapping BGR images."""
    a = cv2.imread("left.jpg")
    b = cv2.imread("right.jpg")
    if a is not None and b is not None:
        return a, b
    # Fallback: crop two overlapping windows from a built-in sample.
    # required=False makes findFile return "" instead of raising when the
    # sample data is not installed, so the synthetic fallback can run.
    full = cv2.imread(cv2.samples.findFile("lena.jpg", required=False))
    if full is None:  # last resort: synthetic texture
        full = np.random.default_rng(0).integers(0, 256, (512, 512, 3), np.uint8)
    h, w = full.shape[:2]
    # TODO: return two crops that share an overlap. Take the left crop as
    # columns 0 to int(0.65*w) and the right crop as columns int(0.35*w) to w,
    # so the two windows share the middle 30 percent of the image.
    ...
Hint
return full[:, :int(0.65*w)].copy(), full[:, int(0.35*w):].copy(). The shared middle band is what the matcher latches onto; widen the overlap if too few inliers survive Step 4.
Step 2: Detect and describe keypoints in each image
Run SIFT on the grayscale version of each image to obtain keypoints and their 128-dimensional descriptors. SIFT detection survives the moderate scale and viewpoint change between two handheld shots, which is exactly why Section 10.3 built it.
def detect_describe(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create(nfeatures=4000)
    # TODO: call sift.detectAndCompute(gray, None) and return the
    # (keypoints, descriptors) tuple it produces.
    ...
Hint
return sift.detectAndCompute(gray, None). If you hit an attribute error, your OpenCV is old: pip install --upgrade opencv-python brings SIFT into the main package (it left the contrib-only days after the patent expired in 2020).
Step 3: Match descriptors and apply the ratio test
Use a brute-force L2 matcher to find the two nearest neighbors of every left descriptor among the right descriptors, then keep a match only when the best neighbor is clearly closer than the second-best. This is Lowe's ratio test from Section 10.5, the cheapest high-value filter in the pipeline.
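def ratio_match(d1, d2, ratio=0.75):
    # A featureless image yields no descriptors, and k=2 needs at least
    # two candidates on the right side; return no matches in either case.
    if d1 is None or d2 is None or len(d2) < 2:
        return []
    bf = cv2.BFMatcher(cv2.NORM_L2)
    pairs = bf.knnMatch(d1, d2, k=2)  # best two neighbors each
    # TODO: keep match m from each (m, n) pair only when
    # m.distance < ratio * n.distance. Return the list of survivors.
    ...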
Hint
return [m for m, n in pairs if m.distance < ratio * n.distance]. Tightening ratio toward 0.6 yields fewer but cleaner matches; loosening toward 0.8 keeps more candidates for RANSAC to sort out in Step 4.
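To see what the matcher and the ratio test compute under the hood, the same logic can be written in plain NumPy. This is a toy sketch on hand-made 2D "descriptors", not a replacement for cv2.BFMatcher; the function name `ratio_test_matches` and the data are illustrative.

```python
import numpy as np

def ratio_test_matches(d1, d2, ratio=0.75):
    """Brute-force nearest neighbors plus the best/second-best ratio test."""
    # Pairwise Euclidean distances, shape (len(d1), len(d2)).
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(d1))
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]

d1 = np.array([[0.0, 0.0],      # has one clear partner in d2
               [10.0, 10.0]])   # has two equally good partners in d2
d2 = np.array([[0.1, 0.0],
               [5.0, 5.0],
               [10.2, 10.0],
               [10.0, 10.2]])
matches = ratio_test_matches(d1, d2)
print(matches)
```

The first descriptor is kept because its best neighbor is far closer than the runner-up. The second is dropped even though its best distance is small, because two candidates tie: the ratio test rejects ambiguity, not distance.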
Step 4: Verify with a RANSAC homography
Collect the matched point coordinates and fit a homography with cv2.findHomography under the cv2.RANSAC flag. The returned inlier mask is half the product: it labels each ratio-test survivor as geometrically consistent or not, the verification step of Section 10.6. Gate on the inlier count so a non-overlapping pair is refused rather than warped into garbage.
def estimate_homography(k1, k2, good, min_inliers=20):
    if len(good) < 4:  # need 4 pairs for a homography
        return None, None
    # knnMatch(d1, d2) makes the left image the query and the right image
    # the train set. Step 5 warps the right image, so H must map right to
    # left: source points come from the right image, destinations from the left.
    src = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    # TODO: call cv2.findHomography(src, dst, cv2.RANSAC,
    # ransacReprojThreshold=3.0). If the inlier mask sums to fewer than
    # min_inliers, return (None, None) to refuse the pair; else return (H, mask).
    ...
Hint
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0); then if H is None or int(mask.sum()) < min_inliers: return None, None else return H, mask. The gate is the lesson of Section 10.6's orthomosaic story: a robust estimator that can say "no" beats one that always returns a matrix.
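What the inlier mask encodes can be reproduced by hand: project each source point through H and measure how far it lands from its matched destination. The sketch below is a NumPy illustration of that reprojection test with made-up points and a pure-translation homography; `reprojection_errors` is a hypothetical helper, and the 3-pixel threshold mirrors the value used in this step.

```python
import numpy as np

def reprojection_errors(H, src, dst):
    """Pixel distance between H-projected src points and their dst partners."""
    src_h = np.hstack([src, np.ones((len(src), 1))])   # to homogeneous coords
    proj = src_h @ H.T
    proj = proj[:, :2] / proj[:, 2:3]                  # back to pixel coords
    return np.linalg.norm(proj - dst, axis=1)

H = np.array([[1.0, 0.0, 50.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])    # assumed example: a 50-pixel shift in x
src = np.array([[10.0, 20.0], [30.0, 40.0], [5.0, 5.0]])
dst = np.array([[60.0, 20.0], [81.0, 40.0], [200.0, 5.0]])

errors = reprojection_errors(H, src, dst)
inliers = errors < 3.0
print(errors, inliers)
```

The first pair agrees with H exactly, the second is off by one pixel and still counts as an inlier, and the third is a gross mismatch that the threshold rejects.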
Step 5: Warp and composite into one panorama
Map the right image into the left image's coordinate frame with the homography, allocate a canvas wide enough for both, and paste the left image on top of the warped right. The seam will be visible; that is fine, the point is that the geometry lines up.
def stitch(left, right, H):
    h1, w1 = left.shape[:2]
    h2, w2 = right.shape[:2]
    canvas_w = w1 + w2  # room for both side by side
    warped = cv2.warpPerspective(right, H, (canvas_w, max(h1, h2)))
    # TODO: copy `left` into the top-left region of `warped`
    # (rows 0:h1, cols 0:w1), then return the composited canvas.
    ...
Hint
warped[0:h1, 0:w1] = left then return warped. Pasting the un-warped left image last lets it overwrite the warped right in the overlap region, hiding the empty border the warp introduces.
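The w1 + w2 canvas is a safe upper bound for a side-by-side pair, but the exact footprint of the warped image can be computed by pushing its four corners through the homography. The sketch below assumes H maps right-image coordinates into the left image's frame, as this step requires, and uses made-up numbers matching a 512x512 source split as in Step 1; `warped_bounds` is a hypothetical helper.

```python
import numpy as np

def warped_bounds(H, w, h):
    """Bounding box (xmin, ymin, xmax, ymax) of a w x h image warped by H."""
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]],
                       dtype=np.float64)
    proj = corners @ H.T
    proj = proj[:, :2] / proj[:, 2:3]          # homogeneous divide
    xmin, ymin = proj.min(axis=0)
    xmax, ymax = proj.max(axis=0)
    return float(xmin), float(ymin), float(xmax), float(ymax)

# Assumed example: the right crop starts at column int(0.35 * 512) = 179 of
# the source image and is 512 - 179 = 333 columns wide, so H is a pure shift.
H = np.array([[1.0, 0.0, 179.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
bounds = warped_bounds(H, w=333, h=512)
print(bounds)
```

Here the warped right image ends at column 512, so a 512-pixel-wide canvas would suffice and the remaining columns of the w1 + w2 canvas stay black. Negative minimum coordinates would signal that the warp spills off the top or left of the canvas and needs an extra translation.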
Step 6: Run the full pipeline and report the attrition
Chain the five stages and print the match attrition at each step, the same shrinking funnel Section 10.6 traced: raw keypoints, ratio-test survivors, geometric inliers. Write the panorama as the artifact you keep.
left, right = load_pair()
(k1, d1) = detect_describe(left)
(k2, d2) = detect_describe(right)
good = ratio_match(d1, d2)
H, mask = estimate_homography(k1, k2, good)
if H is None:
    print(f"Refused: only {len(good)} ratio survivors, too few inliers to trust.")
else:
    n_in = int(mask.sum())
    print(f"keypoints {len(k1)}/{len(k2)} -> ratio {len(good)} -> inliers {n_in}")
    pano = stitch(left, right, H)
    cv2.imwrite("panorama.jpg", pano)
    print("wrote panorama.jpg")
Hint
If the stitch looks torn, your overlap is too small or the wrong image is being warped. Confirm the inlier count is comfortably above the gate (a healthy overlapping pair gives well over a hundred), and remember the homography maps right into left's frame, matching the src/dst order in Step 4.
Expected Output
One image file, panorama.jpg, wider than either input, with the two views aligned across their shared region so straight edges (a window frame, a horizon) continue unbroken across the seam. The console prints the attrition funnel, for example keypoints 3200/3150 -> ratio 540 -> inliers 410, the chapter's whole story in one line: thousands of keypoints, hundreds of ratio survivors, a geometrically consistent core. If you feed it two non-overlapping photos, it prints the Refused line and writes nothing, which is the correct behavior, not a bug.
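The funnel's last two numbers also indicate how hard RANSAC has to work. A standard bound gives the number of random draws needed so that, with a chosen confidence, at least one minimal sample is outlier-free. The sketch below evaluates it for a homography (four correspondences per sample) using the example funnel's inlier fraction of 410 out of 540; the function name `ransac_iterations` and the 0.99 confidence are illustrative assumptions, and the inlier ratio must lie strictly between 0 and 1.

```python
import math

def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Draws needed so that, with probability `confidence`, at least one
    minimal sample contains only inliers."""
    p_clean = inlier_ratio ** sample_size   # chance one sample is all inliers
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))

n_half = ransac_iterations(0.5)           # half the matches are wrong
n_funnel = ransac_iterations(410 / 540)   # the example funnel's inlier share
print(n_half, n_funnel)
```

The count grows quickly as the inlier share falls, which is why the ratio test earns its place before RANSAC: every false match it removes cheaply shrinks the number of hypotheses the verifier must try.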
Stretch Goals
- Swap SIFT for ORB (cv2.ORB_create) and the matcher norm to cv2.NORM_HAMMING, the binary-descriptor path of Section 10.4; time both and compare inlier counts to feel the speed-versus-distinctiveness trade.
- Replace cv2.RANSAC with cv2.USAC_MAGSAC in Step 4 and stitch the same pair both ways; report how the inlier count and seam alignment change, exercising the modern estimator family of Section 10.6.
- Blend the seam instead of overwriting it: feather the overlap with a linear alpha ramp so the join is invisible, the first step toward the multi-band blending real panorama tools use.
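As a starting point for the feathering stretch goal, a linear alpha ramp can be sketched in NumPy. This is one possible approach under simplifying assumptions: both inputs are already placed on same-size canvases, the overlap is a vertical band between columns x0 and x1, and `feather_blend` is a hypothetical helper.

```python
import numpy as np

def feather_blend(left, right, x0, x1):
    """Blend two same-size canvases; the left weight ramps 1 -> 0 over x0..x1."""
    h, w = left.shape[:2]
    alpha = np.ones(w)
    alpha[x0:x1] = np.linspace(1.0, 0.0, x1 - x0)   # linear ramp in the overlap
    alpha[x1:] = 0.0                                 # right image only past it
    alpha = alpha.reshape(1, w, 1)                   # broadcast over rows, channels
    out = alpha * left.astype(np.float64) + (1.0 - alpha) * right.astype(np.float64)
    return np.clip(out, 0, 255).astype(np.uint8)

# Two flat gray canvases make the ramp easy to read off.
left = np.full((4, 10, 3), 200, np.uint8)
right = np.full((4, 10, 3), 100, np.uint8)
pano = feather_blend(left, right, 3, 8)
print(pano[0, :, 0])
```

In the stitcher, the left canvas would be the left image padded to the panorama size and the right canvas the warped right image, with x0 and x1 set to the columns where the two overlap.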
The pipeline above is roughly seventy lines and exposes every stage on purpose. OpenCV bundles the same detect-describe-match-verify-warp chain, plus exposure compensation and multi-band blending the lab skips, behind one class: stitcher = cv2.Stitcher_create() then status, pano = stitcher.stitch([left, right]) and if status == cv2.Stitcher_OK: cv2.imwrite("panorama.jpg", pano). That is a 70-to-3 reduction, and the result has seamless blending the from-scratch version lacks. Build it once by hand to understand what the three lines hide; reach for the library every time after.
Complete Solution
import cv2
import numpy as np

def load_pair():
    a = cv2.imread("left.jpg")
    b = cv2.imread("right.jpg")
    if a is not None and b is not None:
        return a, b
    # required=False returns "" instead of raising when the sample is missing.
    full = cv2.imread(cv2.samples.findFile("lena.jpg", required=False))
    if full is None:
        full = np.random.default_rng(0).integers(0, 256, (512, 512, 3), np.uint8)
    h, w = full.shape[:2]
    return full[:, :int(0.65 * w)].copy(), full[:, int(0.35 * w):].copy()

def detect_describe(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create(nfeatures=4000)
    return sift.detectAndCompute(gray, None)

def ratio_match(d1, d2, ratio=0.75):
    # No descriptors, or fewer than two candidates, means no usable matches.
    if d1 is None or d2 is None or len(d2) < 2:
        return []
    bf = cv2.BFMatcher(cv2.NORM_L2)
    pairs = bf.knnMatch(d1, d2, k=2)
    return [m for m, n in pairs if m.distance < ratio * n.distance]

def estimate_homography(k1, k2, good, min_inliers=20):
    if len(good) < 4:
        return None, None
    # H must map right to left: source points from the right image (train),
    # destination points from the left image (query).
    src = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None or int(mask.sum()) < min_inliers:
        return None, None
    return H, mask

def stitch(left, right, H):
    h1, w1 = left.shape[:2]
    h2, w2 = right.shape[:2]
    canvas_w = w1 + w2
    warped = cv2.warpPerspective(right, H, (canvas_w, max(h1, h2)))
    warped[0:h1, 0:w1] = left
    return warped

if __name__ == "__main__":
    left, right = load_pair()
    k1, d1 = detect_describe(left)
    k2, d2 = detect_describe(right)
    good = ratio_match(d1, d2)
    H, mask = estimate_homography(k1, k2, good)
    if H is None:
        print(f"Refused: only {len(good)} ratio survivors, too few inliers to trust.")
    else:
        n_in = int(mask.sum())
        print(f"keypoints {len(k1)}/{len(k2)} -> ratio {len(good)} -> inliers {n_in}")
        cv2.imwrite("panorama.jpg", stitch(left, right, H))
        print("wrote panorama.jpg")