Part I: Image Processing
Chapter 1: Digital Image Fundamentals

Color Science & Color Spaces: RGB, HSV, Lab & YCbCr

"Hue is just an angle, saturation is just a radius, and yet everyone insists I have a personality."

A Mildly Desaturated Color Channel
Big Picture

Color is not a property of light; it is a three-dimensional summary your visual system computes. A color space is just a coordinate system for that summary, so the right space to work in depends on the job: RGB for sensors and displays, HSV for intuitive selection, Lab for perceptual distances, and YCbCr for compression. Most color bugs in vision code are coordinate-system bugs: arithmetic done in a space where it has no meaning, or thresholds set in a space where the decision boundary is needlessly tangled. Learn what each space is for and the bugs become design choices.

Section 1.3 closed the books on the brightness of a single channel. This section explains the channel dimension itself: why three numbers, what they mean, and why the same pixel wears different coordinates in different spaces. The "why three" has a biological answer. Human retinas carry three cone types with broad, overlapping spectral sensitivities; every physical light spectrum, no matter how complex, is collapsed into three cone responses. Two physically different spectra that produce the same three responses (metamers) are literally the same color to us. Trichromacy is why cameras have three channels, displays have three primaries, and your arrays from Chapter 0 have shape (H, W, 3): machines copied our compression scheme.
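
The collapse from a full spectrum to three responses can be sketched numerically. The block below is a toy model, not real colorimetry: the three Gaussian sensitivity curves, their centers and widths, and the flat test spectrum are all assumptions chosen for illustration. It constructs a second spectrum that differs visibly from the first yet produces the same three responses, which is what makes the pair metamers.

```python
import numpy as np

# Wavelength grid in nanometers (assumed sampling: 400..700 nm in 10 nm steps).
wavelengths = np.arange(400, 701, 10, dtype=np.float64)

def toy_sensitivity(center_nm, width_nm=40.0):
    """Hypothetical bell-shaped sensitivity; NOT a measured cone fundamental."""
    return np.exp(-0.5 * ((wavelengths - center_nm) / width_nm) ** 2)

# Three broad, overlapping "cone" curves stacked into a 3 x N matrix.
S = np.stack([toy_sensitivity(c) for c in (450.0, 540.0, 570.0)])

spectrum_a = np.ones_like(wavelengths)  # flat spectrum

# Any vector in the null space of S changes the spectrum without changing
# the three responses. The trailing right-singular vectors span that space.
_, _, vt = np.linalg.svd(S)
invisible = vt[-1]
spectrum_b = spectrum_a + 0.5 * invisible / np.abs(invisible).max()

response_a = S @ spectrum_a
response_b = S @ spectrum_b
print("largest spectral difference:", float(np.abs(spectrum_a - spectrum_b).max()))
print("largest response difference:", float(np.abs(response_a - response_b).max()))
```

With 31 spectral samples and only 3 sensors, 28 independent directions of spectral change are invisible in this toy model, which is the sense in which three channels are a lossy compression of the spectrum.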

1. From Spectra to Coordinates Beginner

Color science formalized trichromacy in 1931, when the CIE defined the XYZ color space from color-matching experiments: a device-independent reference in which any visible color has coordinates, and against which every practical color space is defined. You will rarely manipulate XYZ directly, but it is the hub through which conversions like RGB to Lab actually travel, and it pins down two ideas this section relies on. First, a white point: the XYZ coordinates declared to be "white" (daylight D65 for nearly all imaging), which anchors white balance from Section 1.1. Second, a gamut: the volume of colors a device can produce; sRGB, the default space of consumer imaging and the web, covers a modest gamut chosen to match 1990s CRT phosphors, and remains the assumed encoding of virtually every dataset you will train on.
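
As a minimal sketch of XYZ acting as the hub, the block below maps linear-light sRGB to XYZ with the commonly published D65 matrix (rounded to seven decimals). The helper names `linear_rgb_to_xyz` and `chromaticity` are illustrative, not from any library. It checks the two ideas above: RGB white lands on the D65 white point, and the red primary lands on one corner of the sRGB gamut triangle.

```python
import numpy as np

# Linear-sRGB -> CIE XYZ matrix for the D65 white point (published values,
# rounded). Rows produce X, Y, Z; columns weight R, G, B.
SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])

def linear_rgb_to_xyz(rgb_linear):
    """Map linear-light RGB in [0, 1] (last axis = channels) to XYZ."""
    return np.asarray(rgb_linear, dtype=np.float64) @ SRGB_TO_XYZ.T

def chromaticity(xyz):
    """Project XYZ onto the (x, y) chromaticity plane, discarding brightness."""
    return xyz[:2] / xyz.sum()

white_xyz = linear_rgb_to_xyz([1.0, 1.0, 1.0])
red_xyz = linear_rgb_to_xyz([1.0, 0.0, 0.0])

print("white point XYZ :", white_xyz.round(4))             # close to D65
print("white (x, y)    :", chromaticity(white_xyz).round(4))
print("red primary xy  :", chromaticity(red_xyz).round(4))  # one gamut corner
```

Note that the input must be linear light; gamma-encoded file values have to be decoded first, which is the subject of the next subsection.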

2. RGB and the Gamma Trap Intermediate

The RGB values in an ordinary image file are not linear light measurements. As Figure 1.1.1 showed, the ISP applies a gamma curve: sRGB stores approximately the 1/2.2 power of linear intensity. That spends more code values on dark tones, where human vision discriminates most finely, the same quantization wisdom as in Section 1.2. The exact sRGB encoding is piecewise:

$$V_{\text{sRGB}} = \begin{cases} 12.92\, L, & L \le 0.0031308 \\ 1.055\, L^{1/2.4} - 0.055, & L > 0.0031308 \end{cases}$$

where $L$ is linear intensity in $[0, 1]$. The trap: arithmetic that is physically meaningful on linear light (averaging, blurring, resizing, alpha blending) is applied daily to gamma-encoded values, where it is quietly wrong. Code 1.4.1 shows the canonical symptom.

import numpy as np

def srgb_to_linear(v):
    v = v / 255.0
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(L):
    s = np.where(L <= 0.0031308, 12.92 * L, 1.055 * L ** (1 / 2.4) - 0.055)
    return np.clip(255 * s, 0, 255)

red   = np.array([255.0, 0.0, 0.0])
green = np.array([0.0, 255.0, 0.0])

naive  = (red + green) / 2                      # averaging the CODES
proper = linear_to_srgb((srgb_to_linear(red) + srgb_to_linear(green)) / 2)

print("average of codes :", naive.round(0))     # dark, muddy olive
print("average of light :", proper.round(0))    # the yellow a camera would see
Code 1.4.1: The gamma trap in six lines of arithmetic. Averaging sRGB codes for pure red and pure green gives (128, 128, 0), a muddy olive; averaging the actual light and re-encoding gives (188, 188, 0), the bright yellow that physically mixing those lights produces.
average of codes : [128. 128.   0.]
average of light : [188. 188.   0.]
Output 1.4.1: A 60-level brightness error from one innocent-looking average. The same mechanism darkens edges when images are blurred or downsampled in gamma space.
Key Insight: Know Which Space Your Arithmetic Lives In

Filtering, resizing, blending, and brightness statistics are physically meaningful on linear light; thresholds tuned by eye, histogram views, and most pretrained models live on gamma-encoded sRGB. Neither is "correct" universally; what matters is choosing deliberately. High-quality pipelines decode to linear, compute, and re-encode at the end. Pragmatic ML pipelines stay in sRGB throughout, and that is defensible too, because the network learns the encoding; what is not defensible is mixing the two mid-pipeline without noticing. The point operations of Chapter 2 make this distinction precise with gamma correction as a tool rather than a trap.
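
The decode, compute, re-encode pattern can be sketched as a small wrapper. The helper `in_linear_light` and the 1-D `box_blur_3` filter are hypothetical names introduced here for illustration; the transfer functions are the same piecewise sRGB curves used earlier in this section. The demo blurs a black-to-white step edge both ways so the darkening at the edge is visible in the numbers.

```python
import numpy as np

def srgb_to_linear(codes):
    """Decode 8-bit sRGB code values (0..255) to linear light in [0, 1]."""
    v = np.asarray(codes, dtype=np.float64) / 255.0
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(light):
    """Encode linear light in [0, 1] back to sRGB code values (0..255)."""
    light = np.clip(light, 0.0, 1.0)
    s = np.where(light <= 0.0031308, 12.92 * light,
                 1.055 * light ** (1 / 2.4) - 0.055)
    return np.clip(255.0 * s, 0.0, 255.0)

def in_linear_light(codes, operation):
    """Hypothetical helper: decode, run `operation` on linear light, re-encode."""
    return linear_to_srgb(operation(srgb_to_linear(codes)))

def box_blur_3(row):
    """3-tap box blur along a 1-D row, replicating the edge samples."""
    padded = np.pad(row, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

edge = np.array([0, 0, 0, 255, 255, 255], dtype=np.float64)

blur_gamma = box_blur_3(edge)                    # arithmetic on the codes
blur_linear = in_linear_light(edge, box_blur_3)  # arithmetic on the light

print("blurred in gamma space :", blur_gamma.round(0))
print("blurred in linear light:", blur_linear.round(0))
```

The two transition pixels come out noticeably darker in the gamma-space version, while the flat black and white regions are identical in both, so the discrepancy is confined to exactly the places where the filter mixes different intensities.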

3. HSV: Color the Way You Describe It Beginner

RGB answers "how much of each primary?", which is the wrong question for tasks like "select everything red-ish". HSV re-parameterizes the same cube into hue (which color, as an angle around a color wheel), saturation (how vivid, as a radius from gray), and value (how bright). Figure 1.4.1 shows the geometric relationship: HSV is the RGB cube stood on its black corner and described in cylindrical coordinates.

[Figure 1.4.1 diagram. Left panel, "RGB: a cube of mixtures": axes R, G, B from black to white, with grays running along the black-to-white diagonal. Right panel, "HSV: the same cube in cylinder coordinates": H is the angle (which color), S the radius (how vivid), V the height (how bright). The panels are linked by cv2.cvtColor. OpenCV uint8 ranges: H in [0, 179], S and V in [0, 255].]
Figure 1.4.1: One set of colors, two coordinate systems. The RGB cube (left) parameterizes color by primary mixtures; HSV (right) re-describes the same volume by angle, radius, and height, which is far closer to how people specify colors ("vivid red, fairly bright"). The conversion is a deterministic change of coordinates, not a change of information.
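
A quick way to see the change of coordinates is the standard-library `colorsys` module, which works on floats in [0, 1]. The sketch below is illustrative only: the sample colors are arbitrary, `describe` is a helper defined here, and the last column applies the degrees-divided-by-two rule to show what the 8-bit OpenCV hue would be. Dimming or washing out a red moves S or V but leaves the hue angle where it was.

```python
import colorsys

def describe(name, r, g, b):
    """Return (name, hue in degrees, saturation, value, 8-bit halved hue)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360.0
    halved = int(round(hue_deg / 2)) % 180   # degrees / 2, as stored in uint8
    return name, round(hue_deg, 1), round(s, 2), round(v, 2), halved

samples = [
    ("vivid red", 1.0, 0.0, 0.0),
    ("dim red", 0.3, 0.0, 0.0),
    ("washed-out red", 1.0, 0.5, 0.5),
    ("vivid green", 0.0, 1.0, 0.0),
    ("mid gray", 0.5, 0.5, 0.5),
]

rows = [describe(*sample) for sample in samples]
for row in rows:
    print(row)
```

The gray row is the degenerate case: with zero saturation the hue is undefined, and `colorsys` reports it as 0 by convention, which is one reason hue thresholds are usually paired with a minimum saturation.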

HSV's killer application is color-based selection: a "red object" occupies a narrow hue interval regardless of how bright or washed-out it appears, so a box in HSV often replaces an awkward curved region in RGB. This is the workhorse of classical color segmentation, which Chapter 11 develops fully, and of the threshold-based masks of Chapter 2. Two OpenCV quirks ambush everyone: hue is stored halved (range 0 to 179) so it fits in a uint8, and red sits at the wraparound, needing two ranges. Code 1.4.2 handles both.

import cv2
import numpy as np

# Self-contained scene: three colored disks on gray.
img = np.full((240, 320, 3), 128, np.uint8)
cv2.circle(img, (80, 120), 40, (0, 0, 220), -1)    # red disk (BGR order!)
cv2.circle(img, (160, 120), 40, (0, 200, 0), -1)   # green disk
cv2.circle(img, (240, 120), 40, (200, 80, 0), -1)  # blue disk

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Red hue straddles 0 degrees, so it needs TWO inRange windows.
# Reminder: OpenCV stores hue as degrees/2, so the wheel ends at 179.
mask = (cv2.inRange(hsv, (0, 80, 60), (10, 255, 255))
        | cv2.inRange(hsv, (170, 80, 60), (179, 255, 255)))

print("red pixels selected:", int(mask.sum() // 255))
print("true disk area     :", cv2.countNonZero(
      cv2.circle(np.zeros((240, 320), np.uint8), (80, 120), 40, 255, -1)))
Code 1.4.2: Hue-window segmentation with the red wraparound handled. The two-range union selects the red disk almost exactly while ignoring green, blue, and the gray background, with thresholds a human can read and tune.
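
An alternative to the two-range union, offered here as a sketch rather than as the approach used in Code 1.4.2, is to measure hue distance around the circle so that a single window can straddle 0. The function `hue_window` is a hypothetical helper; it assumes OpenCV-style 8-bit hues in 0 to 179 and covers only the hue test, so the saturation and value gates would still be applied separately.

```python
import numpy as np

def hue_window(hue_u8, center, half_width):
    """Boolean mask of hues (0..179) within half_width of center,
    measured around the hue circle so the window may straddle 0."""
    h = hue_u8.astype(np.int16)            # avoid uint8 wraparound on subtraction
    d = np.abs(h - center) % 180
    circular = np.minimum(d, 180 - d)      # shorter way around the circle
    return circular <= half_width

hues = np.array([0, 5, 10, 11, 60, 120, 169, 170, 175, 179], dtype=np.uint8)
mask = hue_window(hues, center=0, half_width=10)
print("selected hues:", hues[mask].tolist())
```

With `center=0` and `half_width=10` this selects the same hues as the pair of windows 0 to 10 and 170 to 179, and the same function handles any other center without special-casing red.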

4. Lab: When Distance Must Mean Difference Advanced

Neither RGB nor HSV distances track perception: two greens 20 units apart can look identical while two grays 20 units apart look clearly different. CIE Lab was engineered so that Euclidean distance approximates perceived difference. It separates lightness $L^*$ from two opponent axes, $a^*$ (green to red) and $b^*$ (blue to yellow), echoing the opponent-color wiring of human vision, and applies a cube-root nonlinearity matching perceptual compression:

$$f(t) = \begin{cases} t^{1/3}, & t > (6/29)^3 \\ \tfrac{1}{3}\left(\tfrac{29}{6}\right)^2 t + \tfrac{4}{29}, & \text{otherwise} \end{cases}$$

with $L^* = 116 f(Y/Y_n) - 16$, $a^* = 500\,[f(X/X_n) - f(Y/Y_n)]$, and $b^* = 200\,[f(Y/Y_n) - f(Z/Z_n)]$, where $(X_n, Y_n, Z_n)$ is the white point. The payoff is $\Delta E$, the industry-standard color difference: $\Delta E_{76}$ is plain Euclidean distance in Lab, and the refined $\Delta E_{2000}$ corrects its known biases. A $\Delta E$ near 1 is a just-noticeable difference; below 2 is commercially "the same color" in most industries. Code 1.4.3 measures whether two production batches match.

import numpy as np
from skimage import color

# Two batches of "brand orange" that look identical in isolation.
batch_a = np.array([[[235, 122, 36]]], dtype=np.float64) / 255
batch_b = np.array([[[241, 117, 29]]], dtype=np.float64) / 255

lab_a = color.rgb2lab(batch_a)     # handles sRGB decoding + D65 white point
lab_b = color.rgb2lab(batch_b)

de76 = float(np.linalg.norm(lab_a - lab_b))
de2000 = float(color.deltaE_ciede2000(lab_a, lab_b)[0, 0])
print(f"Delta E 1976: {de76:.2f}   Delta E 2000: {de2000:.2f}")
# A Delta E 2000 under ~2.0 passes most print/brand color tolerances.
Code 1.4.3: Perceptual color difference as a quality gate. The two oranges differ by a representative $\Delta E_{2000}$ of about 2, right at the boundary where trained observers begin to notice under side-by-side viewing.
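
For readers who want to see the Lab formulas of this subsection as arithmetic, here is a minimal hand-written sketch. It assumes sRGB input as floats in [0, 1], the commonly published sRGB-to-XYZ matrix, and the D65 white point for the 2-degree observer; `srgb_to_lab` and `delta_e_76` are names chosen here, and a library routine remains the safer choice in production.

```python
import numpy as np

SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
D65_WHITE = np.array([0.95047, 1.0, 1.08883])   # (Xn, Yn, Zn), assumed D65

def srgb_to_lab(rgb):
    """sRGB floats in [0, 1] (last axis = R, G, B) -> CIE L*, a*, b*."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Step 1: undo the sRGB gamma to recover linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # Steps 2 and 3: linear RGB -> XYZ, then normalize by the white point.
    t = (linear @ SRGB_TO_XYZ.T) / D65_WHITE
    # Step 4: the piecewise cube-root nonlinearity f(t).
    delta = 6 / 29
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4 / 29)
    fx, fy, fz = f[..., 0], f[..., 1], f[..., 2]
    # Step 5: assemble lightness and the two opponent axes.
    return np.stack([116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)], axis=-1)

def delta_e_76(lab1, lab2):
    """Delta E 1976: plain Euclidean distance in Lab."""
    return np.linalg.norm(np.asarray(lab1) - np.asarray(lab2), axis=-1)

for name, rgb in [("white", [1, 1, 1]), ("black", [0, 0, 0]), ("red", [1, 0, 0])]:
    print(f"{name:5s} -> Lab {srgb_to_lab(rgb).round(2)}")
```

White and black make convenient sanity checks because their Lab coordinates are known in advance: lightness 100 and 0 respectively, with both opponent axes at zero.
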
Library Shortcut: skimage.color Replaces a 40-Line Conversion

Writing RGB to Lab by hand means undoing the sRGB gamma (Code 1.4.1), a 3×3 matrix to XYZ, white-point normalization, the piecewise cube-root, and the $L^* a^* b^*$ assembly: roughly 40 lines with three classic bug sites (wrong white point, forgotten gamma, matrix for the wrong RGB standard). skimage.color.rgb2lab(rgb) is one line and handles all of it, with siblings (rgb2hsv, rgb2ycbcr, rgb2xyz, deltaE_ciede2000) covering this entire section. OpenCV's cv2.cvtColor(img, cv2.COLOR_BGR2LAB) is the fast uint8 route, with the gotcha that it rescales: $L^*$ to $[0, 255]$ and $a^*, b^*$ shifted by 128.
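
The OpenCV rescaling gotcha is easy to undo once it is written down. The sketch below assumes the 8-bit convention just described ($L^*$ stretched from $[0, 100]$ to $[0, 255]$, and $a^*$, $b^*$ offset by 128); `opencv_lab8_to_cie` is a hypothetical helper name, and it takes an already-converted uint8 Lab array so that it runs without OpenCV installed.

```python
import numpy as np

def opencv_lab8_to_cie(lab_u8):
    """Undo the 8-bit Lab rescaling: returns float L* in [0, 100] and
    signed a*, b* (last axis = L, a, b)."""
    lab = np.asarray(lab_u8).astype(np.float64)
    L = lab[..., 0] * (100.0 / 255.0)   # stretch back to the 0..100 scale
    a = lab[..., 1] - 128.0             # remove the offset
    b = lab[..., 2] - 128.0
    return np.stack([L, a, b], axis=-1)

# Two hand-made uint8 Lab pixels: pure white and a mid-lightness reddish tone.
pixels = np.array([[255, 128, 128], [128, 178, 148]], dtype=np.uint8)
print(opencv_lab8_to_cie(pixels).round(2))
```

Applying a $\Delta E$ tolerance directly to the uint8 values would mix a stretched lightness axis with unstretched opponent axes, so converting back first keeps thresholds comparable with published tolerances.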

Practical Example: Grading Tomatoes in the Right Coordinates

Who: A computer vision engineer at a produce-grading equipment manufacturer.

Situation: The company's tomato line classified ripeness from RGB thresholds tuned carefully at the factory.

Problem: At customer sites, with different luminaires and aging bulbs, the same fruit landed in different grade bins; recalibrating RGB thresholds per site took a technician most of a day.

Decision: The engineer moved the decision to Lab space: ripeness became a threshold on $a^*$ (the green-to-red axis) with a mild $L^*$ gate, after a per-site white balance against a gray reference tile, applying the capture-control lesson of Section 1.1.

Result: Cross-site grade agreement rose from 81% to 96%, and site calibration shrank to photographing one gray tile.

Lesson: Do not fight a tangled decision boundary with more thresholds; change coordinates until the boundary is simple. Color spaces are features, and choosing the right one is feature engineering.
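
The gray-tile calibration step in this example can be sketched in a few lines. Everything numeric below is hypothetical: the tile reflectance of 0.18, the tomato reflectance, the two illuminant colors, and the toy `photograph` camera model are assumptions for illustration, and the subsequent $a^*$ threshold is omitted. The sketch applies per-channel gains in linear light so the photographed tile returns to its known neutral value.

```python
import numpy as np

def srgb_to_linear(s):
    """Decode sRGB floats in [0, 1] to linear light."""
    s = np.asarray(s, dtype=np.float64)
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(light):
    """Encode linear light in [0, 1] to sRGB floats in [0, 1]."""
    light = np.clip(light, 0.0, 1.0)
    return np.where(light <= 0.0031308, 12.92 * light,
                    1.055 * light ** (1 / 2.4) - 0.055)

TILE_REFLECTANCE = 0.18   # assumed known reflectance of the gray tile

def gains_from_tile(tile_srgb):
    """Per-channel linear gains that map the photographed tile back to gray."""
    return TILE_REFLECTANCE / srgb_to_linear(tile_srgb)

def white_balance(pixels_srgb, gains):
    """Apply the gains in linear light, then re-encode."""
    return linear_to_srgb(srgb_to_linear(pixels_srgb) * gains)

def photograph(reflectance, illuminant):
    """Toy camera: reflectance times illuminant color, then sRGB encoding."""
    return linear_to_srgb(np.asarray(reflectance) * np.asarray(illuminant))

tomato = [0.45, 0.10, 0.05]                   # hypothetical linear reflectance
gray_tile = [TILE_REFLECTANCE] * 3
sites = {"factory": [1.00, 1.00, 1.00],       # neutral luminaire
         "customer": [1.00, 0.80, 0.55]}      # warm, aging bulb

raw, balanced = {}, {}
for name, light in sites.items():
    gains = gains_from_tile(photograph(gray_tile, light))
    raw[name] = photograph(tomato, light)
    balanced[name] = white_balance(raw[name], gains)
    print(f"{name:8s} raw {raw[name].round(3)}  balanced {balanced[name].round(3)}")
```

In this idealized model the balanced values agree across sites because the illuminant cancels exactly; real cameras add noise, clipping, and spectral effects, so the agreement is approximate and a residual tolerance on the downstream threshold is still needed.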

5. YCbCr: Color for Compression Intermediate

The final space exists for neither perception nor intuition but for bandwidth. Human vision resolves brightness detail far more sharply than color detail, so codecs split images into luma $Y'$ (a weighted sum of gamma-encoded R, G, B reflecting the eye's sensitivities) and two chroma differences:

$$Y' = 0.299\,R' + 0.587\,G' + 0.114\,B' \qquad \text{(BT.601)}$$

with $C_b$ and $C_r$ encoding the blue-difference and red-difference signals. The reward is chroma subsampling: storing each chroma plane at half resolution in both directions (the 4:2:0 scheme) shrinks it to a quarter of its samples, so the three planes together drop from 3 to 1.5 samples per pixel. That discards 50% of the data before any compression begins, almost invisibly. JPEG, WebP, and essentially every video codec do this, as Section 1.5 will exploit. Code 1.4.4 simulates the round trip and measures how little is lost on a simple test scene.

import cv2
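import numpy as np

# Rebuild the disk scene from Code 1.4.2 so this block runs on its own.
img = np.full((240, 320, 3), 128, np.uint8)
cv2.circle(img, (80, 120), 40, (0, 0, 220), -1)    # red disk (BGR order)
cv2.circle(img, (160, 120), 40, (0, 200, 0), -1)   # green disk
cv2.circle(img, (240, 120), 40, (200, 80, 0), -1)  # blue disk

ycc = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)   # OpenCV channel order: Y, Cr, Cb!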
Y, Cr, Cb = cv2.split(ycc)

def to_quarter_and_back(c):
    """4:2:0 simulation: halve chroma resolution, then upsample back."""
    down = cv2.resize(c, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    return cv2.resize(down, (c.shape[1], c.shape[0]),
                      interpolation=cv2.INTER_LINEAR)

ycc_sub = cv2.merge([Y, to_quarter_and_back(Cr), to_quarter_and_back(Cb)])
back = cv2.cvtColor(ycc_sub, cv2.COLOR_YCrCb2BGR)

err = np.abs(back.astype(np.int16) - img.astype(np.int16))
print("mean abs error after 4:2:0 round trip:", round(float(err.mean()), 2))
print("worst error (at colored disk edges) :", int(err.max()))
Code 1.4.4: Chroma subsampling round trip on the disk scene from Code 1.4.2. Half the chroma samples vanish in each direction, yet the mean error stays near one code value; the damage concentrates at saturated color edges, exactly where JPEG's color fringing lives. Note OpenCV's YCrCb channel order, a perennial source of swapped-chroma bugs.
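
The luma equation and the 50% figure can both be checked with plain arithmetic. The sketch below assumes the full-range BT.601 convention used by JPEG/JFIF, in which the chroma differences are scaled and offset to center on 128; the function names are illustrative, and the output order here is Y, Cb, Cr rather than OpenCV's Y, Cr, Cb.

```python
import numpy as np

def rgb_to_ycbcr_full(rgb_u8):
    """Gamma-encoded 8-bit R'G'B' -> full-range BT.601 Y'CbCr as floats.
    Clip and round before storing the result as uint8."""
    rgb = np.asarray(rgb_u8, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 + (b - y) / 1.772     # blue-difference, centered on 128
    cr = 128.0 + (r - y) / 1.402     # red-difference, centered on 128
    return np.stack([y, cb, cr], axis=-1)

def samples_444(height, width):
    """Sample count with all three planes at full resolution."""
    return 3 * height * width

def samples_420(height, width):
    """Sample count with both chroma planes halved in each direction."""
    chroma = ((height + 1) // 2) * ((width + 1) // 2)
    return height * width + 2 * chroma

gray = rgb_to_ycbcr_full([128, 128, 128])
red = rgb_to_ycbcr_full([255, 0, 0])
print("gray ->", gray.round(2))   # neutral colors carry no chroma signal
print("red  ->", red.round(2))

full, sub = samples_444(240, 320), samples_420(240, 320)
print(f"4:4:4 samples: {full}, 4:2:0 samples: {sub}, kept: {sub / full:.0%}")
```

Neutral pixels map to chroma values of exactly 128, so the gray background of the disk scene contributes nothing to the chroma planes and all the subsampling error has to come from the colored regions and their edges.
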
Research Frontier: Learning the Right Color Coordinates (2024 to 2026)

Color spaces are still being invented, now with gradients. The HVI color space (Yan et al., CVPR 2025, "You Only Need One Color Space") was designed specifically for low-light enhancement: it learns an intensity-collapsed hue-symmetric plane that suppresses the noise amplification and red-black artifacts that plague Lab- and HSV-based enhancement networks, and reports consistent gains across ten benchmarks simply by changing coordinates under an unchanged architecture. On the capture side, learned white balance has matured from the cross-camera CNN of C5 (ICCV 2021) into transformer-based auto white balance modules evaluated for in-ISP deployment in 2024 to 2026 mobile pipelines. The through-line matches this section's thesis exactly: when a vision task struggles, one of the cheapest interventions available is a better coordinate system for color.

Four spaces, four jobs: RGB stores what devices emit, HSV indexes what humans mean, Lab measures what humans see, and YCbCr packs what channels can afford. With color encoded, one question remains for this chapter: how the whole array gets squeezed into a file, and what that squeezing costs. On to Section 1.5.

Exercise 1.4.1: Pick the Space Conceptual

For each task, choose the most natural color space from this section and defend your choice in two sentences: (a) a slider that lets users shift a photo's colors toward "warmer" without changing brightness; (b) verifying that a printed logo matches the brand specification; (c) finding all yellow tennis balls in a video feed; (d) deciding how to allocate bits between channels in a new image codec.

Exercise 1.4.2: The Gamma-Aware Resizer Coding

Build two thumbnail pipelines for the same high-contrast photograph (or a synthetic checkerboard of 1-pixel black and white squares): (a) cv2.resize directly on the sRGB image; (b) convert to linear light with the functions from Code 1.4.1, resize, and convert back. Compare the mean brightness of both thumbnails to the mean linear-light brightness of the original. Which pipeline preserves it, by how many code values do they differ, and why does the checkerboard make the effect dramatic?

Exercise 1.4.3: Stress-Test Hue Segmentation Analysis

Using Code 1.4.2 as a base, degrade the scene three ways and measure the selected-pixel count after each: (a) scale the image brightness by 0.3 (dim lighting); (b) add Gaussian noise with sigma 20; (c) blend 30% gray into the disks (desaturation). Which degradation breaks the hue window first, which HSV channel's threshold is responsible, and what does this tell you about when classical color segmentation should be replaced by the learned methods of Part III?