Part I: Image Processing
Chapter 8: Tools of the Trade: The Image Processing Stack

Library Landscape: OpenCV, scikit-image, Pillow & SciPy ndimage

"OpenCV insists I am BGR, Pillow swears I am RGB, and scikit-image quietly converted me to float64 while nobody was looking. I just lie here, contiguous in memory, and let them argue."

A Diplomatically Neutral NumPy Array
Big Picture

The four major Python imaging libraries are not competitors but complements: each one optimizes a different corner of the same job, and the professional move is to know which corner you are standing in. OpenCV optimizes speed and breadth, scikit-image optimizes clarity and scientific correctness, Pillow optimizes file I/O and simplicity, and SciPy ndimage optimizes n-dimensional generality. They interoperate through one shared currency, the NumPy array, and nearly every bug at their borders is a conversion bug.

Throughout Part I we reached for whichever library had the cleanest API for the moment: Pillow to load files in Chapter 0, OpenCV for warps in Chapter 5, scikit-image for denoisers in Chapter 7. This section finally puts the four side by side: where each came from, what data model it assumes, how to translate between them safely, and a decision procedure for choosing. Treat it as the reference you return to whenever an import decision feels arbitrary.

1. Four Libraries, Four Philosophies Beginner

OpenCV (cv2) is a C++ library wearing a Python costume. Born at Intel in 1999, it grew into the largest open-source vision codebase in existence: thousands of functions spanning image processing, video I/O, calibration, tracking, and legacy machine learning. Its Python bindings hand you NumPy arrays, but the semantics underneath are C++: preallocated outputs, in-place options, integer-first dtypes, and the famous BGR channel order it inherited from early Windows bitmap conventions. When raw speed on a CPU matters, OpenCV is usually the answer; Section 8.2 quantifies why.

scikit-image (skimage) is the scientific community's counterproposal: pure-Python-plus-Cython, NumPy-native, with an API designed to read like the textbook. Functions are named after the algorithm and its author (filters.frangi, restoration.richardson_lucy), default to float images in $[0, 1]$, and favor correctness and documentation over raw throughput. Its docs double as a small image processing course, and many functions cite the paper they implement.

Pillow (PIL) is the elder of the family, the maintained fork of the 1995 Python Imaging Library. Its center of gravity is file handling: it reads and writes dozens of formats, understands EXIF orientation, color profiles, animated GIFs, and 16-bit TIFFs, and exposes a friendly Image object with methods like resize and crop. It is the default tool of web backends and the loader underneath torchvision, which is why it reappears in Chapter 18.

SciPy ndimage (scipy.ndimage) is the quiet generalist. It implements filters, morphology, interpolation, and measurements for arrays of any dimensionality, which makes it the only choice in this lineup when your "image" is a 3D microscopy stack or a 4D fMRI volume. Its API is spartan, but several scikit-image functions delegate to it internally, and its map_coordinates is the engine behind many custom warps. Figure 8.1.1 places all four on the common NumPy foundation.

OpenCV C++ core, SIMD, threads uint8 BGR by default speed + breadth video, calibration, GUI scikit-image Python + Cython float in [0, 1], RGB textbook-faithful APIs papers cited in docstrings Pillow Image object, not array RGB, modes ("L", "RGB") file I/O for 30+ formats EXIF, ICC, thumbnails SciPy ndimage n-dimensional: 2D, 3D, 4D any dtype, no color rules filters, morphology, map_coordinates GPU lane CuPy cuCIM PyTorch Kornia same array idea, device memory (Section 8.2) ndarray in/out ndarray in/out np.asarray() copy ndarray in/out NumPy ndarray shape (H, W) or (H, W, C) · dtype · strides · the shared currency of the stack Every cross-library bug in Part I is a disagreement about what the bytes in this array mean.
Figure 8.1.1: The Part I imaging stack. Four libraries with different philosophies all read and write the same NumPy arrays (Pillow via an explicit copy), while GPU libraries replicate the array model in device memory. Interoperation is cheap; interpretation (channel order, dtype, range) is where bugs live.

Table 8.1.1 compresses the comparison into the form you will actually use: a row to scan before you type import.

Table 8.1.1: The four core libraries at a glance.
LibraryNative image modelColor orderBackendStrongest atWeakest at
OpenCV (cv2)uint8 ndarray (also float32)BGRC++, SIMD, IPP, threadsspeed, video, breadthAPI consistency, docs depth
scikit-imagefloat64 ndarray in [0, 1]RGBPython + Cythonclarity, scientific rigorthroughput, video
PillowImage object with modeRGBC (libjpeg, zlib, ...)file formats, metadatanumeric processing
SciPy ndimageany-dtype, any-dim ndarraynone (no color rules)C3D/4D volumes, generic filterscolor handling, I/O

2. The Data-Model Trap: dtype, Range & Channel Order Intermediate

The libraries agree on the container and disagree on the contents. As established in Chapter 1, an image is an array plus an interpretation, and each library bakes in a different interpretation. Three axes of disagreement cause essentially all of the trouble:

The defensive pattern is a small, boring conversion layer at the borders of your code, written once and tested once. Code 8.1.1 is a version you can paste into any project.

import numpy as np
import cv2
from PIL import Image

def pil_to_cv(im: Image.Image) -> np.ndarray:
    """Pillow (RGB Image) -> OpenCV (BGR uint8 ndarray)."""
    return cv2.cvtColor(np.asarray(im.convert("RGB")), cv2.COLOR_RGB2BGR)

def cv_to_pil(arr: np.ndarray) -> Image.Image:
    """OpenCV (BGR uint8) -> Pillow (RGB Image)."""
    return Image.fromarray(cv2.cvtColor(arr, cv2.COLOR_BGR2RGB))

def cv_to_skimage(arr: np.ndarray) -> np.ndarray:
    """OpenCV (BGR uint8) -> scikit-image convention (RGB float64 in [0, 1])."""
    rgb = cv2.cvtColor(arr, cv2.COLOR_BGR2RGB)
    return rgb.astype(np.float64) / 255.0

def skimage_to_cv(arr: np.ndarray) -> np.ndarray:
    """scikit-image (RGB float in [0, 1]) -> OpenCV (BGR uint8), with safe rounding."""
    u8 = np.clip(np.rint(arr * 255.0), 0, 255).astype(np.uint8)
    return cv2.cvtColor(u8, cv2.COLOR_RGB2BGR)
Code 8.1.1: A border-control layer for the stack: four converters that make channel order, dtype, and range explicit. Keeping all conversions in one place turns "mystery tint" bugs into grep-able code review items.
Library Shortcut: skimage.util's Conversion Utilities

The dtype-and-range half of Code 8.1.1 (clipping, rounding, rescaling, handling signed inputs and 16-bit images) generalizes into about 30 lines of edge cases if you write it for every dtype pair yourself. scikit-image ships it as a family of one-liners:

from skimage import util

f = util.img_as_float(u8_image)    # uint8 [0,255]  -> float64 [0,1]
u = util.img_as_ubyte(float_img)   # float [0,1]    -> uint8, clipped + rounded
w = util.img_as_uint(float_img)    # float [0,1]    -> uint16 [0, 65535]
Code 8.1.1a: The skimage.util dtype converters: one audited line per direction in place of the clip-round-scale bookkeeping that Code 8.1.1 handles for the range half of the problem.

Roughly 30 lines of conversion bookkeeping become 1 per direction. Internally the library handles negative-value clipping, correct rounding (not truncation), 12-bit and 16-bit scaling, and warns rather than silently wrapping when a conversion would lose information. Channel order remains your job; no utility can know whether bytes are RGB or BGR.

Practical Example: The Sunburn Filter That Shipped

Who: A two-person team at a photo-sharing startup adding an "auto-enhance" feature.

Situation: The enhancement model was prototyped in a notebook with scikit-image (RGB floats), then deployed inside a service that loaded images with OpenCV (BGR uint8) for speed.

Problem: In production, every enhanced portrait came out subtly wrong: skin trended orange-red, skies dull. No crash, no warning, and unit tests passed because they used grayscale fixtures, where BGR and RGB are identical.

Decision: An engineer finally diffed channel histograms (a trick from Chapter 2) between notebook and service for the same input, saw the red and blue histograms swapped, and traced the path: cv2.imread feeding an RGB-trained model. The fix was one cvtColor call, plus a color test fixture so it could never regress silently.

Result: Output matched the notebook pixel-for-pixel. The postmortem mandated a conversion layer like Code 8.1.1 at every service boundary.

Lesson: Channel-order bugs do not throw exceptions; they degrade aesthetics. Test with color images and assert on channel statistics, not just shapes.

Key Insight: The Array Is Not the Image

A NumPy array carries shape and dtype but not meaning: nothing in (512, 512, 3) uint8 says whether channel 0 is red or blue, whether 255 means white or 1.0 does, or whether the data is gamma-encoded sRGB or linear light (a distinction Chapter 1 introduced and averaging operations quietly depend on). Every library call is therefore a claim about metadata that the array itself cannot verify. The stack's single most valuable habit is making those claims explicit at module borders, the way Code 8.1.1 does, so that the type system of your own functions carries what NumPy's cannot.

3. A Rosetta Stone: One Operation, Four APIs Intermediate

Seeing the same operation written four ways teaches more about the libraries' personalities than any prose. Code 8.1.2 applies a Gaussian blur with $\sigma = 3$ (the workhorse smoother from Chapter 3) in each library, then measures how much the results differ.

import numpy as np, cv2
from PIL import Image, ImageFilter
from scipy import ndimage
from skimage import filters, util

rng = np.random.default_rng(8)
img = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)  # synthetic test card

# OpenCV: ksize=(0,0) lets sigma determine the kernel size; uint8 in, uint8 out
b_cv = cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=3.0)

# SciPy ndimage: dtype-agnostic, n-dimensional; we feed floats for precision
b_nd = ndimage.gaussian_filter(img.astype(np.float64), sigma=3.0)

# scikit-image: float-in-[0,1] convention, channel_axis for color images
b_sk = filters.gaussian(util.img_as_float(img), sigma=3.0) * 255.0

# Pillow: its own Image object; radius plays the role of sigma (approximately)
b_pil = np.asarray(Image.fromarray(img).filter(ImageFilter.GaussianBlur(radius=3.0)),
                   dtype=np.float64)

print("max |OpenCV - ndimage|     :", np.abs(b_cv - b_nd).max().round(2))
print("max |ndimage - skimage|    :", np.abs(b_nd - b_sk).max().round(2))
print("max |ndimage - Pillow|     :", np.abs(b_nd - b_pil).max().round(2))
Code 8.1.2: The same $\sigma=3$ Gaussian blur in all four libraries. Note what each API makes you specify: OpenCV wants a kernel size policy, scikit-image wants its float convention honored, Pillow wants a "radius", and ndimage just wants numbers.
max |OpenCV - ndimage|     : 1.42
max |ndimage - skimage|    : 0.0
max |ndimage - Pillow|     : 3.87
Output 8.1.2a: Representative differences (exact values depend on the random image). scikit-image matches ndimage because it calls it internally; OpenCV differs slightly through kernel truncation and uint8 rounding; Pillow differs most because its "Gaussian" is an approximation built from repeated box blurs.

The output is the lesson: four correct implementations of "the same" blur disagree by up to a few gray levels, because kernel truncation radius, border policy (the Chapter 3 border modes again), and internal precision all differ. For most applications this is irrelevant; for reproducing a paper's numbers it is decisive, a theme Section 8.3 revisits with SSIM. Table 8.1.2 extends the Rosetta stone to the operations Part I used most.

Table 8.1.2: Rosetta stone: common Part I operations across the stack (grayscale case; "n/a" means no direct single-call equivalent).
OperationOpenCVscikit-imageSciPy ndimagePillow
Read filecv2.imreadio.imreadn/aImage.open
Resizecv2.resizetransform.resizendimage.zoomImage.resize
Gaussian blurcv2.GaussianBlurfilters.gaussianndimage.gaussian_filterImageFilter.GaussianBlur
Median filtercv2.medianBlurfilters.medianndimage.median_filterImageFilter.MedianFilter
Otsu thresholdcv2.threshold(..., THRESH_OTSU)filters.threshold_otsun/an/a
Affine warpcv2.warpAffinetransform.warpndimage.affine_transformImage.transform
Morphological openingcv2.morphologyExmorphology.openingndimage.binary_openingn/a
Connected componentscv2.connectedComponentsmeasure.labelndimage.labeln/a
Histogram equalizationcv2.equalizeHistexposure.equalize_histn/aImageOps.equalize

Two patterns jump out of the table. First, OpenCV and scikit-image cover nearly everything from Chapter 2 through Chapter 6, so most projects standardize on one of them and call the others at the edges. Second, the gaps are informative: Pillow has no morphology or components because it was never meant for analysis, and ndimage has no Otsu or file I/O because thresholding policy and disk formats are not n-dimensional math.

4. Choosing: A Decision Guide Intermediate

The decision procedure this book uses, distilled from the trade-offs above:

  1. Loading and saving files, EXIF, web formats? Pillow first. It is the I/O specialist, and np.asarray(im) hands the pixels to everyone else.
  2. Real-time or video pipeline, CPU-bound? OpenCV. Nothing else in the lineup touches its throughput (Section 8.2 measures it), and cv2.VideoCapture has no peer here.
  3. Research code, papers to reproduce, readability reviewed? scikit-image. The float convention removes a whole class of overflow bugs, and the docstrings cite their sources.
  4. Volumes, stacks, anything not (H, W, 3)? SciPy ndimage, which never assumed two dimensions in the first place.
  5. The operation will eventually live inside a neural network? Start thinking in PyTorch/Kornia tensors now; Chapter 18 makes the move official.

Mixing is normal; pick one primary convention per project (most teams choose either "OpenCV everywhere" or "skimage everywhere, OpenCV for hotspots"), write the border layer once, and never convert ad hoc in the middle of an algorithm.

Research Frontier: The Stack Is Migrating to Tensors and GPUs

The 2024-2026 trend is the classical stack re-platforming onto deep learning infrastructure. Kornia (Riba et al., WACV 2020, with major releases through 2025) reimplements filters, geometry, and color conversions as differentiable PyTorch operators, so a Gaussian blur can sit inside a trained model and receive gradients. cuCIM (RAPIDS) provides a scikit-image-compatible API executing on GPUs for biomedical gigapixel workloads, and NVIDIA's CV-CUDA (open-sourced 2023, actively extended through 2025) targets batched pre- and post-processing for inference services at thousands of images per second. Meanwhile the scikit-image team's "skimage2" transition plan (SKIP-4, debated 2023-2025) proposes modernizing the library's API conventions without breaking two decades of scientific code. The practical takeaway: the four-library landscape of this section is stable for CPU work, but if your pipeline feeds a neural network, the tensor-native reimplementations are becoming the default on-ramp.

Exercise 8.1.1: Library Forensics Conceptual

For each symptom, name the most likely stack bug and the one-line fix: (a) a photo displayed with Matplotlib shows blue skin tones; (b) filters.gaussian output looks pitch black when saved with cv2.imwrite; (c) a 3D CT volume raises an error in cv2.GaussianBlur but not in ndimage.gaussian_filter; (d) thumbnails saved through Pillow appear rotated 90 degrees relative to how the phone displayed them.

Exercise 8.1.2: Extend the Rosetta Stone Coding

Add two rows to Table 8.1.2 in code: implement (a) Sobel gradient magnitude and (b) bilateral filtering in every library that supports them (consult Chapter 3 and Chapter 7 for the algorithms). Where a library lacks the operation, compose it from primitives or document why it cannot be done in one call. Compare outputs numerically as Code 8.1.2 does and explain any differences larger than one gray level.

Exercise 8.1.3: The Cost of Conversion Analysis

Measure the overhead of the border layer: time pil_to_cv and cv_to_skimage from Code 8.1.1 on images of size 256×256, 1024×1024, and 4096×4096. Which conversions are allocation-bound (a full copy) and which are nearly free views? Estimate what fraction of a 30 fps video pipeline's frame budget (33 ms) each conversion would consume at 1080p, and formulate a rule for when conversion cost forces a single-library pipeline.