Part IV: Generative Vision Models
Chapter 37: Evaluation, Safety & Generative Data Engines

Deepfakes, Detection & Misuse

"I can put any face on any body and any words in any mouth. The detector that catches me today will be retired by my next checkpoint. We are not playing a game with a winner; we are playing one with a scoreboard."

A Face-Swapping Network With No Remorse
Big Picture

Once generators are realistic enough to deceive, detection becomes an adversarial arms race fought over statistical artifacts, and it is a race the defender keeps losing on out-of-distribution generators because the artifacts a detector learns are specific to the generators it was trained on. Deepfakes are produced by face-swapping autoencoders, GAN-based reenactment, and increasingly by diffusion models applied to identity and lip motion. Detectors hunt for the fingerprints these methods leave: spatial inconsistencies, telltale frequency-domain periodicities from upsampling, and physiological impossibilities. Each works well on the generators in its training set and degrades sharply on new ones. Understanding both sides, how fakes are made and why detection generalizes poorly, is what lets you build realistic defenses and reason honestly about the threat rather than trusting a detector blindly.

Every technique in this part can be turned to harm, and faces are where the harm concentrates because identity is uniquely abusable. Having put generators to constructive work as data engines in Section 37.3, we now treat the generator as an adversary and ask what a defender can actually do about a forged face. The honest answer is sobering, and this section is built to make that honesty precise rather than alarmist: by the end you will know how the major deepfake techniques work, what fingerprints detectors hunt for, and exactly why those detectors generalize so poorly to the next generator. Detection is useful, especially layered with the provenance methods of Section 37.5, but it is not a solved problem and will not become one.

1. How Deepfakes Are Made Beginner

"Deepfake" covers several distinct techniques, in rough order of how hard they are to detect:

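- Face swapping with autoencoders: a single encoder is shared across two identities while each identity gets its own decoder, so encoding a frame of person A and decoding it with person B's decoder renders B's face with A's pose and expression; the result is then blended back into the original frame.
- GAN-based reenactment: a generator keeps the target's identity but drives it with the expressions, head pose, or mouth motion of a source performer.
- Diffusion-based editing of identity and lip motion: diffusion models synthesize or alter the face and its lip movement directly, an increasingly common route.

The common structure across all of them, an encoder that compresses, a manipulation in a compact space, and a decoder that upsamples back to pixels, is exactly what leaves the artifacts detectors exploit.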

2. The Artifacts Detectors Hunt Intermediate

Generation leaves three families of fingerprints. Spatial inconsistencies appear at the boundary where a swapped face is blended into the original frame: mismatched skin tone, lighting that does not match the scene, soft seams around the jaw, or eyes and teeth rendered with subtly wrong geometry. Frequency-domain fingerprints are the most reliable and the most generator-specific: the transposed convolutions and upsampling layers that decoders use to grow a small latent into a full image impose periodic patterns in the spectrum that real camera images, with their natural sensor noise from Chapter 1, do not have. This is the frequency analysis of Chapter 4 turned into a forensic tool. Physiological and temporal cues include unnatural or absent blinking, inconsistent head pose, and (for video) flicker between frames where a per-frame generator fails to maintain temporal coherence. Figure 37.4.1 maps these onto a face.

Figure 37.4.1: The artifact families a deepfake detector exploits. On the face: blend seams at the jawline, geometry errors around the eyes, and texture errors in the teeth and mouth. On the right: the frequency spectrum of a generated image shows periodic peaks from the decoder's upsampling layers, regular structure that real sensor images lack. The spectral fingerprint is the most reliable cue and the one the detector in subsection 3 targets.
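The claim that upsampling layers stamp periodic structure into the spectrum can be checked in a few lines. The sketch below is a toy under stated assumptions: it applies zero insertion (the stride-dilation step a stride-2 transposed convolution performs before applying its kernel) to a random 32 x 32 array standing in for a decoder's low-resolution feature map, and compares it with a 64 x 64 random array that never went through upsampling. The helper names are illustrative, not from any library. Zero insertion makes the 64 x 64 spectrum an exact 2 x 2 tiling of the small one, so every frequency reappears 32 bins later; a learned kernel applied afterwards reweights these copies but generally does not cancel them, which is one source of the periodic peaks in Figure 37.4.1.

```python
import numpy as np

def zero_insert_upsample(img, factor=2):
    """Upsample by inserting zeros between pixels: the first step of a
    stride-`factor` transposed convolution, before its kernel is applied."""
    h, w = img.shape
    up = np.zeros((h * factor, w * factor), dtype=img.dtype)
    up[::factor, ::factor] = img
    return up

def spectrum_mag(img):
    """Magnitude of the 2D DFT (not centered, so index k is frequency k)."""
    return np.abs(np.fft.fft2(img))

rng = np.random.default_rng(0)
small = rng.random((32, 32))          # stand-in for a decoder's low-res feature map
up = zero_insert_upsample(small)      # 64 x 64 "generated" image
camera_like = rng.random((64, 64))    # stand-in for an image with no upsampling

S_up = spectrum_mag(up)
S_cam = spectrum_mag(camera_like)

# The upsampled spectrum repeats itself every 32 bins along both axes;
# the never-upsampled image shows no such periodicity.
print(np.allclose(S_up[:32, :32], S_up[32:, 32:]))     # True
print(np.allclose(S_cam[:32, :32], S_cam[32:, 32:]))   # False
```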

3. A Frequency-Fingerprint Detector From Scratch Intermediate

The most accessible detector exploits the spectral fingerprint. Compute the 2D Fourier transform of an image (the machinery of Chapter 4), take the log-magnitude spectrum, and reduce it to a radial profile: the average magnitude at each distance from the center frequency. Real photographs have a smooth radial profile that decays steadily with frequency (natural images concentrate their energy at low frequencies, with power falling off roughly as a power law). Many generators introduce spikes or a raised high-frequency shelf from their upsampling. A simple classifier on the radial profile separates the two surprisingly well, on the generators it was trained on.

import numpy as np

def radial_spectrum(gray):
    """Azimuthally averaged log-magnitude spectrum of a grayscale image.

    gray: 2D float array in [0,1]. Returns a 1D radial profile.
    """
    f = np.fft.fftshift(np.fft.fft2(gray))         # center the spectrum
    mag = np.log1p(np.abs(f))                       # log magnitude
    h, w = mag.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.hypot(x - cx, y - cy).astype(int)        # radius of each pixel
    # Average magnitude at each integer radius -> radial profile.
    radial = np.bincount(r.ravel(), mag.ravel()) / np.bincount(r.ravel())
    return radial[: min(cy, cx)]                     # keep the valid range

def fingerprint_features(gray):
    """Compact features: high-frequency energy ratio and profile roughness."""
    prof = radial_spectrum(gray)
    half = len(prof) // 2
    hf_ratio = prof[half:].mean() / (prof[:half].mean() + 1e-8)  # raised shelf?
    roughness = np.abs(np.diff(prof)).mean()                     # spikiness
    return np.array([hf_ratio, roughness])

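# Train a logistic regression on fingerprint_features of real vs fake crops
# (train_crops and y_train are placeholders for your own labelled data):
# from sklearn.linear_model import LogisticRegression
# X_train = np.stack([fingerprint_features(g) for g in train_crops])
# clf = LogisticRegression().fit(X_train, y_train)   # y: 0=real, 1=fake
# On the TRAINING generator this routinely exceeds 0.95 AUC.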
Code Fragment 1: A from-scratch spectral-fingerprint detector: reduce each image to its radial frequency profile and extract a high-frequency energy ratio and a roughness measure that separate generated images from real ones, for generators resembling the training set.
Library Shortcut: Pretrained Deepfake Detectors

The hand-crafted spectral features above teach the principle, but production detectors are deep networks trained on large fake-vs-real corpora. Open implementations such as the DeepfakeBench toolkit bundle many detectors and benchmarks behind a uniform interface:

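# Conceptual usage only: load_detector and predict are illustrative names,
# not DeepfakeBench's actual interface; check the toolkit's own documentation
# for its real entry points.
from deepfakebench import load_detector

detector = load_detector("ucf", pretrained=True)   # a learned CNN detector
prob_fake = detector.predict(face_crop)             # face_crop: a face image (placeholder); result in [0, 1]
print(f"P(fake) = {prob_fake:.2f}")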
Code Fragment 2: Conceptual usage of a bundled pretrained detector (hypothetical DeepfakeBench-style API): one model call returns a fake probability, with face cropping, alignment, and the cross-generator protocol handled by the toolkit.

This replaces the feature engineering and a full training run with a model call, and the toolkit handles face cropping, alignment, and the cross-generator evaluation protocol. The catch is the same one the from-scratch detector exposes: high accuracy on in-distribution generators, a sharp drop on unseen ones, which is the subject of the next subsection.

4. Why Detection Loses the Arms Race Advanced

A detector trained to spot one generator's artifacts learns features specific to that generator. When a new architecture appears (a new GAN, a new diffusion sampler) with different upsampling and different artifacts, the detector's accuracy collapses toward chance, often falling from above 0.95 in-distribution AUC (area under the ROC curve, where 1.0 is perfect and 0.5 is chance) to near 0.6 cross-generator. Sit with those two numbers: a detector that was right essentially every time on the generator it trained against is, on a generator one checkpoint newer, barely better than flipping a coin. An AUC of 0.6 means that, shown one randomly chosen fake and one randomly chosen real image, the detector ranks the fake as more suspicious only 60 percent of the time, which is 80 percent of the way from perfect (1.0) back down to pure guessing (0.5). The detector did not get worse at its job; the job silently changed underneath it. This is the central, repeatedly confirmed finding of deepfake-detection research.
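The AUC figures above can be made concrete. AUC equals the probability that a randomly chosen fake receives a higher fake-score than a randomly chosen real image (ties count half), so it can be computed by comparing every real-fake pair. The sketch below does exactly that with made-up scores chosen so the unseen-generator case lands at 0.6; `auc_pairwise` is an illustrative helper, and scikit-learn's `roc_auc_score` should return the same value on the same inputs.

```python
import numpy as np

def auc_pairwise(scores_real, scores_fake):
    """AUC as P(random fake scores higher than random real), ties counting half.

    This pairwise (Mann-Whitney) form equals the area under the ROC curve.
    It builds an n_real x n_fake comparison matrix, so it suits small examples.
    """
    real = np.asarray(scores_real, dtype=float)[:, None]   # one row per real image
    fake = np.asarray(scores_fake, dtype=float)[None, :]   # one column per fake image
    wins = (fake > real).mean()     # fraction of pairs ranked correctly
    ties = (fake == real).mean()    # fraction of pairs the detector cannot split
    return wins + 0.5 * ties

# Hypothetical detector scores, P(fake), on five real and five fake images.
real_scores = [0.10, 0.20, 0.30, 0.40, 0.50]
seen_gen    = [0.60, 0.70, 0.80, 0.90, 0.95]   # generator the detector trained on
unseen_gen  = [0.15, 0.25, 0.35, 0.45, 0.55]   # a newer generator

print(f"{auc_pairwise(real_scores, seen_gen):.2f}")    # 1.00
print(f"{auc_pairwise(real_scores, unseen_gen):.2f}")  # 0.60: 15 of 25 pairs ranked correctly
```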

The structural reason is adversarial: the generator and detector form the same minimax dynamic you saw inside a single GAN (Chapter 32), except the loop spans the entire research community and runs on the timescale of publication. Any artifact a detector keys on becomes a loss term the next generator is trained to remove. Worse, common post-processing (JPEG recompression, resizing, the lossy transforms every social platform applies) destroys exactly the high-frequency fingerprints the detector relies on, so a fake that has passed through a single upload-and-recompress cycle may already be undetectable. The illustration below captures the trap: publishing a detector hands the answer key to an adversary who always moves last.

[Illustration: a detector pins up a poster of the spectral fingerprint it learned to catch, while a generator robot copies that pattern onto a checklist and erases it from its next output.]
Publishing a deepfake detector is handing over the answer key: every artifact you announce becomes the very thing the next generator is trained to erase.
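The claim that everyday post-processing wipes out the spectral fingerprint can also be checked on a toy. The sketch below is a simplification under stated assumptions: it uses zero insertion as a stand-in for a decoder's upsampling, and it models "platform processing" as a 2 x 2 box blur, a crude stand-in for the low-pass filtering inside a downscale and for JPEG's coarse quantization of high-frequency coefficients (it is not a JPEG encoder). The helper names are illustrative. The blur's frequency response is exactly zero at the Nyquist frequency, so the strongest replica peak vanishes; replicas at nearby frequencies are attenuated rather than erased, but the detector's margin shrinks either way.

```python
import numpy as np

def zero_insert_upsample(img, factor=2):
    """Insert zeros between pixels, as a stride-`factor` transposed conv does."""
    h, w = img.shape
    up = np.zeros((h * factor, w * factor), dtype=img.dtype)
    up[::factor, ::factor] = img
    return up

def box_blur_2x2(img):
    """Average each pixel with its lower, right, and lower-right neighbours
    (circular boundary): a crude low-pass stand-in for lossy post-processing."""
    return 0.25 * (img
                   + np.roll(img, -1, axis=0)
                   + np.roll(img, -1, axis=1)
                   + np.roll(img, (-1, -1), axis=(0, 1)))

def replica_peak_ratio(img):
    """Spectral magnitude at the (N/2, N/2) replica frequency relative to DC."""
    F = np.abs(np.fft.fft2(img))
    h, w = img.shape
    return F[h // 2, w // 2] / F[0, 0]

rng = np.random.default_rng(0)
fake = zero_insert_upsample(rng.random((32, 32)))   # carries the upsampling fingerprint
processed = box_blur_2x2(fake)                       # one round of "platform processing"

print(replica_peak_ratio(fake))        # ~1.0: the replica is as strong as DC
print(replica_peak_ratio(processed))   # ~0.0 up to round-off: the peak is gone
```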
Key Insight: Detection Is Necessary but Not Sufficient

The arms-race structure means passive detection can never be a complete defense. It is genuinely useful as one layer (catching low-effort fakes, flagging suspicious content for human review, raising the cost of attacks) but it cannot provide a guarantee, because the adversary always moves last. This is the argument for the proactive complement in Section 37.5: rather than trying to detect every fake after the fact, watermark and cryptographically sign content at creation so authenticity is asserted rather than inferred. Detection asks "is this fake?" and can be defeated; provenance asks "where did this come from?" and shifts the burden. A serious system uses both, and trusts neither alone.

You Could Build This: A Cross-Generator Robustness Dashboard Advanced

The arms-race claim of this subsection is not an opinion; it is a measurement you can reproduce in an afternoon, and reproducing it makes a striking portfolio project. Using the fingerprint_features detector from subsection 3, build a small dashboard that trains the detector on real images versus one generator (say a single StyleGAN or diffusion checkpoint), then evaluates it unchanged against a grid of held-out generators and a grid of post-processing transforms (JPEG quality 90, 60, and 30, a half-resolution rescale, a center crop). Render the result as a heatmap: rows are test generators, columns are perturbations, and color is how far the detector's AUC fell from its in-distribution score. The in-distribution cell (training generator, no perturbation) glows while the cells away from it collapse toward 0.6, the visual proof that detection generalizes poorly. This is the same cross-generator protocol the DeepfakeBench benchmark formalizes, and a clean heatmap with a written paragraph on what it implies for trusting any single detector is the kind of honest, well-scoped artifact that stands out in an interview precisely because it reports a limitation rather than a win.
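The core of the dashboard is a small loop. The sketch below assumes you already have a `score_fn` mapping an image to P(fake) (for example a trained classifier applied to `fingerprint_features`), held-out real images, fakes grouped by generator, and perturbations given as plain functions; every name here is illustrative. It scores each (generator, perturbation) cell with the pairwise AUC, computes each cell's drop from the in-distribution baseline, and leaves plotting to whatever heatmap tool you prefer. The toy call at the end uses scalar stand-ins for images and identity scoring only to exercise the bookkeeping.

```python
import numpy as np

def auc_pairwise(scores_real, scores_fake):
    """P(random fake outscores random real), ties counting half."""
    real = np.asarray(scores_real, dtype=float)[:, None]
    fake = np.asarray(scores_fake, dtype=float)[None, :]
    return (fake > real).mean() + 0.5 * (fake == real).mean()

def robustness_grid(score_fn, real_imgs, fakes_by_generator, perturbations):
    """AUC for every (test generator, perturbation) cell.

    score_fn: image -> P(fake), trained once and never refit here.
    real_imgs: held-out real images.
    fakes_by_generator: {generator name: list of fake images}.
    perturbations: {name: image -> image}; include an identity entry so the
        unperturbed, in-distribution baseline appears in the grid.
    """
    grid = {}
    for p_name, perturb in perturbations.items():
        # Real images go through the same perturbation as the fakes.
        real_scores = [score_fn(perturb(x)) for x in real_imgs]
        for g_name, fakes in fakes_by_generator.items():
            fake_scores = [score_fn(perturb(x)) for x in fakes]
            grid[(g_name, p_name)] = auc_pairwise(real_scores, fake_scores)
    return grid

def auc_drop(grid, baseline_key):
    """How far each cell fell from the in-distribution score (the heatmap color)."""
    base = grid[baseline_key]
    return {k: base - v for k, v in grid.items()}

# Toy stand-ins: each "image" is already its detector score.
grid = robustness_grid(
    score_fn=lambda x: x,
    real_imgs=[0.1, 0.2, 0.3],
    fakes_by_generator={"train_gen": [0.7, 0.8, 0.9], "new_gen": [0.15, 0.25, 0.35]},
    perturbations={"none": lambda x: x, "erase_signal": lambda x: 0.0},
)
drops = auc_drop(grid, ("train_gen", "none"))
for key in sorted(grid):
    print(key, round(grid[key], 3), round(drops[key], 3))
```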

Fun Note

Publishing a deepfake detector is like publishing the answer key to a test the other side gets to rewrite. Every artifact you announce becomes a loss term in the next generator's training run, so the detector that wins the benchmark this spring is the cautionary footnote by autumn. The defender's special curse is that the adversary always submits last. Signature phrase for the section: detection asks "is this fake?" and can be defeated; provenance asks "where did this come from?" and changes the question.

5. Benchmarks and the Threat Model Intermediate

Progress is measured on standard benchmarks: FaceForensics++ (manipulated videos across several methods with quality levels), the large-scale DeepFake Detection Challenge dataset, Celeb-DF (higher-quality swaps that defeated earlier detectors), and newer diffusion-era sets like DiffusionForensics. The crucial evaluation protocol is cross-dataset: train on one, test on another, because in-dataset numbers are optimistic exactly as subsection 4 warns. On the harm side, the realistic threat model spans non-consensual intimate imagery (the largest documented category of real-world deepfake abuse), fraud (voice and video impersonation for financial scams), political disinformation, and identity theft. The technical detail of how fakes are made matters less than situating it: this is a societal harm with a technical surface, and the responsible-deployment framework of Section 37.6 treats it as such.

Practical Example: A Newsroom's Verification Desk Learns the Limits

Who: the visual-verification team at an international news organization, 2024, vetting user-submitted video during a fast-moving event.
Situation: they had licensed a state-of-the-art deepfake detector that scored above 0.95 on its published benchmark.
Problem: a viral clip flagged as authentic by the detector turned out to be a diffusion-based fake from a generator newer than the detector's training set, and a genuine clip was flagged as fake because platform recompression had scrambled its frequency statistics.
Decision: they demoted the detector from a verdict to one input, and built a layered workflow: detector score plus provenance checks (C2PA manifests where present, from Section 37.5), reverse-image search, source contact, and physical-consistency analysis by a trained human.
Result: the layered process caught both the false negative and the false positive the detector alone had gotten wrong, at the cost of slower verification.
Lesson: a detector's benchmark number is its best case on its own distribution; in the wild, treat any single detector as one fallible signal among several, never as ground truth, exactly as the arms-race analysis predicts.
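The newsroom's lesson, that the detector becomes one input rather than the verdict, can be sketched as a triage rule. The version below is a minimal illustration under stated assumptions: the `Evidence` fields, the 0.5 threshold, and the rule that no single signal can block content are hypothetical choices, not the newsroom's actual workflow, and `provenance_valid` stands in for the result of whatever C2PA verification you run. Applied to the two failures in the example, both the missed diffusion fake and the recompressed genuine clip land in human review instead of receiving a wrong automatic verdict.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Evidence:
    detector_score: float              # P(fake) from a learned detector, in [0, 1]
    provenance_valid: Optional[bool]   # True/False if a manifest was verified, None if absent
    earlier_copy_found: bool           # reverse-image search found the media in an older, different context

def triage(ev: Evidence, flag_at: float = 0.5) -> str:
    """Return 'allow', 'review', or 'block' from independent signals.

    Each suspicious signal casts one vote; no single vote can block, so one
    wrong signal (a fooled detector, say) sends content to a human instead.
    """
    votes = 0
    if ev.detector_score >= flag_at:
        votes += 1
    if ev.provenance_valid is False:     # manifest present but fails verification
        votes += 1
    if ev.earlier_copy_found:
        votes += 1
    if votes >= 2:
        return "block"
    if votes == 0 and ev.provenance_valid is True:
        return "allow"                   # authenticity asserted, nothing contradicts it
    return "review"                      # everything uncertain goes to a trained human

# The two failures from the newsroom example, with no provenance available:
missed_fake = Evidence(detector_score=0.10, provenance_valid=None, earlier_copy_found=False)
recompressed_real = Evidence(detector_score=0.97, provenance_valid=None, earlier_copy_found=False)
print(triage(missed_fake), triage(recompressed_real))   # review review
```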

Research Frontier: Generalizable and Diffusion-Era Detection

The frontier in 2024 to 2026 is detection that generalizes across generators rather than overfitting to one. Promising directions include detectors built on the features of large pretrained backbones rather than narrow fake-vs-real CNNs (Ojha et al., "Towards Universal Fake Image Detectors that Generalize Across Generative Models," CVPR 2023, arXiv:2302.10174, showed that a frozen CLIP feature space detects unseen generators far better than a trained-from-scratch detector), reconstruction-based detection that flags images a diffusion model can reconstruct suspiciously well (Wang et al.'s DIRE, ICCV 2023, arXiv:2303.09295), and benchmarks such as GenImage and DiffusionForensics built specifically to test cross-generator transfer. The honest consensus across this work is that robustness to unseen generators and to real-world post-processing remains the hard, unsolved core, which keeps the field's center of gravity shifting from after-the-fact detection toward the at-creation provenance of Section 37.5.

Exercise 37.4.1: Why Cross-Dataset Matters Conceptual

A vendor advertises 98 percent accuracy for their deepfake detector. Explain in a short paragraph what question you must ask about how that number was computed before trusting it, why an in-dataset 98 percent can coexist with a cross-dataset 60 percent, and how the adversarial structure of subsection 4 makes the cross-dataset number the only one that predicts real-world performance.

Exercise 37.4.2: Build and Break a Spectral Detector Coding

Using fingerprint_features from subsection 3, train a logistic-regression detector on real images versus images from one generator (for example StyleGAN or one diffusion checkpoint), and confirm high AUC on a held-out split of the same generator. Then evaluate the same detector, unchanged, on images from a different generator and report the AUC drop. Finally, JPEG-recompress the test images at quality 60 and measure how much further accuracy falls. Summarize what each experiment demonstrates about the limits of frequency-based detection.

Exercise 37.4.3: Design a Layered Verification Pipeline Analysis

You are asked to design the verification workflow for a platform that receives user-uploaded images. Sketch a pipeline that combines at least three independent signals (for example a learned detector, a provenance/C2PA check from Section 37.5, and a reverse-image-search step), specify how you would combine them into an action (allow, flag for human review, block), and explain how your design degrades gracefully when the detector is wrong, using the newsroom example as a reference point.