Section 37.5: Watermarking & Content Provenance

"They hid a secret in my pixels and signed a note saying where I came from. Crop me, compress me, screenshot me: the note may tear, but the secret was woven into the weave, not stapled to the corner."
A Watermarked Image, Quietly Confident

Big Picture

Where detection asks "is this fake?" after the fact and loses the arms race, provenance asks "where did this come from?" at the moment of creation, and answers it with two complementary tools: invisible watermarks woven into the pixels and cryptographically signed manifests attached to the file. An invisible watermark embeds a recoverable signal that is imperceptible to the eye yet designed to survive compression, resizing, and moderate cropping, with the strongest variants baked directly into the generator so every output is marked. The C2PA standard works from the other direction, attaching a tamper-evident, cryptographically signed record of an image's origin and edit history. Neither is unbreakable (a determined adversary can strip both), so they are deployed as overlapping layers that raise the cost of deception and let honest content prove itself.

Section 37.4 ended on a hard truth: passive detection cannot win against an adversary who always moves last. This section presents the proactive answer. Instead of inferring authenticity from artifacts, we assert it at creation. The shift in burden is the whole point. A detector must catch every fake to be useful; a provenance system only has to let genuine content carry a verifiable claim, which is a far more tractable goal. The methods here connect directly to two earlier threads: invisible watermarking lives in the frequency domain of Chapter 4, and in-generation watermarking modifies the diffusion decoder of Chapter 33.

1. Invisible Watermarking: The Idea Beginner

An invisible watermark hides a payload (often just a few bits, or a model identifier) inside an image so that the human eye sees no change but a detector with the right key can recover it. The two requirements pull against each other: imperceptibility demands the change be tiny, while robustness demands the signal survive the transformations real images undergo (JPEG compression, the resizing and cropping of Chapter 5, screenshotting, mild color edits). The classic engineering answer is to embed in a transform domain rather than directly in pixels. The discrete cosine transform (DCT), the same transform that underlies JPEG itself, is a natural host: nudging mid-frequency DCT coefficients changes the image imperceptibly but in a way that compression tends to preserve, because compression also operates in the DCT domain. Embedding in pixels is fragile; embedding in frequency is the start of robustness.

2. A DCT-Domain Watermark From Scratch Intermediate

The minimal scheme embeds one bit by raising or lowering a chosen mid-frequency DCT coefficient of an 8x8 block, then recovers the bit by inspecting that coefficient. The trick the code uses is parity: divide the coefficient into steps of a fixed size and snap it to an even-numbered step to store a 0 or an odd-numbered step to store a 1, so reading the bit back is just asking whether the nearest step is even or odd. Because the nearest step of the right parity is never more than one step away, the snap moves the coefficient by at most one step, and the pixel change stays tiny while the stored bit is unambiguous. Real systems spread many bits across many blocks with error-correcting codes, but the one-block version shows the mechanism cleanly. The code embeds and extracts a single bit; Figure 37.5.1 shows where in the spectrum the perturbation lands and why it survives compression.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):  return dct(dct(b.T, norm="ortho").T, norm="ortho")
def idct2(b): return idct(idct(b.T, norm="ortho").T, norm="ortho")

COEF = (3, 3)        # a mid-frequency coefficient: imperceptible yet JPEG-stable
STRENGTH = 12.0      # embedding strength; larger = more robust, less invisible

def embed_bit(block, bit):
    """Embed one bit in an 8x8 grayscale block via a mid-freq DCT coeff."""
    c = dct2(block.astype(float))
    # Snap the coefficient to the nearest even (bit 0) or odd (bit 1) multiple
    # of STRENGTH. The coefficient lies between step q and step q + 1, and
    # exactly one of those two has the right parity, so the move is <= 1 step.
    q = int(np.floor(c[COEF] / STRENGTH))
    if q % 2 != bit:
        q += 1
    c[COEF] = q * STRENGTH
    return np.clip(idct2(c), 0, 255)

def extract_bit(block):
    """Recover the embedded bit from an 8x8 block."""
    c = dct2(block.astype(float))
    return int(round(c[COEF] / STRENGTH)) % 2

# Round-trip on one block:
blk = np.random.randint(80, 160, (8, 8)).astype(float)
marked = embed_bit(blk, 1)
print("recovered:", extract_bit(marked))            # recovered: 1
print("max pixel change:", np.abs(marked - blk).max())  # a few gray levels

Code Fragment 1: A single-bit DCT-domain watermark using quantization index modulation: embed by snapping a mid-frequency coefficient to an even or odd quantization level, extract by reading its parity. The pixel change is a few gray levels, below perceptual threshold.

Try This: Slide STRENGTH and Watch Invisibility Fight Robustness

The one knob that controls the whole imperceptibility-versus-robustness tradeoff is STRENGTH. Embed a bit at several values, say 2.0, 12.0, and 40.0, and print np.abs(marked - blk).max() each time: the maximum pixel change climbs roughly in step with STRENGTH, so a louder watermark is a more visible one. Then push each marked block through a coarse round-trip that mimics compression (round the pixels, add small noise) before calling extract_bit, and watch the small-STRENGTH bit start flipping while the large-STRENGTH bit holds. That single sweep is the entire watermark design problem in your terminal: too quiet and noise erases it, too loud and your eye catches it, and the mid-frequency host coefficient of Figure 37.5.1 is what buys you room in between.
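The sweep described above can be sketched as follows. This is a minimal version under stated assumptions, not a reference implementation: embed_bit and extract_bit here are variants that take the strength as a parameter, the DCT is built as a small NumPy matrix so the sketch runs without SciPy, and the noise level (standard deviation 1 gray level), the trial count, and the seed are illustrative choices.

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal 8-point DCT-II matrix: dct2(B) = D @ B @ D.T, idct2(C) = D.T @ C @ D.
D = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
D[0, :] = np.sqrt(1.0 / N)

COEF = (3, 3)  # same mid-frequency host coefficient as in the text

def embed_bit(block, bit, strength):
    """Snap the host coefficient to the nearest step with parity == bit."""
    c = D @ block @ D.T
    q = int(np.floor(c[COEF] / strength))
    if q % 2 != bit:
        q += 1
    c[COEF] = q * strength
    return D.T @ c @ D

def extract_bit(block, strength):
    """Read the parity of the nearest step."""
    c = D @ block @ D.T
    return int(np.rint(c[COEF] / strength)) % 2

rng = np.random.default_rng(0)
TRIALS = 500          # illustrative trial count
results = {}          # strength -> (max pixel change, bit-error rate)
for strength in (2.0, 12.0, 40.0):
    max_change, errors = 0.0, 0
    for _ in range(TRIALS):
        blk = rng.integers(80, 160, (N, N)).astype(float)
        bit = int(rng.integers(0, 2))
        marked = embed_bit(blk, bit, strength)
        max_change = max(max_change, float(np.abs(marked - blk).max()))
        # Crude stand-in for compression: add small noise, round to integers.
        noisy = np.clip(np.rint(marked + rng.normal(0.0, 1.0, marked.shape)), 0, 255)
        errors += extract_bit(noisy, strength) != bit
    results[strength] = (max_change, errors / TRIALS)
    print(f"STRENGTH={strength:5.1f}  max change={max_change:5.2f}  "
          f"bit-error rate={errors / TRIALS:.3f}")
```

The printed table should show the two columns moving in opposite directions: pixel change grows with STRENGTH while the bit-error rate falls. Real JPEG quantization is harsher and more structured than this noise model, so treat the numbers as qualitative.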

Figure 37.5.1: Why the watermark lives in mid frequencies. Perturbing low-frequency DCT coefficients (top-left, blue) is visible as blotchy artifacts; perturbing high frequencies (bottom-right, brown) is invisible but wiped out by JPEG compression. The mid-frequency host coefficient (orange) is the sweet spot: imperceptible to the eye yet preserved through the compression that destroys the high band. This is the tradeoff the code in subsection 2 navigates.

3. In-Generation Watermarking: Stable Signature and SynthID Intermediate

A watermark applied after generation can be skipped by anyone running the model themselves. The stronger approach bakes the watermark into the generator so every output is marked, with no optional post-step to remove. Stable Signature (Fernandez et al., 2023) fine-tunes the latent-diffusion decoder of Chapter 33 so that a trained extractor network recovers a fixed bit-string from any image it produces, embedding the watermark in the very weights that turn latents into pixels. Google DeepMind's SynthID pursues the same goal, integrating marking into the generation pipeline and pairing it with a learned detector robust to common edits. It is deployed across Google's image, audio, and text products, and the text-watermarking variant was open-sourced in October 2024. By 2025 the image variant reported watermarking on the order of ten billion images and video frames across Google's services. The crucial design property both share is that the watermark is a property of the model, not an afterthought, so a user of the official model cannot produce unmarked output. That is exactly the policy lever regulators have begun to require.
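An extractor that returns a bit-string still needs a decision rule: how many bits must match the key before an image is declared watermarked? A common rule for multi-bit watermarks is a binomial test, sketched below. The 48-bit length, the target false-positive rate, and all function names are illustrative assumptions, and the model treats each bit extracted from an unmarked image as an independent fair coin.

```python
from math import comb

def false_positive_rate(n_bits, threshold):
    """P(at least `threshold` of `n_bits` match by chance), fair-coin bits."""
    return sum(comb(n_bits, i) for i in range(threshold, n_bits + 1)) / 2 ** n_bits

def min_threshold(n_bits, target_fpr):
    """Smallest match count whose chance-level false-positive rate <= target."""
    for t in range(n_bits + 1):
        if false_positive_rate(n_bits, t) <= target_fpr:
            return t
    return None  # target unreachable with this many bits

def is_watermarked(extracted, key, threshold):
    """Declare a detection when enough extracted bits agree with the key."""
    matches = sum(a == b for a, b in zip(extracted, key))
    return matches >= threshold

N_BITS = 48                              # illustrative payload length
TAU = min_threshold(N_BITS, 1e-6)        # illustrative target rate
print(TAU)                               # 41
print(false_positive_rate(N_BITS, TAU))  # about 3.1e-07
```

Requiring 41 of 48 bits tolerates up to 7 bit flips from compression or editing while keeping chance matches below one in a million, which is why longer payloads buy robustness as well as capacity.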

Key Insight: Watermark in the Decoder, Not After It

The difference between a post-hoc watermark and an in-generation one is the difference between an opt-in and a default. A post-hoc tool watermarks the images you choose to run through it; anyone bypassing the tool produces clean images. An in-generation watermark like Stable Signature is welded into the decoder weights, so the model as shipped cannot emit an unmarked image; getting rid of the mark means retraining or replacing the decoder. That makes it the right primitive for the policy goal of "all images from this service are identifiable as AI-generated," and it is why the regulatory push lands on in-generation watermarking rather than optional post-processing. The EU AI Act's Article 50 transparency obligations, which require providers to mark synthetic image, audio, video, and text output in a machine-readable format, become enforceable on 2 August 2026.

Fun Note

A post-hoc watermark is a sticker on the bumper; an in-generation watermark is the VIN stamped into the chassis. You can peel a sticker; removing a VIN takes a grinder and intent, which is precisely the friction the design is buying. The mid-frequency band, meanwhile, is the Goldilocks zone of the spectrum: the low frequencies are too loud to hide in, the high frequencies are too fragile to survive JPEG, and the middle is just right. Mnemonic for the section: weave the secret into the weave, not the corner; and remember that a missing manifest proves nothing, while a valid manifest proves something.

Illustration: two cartoon cars contrast a peeling paper bumper sticker with an identifying mark stamped deep into the chassis and glowing from within; above them, a chain of three padlock-linked signed-document cards (capture, edit, publish) leads to an inspector with a checkmark stamp. Caption: A post-hoc watermark peels off like a bumper sticker; an in-generation watermark is stamped into the chassis, and a signed manifest adds a tamper-evident chain of custody on top.

4. C2PA: Cryptographically Signed Provenance Advanced

Watermarking hides a signal in the pixels. C2PA (the Coalition for Content Provenance and Authenticity standard, the technology behind the "Content Credentials" mark in deployed tools) takes the orthogonal approach of attaching metadata: a manifest recording what created the image, what edits were applied, and by whom, all sealed with a cryptographic signature from the producing software's certificate. Any later tampering breaks the signature, so the manifest is tamper-evident: you cannot silently alter the recorded history without invalidating it. The manifest travels with the file and can be verified offline against the signer's certificate chain. Figure 37.5.2 shows the structure: a chain of signed assertions from camera capture through each editing step to the final published image.

Figure 37.5.2: The C2PA provenance chain. Each stage of an image's life (capture or generation, each edit, publication) appends a cryptographically signed assertion to the manifest that travels with the file. A verifier checks the signature chain against trusted certificates; any tampering after signing breaks the chain. Unlike a watermark, C2PA records the full edit history rather than a hidden bit, and unlike detection it asserts origin rather than inferring it.
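The tamper-evidence property of the chain can be illustrated with a toy model. This sketch is not the C2PA format: real manifests use certificate-based asymmetric signatures stored in JUMBF boxes, whereas here an HMAC with a shared demo key stands in for the signer, and every name (SIGNING_KEY, append_claim, verify) is hypothetical. What it keeps is the structure: each claim binds to a hash of the asset and to the previous claim's signature.

```python
import copy
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for a signer's private key (assumption)

def sign(payload):
    """Toy signature: HMAC-SHA256 over the canonical JSON of the claim."""
    blob = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, blob, hashlib.sha256).hexdigest()

def append_claim(chain, action, asset_bytes):
    """Add a signed claim that links to the previous claim and the asset hash."""
    payload = {
        "action": action,
        "asset_sha256": hashlib.sha256(asset_bytes).hexdigest(),
        "previous": chain[-1]["signature"] if chain else None,
    }
    chain.append({"payload": payload, "signature": sign(payload)})

def verify(chain, asset_bytes):
    """Check every signature, every back-link, and the final asset binding."""
    previous = None
    for entry in chain:
        if entry["payload"]["previous"] != previous:
            return False
        if not hmac.compare_digest(entry["signature"], sign(entry["payload"])):
            return False
        previous = entry["signature"]
    final_hash = hashlib.sha256(asset_bytes).hexdigest()
    return bool(chain) and chain[-1]["payload"]["asset_sha256"] == final_hash

chain = []
append_claim(chain, "captured", b"raw pixels")
append_claim(chain, "cropped", b"edited pixels")
print(verify(chain, b"edited pixels"))      # True

tampered = copy.deepcopy(chain)
tampered[0]["payload"]["action"] = "generated"   # rewrite history after signing
print(verify(tampered, b"edited pixels"))   # False
print(verify(chain, b"different pixels"))   # False: asset no longer matches
```

Note what the sketch cannot do: if the chain is simply discarded, nothing remains to verify, which is the stripping weakness discussed in subsection 5.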

Library Shortcut: Reading C2PA Manifests

You do not implement the cryptography yourself; the C2PA project ships the c2pa library (Rust core with Python bindings) that reads, verifies, and writes manifests. Inspecting an image's provenance is a few lines:

from c2pa import Reader

# Read and verify the embedded manifest of an image file.
with Reader.from_file("published_image.jpg") as reader:
    manifest = reader.json()      # full provenance: actions, signer, ingredients
    print(manifest)               # includes "claim_generator", edit assertions

# Validation status (signature checks) is included; a broken chain is flagged.

Code Fragment 2: Reading a C2PA manifest with the official c2pa library: one reader returns the full signed provenance and flags a broken signature chain, hiding the certificate and JUMBF handling.

This replaces hundreds of lines of certificate parsing, JUMBF box handling, and signature verification with a single reader, and it comes from the same open-source SDK that tools can use to write those manifests in the first place. The Python entry points have changed between releases of the bindings, so check the documentation for the version you install.

5. The Robustness Reality Advanced

Neither tool is a guarantee, and it is important to be precise about why. Watermarks face removal attacks: regeneration (passing the image through another generator), strong cropping and rescaling, and purpose-built adversarial perturbations can erase or scramble the embedded signal, and 2023 to 2024 research showed several deployed watermarks can be removed or even forged with modest effort. C2PA manifests can simply be stripped: a screenshot or a re-encode that discards metadata produces a clean file with no manifest at all. Absence of a manifest therefore proves nothing; only the presence of a valid one proves something. The honest framing is that provenance is a layered, probabilistic defense, not a cryptographic certainty: it raises the cost and friction of deception, makes honest content easy to verify, and is most powerful when watermarking and signed manifests reinforce each other and combine with the detection of Section 37.4. No single layer is the answer; the stack is.

Practical Example: A Stock-Photo Platform Rolls Out Content Credentials

Who: the trust-and-safety engineering team at a large stock-imagery marketplace, 2024, required by enterprise customers to label AI-generated content.

Situation: contributors were uploading a growing share of generated images, and buyers wanted to filter or knowingly license them.

Problem: contributors could not be trusted to self-declare accurately, and a pure detector (from Section 37.4) was too unreliable cross-generator to gate payments on.

Decision: they adopted a layered scheme: accept C2PA manifests at upload and preserve them through their pipeline, run an in-generation watermark check for the major model providers they integrated, and fall back to a learned detector only as a soft flag for human review.

Result: images arriving with valid C2PA manifests or recognized watermarks were labeled with high confidence and required no review, which covered the majority of generated uploads from cooperating tools; the unreliable detector was relegated to triage rather than verdict.

Lesson: provenance worked precisely where the producers cooperated and the layers reinforced each other, and the team designed for the realistic case (most generation comes from a handful of major, watermark-cooperating providers) rather than the adversarial worst case, which detection alone could never have covered.
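The decision order in this example can be written down as a small triage function. It is only a sketch of the policy as described: the Evidence fields, the label strings, and the 0.7 review threshold are hypothetical, and the platform's real pipeline is not specified in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Evidence:
    manifest_valid: bool                 # a C2PA manifest is present and verifies
    manifest_origin: Optional[str]       # e.g. "ai-generated" or "camera" (assumed labels)
    watermark_provider: Optional[str]    # provider whose watermark was recognized, if any
    detector_score: float                # learned detector's score in [0, 1]

def triage(e: Evidence, review_threshold: float = 0.7) -> Tuple[str, bool]:
    """Return (label, needs_human_review), strongest evidence first."""
    if e.manifest_valid and e.manifest_origin is not None:
        return e.manifest_origin, False          # signed claim: label, no review
    if e.watermark_provider is not None:
        return "ai-generated", False             # recognized in-generation mark
    if e.detector_score >= review_threshold:
        return "possibly ai-generated", True     # soft flag only, never a verdict
    return "unlabeled", False                    # no evidence either way

print(triage(Evidence(True, "ai-generated", None, 0.2)))
print(triage(Evidence(False, None, "provider-x", 0.1)))
print(triage(Evidence(False, None, None, 0.9)))
print(triage(Evidence(False, None, None, 0.3)))
```

The ordering encodes the lesson of the example: the detector is consulted last and can only request review, because it is the least reliable layer.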

Research Frontier: Robust and Regulated Watermarking

The 2023 to 2026 frontier is robustness under adversarial pressure and the policy machinery being built on top.

On robustness, two lines of work define the attack-and-defense edge. Zhao et al. ("Invisible Image Watermarks Are Provably Removable Using Generative AI," 2023, arXiv:2306.01953) mark the attack side. The Tree-Ring watermark (Wen et al., NeurIPS 2023, arXiv:2305.20030) marks the defense side: it embeds the mark not in the pixels but as concentric ring patterns in the Fourier transform of the diffusion initial noise, then recovers it by running DDIM inversion (the same noise-recovery machinery as the editing inversions of Section 35.5) to map a suspect image back to its seed noise and read the rings. Marking the seed rather than the output is what makes it survive cropping, compression, and regeneration far better than pixel or DCT marks. Meanwhile, SynthID's 2024 Nature publication (focused on the text-watermarking variant) and the partial open-sourcing of its detector tooling pushed production-grade marking into the open.

The same push has scaled to detection infrastructure. Google announced a SynthID Detector verification portal at its I/O conference in May 2025 (rolling out first to early testers), and the SynthID-Image system (Gowal et al., 2025, arXiv:2510.09263) reports watermarking on the order of ten billion images and video frames, evidence that in-generation marking is now operating at internet scale rather than as a research prototype.

On policy, the EU AI Act's Article 50 transparency provisions (enforceable from 2 August 2026) require that AI-generated media be marked in a machine-readable, detectable format, which is turning in-generation watermarking and C2PA from optional features into compliance requirements. The Act's own guidance notes that no single marking technique suffices, so a layered metadata-plus-watermark strategy is expected.

The open technical question driving the field is whether any watermark can be simultaneously imperceptible, robust to regeneration attacks, and unforgeable. The current consensus, consistent with subsection 5, is that no single scheme achieves all three, so layered provenance plus regulation, rather than a perfect watermark, is the realistic trajectory.
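The Tree-Ring idea of marking the seed noise can be sketched in a few lines of NumPy. This is a toy version under heavy assumptions: there is no diffusion model and no DDIM inversion here, so added Gaussian noise stands in for the error of an imperfect inversion, and the array size, ring radius, perturbation level, and all function names are illustrative choices.

```python
import numpy as np

def make_key(n, r_max, rng):
    """Ring-shaped key: one real value per integer radius around the spectrum center."""
    yy, xx = np.mgrid[:n, :n]
    radius = np.rint(np.hypot(yy - n // 2, xx - n // 2)).astype(int)
    mask = radius <= r_max
    ring_values = rng.normal(size=r_max + 1)
    key = np.zeros((n, n))
    key[mask] = ring_values[radius[mask]]
    return mask, key

def embed(noise, mask, key):
    """Overwrite the low-frequency disc of the noise spectrum with the ring key."""
    f = np.fft.fftshift(np.fft.fft2(noise, norm="ortho"))
    f[mask] = key[mask]
    # The key is real and symmetric about the center, so the result stays real.
    return np.fft.ifft2(np.fft.ifftshift(f), norm="ortho").real

def ring_distance(noise, mask, key):
    """Mean absolute gap between the spectrum and the key inside the ring region."""
    f = np.fft.fftshift(np.fft.fft2(noise, norm="ortho"))
    return float(np.abs(f[mask] - key[mask]).mean())

rng = np.random.default_rng(0)
mask, key = make_key(n=64, r_max=10, rng=rng)

marked = embed(rng.normal(size=(64, 64)), mask, key)
recovered = marked + rng.normal(scale=0.3, size=marked.shape)  # imperfect "inversion"
unmarked = rng.normal(size=(64, 64))

d_marked = ring_distance(recovered, mask, key)
d_unmarked = ring_distance(unmarked, mask, key)
print(f"marked: {d_marked:.2f}   unmarked: {d_unmarked:.2f}")
```

A detector would threshold this distance: recovered noise that once carried the rings sits much closer to the key than fresh noise does, even after the perturbation.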

Exercise 37.5.1: Detection Versus Provenance Conceptual

Explain in a short paragraph why "absence of a C2PA manifest" cannot be treated as evidence that an image is fake, while "presence of a valid manifest" can be treated as evidence of origin. Then contrast this asymmetry with deepfake detection from Section 37.4, and state which of the two (detection or provenance) shifts the burden onto honest content and which onto suspicious content.

Exercise 37.5.2: Measure Watermark Robustness Coding

Extend the single-bit scheme from subsection 2 to embed an 8-bit payload by spreading bits across multiple 8x8 blocks of a real grayscale image. Embed a known byte, then measure the bit-error rate after each of: JPEG compression at quality 50, downscaling to half resolution and back, and a 10 percent center crop. Report which transformations the watermark survives and which destroy it, and relate your findings to the mid-frequency tradeoff in Figure 37.5.1.

Exercise 37.5.3: Design a Layered Provenance Stack Analysis

You are designing the content-authenticity system for a news platform. Specify how you would combine in-generation watermarking, C2PA manifests, and post-hoc detection into a single trust score, stating what each layer contributes and how the system behaves when (a) an image arrives with a valid manifest and a recognized watermark, (b) an image arrives with no manifest at all, and (c) a manifest is present but its signature fails to verify. Justify your handling of each case using the robustness limits of subsection 5.