Watermarking

Layer 2 embeds an imperceptible payload in the asset’s signal data. The watermark is designed to survive common transformations that would strip or corrupt an embedded C2PA manifest (re-encoding, cropping, screenshots, format conversion). When the manifest is gone, the decoded watermark ID points back to the manifest record in the soft-binding index.

Engines

Modality	Engine	Licence	Payload capacity	Notes
Image	Adobe TrustMark	MIT	48 bits	Arbitrary resolution; C2PA soft-binding compatible
Audio	Meta AudioSeal	MIT	16 bits/segment	Localised detection: flags which segments are watermarked rather than the file as a whole
Video	Meta VideoSeal	MIT	16 bits/frame-group	Temporal propagation; survives H.264/H.265 re-encode at CRF ≤ 28
Text	Google SynthID-Text	Apache 2.0	n/a (statistical signal, no embedded payload)	Generation-time hook only; cannot watermark existing text

The engine is selected by the recipe field watermark.engine. Mixed-modality assets (e.g. video with audio) are watermarked with both VideoSeal and AudioSeal when the recipe requests it.

Payload format

Each watermark encodes a 16-byte ULID (watermark_id). The ULID maps to a manifest record in Postgres.

The raw payload also carries an HMAC-SHA-256 over (tenant_id, watermark_id), truncated to 4 bytes. This guards against cross-tenant payload spoofing: a watermark ID decoded from an asset cannot be claimed by a different tenant, because the HMAC only verifies for the tenant it was computed for.

[ watermark_id: 16 bytes ][ hmac_truncated: 4 bytes ]
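
The layout above can be illustrated with a minimal sketch. It assumes a server-side secret key and a length-prefixed encoding of tenant_id ahead of the ULID; neither detail is specified in this document, and the function names are hypothetical.

```python
import hashlib
import hmac

ULID_LEN = 16  # watermark_id length in bytes
TAG_LEN = 4    # truncated HMAC length in bytes


def _tag(key: bytes, tenant_id: str, watermark_id: bytes) -> bytes:
    # Assumed message encoding: 2-byte length prefix + tenant_id + ULID,
    # so distinct (tenant_id, watermark_id) pairs map to distinct messages.
    tenant = tenant_id.encode("utf-8")
    msg = len(tenant).to_bytes(2, "big") + tenant + watermark_id
    return hmac.new(key, msg, hashlib.sha256).digest()[:TAG_LEN]


def build_payload(key: bytes, tenant_id: str, watermark_id: bytes) -> bytes:
    """Return [ watermark_id: 16 bytes ][ hmac_truncated: 4 bytes ]."""
    if len(watermark_id) != ULID_LEN:
        raise ValueError("watermark_id must be 16 bytes")
    return watermark_id + _tag(key, tenant_id, watermark_id)


def verify_payload(key: bytes, tenant_id: str, payload: bytes) -> bool:
    """Check that a decoded payload belongs to the claiming tenant."""
    if len(payload) != ULID_LEN + TAG_LEN:
        return False
    watermark_id, tag = payload[:ULID_LEN], payload[ULID_LEN:]
    # Constant-time comparison avoids leaking tag bytes through timing.
    return hmac.compare_digest(tag, _tag(key, tenant_id, watermark_id))
```

A payload built for one tenant verifies only under that tenant's ID; the same bytes presented with a different tenant_id are rejected unless the 4-byte tag happens to collide.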

Quality targets

Modality	Metric	Target	Measurement
Image	PSNR	≥ 42 dB	vs. original pre-watermark
Audio	PESQ	≥ 3.0 (speech)	MOS-LQO on speech corpus
Video	VMAF	≥ 90	vs. source frame sequence

Quality is measured automatically in CI for each model update. A regression below threshold fails the build.
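
As an illustration of the image gate, PSNR can be computed from the mean squared error between the original and watermarked pixel values. This is a minimal sketch that assumes 8-bit samples passed as flat sequences; the function names are hypothetical and the actual CI harness is not shown in this document.

```python
import math
from typing import Sequence

PSNR_TARGET_DB = 42.0  # image target from the quality table


def psnr(original: Sequence[int], watermarked: Sequence[int],
         max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(watermarked) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def passes_image_gate(original: Sequence[int], watermarked: Sequence[int]) -> bool:
    """True when the watermarked image meets the PSNR target."""
    return psnr(original, watermarked) >= PSNR_TARGET_DB
```

A uniform error of one grey level gives an MSE of 1 and a PSNR of about 48 dB, which passes; a uniform error of four levels gives about 36 dB, which fails.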

Robustness table

The following transformations are covered in the Public Beta robustness benchmark (make bench.image, make bench.audio, make bench.video).

Image (TrustMark)

Attack	Detection rate
JPEG re-encode (quality ≥ 70)	≥ 99%
PNG round-trip	100%
Resize (0.5× to 2×)	≥ 97%
Crop (≥ 50% area preserved)	≥ 92%
Screenshot (desktop, 1× DPI)	≥ 95%
WebP conversion	≥ 98%
Brightness/contrast ±20%	≥ 96%
JPEG re-encode (quality 50–69)	≥ 90%

Audio (AudioSeal)

Attack	Detection rate
MP3 re-encode (≥ 128 kbps)	≥ 99%
AAC re-encode	≥ 97%
Resample (44.1→22 kHz)	≥ 95%
Noise addition (SNR ≥ 20 dB)	≥ 96%
Speed change ±10%	≥ 90%

Video (VideoSeal)

Attack	Detection rate
H.264 re-encode (CRF ≤ 28)	≥ 98%
H.265 re-encode (CRF ≤ 28)	≥ 97%
Resolution change (720p→480p)	≥ 94%
Screen-record at 1×	≥ 93%

What watermarking does not cover

High-effort adversarial removal attacks (e.g. adversarial perturbation specifically targeting TrustMark)
Extreme re-encoding at very low quality or bitrate (JPEG quality < 50, MP3 < 64 kbps)
Image attacks that reduce PSNR below 30 dB (the watermark may persist, but the image itself is unusable)
Audio pitch-shifting beyond ±20%
SynthID-Text: any post-generation transformation (paraphrase, translation, abbreviation)

Robustness benchmarks run as a CI gate. Results are published in docs/WATERMARK-POLICY.md.
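
One way to picture such a gate is a comparison of measured detection rates against the minimums from the image table. The sketch below is an assumption about how a check could be structured: the attack keys and function names are hypothetical and do not describe the output format of make bench.image.

```python
# Minimum detection rates copied from the image (TrustMark) table.
# The dictionary keys are hypothetical labels, not real benchmark identifiers.
IMAGE_THRESHOLDS = {
    "jpeg_q70_plus": 0.99,
    "png_round_trip": 1.00,
    "resize_0.5x_2x": 0.97,
    "crop_50pct": 0.92,
    "screenshot_1x": 0.95,
    "webp": 0.98,
    "brightness_contrast_20": 0.96,
    "jpeg_q50_70": 0.90,
}


def detection_rate(detected: int, total: int) -> float:
    """Fraction of attacked samples in which the watermark was still decoded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


def failing_attacks(measured: dict, thresholds: dict) -> list:
    """Return the attacks whose measured rate is below the required minimum.

    An attack missing from the results is treated as a failure.
    """
    return sorted(
        name for name, minimum in thresholds.items()
        if measured.get(name, 0.0) < minimum
    )
```

A build would fail whenever failing_attacks returns a non-empty list.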

SynthID-Text: generation-time only

SynthID-Text works by modifying token logit distributions at generation time. It must be integrated as a hook inside the LLM inference pipeline — it cannot watermark existing text. The recipe text-genai-disclosure-v1 documents the required hook interface.

If your use case requires watermarking text that already exists (e.g. extracted OCR), use a C2PA manifest sidecar instead.

Watermark as a backup signal

The watermark is one layer of the provenance stack, not a standalone guarantee. If both the manifest and the watermark are present and agree, the verification result is verified_manifest_and_watermark_match (highest confidence). If only the watermark is present, the result is watermark_only (lower confidence). If neither is present, the result is no_provenance.
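
The three outcomes described here can be sketched as a small decision function. The function name and the ids_match flag are assumptions for illustration; combinations not described in this section (for example, a manifest with no watermark) return None rather than a guessed state.

```python
from typing import Optional


def verification_state(manifest_present: bool,
                       watermark_present: bool,
                       ids_match: bool = False) -> Optional[str]:
    """Map the two provenance signals to the states named in this section."""
    if manifest_present and watermark_present and ids_match:
        return "verified_manifest_and_watermark_match"  # highest confidence
    if watermark_present and not manifest_present:
        return "watermark_only"  # lower confidence
    if not manifest_present and not watermark_present:
        return "no_provenance"
    # Manifest-only and mismatch cases are not defined in this section.
    return None
```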

See Verification States for the full signal-to-state mapping.