Watermarking
Layer 2 embeds an imperceptible payload in the asset’s signal data. The watermark survives common transformations that would strip or corrupt an embedded C2PA manifest (re-encoding, cropping, screenshots, format conversion). When the manifest is gone, the watermark ID points back to the manifest record in the soft-binding index.
Engines
| Modality | Engine | Licence | Payload capacity | Notes |
|---|---|---|---|---|
| Image | Adobe TrustMark | MIT | 48 bits | Arbitrary resolution; C2PA soft-binding compatible |
| Audio | Meta AudioSeal | MIT | 16 bits/segment | Localised — detects watermarked seconds, not entire file |
| Video | Meta VideoSeal | MIT | 16 bits/frame-group | Temporal-propagation; survives re-encode at reasonable bitrates |
| Text | Google SynthID-Text | Apache 2.0 | N/A (statistical detection score, no bit payload) | Generation-time hook only; cannot watermark existing text |
Engine selection is set by the recipe field watermark.engine. Mixed-modality assets (e.g. video with audio) embed both VideoSeal and AudioSeal when the recipe requests it.
Payload format
Each watermark encodes a 16-byte ULID (watermark_id). The ULID maps to a manifest record in Postgres.
The raw payload includes a 4-byte truncated HMAC-SHA-256 over (tenant_id, watermark_id). This prevents cross-tenant payload spoofing: a watermark ID decoded from an asset cannot be claimed by a different tenant.
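As a rough illustration, the payload described above could be assembled and checked as in the sketch below. The source specifies only the field sizes and the HMAC inputs, so the key handling, the length-prefixed serialisation of `tenant_id`, taking the first 4 bytes of the digest as the truncation, and the function names are all assumptions.

```python
import hashlib
import hmac
import os
import time

WATERMARK_ID_LEN = 16  # bytes, binary ULID
HMAC_TAG_LEN = 4       # bytes, truncated HMAC-SHA-256


def new_watermark_id() -> bytes:
    """Binary ULID layout: 48-bit millisecond timestamp + 80 random bits."""
    timestamp_ms = int(time.time() * 1000)
    return timestamp_ms.to_bytes(6, "big") + os.urandom(10)


def _tag(key: bytes, tenant_id: str, watermark_id: bytes) -> bytes:
    # Assumption: tenant_id is length-prefixed so that the
    # (tenant_id, watermark_id) pair serialises unambiguously.
    tenant = tenant_id.encode("utf-8")
    message = len(tenant).to_bytes(2, "big") + tenant + watermark_id
    # Assumption: truncation keeps the first 4 bytes of the digest.
    return hmac.new(key, message, hashlib.sha256).digest()[:HMAC_TAG_LEN]


def build_payload(key: bytes, tenant_id: str, watermark_id: bytes) -> bytes:
    """Return watermark_id followed by the truncated HMAC tag (20 bytes)."""
    if len(watermark_id) != WATERMARK_ID_LEN:
        raise ValueError("watermark_id must be 16 bytes")
    return watermark_id + _tag(key, tenant_id, watermark_id)


def verify_payload(key: bytes, tenant_id: str, payload: bytes) -> bool:
    """Check that a decoded payload was issued for the given tenant."""
    if len(payload) != WATERMARK_ID_LEN + HMAC_TAG_LEN:
        return False
    watermark_id = payload[:WATERMARK_ID_LEN]
    tag = payload[WATERMARK_ID_LEN:]
    # Constant-time comparison avoids leaking tag bytes through timing.
    return hmac.compare_digest(tag, _tag(key, tenant_id, watermark_id))
```

A payload built for one tenant fails `verify_payload` when checked against another tenant's ID, which is the cross-tenant property the HMAC is meant to provide. A 4-byte tag leaves a 1 in 2^32 chance of an accidental match per attempt.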
[ watermark_id: 16 bytes ][ hmac_truncated: 4 bytes ]
Quality targets
| Modality | Metric | Target | Measurement |
|---|---|---|---|
| Image | PSNR | ≥ 42 dB | vs. original pre-watermark |
| Audio | PESQ | ≥ 3.0 (speech) | MOS-LQO on speech corpus |
| Video | VMAF | ≥ 90 | vs. source frame sequence |
Quality is measured automatically in CI for each model update. A regression below threshold fails the build.
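To make the image target concrete, here is a minimal sketch of a PSNR check against the 42 dB threshold. It assumes 8-bit samples passed as flat sequences. The function names are hypothetical and this is not the project's actual CI code.

```python
import math
from typing import Sequence

PSNR_TARGET_DB = 42.0  # image target from the quality table


def psnr(original: Sequence[int], watermarked: Sequence[int],
         max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(original) != len(watermarked) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def passes_quality_gate(original: Sequence[int],
                        watermarked: Sequence[int]) -> bool:
    """True when the watermarked image meets the PSNR target."""
    return psnr(original, watermarked) >= PSNR_TARGET_DB
```

For example, a uniform error of one grey level per sample gives an MSE of 1 and a PSNR of about 48.13 dB, which passes. A uniform error of ten levels gives about 28.13 dB, which fails.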
Robustness table
The following transformations are covered in the Public Beta robustness benchmark (make bench.image, make bench.audio, make bench.video).
Image (TrustMark)
| Attack | Detection rate |
|---|---|
| JPEG re-encode (quality ≥ 70) | ≥ 99% |
| PNG round-trip | 100% |
| Resize (0.5× to 2×) | ≥ 97% |
| Crop (≥ 50% area preserved) | ≥ 92% |
| Screenshot (desktop, 1× DPI) | ≥ 95% |
| WebP conversion | ≥ 98% |
| Brightness/contrast ±20% | ≥ 96% |
| JPEG re-encode (quality 50–69) | ≥ 90% |
Audio (AudioSeal)
| Attack | Detection rate |
|---|---|
| MP3 re-encode (≥ 128 kbps) | ≥ 99% |
| AAC re-encode | ≥ 97% |
| Resample (44.1→22 kHz) | ≥ 95% |
| Noise addition (SNR ≥ 20 dB) | ≥ 96% |
| Speed change ±10% | ≥ 90% |
Video (VideoSeal)
| Attack | Detection rate |
|---|---|
| H.264 re-encode (CRF ≤ 28) | ≥ 98% |
| H.265 re-encode (CRF ≤ 28) | ≥ 97% |
| Resolution change (720p→480p) | ≥ 94% |
| Screen-record at 1× | ≥ 93% |
What watermarking does not cover
- High-effort adversarial removal attacks (e.g. adversarial perturbation specifically targeting TrustMark)
- Extreme re-encoding at very low bitrate (JPEG quality < 50, MP3 < 64 kbps)
- Image attacks that reduce PSNR below 30 dB (the watermark may persist but image quality is unusable)
- Audio pitch-shifting > ±20%
- SynthID-Text: any post-generation transformation (paraphrase, translation, abbreviation)
Robustness benchmarks are gated in CI. Results are published in docs/WATERMARK-POLICY.md.
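A CI gate of this kind can be pictured as a comparison of measured detection rates against the per-attack thresholds in the tables above. The sketch below is illustrative only: the `AttackResult` type and `failing_attacks` helper are hypothetical, and the actual output format of the `make bench.*` targets is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AttackResult:
    attack: str
    detected: int     # samples where the watermark was still detected
    total: int        # samples attacked
    threshold: float  # minimum acceptable detection rate, 0.0 to 1.0

    @property
    def detection_rate(self) -> float:
        return self.detected / self.total


def failing_attacks(results: Iterable[AttackResult]) -> List[str]:
    """Names of attacks whose detection rate is below threshold.

    An attack with no samples is treated as failing, since nothing was measured.
    """
    return [
        r.attack
        for r in results
        if r.total == 0 or r.detection_rate < r.threshold
    ]
```

A build script could exit with a non-zero status whenever `failing_attacks` returns a non-empty list, which would fail the CI job.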
SynthID-Text: generation-time only
SynthID-Text works by modifying token logit distributions at generation time. It must be integrated as a hook inside the LLM inference pipeline — it cannot watermark existing text. The recipe text-genai-disclosure-v1 documents the required hook interface.
If your use case requires watermarking text that already exists (e.g. extracted OCR), use a C2PA manifest sidecar instead.
Watermark as a backup signal
The watermark is one layer of the provenance stack — not a standalone guarantee. If the watermark is present and the manifest is also present, the verification result is verified_manifest_and_watermark_match (highest confidence). If only the watermark is present, the result is watermark_only (lower confidence). If neither is present, the result is no_provenance.
See Verification States for the full signal-to-state mapping.
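The three outcomes named in this section can be summarised in a small sketch. The function name and the `ids_match` flag are assumptions. Signal combinations this section does not describe (manifest only, or manifest and watermark that disagree) return `None` here rather than a guessed state, since the full mapping lives in Verification States.

```python
from typing import Optional


def verification_state(manifest_present: bool,
                       watermark_present: bool,
                       ids_match: bool = False) -> Optional[str]:
    """Map provenance signals to the states named in this section."""
    if manifest_present and watermark_present and ids_match:
        return "verified_manifest_and_watermark_match"  # highest confidence
    if watermark_present and not manifest_present:
        return "watermark_only"  # lower confidence
    if not manifest_present and not watermark_present:
        return "no_provenance"
    # Manifest-only or mismatching signals: not covered in this section.
    return None
```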