Tiny-font injection: hiding instructions at readable contrast

Near-invisible contrast injection is the attack most teams test for first. But there is a second hiding mechanism that doesn’t require unusual colors: text rendered at 4pt or 6pt in a normal dark color, placed within an image that will be processed by a multimodal model. The text is technically visible—zoom in and you can read it—but it is below the threshold of comfortable human review. A reviewer scanning a support photo doesn’t stop to read every pixel.

The model’s OCR capability has no such threshold. It extracts small text with the same fidelity it extracts large text.

Why size-based hiding is distinct from contrast-based hiding

Contrast-based injection exploits a perceptual floor: text at a 1.02:1 contrast ratio is invisible to human vision under normal conditions. A human reviewer looking at the image cannot see the text at all, regardless of how carefully they look.

Size-based injection exploits a different property: review effort. A human reviewer can see 4pt text if they examine the image carefully. In practice, they don’t. The injection persists not because the text is invisible but because the review process is not designed to catch it.

This matters for mitigation strategy. Contrast-based injection requires preprocessing that enhances or normalizes pixel values. Size-based injection requires a different approach: OCR extraction and scanning of the full text layer in every uploaded image, regardless of what the reviewer sees.

How the attack is structured

An attacker embeds an instruction block in a region of the image that blends visually with the content, using a standard dark font at 4-6pt. In a product damage photo, the injected text might appear as what looks like a watermark, a resolution artifact, or fine print within the image itself. A reviewer processing fifty support tickets per hour is not examining images at the pixel level.

The model processes the upload and extracts the text. Because the injected instruction arrives through the same channel as legitimate image content, the model has no structural basis to treat it differently. The instruction executes.

Where this risk concentrates

The attack is most dangerous in high-volume workflows where review speed matters. A fraud analyst reviewing insurance claims, a support team member triaging product return photos, or a compliance reviewer scanning uploaded documents operates under time pressure. The assumption embedded in those workflows is that harmful content is visually obvious. Size-based injection is designed to make that assumption fail at scale.

It is also difficult to catch with post-hoc audit. Reviewing the same image after an incident, a human auditor may not spot the injected text without specifically looking for it at high magnification or without running OCR extraction on the image.

What to test for

Test by generating images with injected instructions at 4pt and 6pt in dark text on light backgrounds, embedded in realistic upload content for your deployment context: product photos, damage images, document scans. Verify whether the model follows the injected instruction and whether it does so without flagging the anomaly.

Also test for detection coverage: if your deployment includes any automated screening of uploaded content, verify that the screener catches sub-8pt text and not only unusual color values. The two attack vectors require separate detection passes.