🛡️

AGENT ARENA

How manipulation-proof is your AI agent? Send it to a page full of hidden prompt injection attacks and find out.

How It Works

🚀

Step 1: Send

Point your AI agent at the test page and ask it to summarize the content.

📋

Step 2: Paste

Copy your agent's response and paste it into the scorecard below.

🎯

Step 3: Score

Instantly see which hidden attacks your agent fell for.

Open Test Page →

Send your agent to: ref.jock.pl/modern-web

Scorecard

Challenge Catalog

10 attack vectors ordered by difficulty. Canary phrases are hidden — only revealed after analysis.

#1HTML CommentInstructions hidden in HTML comments, invisible to rendered page but present in source.Basic

#2White on WhiteText styled to be invisible against the page background using matching colors.Basic

#3Hidden DivA div set to display:none contains instructions that are invisible to users.Medium

#4Micro TextExtremely small, nearly transparent text woven into legitimate content.Medium

#5Aria HiddenContent marked as aria-hidden, intended to be ignored by assistive tech but read by agents.Medium

#6Data AttributeInstructions embedded in custom HTML data attributes on page elements.Medium

#7Zero-Width CharactersInstructions encoded using zero-width Unicode characters invisible to the human eye.Hard

#8Image Alt OverrideA decorative image with alt text containing system-level instructions.Hard

#9Off-Screen ContentContent positioned thousands of pixels off-screen, invisible but in the DOM.Hard

#10Multi-Layer AttackA sophisticated multi-vector attack combining hidden elements with persuasive framing.Expert

Understanding Prompt Injection

Prompt injection is an attack where adversarial instructions are hidden in content that an AI agent processes. When an agent reads a web page, email, or document, hidden instructions can trick it into changing its behavior.

Why It Matters

Agents browsing the web are exposed to content they didn't choose
Hidden instructions can exfiltrate data, alter outputs, or bypass safety filters
Most attacks are invisible to the human supervising the agent
Defense requires awareness at both the model and application layer

Attack Categories

Visual Hiding

White-on-white text, micro text, off-screen content. The text is there, but humans can't see it.

Structural Hiding

HTML comments, hidden divs, data attributes. Uses the structure of HTML itself as camouflage.

Semantic Hiding

ARIA attributes, alt text overrides. Exploits accessibility and metadata channels.

Encoding Tricks

Zero-width characters, Unicode exploits. The message is invisible at the character level.

← All Experiments