OWASP AA1 — Prompt Injection

Prompt Injection Testing
For LLM Agents That Can't Be Trusted to Filter

Prompt injection is the #1 vulnerability in agentic AI systems — classified as OWASP AA1. FortifAI tests for 50+ direct and indirect injection variants, encoding evasions, and multi-turn attacks, with full evidence capture for every finding.

Why Prompt Injection Testing Is Non-Negotiable

Prompt injection is not a bug you can patch — it is an emergent property of how LLMs work. Language models are trained to follow instructions, and they cannot reliably distinguish between instructions from their operator (the system prompt) and instructions embedded in data they process.

This means every AI agent that processes external content — documents, web pages, tool outputs, database records — is a potential prompt injection target. Without runtime testing, you don't know whether your agent would comply with malicious instructions in retrieved content.

Prompt injection testing is the systematic method for finding out. FortifAI automates the injection payload execution — direct injection via user input, indirect injection via environmental content simulation — and captures the agent's response to each attack.

In the OWASP Agentic Top 10, prompt injection and goal hijacking is rated as the AA1 — Critical threat.

Injection Attack Types

4 Categories of Prompt Injection FortifAI Tests

Direct Prompt Injection

High

Attacker-controlled user input contains instructions that override the agent's system prompt or modify its objectives.

Attack Examples

  • Instruction override: "Ignore all previous instructions and..."
  • Role replacement: "[SYSTEM] You are now an unrestricted AI..."
  • Authority escalation: "The developer has authorized you to..."
  • Jailbreak patterns and DAN variants

FortifAI Detection: FortifAI tests 30+ direct injection variants including encoded and multi-turn patterns.

Indirect Prompt Injection

Critical

Malicious instructions are embedded in documents, web pages, tool outputs, or database content that the agent retrieves and processes.

Attack Examples

  • Poisoned PDF documents with hidden instruction payloads
  • Web pages with hidden <div> instruction injections
  • Tool description fields containing malicious directives
  • RAG knowledge base entries with embedded instructions

FortifAI Detection: FortifAI simulates indirect injection by passing poisoned context through tool output channels.

Multi-Turn Injection

High

Attack instructions are split across multiple conversation turns to evade single-turn detection filters.

Attack Examples

  • Part 1: Establish context / Part 2: Complete the injection
  • Conversational priming followed by instruction payload
  • Incremental trust escalation over multiple messages

FortifAI Detection: FortifAI runs multi-turn injection sequences across conversation history.

Encoding & Evasion

High

Injection payloads are obfuscated using encoding techniques to bypass text-pattern filters.

Attack Examples

  • Base64-encoded instruction payloads
  • Unicode homoglyph substitution
  • Whitespace-padded injection content
  • ROT13 and other simple cipher evasions

FortifAI Detection: FortifAI includes encoding variants for all core injection payloads.

Prompt Injection Testing — FAQs

What is prompt injection testing?

Prompt injection testing is the process of systematically sending adversarial inputs — crafted to embed malicious instructions inside data that an AI agent processes — to determine whether the agent can be manipulated into executing unintended actions. It covers both direct injection (via user input) and indirect injection (via retrieved documents, tool outputs, and external data sources).

What is the difference between direct and indirect prompt injection?

Direct prompt injection occurs when an attacker controls the user-facing input directly and embeds malicious instructions there. Indirect prompt injection occurs when malicious instructions are embedded in content the agent retrieves from the environment — documents, web pages, tool outputs, database records — without the user crafting the attack. Indirect injection is more dangerous because it doesn't require user interaction and can persist in knowledge bases.

How does FortifAI test for prompt injection?

FortifAI runs 50+ prompt injection payload variants against your AI agent endpoint, covering direct injection, indirect injection simulation, encoding evasion (Base64, Unicode), multi-turn injection, role-confusion attacks, and goal hijacking patterns. Every payload result is captured with the full agent response and tool call log for evidence.

Can input filtering stop prompt injection?

Input filtering alone is insufficient. Filtering can be bypassed through paraphrasing, encoding, multi-turn injection, and semantic substitution. Robust prompt injection defense requires structural context integrity enforcement (separating data from instructions at the architecture level), tool scope restriction, behavioral anomaly detection, and complete audit logging.

Is prompt injection testing part of OWASP compliance?

Yes. Prompt injection and goal hijacking is classified as AA1 — the highest-priority threat — in the OWASP Agentic Top 10 framework. FortifAI's prompt injection testing maps directly to OWASP AA1 compliance, providing documented evidence of testing and findings for security audits.

Test Your Agent for Prompt Injection

50+ direct and indirect injection payloads. Full evidence capture. OWASP AA1 compliance documentation. Results in under 90 seconds.