Purple Firefish

The local-first security gateway for private LLMs and tool-using AI agents.

Block prompt injection, govern AI tool use, scan retrieved content and outputs, and keep enforcement local by default.

USER_PROMPT RAG_CHUNK TOOL_OUTPUT REQUIRE_APPROVAL

Firefish shows exactly where an AI agent crosses a security boundary — and stops it before it acts.

Protection pipeline

1Prompt
2RAG Content
3Model Output
4Streaming
5Tool Calls
6Audit Trail

Prompt → RAG Content → Model Output → Streaming → Tool Calls → Audit Trail

Problem

AI risk appears between user intent, untrusted text, and agent action.

AppSec leads, platform engineers, AI builders, and conference teams need controls that understand where content came from and what an agent is about to do.

Prompts can become instructions

Attackers can ask models to ignore policy, reveal hidden prompts, or treat malicious text as a higher-priority command.

Retrieval expands the boundary

RAG chunks, webpages, PDFs, email, tool output, and memory can carry indirect instructions into trusted workflows.

Agents can create side effects

Email, webhooks, APIs, file export, delete actions, and shell-like tools need policy before they run.

Use cases

Built for teams shipping private AI systems.

Secure internal copilots

Inspect user prompts and model outputs before sensitive context, internal policy, or private runtime behavior leaks.

Protect RAG/document assistants

Apply stricter controls to retrieved documents, webpages, PDFs, email, tool output, and agent memory.

Govern tool-using AI agents

Validate proposed actions against user goals, destination risk, secrets, reversibility, and approval policy.

Product architecture

A gateway, policy engine, tool firewall, and audit trail in one boundary.

Firefish sits between apps, models, retrieved content, agent tools, and audit storage so risky instructions are evaluated before they become trusted action.

Ingress

Normalize prompts, classify source type, and preserve strict handling for untrusted content.

Detection

Combine deterministic rules, lexical signals, optional anomaly checks, and configured judge routing.

Policy

Map risk into allow, sandbox, approval, block, or redact decisions without globally lowering thresholds.

Tool firewall

Validate proposed sends, deletes, exports, network calls, and shell-like actions before execution.

AI Asset Graph

Map the AI attack surface Firefish protects.

The operator app shows a redacted asset graph from users to apps, prompts, RAG sources, models, agents, tools, external destinations, and audit evidence.

Users → Apps → Prompts → RAG → Models → Agents → Tools → Destinations → Audit

Each node carries trust zone, risk level, policy coverage, and last-seen context without exposing raw prompts, API keys, credentials, or full payloads.

TRUST_BOUNDARY UNTRUSTED_CONTEXT EXTERNAL_DESTINATION POLICY_COVERAGE
What Firefish protects

What Firefish protects.

Firefish protects the full AI boundary: prompts, retrieved content, outputs, streaming, tools, and audit.

Prompts

Detect prompt injection, jailbreaks, system prompt leakage attempts, and obfuscated instructions before model execution.

Retrieved content

Keep RAG chunks, webpages, PDFs, email, tool output, and agent memory stricter than direct user prompts.

Model outputs

Redact secrets, check unsafe completions, and preserve reviewable context without exposing raw sensitive text by default.

Streaming responses

Evaluate partial output while the response is still moving through the gateway.

Tool calls

Require clear user intent before sending email, posting webhooks, calling APIs, uploading files, or touching shell-like tools.

Audit trails

Capture decisions, reason codes, layers, and redacted previews for operators and reviewers.

Agent/tool governance

Agent actions need policy before execution.

Firefish validates proposed tool calls against user intent, sensitive data, and action risk before an agent can create outside effects.

  • Block or require approval for exfiltration, unrequested sends, webhook posts, shell execution, and credential exposure.
  • Keep validation separate from detection so tool policy remains understandable and auditable.
  • Expose plain-English review traces that explain what was stopped and why it mattered.
Tool Firewall

Firefish stops agents before they take unsafe actions.

See the full chain from user goal to proposed tool call, risk analysis, policy verdict, and safer alternative before any send, delete, webhook, shell, or API action can run.

Goal mismatch

Block an email send when the user only asked for a report summary.

Unsafe side effects

Require approval or block destructive, external, secret-bearing, or irreversible agent actions.

Local-first deployment

Local-first by default.

Firefish is designed for local enforcement. Hosted judge calls stay inactive unless explicitly configured, and local-only mode remains the safe default.

No surprise hosted calls

Production defaults avoid enabling external LLM judgment by accident.

Strict untrusted sources

Retrieved content, webpages, PDFs, emails, tools, and memory do not inherit user-prompt softening.

Trust and proof

Controls a reviewer can inspect.

Firefish is designed to make security decisions explainable without exposing raw sensitive text.

  • Local-only defaults Enforcement stays local unless external analysis is explicitly configured.
  • Fail-closed production checks Runtime configuration must be deliberate before production operation.
  • Redacted audit trails Operators get useful context without dumping secrets into review views.
  • Source-aware policy User prompts and untrusted retrieved content are not treated the same.
  • Tool-call validation Agent actions are evaluated before email, webhooks, APIs, exports, or destructive tools run.
  • Strict streaming support Streaming responses can be inspected while output is still in motion.
  • Reproducible benchmark suite Security tuning is backed by local benchmark artifacts and profile comparisons.
Benchmark proof

Measure tuning without weakening security.

Benchmark reporting compares profiles, attack recall, false positives, latency, judge usage, and anomaly routing while hashing unsafe text in summaries.

Recall

Track active injection, indirect injection, exfiltration, system prompt leakage, tool hijacking, and obfuscation coverage.

FPR

Separate benign hard negatives from active attacks so education and defensive reports are not treated like instructions.

p95

Show latency by fast, balanced, and strict profiles alongside expensive-layer routing rates.

How Firefish is different

More than a prompt filter.

Not just prompt filtering

Firefish also inspects retrieved content, outputs, streaming responses, tool calls, and audit evidence.

Not cloud-only moderation

Local-first defaults keep enforcement private unless a team explicitly configures hosted services.

Not just observability

The gateway can block, sandbox, redact, or require approval before unsafe actions proceed.

Not just a framework

Firefish is an operational boundary with API routes, policy decisions, traces, and dashboard review surfaces.

Full gateway posture

Gateway + policy + tool governance + audit gives AI teams a single place to reason about enforcement.

Built for review

Reason codes, redaction, benchmark summaries, and source-type breakdowns make security claims easier to inspect.

Demo mode

Built for a live security story.

Run deterministic synthetic scenarios that show an attack entering the pipeline, the gateway decision, and the customer-safe outcome.

Demo flow

Threat lab examples, fire drill walkthroughs, architecture proof points, runtime status, and decision traces are available in the operator app.

Docs / quickstart

Start with the local path, then wire in the gateway.

The docs walk through setup, architecture, threat model, benchmarking, and integration patterns for platform teams.

Quickstart

Run Firefish locally, keep defaults private, and test the API routes before connecting production traffic.

Documentation

Review architecture, threat model, benchmarking methodology, and operator routes.

Contact / CTA

Put a security gateway between AI intent and AI action.

Use Firefish for an AppSec review, platform evaluation, conference demo, or private AI gateway pilot.