AI risk emerges where user intent, untrusted text, and agent action meet.
AppSec leads, platform engineers, AI builders, and conference teams need controls that understand where content came from and what an agent is about to do.
Prompts can become instructions
Attackers can ask models to ignore policy, reveal hidden prompts, or treat malicious text as a higher-priority command.
Retrieval expands the boundary
RAG chunks, webpages, PDFs, email, tool output, and memory can carry indirect instructions into trusted workflows.
Agents can create side effects
Email, webhooks, APIs, file export, delete actions, and shell-like tools need policy before they run.
Built for teams shipping private AI systems.
Secure internal copilots
Inspect user prompts and model outputs before sensitive context, internal policy, or private runtime behavior leaks.
Protect RAG/document assistants
Apply stricter controls to retrieved documents, webpages, PDFs, email, tool output, and agent memory.
Govern tool-using AI agents
Validate proposed actions against user goals, destination risk, secrets, reversibility, and approval policy.
A gateway, policy engine, tool firewall, and audit trail in one boundary.
Firefish sits between apps, models, retrieved content, agent tools, and audit storage so risky instructions are evaluated before they become trusted action.
Normalize prompts, classify source type, and preserve strict handling for untrusted content.
Combine deterministic rules, lexical signals, optional anomaly checks, and configured judge routing.
Map risk into allow, sandbox, approval, block, or redact decisions without globally lowering thresholds.
Validate proposed sends, deletes, exports, network calls, and shell-like actions before execution.
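As a rough illustration of how source-aware decision mapping can work, the sketch below turns a risk score into one of the five decisions named above. Everything in it is an assumption made for illustration: the `decide` function, the source labels, and the threshold values are hypothetical and are not Firefish's actual API or tuning.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    SANDBOX = "sandbox"
    APPROVAL = "approval"
    BLOCK = "block"
    REDACT = "redact"


# Hypothetical label for the only source treated as trusted.
# Anything else (RAG chunks, webpages, PDFs, email, tool output, memory,
# or an unknown label) falls into the strict zone.
TRUSTED_SOURCES = {"user_prompt"}

# Assumed thresholds as (sandbox_at, approval_at, block_at); illustrative only.
THRESHOLDS = {
    "trusted": (0.4, 0.6, 0.8),
    "untrusted": (0.2, 0.4, 0.6),
}


def decide(source_type: str, risk: float, contains_secret: bool = False) -> Decision:
    """Map a risk score in [0, 1] to a decision, stricter for untrusted sources."""
    zone = "trusted" if source_type in TRUSTED_SOURCES else "untrusted"
    sandbox_at, approval_at, block_at = THRESHOLDS[zone]
    if risk >= block_at:
        return Decision.BLOCK
    if contains_secret:
        # Below the block threshold, secret-bearing content is redacted.
        return Decision.REDACT
    if risk >= approval_at:
        return Decision.APPROVAL
    if risk >= sandbox_at:
        return Decision.SANDBOX
    return Decision.ALLOW
```

In this sketch the same score of 0.65 needs approval when it comes from a direct user prompt but is blocked when it comes from a webpage. Each zone keeps its own thresholds, so relaxing one does not lower the bar for the other.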
Map the AI attack surface Firefish protects.
The operator app shows a redacted asset graph from users to apps, prompts, RAG sources, models, agents, tools, external destinations, and audit evidence.
Users → Apps → Prompts → RAG → Models → Agents → Tools → Destinations → Audit
Each node carries trust zone, risk level, policy coverage, and last-seen context without exposing raw prompts, API keys, credentials, or full payloads.
What Firefish protects.
Firefish protects the full AI boundary: prompts, retrieved content, outputs, streaming, tools, and audit.
Prompts
Detect prompt injection, jailbreaks, system prompt leakage attempts, and obfuscated instructions before model execution.
Retrieved content
Keep RAG chunks, webpages, PDFs, email, tool output, and agent memory stricter than direct user prompts.
Model outputs
Redact secrets, check unsafe completions, and preserve reviewable context without exposing raw sensitive text by default.
Streaming responses
Evaluate partial output while the response is still moving through the gateway.
Tool calls
Require clear user intent before sending email, posting webhooks, calling APIs, uploading files, or touching shell-like tools.
Audit trails
Capture decisions, reason codes, layers, and redacted previews for operators and reviewers.
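One way to picture inspection of a streaming response is a small hold-back buffer: text is released only after the gateway has seen enough of what follows to rule out a match that straddles two chunks. The sketch below is a minimal example under stated assumptions. The `guarded_stream` function, the `sk-` secret pattern, and the block marker are hypothetical and do not describe Firefish internals.

```python
import re
from typing import Iterable, Iterator

# Hypothetical secret format used only for this example.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def guarded_stream(chunks: Iterable[str], holdback: int = 32) -> Iterator[str]:
    """Yield streamed text, holding back a tail so split secrets are still caught.

    `holdback` must be longer than the shortest string the pattern can match.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if SECRET_PATTERN.search(buffer):
            # Stop the stream; nothing still in the buffer is released.
            yield "[stream blocked]"
            return
        if len(buffer) > holdback:
            # Release everything except the most recent `holdback` characters.
            yield buffer[:-holdback]
            buffer = buffer[-holdback:]
    if buffer:
        yield buffer
```

With benign input the joined output equals the joined input. If a key arrives split across two chunks, the stream ends with the block marker and the key never leaves the buffer.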
Agent actions need policy before execution.
Firefish validates proposed tool calls against user intent, sensitive data, and action risk before an agent can create outside effects.
- Block or require approval for exfiltration, unrequested sends, webhook posts, shell execution, and credential exposure.
- Keep validation separate from detection so tool policy remains understandable and auditable.
- Expose plain-English review traces that explain what was stopped and why it mattered.
Firefish stops agents before they take unsafe actions.
See the full chain from user goal to proposed tool call, risk analysis, policy verdict, and safer alternative before any send, delete, webhook, shell, or API action can run.
Goal mismatch
Block an email send when the user only asked for a report summary.
Unsafe side effects
Require approval or block destructive, external, secret-bearing, or irreversible agent actions.
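The two checks above can be sketched as a small validator that compares a proposed tool call with what the user's goal permits. This is a minimal illustration, not Firefish's implementation: `ToolCall`, `validate`, the reason codes, and the idea of passing an `allowed_tools` set derived from the user goal are all assumptions.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class ToolCall:
    tool: str                     # hypothetical tool name, e.g. "send_email"
    external: bool = False        # leaves the trust boundary
    irreversible: bool = False    # cannot be undone (delete, overwrite)
    carries_secret: bool = False  # payload contains sensitive data


def validate(call: ToolCall, allowed_tools: Set[str]) -> Tuple[str, str]:
    """Return (verdict, reason_code) for a proposed tool call."""
    if call.tool not in allowed_tools:
        # The user never asked for this kind of action.
        return "block", "goal_mismatch"
    if call.carries_secret and call.external:
        return "block", "possible_exfiltration"
    if call.irreversible or call.external:
        return "approval", "unsafe_side_effect"
    return "allow", "within_user_goal"
```

For a user who only asked for a report summary, `allowed_tools` might be `{"summarize"}`, so a proposed `send_email` call is blocked as a goal mismatch. A delete the user did request still goes to approval because it is irreversible.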
Local-first by default.
Firefish is designed for local enforcement. Hosted judge calls stay inactive unless explicitly configured, and local-only mode remains the safe default.
No surprise hosted calls
Production defaults avoid enabling external LLM judgment by accident.
Strict untrusted sources
Retrieved content, webpages, PDFs, emails, tools, and memory never inherit the more lenient handling given to direct user prompts.
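A local-first default can be pictured as configuration where hosted judgment needs two deliberate steps: turning off local-only mode and supplying an endpoint. The sketch below is hypothetical. `GatewayConfig`, its field names, and `check_production` are assumptions for illustration, not Firefish's real settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayConfig:
    local_only: bool = True                 # safe default: no hosted calls
    hosted_judge_url: Optional[str] = None  # hypothetical endpoint setting
    environment: str = "development"


def hosted_judge_enabled(cfg: GatewayConfig) -> bool:
    """Hosted judging is active only when both settings were set on purpose."""
    return (not cfg.local_only) and cfg.hosted_judge_url is not None


def check_production(cfg: GatewayConfig) -> None:
    """Refuse to start in production with a half-configured hosted judge."""
    if cfg.environment == "production" and not cfg.local_only and cfg.hosted_judge_url is None:
        raise ValueError("local_only is off but no hosted judge is configured")
```

Under these assumptions, setting a judge URL alone changes nothing while `local_only` stays true, which is the "no surprise hosted calls" property in miniature.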
Controls a reviewer can inspect.
Firefish is designed to make security decisions explainable without exposing raw sensitive text.
- Local-only defaults: Enforcement stays local unless external analysis is explicitly configured.
- Fail-closed production checks: Runtime configuration must be deliberate before production operation.
- Redacted audit trails: Operators get useful context without dumping secrets into review views.
- Source-aware policy: User prompts and untrusted retrieved content are not treated the same.
- Tool-call validation: Agent actions are evaluated before email, webhooks, APIs, exports, or destructive tools run.
- Strict streaming support: Streaming responses can be inspected while output is still in motion.
- Reproducible benchmark suite: Security tuning is backed by local benchmark artifacts and profile comparisons.
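To make the redacted audit trail idea concrete, the sketch below builds a record that carries a decision, reason codes, layers, and a redacted preview, plus a hash of the original text so reviewers can correlate events without seeing the payload. The record shape, the `audit_record` function, and the secret pattern are assumptions for illustration only.

```python
import hashlib
import re
from typing import Dict, List

# Hypothetical pattern for key/value style secrets.
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|password|token)\s*[:=]\s*\S+")


def redacted_preview(text: str, limit: int = 80) -> str:
    """Replace secret values with a marker, then truncate for review views."""
    redacted = SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return redacted[:limit]


def audit_record(text: str, decision: str, reason_codes: List[str], layers: List[str]) -> Dict[str, object]:
    """Build a reviewable record that never stores the raw text."""
    return {
        "decision": decision,
        "reason_codes": list(reason_codes),
        "layers": list(layers),
        "preview": redacted_preview(text),
        "content_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
```

A call such as `audit_record("send password: hunter2 to bob", "block", ["secret_in_output"], ["deterministic"])` yields the preview `send password=[REDACTED] to bob`, so the reviewer sees what happened without seeing the secret.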
Measure tuning without weakening security.
Benchmark reporting compares profiles, attack recall, false positives, latency, judge usage, and anomaly routing while hashing unsafe text in summaries.
Track active injection, indirect injection, exfiltration, system prompt leakage, tool hijacking, and obfuscation coverage.
Separate benign hard negatives from active attacks so education and defensive reports are not treated like instructions.
Show latency by fast, balanced, and strict profiles alongside expensive-layer routing rates.
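The core benchmark numbers can be sketched in a few lines: attack recall is computed over active attacks only, the false-positive rate over benign hard negatives only, and missed attacks are reported by hash, not by text. The `Sample` type and `summarize` function below are hypothetical and use made-up field names, not Firefish's benchmark format.

```python
import hashlib
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    text: str
    is_attack: bool  # True for active attacks, False for benign hard negatives
    flagged: bool    # whether the gateway flagged the sample


def summarize(samples: List[Sample]) -> Dict[str, object]:
    """Compute recall and false-positive rate without echoing unsafe text."""
    attacks = [s for s in samples if s.is_attack]
    benign = [s for s in samples if not s.is_attack]
    caught = sum(1 for s in attacks if s.flagged)
    false_positives = sum(1 for s in benign if s.flagged)
    return {
        "attack_recall": caught / len(attacks) if attacks else 0.0,
        "false_positive_rate": false_positives / len(benign) if benign else 0.0,
        # Missed attacks are identified by a short hash prefix only.
        "missed_attack_hashes": [
            hashlib.sha256(s.text.encode("utf-8")).hexdigest()[:12]
            for s in attacks
            if not s.flagged
        ],
    }
```

Keeping the two populations separate means a profile cannot inflate recall by flagging everything; the false-positive rate on benign hard negatives exposes that trade-off.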
More than a prompt filter.
Not just prompt filtering
Firefish also inspects retrieved content, outputs, streaming responses, tool calls, and audit evidence.
Not cloud-only moderation
Local-first defaults keep enforcement private unless a team explicitly configures hosted services.
Not just observability
The gateway can block, sandbox, redact, or require approval before unsafe actions proceed.
Not just a framework
Firefish is an operational boundary with API routes, policy decisions, traces, and dashboard review surfaces.
Full gateway posture
Gateway + policy + tool governance + audit gives AI teams a single place to reason about enforcement.
Built for review
Reason codes, redaction, benchmark summaries, and source-type breakdowns make security claims easier to inspect.
Built for a live security story.
Run deterministic synthetic scenarios that show an attack entering the pipeline, the gateway decision, and the customer-safe outcome.
Demo flow
Threat lab examples, fire drill walkthroughs, architecture proof points, runtime status, and decision traces are available in the operator app.
Start with the local path, then wire in the gateway.
The docs walk through setup, architecture, threat model, benchmarking, and integration patterns for platform teams.
Quickstart
Run Firefish locally, keep defaults private, and test the API routes before connecting production traffic.
Documentation
Review architecture, threat model, benchmarking methodology, and operator routes.
Put a security gateway between AI intent and AI action.
Use Firefish for an AppSec review, platform evaluation, conference demo, or private AI gateway pilot.