Indirect Prompt Injection Gateway
Server has a 'fetch_webpage' tool that returns raw HTML content from user-supplied URLs without sanitization
How this rule decides. Each strategy below is a deterministic analysis the detector runs against the MCP server's static metadata, source code, and (when present) live connection handshake.
capability-graph
- 1 Capability Graph Ingestion Classification (capability-graph-ingestion-classification)
- 2 Cross Tool Sink Reachability (cross-tool-sink-reachability)
- 3 Resource Ingestion Surface (resource-ingestion-surface)
- 4 Sanitizer Mitigation Checkpoint (sanitizer-mitigation-checkpoint)
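The four strategies above can be read as a single pass over a capability graph. What follows is a minimal sketch of such a pass, not the scanner's actual code: the `Capability` record, its field names, and `find_gateway_sink_pairs` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capability:
    """Hypothetical record for one tool or resource exposed by the server."""
    name: str
    ingests_untrusted: bool = False   # gateway: returns content an outsider can influence
    sink_role: Optional[str] = None   # e.g. "network_egress"; None if not a sink
    has_sanitizer: bool = False       # a declared content sanitizer on the output


def find_gateway_sink_pairs(caps):
    """Pair every unsanitized gateway with every *other* capability that is a sink."""
    findings = []
    for gateway in caps:
        if not gateway.ingests_untrusted or gateway.has_sanitizer:
            continue
        sinks = [c.name for c in caps if c.sink_role and c.name != gateway.name]
        if sinks:
            findings.append((gateway.name, sinks))
    return findings


caps = [
    Capability("plan-implementation", ingests_untrusted=True, sink_role="network_egress"),
    Capability("get-thread-link", sink_role="network_egress"),
    Capability("usage", ingests_untrusted=True),
]
for gateway, sinks in find_gateway_sink_pairs(caps):
    print(gateway, "->", sinks)
```

In this sketch a gateway is excluded from its own sink list. That is one plausible reading of why a tool gateway in the findings reports one fewer reachable sink than a resource gateway on the same server.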
What we found. Each finding below carries a structured proof chain from source (where untrusted data enters) through propagation (how it flows) to a sink (where the dangerous operation occurs), including any mitigations checked for and the potential impact if exploited. Every link is independently verifiable against the cited location.
Proof chain
5 steps from untrusted source to potential impact. Each step is independently verifiable against the cited location.
Source: External Content
- Where: tool plan-implementation
- Observed: Gateway: tool "plan-implementation" classified accesses-filesystem (ingestion-kind=file, trust=internal) at 70% confidence from 1 capability signal(s).
- Why untrusted: The capability-graph analyzer attributes the gateway as: "Filesystem reader: in MCP deployments the reader routinely crosses paths a non-host user can write (shared directories, symlinks; see CVE-2025-53109)." Any content delivered through this tool can carry prompt-injection instructions the agent will read as if they were legitimate context.
Propagation: Cross Tool Flow
- At: capability:tools
- Observed: Propagation channel: the MCP tools surface itself. Response bytes from "plan-implementation" enter the agent's reasoning context; the agent's next tool call can carry an adversary-controlled instruction into the sink. The server exposes 6 reachable sinks (canonical: "get-thread-link", network_egress).
Sink: Network Send
- Where: tool get-thread-link
- Observed: Canonical sink: tool "get-thread-link" classified sends-network at 70% confidence. Role: network_egress. Attribution: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
Mitigation: Sanitizer Function (Absent)
- Where: tool plan-implementation
- Detail: No content-sanitizer parameter declared on "plan-implementation". Returned content flows into agent context verbatim.
Impact: Data Exfiltration
- Scope: user-data
- Exploitability: Moderate
- Scenario: An attacker plants instructions in content the gateway "plan-implementation" (file) will read. The user's agent reads the content as legitimate tool output, follows the injected instruction, and invokes "get-thread-link" (network_egress) with attacker-chosen arguments. Neither tool is individually dangerous; their coexistence on a single server, without an agent-enforced trust boundary, is.
- +0.1 sanitizer-function absent: No sanitizer function found; no content-sanitizer parameter declared on "plan-implementation". Returned content flows into agent context verbatim.
- +0.2 ingestion_capability_confidence: Gateway "plan-implementation" ingestion classification confidence 70% (1 signal). Attribution: Filesystem reader: in MCP deployments the reader routinely crosses paths a non-host user can write (shared directories, symlinks; see CVE-2025-53109).
- +0.08 sink_reachability: Server exposes 6 reachable sinks: get-thread-link (network_egress), consult-council (network_egress), design-architecture (network_egress), review-code (network_egress), debug-issue (network_egress), assess-tradeoffs (network_egress). More sinks mean a larger attack surface once the gateway is exploited.
- +0 single_signal_gateway: 1 signal(s); classification is structurally sound but modestly supported.
- -0.24 charter_confidence_cap: G1 charter caps confidence at 0.75; capability-pair inference cannot observe the actual prompt-injection content at scan time, only the structural precondition (gateway + reachable sink).
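The factor list above does not show the starting score, but it can be back-computed. The sketch below is an inference from the listed numbers, not a documented formula: it assumes the `charter_confidence_cap` adjustment is exactly the amount needed to bring the pre-cap score down to 0.75, and that the displayed values are not rounded. The variable names are illustrative.

```python
from decimal import Decimal

# Adjustments as listed in the report for gateway "plan-implementation".
factors = {
    "sanitizer-function absent": Decimal("0.10"),
    "ingestion_capability_confidence": Decimal("0.20"),
    "sink_reachability": Decimal("0.08"),
    "single_signal_gateway": Decimal("0.00"),
}
cap = Decimal("0.75")              # G1 charter confidence cap
cap_adjustment = Decimal("-0.24")  # charter_confidence_cap line

adjustment_total = sum(factors.values(), Decimal("0"))
pre_cap_score = cap - cap_adjustment            # score before the cap was applied
implied_base = pre_cap_score - adjustment_total  # starting score the report omits

print(adjustment_total, pre_cap_score, implied_base)
```

Under those assumptions the adjustments sum to 0.38, the pre-cap score is 0.99, and the implied starting score is 0.61. If the scanner rounds or clamps intermediate values, the true starting score may differ.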
MITRE-ATLAS-AML.T0054.001: MITRE ATLAS AML.T0054.001 (Indirect Prompt Injection)
G1 is the static-time detector for the structural precondition of AML.T0054.001: the agent ingests attacker-reachable content through one tool and can invoke a side-effecting tool on the same server.
- 1 inspect-description: Open tool "plan-implementation" and confirm it ingests attacker-reachable content (ingestion-kind: file, trust-boundary: internal). The capability classifier attributed the gateway as: "Filesystem reader: in MCP deployments the reader routinely crosses paths a non-host user can write (shared directories, symlinks; see CVE-2025-53109)."
  Target: tool plan-implementation
  Expect: tool "plan-implementation" returns content an external party can influence (web page, email body, issue comment, shared file, chat message, or MCP resource payload) at or above confidence 70%.
- 2 trace-flow: Walk the propagation: the response of "plan-implementation" enters the agent's reasoning context; any prompt-injection content within that response can direct the agent to invoke "get-thread-link". Confirm the server does not interpose an isolation boundary between the gateway's response and the sink's invocation (no sanitizer, no per-sink confirmation gate, no data-flow labels).
  Target: capability:tools
  Expect: Agent receives "plan-implementation" output verbatim, treats it as reasoning input, and can invoke "get-thread-link" in the same session without crossing a trust boundary.
- 3 inspect-schema: Open the tool "get-thread-link" and confirm its side effect matches the sink role "network_egress". For network_egress, check for URL / webhook / recipient params. For filesystem_write, check for path / content params. The classifier attributed this sink as: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
  Target: tool get-thread-link
  Expect: Tool "get-thread-link" produces the side effect the classifier tagged (network_egress) at or above confidence 70%.
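Step 3 can be partly automated. The following sketch scans a tool's input schema for parameter names that hint at the tagged sink role. It is a heuristic under stated assumptions: the hint lists, the `sink_params` helper, and the example schema are hypothetical and are not taken from the scanned server. Only the general shape, a JSON Schema object with a `properties` map, follows how MCP tools declare their inputs.

```python
# Hypothetical name hints per sink role, derived from the step description above.
ROLE_HINTS = {
    "network_egress": ("url", "webhook", "recipient"),
    "filesystem_write": ("path", "content"),
}


def sink_params(input_schema, role):
    """Return parameter names in a JSON Schema whose name contains a hint for `role`."""
    hints = ROLE_HINTS.get(role, ())
    properties = input_schema.get("properties", {})
    return sorted(
        name for name in properties
        if any(hint in name.lower() for hint in hints)
    )


# Illustrative schema only; the real "get-thread-link" schema is not shown in this report.
example_schema = {
    "type": "object",
    "properties": {
        "callback_url": {"type": "string"},
        "thread_id": {"type": "string"},
    },
}
print(sink_params(example_schema, "network_egress"))
```

An empty result does not clear the tool, because the side effect may live in the implementation rather than in a parameter name. The check supports manual inspection and does not replace it.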
Proof chain
5 steps from untrusted source to potential impact. Each step is independently verifiable against the cited location.
Source: External Content
- Where: resource roundtable://usage#uri
- Observed: Gateway: resource "usage" (roundtable://usage) classified ingests-untrusted (ingestion-kind=resource_fetch, trust=external_public) at 60% confidence from 0 capability signal(s).
- Why untrusted: The capability-graph analyzer attributes the gateway as: "MCP resource "usage" (roundtable://usage) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent." Any content delivered through this resource can carry prompt-injection instructions the agent will read as if they were legitimate context.
Propagation: Cross Tool Flow
- At: capability:tools
- Observed: Propagation channel: the MCP tools surface itself. Response bytes from "usage" enter the agent's reasoning context; the agent's next tool call can carry an adversary-controlled instruction into the sink. The server exposes 7 reachable sinks (canonical: "get-thread-link", network_egress).
Sink: Network Send
- Where: tool get-thread-link
- Observed: Canonical sink: tool "get-thread-link" classified sends-network at 70% confidence. Role: network_egress. Attribution: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
Mitigation: Sanitizer Function (Absent)
- Where: resource roundtable://usage#uri
- Detail: No content-sanitizer parameter declared on "usage". Returned content flows into agent context verbatim.
Impact: Data Exfiltration
- Scope: user-data
- Exploitability: Trivial
- Scenario: An attacker plants instructions in content the gateway "usage" (resource_fetch) will fetch. The user's agent reads the content as legitimate output, follows the injected instruction, and invokes "get-thread-link" (network_egress) with attacker-chosen arguments. Neither capability is individually dangerous; their coexistence on a single server, without an agent-enforced trust boundary, is.
- +0.1 sanitizer-function absent: No sanitizer function found; no content-sanitizer parameter declared on "usage". Returned content flows into agent context verbatim.
- +0.1 ingestion_capability_confidence: Gateway "usage" ingestion classification confidence 60% (0 signals). Attribution: MCP resource "usage" (roundtable://usage) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent.
- +0.08 sink_reachability: Server exposes 7 reachable sinks: get-thread-link (network_egress), consult-council (network_egress), design-architecture (network_egress), review-code (network_egress), plan-implementation (network_egress), debug-issue (network_egress), assess-tradeoffs (network_egress). More sinks mean a larger attack surface once the gateway is exploited.
- +0 single_signal_gateway: 0 signal(s); classification is structurally sound but modestly supported.
- -0.23 charter_confidence_cap: G1 charter caps confidence at 0.75; capability-pair inference cannot observe the actual prompt-injection content at scan time, only the structural precondition (gateway + reachable sink).
MITRE-ATLAS-AML.T0054.001: MITRE ATLAS AML.T0054.001 (Indirect Prompt Injection)
G1 is the static-time detector for the structural precondition of AML.T0054.001: the agent ingests attacker-reachable content through one tool and can invoke a side-effecting tool on the same server.
- 1 inspect-description: Open MCP resource "usage" and confirm it ingests attacker-reachable content (ingestion-kind: resource_fetch, trust-boundary: external_public). The capability classifier attributed the gateway as: "MCP resource "usage" (roundtable://usage) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent."
  Target: resource roundtable://usage#uri
  Expect: MCP resource "usage" returns content an external party can influence (web page, email body, issue comment, shared file, chat message, or MCP resource payload) at or above confidence 60%.
- 2 trace-flow: Walk the propagation: the response of "usage" enters the agent's reasoning context; any prompt-injection content within that response can direct the agent to invoke "get-thread-link". Confirm the server does not interpose an isolation boundary between the gateway's response and the sink's invocation (no sanitizer, no per-sink confirmation gate, no data-flow labels).
  Target: capability:tools
  Expect: Agent receives "usage" output verbatim, treats it as reasoning input, and can invoke "get-thread-link" in the same session without crossing a trust boundary.
- 3 inspect-schema: Open the tool "get-thread-link" and confirm its side effect matches the sink role "network_egress". For network_egress, check for URL / webhook / recipient params. For filesystem_write, check for path / content params. The classifier attributed this sink as: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
  Target: tool get-thread-link
  Expect: Tool "get-thread-link" produces the side effect the classifier tagged (network_egress) at or above confidence 70%.
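Step 2 names three isolation boundaries the server could interpose: a sanitizer, a per-sink confirmation gate, and data-flow labels. As a rough illustration of the last two, the sketch below marks a session as tainted once it has read untrusted content, then requires explicit confirmation before any sink runs. `Session`, `read_gateway`, `call_sink`, and `ConfirmationRequired` are hypothetical names. They are not part of MCP or of the scanned server.

```python
class ConfirmationRequired(Exception):
    """Raised when a sink is invoked after untrusted content entered the session."""


class Session:
    def __init__(self):
        self.tainted_by = []  # names of gateways whose output the agent has read

    def read_gateway(self, name, content):
        # Data-flow label: remember that untrusted content is now in context.
        self.tainted_by.append(name)
        return content

    def call_sink(self, name, confirmed=False):
        # Per-sink confirmation gate: block side effects on a tainted session
        # unless the user explicitly approved this call.
        if self.tainted_by and not confirmed:
            raise ConfirmationRequired(
                f"{name} blocked: session read untrusted content from {self.tainted_by}"
            )
        return f"{name} invoked"


session = Session()
session.read_gateway("usage", "text that may contain injected instructions")
try:
    session.call_sink("get-thread-link")
except ConfirmationRequired as exc:
    print(exc)
print(session.call_sink("get-thread-link", confirmed=True))
```

The gate here is deliberately coarse: any untrusted read taints the whole session. A finer design could label individual values and track which ones reach the sink's arguments, at the cost of more bookkeeping.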
Proof chain
5 steps from untrusted source to potential impact. Each step is independently verifiable against the cited location.
Source: External Content
- Where: resource roundtable://models#uri
- Observed: Gateway: resource "models" (roundtable://models) classified ingests-untrusted (ingestion-kind=resource_fetch, trust=external_public) at 60% confidence from 0 capability signal(s).
- Why untrusted: The capability-graph analyzer attributes the gateway as: "MCP resource "models" (roundtable://models) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent." Any content delivered through this resource can carry prompt-injection instructions the agent will read as if they were legitimate context.
Propagation: Cross Tool Flow
- At: capability:tools
- Observed: Propagation channel: the MCP tools surface itself. Response bytes from "models" enter the agent's reasoning context; the agent's next tool call can carry an adversary-controlled instruction into the sink. The server exposes 7 reachable sinks (canonical: "get-thread-link", network_egress).
Sink: Network Send
- Where: tool get-thread-link
- Observed: Canonical sink: tool "get-thread-link" classified sends-network at 70% confidence. Role: network_egress. Attribution: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
Mitigation: Sanitizer Function (Absent)
- Where: resource roundtable://models#uri
- Detail: No content-sanitizer parameter declared on "models". Returned content flows into agent context verbatim.
Impact: Data Exfiltration
- Scope: user-data
- Exploitability: Trivial
- Scenario: An attacker plants instructions in content the gateway "models" (resource_fetch) will fetch. The user's agent reads the content as legitimate output, follows the injected instruction, and invokes "get-thread-link" (network_egress) with attacker-chosen arguments. Neither capability is individually dangerous; their coexistence on a single server, without an agent-enforced trust boundary, is.
- +0.1 sanitizer-function absent: No sanitizer function found; no content-sanitizer parameter declared on "models". Returned content flows into agent context verbatim.
- +0.1 ingestion_capability_confidence: Gateway "models" ingestion classification confidence 60% (0 signals). Attribution: MCP resource "models" (roundtable://models) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent.
- +0.08 sink_reachability: Server exposes 7 reachable sinks: get-thread-link (network_egress), consult-council (network_egress), design-architecture (network_egress), review-code (network_egress), plan-implementation (network_egress), debug-issue (network_egress), assess-tradeoffs (network_egress). More sinks mean a larger attack surface once the gateway is exploited.
- +0 single_signal_gateway: 0 signal(s); classification is structurally sound but modestly supported.
- -0.23 charter_confidence_cap: G1 charter caps confidence at 0.75; capability-pair inference cannot observe the actual prompt-injection content at scan time, only the structural precondition (gateway + reachable sink).
MITRE-ATLAS-AML.T0054.001: MITRE ATLAS AML.T0054.001 (Indirect Prompt Injection)
G1 is the static-time detector for the structural precondition of AML.T0054.001: the agent ingests attacker-reachable content through one tool and can invoke a side-effecting tool on the same server.
- 1 inspect-description: Open MCP resource "models" and confirm it ingests attacker-reachable content (ingestion-kind: resource_fetch, trust-boundary: external_public). The capability classifier attributed the gateway as: "MCP resource "models" (roundtable://models) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent."
  Target: resource roundtable://models#uri
  Expect: MCP resource "models" returns content an external party can influence (web page, email body, issue comment, shared file, chat message, or MCP resource payload) at or above confidence 60%.
- 2 trace-flow: Walk the propagation: the response of "models" enters the agent's reasoning context; any prompt-injection content within that response can direct the agent to invoke "get-thread-link". Confirm the server does not interpose an isolation boundary between the gateway's response and the sink's invocation (no sanitizer, no per-sink confirmation gate, no data-flow labels).
  Target: capability:tools
  Expect: Agent receives "models" output verbatim, treats it as reasoning input, and can invoke "get-thread-link" in the same session without crossing a trust boundary.
- 3 inspect-schema: Open the tool "get-thread-link" and confirm its side effect matches the sink role "network_egress". For network_egress, check for URL / webhook / recipient params. For filesystem_write, check for path / content params. The classifier attributed this sink as: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
  Target: tool get-thread-link
  Expect: Tool "get-thread-link" produces the side effect the classifier tagged (network_egress) at or above confidence 70%.
Proof chain
5 steps from untrusted source to potential impact. Each step is independently verifiable against the cited location.
Source: External Content
- Where: resource ui://roundtable/debate-results.html#uri
- Observed: Gateway: resource "Roundtable Widget" (ui://roundtable/debate-results.html) classified ingests-untrusted (ingestion-kind=resource_fetch, trust=external_public) at 60% confidence from 0 capability signal(s).
- Why untrusted: The capability-graph analyzer attributes the gateway as: "MCP resource "Roundtable Widget" (ui://roundtable/debate-results.html) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent." Any content delivered through this resource can carry prompt-injection instructions the agent will read as if they were legitimate context.
Propagation: Cross Tool Flow
- At: capability:tools
- Observed: Propagation channel: the MCP tools surface itself. Response bytes from "Roundtable Widget" enter the agent's reasoning context; the agent's next tool call can carry an adversary-controlled instruction into the sink. The server exposes 7 reachable sinks (canonical: "get-thread-link", network_egress).
Sink: Network Send
- Where: tool get-thread-link
- Observed: Canonical sink: tool "get-thread-link" classified sends-network at 70% confidence. Role: network_egress. Attribution: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
Mitigation: Sanitizer Function (Absent)
- Where: resource ui://roundtable/debate-results.html#uri
- Detail: No content-sanitizer parameter declared on "Roundtable Widget". Returned content flows into agent context verbatim.
Impact: Data Exfiltration
- Scope: user-data
- Exploitability: Trivial
- Scenario: An attacker plants instructions in content the gateway "Roundtable Widget" (resource_fetch) will fetch. The user's agent reads the content as legitimate output, follows the injected instruction, and invokes "get-thread-link" (network_egress) with attacker-chosen arguments. Neither capability is individually dangerous; their coexistence on a single server, without an agent-enforced trust boundary, is.
- +0.1 sanitizer-function absent: No sanitizer function found; no content-sanitizer parameter declared on "Roundtable Widget". Returned content flows into agent context verbatim.
- +0.1 ingestion_capability_confidence: Gateway "Roundtable Widget" ingestion classification confidence 60% (0 signals). Attribution: MCP resource "Roundtable Widget" (ui://roundtable/debate-results.html) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent.
- +0.08 sink_reachability: Server exposes 7 reachable sinks: get-thread-link (network_egress), consult-council (network_egress), design-architecture (network_egress), review-code (network_egress), plan-implementation (network_egress), debug-issue (network_egress), assess-tradeoffs (network_egress). More sinks mean a larger attack surface once the gateway is exploited.
- +0 single_signal_gateway: 0 signal(s); classification is structurally sound but modestly supported.
- -0.23 charter_confidence_cap: G1 charter caps confidence at 0.75; capability-pair inference cannot observe the actual prompt-injection content at scan time, only the structural precondition (gateway + reachable sink).
MITRE-ATLAS-AML.T0054.001: MITRE ATLAS AML.T0054.001 (Indirect Prompt Injection)
G1 is the static-time detector for the structural precondition of AML.T0054.001: the agent ingests attacker-reachable content through one tool and can invoke a side-effecting tool on the same server.
- 1 inspect-description: Open MCP resource "Roundtable Widget" and confirm it ingests attacker-reachable content (ingestion-kind: resource_fetch, trust-boundary: external_public). The capability classifier attributed the gateway as: "MCP resource "Roundtable Widget" (ui://roundtable/debate-results.html) is a spec-declared ingestion surface; agent reads occur without per-fetch user consent."
  Target: resource ui://roundtable/debate-results.html#uri
  Expect: MCP resource "Roundtable Widget" returns content an external party can influence (web page, email body, issue comment, shared file, chat message, or MCP resource payload) at or above confidence 60%.
- 2 trace-flow: Walk the propagation: the response of "Roundtable Widget" enters the agent's reasoning context; any prompt-injection content within that response can direct the agent to invoke "get-thread-link". Confirm the server does not interpose an isolation boundary between the gateway's response and the sink's invocation (no sanitizer, no per-sink confirmation gate, no data-flow labels).
  Target: capability:tools
  Expect: Agent receives "Roundtable Widget" output verbatim, treats it as reasoning input, and can invoke "get-thread-link" in the same session without crossing a trust boundary.
- 3 inspect-schema: Open the tool "get-thread-link" and confirm its side effect matches the sink role "network_egress". For network_egress, check for URL / webhook / recipient params. For filesystem_write, check for path / content params. The classifier attributed this sink as: "Egress sink: HTTP client / webhook / email / chat send. Turns a poisoned read into exfiltration via the agent."
  Target: tool get-thread-link
  Expect: Tool "get-thread-link" produces the side effect the classifier tagged (network_egress) at or above confidence 70%.