When we built QNTX’s x402 LLM gateway — a service where AI agents pay for model inference with crypto instead of API keys — we ran into a security problem that the x402 protocol wasn’t originally designed to handle.

The problem: the settlement race condition. Getting it wrong means either the operator loses money on every attack, or the entire service becomes a free LLM endpoint for anyone with a script.

This post documents our analysis of the vulnerability, three possible settlement strategies, why we rejected two of them, and the design we shipped. If you’re building anything on x402 that involves expensive upstream calls, this applies to you.

The Core Problem: Verify ≠ Lock

The x402 payment protocol uses EIP-3009 (transferWithAuthorization) signatures. The flow has two critical phases:

  1. Verify — Check the signature, balance, nonce, time window, and chain ID off-chain via RPC.
  2. Settle — Submit a transferWithAuthorization transaction on-chain to actually move funds.

Here’s what most people miss: EIP-3009 signatures do not lock or escrow funds. The verify step is a snapshot of state, not a reservation. The balance you checked could change before settlement confirms.

The race window is the gap between balance verification and on-chain settlement. During this window, the payer’s funds are checked but not locked. If the payer moves those funds out via a separate transaction before settlement confirms, the settlement reverts — and the operator eats whatever costs were incurred in between.

For a static content server returning a $0.10 article in 50ms, this window is negligible. For an LLM gateway where inference takes 5-30 seconds and costs real money per call, it’s a serious attack surface.
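The snapshot-vs-reservation distinction is easy to demonstrate. Below is a minimal illustrative sketch, not the gateway’s actual code: a Mutex-guarded integer stands in for the on-chain USDC balance, a spawned thread plays the draining payer, and a short sleep stands in for inference time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Verify is a snapshot of state, not a reservation: nothing is locked.
fn verify(balance: &Arc<Mutex<u64>>, amount: u64) -> bool {
    *balance.lock().unwrap() >= amount
}

// Settle actually moves funds, and reverts if the snapshot went stale.
fn settle(balance: &Arc<Mutex<u64>>, amount: u64) -> Result<(), &'static str> {
    let mut b = balance.lock().unwrap();
    if *b < amount {
        return Err("transferWithAuthorization reverted: insufficient balance");
    }
    *b -= amount;
    Ok(())
}

fn main() {
    let balance = Arc::new(Mutex::new(100)); // payer holds 100 units at verify time
    let amount = 10;

    assert!(verify(&balance, amount)); // verify passes: the snapshot looks fine

    // Race window: "inference" runs while the payer drains the wallet.
    let drain = {
        let balance = Arc::clone(&balance);
        thread::spawn(move || *balance.lock().unwrap() = 0)
    };
    thread::sleep(Duration::from_millis(50)); // stand-in for seconds of inference
    drain.join().unwrap();

    // Settlement now reverts; the operator has already paid for inference.
    assert!(settle(&balance, amount).is_err());
    println!("settlement reverted after drain");
}
```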

Three Settlement Strategies

We evaluated three approaches. Only one survived.

Strategy 1: Standard Mode (Verify → LLM → Settle)

This is the official x402 flow. It works like this:

sequenceDiagram
    participant C as Client
    participant G as Gateway
    participant F as Facilitator
    participant L as LLM

    C->>G: POST /v1/… + Payment-Signature
    G->>F: verify(payment)
    F-->>G: OK
    G->>L: forward_request
    Note over G,L: RACE WINDOW<br/>balance verified but NOT locked<br/>LLM executing (1–30 seconds)
    L-->>G: LLM response
    G->>F: settle(payment)
    F-->>G: SettleResponse
    G-->>C: 200 OK + Payment-Response

The race window is the entire LLM inference time: 1-30 seconds.

During those seconds, the attacker’s balance has been verified but not locked. They can submit an independent on-chain transaction to drain their wallet. When settlement is attempted after the LLM call completes, it reverts.

The saving grace: in Standard mode, the client receives no data until settlement completes. The response is held. So the attacker doesn’t get the LLM output — they just waste the operator’s API cost ($0.01-0.05 per attack).

This makes it a DoS vector, not a theft vector. Each attack requires real funds at verify time and precise timing. Economically irrational for micro-payments — but still an unnecessary risk for high-volume LLM services.

Verdict: Removed. Acceptable for fast APIs, suboptimal for LLM workloads.

Strategy 2: LLM-First Mode (LLM → Verify → Settle)

During design, we considered flipping the order: run the LLM call first, then verify and settle back-to-back. This eliminates the race window entirely (~0ms gap between verify and settle).

Sounds elegant. It has a fatal flaw.

Without verification before the LLM call, there is no gate. An attacker can send requests with empty wallets, invalid signatures, or no payment header at all. The LLM executes unconditionally. Verification happens after the expensive work is done. The attacker pays zero.

This transforms the attack from “economically irrational DoS” to “zero-cost unlimited DoS.” Strictly worse.

The key insight: verification’s value is not only payment validation. It also serves as a resource-access gate that prevents unauthenticated consumption of expensive upstream services.
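To make the gate concrete, here is a hypothetical handler (every name below is invented for illustration; the stubs stand in for the real facilitator and LLM calls) that rejects before any upstream spend:

```rust
// Illustrative sketch of the verification gate: unauthenticated
// requests never reach the expensive upstream call.
#[derive(Debug, PartialEq)]
enum Response {
    PaymentRequired,    // HTTP 402: no valid payment, no LLM call
    Ok(String),
}

// Stand-in for signature/balance/nonce checks against the facilitator.
fn verify_payment(header: Option<&str>) -> bool {
    matches!(header, Some(h) if !h.is_empty())
}

// Stand-in for the costly upstream inference call.
fn expensive_llm_call(prompt: &str) -> String {
    format!("completion for: {prompt}")
}

fn handle(payment_header: Option<&str>, prompt: &str) -> Response {
    if !verify_payment(payment_header) {
        // Gate: reject before spending the operator's money.
        return Response::PaymentRequired;
    }
    Response::Ok(expensive_llm_call(prompt))
}

fn main() {
    // No payment header: the LLM is never invoked, the attack costs us nothing.
    assert_eq!(handle(None, "hi"), Response::PaymentRequired);
    assert_eq!(handle(Some("sig"), "hi"), Response::Ok("completion for: hi".into()));
}
```

Running the gate after the LLM call, as LLM-First does, turns that early return into dead weight: the money is already spent by the time the check runs.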

Verdict: Rejected. The elimination of the race window does not compensate for the loss of the verification gate.

Strategy 3: Concurrent Mode (Verify → Settle ∥ LLM)

The approach we shipped. After verification passes, settlement and the LLM call are dispatched simultaneously via tokio::spawn:

sequenceDiagram
    participant C as Client
    participant G as Gateway
    participant F as Facilitator
    participant L as LLM

    C->>G: POST /v1/… + Payment-Signature + X-402-Settle-Mode
    G->>F: verify(payment)
    F-->>G: OK
    par concurrent
        G->>F: settle(payment)
        F-->>G: SettleResponse
    and
        G->>L: forward_request
        L-->>G: LLM response
    end
    G-->>C: 200 OK

The race window shrinks to the time between verify_only() completing and the settle HTTP request reaching the facilitator — typically under 100ms:

  • tokio::spawn overhead: < 1ms
  • HTTP serialization: < 5ms
  • Network round-trip to facilitator: 10-50ms

For the drain attack to succeed, the attacker must get a competing transaction confirmed on-chain within this ~100ms window. On Base L2 — with 2-second block times, a centralized sequencer, no public mempool, and FIFO ordering — this is practically infeasible.
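The dispatch pattern can be sketched in plain Rust. This is illustrative only: std::thread::spawn stands in for tokio::spawn, sleeps stand in for the facilitator round-trip and inference, and none of the names come from the actual gateway.

```rust
use std::thread;

// Stand-in for the facilitator settle call (fires immediately after verify).
fn settle() -> &'static str {
    thread::sleep(std::time::Duration::from_millis(50)); // facilitator round-trip
    "settled"
}

// Stand-in for the upstream LLM call (dominates wall-clock time).
fn call_llm() -> &'static str {
    thread::sleep(std::time::Duration::from_millis(200));
    "llm response"
}

fn handle_request() -> (&'static str, &'static str) {
    // verify() has already passed at this point. Settlement and inference
    // are dispatched at the same time, so the race window is only the
    // dispatch overhead, not the inference time.
    let settle_handle = thread::spawn(settle);
    let llm_handle = thread::spawn(call_llm);
    (settle_handle.join().unwrap(), llm_handle.join().unwrap())
}

fn main() {
    let (settle_result, llm_result) = handle_request();
    // Wall-clock time is roughly max(settle, llm), not their sum,
    // because both run in parallel.
    println!("{settle_result}, {llm_result}");
}
```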

Verdict: Adopted as the only mode.

The Comparison

| Dimension          | Standard              | Concurrent              | LLM-First                  |
|--------------------|-----------------------|-------------------------|----------------------------|
| Flow               | verify → LLM → settle | verify → (settle ∥ LLM) | LLM → verify → settle      |
| Race window        | 1-30s                 | < 100ms                 | ~0ms                       |
| Verification gate  | Before LLM            | Before LLM              | After LLM                  |
| Attack cost        | Needs real funds      | Needs real funds        | Zero                       |
| Operator DoS risk  | Moderate              | Minimal                 | Critical                   |
| Streaming latency  | Delayed by settle     | Immediate               | Delayed by verify+settle   |
| Overall            | Good                  | Best                    | Worst                      |

The tradeoff in Concurrent mode: the payer may be charged even if the LLM call fails (since settlement fires in parallel). For an LLM gateway where the operator absorbs upstream risk, this is acceptable. For a protocol designed around payer protection, it wouldn’t be.

Why the Official Protocol Doesn’t Do This

The standard x402 flow was designed for a different use case. The whitepaper describes scenarios where the inner handler returns near-instantly:

  • Pay $0.10 to read an article (static content)
  • Pay $0.005 per image classification
  • Pay $0.02 per data API call

For these, the race window is < 100ms regardless of settlement strategy. Standard mode is both safe and simple.

The protocol also prioritizes payer protection — the design principle that “all payment schemes must not allow for the facilitator or resource server to move funds, other than in accordance with client intentions.” Standard mode ensures the payer only pays when a valid response is ready.

LLM inference — with its 5-30 second response times — was not the primary design target. The race condition attack surface for long-running requests was first formally identified by AgentLISA’s security analysis in December 2025, which rated it as medium risk and recommended “settle-first” approaches. Our Concurrent mode implements exactly this recommendation.

Chain-Level Defenses That Help

Beyond the settlement strategy, several properties of the x402 protocol and Base L2 work in our favor:

EIP-3009 signature irrevocability. Once signed, a transferWithAuthorization cannot be “unsigned.” The only way to prevent settlement is to change the on-chain state (drain the balance or consume the nonce) before the facilitator’s transaction lands.

Three layers of balance verification. Balance is checked off-chain at verify time, checked again by the USDC contract during on-chain execution, and verified a third time at block inclusion. An attacker must defeat all three.

Base L2 properties. No public mempool means the attacker can’t observe pending settle transactions. A centralized sequencer means FIFO ordering and no MEV. A 2-second block time means fast finality. Front-running is practically impossible.

Nonce uniqueness. Each authorization uses a unique 32-byte nonce. Once consumed on-chain, it’s dead. No replay attacks regardless of timing.
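A sketch of what nonce consumption buys. This is a simplified illustration: an in-memory HashSet stands in for the on-chain nonce state that the USDC contract actually tracks per authorizer.

```rust
use std::collections::HashSet;

// Each EIP-3009 authorization carries a unique 32-byte nonce;
// once consumed, it can never be settled again.
struct NonceRegistry {
    consumed: HashSet<[u8; 32]>,
}

impl NonceRegistry {
    fn new() -> Self {
        Self { consumed: HashSet::new() }
    }

    // Returns true the first time a nonce is seen; false on any replay.
    fn consume(&mut self, nonce: [u8; 32]) -> bool {
        self.consumed.insert(nonce)
    }
}

fn main() {
    let mut registry = NonceRegistry::new();
    let nonce = [7u8; 32];
    assert!(registry.consume(nonce));  // first settlement succeeds
    assert!(!registry.consume(nonce)); // replaying the same signature fails
    println!("replay rejected");
}
```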

The Design Decision

We removed Standard mode and rejected LLM-First. The gateway runs Concurrent settlement exclusively:

  • Race window: < 100ms (down from 1-30 seconds)
  • Verification gate: preserved (unlike LLM-First)
  • Aligns with third-party security audit recommendations
  • Balance-drain attack: practically infeasible on Base L2
  • Codebase: simplified — no mode branching, no extra headers

If you’re building an x402-powered service with expensive upstream calls — LLM inference, compute-heavy APIs, anything where the race window matters — Concurrent settlement is the way to go. The standard flow was designed for a simpler world. LLM workloads need a more deliberate approach.

The Chat Demo: github.com/qntx/chat. The payment SDK powering it: github.com/qntx/r402. The facilitator: github.com/qntx/facilitator.


I’m Jinhao Xu, founder of QNTX. We build the infrastructure for the autonomous agent economy — payments, identity, discovery, communication, and marketplace. If you’re building agents that need to pay for their own compute, the tools are here.