ELIZA LABS · INFERENCE INFRASTRUCTURE

The market for cheap compute.

The cheapest tokens money can buy at the price you set. We aggregate every source of underpriced compute and see every trace that runs through it. Need privacy? That's a tier. We don't serve models. We run the market that prices them.

How it routes → Private tier

The horseshoe: two ends, one market

Cheapest and most-private look like opposite ends. They're the same marketplace, and the buyer picks where they sit. We don't pretend the cheap tokens are private. We don't pretend the private tier is cheap. Being honest about both is the whole edge.

CHEAP END · DEFAULT

Cheap and watched

Every source of underpriced compute. We don't ask why it's cheap. Traces run through us. That's the cost of the price, and it's our moat.

YOU PICK

price ↔ privacy

PRIVATE END · TIER

Sealed and paid

Our own attested enclaves, end to end. Genuinely confidential, compliance-grade, priced accordingly. For buyers who can't leak.

The cheap end is the volume and the data. The private tier is the margin and the regulated buyer. One router, one integration; the buyer dials their point on the curve. No privacy theater on tokens we don't control end to end.

Routing is a provider problem, not a user problem

"Pick the right model per request" is dead. Workloads are homogeneous, caching punishes mid-thread switching, a model is just universally better or worse. The only real question:

Dead framing

× Route per request to the "optimal" model.
× Switching models breaks the cache, negates the savings.
× Optimizes the wrong variable. Buyers spend dollars, not models.

Real job

→ "Best model I can afford at this ceiling." One knob: spend.
→ Walk the quality ladder down to the price.
→ Sourcing is our job. The buyer never sees it.

One ceiling. One ladder. Cheapest qualified supply.

Set a $/Mtok ceiling. The router takes the best model with a provider under that price.

model rung (best → acceptable)   rung ceiling   routed from
Opus 4.8                         ≤ $5.00        cheapest provider under ceiling
GPT-5.5                          ≤ $3.00        partner GPU capacity
GLM-5.2                          ≤ $1.00        self-served B200 / spot
·· fallback                      402 if dry     refuse to overspend

Refuses to overspend. Rung prices move with live demand and supply. Best fill under your limit, the way any market clears.
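
A minimal sketch of the ladder walk, assuming each rung carries a list of live provider offers. The `Offer` and `Rung` types, the `route` function, the provider labels, and the sample prices are all hypothetical and only illustrate the idea; the rung order and the 402 fallback come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    provider: str          # hypothetical provider label
    price_per_mtok: float  # live asking price in $/Mtok (sample value)


@dataclass(frozen=True)
class Rung:
    model: str
    offers: tuple[Offer, ...]


def route(ladder: list[Rung], ceiling: float) -> tuple[int, Optional[str], Optional[Offer]]:
    """Walk the ladder best-first and return (status, model, offer).

    The first rung with any offer at or under the ceiling wins, and the
    cheapest such offer fills the request. If no rung qualifies, the
    router returns 402 instead of overspending.
    """
    for rung in ladder:
        affordable = [o for o in rung.offers if o.price_per_mtok <= ceiling]
        if affordable:
            cheapest = min(affordable, key=lambda o: o.price_per_mtok)
            return 200, rung.model, cheapest
    return 402, None, None


# Sample ladder, ordered best -> acceptable. Prices are made up.
LADDER = [
    Rung("Opus 4.8", (Offer("provider-a", 5.00), Offer("provider-b", 4.60))),
    Rung("GPT-5.5", (Offer("partner-gpu", 2.80),)),
    Rung("GLM-5.2", (Offer("self-served-b200", 0.90), Offer("spot", 1.00))),
]

if __name__ == "__main__":
    for ceiling in (5.00, 3.00, 1.00, 0.50):
        print(ceiling, route(LADDER, ceiling))
```

The buyer supplies only the ceiling. Which provider fills the request is decided inside `route` and never surfaces as a choice the caller has to make.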

The private tier: our metal, sealed end to end

Privacy is a paid tier, not a marketplace-wide claim. It runs only on enclaves we control, attested B200 / TEE, end to end. We never route a private-tier request through third-party supply we can't seal. Honest scope is the point: cheap tokens are watched, private tokens are sealed, and we don't blur the two.

Caller / Agent
Any OpenAI-compatible client. Sets spend ceiling and privacy tier. That's it.
Router · Market
Routes on two axes: price and privacy. Private tier is pinned to owned enclaves only. Dynamic pricing on both.
CVM / TEE Enclave
Encrypted memory, remote attestation, sealed I/O. Our hardware. We can't read the payload and neither can anyone upstream.
Owned Supply
B200 / TEE fleets we run. The cheap end's third-party supply never serves a private-tier request.
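
A minimal sketch of the two-axis rule, assuming the router tags each supply source with ownership and attestation flags. The `Supply` type, the `eligible` function, the tier names `"default"` and `"private"`, and the sample pool are hypothetical; the only rule taken from the text is that the private tier is pinned to owned, attested enclaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supply:
    name: str
    owned: bool            # True only for fleets the operator runs itself
    attested: bool         # True if the node passed remote attestation
    price_per_mtok: float  # sample price in $/Mtok


def eligible(pool: list[Supply], tier: str, ceiling: float) -> list[Supply]:
    """Filter by privacy tier first, then by price ceiling, cheapest first."""
    if tier == "private":
        # Private tier: owned and attested supply only, never third-party.
        pool = [s for s in pool if s.owned and s.attested]
    elif tier != "default":
        raise ValueError(f"unknown tier: {tier!r}")
    under = [s for s in pool if s.price_per_mtok <= ceiling]
    return sorted(under, key=lambda s: s.price_per_mtok)


# Sample pool with made-up prices.
POOL = [
    Supply("owned-b200-tee", owned=True, attested=True, price_per_mtok=12.0),
    Supply("partner-h100", owned=False, attested=False, price_per_mtok=2.5),
    Supply("third-party-cheap", owned=False, attested=False, price_per_mtok=0.8),
]

if __name__ == "__main__":
    print([s.name for s in eligible(POOL, "private", 20.0)])
    print([s.name for s in eligible(POOL, "default", 3.0)])
```

Applying the privacy filter before the price filter means a private-tier request with a low ceiling comes back empty rather than falling through to cheaper, unsealed supply.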

Why the tier sells: finance, healthcare, and sovereign buyers can't touch OpenAI and can't accept watched tokens either. A genuinely sealed tier on our own metal is the only thing that clears their compliance bar, and it carries the margin the cheap end doesn't.

The traces are the product

Cheap tokens come watched; that's the trade. The default tier runs through us and we keep the telemetry. The flow that earns the spread is the same flow that gathers the data, and the data is worth more than the spread.

01 · SIGNAL

Demand, first

Traces across the cheap tier show what's heating up before the market does. Nobody else has the view.

02 · LEVERAGE

Better deals

The signal prices the next capacity deal. Buy cheap, sell dear, and the spread funds the next deal.

03 · LOCK-IN

Integrate once

Against a price, not a model. Supply shifts, cost falls, no code changes. Leaving raises their bill.

The floor is known. The arbitrage is the business.

Self-serving GLM-5.2 on 8×B200 sets a hard cost floor. Knowing the true cost of a token is what lets us price against everyone who's mispricing.

8×B200 · reference node for the cost floor
2–3M · GLM-5.2 tokens / hour served on that node
~$30 · node cost / hour, all-in
~$10 · resulting cost per 1M tokens (floor, at 3M tokens / hour)

Floor math, not a promise. The take is the spread between true cost and market clearing price, protected by privacy.
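
The floor arithmetic can be checked in a few lines. This is a sketch using only the reference figures quoted above (~$30 per node-hour, 2–3M tokens per hour); the function names and the clearing price in the usage line are hypothetical.

```python
def cost_floor_per_mtok(node_cost_per_hour: float, tokens_per_hour: float) -> float:
    """Cost in dollars to serve one million tokens on the reference node."""
    return node_cost_per_hour / (tokens_per_hour / 1_000_000)


def spread(clearing_price_per_mtok: float, floor_per_mtok: float) -> float:
    """Take per million tokens: market-clearing price minus true cost."""
    return clearing_price_per_mtok - floor_per_mtok


NODE_COST_PER_HOUR = 30.0  # ~$30 all-in, from the figures above

floor_at_3m = cost_floor_per_mtok(NODE_COST_PER_HOUR, 3_000_000)  # 10.0
floor_at_2m = cost_floor_per_mtok(NODE_COST_PER_HOUR, 2_000_000)  # 15.0

if __name__ == "__main__":
    print(f"floor at 3M tok/h: ${floor_at_3m:.2f}/Mtok")
    print(f"floor at 2M tok/h: ${floor_at_2m:.2f}/Mtok")
    # Hypothetical clearing price of $12/Mtok against the $10 floor.
    print(f"spread: ${spread(12.0, floor_at_3m):.2f}/Mtok")
```

The ~$10 figure corresponds to the top of the quoted throughput range; at 2M tokens per hour the same node costs about $15 per million tokens.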

Supply we can aggregate today

Any capacity that can post an offer joins the pool, and the router treats it all as offers under a ceiling. Only attested enclaves we run serve the private tier.

own · 8×B200 reference fleet, sets the floor
partner · Dedicated H100 fleets, large-scale
partner · Cerebras / Groq, capacity agreements
elastic · Burst & market capacity, dynamically priced
cheap end · Provenance-blind third-party supply, cheapest tier

We don't ask why the cheapest tier is cheap, and we don't claim it is sealed. Private-tier requests never touch it; they stay on the attested enclaves we own.

Why Eliza Labs runs this

Eliza already needs this layer to make agent economics work. Build it as a market, not a cost center. First customer is ourselves.

→ Anchor tenant: every hosted agent feeds the trace signal.
→ The private tier reuses the compliance surface the enterprise thesis already requires.
→ We become the market layer, not a model server.
→ Cheapest tokens + the traces + an owned private tier. No consumer wrapper can hold all three.

We don't serve models. We run the market that prices them.

Cheapest tokens by default. Privacy when you pay for it. The traces are ours either way.

See the mechanism → Private tier
