Plans

Start with per-token billing. Upgrade to a plan when you need higher throughput.

Pay-as-you-go	$0/mo
Hobby	$5/mo
Pro	$10/mo
Intermediate	$20/mo
Scale	$50/mo
Max	$100/mo

LLM Pricing

Per-token rates for every model. Input, cached input, and output are priced separately, in dollars per million tokens.

Quant	Context / Max out	Req	$ / M in	$ / M cache	$ / M out	Speed
Q4_0	164k / 164k	1x	$0.28	$0.06	$0.38	~72 t/s
Q4_0	1M / 131k	1x	$0.12	$0.02	$0.21	~62 t/s
Q4_0	1M / 131k	1x	$0.40	$0.00	$0.85	~61 t/s
Q8_0	1M / 131k	2x	$1.25	$0.25	$2.50	~64 t/s
Q4_0	262k / 262k	1x	$0.10	$0.02	$0.30	~44 t/s
Q8_0	203k / 203k	1x	$0.25	$0.05	$1.10	~44 t/s
fp8	203k / 131k	1x	$0.00	$0.00	$0.00	~135 t/s
Q4_0	203k / 203k	1x	$0.48	$0.10	$1.90	~66 t/s
Q6_K	203k / 203k	1x	$0.45	$0.09	$2.10	~55 t/s
Q8_0	203k / 203k	2x	$0.75	$0.15	$2.90	~61 t/s
greg	200k / 200k	1x	$0.30	$0.06	$0.30	~157 t/s
Q4_K_M	262k / 262k	1x	$0.35	$0.07	$1.70	~141 t/s
530b-int4	131k / 33k	1x	$1.00	$0.20	$3.00	~948 t/s
Q3_K_L	262k / 262k	1x	$0.50	$0.10	$1.99	~61 t/s
int4	262k / 262k	2x	$0.55	$0.11	$2.70	~57 t/s
awq	205k / 131k	1x	$0.11	$0.02	$0.95	~143 t/s
Q4_0	262k / 262k	1x	$0.35	$0.07	$1.75	~177 t/s
fp8	262k / 262k	1x	$0.00	$0.00	$0.00	~167 t/s
fp8	262k / 262k	1x	$0.04	$0.01	$0.15	~172 t/s
Q4_0	262k / 262k	1x	$0.20	$0.04	$1.50	~174 t/s
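
Because input, cached input, and output are priced separately, the cost of one request is the sum of three terms, each a token count times its per-million rate. The sketch below works that out using the rates from the first table row ($0.28 in, $0.06 cache, $0.38 out). The names `Rates` and `request_cost` are hypothetical. The sketch assumes cached input tokens are billed at the cache rate in place of the input rate, so the two counts do not overlap, and it does not model the Req column, whose billing effect is not stated here.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in dollars (hypothetical helper type)."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


def request_cost(rates: Rates, input_tokens: int, cached_tokens: int,
                 output_tokens: int) -> float:
    """Return the dollar cost of one request.

    Assumption: input_tokens counts only uncached input, so cached
    tokens are charged at the cache rate and never at both rates.
    """
    if min(input_tokens, cached_tokens, output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * rates.input_per_m
        + cached_tokens * rates.cached_per_m
        + output_tokens * rates.output_per_m
    )
    return total / TOKENS_PER_MILLION


# Rates copied from the first row of the table above.
first_row = Rates(input_per_m=0.28, cached_per_m=0.06, output_per_m=0.38)

# Example request: 10k uncached input, 40k cached input, 2k output tokens.
cost = request_cost(first_row, input_tokens=10_000,
                    cached_tokens=40_000, output_tokens=2_000)
print(f"${cost:.6f}")  # prints "$0.005960"
```

In this example the 40k cached tokens cost $0.0024, less than the $0.0028 charged for the 10k uncached input tokens, which shows how much the cache rate matters for long, repeated prompts.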