Plans

Start with per-token billing. Upgrade to a plan when you need higher throughput.

Pay-as-you-go

$0/mo

  • Pay only for usage
  • No recurring monthly charge

Hobby

$5/mo

  • 500 daily requests
  • Access to all models

Pro

$10/mo

  • All Hobby benefits
  • 1,000 daily requests
  • Priority support

Intermediate

$20/mo

  • All Pro benefits
  • 2,500 daily requests

Scale

$50/mo

  • All Intermediate benefits
  • 7,500 daily requests

Max

$100/mo

  • All Scale benefits
  • 15,000 daily requests
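
As a rough illustration of how the tiers compare, the sketch below picks the cheapest plan whose daily request allowance covers a given volume. It assumes the listed daily request counts are hard caps, and it leaves out Pay-as-you-go because no daily request figure is given for it. The names `PLANS` and `cheapest_plan` are hypothetical helpers, not part of any API.

```python
# Plan data copied from the list above: (name, monthly price in USD, daily requests).
PLANS = [
    ("Hobby", 5, 500),
    ("Pro", 10, 1_000),
    ("Intermediate", 20, 2_500),
    ("Scale", 50, 7_500),
    ("Max", 100, 15_000),
]


def cheapest_plan(daily_requests):
    """Return (name, monthly_price) of the cheapest plan covering the volume.

    Returns None when no listed plan has a large enough daily allowance.
    Assumption: the daily request counts act as hard caps.
    """
    candidates = [
        (price, name)
        for name, price, limit in PLANS
        if limit >= daily_requests
    ]
    if not candidates:
        return None
    price, name = min(candidates)
    return name, price


print(cheapest_plan(1_200))   # ('Intermediate', 20)
print(cheapest_plan(20_000))  # None
```

Token usage is billed separately at per-token rates, so the monthly price here is only the fixed part of the cost.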

LLM Pricing

Per-token rates for every model, in dollars per million tokens. Input, cached input, and output are priced separately.

Model  Quant      Context / Max out  Req  $ / M in  $ / M cache  $ / M out  Speed
       Q4_0       164k / 164k        1x   $0.28     $0.06        $0.38      ~72 t/s
       Q4_0       1M / 131k          1x   $0.12     $0.02        $0.21      ~62 t/s
       Q4_0       1M / 131k          1x   $0.40     $0.00        $0.85      ~61 t/s
       Q8_0       1M / 131k          2x   $1.25     $0.25        $2.50      ~64 t/s
       Q4_0       262k / 262k        1x   $0.10     $0.02        $0.30      ~44 t/s
       Q8_0       203k / 203k        1x   $0.25     $0.05        $1.10      ~44 t/s
       fp8        203k / 131k        1x   $0.00     $0.00        $0.00      ~135 t/s
       Q4_0       203k / 203k        1x   $0.48     $0.10        $1.90      ~66 t/s
       Q6_K       203k / 203k        1x   $0.45     $0.09        $2.10      ~55 t/s
       Q8_0       203k / 203k        2x   $0.75     $0.15        $2.90      ~61 t/s
       greg       200k / 200k        1x   $0.30     $0.06        $0.30      ~157 t/s
       Q4_K_M     262k / 262k        1x   $0.35     $0.07        $1.70      ~141 t/s
       530b-int4  131k / 33k         1x   $1.00     $0.20        $3.00      ~948 t/s
       Q3_K_L     262k / 262k        1x   $0.50     $0.10        $1.99      ~61 t/s
       int4       262k / 262k        2x   $0.55     $0.11        $2.70      ~57 t/s
       awq        205k / 131k        1x   $0.11     $0.02        $0.95      ~143 t/s
       Q4_0       262k / 262k        1x   $0.35     $0.07        $1.75      ~177 t/s
       fp8        262k / 262k        1x   $0.00     $0.00        $0.00      ~167 t/s
       fp8        262k / 262k        1x   $0.04     $0.01        $0.15      ~172 t/s
       Q4_0       262k / 262k        1x   $0.20     $0.04        $1.50      ~174 t/s
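
To make the per-million rates concrete, here is a minimal sketch of how a single request's cost might be estimated. It assumes cached input tokens are billed at the cache rate instead of the input rate, and that all other input tokens are billed at the full input rate; the page does not spell this out. The function `request_cost` is a hypothetical helper, and the rates used are those of the first row in the table ($0.28 in, $0.06 cache, $0.38 out).

```python
def request_cost(rates, uncached_in, cached_in, out):
    """Estimate the USD cost of one request.

    rates: (input, cached input, output) prices in USD per million tokens.
    Assumption: cached input tokens are charged at the cache rate only.
    """
    in_rate, cache_rate, out_rate = rates
    total = (
        uncached_in * in_rate
        + cached_in * cache_rate
        + out * out_rate
    )
    return total / 1_000_000


# Rates from the first table row: $0.28 in, $0.06 cache, $0.38 out.
first_row = (0.28, 0.06, 0.38)

# 10,000 uncached input tokens, 5,000 cached input tokens, 2,000 output tokens.
cost = request_cost(first_row, 10_000, 5_000, 2_000)
print(f"${cost:.5f}")  # $0.00386
```

The Req column (1x or 2x) is not used here, since the page does not define it; it may describe how a request counts against a plan's daily allowance.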