The same NVIDIA H100 GPU rents for $1.99/hour at one provider and $5.00/hour at another in 2026.
Same silicon. Same FP8 throughput. Same 80GB HBM3. Same NVLink. Different cost structures, different sales motions, different margin expectations — and a 2.5× pricing spread that determines whether your AI infrastructure bill is reasonable or two-and-a-half times too high.
This is the canonical guide to cloud GPU pricing across all major providers in 2026. It covers the 4-tier provider landscape, current hourly rates for H100, A100, H200, and emerging Blackwell SKUs, the reserved-vs-on-demand math that institutional buyers should be running, and the seven hidden costs that turn published hourly rates into actual bills 30–50% higher.
For the ownership-side equivalent (cluster TCO when you buy hardware instead of renting), see The 3-Year TCO of Owning 100 H100 GPUs. For specific GPU SKU comparisons, see A100 vs H100 vs H200. For the buyer-side decision framework, see Buy vs Rent GPUs.
TL;DR
Cloud GPU pricing in May 2026 across providers, on-demand $/hour:
| GPU | Hyperscaler | Specialty | Long-tail | Spread |
|---|---|---|---|---|
| A100 80GB | $1.50–$3.00 | $1.10–$1.50 | $0.80–$1.10 | 3.75× |
| H100 SXM5 | $3.50–$5.00 | $2.50–$3.50 | $1.99–$2.50 | 2.5× |
| H200 SXM5 | $4.50–$7.00 | $3.50–$4.50 | $2.50–$3.50 | 2.8× |
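The Spread column can be reproduced from the other three columns. The sketch below assumes the spread is defined as the highest quoted rate across tiers divided by the lowest; the dictionary layout and the `spread` helper are illustrative names, not part of any Mercatus tooling.

```python
# On-demand $/hr ranges copied from the table above: (low, high) per provider tier.
ON_DEMAND = {
    "A100 80GB": {"hyperscaler": (1.50, 3.00), "specialty": (1.10, 1.50), "long_tail": (0.80, 1.10)},
    "H100 SXM5": {"hyperscaler": (3.50, 5.00), "specialty": (2.50, 3.50), "long_tail": (1.99, 2.50)},
    "H200 SXM5": {"hyperscaler": (4.50, 7.00), "specialty": (3.50, 4.50), "long_tail": (2.50, 3.50)},
}


def spread(tiers):
    """Highest quoted rate across tiers divided by the lowest quoted rate."""
    lowest = min(low for low, _ in tiers.values())
    highest = max(high for _, high in tiers.values())
    return highest / lowest


for gpu, tiers in ON_DEMAND.items():
    print(f"{gpu}: {spread(tiers):.2f}x")
```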
Three rules:
1. Never accept hyperscaler on-demand pricing for these GPUs unless you have a specific reason. Long-tail providers are 40–60% cheaper for identical hardware.
2. For predictable workloads, use reserved capacity, not on-demand. Reservation cuts 30–50% off the hourly rate.
3. The headline GPU rate is roughly 60–70% of your real cloud bill. Egress, storage, management, and ecosystem services add the rest. Plan for 30–50% above the GPU sticker.
Mercatus GPU Index tracks live cloud GPU pricing across 22+ providers in real time — the comparison tool buyers use to find competitive rates without contacting providers individually.
The 4-tier provider landscape
Cloud GPU providers fall into four broad categories with very different pricing logic. Understanding which tier a provider sits in tells you most of what you need to know about their pricing.
Tier 1: Hyperscalers (AWS, Azure, GCP, Oracle)
The traditional cloud giants. Their GPU offerings are bundled into broader cloud ecosystems with deep integration into existing services (storage, networking, identity, ML platforms).
Hyperscaler positioning: premium pricing for ecosystem integration, enterprise sales relationships, and contractual reliability.
Pricing approach:
- On-demand: 50–100% above market median
- Reserved (1–3 year): 25–35% below their on-demand
- Spot/preemptible: 50–80% below on-demand, with eviction risk
- Volume contracts: negotiable at very high spend levels
Who actually wins with hyperscalers:
- Enterprise customers with existing cloud commitments (use up your AWS spend)
- Workloads heavily dependent on bundled services (S3 + GPU compute on same network)
- Compliance-bound deployments where the hyperscaler’s certifications matter
- Customers willing to negotiate enterprise contracts at $1M+/year
For most other use cases, hyperscaler GPU on-demand pricing is structurally non-competitive.
Tier 2: Specialty GPU clouds (CoreWeave, Lambda, RunPod, Crusoe)
Purpose-built GPU cloud providers. Their entire business is renting GPUs, often with better-than-hyperscaler reliability for AI workloads specifically.
Specialty positioning: middle-tier pricing with hyperscaler-grade reliability and AI-specific optimizations.
Pricing approach:
- On-demand: ~30% below hyperscaler on-demand
- Reserved: 30–40% below their on-demand
- Spot: 30–50% below on-demand, lower eviction rates than hyperscalers
Where specialty providers excel:
- Mid-market AI buyers (companies spending $100K–$10M/year on GPU compute)
- Workloads needing reliability without hyperscaler markup
- Multi-region AI deployments without hyperscaler cloud commitments
This is where most of the institutional AI compute market sits in 2026.
Tier 3: Long-tail and regional providers (DataCrunch, Vultr, Hetzner, Vast.ai, plus dozens of regional operators)
A growing ecosystem of mid-size GPU operators, often regional, often founder-led, often running on lower-cost regional power.
Long-tail positioning: aggressive pricing, lean operations, regional advantages.
Pricing approach:
- On-demand: 40–60% below hyperscaler on-demand
- Reserved: 30–50% below their on-demand
- Spot: aggressive — sometimes 70%+ below hyperscaler on-demand
Tradeoffs:
- Less brand recognition (procurement teams may push back)
- Variable reliability (best providers match specialty tier, worst are flaky)
- Limited or no enterprise sales relationships
- Geographic constraints (regional providers serve specific regions well)
This is where the best GPU pricing lives if you’re willing to do provider research and vetting. For comparative analysis, see Cheapest GPU Cloud Providers.
Tier 4: Decentralized inference networks (Akash, Bittensor, io.net, Render)
DePIN-style GPU networks aggregating supply from many individual operators.
Decentralized positioning: sometimes-aggressive pricing through aggregated supply; quality and reliability vary by network.
Pricing approach:
- On-demand only (most networks): pricing varies by network and operator
- Often 30–60% below hyperscaler on-demand, but with reliability tradeoffs
- Reserved capacity not standard yet
Tradeoffs:
- Reliability varies widely: the best networks now match Tier 3, the worst are unsuitable for production
- Integration complexity higher than centralized providers
- Compliance and data-residency stories are weaker
For the right workloads (batch processing, non-time-sensitive), Tier 4 can be cost-effective. For production serving with SLAs, most teams stay in Tiers 2–3.
H100 cloud pricing — current rates by provider tier
The H100 is the 2026 default for AI training and inference. Cloud rates as of May 2026 across the four tiers:
| Provider tier | On-demand $/hr | Reserved 1yr $/hr | Reserved 3yr $/hr | Spot $/hr |
|---|---|---|---|---|
| Hyperscaler (AWS p5, Azure ND H100 v5, GCP A3) | $3.50 – $5.00 | $2.80 – $3.80 | $2.20 – $3.00 | $1.50 – $3.00 |
| Specialty (CoreWeave, Lambda) | $2.50 – $3.50 | $2.00 – $2.80 | $1.70 – $2.30 | $1.20 – $2.00 |
| Long-tail | $1.99 – $2.50 | $1.60 – $2.10 | $1.30 – $1.80 | $0.80 – $1.50 |
| Decentralized | $1.50 – $2.50 | n/a | n/a | $0.60 – $1.50 |
The most useful number in this table: at the long-tail tier, reserved 3-year H100 capacity runs $1.30–$1.80/hour — within 15–25% of owned-cluster effective cost (~$1.40–$1.80/hour at 70% utilization for a single GPU; ~$3.00 at cluster scale before optimization).
This is the reason cloud rentals dominate the institutional buyer decision in 2026: the cost gap between owning and renting at the right provider is small enough that the operational simplicity of cloud almost always wins.
For continuously updated H100 pricing across all 22+ tracked providers, see Mercatus GPU Index.
For the full single-H100 economics story: H100 GPU Cost.
A100 and H200 pricing — what changes vs H100
A100 cloud pricing
The A100 is one generation behind H100 (Ampere vs Hopper, 2020 vs 2022) and prices accordingly. As of May 2026:
| Provider tier | A100 80GB on-demand $/hr | Notes |
|---|---|---|
| Hyperscaler | $1.50 – $3.00 | Some hyperscalers have phased out A100 entirely |
| Specialty | $1.10 – $1.50 | Mid-tier sweet spot |
| Long-tail | $0.80 – $1.10 | Aggressive pricing on aging hardware |
The A100 still wins for cost-sensitive workloads (fine-tuning, smaller-model inference, research) where H100’s FP8 advantage doesn’t activate. See A100 vs H100 for the workload-by-workload framework.
H200 cloud pricing
H200 launched in 2024 and reached broad availability through 2025. By 2026, its pricing tracks H100 at a premium of roughly 25–40%, depending on tier:
| Provider tier | H200 SXM5 on-demand $/hr |
|---|---|
| Hyperscaler | $4.50 – $7.00 |
| Specialty | $3.50 – $4.50 |
| Long-tail | $2.50 – $3.50 |
The H200 premium reflects 76% more memory and 43% more memory bandwidth at identical compute. For long-context inference and large model serving, the premium pays back. For training and short-context workloads, H100 remains the better economic choice. See H200 Price and H100 vs H200 for the decision framework.
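One way to see why the premium can pay back for memory-bound serving is to normalize the hourly rate by HBM capacity. The sketch below uses the midpoints of the on-demand ranges in this guide; the 141 GB H200 capacity comes from NVIDIA's published spec rather than from this article, and "$ per GB-hour" is only a rough proxy chosen for illustration, not a metric this guide defines.

```python
# HBM capacity per GPU in GB. 80 GB is stated in this guide; 141 GB is the
# published H200 capacity (about 76% more than 80 GB).
MEMORY_GB = {"H100": 80, "H200": 141}

# On-demand $/hr ranges from the tables in this guide: (low, high).
ON_DEMAND = {
    "hyperscaler": {"H100": (3.50, 5.00), "H200": (4.50, 7.00)},
    "specialty": {"H100": (2.50, 3.50), "H200": (3.50, 4.50)},
    "long_tail": {"H100": (1.99, 2.50), "H200": (2.50, 3.50)},
}


def midpoint(price_range):
    """Midpoint of a (low, high) price range."""
    low, high = price_range
    return (low + high) / 2


def price_per_gb_hour(gpu, hourly_rate):
    """Hourly rate divided by HBM capacity: a rough proxy for memory-bound work."""
    return hourly_rate / MEMORY_GB[gpu]


for tier, rates in ON_DEMAND.items():
    h100 = price_per_gb_hour("H100", midpoint(rates["H100"]))
    h200 = price_per_gb_hour("H200", midpoint(rates["H200"]))
    print(f"{tier}: H100 ${h100:.4f}/GB-hr vs H200 ${h200:.4f}/GB-hr")
```

On these midpoints the H200 is cheaper per GB of memory in every tier, while the H100 stays cheaper per GPU-hour, which is the tradeoff the paragraph above describes.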
Blackwell (B100, B200) pricing
NVIDIA’s Blackwell generation is shipping in early-cohort volumes through 2026. Public pricing is still emerging, but the directional view:
- B100 on-demand: $7.00–$12.00/hr at hyperscalers; $5.00–$7.00/hr at specialty providers (where available); long-tail availability remains limited
- B200: even higher, with limited public availability
For most teams in 2026, buying H100 or H200 today and upgrading on a 2-year cycle is the right call — Blackwell supply is constrained, prices are high, and the marginal advantage over H200 is narrower than the H200 advantage over H100. Track Blackwell availability and pricing through GPU Index.
Why the same GPU prices 2.5× differently across providers
The 2.5× cross-provider spread for identical hardware is the central fact of cloud GPU pricing in 2026. It’s not a market inefficiency that will resolve quickly — it’s a structural feature of the closed market.
A breakdown of where the cost differences come from:
| Cost component (per GPU-hour) | Long-tail provider | Hyperscaler |
|---|---|---|
| Hardware amortization (3yr, 75% util) | $0.55 | $0.55 |
| Power | $0.10 – $0.25 | $0.10 – $0.20 |
| Colocation | $0.08 – $0.15 | $0.05 – $0.10 |
| Networking + storage | $0.05 – $0.10 | $0.10 – $0.20 |
| Ops + customer support | $0.05 – $0.15 | $0.30 – $0.60 |
| Sales + marketing overhead | $0.05 – $0.10 | $0.50 – $1.00 |
| Margin | $0.20 – $0.50 | $1.00 – $2.00 |
| Published on-demand $/hr (component ranges are approximate and do not sum exactly to it) | $1.99 – $2.50 | $3.50 – $5.00 |
Three observations:
1. Hardware costs are ~identical across tiers. Both hyperscalers and long-tail providers pay similar capex for H100 SXM5. The hardware amortization line is roughly the same regardless of provider.
2. Hyperscaler power and colocation are actually cheaper per kW than most long-tail providers because of scale. What drives hyperscaler pricing up is not infrastructure cost.
3. The hyperscaler premium is sales overhead and margin. Enterprise sales motion costs real money. Public-company shareholder margin expectations require it. Long-tail providers run leaner sales operations and accept lower margins in exchange for volume and growth.
For the deeper analysis, see Why GPU Prices Differ by 30%+ for the Same Hardware.
The implication for buyers: always shop providers. The savings dwarf any conceivable switching cost.
On-demand vs reserved vs spot — the pricing model decision
Beyond provider tier, the pricing model you select moves cost by 30–80%.
On-demand
Pay-as-you-go, no commitment. Highest flexibility, highest price.
Use on-demand when: workload is unpredictable, scale is uncertain, or you’re testing a new provider before committing.
Reserved (1–3 year)
Commit to use a specified GPU capacity for a set period. Cuts cost 25–50% vs on-demand at the same provider.
| Reservation length | Typical discount vs on-demand |
|---|---|
| 1-year reserved | 25–35% |
| 3-year reserved | 35–50% |
| Custom (5+ year) | 50%+ in some cases |
Use reserved when: you have 12+ months of predictable workload that will keep utilization of the reserved capacity above the break-even for your discount.
The reservation math is straightforward: reserved capacity is billed for every hour, used or not, so it beats on-demand once utilization exceeds 1 minus the discount. At a 50% discount the break-even is 50% utilization; at a 25% discount it is 75%. Below the break-even, on-demand wins because you’re not consuming what you’ve paid for.
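The reservation decision can be checked with a few lines of arithmetic. The sketch below assumes the simplest billing model: reserved capacity is billed for every hour at the discounted rate, and on-demand is billed only for the hours actually used. Under that assumption the break-even utilization is 1 minus the reservation discount. The function names are illustrative.

```python
HOURS_PER_YEAR = 8760


def breakeven_utilization(discount):
    """Utilization above which reserved is cheaper than on-demand.

    Reserved cost per hour is (1 - discount) * rate for every hour;
    on-demand cost is rate * utilization. They are equal at 1 - discount.
    """
    return 1.0 - discount


def cheaper_option(on_demand_rate, discount, utilization, hours=HOURS_PER_YEAR):
    """Compare total spend for one GPU over `hours` under each model."""
    reserved_cost = on_demand_rate * (1 - discount) * hours
    on_demand_cost = on_demand_rate * utilization * hours
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


for discount in (0.25, 0.35, 0.50):
    print(f"{discount:.0%} discount -> break-even at {breakeven_utilization(discount):.0%} utilization")

print(cheaper_option(on_demand_rate=2.20, discount=0.50, utilization=0.70))
print(cheaper_option(on_demand_rate=2.20, discount=0.35, utilization=0.40))
```

A real comparison would also account for on-demand availability risk and any minimum-billing terms, which this model deliberately leaves out.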
For cluster-scale buyers, reserved 3-year capacity at long-tail providers ($1.30–$1.80/GPU-hour for H100) is genuinely competitive with owning.
Spot / preemptible
Buy unused capacity at deep discount, with risk of eviction when the provider needs it back. Cuts cost 30–80% vs on-demand, depending on provider tier.
Use spot when: workload is interruptible (batch training, non-time-sensitive inference, async processing). Spot pricing is excellent for fine-tuning runs, batch experiments, and embarrassingly-parallel workloads.
Don’t use spot for: production user-facing serving, time-sensitive jobs, or anything where eviction would cause material problems.
Mixing pricing models
Sophisticated AI infrastructure deployments combine all three:
- Reserved baseline for predictable production workload
- On-demand for spikes above baseline and new workload classes
- Spot for batch jobs, fine-tuning, and async processing
Done well, this drives effective compute cost 40–60% below pure on-demand pricing.
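A blended rate makes the effect of mixing concrete. The sketch below is a simplified model: the split of GPU-hours across pricing models is hypothetical, the discounts are picked from the ranges quoted in this guide, and reserved capacity is assumed to be fully used.

```python
def blended_rate(on_demand_rate, mix, discounts):
    """Average $/GPU-hour when GPU-hours are split across pricing models.

    mix: share of total GPU-hours bought under each model (must sum to 1).
    discounts: fraction off the on-demand rate for each model.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return sum(
        share * on_demand_rate * (1 - discounts[model])
        for model, share in mix.items()
    )


# Hypothetical split of GPU-hours; discounts chosen from the ranges in this guide.
mix = {"reserved": 0.70, "on_demand": 0.10, "spot": 0.20}
discounts = {"reserved": 0.45, "on_demand": 0.00, "spot": 0.70}

on_demand_rate = 2.20  # $/hr, illustrative long-tail H100 on-demand rate
rate = blended_rate(on_demand_rate, mix, discounts)
saving = 1 - rate / on_demand_rate
print(f"blended ${rate:.2f}/hr, {saving:.1%} below pure on-demand")
```

Shifting more hours to on-demand, or leaving reserved capacity idle, pulls the saving down quickly, so the mix is worth revisiting as the workload changes.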
The 7 hidden costs in cloud GPU pricing
The headline GPU $/hour rate is roughly 60–70% of your actual cloud bill. The other 30–40% is below-the-line costs that don’t appear in pricing-page comparisons.
1. Egress (data transfer out)
Cloud providers charge to move data out of their network. For AI workloads with large datasets or active inference traffic, egress is a non-trivial cost line.
| Provider tier | Typical egress pricing |
|---|---|
| Hyperscaler | $0.05–$0.12/GB (with volume discounts at scale) |
| Specialty | $0.02–$0.08/GB |
| Long-tail | $0.005–$0.04/GB or no egress charges |
For high-throughput inference workloads, egress can be 5–25% of the total cloud bill. Long-tail providers’ minimal-or-zero egress pricing is a meaningful structural cost advantage that’s not captured in $/GPU-hour comparisons.
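To put egress on the same footing as the GPU rate, convert a daily transfer volume into an hourly charge. The sketch below uses a hypothetical 200 GB/day of outbound traffic per GPU and per-GB prices picked from the ranges in the table above; the share it reports covers GPU plus egress only and ignores every other line item.

```python
def egress_cost_per_hour(gb_per_day, price_per_gb):
    """Spread a daily egress volume evenly over 24 hours and price it."""
    return gb_per_day / 24 * price_per_gb


def egress_share(gpu_hourly, egress_hourly):
    """Egress as a fraction of (GPU + egress) spend; other costs are ignored."""
    return egress_hourly / (gpu_hourly + egress_hourly)


# Hypothetical workload: 200 GB/day of outbound traffic per GPU.
GB_PER_DAY = 200

for tier, gpu_hourly, price_per_gb in [
    ("hyperscaler", 4.50, 0.09),
    ("long-tail", 2.20, 0.01),
]:
    hourly = egress_cost_per_hour(GB_PER_DAY, price_per_gb)
    share = egress_share(gpu_hourly, hourly)
    print(f"{tier}: ${hourly:.2f}/hr egress, {share:.1%} of GPU + egress spend")
```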
2. Storage
GPU compute requires data. Storage isn’t free.
- High-performance storage (NVMe-class, 10+ GB/s): $0.10–$0.30 per GB-month
- Object storage (S3-class): $0.015–$0.025 per GB-month
- Backup and snapshots: typically 50–100% premium over base storage
For a typical AI training operation handling 10TB of training data and 50TB of intermediate artifacts, storage costs run $2,000–$8,000/month, depending on tier and provider.
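The monthly figure depends heavily on how data is split between storage tiers. The sketch below assumes one possible split (training data on high-performance storage, intermediate artifacts on object storage) and decimal terabytes; both are assumptions for illustration, and the per-GB prices are the range endpoints listed above.

```python
GB_PER_TB = 1000  # decimal terabytes, assumed

# $/GB-month ranges from the list above: (low, high).
HIGH_PERF = (0.10, 0.30)
OBJECT = (0.015, 0.025)


def monthly_cost(terabytes, price_per_gb_month):
    """Monthly storage charge for a volume held for the whole month."""
    return terabytes * GB_PER_TB * price_per_gb_month


def cost_range(high_perf_tb, object_tb):
    """(low, high) monthly cost for a given split across the two tiers."""
    low = monthly_cost(high_perf_tb, HIGH_PERF[0]) + monthly_cost(object_tb, OBJECT[0])
    high = monthly_cost(high_perf_tb, HIGH_PERF[1]) + monthly_cost(object_tb, OBJECT[1])
    return low, high


# Assumed split: 10 TB of training data on NVMe-class storage,
# 50 TB of intermediate artifacts on object storage.
low, high = cost_range(high_perf_tb=10, object_tb=50)
print(f"${low:,.0f} to ${high:,.0f} per month")
```

Keeping more of the intermediate artifacts on the fast tier, or adding backups and snapshots, moves the total toward the upper end of the range quoted above.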
3. High-bandwidth networking for multi-node training
For multi-node training with InfiniBand-class fabric, the high-bandwidth networking is sometimes priced separately:
- Hyperscalers: bundled into instance pricing (with caps)
- Specialty: usually bundled
- Long-tail: sometimes bundled, sometimes separate
For 8–64 GPU training jobs requiring 400G InfiniBand, expect $200–$800/month in some provider configurations.
4. Management and orchestration
Kubernetes, ML platforms (SageMaker, Vertex, Azure ML), workflow orchestration, and similar add costs:
- Hyperscaler ML platforms: 10–15% premium on raw compute when used
- Self-managed Kubernetes on raw GPUs: ops overhead but no platform tax
- Specialty providers’ platforms: typically 5–10% premium
Worth weighing the platform-vs-raw tradeoff explicitly. Many teams overpay for hyperscaler ML platforms when raw GPU rental at long-tail providers plus self-managed orchestration would deliver 40–50% lower total cost.
5. Identity, security, and compliance services
For enterprise deployments: SSO, RBAC, audit logging, key management, secrets, compliance certifications. These are sometimes free at long-tail providers (because they don’t offer them) and significant additions at hyperscalers (because they offer comprehensive enterprise tooling).
For compliance-bound deployments, hyperscaler advantages here can outweigh the GPU pricing premium. For non-compliance-sensitive deployments, this is mostly cost without value.
6. Support and SLA tiers
- Basic support: usually included
- Production support tier: 5–10% of compute spend
- Enterprise support: 10–15% of compute spend
For mission-critical deployments, support tier upgrades are real money. Worth understanding the actual SLA you’re paying for and whether you need it.
7. Idle and underutilization waste
The most common hidden cost: provisioned capacity you don’t use. If you reserve 100 GPUs and use them at 60% utilization, you’ve paid for 100 but received the value of 60 — a 40% cost waste that doesn’t show up as a separate line item but very much hits your effective per-useful-hour cost.
This is also the cost component cloud rentals can’t offset. Owned hardware with idle capacity can be monetized through Mercatus Provider listings; cloud-rented capacity cannot.
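The effect of idle capacity is easiest to see as a cost per useful hour. The sketch below reworks the 100-GPU, 60%-utilization example; the $1.50/hour reserved rate and the 730-hour month are illustrative assumptions, not quoted prices.

```python
def effective_rate(hourly_rate, utilization):
    """Cost per useful GPU-hour when only a fraction of paid hours is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate / utilization


def wasted_spend(gpus, hourly_rate, hours, utilization):
    """Dollars paid for GPU-hours that did no useful work."""
    return gpus * hourly_rate * hours * (1 - utilization)


# Illustrative inputs: 100 reserved GPUs at $1.50/hr over a 730-hour month.
rate, hours, utilization = 1.50, 730, 0.60

print(f"effective rate: ${effective_rate(rate, utilization):.2f} per useful hour")
print(f"idle spend: ${wasted_spend(100, rate, hours, utilization):,.0f} per month")
```

At 60% utilization the effective rate is about 1.67 times the sticker rate, so a discount that looks attractive on paper can be erased by idle hours.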
How to evaluate real cloud GPU cost
A useful framework for comparing cloud GPU offers, used in institutional procurement:
True effective $/GPU-useful-hour =
(Hourly rate
+ Egress per hour at expected throughput
+ Storage per hour at expected dataset size
+ Platform/management premium
+ Support tier premium)
/ Realistic utilization rate
Walk through this for each provider tier:
| Component | Hyperscaler (AWS p5) | Long-tail provider |
|---|---|---|
| H100 hourly on-demand | $4.50 | $2.20 |
| Egress (1TB/day at 24-hour usage) | $0.21/hr | $0.04/hr |
| Storage (5TB at 24-hour) | $0.12/hr | $0.04/hr |
| Platform tax (SageMaker, etc.) | $0.45/hr | $0 |
| Support tier (production) | $0.30/hr | $0.15/hr |
| Subtotal hourly | $5.58/hr | $2.43/hr |
| At 70% utilization | $7.97/useful-hr | $3.47/useful-hr |
The real cost came out 77% above the headline hyperscaler GPU rate ($7.97 vs $4.50) and 58% above the long-tail rate ($3.47 vs $2.20). Both ratios are typical.
In this example the headline rates differ by about 2.0× ($4.50 vs $2.20) and the real $/useful-hour figures by about 2.3× ($7.97 vs $3.47). The gap widens slightly once platform, egress, and support charges are included, and the provider tier remains the dominant factor.
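The framework above translates directly into a small calculator. The sketch below takes the per-hour figures from the comparison table as given inputs and applies the formula; the `Offer` class and its field names are illustrative, not an existing tool.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Per-hour cost components for one GPU, all in $/hr."""
    gpu_hourly: float
    egress_hourly: float
    storage_hourly: float
    platform_hourly: float
    support_hourly: float

    def subtotal(self):
        """Sum of all hourly components before adjusting for utilization."""
        return (
            self.gpu_hourly
            + self.egress_hourly
            + self.storage_hourly
            + self.platform_hourly
            + self.support_hourly
        )

    def per_useful_hour(self, utilization):
        """True effective $/GPU-useful-hour at a given utilization rate."""
        return self.subtotal() / utilization


# Inputs copied from the comparison table above.
hyperscaler = Offer(4.50, 0.21, 0.12, 0.45, 0.30)
long_tail = Offer(2.20, 0.04, 0.04, 0.00, 0.15)

for name, offer in [("hyperscaler", hyperscaler), ("long-tail", long_tail)]:
    useful = offer.per_useful_hour(0.70)
    print(f"{name}: subtotal ${offer.subtotal():.2f}/hr, ${useful:.2f}/useful-hr")
```

Swapping in quoted figures from real offers keeps the comparison like-for-like, since every provider is run through the same five components and the same utilization assumption.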
Provider selection framework
Given the landscape, how to pick:
Step 1: Define your workload’s properties.
- Predictable utilization (likely use reserved) vs unpredictable (likely use on-demand or spot)
- Compliance requirements (forces hyperscaler tier in some cases)
- Multi-region needs (hyperscalers have widest footprint)
- Production user-facing (rules out spot/decentralized for serving)
- Tolerant of interruption (enables spot/decentralized for training/batch)
Step 2: Filter by hard constraints.
- Compliance bound? → Hyperscaler or specialty with relevant attestations
- Specific region? → Filter to providers with capacity there
- Specific GPU? → Filter to providers offering it (not all do)
Step 3: Compare three providers minimum.
- One hyperscaler (for SLA reference)
- One specialty provider
- One long-tail provider in your region
Get real quotes including all hidden costs, not just GPU $/hour. Compare like-for-like.
Step 4: For LLM inference workloads specifically, evaluate an alternative path. If you're renting GPUs primarily to serve LLM inference, consider whether Mercatus Spot Market — a separate product offering token-level API access to LLM models across providers (similar to OpenRouter) — fits your use case better than managing GPU infrastructure yourself. Spot Market is for token consumption, not GPU rental. For training workloads, custom architectures, or non-LLM inference (vision, audio), GPU rental is still the right answer; use the framework above.
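Steps 2 and 3 amount to a filter followed by a per-tier comparison. The sketch below shows one way to encode that; every provider name, region, attestation set, and rate in it is hypothetical, and the `Provider` fields are assumptions about what a procurement spreadsheet might hold.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    tier: str               # "hyperscaler", "specialty", or "long_tail"
    regions: set
    gpus: set
    attestations: set
    effective_rate: float   # quoted $/useful-hour including hidden costs


def shortlist(providers, region, gpu, required_attestations=frozenset()):
    """Apply hard constraints, then keep the cheapest eligible provider per tier."""
    eligible = [
        p for p in providers
        if region in p.regions
        and gpu in p.gpus
        and required_attestations <= p.attestations
    ]
    best_per_tier = {}
    for p in sorted(eligible, key=lambda p: p.effective_rate):
        best_per_tier.setdefault(p.tier, p)  # first seen is cheapest in its tier
    return list(best_per_tier.values())


# Hypothetical catalog, for illustration only.
catalog = [
    Provider("HyperCo", "hyperscaler", {"eu-west", "us-east"}, {"H100", "A100"}, {"SOC 2", "ISO 27001"}, 7.97),
    Provider("SpecialtyCo", "specialty", {"us-east"}, {"H100", "H200"}, {"SOC 2"}, 4.60),
    Provider("RegionalCo", "long_tail", {"eu-west"}, {"H100"}, set(), 3.47),
    Provider("BudgetCo", "long_tail", {"eu-west"}, {"H100"}, set(), 3.90),
]

for p in shortlist(catalog, region="eu-west", gpu="H100"):
    print(f"{p.tier}: {p.name} at ${p.effective_rate:.2f}/useful-hr")
```

The result is at most one candidate per tier, which matches the "compare three providers minimum" step when all three tiers have an eligible provider.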
When cloud rental beats ownership at scale
The own-vs-rent decision at cluster scale (50+ GPUs) is closer than most institutional buyers think.
Owned 100-H100 cluster effective cost (per 100-H100 cluster TCO analysis): ~$3.00–3.30/GPU-useful-hour at 70% utilization, before optimizations. With optimizations (cheap power, wholesale colo, OEM volume discounts): ~$2.30–2.50.
Reserved 3-year cloud (long-tail tier): $1.30–$1.80/GPU-hour reserved, plus egress, storage, support — landing at ~$1.50–$2.20/useful-hour realistically.
For most teams in 2026, reserved cloud capacity from long-tail providers is genuinely cheaper than owning — and dramatically simpler operationally. Owning starts to win at:
- Very high utilization (90%+)
- Very large scale (500+ GPUs)
- Compliance constraints that prevent cloud
- Cheap-power-access markets that cloud providers can’t pass through
- Capacity monetization plans (selling slack via Mercatus changes the math)
For the framework decision: Buy vs Rent GPUs. For the H200-specific version: H200 Buy vs Rent.
How cloud GPU pricing translates to token prices
Everything in this article is the supply-side economics of token-level pricing. When you pay $5/1M tokens for GPT-4o or $0.14/1M tokens for DeepSeek V3, those numbers reflect: the underlying GPU $/hour the provider pays (or owns at) → throughput per GPU for that model → markup, margin, and market structure.
The 2.5× cross-provider spread on the same H100 SKU directly drives the cross-provider spread on per-token prices for the same model: providers running expensive H100 fleets pass that cost through to per-token rates. Mercatus Token Index publishes cleared per-token prices across LLM API providers, the token-layer equivalent of GPU Index for hardware, and Mercatus Spot Market routes to the best effective price automatically.
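The chain from GPU $/hour to a per-token price can be sketched as a single formula. The throughput figure below (2,500 tokens per second per GPU) is a hypothetical placeholder, since real throughput depends on the model, batch size, and serving stack; the markup parameter is likewise an assumption.

```python
def cost_per_million_tokens(gpu_hourly, tokens_per_second, markup=0.0):
    """Price per 1M tokens implied by a GPU rate, a throughput, and a markup.

    gpu_hourly: $/GPU-hour the provider pays or amortizes.
    tokens_per_second: sustained tokens per second per GPU (hypothetical input).
    markup: fraction added on top of raw compute cost.
    """
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly / tokens_per_hour * 1_000_000 * (1 + markup)


THROUGHPUT = 2500  # tokens/s per GPU, hypothetical

expensive = cost_per_million_tokens(5.00, THROUGHPUT)
cheap = cost_per_million_tokens(1.99, THROUGHPUT)
print(f"${expensive:.3f} vs ${cheap:.3f} per 1M tokens ({expensive / cheap:.2f}x)")
```

With throughput and markup held equal, the per-token spread equals the GPU-rate spread, which is the pass-through effect described above.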
For the broader thesis on opening the market between supply and demand: The Open AI Compute Economy.
For day-to-day work:
- Per-GPU-hour comparison: Mercatus GPU Index tracks live cross-provider GPU pricing
- Per-token API pricing comparison: Mercatus Token Index tracks cleared LLM API prices across providers — separate market from GPU rental
- Sell idle GPU capacity: Become a Provider
- For data center / colocation context: Mercatus Compute Atlas
- Sell inference capacity as LLM API: Become a Provider (provider lists on Token Market as LLM inference seller — not as GPU lessor)
Frequently Asked Questions
Where do I find the cheapest GPU cloud pricing?
Long-tail and regional providers consistently offer the best per-GPU-hour rates: H100 on-demand at $1.99–$2.50/hour vs $3.50–$5.00 at hyperscalers. Mercatus GPU Index tracks live cross-provider pricing across 22+ providers in real time. Be sure to compare on real cost (including egress, storage, and support), not just hourly compute rates.
Why do cloud GPU prices vary so much between providers?
The 2.5× cross-provider spread reflects sales overhead and margin structures more than infrastructure cost. Hyperscalers carry expensive enterprise sales motions and high shareholder margin expectations. Specialty providers run leaner. Long-tail providers run leanest. Same hardware, very different cost structures. Full analysis: Why GPU Prices Differ by 30%+.
Should I use on-demand, reserved, or spot pricing?
Mix them. Reserved capacity for predictable baseline workload (cuts cost 30–50%). On-demand for variable demand above baseline. Spot/preemptible for interruptible workloads (training, batch, async processing). Sophisticated deployments combining all three reach 40–60% below pure on-demand cost.
How much does egress cost on top of the GPU rate?
Egress costs 5–25% of total cloud spend depending on workload throughput. Hyperscalers typically charge $0.05–$0.12/GB egress; specialty providers $0.02–$0.08/GB; long-tail providers often charge minimally or nothing. For high-throughput inference workloads, egress alone can swing total cost 15%+ between providers.
Are hyperscalers ever the right choice for GPU compute?
Yes, in three scenarios: (1) you have existing cloud commitments to consume, (2) compliance certifications require their attestations, (3) you’re tightly integrated with their bundled services (S3, BigQuery, etc.) and the integration value exceeds the price premium. Outside these cases, hyperscaler GPU on-demand is structurally non-competitive.
What about decentralized GPU networks?
Decentralized networks (Akash, Bittensor, io.net) offer aggressive pricing through aggregated supply but with reliability tradeoffs. For interruption-tolerant workloads (batch processing, fine-tuning, async inference), they’re worth evaluating. For production user-facing serving, most teams stay with centralized providers.
Is reserved cloud capacity worth it?
For predictable workloads with 12+ months of expected usage, yes: reserved 3-year capacity cuts costs 35–50% vs on-demand. The break-even is sustained utilization equal to 1 minus the discount, so roughly 50–65% of the reserved capacity at those discounts. Above that, reserved wins. Below, you’re paying for capacity you don’t use.
Should I use Spot Market instead of renting GPUs?
Mercatus Spot Market is for LLM token consumption, not GPU rental — two different products for two different markets. If your use case is "I want to call GPT-4o / Claude / Llama via API and pay per token," Spot Market may be a fit (similar to OpenRouter). If your use case is "I want to rent H100s to train my own model or run custom inference," GPU rental is the right path — use GPU Index to compare providers. Don't confuse the two.
Can I sell my unused cloud GPU capacity?
Cloud-rented capacity generally cannot be re-sold (terms of service restrict it). This is one of the structural arguments for owning hardware: idle capacity in owned clusters can be monetized through Mercatus Provider listings, while cloud rentals are sunk cost when not in use. → Become a Provider if you operate owned GPU capacity.
Methodology
Pricing data sourced from Mercatus GPU Index, which tracks cloud GPU pricing across 22+ providers globally, refreshed daily. Spreads reflect May 2026 cross-provider snapshot. Provider tier categorizations (hyperscaler / specialty / long-tail / decentralized) reflect industry-standard market segmentation. Hidden cost percentages are derived from Mercatus aggregate analysis of typical AI infrastructure billing across institutional customers. Last verified: 2026-05-04.
