Volatility in the GPU rental market
A model for studying AI compute bubbles
Very few organizations have an accurate model of pricing and trends in the high-grade GPU provider market (and demand side). It doesn’t help that the market landscape can be murky. There aren’t clear demarcations between each category of player — is CoreWeave a neocloud, or a hyperscaler like they claim to be? What’s the difference, really? — and incentive structures, sources, and money flow are hidden, especially to outsiders. As a customer, when you shop for a provider, what are you really paying for beyond just hardware access? Is this provider just a reseller — which providers are really just brokers, third-parties, or middlemen? How many sides to this market are there?
The last three years have seen large swings in supply and demand for AI infrastructure. A flood of new H100s — initially mispriced due to shipment delays and overpromised capacity — kicked off a long price decline, with further downward pressure on going rental rates for AI GPUs caused by shifts in how GPUs are used as AI techniques evolved. Market corrections coupled with second- and third-order reactions have led to remarkable price volatility for GPUs. Long-term contracts are still the norm among large players, but pricing data for, say, a one-year contract is not transparent and varies wildly from provider to provider. Over the last few years, many buyers lost money on those contracts because the per-hour price they committed to ended up well above the on-demand rate.
Trends in both AI and hardware innovation are hard to predict in the short to medium term. While we can point to trends like Moore’s Law, or to semiconductor fabrication and component bottlenecks, we cannot directly map these to when new chips will ship, how plentiful they will be in data centers when they do, or how other features like connectivity and throughput will progress for a particular chip — especially as long as NVIDIA remains the only major GPU maker for high-grade AI use.
Analyzing the types of providers and how they function is important to understanding future trends in price volatility. For example, certain sellers make money on commission or kickbacks by sourcing from providers that actually own GPUs, or profit on the spread (renting from another provider and reselling at a markup). Many of these resellers differentiate on software, like Modal or Together AI, with smooth UX and deployment frameworks for developers. But this makes it hard to tell how many physical GPUs actually exist in the market, and can make the seller side look bigger than it really is.
I think1 a better model is to see most providers (neoclouds and data centers) as a combination of banks and real estate companies.
Providers as banks
For neoclouds that almost exclusively sell contracts (as opposed to on-demand capacity), the incentive is to decrease their own risk by either
Only taking on customers who are unlikely to default on their contracts (in this case, the cloud loans GPUs to a buyer for a set period, and can also charge a premium for high availability guarantees, maximum cluster size, or contract length),
Or requiring a significant portion of the contract to be prepaid (in which case they can charge a premium for serving riskier customers).
This, of course, means that whatever risk providers reduce for themselves (spread out over multiple customers) is transferred to the customer, who is responsible for staying solvent for the length of the contract (and for bearing the risk that the contracted per-hour price rises above market value). And because the total market is still not very large, mature, or liquid,2 the entire supply side could face problems if customers under severe stress all try to resell their contracts at once.3
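The buyer’s side of this risk transfer can be sketched with a toy calculation (all numbers hypothetical): a fixed contract rate committed up front while the on-demand price declines underneath it.

```python
# Toy model: a buyer locks in a fixed hourly rate for a year while
# the on-demand price declines month over month. All numbers are
# hypothetical, for illustration only.

def contract_overpayment(contract_rate, start_price, monthly_decline,
                         months=12, hours_per_month=730):
    """Total dollars paid above on-demand value over the contract's life."""
    overpaid = 0.0
    price = start_price
    for _ in range(months):
        overpaid += (contract_rate - price) * hours_per_month
        price *= (1 - monthly_decline)
    return overpaid

# Lock in $4.00/hr while on-demand starts at $4.00 and falls 5%/month:
print(round(contract_overpayment(4.00, 4.00, 0.05)))  # prints 8197
```

Even a modest 5% monthly price decline leaves this hypothetical buyer roughly $8,200 underwater per GPU by year’s end — which is exactly the position that drives contract resale.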
Providers as real estate companies
An alternative is to envision data centers4 as real estate companies, which effectively take out a loan to build AI infrastructure (the real estate, in this case). They must lease their inventory effectively over the hardware’s lifecycle to recoup the original cost. Hence, long leases are ideal for providers, because they reduce the risk of inventory sitting unsold for long stretches.
Current incentives, though, may work against this dynamic. Excess investment in AI infrastructure becomes, essentially, risk underwriting by VCs and LPs, along with heavy “borrowing” and speculation5 in the market at low rates. Providers then use this excess initial capital to bid lower and lower on contracts, either cutting margins or eating into their hardware cost at a loss (or at no loss, in the case of Voltage Park). This will eventually have to stop. What happens when providers are no longer able to shoulder the cost of customer acquisition? NVIDIA has a strong incentive6 to be friendly and sell plenty of hardware at low cost to its data center and cloud partners, which further drives rental prices down, even as the capital cost of each new hardware generation keeps rising. The real estate equivalent is city-subsidized housing development and other mechanisms to draw residents to a particular location, which can cause housing bubbles if not properly managed. The GPU bubble seems to grow through easy speculation, but long contract lengths slow the rate at which market dynamics become evident. At what point will the bubble pop again?
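The real estate framing above can be made concrete with a toy break-even calculation (all numbers hypothetical): hardware bought with borrowed capital must be leased out over its useful life to recover its cost, so the minimum viable lease rate falls directly out of capex, financing, and utilization.

```python
# Toy break-even model for the "provider as real estate company" framing.
# All numbers are hypothetical, for illustration only.

def breakeven_hourly_rate(capex_per_gpu, annual_interest, life_years, utilization):
    """Minimum hourly lease rate to recover capex plus simple interest
    over the hardware's useful life."""
    total_cost = capex_per_gpu * (1 + annual_interest * life_years)
    rentable_hours = life_years * 8760 * utilization  # 8760 hours/year
    return total_cost / rentable_hours

# $30k GPU, 8% simple annual interest, 4-year useful life, 70% utilization:
print(round(breakeven_hourly_rate(30_000, 0.08, 4, 0.7), 2))  # prints 1.61
```

Anything bid below that rate is a loss subsidized by outside capital — which is precisely what cheap VC money makes possible for a while.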
So providers are best seen as banks or real estate companies, a framework that helps us understand potential speculative bubbles and evaluate flows of money in the market. As already noted, easy resale of contracts may trigger further crashes before long, particularly with an influx of new GPUs hitting most providers before the end of the year, combined with reported shipment delays for the new Blackwell chip generation. According to Jensen Huang, “When Blackwells start shipping in volume, you couldn’t even give Hoppers away,” and further reports support that a provider’s rental price for a chip drops significantly as soon as that provider obtains a new generation. If you, as a customer, can lock in a short-term contract, you are in a winning position: you are less exposed to natural hardware cycles, and you get far more bang for your buck by switching to the newest NVIDIA generation when it ships. The market is not nearly stable enough.
Other GPU cost volatility concerns
Right now, many clouds are prioritizing the inference market, particularly those differentiated by software. Making it easy to deploy and finetune models has served them well over the last couple of years, and offers an alternative to purely selling training contracts. But what happens if inference-time scaling doesn’t lead to meaningful economic output, or is replaced by a different paradigm? Overvaluing inference-time scaling alone could be detrimental; there is a chance the reasoning boom does not last.
In particular, anyone sitting on compute needs substantial demand to make inference worthwhile on a per-token basis, which makes the economics fundamentally different from training: throughput depends on how many requests can be batched together at once. Inference providers thus take on the risk that demand could be low, and can charge a premium for bearing it. But a single large buyer is better off running inference itself once its expected demand is high enough, in which case providers may be better off switching back to contract-based sales at low rates.
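A toy per-token cost model (all numbers hypothetical) shows how sharply inference unit economics hinge on batching and demand-driven utilization:

```python
# Toy per-token cost model for an inference provider. The same GPU at
# the same hourly cost produces wildly different unit economics
# depending on batch size and utilization. Numbers are hypothetical.

def cost_per_million_tokens(gpu_hourly_cost, tokens_per_sec_per_request,
                            batch_size, utilization):
    """Dollars per million output tokens at a given batch size and
    demand-driven utilization."""
    tokens_per_hour = tokens_per_sec_per_request * batch_size * 3600 * utilization
    return gpu_hourly_cost / tokens_per_hour * 1_000_000

# Same $2/hr GPU: full batches under healthy demand vs. mostly idle.
busy = cost_per_million_tokens(2.0, 50, 32, 0.9)  # ~ $0.39 / M tokens
idle = cost_per_million_tokens(2.0, 50, 4, 0.3)   # ~ $9.26 / M tokens
```

In this sketch, weak demand makes each token roughly 24x more expensive to serve — the risk premium the provider must price in.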
We’re also slowly moving towards higher demand for high-latency requests — see Deep Research and other reasoning-heavy tasks. What does this mean if many companies can profit from long-running tasks that don’t have to be done quickly? Will inference economics diverge from the current paradigm, perhaps back to contracts of varying timeframes, or some other instrument altogether? DeepSeek’s off-peak usage pricing is a pioneering attempt at encouraging a different financial dynamic.
Additionally, there are significant limitations to scaling inference-time compute as a viable pathway to economic-scale AI usefulness; there remains a need for better causal and symbolic reasoning, reasoning may have a sharp efficiency vs. accuracy tradeoff, and there are diminishing returns with increased complexity.
No one can deny that there is excess investment in AI, and in AI infrastructure in particular. If VCs are underwriting both the risk and the profit of an AI infrastructure company, when capex and risk are both so huge, how long will it take for this bubble to burst? The lag before problems materialize means it could take anywhere from months to a couple of years.
Escalating model training costs, growing by some estimates at 2.4x per year since 2016, price out competition. Soon, only top frontier AI companies might be able to afford training at all, and market demand could shrink. Four companies alone — Amazon, Alphabet, Meta, and Microsoft — are projected to spend $315 billion on AI infrastructure, not including additional investments like the $500 billion Stargate project. Goldman Sachs estimates $1 trillion will be spent on AI infrastructure over the next few years; by some reports, too much compared to the projected economic value of AI over the same period. Allegedly, Chinese data centers have already begun to feel the effect of lower-than-expected demand, with many facilities sitting nearly idle or shut down. Whether or not you are a skeptic, it is undeniable that the results are far too unpredictable to rely on a highly reactive AI infrastructure market.
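The cited 2.4x-per-year growth rate compounds brutally fast, which is the whole mechanism behind pricing out competition:

```python
# Compounding the ~2.4x/year training-cost growth estimate cited above.
# A cost index set to 1 in 2016 grows to over a thousand by 2024.

def cost_multiplier(years, annual_factor=2.4):
    """Relative training cost after `years` of compounding growth."""
    return annual_factor ** years

for year in (2016, 2020, 2024):
    print(year, round(cost_multiplier(year - 2016)))
# prints:
# 2016 1
# 2020 33
# 2024 1101
```

If the estimate holds even approximately, a frontier training run affordable to many players in 2016 is affordable to almost none today.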
And finally, what happens if and when AMD becomes a viable competitor to NVIDIA, at least on inference, at a lower price? Inference has significantly lighter hardware requirements than training, and AMD’s strategy of targeting the inference market could prove fruitful.
A short note on software AI infrastructure businesses
My personal prediction is that hardware will eat software.7 If you, as a provider, serve bare metal (or even VMs), you are no longer limited by software, its capabilities, or the burden of maintaining it. You can focus on the true bottleneck, the hardware and its cost, which is likely to be much more profitable in the long term.
As we reach economies of scale with AI, the margin potential on AI infrastructure software will shrink. It will be far more cost-effective to go to bare metal directly, because the large AI-native firms of the future will need to abstract the software away themselves anyway.
Software abstractions in AI infrastructure are good devtools, but nothing more. The moat is shrinking — how many versions of a Python-script-to-deployment-pipeline SaaS can exist? — and competition will force software AI infra companies to charge less and less. I liken them to thin GPT wrappers: real economic value, but little compared to frontier labs and model training. They’ll have a place in the market of the future, but will serve an ever-shrinking share of customers unless they find another way to differentiate. AI infra software companies will not look like today’s SaaS companies.
As first put into words by Evan Conrad of SF Compute.
“Large” in this case refers to the number of players, not necessarily volume. The market is very fractured, and no one seems to be able to agree on a price, value for features, or even pricing structure.
We saw this in the last two years when massive resale of unused compute caused downward pressure on the on-demand price of an H100.
For the record, data centers are the only actual GPU holders in the market, although clouds may also lease from data centers.
Speculation here refers to the belief that demand for AI will keep increasing (thus, an overvaluation of AI), and “borrowing” is in quotes because I’m referring to borrowing on equity, i.e. VC funding.
NVIDIA wants to avoid competing with their biggest customers, and in fact, Jensen Huang has stated that he does not intend to become a cloud provider. NVIDIA’s $100 million investment into CoreWeave emphasizes this fact — “In some ways, CoreWeave exists because NVIDIA wanted it to exist.”
My opinions here are not fully fleshed out, but they are my opinions only!

