I’ve read some of Ed Zitron’s long posts on why the AI industry is a bubble that will never be profitable (and will bring down a lot of companies and investors). One of his recurring themes is that the AI companies are trying to capture growing market share in an industry where their marginal profits are still negative, so any increase in revenue necessarily increases their cost of providing the service.
But some of the comments in various Hacker News threads are dismissive, saying that each new generation of models lowers the cost of inference, so that with sufficient customer volume, the companies running the models can make enough profit on inference to recoup the staggering up-front capital expenditures it took to build out the data centers, train their models, etc.
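As I understand it, the whole disagreement comes down to the sign of the per-token margin. A toy sketch in Python, with every number invented just to show the shape of the argument:

```python
# Every number here is made up, purely to illustrate the argument's shape.
capex = 50e9            # up-front spend: data centers, training runs, etc. ($)
price_per_mtok = 10.0   # revenue per million tokens served ($)
cost_per_mtok = 6.0     # marginal cost to serve a million tokens ($)

margin = price_per_mtok - cost_per_mtok
if margin > 0:
    breakeven_mtok = capex / margin
    print(f"break-even volume: {breakeven_mtok:,.0f} million tokens")
else:
    # Zitron's claim, roughly: if the margin is negative, more volume
    # just digs the hole deeper, and no amount of scale recoups the capex.
    print("no break-even volume exists")
```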
It’s all pretty confusing to me. So for those of you who are familiar with the industry, I have several questions:
- Is the cost of running a given pretrained model going down? That is, are there hardware and software improvements that make the same model cheaper to serve over time, even though the model itself doesn’t change?
- Is the cost of performing a particular task at a particular quality level going down across model releases (i.e., a smaller model of the current generation performing similarly to a bigger model of the previous generation, so that the same task now costs less)?
- Is the cost of running the largest flagship frontier models going down for any given task? Or does the cutting-edge, show-off work keep getting more expensive, with the companies arguing that the improvement in performance justifies the cost increase?
I suspect the discussion around this is so muddled online because the answers differ depending on which of these three questions is meant by “is running an AI model getting cheaper over time?” And the data isn’t easy to synthesize, because each model has different token prices and consumes a different number of tokens per query.
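To make that concrete, here’s the kind of normalization I have in mind, with entirely made-up prices and token counts: the per-token price alone doesn’t answer any of the three questions, because the tokens consumed per task differ between models too.

```python
# Toy per-task cost comparison. Prices and token counts are placeholders,
# not real figures from any provider.
models = {
    # name: ($/M input tokens, $/M output tokens, input toks/task, output toks/task)
    "big-previous-gen":  (15.0, 60.0, 2_000,   800),
    "small-current-gen": ( 3.0, 15.0, 2_000, 1_500),  # cheaper per token, but chattier
}

for name, (in_price, out_price, in_tok, out_tok) in models.items():
    cost = (in_tok * in_price + out_tok * out_price) / 1e6
    print(f"{name}: ${cost:.4f} per task")
```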
But I wanted to hear from people who are knowledgeable about these topics.


Well, I wonder if the frontier ends up looking like supersonic commercial flight: technology that continues to exist but never really gets used, because at the actual cost of providing the service there isn’t enough of a consumer market, and the alternatives that aren’t as good are still much, much cheaper.
Not everyone needs a Lamborghini or Concorde to get where they are going.
Work is pushing us to use cloud models, and I haven’t had time to experiment beyond a few limited tests. Qwen 3.6 ~30B at Q4 runs pretty well on 36 GB of RAM. It’s a very capable model. It did choke when I tried to connect Cline to it for Java dev, but when I just conversationally ask it to write Python scripts, it works pretty well.
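For anyone who wants to try the same thing, a minimal sketch of pointing a script at a local model through an OpenAI-compatible endpoint (Ollama and llama.cpp’s llama-server both expose one; the URL, port, and model tag below are placeholders, adjust for your setup):

```python
# Minimal sketch of talking to a locally served model through an
# OpenAI-compatible endpoint. The base_url, port, and model tag are
# assumptions -- change them to match your local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
resp = client.chat.completions.create(
    model="qwen3:30b",  # whatever tag your local server exposes
    messages=[{"role": "user", "content": "Write a Python script that prints the date."}],
)
print(resp.choices[0].message.content)
```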
I can see a future where a goodly amount of RAM and an AI chip can produce the results we are currently getting only from cloud models.
I agree with that. Note, though, that Lamborghinis are still being built, operated, and maintained, while Concordes are not.
I’m wondering whether the future of AI looks like the last 50 years of aviation, where there weren’t many generational advances because the cost of developing new things became prohibitively expensive, but where the commoditization of what had already been invented meant the average person’s experience really isn’t that different between 1976 and 2026. In other words, the sweet spot for cost effectiveness isn’t at the bleeding edge at all.
And to satisfy my own curiosity along this line of thinking, I wanted to know whether the day-to-day cost of running these models is actually going down, and in which contexts.