Is AI inference getting cheaper or more expensive over time?

GamingChairModel@lemmy.world · 2 days ago

Is AI inference getting cheaper or more expensive over time?

brucethemoose@lemmy.world · edit-2 1 day ago

TurboQuant is total baloney.

It’s just KV cache quantization, and we’ve had all sorts of that for ages. Backends, not just papers, have had 4-bit cache with hadamard rotation (a major component of TurboQuant), and very low loss, since like 2023.

We’ve had proof that Bitnet works for over a year.

And no one cares. No one uses that kind of quantization because it reduces batched throughput, just like TurboQuant.

Besides, new architectures (like DeepSeek V4) render it obsolete, as they don’t use traditional KV cache anymore. I honestly have no idea how TurboQuant became such a meme, other than major astroturfing.

TL;DR All AI news is total bull. It’s chum for investors.

You need to look at what the engines, papers and actual LLM weight architectures are doing.