

See: Facebook.


And to be clear, you need a 3090 plus at least 96GB of fast CPU RAM (realistically 128GB) to run DeepSeek Flash coherently. It’s a big model; there’s no way around it.
If you have less RAM, try Qwen 27B (which also uses an exotic attention mechanism); it’ll fit on your 3090 just fine.
For DeepSeek Pro, you’d need a Xeon or EPYC homelab.
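The napkin math, if you want it. A toy sketch; the parameter counts and bits-per-weight below are placeholder assumptions for illustration, not official specs:

```python
# Back-of-the-envelope GGUF footprint check. Parameter counts and
# bits-per-weight are illustrative placeholders, not official figures.
def gguf_size_gb(params_billion, bits_per_weight):
    return params_billion * bits_per_weight / 8  # billions of params -> GB

print(gguf_size_gb(27, 4.5))    # ~15 GB: a 27B model fits on a 24GB 3090
print(gguf_size_gb(230, 3.5))   # ~100 GB: spills into system RAM, hence 96-128GB
```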


I view it differently.
In the US, there are either megacorps, or “people in garages” who honestly don’t have the resources (or things like legal support) to pull off huge innovations. They publish cool papers that never get implemented, because they don’t have $200k+ for a bigger test and can’t work on it full-time for a living. Any “garage devs” who get too big get smited or amalgamated into Big Tech gray goo, and whatever was interesting gets lost in oblivion.
There’s no cooperation, no sharing, either.
And OpenAI/Anthropic are way more conservative than you’d think. Same with Meta; they want results next quarter. Zuckerberg literally fired the whole Llama team, which put Meta on the AI map and basically founded the open-weights space, after one failed experiment. In other words, I’d argue clueless billionaires and the Tech Bro acolytes surrounding them are poisoning LLM development, and it’s starting to catch up.
In China, things are different. The GPU sanctions forced gigantic companies like Alibaba or Tencent to be compute-thrifty, but they all seem to have access to suspiciously good training data… I would bet the Chinese govt is helping them under the table. Chinese devs also have an interesting attitude; I would characterize them as “cooperative,” with lots of private forum sharing going on, most models being open weights, and clearly not a lot of desire to censor their models for the government. But they have their own forms of dysfunction too, sometimes copying other firms a little too closely, or corporate/personal drama like anywhere.


Okay, I fudged the part about “for free.” The problem is DeepSeek V4 is literally in preview, and its architecture is so new that engine support for its weights is poor.
Right this second, you can either pay a few cents to try it via some API (there are many providers, since it’s open weights), or rent a GPU (or maybe CPU) instance if you don’t trust the public tests and actually want to measure resource usage yourself.
Or you can quantize it and self-host it. I plan to do so on my 128GB RAM/RTX 3090 desktop, which is an affordable config to rent if you don’t have a desktop like that.
But llama.cpp support is a work in progress. Same with other backends like KTransformers. Realistically, your options are:
Wait a week, maybe a few weeks, for the llama.cpp/ik_llama.cpp developers to implement the DSV4 architecture.
Try one of the janky GPU/Apple forks available right now.
Try one of the slightly-less-janky, but slow, CPU-only Chinese forks.
But once it’s implemented, I’m going to make my own personal IQ3_KS mixed quantization for 128GB desktops, and see how it compares to older architectures myself.
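For the curious, the core idea behind GGUF-style quants like IQ3_KS is just block-wise rounding with one scale per block of weights. A toy sketch (the real formats add codebooks, importance weighting, and per-tensor bit mixes, so treat this as illustration only):

```python
# Toy block quantization: one float scale per block of `block` weights.
# Illustrative only; not the actual GGUF on-disk format.
import numpy as np

def quantize_blocks(w, bits=3, block=32):
    w = w.reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1                    # 3 bits -> integer levels in [-3, 3]
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.round(w / scale)                        # the small ints you'd actually store
    return (q * scale).reshape(-1)                 # dequantized approximation

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
mse = np.mean((w - quantize_blocks(w)) ** 2)
print(f"3-bit block-quant MSE: {mse:.5f}")
```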
Another confounding factor: if you’re researching “AI farm inference costs,” that’s a very different question.
Frugal providers like DeepSeek use complicated schemes to batch requests across many GPUs, with each GPU taking requests in parallel. In other words, the more GPUs they have, the more speed per GPU they can squeeze out. For DeepSeek V3, last I heard, around 300 GPUs was the ideal deployment size…
And they aren’t even going to be using Nvidia GPUs anyway. I believe DeepSeek is switching to Huawei for inference.
But however you slice it, they’re using orders of magnitude fewer resources than Tech Bro providers like OpenAI or Grok. They have been for over a year.
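A crude toy model of that scaling effect, with every number made up purely for illustration: shard the weights across more GPUs, and each GPU has more memory left over for batched KV cache, so per-GPU throughput rises.

```python
# Toy MoE serving economics. All numbers are invented for illustration;
# this also ignores compute limits, which eventually cap the win.
def tokens_per_sec_per_gpu(n_gpus, weights_gb=700.0, gpu_mem_gb=80.0,
                           kv_per_request_gb=2.0, toks_per_sec_per_request=50.0):
    kv_budget = gpu_mem_gb - weights_gb / n_gpus   # memory left for KV cache
    if kv_budget <= 0:
        return 0.0                                 # model doesn't even fit
    concurrent = kv_budget / kv_per_request_gb     # requests batched per GPU
    return concurrent * toks_per_sec_per_request

for n in (16, 64, 320):
    print(f"{n:4d} GPUs -> {tokens_per_sec_per_gpu(n):7.0f} tok/s per GPU")
```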


Even nonreligious, science-inclined family I have says, basically, “Geoengineering will fix it, so best to maximize economic output now.”
I believe that take came from a WSJ opinion column.


Yes.
It’s dropping, dramatically.
Look at the history of open and closed releases, on benchmarks that aren’t totally gamed, and it’s easy to see: LLM capabilities are plateauing, and bigger models are getting more and more niche.
But inference efficiency is increasing exponentially. Tiny models are getting closer and closer to frontier ones. See: Qwen 27B, and how it can do most of what mega models did just months ago.
And there’s tons of unpicked efficiency fruit sitting in papers. Bitnet is the big one, but I’ve seen dozens of proofs of concept, yet to be tried in a production model, that are dramatic efficiency boosts.
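And the Bitnet idea itself is tiny. A minimal sketch of the b1.58 paper’s absmean ternary quantization (illustrative only; real Bitnet models are trained with this in the loop, you can’t just round an existing model and expect it to work):

```python
# BitNet-b1.58-style absmean ternary quantization, per the published paper.
# Training-aware in real models; shown here post-hoc just to see the mechanics.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512)).astype(np.float32)

gamma = np.abs(W).mean()                    # absmean scale from the paper
W_t = np.clip(np.round(W / gamma), -1, 1)   # every weight becomes -1, 0, or +1

print("unique values:", np.unique(W_t))     # [-1. 0. 1.]
print("bits per weight ~", np.log2(3))      # ~1.58, vs 16 for fp16
```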


Also, on DeepSeek V4… you can run it yourself, for free. There’s no mystery. And there are tons of benchmarks out there already.
It’s indeed very efficient if you’re into long context. But at shorter context lengths, it’s not too different from DeepSeek’s previous releases (and the flood of MoE models that have come since).


TurboQuant is total baloney.
It’s just KV cache quantization, and we’ve had all sorts of that for ages. Backends, not just papers, have had 4-bit cache with Hadamard rotation (a major component of TurboQuant), at very low loss, since like 2023.
We’ve had proof that Bitnet works for over a year.
And no one cares. No one uses that kind of quantization because it reduces batched throughput, just like TurboQuant.
Besides, new architectures (like DeepSeek V4) render it obsolete, since they don’t use a traditional KV cache anymore. I honestly have no idea how TurboQuant became such a meme, other than major astroturfing.
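To make the rotation point concrete, a minimal numpy/scipy sketch of rotate-then-quantize (absmax int4 with one per-tensor scale for simplicity; real backends use per-block scales and fused kernels):

```python
# Toy demo: why Hadamard rotation helps low-bit KV-cache quantization.
# The orthonormal rotation spreads outlier channels out, so the absmax
# scale shrinks and quantization error drops.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n = 256
x = rng.normal(size=n)
x[:4] *= 50.0                          # a few outlier channels, typical of activations

H = hadamard(n) / np.sqrt(n)           # orthonormal: H.T @ H = I

def quant_int4(v):
    scale = np.abs(v).max() / 7.0      # absmax scaling into [-7, 7]
    q = np.clip(np.round(v / scale), -7, 7)
    return q * scale                   # dequantized values

plain = quant_int4(x)
rotated = H.T @ quant_int4(H @ x)      # rotate, quantize, rotate back

print("MSE plain  :", np.mean((x - plain) ** 2))
print("MSE rotated:", np.mean((x - rotated) ** 2))   # far lower
```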
TL;DR All AI news is total bull. It’s chum for investors.
You need to look at what the engines, papers, and actual LLM weight architectures are doing.


I would bet my shoes Facebook or someone lobbied for this.
It’s easy to blame Mormons, but I think that bloc was more of a mark.


All three are banned in China, but lemmy.world can be read in China.
That is amazing.
That simple situation just says so much.
Yeah, well, science also says Twitter is an information black hole, and few that need to see this will see it.
I know it’s microblog memes, but still. It’s surprising to me how much believers in science post about the state of the world… on Twitter. It’d be like going into a Nazi bar in Berlin to warn about the Holocaust, or criticizing Catholicism in the Vatican, and thinking you’re somehow in an open forum.
…At least it isn’t a blue-checkmark Tweeter, though.
My guess: the “primitive” go-back-to-manly-cavemen manosphere stuff.
It sounds like some fantasy of what “should” happen to a pre-civilization woman.
Mix it up with ovulation, and I bet there’s a distant grain of truth there. If I were an isolated guy listening to podcast bros all day, I might even buy it.


GOG’s AI take
What happened, exactly?
All I can find is that someone used an AI image for some kind of marketing.


I mean, yes.
Steam is a scary monopoly, getting scarier.
It’s not their fault the industry (minus GOG) committed mass seppuku.
Both can be true. One can worry about Valve, and use them hesitantly, while laughing at everything else like it’s a cartoon.


It completely depends what you use your computer for.
For example, do you game? DRM-free or not, and where are the games installed? On a separate drive?
What about work stuff? Media? The larger question I’m getting at is: how much of what you do is portable, and easy to just plop on a USB stick, reinstall from the internet, or leave on a second drive already in your desktop?


Partially configured some parts via LLM, but please don’t crucify me for that.
Slap in a spare GPU, and self-host one!
The 30B-class models are unbelievably good now for being so small. They’re roughly where Claude was a year ago, if not less. And (with the right backend) they aren’t expensive to host.
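If you go that route, serving one is a few lines. A minimal sketch with llama-cpp-python, one backend option among several; the model filename and settings are placeholders for your own setup:

```python
# Minimal local-hosting sketch using llama-cpp-python.
# The GGUF filename below is a placeholder for whatever quant you download.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen-30b-q4.gguf",  # hypothetical quantized model file
    n_gpu_layers=-1,                # offload all layers to the spare GPU
    n_ctx=8192,                     # context window
)

out = llm("Q: What is a mixture-of-experts model?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```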


Better yet, download Qwen 3.5/3.6 and poke at it with a “raw” notepad like Mikupad. Try it yourself:
https://huggingface.co/ubergarm/Qwen3.6-27B-GGUF
https://github.com/lmg-anon/mikupad
One might observe:
Chat formatting, and how janky the “thinking” block is.
How words are broken up into tokens, not characters (see the sketch after this list).
How particularly funky that gets with numbers.
Precisely how sampling “randomizes” the answers by visualizing “all possible answers” with the logprobs display.
And, thus, precisely how and why carb counting in ChatGPT fails, yet a measly local LLM on a desktop/phone could get it right with a little tooling or adjustment.
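If you’d rather script the token and number weirdness than click around, here’s a minimal sketch using the transformers library; the tokenizer repo named below is just a current stand-in, any Qwen tokenizer shows the same thing:

```python
# Inspect how text becomes tokens. Any tokenizer works; this repo name
# is just an example of one that's publicly available.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

for text in ["strawberry", "12345.678", "The meal has 47g of carbs"]:
    ids = tok(text)["input_ids"]
    print(text, "->", [tok.decode([i]) for i in ids])
# Words split into subword chunks, and long numbers split into chunks of
# digits, which is part of why raw LLMs are bad at digit-level arithmetic.
```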
This is exactly what OpenAI/Anthropic don’t want you to do. They want users dumb and tethered, like a cloud subscription or social media platform. Not cognizant of how the tools they’re peddling as magic lamps actually work. And why, and how, they’re often stupid.


See, my Windows partition starts instantly. TBH it’s faster than Linux, which takes an extra second to initialize SDDM, and then network connectivity.
…Perhaps because it’s so neutered. It’s not really a fair comparison, as Windows is a narrow-focus OS for me, a tool for running things, to the point I don’t trust it for anything security-sensitive.


Perhaps they’re talking about handhelds, specifically?
Look. I am the biggest, most shameless CachyOS fanboy you will find. It’s like 90% of my desktop time, has been for years.
But I’ve benchmarked a few games on Windows and Linux, Proton and native, sparsely, and Windows still has an advantage sometimes. Cyberpunk 2077 was the biggest outlier for Proton (i.e. faster on Windows, enough to visibly affect which settings I can manage on my 3090).
And many native ports are still truly awful, often where performance equates to simulation speed, like modded Stellaris or Rimworld.
Mind you, that’s not always the case. Proton is faster in many games, and (for example) anything Java-based like Minecraft or Starsector is just hilariously faster on Linux.
The caveats:
My Windows 11 is neutered to hell. It’s a barren wasteland. Even Defender is disabled.
I’m running Nvidia.
Some of my testing is aging now.
Still, I am a Linux shill, and think the headline is a bit dramatic. Stripped Windows is still faster in plenty of realistic scenarios.
Since they’re referencing SteamOS, they’re probably talking about stock mobile systems, where the overhead from that mountain of background junk in Windows is much more painful.


You’re much, much better off buying a GOG digital copy if they sell it, and backing up the installer.
Physical discs have DRM, and most of my old physical discs are useless now. New ones are likely even worse.