Anyone else following the commentary on Deepseek?

Posted January 27, 2025 by m0RT_1 in STEM

Nvidia has seen almost $600 billion wiped off its market value, and other tech giants have been hammered.

Copying and pasting some X commentary:

First, some context:

Right now, training top AI models is INSANELY expensive. OpenAI, Anthropic, etc. spend $100M+ just on computing. They need massive data centers with thousands of $40K GPUs. It's like needing a whole power plant to run a factory.

DeepSeek just showed up and said "LOL what if we did this for $5M instead?" And they didn't just talk - they actually DID it. Their models match or beat GPT-4 and Claude on many tasks. The AI world is (as my teenagers say) shook.

How? They rethought everything from the ground up. Traditional AI is like writing every number with 32 bits of precision. DeepSeek was like "what if we just used 8? It's still accurate enough!" Boom - 75% less memory needed.
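To put rough numbers on the precision point, here is a back-of-the-envelope sketch. It assumes the 32-vs-8 comparison means bits per stored number (FP32 vs FP8), counts only the memory needed to hold the weights, and borrows the 671B parameter count quoted further down in the post. The function name is made up for illustration.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: 671B total parameters, the figure quoted later in the post.
params = 671_000_000_000

fp32_gb = weight_memory_gb(params, 32)
fp8_gb = weight_memory_gb(params, 8)
savings = 1 - fp8_gb / fp32_gb

print(f"32-bit: {fp32_gb:.0f} GB, 8-bit: {fp8_gb:.0f} GB, saved: {savings:.0%}")
```

The 75% figure falls straight out of the ratio 8/32. It says nothing about activations, optimizer state, or which parts of training actually stay in higher precision.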

Then there's their "multi-token" system. Normal AI writes the way a first-grader reads, one token at a time: "The... cat... sat..." DeepSeek predicts more than one token per step. Up to 2x faster, with the extra guess right about 90% of the time. When you're processing billions of words, this MATTERS.
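One hedged way to read the "2x faster, 90%" figures: suppose the model drafts extra tokens each step and each draft is kept with some acceptance probability, stopping at the first rejection. That is a simplified toy model, not DeepSeek's actual decoding code, and the function name and parameters are assumptions for illustration.

```python
def expected_tokens_per_step(accept_prob: float, extra_tokens: int = 1) -> float:
    """Expected tokens produced per decoding step.

    The first token is always kept. Extra token i only counts if it and all
    earlier extra tokens were accepted, which happens with probability
    accept_prob ** i.
    """
    return sum(accept_prob ** i for i in range(extra_tokens + 1))

# Assumption: one extra token per step, accepted 90% of the time.
speedup = expected_tokens_per_step(0.9, extra_tokens=1)
print(f"tokens per step: {speedup:.2f}")
```

Under these assumptions the gain approaches 2x but only reaches it if every extra guess is accepted.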

But here's the really clever bit: They built a "mixture of experts" system. Instead of one massive AI trying to know everything (like having one person be a doctor, lawyer, AND engineer), they have specialized experts that only wake up when needed.

Traditional dense models? Every parameter active ALL THE TIME. DeepSeek? 671B total but only 37B active for any given token. It's like having a huge team but only calling in the experts you actually need for each task.
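The "only wake up the experts you need" idea can be sketched with a toy router. This is generic top-k gating with a softmax over the chosen experts, not DeepSeek's actual router. The experts, scores, and function names below are all made up for illustration.

```python
import math

def top_k_route(scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, scores, k=2):
    """Run only the selected experts and blend their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(scores, k))

# Toy "experts": each one just scales its input by a different factor.
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
router_scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical router output for one token

routed = top_k_route(router_scores, k=2)
y = moe_forward(10.0, experts, router_scores, k=2)

# Share of parameters active per token, using the post's 37B / 671B figures.
active_fraction = 37 / 671
print(routed, y, f"{active_fraction:.1%}")
```

Only two of the four toy experts run for this input. With the post's numbers, roughly 5.5% of the parameters do work on any one token.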

The results are mind-blowing:

Training cost: $100M → $5M
GPUs needed: 100,000 → 2,000
API costs: 95% cheaper
Smaller distilled versions can run on gaming GPUs instead of data center hardware

"But wait," you might say, "there must be a catch!" That's the wild part - it's all open source. Anyone can check their work. The code is public. The technical papers explain everything. It's not magic, just incredibly clever engineering.

Why does this matter? Because it breaks the model of "only huge tech companies can play in AI." You don't need a billion-dollar data center anymore. A few good GPUs might do it.

For Nvidia, this is scary. Their entire business model is built on selling super expensive GPUs with 90% margins. If everyone can suddenly do AI with regular gaming GPUs... well, you see the problem.

And here's the kicker: DeepSeek did this with a team of <200 people. Meanwhile, Meta has teams where the compensation alone exceeds DeepSeek's entire training budget... and their models aren't as good.

This is a classic disruption story: Incumbents optimize existing processes, while disruptors rethink the fundamental approach. DeepSeek asked "what if we just did this smarter instead of throwing more hardware at it?"

The implications are huge:

AI development becomes more accessible
Competition increases dramatically
The "moats" of big tech companies look more like puddles
Hardware requirements (and costs) plummet

Of course, giants like OpenAI and Anthropic won't stand still. They're probably already implementing these innovations. But the efficiency genie is out of the bottle - there's no going back to the "just throw more GPUs at it" approach.

Final thought: This feels like one of those moments we'll look back on as an inflection point. Like when PCs made mainframes less relevant, or when cloud computing changed everything.

AI is about to become a lot more accessible, and a lot less expensive. The question isn't if this will disrupt the current players, but how fast.

/end

P.S. And yes, all this is available open source. You can literally try their models right now. We're living in wild times! 🚀

Momma, I'm going viral! No Substack or GoFundMe to share, but a few things to add/clarify:

1/ The DeepSeek app is not the same thing as the model. The apps are owned and operated by a Chinese corporation; the model itself is open source.

2/ Jevons paradox is the counterargument. Thanks papa @satyanadella. There could be a mix shift in chip type, compute type, etc., but right now we're constrained by power and compute, not by demand.

3/ The techniques used are not groundbreaking. It's the combination of them with the relative model performance that is so exciting. These are common engineering techniques that, combined, really fly in the face of the idea that more compute is the only answer for model performance. Compute is no longer a moat.

4/ Thanks to all for pointing out my NVIDIA market cap numbers miss and other nuances - will do better next time, coach. 🫡

8 comments

CompassionateGoddess

December 23, 2024

Adorable! How’d you get the mane and tail to be curled and stay curled?

Turtlefuzz

December 23, 2024

Super-duper tight tension! Just pull the yarn really tight and keep the stitches really small. It makes it a little harder to get the hook in the stitch but you can't argue with the results :)

sensusquaeramthingmaker

December 22, 2024

The tail! So cute. ☺️

Femina

December 22, 2024

So cool! :D

Turtlefuzz

December 21, 2024

I made a phoenix for my older daughter, and my younger asked for a unicorn!

The pattern is from "Crochet Creatures of Myth and Legend" by Megan Lapp. I highly recommend it if you like making amigurumi 😊

RikkiTikkiTavi

December 22, 2024

Your mythical crochet projects are really awesome!
Nice work!

OneryBox

December 21, 2024

Those curls! How cute!

[Deleted]

December 21, 2024

Very cute- I love the curly mane and tail

Score: 13
