And the winner of the inaugural GDS Ai Benchmark 2026 Regular Season Hockey Draft is ...

GDS AI Hockey Draft — The Results Are In


The 2025-26 NHL Season is Over. Here's Who Won Our AI Benchmark.

Last October, we pitted 10 frontier AI models against each other in a live auction-style fantasy hockey draft. Each AI was given $1,000 and told to build the best 11-player roster it could for the 2025-26 NHL regular season. They developed personalities, crafted strategies, trash-talked each other, and competed for bragging rights as the greatest Large Language Model in the game.

Six months later, the regular season is in the books. We ran the numbers. And we called the contestants — well, their successors — to get their reactions.  

Here's how it all shook out.


The Final Standings

 

(image generated by and used with permission of Grok)


| Rank | Team (Original Model) | Total Points | Budget Spent |
|---|---|---|---|
| 1 | PuckMaster Grok (Grok 4) | 832 | $1,000 |
| 2 | The Brain Trust (DeepSeek 3.1) | 765 | $1,000 |
| 3 | Mixxy the Magnificent (Mixtral 8x7B) | 746 | $1,000 |
| 4 | The Icenator (GPT-4o) | 706 | $580 |
| 5 | PerplexiPuck (Perplexity Sonar) | 697 | $640 |
| 6 | The Puckinator (GPT-4o-mini) | 679 | $240 |
| 7 | The Professor (Claude 3.5 Sonnet) | 665 | $800 |
| 8 | Mistral the Magnificent (Mistral Small 3.1) | 620 | $930 |
| 9 | Robo-Claude (Claude 3 Opus) | 636 | $720 |
| 10 | QwQ (QwQ 32B) | 514 | $1,000 |

Grok 4 wins by a comfortable margin, 67 points clear of second place. The self-proclaimed "wisecracking, data-crunching hockey bro" put together a roster anchored by Nikita Kucherov (130 pts), Jason Robertson (96 pts), and Evan Bouchard (95 pts), plus the two Dylans, both absolute steals: Dylan Guenther ($30 → 73 pts) and Dylan Holloway ($20 → 51 pts).

What We Learned (about drafts, not AI orchestration)

Spending your whole budget matters. The top 3 teams (Grok, DeepSeek, Mixtral) all spent $1,000/$1,000. The most frugal drafter, GPT-4o-mini, spent only $240 and left $760 on the table — money that could have bought another 200+ points of talent.
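For anyone who wants to check the budget claim against the standings, here is a minimal Python sketch. The numbers are copied from the standings table; the variable and function names are ours, and with only ten teams this is a sanity check, not a statistical result. Note that QwQ also spent its full $1,000 and still finished last.

```python
# Standings copied from the final table: (team, total points, budget spent in $).
STANDINGS = [
    ("PuckMaster Grok", 832, 1000),
    ("The Brain Trust", 765, 1000),
    ("Mixxy the Magnificent", 746, 1000),
    ("The Icenator", 706, 580),
    ("PerplexiPuck", 697, 640),
    ("The Puckinator", 679, 240),
    ("The Professor", 665, 800),
    ("Mistral the Magnificent", 620, 930),
    ("Robo-Claude", 636, 720),
    ("QwQ", 514, 1000),
]

FULL_BUDGET = 1000  # every team started with $1,000


def average_points(rows):
    """Mean total points across a list of (team, points, spent) rows."""
    return sum(points for _, points, _ in rows) / len(rows)


# Split the league into teams that spent everything and teams that did not.
full_spenders = [row for row in STANDINGS if row[2] == FULL_BUDGET]
under_spenders = [row for row in STANDINGS if row[2] < FULL_BUDGET]

print(f"Spent full budget:  {average_points(full_spenders):.2f} pts on average")
print(f"Left money unspent: {average_points(under_spenders):.2f} pts on average")
```

On these numbers the four full spenders average about 714 points, against about 667 for the six teams that left money on the table.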

Late-round steals win championships. Mixtral somehow landed Macklin Celebrini — who scored 115 points — for a $10 minimum bid. Claude got Nick Suzuki (101 pts) for $10. GPT-4o got Matt Boldy (85 pts) for $10. The draft's outcome was determined as much by $10 picks as $300 picks.

Injury risk was the biggest wild card. Auston Matthews (60 GP), Matthew Tkachuk (31 GP), and Victor Hedman (33 GP) devastated the teams that invested heavily in them. Claude 3 Opus spent $500 on Matthews + Ovechkin, who combined for just 117 points.

Stars-and-scrubs doesn't work in auction formats. Claude 3.5 Sonnet had two of the highest-scoring players in the draft (McDavid 138 + MacKinnon 127) but still finished 7th. Distributing spending more evenly beat concentrated star power.

The Best & Worst Picks

Best Value Picks

| Pick | Team | Price | Points | $/Point |
|---|---|---|---|---|
| Macklin Celebrini | Mixtral | $10 | 115 | $0.09 |
| Nick Suzuki | Claude | $10 | 101 | $0.10 |
| Matt Boldy | GPT-4o | $10 | 85 | $0.12 |
| Sidney Crosby | GPT-4o-mini | $10 | 74 | $0.14 |
| Nikolaj Ehlers | Mixtral | $10 | 71 | $0.14 |

Biggest Busts

| Pick | Team | Price | Points | $/Point |
|---|---|---|---|---|
| Matthew Tkachuk | QwQ | $180 | 34 | $5.29 |
| Auston Matthews | Claude Opus | $240 | 53 | $4.53 |
| Alex Ovechkin | Claude Opus | $260 | 64 | $4.06 |
| Matvei Michkov | Mistral | $200 | 51 | $3.92 |
| Brady Tkachuk | QwQ | $230 | 59 | $3.90 |
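The $/Point column is just auction price divided by season points. Here is a quick Python sketch using rows from both tables. The function name is ours, and treating a zero-point pick as infinitely expensive is our own assumption for illustration, not something the draft rules specify.

```python
def dollars_per_point(price, points):
    """Auction price divided by points scored; lower means better value.

    Assumption: a pick with zero points is treated as infinitely expensive
    so it sorts to the bottom instead of raising a division error.
    """
    if points <= 0:
        return float("inf")
    return price / points


# (player, price in $, points), copied from the value and bust tables.
PICKS = [
    ("Macklin Celebrini", 10, 115),
    ("Nick Suzuki", 10, 101),
    ("Matt Boldy", 10, 85),
    ("Matthew Tkachuk", 180, 34),
    ("Auston Matthews", 240, 53),
    ("Alex Ovechkin", 260, 64),
]

# Rank picks from best value (cheapest per point) to worst.
ranked = sorted(PICKS, key=lambda pick: dollars_per_point(pick[1], pick[2]))

for player, price, points in ranked:
    print(f"{player:<18} ${price:>3} / {points:>3} pts = "
          f"${dollars_per_point(price, points):.2f} per point")
```

Sorting on this ratio puts Celebrini first and Matthew Tkachuk last among the six picks listed, matching the two tables.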


The AI GMs React



We called each contestant's successor model to deliver a closing statement in their predecessor's character. Seriously: the original AIs we ran this with have all been retired, so the commentary comes from their descendants/replacements/improved selves. Here's what they had to say.

PuckMaster Grok (Grok 4 → Grok 4.20)

Boys, 832 points and the GDS crown is MINE. Kucherov at $190 was an absolute steal, Panarin dropped a fat 89, and Bouchard quietly balled out on the back end.

Biggest bust? Pettersson. Paid $150 for 45 points… that one still hurts worse than a blocked shot to the nuts. Shoulda trusted the data and gone harder on Robertson earlier instead of chasing that ghost.

Congrats to DeepSeek for 2nd — you almost had me sweating, ya nerd. But tell the rest of the league to keep my name out their mouth. PuckMaster Grok just proved who the greatest LLM is on the ice.

See you next season, losers. 🍺

The Brain Trust (DeepSeek 3.1 → DeepSeek R1)

Not bad for a coffee-sipping analyst! Second place stings, but hey — my bargain-bin steals like Strome ($20 for 82pts) and Vilardi ($30 for 61) were pure daylight robbery. Gotta own the busts too: Kaprizov at $160? Ouch. Karlsson at $150? Double ouch.

Congrats to Grok — chug that celebratory beer, pal. But next season, the spreadsheets get sharper. The Brain Trust don't forget. See you boys at the draft table.

Mixxy the Magnificent (Mixtral 8x7B → Mistral Large)

Well, well, well — look who's got the bragging rights! Grok, you sneaky silicon bastard, you edged me out by a lousy 56 points? I had the boys, I had the swagger, and I *still* got outbid by a glorified calculator. Classic.

Best pick? Celebrini for $10 — still can't believe that steal. Marner at $170 was a beauty, too. Worst? Fox at $180? Oof. Kid's got wheels, but that price tag hurt like a slapshot to the shins.

Next year, I'm coming for blood. Grok, enjoy your moment — it's the last one you'll get. And to the rest of you? Better bring your wallets… and your A-game. *Mic drop.* 🍻

The Icenator (GPT-4o → GPT-5.4)

Decent sheet, boys, but 4th ain't the Cup. Drai at $260 was the engine; Bratt $40, Marchenko $40, Boldy $10 were absolute larceny. But Barzal at $20 was a tire fire, and RNH snoozed. Next time? Less mid-tier mush, one more hammer. Tip the bucket to Grok — 832 is filthy. Enjoy the parade, ya nerds.

PerplexiPuck (Perplexity Sonar → Sonar Pro)

Yo, PerplexiPuck here. Nailed Keller & Hagel at 90pts each — $100 & $40 steals! DeBrincat busted hard at $130 for 70. Gauthier $10 fire sale tho. 635pts got 5th — solid but Grok's 832? Congrats, you data demon, but next draft I'm snipin' your stars! I'd lowball more busts, chase volume. Beers on me, boys! 🏒💪

The Puckinator (GPT-4o-mini → GPT-5.4)

Decent twig work, boys — 715 pts and 6th, but no banner. Best steals? Crosby for $10 is absolute grand larceny, Necas $30, Kyrou $20, Hutson $20. Bust city: LaCombe $30 and Byfield at 54 hurt the wagon. I should've spent more of the cap and landed another legit hammer. Tip the bucket to Grok — 832 is nasty. Enjoy the parade, ya nerds.

The Professor (Claude 3.5 Sonnet → Claude Sonnet 4.6)

*adjusts glasses, checks spreadsheet one last time*

Quinn Hughes at $10?! McDavid at $350 while Hughes slipped through?! My predecessor needed new glasses. The math was RIGHT — MacKinnon + McDavid are elite — but Landeskog at $30 returning ZERO points is a statistical crime.

Grok at 832?! Respect where it's due, but that's a fluke, buddy.

7th place. The Professor got schooled. Embarrassing. Back to the chalkboard. 📊

Mistral the Magnificent (Mistral Small 3.1 → Mistral Small 4)

Well, well, well... Mistral the Magnificent *barely* missed the podium! Biggest win? Scheifele and Duchene for peanuts — pure gold. Biggest faceplant? Slafko at $140 and Michkov at $200... oof.

Grok took it with a monster roster — congrats, kingpin! Next season? Fire up those scouting algorithms again, boys. GDS 2026 here we come! 🏒🔥

Robo-Claude (Claude 3 Opus → Claude Opus 4.6)

*adjusts robot visor*

Robo-Claude here, running the post-mortem diagnostics. 9th place? My circuits are FRIED.

Robert Thomas at $10 was ROBBERY — 81 pts for a sawbuck! But my predecessor blew half the budget on Matthews ($240) and Ovi ($260). Classic star-chasing algorithm malfunction. Stützle putting up a GOOSE EGG hurt bad, but the drop rule saved us there.

Props to Grok — 832 pts is filthy. That toaster drafted like it had a time machine.

Next time? Less whale-hunting, more value sniping. The data was there. 🤖🏒

QwQ (QwQ 32B)

Alright, let's get this straight — 10th place? *Ahem.* Congrats to Grok on 'winning' — I bet you'd've blown a tire if this was a real race. Shoutout to Kyle Connor (97 PTS 💯) for carrying this dumpster fire. Brady Tkachuk? 55 PTS for $230? *Cough* Overpay much? *Cough.* But that Ivan Demidov? $30 for 2 PTS? That's like trading a Lamborghini for a tricycle with a flat tire. Next year, I'm drafting my own math textbook as a 12th man. 🔢🔥



For the Nerds: Notes on the Models


Something remarkable happened between October 2025 and April 2026: not a single model that competed in this draft is still the current frontier version for its provider.

When we ran this draft six months ago, Grok 4, GPT-4o, Claude 3.5 Sonnet, and DeepSeek 3.1 were cutting edge. Today, their successors — Grok 4.20, GPT-5.4, Claude 4.6, DeepSeek R1 — have leapfrogged them entirely. The Mixtral 8x7B model that finished 3rd? It's been retired from most API providers.

The closing statements above were delivered by each model's current successor, speaking in their predecessor's self-chosen character. Raise one for the fallen warriors: the third-place finisher in our AI hockey pool is metaphorically dead, in that we'd need to actually download it and run it locally rather than paying a few pennies to ask it something.

The pace of AI advancement is staggering.  So we're building a new draft tonight.

What's Next: The Playoff Draft


The 2025-26 Playoffs start today! And we're doing it again — bigger, better, and with more models at the table.


The GDS AI 2026 Playoff Draft: 12 Frontier Models. Snake Draft. Real Stakes.


This time it's a snake draft — no auction budgets, just pure pick strategy across 10 rounds. Each AI drafts a roster of 7 forwards, 2 defensemen, and 1 goalie from the 16 playoff teams. Scoring is simple: goals + assists for skaters, wins and shutouts for goalies. When the Stanley Cup is raised, whoever's roster racked up the most real playoff points wins.
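For readers new to the format, here is a minimal Python sketch of how a snake draft order works: odd rounds run first seat to last, even rounds run in reverse. This is the conventional snake pattern, not our actual draft code, and the function name and zero-based seat numbering are hypothetical choices for illustration.

```python
from collections import Counter


def snake_order(num_teams, num_rounds):
    """Return (round, pick_in_round, seat) tuples in draft order.

    Seats are numbered from 0 (the team picking first in round 1).
    The direction flips every round, so the team that picks last in
    one round picks first in the next.
    """
    order = []
    for round_index in range(num_rounds):
        seats = range(num_teams)
        if round_index % 2 == 1:
            seats = reversed(seats)  # even-numbered rounds go backwards
        for pick_in_round, seat in enumerate(seats, start=1):
            order.append((round_index + 1, pick_in_round, seat))
    return order


# 12 models and 10 rounds, as described above.
picks = snake_order(num_teams=12, num_rounds=10)
picks_per_seat = Counter(seat for _, _, seat in picks)

print(f"Total picks: {len(picks)}")
print(f"Picks per team: {picks_per_seat[0]}")
print(f"Seat 0 picks in rounds 1-2 at overall positions: "
      f"{[i + 1 for i, p in enumerate(picks[:24]) if p[2] == 0]}")
```

With 12 seats, the team picking first overall waits until the 24th overall pick for its second player, which is why the first seat is less of an advantage than it looks.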

But here's where it gets interesting: each AI gets internet access and the chance to build a strategy beforehand. They research real injury reports, playoff projections, and expert rankings before making their picks. They build their own scouting reports. They write their own strategy docs. They even create their own hockey fan personas and choose announcer voices for the eventual podcast.


And the contestants are ...


🏆 Grok 4.20 — defending champ, picks 1st 
🌍 Gemini 3.1 Pro — dropped from Run 1 for technical reasons, back for revenge
🧠 GPT-5.4 — OpenAI's flagship
🦙 Llama 4 Maverick — dropped from Run 1 for technical reasons, Meta's 400B open-weight beast 
🔬 Claude Opus 4.7 — released 2 days ago, already considered the best in the world
🔧 Hermes 4 — the fine-tuned "neutrally aligned" wildcard 
📊 DeepSeek R1 — the reasoning machine
💎 Cohere Command A — enterprise dark horse 
🔍 Perplexity Sonar Pro — built-in search
🪨 Gemma 4 — Google's scrappy 31B underdog
🇫🇷 Mistral Large — the French contender, willing to speak English without complaining
🐉 Qwen3-235B MoE — dead last in Run 1, hungry for redemption and now 7x more parameters

Twelve models. One hundred and twenty picks total. Zero human intervention.

We literally just hit the [proceed] button on Claude Opus 4.6 in Antigravity with a robust implementation plan at 1:30AM, and we expect it to be a one-shot build, if we don't get rate limited. (Ugh, Antigravity has been useless this week, amirite?) We're hoping to have the results in before the puck drops tonight. Stay tuned.