Wikipedia 14:41
Dec 21, 2024
Mar 17, 2024
Grok-1 is finally out! But while everyone was focused on the weights, I decided to take a look at the tokenizer. I also added it to the Tokenizer Playground!
Structurally, it looks quite similar to the Llama 2 tokenizer (BPE w/ byte-fallback), with a vocabulary size of 2¹⁷ =…
X: @xenovacom 22:07
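The byte-fallback behavior mentioned above can be sketched in a few lines. This is an illustration only: a real BPE tokenizer applies merge rules after this step, and the function name and vocab here are made up:

```python
def byte_fallback_tokens(text, vocab):
    """Map characters found in `vocab` to themselves; encode anything
    else as one <0xNN> token per UTF-8 byte (the byte-fallback trick,
    which guarantees any string can be tokenized without <unk>)."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            # Fall back to per-byte tokens for out-of-vocabulary characters
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens

byte_fallback_tokens("aé", {"a"})  # → ["a", "<0xC3>", "<0xA9>"]
```

Because every UTF-8 byte has a reserved token, coverage is total even with a fixed vocabulary.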
Grok-1 is now the highest quality open-source LLM
Grok's reported MMLU score of 73% beats Llama 2 70B’s 68.9% and Mixtral 8x7B’s 70.6%. At 314 billion parameters, xAI’s Grok-1 is significantly larger than today’s leading open-source models.
@xai's Grok-1 is a Mixture-of-Experts…
X: @artificialanlys 19:15
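The "2 out of 8 active" Mixture-of-Experts routing described in these posts can be sketched as a toy top-2 router. This is not Grok-1's actual code; the shapes, names, and random weights are illustrative:

```python
import numpy as np

# Toy top-2 MoE routing: a router scores each token against 8 experts,
# and only the 2 highest-scoring experts run for that token.
rng = np.random.default_rng(0)
n_experts, k = 8, 2
x = rng.normal(size=(4, 16))                 # 4 tokens, hidden dim 16
w_router = rng.normal(size=(16, n_experts))  # router projection

logits = x @ w_router                        # (4, 8) router scores
top2 = np.argsort(logits, axis=-1)[:, -k:]   # indices of the 2 chosen experts

# Softmax over just the selected logits gives the per-token mixing weights
sel = np.take_along_axis(logits, top2, axis=-1)
weights = np.exp(sel) / np.exp(sel).sum(axis=-1, keepdims=True)
```

Each token's output would then be the weighted sum of its two selected experts' outputs, which is why only a fraction of the parameters are active per token.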
Link
https://huggingface.co/xai-org/grok-1
X: @clementdelangue 18:45
The largest ever open LLM, trained by a world class team, dropped by a magnet link. Apache 2.0. I wonder what it feels like to be out-opened by @grok.
314B parameters, mixture-of-experts (2 of 8 experts active). Even the active parameters alone (86B) are more than the biggest Llama. Can’t wait to see the benchmark results and what people build with it.
X: @drjimfan 18:26
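The 86B active-parameter figure is larger than a naive 2/8 share of 314B would suggest, because some parameters (attention, embeddings) are shared and always active. A back-of-the-envelope check, assuming the non-shared parameters split evenly across the 8 experts (this split is an assumption, not from xAI):

```python
total = 314e9                 # total parameters (from the thread)
n_experts, k_active = 8, 2    # 2 of 8 experts run per token

# If every parameter lived inside an expert, active params would be:
naive_active = total * k_active / n_experts          # 78.5e9

# The reported active count is 86e9. Solving
#   shared + (total - shared) * k/n = 86e9
# gives the implied size of the always-active (shared) part:
shared = (86e9 - naive_active) / (1 - k_active / n_experts)  # ≈ 10e9
```

So roughly 10B parameters would be dense/shared under this assumption, with the remaining ~304B split across experts.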
Folks should stop whining about Grok being “big” yet so “bad”
It was their first model, built on a brand-new stack coded in JAX and Rust by a new and tiny team
Were any of your first models better?
StableLM alpha wasn’t great; now they are the best-performing models of their size
X: @emostaque 17:53
Simonwillison.net 16:20
A few comments on the Grok-1 code release in JAX!
https://github.com/xai-org/grok
Looking quickly:
- the model is nicely written
- partition rules for sharding follow the old t5x style
- they used Haiku, but it wouldn't be too hard to update to Flax
- they use shard_map on the MoE layers for…
X: @borisdayma 16:09
Sources
Count | Source
44 |
10 | Threads
6 |
5 | Cryptopolitan
4 | Fortune
4 | Insider
3 | Financial Times
3 | Mashable
3 | PCMag
3 |
3 | The Decoder
3 | xAI
3 |
2 | Android Headlines
2 | CNBC
2 | CoinGape
2 | Engadget
2 | Metaverse Post
2 | MSPoweruser
2 | PYMNTS
2 | Reuters
2 | SamMobile
2 | SlashGear
2 | Stack Diary
2 | TechCrunch
2 | TechRadar
2 | TechStartups
2 | The Messenger
2 | TweakTown
2 | VentureBeat
2 | Wikipedia
1 |
1 | Analytics India Magazine
1 | Android Authority
1 | BBC
1 | Bitcoin Insider
1 | Bloomberg
1 | Business Today
1 | China Tech News
1 | CityAM
1 | Coinspeaker
1 | CoinStats
1 | DatacenterDynamics
1 | Decrypt
1 | Firstpost
1 | Forbes
1 | Ghacks
1 | Github
1 | Hacker News
1 | InfoWorld
1 | International Business Times
1 | iPhone in Canada Blog
1 | Livemint
1 | MarkTechPost
1 | MediaNama
1 | Metro.co.uk
1 | MobileSyrup
1 | MySmartPrice
1 | NDTV
1 | Neowin
1 | New York Post
1 | OpenAI
1 | PhoneArena
1 | Platformer
1 | SiliconANGLE
1 | Silicon Republic
1 | Simonwillison.net
1 | Slashdot
1 | SPARROWS NEWS
1 | Tech.co
1 | TechEBlog
1 | Tech in Asia
1 | TechSpot
1 | Tech Xplore
1 | The Guardian
1 | The Hill
1 | The Information
1 | The Verge
1 | TIME
1 | Tom's Guide
1 | Tom's Hardware
1 | Voicebot.ai
1 | Wall Street Journal
1 | Wccftech
1 | WinBuzzer