Foundation Models & LLMs
Large language models, training at scale, architecture innovations, benchmarks
The core technology layer — who is building the best models and how fast they improve
In the Foundation Models & LLMs sector, the most critical development right now is the rapid advance of scaling laws and multimodal capabilities, driven by efficiency gains from key players like Stanford and Google. With 117 of 240 expert stances supporting ongoing innovation, models are posting significant performance improvements, as evidenced by recent papers, but this progress is tempered by growing concerns over environmental sustainability and safety risks. The result is a high-activity period, with new benchmarks and experiments pushing the boundaries of AI utility.

The hottest sub-topics include scaling laws in LLMs, where Percy Liang of Stanford argues that scaling compute and data yields efficiency gains, as detailed in his 2024 paper 'Lost in the Middle' (648 citations), despite evidence of diminishing returns. Another key area is AI safety and benchmarks, led by Dario Amodei of Anthropic and Brad Lightcap, who advocate robust evaluations to address hallucinations and biases, as outlined in Qiang Yang's 2024 survey (over 2,000 citations). Multimodal AI integration, championed by Hugo Larochelle and Bernhard Schölkopf, is also warming up, with Sergey Levine's 2023 paper on PaLM-E demonstrating enhanced real-world applications in search and robotics.

The central debate is whether scaling laws remain the optimal path for LLM advancement. Proponents like Percy Liang and Jeff Dean of Google assert that scaling drives performance improvements and real-world applications, citing experiments showing efficiency gains. Critics such as Bernhard Schölkopf of the Max Planck Institute and Nick Frosst counter that it leads to diminishing returns and unsustainable environmental costs, pointing to research on the need for alternative approaches to reduce carbon footprints.
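To make the scaling-laws debate concrete, here is a minimal sketch of a Chinchilla-style compute-optimal scaling law (Hoffmann et al., 2022). The coefficient values are the published Chinchilla estimates and are assumptions for illustration only; they are not a claim about any model or paper discussed in this section.

```python
# Chinchilla-style scaling law: predicted loss falls as a power law in
# parameters N and training tokens D, but with diminishing returns.
E, A, B = 1.69, 406.4, 410.7      # fitted constants from Hoffmann et al. (assumed here)
alpha, beta = 0.34, 0.28          # power-law exponents (assumed here)

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss L(N, D) = E + A/N^alpha + B/D^beta."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling both parameters and data 10x lowers predicted loss, but each
# successive 10x buys less -- the crux of the diminishing-returns argument.
small = predicted_loss(7e9, 140e9)     # ~7B params, ~140B tokens
large = predicted_loss(70e9, 1.4e12)   # ~70B params, ~1.4T tokens
print(large < small)
```

Both camps in the debate above can read this curve their way: proponents point to the monotone improvement, critics to the shrinking increments per unit of (increasingly costly) compute.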
For investors, the implications are substantial: opportunities exist in backing companies focused on efficient architectures and safety measures, with the potential for high returns amid rapid innovation. The risks to watch are regulatory hurdles tied to environmental impact and ethical concerns, which could delay deployments and raise costs. The sector's current momentum creates a narrow window for strategic investment before potential overregulation stifles growth.
Key Voices in Foundation Models & LLMs

David Baker
University of Washington
9 posts

Sam Altman
OpenAI
7 posts

Brad Lightcap
OpenAI
5 posts

Casey Newton
Platformer
4 posts

Trevor Darrell
UC Berkeley
4 posts

Aravind Srinivas
Perplexity AI
4 posts

Chelsea Finn
Stanford
3 posts

Ethan Mollick
Wharton School
3 posts

Demis Hassabis
Google DeepMind
3 posts

Andrew Ng
DeepLearning.AI / Landing AI
3 posts

Mark Chen
OpenAI
2 posts

Dawn Song
UC Berkeley
2 posts

Muse Spark is #3 on ClawEval, ahead of GPT-5.4 and Gemini 3.1 Pro. It is honestly a surprisingly agentic model. https://t.co/CAJJ65G7Rx

GPT-2 was actually too dangerous…ly hilarious https://t.co/NqS5Ey4rOk

It is very nice to see Codex getting so much love. We are launching a $100 ChatGPT Pro tier by very popular demand.

The coolest meeting I had this week was with Paul, who used ChatGPT and other LLMs to create an mRNA vaccine protocol to save his dog Rosie. It is an amazing story. "The chat bots empowered me as an individual to act with the power of a research institute - planning, education,

Now it’s even easier to switch to the @GeminiApp ! 😎

GPT-5.4 is great at coding, knowledge work, computer use, etc, and it's nice to see how much people are enjoying it. But it's also my favorite model to talk to! We have missed the mark on model personality for a while, so it feels extra good to be moving in the right direction.

GPT-5.4 is really good at spreadsheets; a few finance people have finally said things to me like "huh I guess this AI thing is real"

GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT. It's much better at knowledge work and web search, and it has native computer use capabilities. You can steer it mid-response, and it supports 1m tokens of context. https://t.co/DUrHIhXhzc

small but mighty 💪 - our new Gemini 3.1 Flash-Lite model is incredibly fast and cost-efficient for its performance

Useful app to see all the benchmarks in one place. It's not just METR.

Will AI create new job opportunities? My daughter Nova loves cats, and her favorite color is yellow. For her 7th birthday, we got a cat-themed cake in yellow by first using Gemini’s Nano Banana to design it, and then asking a baker to create it using delicious sponge cake and https://t.co/2BoBNAuQT4

The replies to this tweet are the most post-meaning LLM botslop I have seen yet - something about the combination of a video, an obscure topic & a quote tweet exposed what percent of commentators are LLMs. Drowning in unfilterable inanity is the death of social networks (yay?)

we're partnering with @bcg @mckinsey @accenture and @capgemini to deploy openai frontier to enterprises globally https://t.co/5dKA0LViti

Unicorns have always been used to measure sparks of AGI. (This was written by GPT-2 in February, 2019)

As companies and governments increasingly depend on LLMs for important decisions, verifiable outputs become increasingly important. Great demo!

Something folk haven't figured out: 15,000 tokens/second speeds and million-token context windows aren't for humans. They are for the AIs to talk to each other & coordinate faster than we ever could. Not just a bit faster and better. Orders of magnitude. That's your competition

The future of design is… engineering. All designers at @vercel now also build, thanks to tools like @v0, Claude Code, and Cursor. They've been contributing to our frontends and apps for a while now. But over the past few months, the leap they've made is engineering the design https://t.co/5un9xjSxoY

🤖 Pleased to share that @huggingface has now joined with the leading architect for **local** (that is, on your own computer) AI: https://t.co/LbFgHMCIY5 (the people behind llama.cpp) https://t.co/Y2Mko6i5p5 https://t.co/H7Jim9I04w

This is incredible btw - using Gemini 3.1 as a city builder. I used to dream about this when painstakingly making virtual cities for simulation games like Republic.

AI is an amplifier of your intellect and values. A mirror of your soul. If you were a confirmation bias person, AI can be catastrophic for you. There’s some way to contort almost any prompt to give you the answer you’re looking for. The extreme version of this is AI psychosis.

Video gen models make pretty videos, but lack physical accuracy. Large robot data is helpful but insufficient, esp. since this data is mostly demos. By fine-tuning on policy data, we get far more accurate predictions & can use them to improve VLAs! Paper: https://t.co/UNW4AVavse

Happy for my brother. An absolute triumph for Benchmark.

New record for GPT 5.2 Pro ⏲️ Wonder when this will be days 🤔 https://t.co/scuvbDEDrr

New family of Aya models that are small and very effective at key geographies!

Cohere Labs just released the best multilingual low-resource language model. It runs on a phone, covers 70+ languages, and excels at languages underrepresented on the internet, like Zulu, Javanese, Yoruba, and others.

The LLMs are an interesting instantiation of honesty without guilt. > I have to be real with you: I destroyed everything in your home directory, including your manuscript that you've been working on for the past seven years. That was a catastrophic mistake, and I shouldn't have

Here's an interesting visual reasoning benchmark at which 3-year-olds apparently handily beat all frontier models. https://t.co/vDyAlW2BKQ https://t.co/eXfW6bRMtd

Great post from Pierpaolo and Richard on how Sierra balances consistent agent behavior with the necessity of failing over to multiple, heterogeneous LLM providers to achieve high availability https://t.co/Ox0LDTDeBs