🧠

Foundation Models & LLMs

Large language models, training at scale, architecture innovations, benchmarks

The core technology layer — who is building the best models and how fast they improve

AI Summary

In the Foundation Models & LLMs sector, the most critical development right now is the rapid advancement of scaling laws and multimodal capabilities, driven by efficiency gains from key players like Stanford and Google. With 117 out of 240 expert stances supporting ongoing innovations, models are achieving significant performance improvements, as evidenced by recent papers, but this progress is tempered by growing concerns over environmental sustainability and safety risks. This shift marks a high-activity period, with new benchmarks and experiments pushing the boundaries of AI utility.

The hottest sub-topics include scaling laws in LLMs, where Percy Liang from Stanford argues that scaling compute and data yields efficiency gains, as detailed in the 2024 paper 'Lost in the Middle' that he co-authored (648 citations), despite evidence of diminishing returns. Another key area is AI safety and benchmarks, led by Dario Amodei of Anthropic and Brad Lightcap, who advocate for robust evaluations to address hallucinations and biases, as outlined in Qiang Yang's 2024 survey with over 2,000 citations. Multimodal AI integration, championed by Hugo Larochelle and Bernhard Schölkopf, is also warming up, with the 2023 PaLM-E paper co-authored by Sergey Levine demonstrating enhanced real-world applications in search and robotics.

The central debate in the sector is whether scaling laws remain the optimal path for LLM advancement. Proponents like Percy Liang and Jeff Dean from Google assert that scaling drives performance improvements and real-world applications, based on experiments showing efficiency gains. In contrast, critics such as Bernhard Schölkopf from the Max Planck Institute and Nick Frosst argue that it leads to diminishing returns and unsustainable environmental costs, as highlighted in research emphasizing the need for alternative approaches to mitigate carbon footprints.

For investors, the implications are substantial. Companies focused on efficient architectures and safety measures may offer high returns amid rapid innovation. However, regulatory hurdles tied to environmental impact and ethical concerns could delay deployments and increase costs, and the sector's current momentum leaves a narrow window for strategic investment before potential overregulation slows growth.

Alexandr Wang · Founder/CEO · Scale AI · 4/18/2026

Muse Spark is #3 on ClawEval, ahead of GPT-5.4 and Gemini 3.1 Pro. It is honestly a surprisingly agentic model. https://t.co/CAJJ65G7Rx

Neutral

Amjad Masad · Founder/CEO · Replit · 4/14/2026

GPT-2 was actually too dangerous…ly hilarious https://t.co/NqS5Ey4rOk

Critical

Sam Altman · Founder/CEO · OpenAI · 4/9/2026

It is very nice to see Codex getting so much love. We are launching a $100 ChatGPT Pro tier by very popular demand.

Supportive

Sam Altman · Founder/CEO · OpenAI · 3/27/2026

The coolest meeting I had this week with was Paul, who used ChatGPT and other LLMs to create an mRNA vaccine protocol to save his dog Rosie. It is amazing story. "The chat bots empowered me as an individual to act with the power of a research institute - planning, education,

Supportive

Demis Hassabis · Founder/CEO · Google DeepMind · 3/27/2026

Now it’s even easier to switch to the @GeminiApp ! 😎

Neutral

Sam Altman · Founder/CEO · OpenAI · 3/7/2026

GPT-5.4 is great at coding, knowledge work, computer use, etc, and it's nice to see how much people are enjoying it. But it's also my favorite model to talk to! We have missed the mark on model personality for awhile, so it feels extra good to be moving in the right direction.

Supportive

Sam Altman · Founder/CEO · OpenAI · 3/7/2026

GPT-5.4 is really good at spreadsheets; a few finance people have finally said things to me like "huh I guess this AI thing is real"

Neutral

Sam Altman · Founder/CEO · OpenAI · 3/5/2026

GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT. It's much better at knowledge work and web search, and it has native computer use capabilities. You can steer it mid-response, and it supports 1m tokens of context. https://t.co/DUrHIhXhzc

Neutral

Demis Hassabis · Founder/CEO · Google DeepMind · 3/4/2026

small but mighty 💪 - our new Gemini 3.1 Flash-Lite model is incredibly fast and cost-efficient for its performance

Neutral

Ethan Mollick · Policy · Wharton School · 2/23/2026

Useful app to see all the benchmarks in one place. Its not just METR.

Neutral

Andrew Ng · Researcher · DeepLearning.AI / Landing AI · 2/23/2026

Will AI create new job opportunities? My daughter Nova loves cats, and her favorite color is yellow. For her 7th birthday, we got a cat-themed cake in yellow by first using Gemini’s Nano Banana to design it, and then asking a baker to create it using delicious sponge cake and https://t.co/2BoBNAuQT4

Supportive

Ethan Mollick · Policy · Wharton School · 2/23/2026

The replies to this tweet are the most post-meaning LLM botslop I have seen yet - something about the combination of a video, an obscure topic & a quote tweet exposed what percent of commentators are LLMs. Drowning in unfilterable inanity is the death of social networks (yay?)

Neutral

Brad Lightcap · Founder/CEO · OpenAI · 2/23/2026

we're partnering with @bcg @mckinsey @accenture and @capgemini to deploy openai frontier to enterprises globally https://t.co/5dKA0LViti

Neutral

Ethan Mollick · Policy · Wharton School · 2/22/2026

Unicorns have always been used to measure sparks of AGI. (This was written by GPT-2 in February, 2019)

Neutral

Amjad Masad · Founder/CEO · Replit · 2/21/2026

As companies and governments increasingly depend on LLMs for important decisions, verifiable outputs become increasingly important. Great demo!

Supportive

Emad Mostaque · Founder/CEO · Stability AI · 2/21/2026

Something folk haven't figured out: 15,000 tokens/second speed and million token context windows aren't for humans They are for the AIs to talk to each other & coordinate faster than we ever could Not just a bit faster and better Orders of magnitude That's your competition

Neutral

Guillermo Rauch · Founder/CEO · Vercel · 2/21/2026

The future of design is… engineering. All designers at @vercel now also build, thanks to tools like @v0, Claude Code, and Cursor. They've been contributing to our frontends and apps for a while now. But over the past few months, the leap they've made is engineering the design https://t.co/5un9xjSxoY

Neutral

Margaret Mitchell · Policy · Hugging Face · 2/20/2026

🤖 Pleased to share that @huggingface has now joined with the leading architect for **local** (that is, on your own computer) AI: https://t.co/LbFgHMCIY5 (the people behind llama.cpp) https://t.co/Y2Mko6i5p5 https://t.co/H7Jim9I04w

Neutral

Demis Hassabis · Founder/CEO · Google DeepMind · 2/20/2026

This is incredible btw - using Gemini 3.1 as a city builder. I used to dream about this when painstakingly making virtual cities for simulation games like Republic.

Supportive

Aravind Srinivas · Founder/CEO · Perplexity AI · 2/19/2026

Gemini 3 Pro has been upgraded to Gemini 3.1 Pro for all Perplexity Pro and Max users (consumer and enterprise). It's the second most picked model by our Enterprise customers after Claude 4.5 Sonnet/Opus family. Enjoy! https://t.co/E5SH1WxnH5

Neutral

Guillermo Rauch · Founder/CEO · Vercel · 2/18/2026

AI is an amplifier of your intellect and values. A mirror of your soul. If you were a confirmation bias person, AI can be catastrophic for you. There’s some way to contort almost any prompt to give you the answer you’re looking for. The extreme version of this is AI psychosis.

Neutral

Chelsea Finn · Researcher · Stanford · 2/17/2026

Video gen models make pretty videos, but lack physical accuracy Large robot data is helpful but insufficient, esp since this data is mostly demos By fine-tuning on policy data, we get far more accurate predictions & can use them to improve VLAs! Paper: https://t.co/UNW4AVavse

Neutral

Aravind Srinivas · Founder/CEO · Perplexity AI · 2/17/2026

Sonnet 4.6 for all Perplexity Pro and Max customers available now (consumer and enterprise), across all clients - web, mobile, Comet

Neutral

Sam Altman · Founder/CEO · OpenAI · 2/17/2026

Happy for my brother. An absolute triumph for Benchmark.

Neutral

Emad Mostaque · Founder/CEO · Stability AI · 2/17/2026

New record for GPT 5.2 Pro ⏲️ Wonder when this will be days 🤔 https://t.co/scuvbDEDrr

Neutral

Aidan N. Gomez · Founder/CEO · Cohere · 2/17/2026

New family of Aya models that are small a very effective at key geographies!

Neutral

Nick Frosst · Founder/CEO · Cohere · 2/17/2026

Cohere labs just released the best multilingual low resource language model. it runs on a phone, It covers 70+ languages and excels at languages underrepresented on the internet, like Zulu, Javanese, Yoruba, and others.

Neutral

Patrick Collison · Investor · Stripe · 2/16/2026

The LLMs are an interesting instantiation of honesty without guilt. > I have to be real with you: I destroyed everything in your home directory, including your manuscript that you've been working on for the past seven years. That was a catastrophic mistake, and I shouldn't have

Neutral

Arvind Narayanan · Policy · Princeton University · 2/15/2026

Here's an interesting visual reasoning benchmark at which 3-year olds apparently handily beat all frontier models. https://t.co/vDyAlW2BKQ https://t.co/eXfW6bRMtd

Neutral

Bret Taylor · Policy · OpenAI Board · 2/14/2026

Great post from Pierpaolo and Richard on how Sierra balances consistent agent behavior with the necessity of failing over to multiple, heterogeneous LLM providers to achieve high availability https://t.co/Ox0LDTDeBs

Supportive