On SubQ, and why efficiency is the new frontier

Yesterday a small AI startup called Subquadratic announced their first model. The launch tweet is closing in on 10 million views, which is unusual reach for an architecture announcement. The more I sit with it, the more I think this is one of the more important AI shipments of the year. Not because of how smart their model is. Because of how cheap it is.

The math, in plain words

A short detour first. The big AI models you already use, like ChatGPT, Claude, and Gemini, all run on the same kind of architecture, called a transformer. The way a transformer works is roughly this. When you give it a long input, it has to compare every word against every other word, and it repeats that at every layer of the model. The longer the input, the more comparisons it has to make, and the math is brutal: doubling the input quadruples the work. That is the reason long chats slow down. That is the reason context gets expensive. That is the reason every frontier model team is fighting to squeeze a little more efficiency out of the same shape of math.
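To make "doubling the input quadruples the work" concrete, here is a minimal sketch. It assumes an idealized model where the work is just the number of word-to-word comparisons in one pass; the function name is mine, and real transformers have other costs this ignores.

```python
def pairwise_comparisons(n_tokens: int) -> int:
    """Idealized count of token-to-token comparisons in one attention pass.

    Assumption: every token is compared against every token, so the
    count is n * n. Real models add other costs that are not modeled here.
    """
    return n_tokens * n_tokens


if __name__ == "__main__":
    # Each row doubles the input; the comparison count grows four times.
    for n in (1_000, 2_000, 4_000):
        print(f"{n:>6} tokens -> {pairwise_comparisons(n):>12,} comparisons")
```

Each time the input doubles, the count in the right-hand column grows four times, which is the whole problem in one table.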

Subquadratic threw that math out.

Their architecture scales linearly. Twice the input, twice the work, instead of four times. At very long inputs the gap is enormous. The launch claims roughly a thousand times less compute at 12 million tokens, accuracy that holds up against frontier models on the standard tests, and pricing in the neighborhood of one-fifth of what the leading models charge today. Even if only half of that survives contact with production, the cost ceiling for AI just moved.
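A rough way to see why the gap keeps widening as inputs get longer is to put the two growth curves side by side. This is a sketch under loud assumptions: the cost functions are idealized, and `PER_TOKEN_WORK` is a hypothetical constant I made up for illustration. It is not Subquadratic's method and it does not reproduce the thousand-fold figure from the launch, since that depends on constants nobody outside the company knows.

```python
# Hypothetical constant: how much work the linear model does per token.
# This number is invented for illustration, not taken from the launch.
PER_TOKEN_WORK = 1_000


def quadratic_cost(n_tokens: int) -> int:
    """Idealized quadratic cost: every token compared with every token."""
    return n_tokens * n_tokens


def linear_cost(n_tokens: int, per_token_work: int = PER_TOKEN_WORK) -> int:
    """Idealized linear cost: a fixed amount of work per token."""
    return n_tokens * per_token_work


def gap(n_tokens: int) -> float:
    """How many times more work the quadratic model does at this length."""
    return quadratic_cost(n_tokens) / linear_cost(n_tokens)


if __name__ == "__main__":
    for n in (12_000, 120_000, 1_200_000, 12_000_000):
        print(f"{n:>10,} tokens -> quadratic does {gap(n):,.0f}x the work")
```

Whatever the real constants turn out to be, the shape is the point: under these assumptions the gap itself grows in step with the input, so the advantage is small on short prompts and very large on very long ones.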

Intelligence is not the only frontier anymore

Here is why I think this matters more than the headline lets on.

The story for the last two years has been "the next model will be smarter." Each release jumped a tier on the benchmarks, the demos got more impressive, and people kept paying more for access. That curve has flattened. Headline accuracy gains are real but small now. Other things have started doing more of the work: better reasoning loops, longer context, faster inference, lower cost per turn. Pure intelligence is no longer the only frontier. Maybe not even the most interesting one anymore.

What is interesting now is everything around the model. How long it can think before it forgets. How much it costs you per turn. How fast it answers. How much context you can give it without paying a premium for the privilege. Those are not glamorous problems. They are the ones that decide who actually gets to use this stuff seriously.

Where the regular developer comes in

Frontier intelligence is technically inside a $20 subscription. Anyone with a Pro plan on Claude or ChatGPT can talk to a top-tier model. That part is fine. What is not fine is what the $20 plan is quietly turning into: the chat tier. A 1-on-1 conversation with the model, plus a little web search, plus modest usage caps. The moment you try to do anything closer to a workflow, like a coding agent that runs in your terminal, a long-running session, or an automation that spans hours, you start running into limits. And the limits push you toward the $100 plan, then the $200 plan, then the API bill that climbs the moment your idea starts to work.

This is not me reading the room. Anthropic has been openly tightening what runs on Pro, with Claude Code increasingly framed as a Max experience. OpenAI does roughly the same thing across Plus, Team, and the $200 tier. The pattern across the industry is consistent: the $20 plan is the basic chat experience. The actual transformative-workflow tier keeps sliding upward in price.

The soft wall

If you are billing a salary against that, fine. If you are a kid in a small town trying to build a side project, or a freelancer in São Paulo or Lagos where the same $200 subscription is a meaningful share of monthly rent, the math runs out fast. The five-hour windows, the weekly caps, the API bills. All of it forms a soft wall around who gets to use the best of this technology to actually do work, not just to chat.

What changes when the math changes

A subquadratic architecture, if it holds up, does not push that wall a little further out. It moves it.

Compute is what the price is built on. Less compute means cheaper inference. Cheaper inference means a frontier-level model with a long context can reasonably sit inside a $20 plan, or the API tier underneath it, without bankrupting the provider. The kid in the small town and the freelancer in São Paulo or Lagos start playing closer to the same game as the engineer at a well-funded startup.

Where I am cautious

I want to be honest that the savings do not always pass through cleanly to the end user. Cheaper compute can be eaten by margins, by competitive dynamics, by the fact that when something gets cheaper people just use more of it. So the path from "a thousand times less compute" to "a fifth of your current bill" is not automatic. But the direction is there, and competition usually does the rest.

I also want to be honest about the rest of the caveats. The headline numbers are research results in a controlled setup, not stress tests under real workloads. A 12-million-token demo in a paper is impressive but tells you nothing about how the model behaves when developers throw messy production data at it. We will know in a few months whether the architecture really does what the launch claims. I am holding the celebration loosely until then.

The frontier worth racing on

The direction is the part I cannot ignore. The race is no longer mostly about who can squeeze another point out of a benchmark. It is about who can run a frontier-level model for a fraction of what it costs today. Whoever gets there first opens AI to the people who have been quietly priced out of the parts of it that matter most. That is a much more interesting frontier than another decimal point on a leaderboard.

If you have spent the last year working around context limits, hourly windows, weekly caps, and rate limits, take a few minutes with the SubQ launch. The mood underneath it is the same one I have been feeling about my own work. The next round of progress is not going to come from making the model smarter. It is going to come from making the same intelligence cheaper, able to hold more context, and easier to run. That changes who gets to participate.

I requested early access. Fingers crossed.