When the AI bill comes due

This week Uber's president and COO, Andrew Macdonald, admitted that the company burned through its entire 2026 AI coding budget in four months. The CFO did not say it. The CTO did not say it. The COO did, the person whose job is to keep the company's operating story together. The quote that got picked up was the careful one. "If you're not actually able to draw a direct line to how many useful features and functionality you're shipping to your users, that trade becomes harder to justify." Read it twice. That is not a procurement complaint. It is one of the largest tech companies in the world saying publicly they cannot prove the spend is producing the work.

The numbers behind the news

Microsoft is doing roughly the same thing more quietly. About six months after rolling out direct Claude Code access to engineers in their Experiences and Devices division, they cancelled most licenses and pushed everyone back onto GitHub Copilot CLI. It is the same direction of travel, a different paragraph in the same story.

Uber deployed Claude Code to thousands of engineers. Internal leaderboards ranked teams by usage. Per-engineer API spend reportedly ranged from $500 to $2,000 a month. R&D in Q1 alone was $951 million, up 17% year over year. The CEO said about 10% of committed code now comes from autonomous agents.
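To put the reported per-engineer range on an annual footing, here is a minimal arithmetic sketch. The $500 and $2,000 monthly figures come from the reporting above. The block of 1,000 engineers is only a round unit for scaling, not a headcount anyone has published.

```python
MONTHS_PER_YEAR = 12


def annual_spend(monthly_usd: float, engineers: int = 1) -> float:
    """Annualize a flat monthly API spend for a given number of engineers."""
    return monthly_usd * MONTHS_PER_YEAR * engineers


# Reported per-engineer range, annualized.
per_engineer_low = annual_spend(500)
per_engineer_high = annual_spend(2000)

# Scaling unit only (an assumption for illustration, not a reported headcount).
PER_BLOCK = 1_000
block_low = annual_spend(500, PER_BLOCK)
block_high = annual_spend(2000, PER_BLOCK)

print(f"per engineer: ${per_engineer_low:,.0f} to ${per_engineer_high:,.0f} a year")
print(f"per {PER_BLOCK:,} engineers: ${block_low:,.0f} to ${block_high:,.0f} a year")
```

Under those assumptions the range is $6,000 to $24,000 per engineer per year, or $6 million to $24 million for every thousand engineers, if usage stays flat across the year.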

Bryan Catanzaro, an Nvidia VP, said this in a different interview. "For my team, the cost of compute is far beyond the costs of the employees." That is at Nvidia. The company selling the picks and shovels says the picks and shovels cost more than the humans wielding them.

This is the part of the curve nobody priced. The 2024 story was "AI lets you ship faster." The 2026 story is "AI lets you ship faster, and the bill is now larger than the salary of the person you saved time for." Goldman projects a 24x increase in token consumption by 2030. Whether or not the unit cost falls 90% over the same period, as Gartner forecasts, someone has to pay for the bridge between now and then.
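The two forecasts can be combined in a rough back-of-the-envelope way. The sketch below assumes they compose multiplicatively (total spend scales as token volume times unit cost) and that both apply to the same baseline and period, which neither forecast promises.

```python
def spend_multiplier(token_growth: float, unit_cost_drop: float) -> float:
    """Total spend relative to today: token growth times the remaining unit cost.

    Assumes spend = tokens * unit cost, which is a simplification.
    """
    return token_growth * (1.0 - unit_cost_drop)


# 24x token growth with a 90% unit cost drop.
both_hold = round(spend_multiplier(24.0, 0.90), 2)

# 24x token growth with no unit cost drop at all.
no_price_drop = round(spend_multiplier(24.0, 0.0), 2)

print(f"both forecasts hold: {both_hold}x today's spend")
print(f"no price drop:       {no_price_drop}x today's spend")
```

On these assumptions the total bill still grows to roughly 2.4x today's even if both forecasts hold, and to 24x if prices do not move.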

What the bill actually does

The bill teaches people. That is the part of this that I find more interesting than the headline numbers.

The first phase of any new tool is exploration. You open the tap and let people use as much as they want. You watch what gets built. That was Uber's phase for the first four months of the year. It was Microsoft's phase when they opened Claude Code to Experiences and Devices. It is the phase the company I'm working with is in now. There is no ceiling on what people try. The point is to find out what the tool can do.

The second phase is the bill. It arrives unannounced. The bill is a forcing function. It teaches the team to ask a different question: not "what can I get the model to do" but "what is the cheapest way to get an acceptable result." This is a fundamentally different problem and the one the next year of AI usage will be organized around.

Teams that skip phase one move too cautiously. Teams that never leave phase one go bankrupt. The skill is to move deliberately from one to the other before the finance team forces it.

What I am doing differently

I started feeling this in my own work a few weeks ago, before the Uber story broke.

I run a personal agent called Narvi for much of the background work I would otherwise do by hand. Narvi sits on top of nanoclaw, which is essentially the Claude Code SDK in a thinner wrapper. That means every Narvi session pulls from the same Anthropic limits pool as my interactive sessions. So the more useful Narvi gets, the more pressure I put on my plan, and the more aware I become of which tier of model is doing what.

OpenClaw is the other half of the experiment. It is a harness I built to see if I can get close to the quality and reliability of a Claude workflow at a fraction of the cost, with a different model mix. The current setup runs GLM 5.1 as the orchestrator and main brain. It is not the cheapest model in the field, but it is smart enough to plan and review. The bulk turns go to Gemma 4, which I run through OpenRouter rather than locally. The whole thing is still experimental: a handful of automated tasks, not my whole day. The point is not to replace my Anthropic plan. At my volume, a flat-rate subscription still beats pure pay-per-call. The point is to take pressure off it, so the cheap routes do the work they can handle and Claude does the work that actually needs Claude.

Codex is another emerging option here, probably the one most people will try first. On a paid plan it gives access to OpenAI's flagship model. Unlike Anthropic, which increasingly gates Claude behind their first-party clients, OpenAI does not block third-party harnesses from using their models on the same subscription. So for the same monthly price, a routing harness like OpenClaw can point at OpenAI's flagship and get more out of it. Same fixed cost, fewer policy walls. The frontier-vs-cheap question stays the same. The choice of frontier opens up.

The community around mixed-model routing has been ahead of me for a while. MiniMax has quietly handled a lot of bulk workload that does not need a frontier model: long-form generation, data transformations, code edits that follow a clear pattern. The pattern across every team I have watched is the same. The smaller models do the work. The expensive ones plan, orchestrate, and review.

That is the actual shape of an optimized AI workflow in 2026. Not "which model is best" but which model is best for this turn, in this context, at this cost. The decision is per-call, not per-subscription. It is the kind of decision the bill forces you to start making.
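A per-call decision like this can be sketched in a few lines. This is an illustration of the idea only, not how Narvi or OpenClaw actually work: the `Call` fields, the tier names, and the set of call kinds are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FRONTIER = "frontier"  # expensive model: plans, orchestrates, reviews
    BULK = "bulk"          # cheaper model: everything routine


@dataclass(frozen=True)
class Call:
    kind: str                  # hypothetical labels, e.g. "plan", "edit"
    high_stakes: bool = False  # being wrong here is expensive


# Kinds of work the essay assigns to the expensive model.
FRONTIER_KINDS = frozenset({"plan", "orchestrate", "review"})


def route(call: Call) -> Tier:
    """Pick a tier for one call, not for the whole subscription."""
    if call.high_stakes or call.kind in FRONTIER_KINDS:
        return Tier.FRONTIER
    return Tier.BULK


for call in (Call("plan"), Call("edit"), Call("edit", high_stakes=True)):
    print(call.kind, call.high_stakes, "->", route(call).value)
```

The rule is deliberately blunt. A real harness would likely weigh context size and price per token as well, but the shape is the same: the choice is made once per call.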

What I think happens next

Phase one organizations are about to have their bill moment, one after another, over the next twelve to eighteen months. The signal will be quiet: a line item that grew faster than the rest, a finance chair who asks an uncomfortable question in a quarterly meeting. After that, every team I have watched follows roughly the same arc: a fast migration from a single-vendor, single-model workflow to a routed one. The route varies: a self-hosted bulk model, a cheaper API tier for the boring 80% of calls, or a much smaller and more deliberate role for the frontier model as orchestrator instead of worker.
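The effect of moving the boring 80% of calls to a cheaper tier is easy to estimate. In the sketch below only the 80% share comes from the text. The relative prices are hypothetical placeholders (the frontier call is normalized to 1.0 and the cheap call is assumed to cost a tenth of that), so read the result as a shape, not a quote.

```python
def blended_cost(cheap_share: float, cheap_cost: float, frontier_cost: float) -> float:
    """Average cost per call when a share of calls goes to the cheap tier."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_cost + (1.0 - cheap_share) * frontier_cost


FRONTIER_COST = 1.0  # normalized unit, hypothetical
CHEAP_COST = 0.1     # hypothetical: one tenth of a frontier call

all_frontier = blended_cost(0.0, CHEAP_COST, FRONTIER_COST)
routed = blended_cost(0.8, CHEAP_COST, FRONTIER_COST)
saving = 1.0 - routed / all_frontier

print(f"all frontier: {all_frontier:.2f} per call")
print(f"80% routed:   {routed:.2f} per call")
print(f"saving:       {saving:.0%}")
```

With those placeholder prices the average call costs about 0.28 of a frontier call, a saving of roughly 72%. The saving shrinks if the cheap tier needs retries or extra review from the frontier model, which this sketch ignores.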

The frontier model still has a job. It is just not the right tool for every step of the pipeline.

The companies that figure this out early will look like they are getting more out of the same AI bill. The companies that do not will, by 2027, be quietly putting people back in the seats AI was supposed to replace. The Uber quote is the first widely published admission that the math, at the current model mix, does not close.

The personal version

This is the part where my own work is starting to feel different. I used to default to the strongest model for everything because it was easier. Now I think in tiers before I prompt. The strongest model gets the planning, reviewing, and calls where being wrong is expensive. The smaller models get the rest. Narvi runs against Anthropic for the work I need to land cleanly the first time. OpenClaw runs the cheaper routes for the rest, with GLM 5.1 as the brain and Gemma 4 for the bulk turns. The bill stops being an unknowable monthly surprise.

This is what comes after exploration. Not less AI. Different AI, per task, at a price that adds up.

If you are on a team that has been opening the tap for the last six months, the Uber story is the early warning. The bill is coming. The skill that comes after the bill is the one to start practicing now, while the headroom still exists.