The cheapest token is the one you never send

A few weeks ago I wrote about the moment the AI bill arrives and forces a team to stop asking "what can this model do" and start asking "what is the cheapest way to get an acceptable result." That post was about picking the right model for each task. This one is about the other half of the same problem, the half almost nobody talks about. Not which model you send the work to. How much you send it. Every word you feed an AI costs something, in money or in the usage caps that throttle your account, and most of what we feed it is bulkier than it needs to be. There is a small, free tool called Headroom that does one quiet thing about exactly that, and I have been running it every day for about a week.

First, what a token even is

If you are not a developer, one bit of plumbing makes the rest of this make sense. AI models read and write in "tokens," which are just small chunks of text, typically a few characters each. A short word is one token. A long one is two or three. Everything you type, everything the AI says back, and everything it quietly pulls in to answer you, like a file or a page of search results, gets counted in tokens. And you pay for all of it. Sometimes in real dollars on a bill. Sometimes in the five-hour windows and weekly limits that decide when your AI stops answering. Either way, tokens are the meter.

Here is the part that surprises people. Every time you send a new message, the model re-reads the whole conversation up to that point, plus whatever it loaded along the way. So the bulky stuff does not cost you once. It costs you again on every turn.
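To put rough numbers on that, here is a minimal sketch of the arithmetic. It is a simplification, not how any provider actually bills: it assumes one bulky payload loaded on the first turn, a fixed number of tokens added per turn, and no caching or discounts. The function name and the figures are made up for illustration.

```python
def total_input_tokens(payload_tokens: int, tokens_per_turn: int, turns: int) -> int:
    """Total tokens the model reads over a conversation if every turn
    re-sends the full history (simplified: no caching, no trimming)."""
    total = 0
    history = payload_tokens          # bulky content loaded at the start
    for _ in range(turns):
        history += tokens_per_turn    # the new message joins the history
        total += history              # the whole history is read again
    return total


# Hypothetical figures: a 10,000-token payload versus the same payload
# shrunk to 1,000 tokens, over a 10-turn conversation.
bulky = total_input_tokens(10_000, 200, 10)
slim = total_input_tokens(1_000, 200, 10)
print(bulky, slim)  # 111000 21000
```

In this toy case the payload is only 9,000 tokens smaller, but the conversation as a whole reads 90,000 fewer tokens, because the payload is counted once per turn.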

What Headroom does

Headroom sits between you and the AI, like a filter on a tap. Before your request reaches the model, it shrinks the parts that are bulky and repetitive: long lists of data, logs, the contents of files, pages of search results. The kind of content that is mostly structure and noise. It leaves your actual words alone. The model gets a slimmer version of the same information, gives you the same answer, and a chunk of the bill never happens.

The clever part is that it does not throw anything away. It keeps the original on your own computer, and if the model decides it needs the full version of something, it can ask for it back. So you get the savings without the usual risk of compression, which is losing the one detail you actually needed. The makers call this reversible, and it is the feature that made me comfortable leaving it on.
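For the curious, the general idea of reversible shrinking can be sketched in a few lines. This is not Headroom's code or API; the class, method names and the "keep the first few items" rule are all hypothetical, chosen only to show the shape of the trade: send a short view, keep the original locally, hand it back on request.

```python
import hashlib
import json


class ReversibleStore:
    """Hypothetical local store: keeps originals so a short view can be expanded later."""

    def __init__(self):
        self._originals = {}

    def compress(self, items, keep=3):
        """Return a short view of a list plus a key for fetching the full version."""
        if len(items) <= keep:
            return {"items": items, "omitted": 0, "key": None}
        raw = json.dumps(items, sort_keys=True)
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = items  # the original never leaves this machine
        return {"items": items[:keep], "omitted": len(items) - keep, "key": key}

    def retrieve(self, key):
        """Give back the untouched original when the model asks for it."""
        return self._originals[key]


store = ReversibleStore()
results = [{"id": i, "title": f"result {i}"} for i in range(100)]
view = store.compress(results)

print(len(view["items"]), view["omitted"])      # 3 97
print(store.retrieve(view["key"]) == results)   # True
```

A real tool would presumably choose what to keep far more carefully than "the first three," but the safety net is the same: nothing is lost, only deferred.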

Two more things I like about it. It runs entirely on your own machine, so your data is not shipped off to another company to be processed. And it is free and open source under a permissive license, which means anyone can read exactly what it does, line by line. It was built by a developer who goes by chopratejas, and the whole thing lives at github.com/chopratejas/headroom. It plugs into most of the popular AI coding tools: Claude Code, Cursor, Codex and others. Claude Code is the one I run.

What the numbers say

On the kind of work it was built for, the savings are dramatic. The project's own tests show a search through a hundred results dropping from about 17,000 tokens to 1,400, a 92% cut, and a messy log file going from 10,144 tokens down to 1,260, roughly an 88% cut, with the model still finding the exact error it was meant to find. The pattern is cuts of 60 to over 90% on that kind of bulky, structured content.
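If you want to check those percentages yourself, the arithmetic is one line. The function name is my own; the token counts are the ones quoted from the project's tests.

```python
def percent_saved(before: int, after: int) -> float:
    """Share of tokens removed, as a percentage of the original size."""
    return (before - after) / before * 100


print(round(percent_saved(17_000, 1_400)))  # 92  (search results)
print(round(percent_saved(10_144, 1_260)))  # 88  (log file)
```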

The more honest picture comes from the tool's own live numbers. Headroom keeps a public page of anonymous totals from everyone who runs it, with the option to switch that sharing off. The documented snapshot from this spring counted more than 1.4 billion tokens saved across hundreds of users. On that same data, the typical single request gets shrunk by around 5%, while the heavy sessions, the ones full of files and logs and search results, land in the 40 to 80% range. The tool says plainly why: most messages people send are short, and there is not much to compress in a sentence. The big wins live in the bulky work.

It has clearly struck a nerve. Headroom passed ten thousand stars on GitHub in its first few months, and other agent projects are already filing requests to build it into their own tools. You can watch the running totals climb on the community savings page in their docs.

My week with it

For what it is worth, here is my own week, offered as information rather than a verdict. I wired Headroom into Claude Code about a week ago and mostly left it running in the background. Worth saying where my numbers come from: most of my sessions lately are conversational, writing and thinking rather than churning through code and files, which is exactly the kind of work the tool compresses least. So my reductions have been in the single digits so far, with the occasional turn higher. I have not yet pointed it at Forge or Narvi, the home-grown agents I wrote about last time, or at the heavier code work where it earns its keep. That is the next experiment. For now it is the cheapest kind of improvement there is. I turned it on, I did not change how I work, and a slice of the bill quietly goes away in the background.

Why a small number still matters

The reason I think this is worth your ten minutes is the shape of the trade, not the size of any single saving. The tool is free. It runs locally. It works in the background. It keeps the originals so it cannot quietly lose your data. The downside is close to zero, and the upside compounds every single day you leave it on. A few percent off everything, on every turn, across a lot of people, is how a single open-source tool quietly saves over a billion tokens. The savings on your own machine are smaller, but they are the same shape, and they cost you nothing to collect.

The bigger point

I keep coming back to the same idea in these posts. The next stretch of progress in AI is not only about models getting smarter. A lot of it is about the same intelligence getting cheaper to run, because that is what decides who gets to keep using it. The funded startup will always afford the bill. The freelancer, the student, the person living somewhere a subscription is a real share of the rent: they are the ones who feel every token. Headroom is one small, honest piece of that shift. Worth the ten minutes it takes to try.

Headroom is free and open source, built by chopratejas, at github.com/chopratejas/headroom. The live community savings page is in their docs.