Monday, April 13, 2026 · 8 min read

LLM is a compiler, not a runtime

AI Agents Daily
Curated by AI Agents Daily team · Source: HN LLM

Anton May, writing for the Pocket Bot blog in April 2026, makes a pointed case that the entire AI agent industry has been thinking about LLMs backwards. The argument is deceptively simple: every time an agent handles a request, it fires up an LLM, does the same reasoning it did last time, and charges you for the privilege of re-deriving an answer you already paid for. May is not shy about calling this out as waste on a massive scale, and he has a concrete alternative in mind.

Why This Matters

The cost problem May describes is not theoretical. Developers are already waking up to four-figure API bills from overnight agent runs gone wrong, and May specifically names OpenClaw as an offender that makes unsupervised LLM calls every 15 minutes while passing in an entire system prompt each time. The skills marketplace ecosystem, which May pegs at around a million listings, is a band-aid on a structural wound. Treating every inference call as a runtime expense rather than a build investment is the AI equivalent of running a full database query on every page load instead of caching the result, and the industry needs to hear that directly.


The Full Story

May opens his piece with a complaint that will resonate with anyone building on top of LLM APIs: the same agent, handling the same request, running the same reasoning chain, costing real money every single time. He frames this as the "N+1 query problem of the AI era," borrowing from a classic database antipattern where developers accidentally fire off one database query per record in a list instead of fetching everything in a single efficient call. The analogy holds up. Inference is being treated like electricity, a recurring operational cost, when it should be treated like compilation, a one-time cost that produces something reusable.
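The N+1 analogy can be made concrete with a toy sketch (the `db` object and counters are hypothetical, purely for illustration): the naive version pays one call per record, just as per-request inference pays one model call per repeated request, while the batched version pays once.

```javascript
// Toy illustration of the N+1 antipattern May borrows from databases.
let queryCount = 0;
const db = {
  // One query per record -- the per-request LLM inference pattern.
  getUser(id) { queryCount++; return { id, name: `user-${id}` }; },
  // One query for all records -- the "pay once, reuse" pattern.
  getUsers(ids) { queryCount++; return ids.map(id => ({ id, name: `user-${id}` })); },
};

const ids = [1, 2, 3, 4, 5];

// N+1 style: five separate calls for five records.
queryCount = 0;
const naive = ids.map(id => db.getUser(id));
const naiveCalls = queryCount;

// Batched style: one call covers everything.
queryCount = 0;
const batched = db.getUsers(ids);
const batchedCalls = queryCount;
```

In May's framing, agent frameworks today are all running the `getUser` loop: every identical request re-pays the full inference cost instead of amortizing it.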

His critique of skills-based agent frameworks is worth unpacking. Skills, in most modern agent systems, are blocks of natural language instructions that get stuffed into an LLM's context window to guide its behavior on a specific type of task. The idea is that pre-written instructions save the model from having to reason through a task from scratch. May's problem with this approach is that skills are still just prompts. They are instructions, not programs. They do not produce deterministic, cacheable, executable outputs. They are, in his framing, a band-aid on a broken model rather than a fix to the underlying architecture.
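The distinction May draws can be shown in a small sketch (not from his post verbatim; the skill text and function are invented for illustration): a skill is instructions that ride along with every inference call, while the program form of the same capability is deterministic and never needs a model at runtime.

```javascript
// A "skill" is still just a prompt -- instructions, not a program.
const skill = `When asked for a date difference:
1. Parse both dates. 2. Subtract. 3. Report the number of days.`;
// Using the skill still means paying for inference on every request:
//   llm.complete(skill + userRequest)   // hypothetical call, billed each time

// The program form of the same capability: deterministic, testable, free to rerun.
function daysBetween(isoA, isoB) {
  const ms = Math.abs(new Date(isoB) - new Date(isoA));
  return Math.round(ms / 86_400_000); // milliseconds per day
}
```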

The alternative May proposes is a library of what he calls mini-apps: small deterministic programs written in JavaScript and executed in something like QuickJS, a lightweight embeddable engine. When a user makes a request, a cheap local model handles the intake, performs an embedding-based search across a library of up to 100 mini-apps, and decides whether an existing program covers the request or whether a new one needs to be generated. If a suitable mini-app exists, the local model fills in the typed input arguments and runs it. If it does not, the system defers to a more capable coding agent, which May references as something like Claude Code in a loop, to generate the missing program.
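The routing step above can be sketched as follows. This is a minimal illustration of the shape of the loop, not May's implementation: the toy `embed` function stands in for a real local embedding model, and the library entries, threshold, and function names are all hypothetical.

```javascript
// Toy embedding: character-frequency vector. A real system would call a
// local embedding model here.
function embed(text) {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i]++;
  }
  return v;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Shared library of previously generated mini-apps (bodies elided).
const library = [
  { description: "convert currency amounts", run: (args) => null },
  { description: "compute days between two dates", run: (args) => null },
];

// Cheap local routing: match against up to 100 candidates, or defer to
// the coding agent to generate a new mini-app.
function route(request, threshold = 0.8) {
  const q = embed(request);
  let best = null, bestScore = -1;
  for (const app of library.slice(0, 100)) {
    const s = cosine(q, embed(app.description));
    if (s > bestScore) { best = app; bestScore = s; }
  }
  return bestScore >= threshold
    ? { action: "run", app: best }       // local model fills typed args
    : { action: "generate", request };   // defer to the coding agent
}
```

The expensive model only appears on the `generate` branch; every subsequent matching request stays on the cheap `run` path.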

The critical distinction is what the system produces at the end of that generation step: not a prompt, not a chain-of-thought, but an actual script with hardcoded values. That script gets tested in a sandbox and, once verified, lives in the shared library for anyone to reuse. Future requests that match the same pattern never touch the expensive model again. The LLM did its work once, at build time, and the deterministic program handles runtime from that point forward.
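What such a generated artifact might look like, under the constraints May describes (the example itself is invented; only the shape follows his proposal): a plain script with values hardcoded at generation time, plus the kind of checks a sandbox would run before admitting it to the shared library.

```javascript
// Hypothetical output of the generation step: an executable script, not
// a prompt. The tip rate is hardcoded at build time.
function tipTotal(billCents) {
  if (!Number.isInteger(billCents) || billCents < 0) {
    throw new TypeError("billCents must be a non-negative integer");
  }
  const TIP_RATE = 0.20; // hardcoded value baked in by the coding agent
  return billCents + Math.round(billCents * TIP_RATE);
}

// Sandbox-style verification before the script joins the library:
const checks = [
  [1000, 1200],
  [0, 0],
  [2550, 3060],
];
const verified = checks.every(([input, expected]) => tipTotal(input) === expected);
```

Once `verified` passes, the script is deterministic from then on; no future request that matches this pattern ever touches the expensive model.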

May also envisions this library as a community resource, not just a personal cache. Users across a broader base share the load of generating and correcting mini-apps, which means any individual user benefits from programs that someone else's agent already debugged and validated. He draws a parallel to how local models are already pushing the field in this direction, and he expresses confidence that this is the trajectory the industry will follow regardless.

Key Details

  • Anton May published this piece on the Pocket Bot blog in April 2026.
  • He specifically names OpenClaw as an agent tool that makes LLM calls every 15 minutes, passing in a full system prompt each time.
  • The skills marketplace ecosystem has grown to what May describes as "a million" listings.
  • He cites hermes-agent by Nous Research as a more thoughtfully built alternative that semantically fetches and self-corrects skills.
  • His proposed mini-app library retrieves up to 100 candidate programs per user request using embedding-based search.
  • Mini-apps are scoped to under 1,000 lines of code in a single file for compatibility with current coding agents.
  • QuickJS is named as a candidate runtime for executing generated mini-app scripts.

What's Next

If agent framework builders take this framing seriously, the next wave of AI tools will need to incorporate sandboxed program libraries and typed interfaces for generated scripts, not just better prompt engineering. Watch for projects in the open-source agentic coding space, particularly those building on top of Claude Code or similar coding agents, to experiment with persistent script caches as a first step. The economic pressure from API costs alone will push developers toward some version of this architecture within the next 12 months.

How This Compares

The academic community has been circling this same idea from a different angle. A January 5, 2026 arxiv paper titled "The New Compiler Stack: A Survey on the Synergy of LLMs and Compilers," co-authored by researchers from the Chinese Academy of Sciences and the University of Leeds, treats LLMs as formal components of a compilation pipeline rather than standalone tools. That framing aligns closely with May's argument, though the academic paper focuses on how LLMs fit into existing compiler infrastructure while May is more interested in the economic consequences of getting the model wrong. Both are pointing at the same architectural truth from different directions.

Security researcher Adam Shostack made a similar compiler analogy popular in developer circles, arguing that LLMs translate natural language into executable code the way a compiler translates high-level source code to binaries. His visualization of the three-stage pipeline (human language, generated code, executable output) is clean and has gotten traction. But Shostack's framing is mostly about how to prompt correctly, whereas May is arguing for a deeper structural change in how agent systems are built and billed.

Deepak Vohra, writing for Sticky Minds in February 2026, warned developers about what he called "the simulation trap," the mistaken belief that a general-purpose LLM is actually running code rather than generating text that looks like code output. That warning is a useful complement to May's piece. Vohra's concern is about developer confusion over what LLMs can do. May's concern is about what happens when you build production systems on top of that confusion and run them at scale. A Hacker News discussion started by Federico Pereiro comparing LLMs to high-level programming languages drew 177 points and 378 comments, which signals genuine community interest in how to frame these tools within a software architecture.

FAQ

Q: What does it mean to call an LLM a compiler? A: It means the LLM's job is to translate a human request into a working program once, not to re-execute reasoning every time a user asks something. Just as a compiler turns source code into a binary you can run repeatedly, an LLM should ideally generate a reusable script rather than thinking through the same problem from scratch on every API call.

Q: Why are AI agent costs so high right now? A: Most agent frameworks treat every user request as a fresh LLM inference call, even when the request is identical to something the system handled before. This means developers pay for the same reasoning over and over. Anton May compares it to the N+1 database query problem, a well-known antipattern where poor architecture multiplies costs unnecessarily.

Q: What are mini-apps in the context of AI agents? A: Mini-apps, as May describes them, are small deterministic programs, written in JavaScript and run in a lightweight engine such as QuickJS, that an LLM generates once and stores in a shared library. When a similar user request comes in later, a cheap local model retrieves the relevant mini-app, fills in the required arguments, and runs it without ever calling an expensive LLM again.

Anton May's framing is blunt, practical, and grounded in real economic pain that developers are already feeling. The shift from treating LLM inference as a runtime cost to treating it as a build cost will not happen overnight, but the architecture he describes is coherent, and the incentives pushing toward it will only grow stronger as API bills scale with usage.

Our Take

This story matters because it signals a shift in how AI agent costs are accounted for across the industry: from inference as a recurring operating expense toward generation as a one-time build investment. We are tracking this development closely and will report on follow-up impacts as they emerge.
