Saturday, April 11, 2026 · 8 min read

Predict-Rlm: The LLM Runtime That Lets Models Write Their Own Control Flow

AI Agents Daily
Curated by AI Agents Daily team · Source: HN LLM

According to Isaac Miller, author and collaborator credited on the predict-rlm project page at repo-explainer.com, the system treats the language model not as a chatbot generating one answer at a time but as a runtime that orchestrates a live Python workspace. The repo, published under the Trampoline AI GitHub organization, operationalizes a research concept called Recursive Language Models, or RLMs, that Alex Zhang and Omar Khattab introduced to address what they call "context rot," the well-documented degradation in LLM performance as context windows grow crowded and expensive.

Why This Matters

Context rot is not a minor inconvenience. It is the central bottleneck for every developer trying to build agents that handle long documents, multi-step reasoning, or large codebases. The fact that an RLM implementation using GPT-5-mini beat full GPT-5 on the OOLONG long-context benchmark tells you something important: throwing a bigger model at the problem is not always the answer. predict-rlm is the most concrete open-source implementation of this idea, and it arrives at exactly the moment when the agent developer community is running out of patience with prompt-bloat workarounds.


The Full Story

The core insight behind predict-rlm is surprisingly clean. Instead of feeding a model an ever-growing prompt that tries to carry all prior context forward, the system gives the model a Python interpreter as a workspace. The model writes code, the interpreter runs it, intermediate results live in variables and files, and the model loads only what it actually needs back into its token stream. The prompt stays lean. The state does not.
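The workspace idea can be sketched in a few lines. This is a minimal illustration of the pattern, not predict-rlm's actual API: state persists in an interpreter namespace, and only explicitly loaded values re-enter the prompt.

```python
# Minimal sketch of the workspace idea: state lives in an interpreter
# namespace, and the prompt only ever receives what the model explicitly
# loads back. Everything here is a hypothetical stand-in.

class Workspace:
    """A Python namespace that holds task state outside the prompt."""

    def __init__(self):
        self.vars = {}

    def exec_code(self, code: str):
        # Model-written code runs here; results persist in self.vars.
        exec(code, self.vars)

    def load(self, *names):
        # Only the requested variables re-enter the token stream.
        return {n: self.vars[n] for n in names}


ws = Workspace()
# Step 1: model-written code parses a large document into chunks.
ws.exec_code("chunks = ['chapter 1 ...', 'chapter 2 ...']; n = len(chunks)")
# Step 2: the next prompt includes only the small value it needs,
# not the full document.
prompt_context = ws.load("n")
assert prompt_context == {"n": 2}
```

The full `chunks` list never touches the prompt; the model asks for `n` when it needs a count and nothing more.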

Alex Zhang and Omar Khattab published the foundational RLM research on October 15, 2025, with the paper available on arXiv under identifier 2512.24601v1. Their work introduced recursive self-calling as an inference strategy, meaning the model can invoke itself, or a specialized sub-model, before returning a final answer. Think of it like a function calling another function: the main model delegates a subtask to a sub-LM, waits for a structured result, and folds it back into the broader computation.
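The function-calling-a-function analogy can be made concrete. The sketch below uses a hypothetical `call_lm` stand-in, with a prompt prefix playing the role of task routing; a real RLM would dispatch to an actual model at each level.

```python
# Sketch of recursive self-calling: the root call splits its task,
# delegates each part to a fresh sub-call, and folds the structured
# results back in. `call_lm` is a hypothetical stand-in, not a real API.

def call_lm(prompt: str, depth: int = 0) -> str:
    # Leaf call: a "specialized sub-model" handles one small piece.
    if prompt.startswith("SUMMARIZE:"):
        return prompt.removeprefix("SUMMARIZE:").strip()[:20]
    # Root call: decompose, recurse on each part, then assemble.
    parts = prompt.split("\n\n")
    summaries = [call_lm(f"SUMMARIZE: {p}", depth + 1) for p in parts]
    return " | ".join(summaries)

doc = "First section about context rot.\n\nSecond section about sandboxes."
print(call_lm(doc))
```

The key property is that each recursive call sees only its own small prompt, never the full accumulated history.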

predict-rlm turns that research into working infrastructure. The system uses a Deno host environment running a WebAssembly sandbox, with a JavaScript Promise Integration bridge handling asynchronous calls between the sandbox and the host. That architecture makes parallel sub-calls practical rather than theoretical. A root LLM can split a task, spin up a sub-LM for table formatting, another for content cleaning, and another for summarization, all running concurrently while the main context stays uncluttered.
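The fan-out described above maps naturally onto async concurrency. This is a sketch under assumptions, with `sub_lm` standing in for a real awaitable model call; predict-rlm's own bridge is Deno/WebAssembly, not Python asyncio.

```python
# Sketch of concurrent sub-calls: a root task fans out to specialized
# sub-models and gathers results, while the root context ends up holding
# only the three return values. `sub_lm` is a hypothetical stand-in.

import asyncio

async def sub_lm(role: str, payload: str) -> str:
    # Placeholder for an awaitable model call (a network round-trip).
    await asyncio.sleep(0)
    return f"{role} done: {len(payload)} chars"

async def root(document: str) -> list[str]:
    # Fan out: table formatting, content cleaning, and summarization
    # run concurrently, mirroring the example in the text.
    return await asyncio.gather(
        sub_lm("format-tables", document),
        sub_lm("clean-content", document),
        sub_lm("summarize", document),
    )

results = asyncio.run(root("raw scraped page ..."))
print(results)
```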

What the repo adds beyond the raw research concept is DSPy integration. DSPy signatures and dynamic schema reconstruction let the system behave like a typed workflow engine. Instead of loosely structured text bouncing between agents, each call has a defined input and output shape. That typed structure is what separates predict-rlm from the fragile agent loops that most developers have already abandoned after watching them hallucinate tool calls at step 7 of a 10-step task.
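The value of typed schemas is easiest to see in miniature. The sketch below is a dataclass stand-in for the idea, not DSPy's actual signature API: each call declares its input and output shape up front, so a schema mismatch fails loudly instead of leaking garbage into the next step.

```python
# Stand-in for the typed-call idea (not DSPy's real API): every
# recursive call has a declared input and output shape, so results
# can be validated rather than trusted as free-form text.

from dataclasses import dataclass

@dataclass
class CleanInput:
    raw_html: str

@dataclass
class CleanOutput:
    text: str
    removed_tags: int

def clean_content(inp: CleanInput) -> CleanOutput:
    # Placeholder body; a real system would route this to a sub-LM and
    # parse its reply back into CleanOutput, raising on schema mismatch.
    stripped = inp.raw_html.replace("<p>", "").replace("</p>", "")
    return CleanOutput(text=stripped, removed_tags=inp.raw_html.count("<"))

out = clean_content(CleanInput(raw_html="<p>hello</p>"))
assert isinstance(out, CleanOutput) and out.text == "hello"
```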

The skill merging and file plumbing features complete the picture. Long tasks can write intermediate outputs to virtual file mounts accessible inside the sandbox, so a multi-hour research or coding task does not have to rebuild its own context from scratch at each step. This mirrors what tools like Claude Code and OpenAI's Codex already do with file-system state and LLM-driven compression, but predict-rlm exposes the mechanism directly to developers who want to build their own typed, recursive workflows rather than using a closed product.
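The file-plumbing pattern reduces to checkpointing. A minimal sketch, assuming nothing about predict-rlm's actual mount layout: a temp directory stands in for the virtual file mount, and a later step reloads only the checkpoint, not the transcript that produced it.

```python
# Sketch of the file-plumbing idea: a long task checkpoints intermediate
# results to files inside the sandbox, so a later step can resume without
# replaying the whole history through the prompt.

import json
import tempfile
from pathlib import Path

mount = Path(tempfile.mkdtemp())  # stand-in for a virtual file mount

# Step 1 of a long task writes its result and exits.
(mount / "step1.json").write_text(json.dumps({"entities": ["RLM", "DSPy"]}))

# Much later, step 2 reloads only that checkpoint.
state = json.loads((mount / "step1.json").read_text())
assert state["entities"] == ["RLM", "DSPy"]
```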

Key Details

  • Alex Zhang and Omar Khattab at MIT CSAIL published the foundational RLM paper on October 15, 2025, on arXiv (identifier 2512.24601v1).
  • A GPT-5-mini RLM implementation outperformed full GPT-5 on the OOLONG long-context benchmark, according to the research.
  • The predict-rlm runtime uses a Deno host with a WebAssembly sandbox and a JavaScript Promise Integration bridge for async sub-calls.
  • Isaac Miller is credited as author and collaborator on the project documentation.
  • The repo integrates DSPy signatures for typed input and output schemas across recursive calls.
  • The system supports concurrent specialized sub-LMs; the example given spins up one for table formatting, one for content cleaning, and one for summarization.
  • The official research codebase lives at github.com/alexzhang13/rlm, while the operational implementation is at github.com/Trampoline-AI/predict-rlm.

What's Next

The immediate thing to watch is whether the DSPy community folds RLM-style recursive calls into its own standard primitives, which would dramatically expand the developer audience for this pattern. Developers building AI tools and platforms for document processing, long-form coding agents, or multi-step research pipelines should be testing predict-rlm against their current prompt-chaining setups now, because the benchmark data already suggests that recursive decomposition beats raw context scaling on cost and accuracy. If Google DeepMind or Anthropic publishes a competing framework before mid-2026, the design choices in predict-rlm, particularly the typed schema layer, will become the reference point for comparison.

How This Compares

The closest parallel in production is what Claude Code does with file-system state and rolling context compression. Anthropic has not published the internals, but the behavioral pattern is similar: write intermediate results to disk, compress what has already been reasoned about, and keep the active prompt focused. predict-rlm does the same thing but makes the mechanism transparent and programmable. That transparency is a meaningful advantage for developers who need to audit, debug, or customize the control flow. A closed product that does context management invisibly is useful. An open framework that lets you inspect the recursion tree is something you can actually build on.

Compare predict-rlm to the broader scaffolding movement documented by Prime Intellect in their RLM blog post. Prime Intellect frames scaffolding as a strategy that has consistently multiplied effective context length, and they point to chains of connected agents linked through prompts and file state as the dominant pattern already in use. predict-rlm formalizes that pattern with types, sandboxing, and recursive sub-calls, which is an important step beyond ad hoc agent chains stitched together with string templates.

Google DeepMind's parallel work, specifically Yash Akhauri and Xingyou Song's regression language model research published in July 2025 (arXiv identifier 2506.21718), is adjacent but distinct. That work focuses on numeric prediction through text-to-text regression, a domain-specific adaptation rather than a general-purpose inference strategy. The more relevant DeepMind comparison is their April 2026 research showing LLMs rewriting game theory algorithms and outperforming expert-designed implementations. Both that work and predict-rlm point in the same direction: the most interesting frontier is not bigger models but models that can reshape their own computational process.

FAQ

Q: What is a recursive language model and how does it work?
A: A recursive language model is an LLM that can call itself, or another model, before returning a final answer. Instead of processing a massive prompt in one pass, it breaks a problem into parts, handles each part through a separate call, and assembles the results. The MIT CSAIL paper by Alex Zhang and Omar Khattab, published October 15, 2025, formally introduced this strategy.

Q: How does predict-rlm prevent the context window from getting too large?
A: The system gives the model a Python sandbox as a workspace. Intermediate results, plans, and data are stored in variables and files inside that sandbox rather than in the prompt itself. The model loads only the information it currently needs into its token stream, so the active context stays small even as the overall task grows large.

Q: Do I need to know DSPy to use predict-rlm?
A: Familiarity with DSPy helps because predict-rlm uses DSPy signatures to define typed input and output schemas for each recursive call. If you have already read through the DSPy guides and tutorials or worked with typed LLM pipelines before, the learning curve is manageable. Developers coming from unstructured prompt chaining will need to adjust their mental model.

The recursive language model pattern is not a research curiosity anymore. It is a practical answer to a problem that every serious agent developer has already hit, and predict-rlm is the first open-source runtime to make it operational with typed schemas, sandboxed execution, and parallel sub-calls built in. Keep an eye on the Trampoline AI repo for updates as the DSPy ecosystem catches up.

Our Take

This story matters because it signals a shift in how AI agents are built: away from ever-larger context windows and toward runtimes that let models decompose their own tasks and manage their own state. We are tracking predict-rlm and the surrounding RLM work closely and will report on follow-up impacts as they emerge.
