Mneme – project memory injection for LLM workflows
A developer named TheoV823 has released Mneme, an open-source tool that injects structured project memory directly into LLM workflows without requiring a vector database. This matters because it offers AI agent builders a simpler, more interpretable alternative to retrieval-augme...
According to the GitHub repository published by developer TheoV823 on April 16, 2026, Mneme is a Python-based library that gives large language models access to persistent project memory by injecting structured context directly into the model's prompt, bypassing the semantic search infrastructure that most developers currently rely on. The project surfaced on Hacker News and has drawn attention from developers looking for lighter-weight alternatives to full RAG pipelines. The repository, released under an MIT license, includes a FastAPI layer, a core library, and example project memory files ready to use out of the box.
Why This Matters
The vector database industry has attracted enormous capital over the past three years, with companies like Pinecone, Weaviate, and Qdrant building substantial businesses on the premise that semantic search is the only practical path to LLM memory. Mneme challenges that assumption head-on, and even if the project stays small, it is a signal that developers are actively seeking simpler architectures. Direct memory injection is more auditable than cosine similarity lookups, and auditability matters enormously as enterprises push AI agents into production workflows. This is the kind of grassroots tooling that, historically, reshapes how entire categories of infrastructure get built.
Daily briefing from 50+ sources. Free, 5-minute read.
The Full Story
TheoV823 published the first complete version of Mneme on April 16, 2026, with 10 commits to the main branch across 5 active development branches. The project is named after Mneme, one of the three original Muses in Greek mythology associated with memory, which is a fitting name for a tool designed to give AI systems a form of durable recall.
The core idea is straightforward. Most LLM-powered agents today use retrieval-augmented generation, a pattern where past information is stored in a vector database, converted into numerical embeddings, and retrieved through similarity search when a new query arrives. RAG works, but it requires running and maintaining a separate database service, tuning similarity thresholds, and managing the computational overhead of converting text to embeddings on every request. Mneme skips all of that. Instead, it stores project memory as structured JSON and injects the relevant rules, constraints, facts, and examples directly into the model's context window before each completion call.
The technical architecture includes several components working together. The core library, housed in a directory called mneme-project-memory, contains a memory store, a retriever, a context builder, an LLM adapter, and an evaluator. There is also a demo script that runs a before-and-after comparison between a baseline LLM response and a Mneme-enhanced response, complete with alignment scoring so developers can actually measure how much the injected memory improves output quality. The example project memory file ships with 20 items and 5 decisions pre-populated, giving new users an immediate sense of what structured memory looks like in practice.
The FastAPI wrapper, found in app/api.py, exposes a single POST endpoint at /complete. It accepts a question alongside either an inline memory dictionary or a path to a memory file, and it returns both the model's answer and a context summary that breaks down which rules, constraints, facts, and examples were injected for that particular request. This transparency is one of the most practically useful features. When something goes wrong with an agent's output, a developer can inspect exactly what memory was provided, which is far easier than debugging why a vector similarity search returned the wrong documents.
One commit message reveals that the project was co-authored with Claude Sonnet 4.6, Anthropic's model, which TheoV823 credited directly in the git log with the notation "Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com." That is an unusually transparent acknowledgment of AI-assisted development and reflects a growing norm among open-source developers who use models as coding partners rather than tools they hide.
The repository can be installed as a standard Python package using pip install -e . for the core library or pip install -e .[api] to include the FastAPI dependencies. That packaging decision lowers the barrier to adoption considerably, since developers can drop it into an existing Python project without restructuring their environment.
Key Details
- TheoV823 published Mneme to GitHub on April 16, 2026, with 10 commits across 5 branches.
- The repository is licensed under MIT, making it free for commercial and personal use.
- The FastAPI endpoint at /complete returns a context_summary object covering 4 memory categories: rules, constraints, facts, and examples.
- The example project_memory.json file includes 20 memory items and 5 recorded decisions.
- The project can be installed in 2 configurations using pip, one for the core library and one adding FastAPI and uvicorn.
- Claude Sonnet 4.6 is credited as a co-author in multiple git commits.
- The Hacker News post for the project received 1 point and 0 comments as of the reporting date.
What's Next
The immediate question for Mneme is whether the memory injection approach scales to production workloads where context windows fill quickly and the cost of large prompts becomes meaningful. TheoV823 should prioritize adding a token budget parameter that caps how much memory gets injected per request, which would make the tool usable in cost-sensitive deployments. Developers evaluating the project should watch for benchmark comparisons against standard RAG setups, particularly around latency and answer quality on multi-turn tasks.
How This Compares
The most direct point of comparison is the broader RAG ecosystem. LangChain, LlamaIndex, and purpose-built vector databases like Pinecone have spent two years convincing developers that semantic retrieval is the mature approach to LLM memory. Mneme does not try to beat them at their own game. It sidesteps the game entirely by arguing that for many agent workflows, you do not need probabilistic similarity search at all. That is a credible argument for closed-domain agents with predictable, bounded memory needs, though it becomes harder to defend when an agent needs to search across thousands of past interactions.
The security angle is also worth taking seriously. Researchers Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang published a paper on arXiv in March 2025, revised in February 2026, titled "Memory Injection Attacks on LLM Agents via Query-Only Interaction." Their MINJA attack methodology shows that an adversary who can write to an agent's memory store can cause that agent to produce harmful outputs in later sessions. Mneme's current implementation does not appear to include authentication or validation on memory inputs, which means anyone building a multi-user product on top of it needs to add those controls themselves before shipping.
Compare Mneme also to the direction Anthropic has taken with its tools API and extended context work. Anthropic's approach has been to push toward longer native context windows, essentially making external memory systems less necessary by fitting more history into the model itself. Mneme and Anthropic's strategy are not mutually exclusive, but they reflect different bets about where the real bottleneck lies. Mneme bets it is in architectural complexity. Anthropic bets it is in raw context length. Both bets could be right simultaneously for different use cases, and that is precisely why projects like this deserve attention even at 1 GitHub star.
FAQ
Q: What is memory injection in AI agents? A: Memory injection means inserting stored information directly into the text prompt that gets sent to a language model, rather than searching for relevant information using a vector database. The model reads the injected memory as part of its input and uses it to give more informed, context-aware responses.
Q: How is Mneme different from RAG? A: RAG uses a vector database and semantic similarity search to find relevant past information before each model call. Mneme skips the vector database entirely and inserts structured memory directly into the prompt using explicit programmatic logic. This makes the system simpler to run and easier to debug, but less suited to searching large, unstructured memory archives.
Q: Is Mneme safe to use in production applications? A: The current release is an early-stage open-source project best suited for experimentation and internal tools. Before using it in production, developers should add input validation and access controls on the memory store, since academic research published in 2025 demonstrated that unauthorized memory injection can cause AI agents to behave maliciously.
The AI agent infrastructure space is moving fast, and tools like Mneme represent exactly the kind of pragmatic, developer-first thinking that tends to stick around. Keep an eye on TheoV823's repository for updates on token budgeting, authentication support, and benchmark results against standard RAG pipelines. Subscribe to the AI Agents Daily weekly newsletter for daily updates on AI agents, tools, and automation.
Get stories like this daily
Free briefing. Curated from 50+ sources. 5-minute read every morning.




