Ask HN: Is a purely Markdown-based CRM a terrible idea? Optimized for LLM agents
A Hacker News user going by dmonterocrespo posted an Ask HN thread asking whether a purely Markdown-based CRM, built from the ground up for LLM agent consumption, is a smart architectural bet or an elaborate reimagining of a flat-file database. The post, which collected 1 point and 1 comment within 20 hours of posting, has sparked a question worth taking seriously: as AI agents become the primary consumers of business data, does the schema we inherited from 1970s relational theory still make sense?
Why This Matters
This is not a niche thought experiment. The broader Hacker News community has been wrestling with AI agent memory and data persistence for months, and a related discussion titled "AI agents are starting to eat SaaS" attracted 412 points and 386 comments just 3 months before this post. If agents are genuinely replacing SaaS workflows, then the data layer those agents read and write becomes a first-class architectural decision. Designing a CRM around what a human database administrator expects versus what a GPT-4-class model can efficiently parse is a real tradeoff, and most teams are not thinking hard enough about it yet.
The Full Story
The proposal from dmonterocrespo is straightforward in its ambition and honest about its problems. Instead of storing leads, contacts, email threads, client technical specs, and system configurations in a relational database with normalized tables, every record becomes a Markdown file. YAML frontmatter carries the structured metadata, such as contact IDs, status fields, and timestamps, while the Markdown body holds the narrative: meeting notes, interaction history, technical requirements written in plain prose.
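To make the record layout concrete, here is a minimal sketch of what one such file might look like and how an agent could split it into metadata and narrative. The field names (`id`, `status`, `last_contacted`) are hypothetical, and the parser handles only flat `key: value` frontmatter; a real system would use a YAML library.

```python
# Hypothetical contact record: YAML frontmatter followed by a Markdown body.
RECORD = """\
---
id: contact-0042
status: active
last_contacted: 2024-05-01
---
# Acme Corp / Jane Doe

Met at the spring conference; interested in the self-hosted tier.
"""

def split_frontmatter(text):
    """Split a record into (metadata dict, markdown body).

    Minimal parser for flat `key: value` frontmatter only.
    """
    assert text.startswith("---\n"), "record must open with a frontmatter fence"
    header, _, body = text[4:].partition("\n---\n")
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.lstrip()

meta, body = split_frontmatter(RECORD)
print(meta["id"], meta["status"])  # contact-0042 active
```

The structured fields stay machine-checkable while the body below the fence remains free-form prose for the agent to read directly.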
The reasoning centers on a single premise: the primary consumer of this CRM is not a human clicking through a GUI but an autonomous LLM agent operating in the background. Markdown, the argument goes, is the format LLMs process most naturally. Rather than requiring the agent to construct SQL queries, join tables, or navigate a schema, it simply reads a directory of text files the same way it reads anything else. The friction between the AI and the data drops considerably.
The proposed architecture has three layers. The storage layer is the local file system, holding all those Markdown files. Redis serves as the index layer, updating its mappings whenever a file is modified so the system can locate records quickly without scanning every file on disk. The third component is what dmonterocrespo calls the "Brain," a background LLM agent that runs during idle periods, reading files, writing summary documents, categorizing clients, scheduling follow-ups, and building what the proposal calls a "memory" text file for the system as a whole.
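The post does not specify the index schema, but the write-through pattern it implies can be sketched as follows. An in-memory dict stands in for Redis here; with a real Redis instance the same shapes would map onto hash and set commands. All names (`write_record`, the `by_id`/`by_status` buckets) are illustrative assumptions.

```python
import tempfile
from pathlib import Path

# In-memory stand-in for the proposed Redis index layer.
index = {"by_id": {}, "by_status": {}}

def write_record(root, rec_id, status, body):
    """Write a Markdown record and update the index in the same step,
    so lookups never require scanning every file on disk."""
    path = Path(root) / f"{rec_id}.md"
    path.write_text(f"---\nid: {rec_id}\nstatus: {status}\n---\n{body}\n")
    # Move the record out of its old status bucket before re-adding it.
    old_status = index["by_id"].get(rec_id)
    if old_status is not None:
        index["by_status"][old_status].discard(rec_id)
    index["by_id"][rec_id] = status
    index["by_status"].setdefault(status, set()).add(rec_id)
    return path

root = tempfile.mkdtemp()
write_record(root, "contact-0042", "lead", "First call scheduled.")
write_record(root, "contact-0042", "active", "Converted after demo.")
print(index["by_status"]["active"])  # {'contact-0042'}
```

The key property is that the file system remains the source of truth; the index is disposable and can be rebuilt by re-reading the directory if Redis is lost.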
Replication and backup are elegantly simple in this model. Because the entire database is a directory of plain text files, copying the system is a matter of running rsync. No database dumps, no schema migrations, no export formats. That simplicity is genuinely attractive for small teams or solo developers who want powerful CRM functionality without database administration overhead.
The original poster does not pretend the idea is without problems. The post calls out file locking conflicts when a human and the agent attempt to write the same file simultaneously, concurrency bottlenecks under load, OS inode limits if the contact list scales into the hundreds of thousands, and a complete absence of ACID compliance. These are not minor caveats. They are the reasons flat-file databases were largely abandoned in enterprise contexts decades ago. The question being asked is whether those tradeoffs look different when the primary user is an AI agent rather than a human or a traditional application server.
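The simplest mitigation for the write-conflict problem is an advisory lock around every file write, which the post gestures at but does not specify. A minimal sketch, assuming a POSIX system (`fcntl.flock` is Unix-only) and a hypothetical `update_record` helper shared by both the human-facing process and the agent:

```python
import fcntl
import os
import tempfile

def update_record(path, append_line):
    """Append a line to a record under an exclusive advisory lock.

    If the human UI and the background agent both call this, flock
    blocks the second writer until the first releases the lock,
    preventing interleaved partial writes."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks while another writer holds it
        try:
            f.write(append_line + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

path = os.path.join(tempfile.mkdtemp(), "contact-0042.md")
update_record(path, "- 2024-05-01: demo call booked")
update_record(path, "- 2024-05-02: follow-up email sent")
```

Advisory locking only works when every writer cooperates, and it does nothing for atomicity across multiple files, which is exactly the ACID gap the post concedes.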
The sole comment on the thread, from a user named noemit, pushed back directly. The commenter argued that it is more efficient for models to write database queries than to read large documents, framing the core tradeoff as a choice between smaller documents that the model must locate through queries versus larger documents the model reads in full. The commenter's position is that AI-native applications with staying power will prioritize token efficiency, and SQL queries accomplish that better than full document reads.
Key Details
- The post was submitted by Hacker News user dmonterocrespo on item ID 47721153, approximately 20 hours before indexing.
- The proposed stack uses 3 layers: local file system for storage, Redis for indexing, and an LLM agent as the operational brain.
- The system covers at least 5 data types: leads, contacts, email threads, client technical specs, and system configurations.
- A related Hacker News thread on AI agents displacing SaaS generated 412 points and 386 comments 3 months prior.
- A thread on LLM agent memory challenges posted 20 days before this proposal attracted 4 points and 3 comments, signaling the memory problem is still largely unsolved.
- The only comment on the thread, from user noemit, argued that query-based retrieval beats document reads for token efficiency.
What's Next
If dmonterocrespo builds a prototype, the hardest test will not be whether LLMs can read Markdown files. They clearly can. The test will be concurrent write performance when a human sales rep and a background agent both try to update the same contact record during a live call. Watch for whether the AI-native software design community converges on a hybrid model, using structured vector stores or lightweight SQL for metadata while keeping narrative context in plain text, which would validate the core intuition here while solving the concurrency problem.
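That hybrid model can be sketched in a few lines: a relational table holds the queryable metadata, and each row points at a Markdown file holding the narrative. This is an illustrative assumption about how such a split might look, not anything from the original post; SQLite stands in for "lightweight SQL."

```python
import sqlite3
import tempfile
from pathlib import Path

# Metadata lives in SQL; prose lives in Markdown files the rows point at.
root = Path(tempfile.mkdtemp())
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE contacts (id TEXT PRIMARY KEY, status TEXT, path TEXT)")

def add_contact(rec_id, status, notes):
    path = root / f"{rec_id}.md"
    path.write_text(notes)
    db.execute("INSERT INTO contacts VALUES (?, ?, ?)", (rec_id, status, str(path)))

add_contact("contact-0042", "active", "# Jane Doe\nPrefers email; demo went well.")
add_contact("contact-0043", "lead", "# John Roe\nCold inbound, no call yet.")

# The agent filters cheaply in SQL, then reads only the matching prose.
row = db.execute("SELECT path FROM contacts WHERE status = 'active'").fetchone()
print(Path(row[0]).read_text().splitlines()[0])  # # Jane Doe
```

The SQL layer answers noemit's token-efficiency objection, while the Markdown files preserve the friction-free narrative reads the original proposal is after.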
How This Compares
The Markdown CRM idea sits in an interesting spot relative to what the broader industry is actually building. Vector databases like Pinecone, Weaviate, and Chroma have emerged as the de facto "AI-native" storage layer for the current generation of agent applications. They store semantic embeddings alongside metadata and enable similarity search, which is genuinely useful for retrieval-augmented generation. But they solve a different problem. Vector databases answer the question "what data is conceptually similar to this query?" while the Markdown CRM tries to answer "how can an agent read and understand a customer's full history without translation overhead?" These are complementary concerns, not competing ones.
Obsidian, the Markdown-based note-taking application built on local files with bidirectional linking, has quietly become a reference architecture for AI researchers and developers who want LLM-accessible knowledge bases. The fact that a nontrivial developer community has already adopted Markdown-on-disk as a knowledge management pattern suggests dmonterocrespo is not alone in this instinct. The gap between Obsidian's personal knowledge management use case and a multi-user CRM with agent writes is real, but the underlying philosophy translates.
What makes this proposal distinct from both vectors and Obsidian clones is the explicit design for an agent as the backend operator rather than a retrieval assistant. Most current AI tools treat the agent as a layer on top of existing data infrastructure. This proposal inverts that relationship, treating the agent as the system itself and the file format as the interface. That inversion is either the next logical step in AI-native software design or a path back to the limitations that killed flat-file systems the first time. The honest answer is probably both, depending on scale.
FAQ
Q: What is a Markdown-based CRM and how does it work? A: A Markdown-based CRM stores all customer relationship data, including contacts, email threads, and meeting notes, as plain text files with a .md extension. Structured metadata like contact IDs and status fields live in YAML headers at the top of each file. Instead of querying a database, an AI agent reads these files directly to understand customer history and relationships.
Q: Why would an LLM agent benefit from reading Markdown files instead of a database? A: Large language models are trained heavily on Markdown-formatted text, so they parse it accurately without requiring special tooling or query language translation. Reading a plain text file is simpler for an agent than constructing SQL queries, interpreting table schemas, or navigating database abstraction layers, which reduces complexity in the agent's reasoning pipeline.
Q: What are the biggest risks of using a flat-file system for CRM data? A: The main risks are concurrency failures when two processes write the same file at once, no built-in protection against data corruption or orphaned references, and slow search performance as the number of files grows. Relational databases handle all three of these problems natively through transactions, foreign keys, and indexes. A Markdown system requires custom solutions for each.
The conversation dmonterocrespo started on Hacker News is early and small, but it points at a real architectural question that AI-native software teams will have to answer explicitly over the next 12 to 24 months: who is the actual user of your data layer, a human or an agent, and does your storage format reflect that answer?