Show HN: Lmscan – Detect AI text and fingerprint which LLM wrote it (zero deps)
A developer on GitHub has released Lmscan, a zero-dependency Python tool that detects AI-generated text and attempts to identify which specific large language model wrote it. This matters because reliable, lightweight LLM fingerprinting has been a missing piece in the growing toolbox for verifying AI-generated content.
A tool called Lmscan, published to GitHub by developer stef41, aims to do something most AI detection tools do not attempt: not just flag text as machine-generated, but name the specific LLM that produced it. According to the project's GitHub repository at stef41/lmscan, the tool operates with zero external dependencies, meaning developers can drop it into any Python project without pulling in heavy machine learning libraries or creating version conflicts. The repository had collected 6 stars and 1 fork at the time of writing, with 19 commits across 3 branches and 3 tagged releases.
Why This Matters
The ability to fingerprint which LLM generated a piece of text is a genuinely harder problem than simple detection, and the fact that someone has built a working approach in pure Python without dependencies is worth paying attention to. OpenAI launched its own detection tool and killed it because accuracy was too low, which should tell you something about how difficult this space is. If Lmscan can reliably distinguish between, say, GPT-4 output and Claude output at scale, it becomes a practical building block for academic integrity systems, newsroom verification workflows, and content trust pipelines. The zero-dependency constraint is a smart engineering bet that could drive adoption in environments where installing heavyweight libraries is simply not an option.
The Full Story
Developer stef41 posted Lmscan to Hacker News under the "Show HN" category, a section reserved for developers sharing their own projects. The submission had received 3 points at the time of writing, with no public comments. That is a modest start, though early-stage tools on Hacker News often gain traction through GitHub activity rather than comment threads.
The core technical premise behind Lmscan is that different language models leave detectable stylistic fingerprints in their outputs. These fingerprints arise from differences in training data, model architecture, fine-tuning approaches, and sampling behavior. Each model tends to favor certain phrase structures, vocabulary choices, and sentence rhythms in ways that are subtle to human readers but potentially measurable through statistical analysis. Lmscan's implementation, housed in the src/lmscan directory, includes what the commit history describes as "advanced algorithmic modules" added in version 0.4.0, covering syllable counting, Flesch-Kincaid readability scoring, and trigram analysis.
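The article does not show Lmscan's source, but the named building blocks are well-known and easy to sketch in pure Python. A minimal syllable counter plus the standard Flesch-Kincaid grade-level formula might look like the following (the syllable heuristic here is an illustrative assumption, not Lmscan's actual rule):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)  # every word has at least one syllable

def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Nothing here needs anything beyond the standard library, which is consistent with the project's zero-dependency claim.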
The Flesch-Kincaid and trigram components are particularly telling about the methodology. Readability scoring captures how complex the sentence structures are, which varies meaningfully across models. Trigram analysis looks at three-word sequences and their frequency distributions, a classic natural language processing technique that can surface probabilistic patterns without requiring a neural network. The test suite added in the most recent commit covers 101 edge cases across these modules, suggesting the developer is taking accuracy and correctness seriously.
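To make the trigram idea concrete: a frequency profile over three-word sequences can be built with only the standard library. This is a generic sketch of the technique, not Lmscan's implementation, which is not documented in the article:

```python
from collections import Counter

def trigram_profile(text: str) -> dict:
    """Relative frequency of each word-level trigram in the text."""
    words = text.lower().split()
    trigrams = list(zip(words, words[1:], words[2:]))
    if not trigrams:
        return {}
    counts = Counter(trigrams)
    total = len(trigrams)
    return {t: c / total for t, c in counts.items()}
```

Profiles like this can then be compared against reference profiles gathered from known model outputs, which is the basic statistical route to fingerprinting without a neural network.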
The zero-dependency constraint shapes the entire architecture. Without access to NumPy, scikit-learn, or PyTorch, the developer built pure Python implementations of these algorithms from scratch. That is a harder road than importing a library, but it means the tool installs instantly in any Python environment and introduces no supply chain risk from third-party packages. The project also includes a Dockerfile, a Homebrew formula for macOS installation, and a SECURITY.md file, which are the markers of a project built with deployment and production use in mind rather than a quick experiment.
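As one example of what "no NumPy" means in practice: comparing a document's feature vector to per-model reference profiles is a dictionary-and-loop job rather than a one-liner over arrays. A hypothetical pure-Python cosine similarity over sparse frequency dicts (an assumption for illustration, not Lmscan's documented method) is still short:

```python
import math

def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine similarity between two sparse frequency vectors stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)
```

The trade-off is speed on large corpora, but for per-document scoring the standard library is more than adequate, and the install story stays trivial.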
Perhaps the most ambitious piece of the project is an arXiv paper draft included in the repository's paper directory, committed on April 10, 2026. The paper describes the statistical approach to AI text detection that underpins the tool. Publishing methodology alongside code is a sign that stef41 wants the approach to be scrutinized and validated, which is the right posture for a detection tool where false positives carry real consequences.
The repository's CI setup includes an OpenSSF Scorecard workflow, a security scoring system maintained by the Open Source Security Foundation. That is an unusual addition for a small open-source project and signals that the developer is thinking about supply chain security from the start.
Key Details
- The repository had 6 stars, 1 fork, and 19 commits as of April 11, 2026.
- Version 0.4.0 added advanced algorithmic modules including Flesch-Kincaid readability scoring and trigram analysis.
- The test suite covers exactly 101 edge cases across syllable counting, readability metrics, trigram analysis, and CLI behavior.
- The project spans 3 branches and 3 tagged releases, indicating active iteration.
- An arXiv paper draft describing the statistical detection methodology was committed on April 10, 2026.
- The tool requires zero external Python dependencies, installable via Homebrew or Docker.
- The Hacker News submission received 3 points with 0 comments at the time of writing.
What's Next
Watch for the arXiv paper to go live, since public peer commentary will be the first real stress test of the statistical methodology. The 101-test suite suggests the developer is close to a stable release, and a version 1.0 with documented accuracy benchmarks across specific models like GPT-4, Claude 3, and Gemini 1.5 would give potential adopters the evidence they need to integrate it into real workflows. Institutional users in education and journalism will likely wait for those benchmarks before committing.
How This Compares
The most instructive comparison is OpenAI's own detection tool, which the company released and then shut down due to accuracy rates too low to be useful in practice. OpenAI had access to its own model's training data, architecture details, and output distributions, and still could not build a reliable detector. Lmscan is attempting the same problem from the outside, using only statistical signals in the text itself, which is a harder constraint but also the only approach available to anyone who is not a model developer.
A separate Hacker News project called "i typed it," built by a developer named Matt, takes a fundamentally different angle. Rather than analyzing finished text for AI signals, "i typed it" captures behavioral evidence of human authorship, essentially keystroke timing and typing patterns, to prove a person wrote something. That approach is strong as proof of human origin but useless for identifying what machine generated text that was never typed by a person. Lmscan and "i typed it" are complementary, not competing.
The academic side is moving in parallel. The ACM published a paper in 2024 titled "The Science of Detecting LLM-Generated Text" that received 50 points on Hacker News, reflecting genuine research community interest in the detection problem. Lmscan's included arXiv draft puts it in conversation with that body of work, which is unusual for a solo developer project and could accelerate credibility if the paper survives review.
Compared to commercial detection services like GPTZero and Turnitin's AI detection feature, Lmscan's open-source, zero-dependency design fills a different niche. Those services are black boxes with subscription pricing. Lmscan is auditable, free, and embeddable, which makes it far more useful to developers building AI tools and pipelines who need detection as a component rather than a standalone product.
FAQ
Q: What does "zero dependencies" mean for an AI detection tool? A: It means Lmscan runs using only Python's built-in standard library, with no need to install external packages like NumPy or scikit-learn. For developers, this makes integration simpler and faster, since there are no version conflicts to manage and the installation footprint stays small.
Q: Can Lmscan really tell which LLM wrote a piece of text? A: The tool attempts to fingerprint LLM authorship using statistical signals like readability scores and trigram frequency patterns, each of which varies across different models. Whether it achieves reliable accuracy across major models like GPT-4 and Claude is not yet confirmed by published benchmarks, and the arXiv paper draft in the repository should address that question when peer-reviewed.
Q: How is this different from tools like GPTZero or Turnitin AI detection? A: Those are commercial, closed-source services. Lmscan is open-source and embeddable directly into developer projects at no cost. Developers can read the source code, contribute improvements, and integrate detection into their own tools and platforms without depending on a third-party API or paying a subscription fee.
Lmscan is a small project right now, but the combination of a principled zero-dependency architecture, a formal paper draft, and a serious test suite puts it ahead of most "Show HN" detection experiments. Whether the fingerprinting accuracy holds up against real-world LLM output diversity will determine whether it stays a developer curiosity or becomes something institutions actually build on.