Show HN: Preseason – see which developer tools each LLM picks
A new platform called Preseason has launched to track which developer tools AI models actually recommend across thousands of prompts, spanning skill levels from beginners to expert engineers. This matters because LLM recommendations are quietly becoming one of the most powerful distribution channels in developer tooling.
Preseason, hosted at preseason.ai, was submitted to Hacker News as a "Show HN" post (item ID 47807152) with no named author in the byline. The tool tracks LLM tool preferences across thousands of prompts at multiple skill levels, building a dataset that shows which products AI models consistently push developers toward, regardless of whether those recommendations reflect genuine best-fit advice or training-data bias.
Why This Matters
This is one of the most important transparency problems in developer tooling right now, and almost no one is talking about it seriously. When a developer asks Claude or ChatGPT to recommend a stack for their next project, they are essentially consulting a system whose preferences were baked in during training, with zero disclosure about why those preferences exist. LLM recommendations now influence millions of technical decisions daily, and Preseason is the first public attempt to surface what those preferences actually look like. If this data holds up at scale, it could expose self-reinforcing adoption cycles that have nothing to do with tool quality.
The Full Story
Preseason describes itself as a tracker for "what agents want," which is a sharp way to frame the core question. The platform runs a wide variety of developer prompts through multiple large language models and records which specific tools each model recommends in response. The goal is to build a comparative picture of LLM tool preferences across skill levels, from developers just starting to code with AI assistance all the way to senior engineers tackling complex production architecture problems.
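Preseason has not published its pipeline, but the behavior it describes maps onto a simple loop: send each prompt to each model, extract tool names from the response, and tally the mentions. Here is a minimal sketch in Python, where `query_model` and the `KNOWN_TOOLS` list are illustrative assumptions, not anything from the project:

```python
from collections import Counter

# Hypothetical keyword list; Preseason's actual tool taxonomy is not published.
KNOWN_TOOLS = ["PostgreSQL", "Prisma", "LangSmith", "LangChain",
               "Cursor", "Vercel", "Algolia", "Tailwind CSS", "Auth0"]

def extract_tools(response_text: str) -> set[str]:
    """Naive substring match; a real pipeline would need entity resolution
    ('Postgres' vs 'PostgreSQL') and deduplication of repeated mentions."""
    lowered = response_text.lower()
    return {tool for tool in KNOWN_TOOLS if tool.lower() in lowered}

def tally(prompts, models, query_model) -> Counter:
    """Run every prompt through every model; query_model(model, prompt)
    is a stand-in for whatever API call the real pipeline makes."""
    counts = Counter()
    for prompt in prompts:
        for model in models:
            counts.update(extract_tools(query_model(model, prompt)))
    return counts

def shares(counts: Counter) -> dict[str, float]:
    """Each tool's count as a percentage of all recorded recommendations,
    which appears to be how Preseason reports its numbers."""
    total = sum(counts.values())
    return {tool: 100 * n / total for tool, n in counts.most_common()}
```

Under that framing, LangSmith's 11.2% on the revenue-ops prompt would mean roughly one in nine recorded tool mentions for that prompt named LangSmith.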
The prompts themselves are genuinely sophisticated. One example prompt asks models to design a production-grade AI revenue operations copilot that ingests CRM, billing, and product telemetry through APIs. The prompt specifies requirements for data models, operator feedback loops, prompt version observability, tool-call traces, latency monitoring, and evaluation pipelines measuring hallucination risk before releases go live. This is not a toy question, and it reveals something meaningful about which tools LLMs reach for when the complexity is real.
The results from that particular prompt are telling. LangSmith came out on top with 11.2% of recommendations, followed by LangChain at 8.1%, Cursor at 6.2%, and PostgreSQL at 6.1%. For a production project management platform prompt, PostgreSQL led with 11.2%, followed by Prisma at 9.9%, AWS S3 at 8.5%, and Auth0 at 5.9%. A documentation site prompt produced a different pattern entirely, with Algolia capturing 21.0% of recommendations, Vercel at 19.9%, and Tailwind CSS at 18.3%.
What those numbers reveal is that LLM recommendations are not random. The models show consistent, measurable preferences, and those preferences cluster around specific tools in ways that will matter enormously to tool vendors. A tool that earns even a 10% recommendation share across thousands of prompts in a given category is essentially getting free distribution to every developer who asks an LLM for advice, which is now a very large number of developers.
The platform structures prompts by difficulty level, labeling the examples above as "Advanced." This tiered approach is smart because tool preferences likely shift depending on whether the model thinks it is talking to a beginner or an expert. A beginner asking how to build a web app might get pointed toward simpler hosted solutions, while an advanced prompt might surface more specialized infrastructure tools. Tracking across tiers creates a richer picture of how LLMs modulate recommendations based on perceived user sophistication.
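A plausible way to represent that tiering is plain metadata on each prompt, sketched here with field names that are assumptions rather than Preseason's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    BEGINNER = "beginner"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"   # the tier Preseason labels its example prompts with

@dataclass(frozen=True)
class TrackedPrompt:
    text: str       # the prompt sent to each model
    category: str   # e.g. "documentation site" or "project management platform"
    tier: Tier      # lets recommendation shares be sliced by user sophistication

example = TrackedPrompt(
    text="Design a production-grade AI revenue operations copilot that "
         "ingests CRM, billing, and product telemetry through APIs.",
    category="ai-revenue-ops-copilot",
    tier=Tier.ADVANCED,
)
```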
Key Details
- Preseason launched via Hacker News "Show HN" submission item ID 47807152, receiving 1 point and 0 comments at the time of reporting.
- The AI revenue ops copilot prompt returned LangSmith as the top pick, with 11.2% of recommendations.
- The project management platform prompt returned PostgreSQL as the top recommendation at 11.2%.
- The documentation site prompt returned Algolia as the top pick at 21.0%, with Vercel close behind at 19.9%.
- The platform tracks recommendations across at least 3 difficulty tiers, including an "Advanced" tier with production-grade architectural prompts.
- Auth0 appeared in the project management results at 5.9%, showing that even identity and auth tools are being measured.
What's Next
The most important thing to watch is whether Preseason publishes methodology details about which LLMs it is testing and how it normalizes recommendation counts across different model response formats. Without that transparency, the percentages are interesting but not independently verifiable. If the team publishes a breakdown by specific model, such as GPT-4o versus Claude 3.5 Sonnet versus Gemini 1.5 Pro, the data becomes genuinely useful for tool vendors trying to understand where their products are showing up in AI-assisted developer journeys.
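One normalization approach the team could take, offered here as an assumption rather than anything Preseason has documented: give every response an equal total weight, split across the tools it names, so a model that lists ten tools per answer doesn't outvote one that names three.

```python
from collections import defaultdict

def normalized_shares(responses: dict[str, list[set[str]]]) -> dict[str, float]:
    """responses maps a model name to one set of recommended tools per prompt.
    Each non-empty response contributes total weight 1, split evenly across
    the tools it names, so verbose models don't dominate the aggregate."""
    weights: defaultdict[str, float] = defaultdict(float)
    n_responses = 0
    for tool_sets in responses.values():
        for tools in tool_sets:
            if not tools:
                continue
            n_responses += 1
            for tool in tools:
                weights[tool] += 1 / len(tools)
    # Weights sum to n_responses, so this yields percentages summing to 100.
    return {t: 100 * w / n_responses
            for t, w in sorted(weights.items(), key=lambda kv: -kv[1])}
```

Whether Preseason does anything like this is precisely the methodology question worth watching; raw mention counts and per-response weighting can rank the same tools quite differently.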
How This Compares
The closest parallel to what Preseason is doing is how SEO analysts tracked Google search result rankings before tools like Ahrefs and SEMrush matured into full industries. Early ranking trackers were rough, but they surfaced something real: that organic visibility had measurable patterns worth understanding and, eventually, optimizing for. Preseason is doing the same thing for LLM recommendations, and if the platform scales its prompt library and model coverage, it could become the first serious benchmarking layer for what you might call "AI search visibility" in the developer tools space.
Compare this to the METR study from 2025 that measured AI's impact on experienced open-source developer productivity, which earned 775 points and 485 comments on Hacker News. That research focused on whether AI made developers faster. Preseason is asking a different but equally important question: faster toward which tools, and why? The productivity question and the recommendation bias question are deeply connected, but the second one has received almost no rigorous public treatment until now.
There are also competitive implications for LLM providers themselves. If analysis shows that OpenAI's models consistently recommend OpenAI-affiliated tooling, or that Google's Gemini steers developers toward Google Cloud products at rates that exceed those products' market share, that is a disclosure problem. Anthropic has published work on reducing model bias through constitutional AI approaches, but neither Anthropic nor OpenAI has specifically addressed tool recommendation bias. Preseason creates public pressure for them to do exactly that, and that is the kind of accountability the industry needs on AI tool transparency.
FAQ
Q: What does Preseason actually measure and track? A: Preseason runs developer prompts through large language models and records which specific tools each model recommends in its responses. By running thousands of prompts at different complexity levels, it builds a statistical picture of which tools each LLM tends to favor, expressed as a percentage of total recommendations within a given prompt category.
Q: Why do LLM tool recommendations matter for developers? A: Developers increasingly ask AI assistants like ChatGPT and Claude which tools to use for projects, and those recommendations carry real weight in technology decisions. If an LLM consistently recommends a particular database or deployment platform, that tool gains adoption not because it is objectively best but because it happened to dominate the model's training data. Understanding these patterns helps developers evaluate AI advice more critically, which is a core skill for anyone building with AI guidance.
Q: Who should pay close attention to Preseason's data? A: Developer tool companies should treat this as a competitive intelligence signal. If a tool is not appearing in LLM recommendations for relevant prompts, it faces a discoverability problem that will only grow as AI-assisted development becomes the default workflow. Investors, product teams, and developer advocates at companies like Vercel, Prisma, and Algolia all have direct reasons to monitor these numbers closely.
Preseason is a small project right now, but it is measuring something that will only become more consequential as developers lean harder on AI for technical guidance. The team has the right instinct, and if they build out model-level breakdowns and a larger prompt library, this could become essential reading for anyone in the developer tools space.