Meshcore: Architecture for a Decentralized P2P LLM Inference Network
A project called Meshcore is proposing a peer-to-peer network architecture that lets people pool their idle computing hardware to run large language model inference without relying on centralized cloud providers. If it works at scale, it could cut AI inference costs dramatically.
A Hacker News submission posted by user elyawhoo points to Meshcore, a technical architecture proposal for running large language model inference across a decentralized peer-to-peer network. The project originates from Block, the financial technology company founded by Jack Dorsey, with development credited to Block engineer Michael Neale. The core idea is straightforward but ambitious: instead of routing every AI inference request through a hyperscale data center, Meshcore distributes that workload across a mesh of consumer and enterprise hardware that is otherwise sitting idle.
Why This Matters
The economics of running frontier AI models are genuinely broken for most organizations right now. GPU compute on commercial cloud platforms runs between $10 and $30 per hour, and that cost compounds fast at enterprise inference volumes. Meshcore is betting that pooled idle hardware, coordinated through a well-designed P2P protocol, can undercut those prices while also solving data residency problems that block entire industries from adopting cloud AI in the first place. The project's MIT license and broad model support, covering Llama, DeepSeek, Qwen3, and GLM, signal a serious attempt to build infrastructure rather than a demo.
The Full Story
Running a 100-billion-parameter language model is not a casual computing task. You need GPUs with more than 80 gigabytes of high-bandwidth memory, multi-terabit interconnects like NVIDIA NVLink or InfiniBand, and specialized runtime kernels such as FlashAttention. Today, only a handful of organizations own that hardware, and everyone else rents access from cloud giants at prices that make sustained inference unaffordable for smaller teams. Meshcore's architects at Block are proposing a way to change who holds that power.
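The back-of-envelope arithmetic makes the hardware requirement concrete. The sketch below assumes half-precision (fp16) weights at 2 bytes per parameter and ignores KV cache and activation memory, which only push the requirement higher; none of these figures come from the Meshcore proposal itself.

```python
# Why a 100B-parameter model does not fit on a single 80 GB GPU.
# Assumption: fp16 weights (2 bytes/param); KV cache and activations
# add further overhead on top of this.
PARAMS = 100e9
BYTES_PER_PARAM = 2       # fp16
GPU_HBM_GB = 80           # e.g. an 80 GB data-center GPU

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_needed = -(-weights_gb // GPU_HBM_GB)   # ceiling division

print(f"weights alone: {weights_gb:.0f} GB -> at least {gpus_needed:.0f} GPUs")
# weights alone: 200 GB -> at least 3 GPUs
```

Weights alone demand 200 GB, so even before overhead the model must be sharded across at least three 80 GB accelerators connected by fast interconnects.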
The architecture is built on two layers. The first layer handles network coordination: routing inference requests to available nodes, managing consensus on results, and balancing load across machines with wildly different specs. The second layer handles the actual model execution, running on whatever hardware a given node contributes to the network. Keeping these two concerns separate is what lets the system accommodate heterogeneous devices, from a developer's gaming PC to a company's underused server rack, without forcing everyone to run identical hardware configurations.
Model compatibility is broad by design. Meshcore supports Qwen3, DeepSeek, GLM, and the Llama family of open-source models. That list covers most of the serious open-source inference use cases organizations are running today. Publishing the project under the MIT license means anyone can audit the code, fork it, or deploy their own instance without negotiating a commercial license, which is exactly the kind of openness that builds trust in infrastructure software.
The economic logic behind decentralized inference is compelling. Commercial cloud GPU nodes cost between $10 and $30 per hour. A distributed network that aggregates idle hardware from participants who get compensated or receive priority access in return creates a marketplace where supply comes from resources that would otherwise generate zero value. It resembles distributed rendering networks or, more loosely, the resource-sharing model that made early peer-to-peer file networks so efficient, except here the product is AI computation rather than file transfers.
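A rough monthly comparison shows the scale of the opportunity. The $10 to $30 per hour cloud range comes from the article; the pooled-network rate below is a made-up placeholder, since Meshcore has published no pricing.

```python
# Illustrative cost comparison, not real Meshcore pricing.
# Cloud rates ($10-$30/hour per GPU node) are from the article;
# the pooled rate is a hypothetical assumption for contrast.
HOURS_PER_MONTH = 24 * 30                 # 720 hours of sustained inference
cloud_low, cloud_high = 10, 30            # $/hour, from the article
pooled_rate = 4                           # $/hour, hypothetical

cloud_low_monthly = cloud_low * HOURS_PER_MONTH
cloud_high_monthly = cloud_high * HOURS_PER_MONTH
pooled_monthly = pooled_rate * HOURS_PER_MONTH
print(cloud_low_monthly, cloud_high_monthly, pooled_monthly)
# 7200 21600 2880
```

Even at the low end of cloud pricing, sustained single-node inference runs thousands of dollars a month, which is the gap a marketplace built on otherwise-idle hardware would try to close.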
Jack Dorsey's involvement through Block is worth taking seriously. Block has deep experience building distributed financial infrastructure, and that engineering culture maps reasonably well onto the coordination problems Meshcore needs to solve. Getting payment, incentive, and routing mechanisms right in a distributed network is hard, and Block has done versions of that work before at scale. The project has not yet published detailed benchmarks or a mainnet timeline, but the combination of a credible team and an MIT-licensed codebase gives this more substance than most whitepaper-stage proposals.
Key Details
- GPU compute on commercial cloud platforms costs between $10 and $30 per hour per node
- Meshcore supports 4 major open-source model families: GLM, Qwen3, DeepSeek, and Llama
- The project was developed at Block, the company founded by Jack Dorsey
- Lead developer is Block engineer Michael Neale
- Models with 100 billion or more parameters require GPUs with more than 80 gigabytes of high-bandwidth memory
- The codebase is published under the MIT open-source license
- The Hacker News post was submitted by user elyawhoo and received 1 comment noting the absence of a direct project link
What's Next
The immediate test for Meshcore is whether the team publishes a working prototype or detailed technical specification that lets the developer community stress-test the architecture's consensus and scheduling claims. Watch for a GitHub repository with actual inference benchmarks comparing distributed node performance against single-node cloud baselines, because that data will determine whether the economic case holds up under real-world latency and throughput conditions. Regulatory interest in data residency for AI inference is growing in the EU and Southeast Asia, and Meshcore's jurisdiction-aware routing capability could accelerate enterprise adoption if the team targets those compliance use cases explicitly.
How This Compares
The closest direct comparison is Petals, the open-source project from the BigScience community that pioneered running BLOOM and later Llama models across volunteer hardware using a swarm inference approach. Petals proved the concept works but never solved the incentive layer: volunteers have no structured reason to keep their nodes online, which makes the network unreliable for production workloads. Meshcore appears to be addressing that gap directly by designing compensation mechanisms into the architecture from the start, which is where Petals stalled.
Render Network and Akash Network have built functional decentralized compute marketplaces, but neither was designed specifically for LLM inference. They treat GPU compute as a generic commodity, which creates inefficiencies when workloads require specific memory configurations and low-latency inter-node communication that transformer inference demands. Meshcore's dual-layer architecture, purpose-built for language model execution, is a more targeted solution than adapting a general compute marketplace to AI workloads.
Blockchain-based AI inference projects, including Bittensor's subnet model and the inference-focused layer built by io.net, have attracted significant capital but introduce token economics that add complexity and speculative risk to what should be an infrastructure-grade service. Meshcore's MIT license and apparent absence of a native token, at least in the current proposal, suggest a more pragmatic approach that enterprise buyers will find easier to evaluate against their existing vendor standards. The AI tools category is filling up with distributed inference experiments, but most are still closer to research projects than production-ready infrastructure. Meshcore has the right team lineage to be different, but it has to ship.
FAQ
Q: What does decentralized LLM inference actually mean for developers? A: Instead of sending your AI model requests to a company like OpenAI or Google and paying their rates, decentralized inference lets your request get processed by a network of regular computers owned by different people or organizations. Developers get the same model output but potentially at lower cost and without depending on a single provider staying online or keeping their prices stable.
Q: Is Meshcore safe to use for sensitive business data? A: The architecture includes cryptographic guarantees designed to protect data in transit and validate inference results, but the project has not yet published a formal security audit. Any organization with strict data handling requirements should wait for independent security review before routing confidential data through a Meshcore network, and should evaluate the jurisdiction-aware routing features carefully against their specific compliance obligations.
Q: How is Meshcore different from just renting a cheap GPU server? A: Renting a single GPU server still means you are dependent on one machine with one provider, and you still pay for idle time. Meshcore pools many machines together so the network can route requests to whichever nodes are available and cheapest at any given moment, similar to how a content delivery network routes web traffic. The goal is better reliability and lower average cost through aggregated supply.
Meshcore is one of the more technically credible proposals in the distributed inference space to appear in 2025, and Block's engineering pedigree gives it a real chance of graduating from architecture document to working network. The open-source community will now do what it does best: find the holes in the design and either patch them or prove the approach unworkable.