Five AI Compute Architectures Every Engineer Should Know: CPUs, GPUs, TPUs, NPUs, and LPUs Compared
Modern AI systems run on five distinct processor types, each built for different jobs, and knowing which one to reach for can make or break an AI deployment. MarkTechPost breaks down CPUs, GPUs, TPUs, NPUs, and LPUs so engineers stop treating compute as a one-size-fits-all decision.
According to MarkTechPost, the five compute architectures powering AI today are not interchangeable, and the differences between them have real consequences for performance, cost, and power draw. The piece offers a direct technical comparison of Central Processing Units, Graphics Processing Units, Tensor Processing Units, Neural Processing Units, and Language Processing Units, walking engineers through where each architecture excels and where it falls flat. The piece carries no individual byline, so credit for the breakdown goes to MarkTechPost.
Why This Matters
Most teams still default to GPUs because NVIDIA captured roughly 80 percent of the AI accelerator market as of 2024, and defaulting to the dominant option feels safe. But that instinct costs money. Running inference on a 700-watt H100 when an NPU drawing 1 to 2 watts can handle the task is an engineering mistake, not a conservative choice. The smarter engineers are already treating compute selection as a first-class design decision, and this breakdown gives them the vocabulary to make that case internally.
The Full Story
The story of AI compute is really the story of specialization. In the early days of deep learning research, CPUs handled everything. They are general-purpose processors, typically ranging from 4 to 64 cores in modern configurations, and they are very good at sequential tasks where the next operation depends on the result of the previous one. That flexibility is also their weakness. Neural networks mostly involve repeating the same math across enormous matrices, and CPUs burn cycles on circuitry built for variety rather than repetition.
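The contrast can be sketched in a few lines of Python. This is a toy illustration, not tied to any real CPU feature: the first function is a dependency chain where each step needs the previous result, so extra cores cannot help, while the second applies one operation independently to every element, which is exactly the shape of work a parallel processor can split across thousands of lanes.

```python
def sequential_chain(x, steps):
    # Each iteration depends on the previous result: no two steps
    # can run at the same time, so more cores buy nothing here.
    for _ in range(steps):
        x = (x * 3 + 1) % 1000
    return x

def elementwise(xs):
    # The same operation applied independently to every element:
    # a parallel processor could compute all of these at once.
    return [x * 2 + 1 for x in xs]
```

A CPU is comfortable with both shapes; the point is that neural networks consist almost entirely of the second shape, repeated billions of times.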
GPU adoption for AI work picked up in the late 2000s when researchers realized that graphics hardware, built for rendering millions of pixels simultaneously, mapped surprisingly well onto the parallel math of neural networks. NVIDIA saw the opportunity early, built CUDA as an accessible programming platform on top of GPU hardware, and effectively defined the market. A high-end NVIDIA H100 contains 16,896 CUDA cores designed to execute the same operation on many data elements at once. That parallelism delivers hundreds of teraflops for matrix multiplication, which is exactly what training a large neural network demands. The tradeoff is power consumption, ranging from 250 to 700 watts depending on the model, and relatively poor performance on sequential or branching code.
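A back-of-the-envelope sketch shows why matrix multiplication is the workload that saturates this kind of hardware: the operation count grows as 2·m·k·n, and every element involves identical arithmetic. The throughput helper below is an idealized estimate with illustrative numbers, not a benchmark of any real GPU.

```python
def matmul_flops(m, k, n):
    # An (m x k) @ (k x n) multiply computes m*n dot products of
    # length k: roughly 2*m*k*n floating-point operations in total.
    return 2 * m * k * n

def seconds_at_tflops(flops, sustained_tflops):
    # Idealized lower bound: assumes the chip never stalls on memory
    # and sustains its quoted throughput the entire time.
    return flops / (sustained_tflops * 1e12)

# A single 4096 x 4096 x 4096 multiply is ~137 billion operations;
# layers like this run thousands of times per training step.
flops = matmul_flops(4096, 4096, 4096)
```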
Google took a different approach entirely. The company began deploying Tensor Processing Units in its own data centers around 2015 and disclosed them publicly in 2016, because it recognized that even GPUs carried overhead for workloads it did not need. TPUs are application-specific integrated circuits built exclusively for tensor operations, with no circuits spent on graphics, general-purpose branch prediction, or arbitrary instruction handling. Google's 2017 research paper reported the first-generation TPU as 15 to 30 times faster than contemporary GPUs and CPUs on production inference workloads, with a 30-to-80-times advantage in performance per watt. Google opened TPU access through Google Cloud Platform starting in 2017, but the architecture cannot run independently and requires a host CPU to manage orchestration. That dependency matters when designing systems.
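The host-CPU dependency can be pictured as a dispatch loop: the CPU owns batching, I/O, and control flow, and hands the accelerator only the uniform tensor math. Everything below is a hypothetical stand-in to show the division of labor, not the actual TPU runtime API.

```python
def accelerator_step(batch):
    # Stand-in for work offloaded to the accelerator: pure,
    # branch-free math applied uniformly to a batch.
    return [x * x for x in batch]

def host_orchestrate(stream, batch_size):
    # The host CPU handles everything the accelerator cannot:
    # slicing the input stream, deciding when to dispatch each
    # batch, and collecting results.
    results = []
    for i in range(0, len(stream), batch_size):
        results.extend(accelerator_step(stream[i:i + batch_size]))
    return results
```

Remove the host loop and the accelerator has no way to feed itself, which is why TPU capacity is always provisioned alongside CPU capacity.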
NPUs represent the edge computing answer to the same problem. As AI started showing up in phones, cameras, and IoT sensors, the data center architectures became impractical. Apple, Qualcomm, and Intel each built proprietary NPU designs into their system-on-chip products. Apple's Neural Engine, integrated into recent iPhone processors, consumes just 1 to 2 watts while delivering genuine on-device inference for tasks like image recognition and natural language processing. Qualcomm took a similar path with its Hexagon processors. The goal is not raw throughput but continuous operation on battery power, which changes every design constraint.
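A rough energy-budget calculation shows why the 1-to-2-watt figure changes the design space. The battery capacity and per-inference latency below are made-up round numbers for illustration, and the sketch counts chip power only, ignoring the rest of the device.

```python
def inferences_per_charge(battery_wh, chip_watts, seconds_per_inference):
    # Energy drawn per inference, in watt-hours, then how many such
    # inferences one full battery charge could fund.
    wh_per_inference = chip_watts * seconds_per_inference / 3600
    return battery_wh / wh_per_inference

# A hypothetical 12 Wh phone battery, a 2 W NPU, 20 ms per inference:
# on the order of a million inferences per charge, which is why
# always-on, on-device AI is feasible at all.
count = inferences_per_charge(12, 2, 0.02)
```

Run the same arithmetic with a data-center accelerator's power draw and the battery is exhausted in minutes, which is the entire case for a separate edge architecture.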
The newest entry in this field is the LPU, short for Language Processing Unit, a term coined by Groq, a Silicon Valley semiconductor company founded in 2016 by former Google TPU engineers. LPUs are built for one job: generating tokens from large language models with low, deterministic latency. Cerebras, another startup founded in 2016, attacks the same inference bottleneck from a different angle. It spent years building a wafer-scale processor, the Wafer Scale Engine, that spans an entire silicon wafer with 850,000 cores connected by high-bandwidth on-die links. Traditional chips send data between separate dies, which introduces latency and bandwidth bottlenecks; keeping everything on one wafer eliminates those inter-chip connections entirely. The catch for both vendors is price, with systems currently carrying costs in the millions of dollars, which limits the audience to organizations running workloads large enough to justify that investment.
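The bandwidth argument reduces to simple arithmetic: time to move a tensor is bytes divided by link bandwidth, so a wafer-internal fabric that is orders of magnitude faster than an inter-chip link shrinks that term proportionally. The bandwidth figures below are illustrative placeholders, not published specifications for any product.

```python
def transfer_seconds(n_bytes, bandwidth_bytes_per_sec):
    # Time to move data across a link, ignoring latency and protocol
    # overhead (both of which only widen the gap in practice).
    return n_bytes / bandwidth_bytes_per_sec

activations = 4 * 1024**3  # a hypothetical 4 GiB activation tensor

inter_chip = transfer_seconds(activations, 100e9)  # ~100 GB/s off-chip link
on_wafer = transfer_seconds(activations, 10e12)    # ~10 TB/s on-die fabric
```

With these placeholder numbers the on-wafer transfer is a hundred times faster; the design bet is that for large-model inference, moving data, not computing on it, is the binding constraint.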
The practical takeaway is that no architecture dominates across all use cases. CPUs handle control logic and heterogeneous tasks. GPUs dominate training at scale. TPUs offer efficiency advantages for well-defined tensor workloads on Google's stack. NPUs enable AI on battery-powered devices. LPUs target the high end of large-model inference where latency and bandwidth are the binding constraints.
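The takeaway maps onto a first-cut decision rule. This toy helper simply encodes the summary above as a priority-ordered rule of thumb; the flag names are invented for illustration, and real selection also weighs cost, model size, and ecosystem maturity.

```python
def pick_architecture(training, on_battery, latency_critical_inference,
                      tensor_only_on_google_stack):
    # Priority-ordered rule of thumb, mirroring the summary above.
    if on_battery:
        return "NPU"  # edge inference under a watt-scale power budget
    if training:
        return "GPU"  # training at scale
    if latency_critical_inference:
        return "LPU"  # low-latency large-model token generation
    if tensor_only_on_google_stack:
        return "TPU"  # well-defined tensor workloads on Google Cloud
    return "CPU"      # control logic and heterogeneous tasks
```

The ordering itself is a judgment call: battery constraints are treated as non-negotiable, while the rest are preferences that can be traded against price.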
Key Details
- CPUs typically range from 4 to 64 cores and consume 100 to 300 watts in high-performance configurations.
- NVIDIA's H100 GPU contains 16,896 CUDA cores and draws between 250 and 700 watts depending on configuration.
- Google deployed TPUs in its own data centers starting around 2015, announced them publicly in 2016, and opened cloud access in 2017 through Google Cloud Platform.
- Google's 2017 research paper reported the first-generation TPU as 15 to 30 times faster than contemporary GPUs and CPUs for production inference, with 30 to 80 times better performance per watt.
- Apple's Neural Engine consumes just 1 to 2 watts while handling on-device AI inference.
- Groq, founded in 2016, coined the term LPU (Language Processing Unit); Cerebras, also founded in 2016, launched its commercial wafer-scale systems starting in late 2019.
- The Cerebras Wafer Scale Engine integrates 850,000 cores on a single wafer.
- NVIDIA held approximately 80 percent of the AI accelerator market as of 2024.
What's Next
The NPU space is the one to watch most closely over the next 12 to 18 months, as Qualcomm, Apple, and Intel each push to expand the range of models their edge chips can run efficiently. On the infrastructure side, Cerebras and competitors like Groq are actively targeting inference providers who need low-latency response times at scale, which will pressure NVIDIA's pricing on mid-tier inference deployments. Engineers building AI agents and real-time pipelines should be evaluating NPU and LPU options now before cloud GPU costs define their unit economics permanently.
How This Compares
This breakdown lands at a moment when the compute conversation has moved from the research lab into product decisions at mid-sized companies. Compare it to the surge in attention Groq received in 2023 and early 2024 for its Language Processing Unit, the architecture that gave the LPU category its name. Groq's deterministic chips and Cerebras's wafer-scale systems differ substantially in design but share the same philosophical motivation: GPU parallelism is overkill for inference, and the market needs chips optimized for generating tokens fast. Both represent a real challenge to NVIDIA's dominance, but neither has yet matched the ecosystem depth CUDA has built over more than 15 years.
The TPU comparison also deserves more credit than it usually gets in these discussions. Google's decision to start building ASICs internally in the mid-2010s looks prescient now, and a 30-to-80-times performance-per-watt advantage over contemporary GPUs for inference is a number that should make any CFO ask why their company is still renting H100s at full price for inference. The honest answer is ecosystem lock-in. CUDA's developer tooling, library support, and documentation have compounding advantages that new entrants cannot replicate quickly.
The NPU story is arguably the most underappreciated thread here. Apple's Neural Engine has been shipping at scale inside iPhone processors since 2017, and most developers still build mobile AI features that call cloud APIs rather than running locally. That gap between hardware capability and developer behavior represents a real opportunity, particularly for anyone building AI tools targeting mobile-first markets. The guides and tutorials ecosystem around on-device inference is still thin, which means the developer who gets comfortable with NPU deployment now will have a meaningful head start.
FAQ
Q: What is the difference between a GPU and a TPU? A: A GPU is a general-purpose parallel processor originally built for graphics and later adapted for AI training. A TPU is a custom chip Google built specifically for tensor math, stripping out everything a GPU can do that a neural network does not need. Google's published research reported the first TPU as 15 to 30 times faster than contemporary GPUs and CPUs for its inference workloads, with an even larger performance-per-watt advantage, but TPUs only run on Google Cloud and require a host CPU.
Q: What is an NPU and do I need one? A: An NPU is a neural processing chip built into mobile and edge devices like iPhones and Android phones, designed to run AI inference locally without a cloud connection. Apple's Neural Engine draws just 1 to 2 watts. If you are building apps that need on-device AI, faster response, or offline operation, understanding NPUs matters. If you are training models at scale, a data center GPU is still the practical choice.
Q: What does Cerebras make and why is it expensive? A: Cerebras makes the Wafer Scale Engine, a processor that covers an entire silicon wafer with 850,000 cores. Traditional chips are small rectangles cut from a wafer, and multiple chips communicate across slow external connections. Cerebras eliminates those bottlenecks by keeping everything on one piece of silicon. The manufacturing complexity and scale make these systems cost millions of dollars, which is why adoption is currently limited to large AI labs and specialized infrastructure providers.
The AI compute market is fragmenting fast, and engineers who understand all five architectures will make better infrastructure decisions than those who default to whatever NVIDIA is selling.