
Microsoft Unveils Maia 200 Inference Chip to Cut AI Serving Costs
Microsoft recently introduced Maia 200, a custom-built accelerator aimed at lowering the cost of running artificial intelligence workloads at cloud scale, as major providers look to curb soaring inference expenses and lessen dependence on Nvidia graphics processors.
The chip is designed specifically for inference, the phase in which trained models produce text, images and other outputs. As AI services transition from pilots to everyday production use, the cost of generating tokens has become an increasingly significant share of overall spending. Microsoft said Maia 200 is intended to address those economics through lower-precision compute, high-bandwidth memory and networking optimized for large AI clusters.
“Today, we’re proud to introduce Maia 200, a breakthrough inference accelerator engineered to dramatically improve the economics of AI token generation,” Scott Guthrie, Microsoft’s executive vice president for Cloud and AI, wrote in a blog post announcing the chip.
Maia 200 is built on TSMC’s 3-nanometer process and is designed around lower-precision math used in modern inference workloads. Microsoft said each chip contains more than 140 billion transistors and delivers more than 10 petaFLOPS in 4-bit precision (FP4), and more than 5 petaFLOPS in 8-bit precision (FP8), within a 750-watt thermal envelope. The chip includes 216 gigabytes of HBM3e memory with 7 terabytes per second of bandwidth, 272 megabytes of on-chip SRAM, and data movement engines to reduce bottlenecks that can limit real-world throughput even when raw compute is high.
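To make the low-precision angle concrete, the sketch below shows generic 4-bit integer quantization of a weight tensor in Python. It illustrates the general technique of trading numeric precision for cheaper arithmetic and smaller memory footprints; it is not Microsoft's FP4 floating-point format and does not reflect how Maia 200 implements it.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor quantization to a signed 4-bit integer grid.

    A generic illustration of low-precision inference math, not
    Microsoft's FP4 format or anything specific to Maia 200.
    """
    qmax = 7  # symmetric signed 4-bit range: [-7, 7]
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original values from the 4-bit grid.
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int4(weights)
print("max absolute quantization error:", np.max(np.abs(weights - dequantize(q, scale))))
```

Each weight now occupies half a byte instead of four, which is the basic reason low-precision formats cut both compute and memory traffic per generated token.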
“Crucially, FLOPS aren’t the only ingredient for faster AI,” Guthrie wrote. “Feeding data is equally important.”
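Guthrie's point about data movement can be sanity-checked with a rough roofline-style calculation. Only the 10 petaFLOPS and 7 terabytes-per-second figures below come from Microsoft's announcement; the model size and per-token FLOP count are hypothetical assumptions chosen purely for illustration.

```python
# Back-of-the-envelope arithmetic-intensity check using the figures in
# Microsoft's announcement (10 PFLOPS at FP4, 7 TB/s of HBM bandwidth).
peak_flops_fp4 = 10e15   # published figure
hbm_bandwidth = 7e12     # published figure

# Ridge point: FLOPs per byte moved that are needed to keep the compute units busy.
ridge = peak_flops_fp4 / hbm_bandwidth
print(f"FLOPs per byte needed to stay compute-bound: {ridge:.0f}")

# Hypothetical single-stream decode step: a 70B-parameter model stored at
# 4 bits per weight reads ~35 GB of weights and performs ~2 FLOPs per
# parameter per generated token.
params = 70e9
bytes_read = params * 0.5        # 4-bit weights -> 0.5 bytes each
flops = 2 * params
intensity = flops / bytes_read   # ~4 FLOPs per byte at batch size 1
print(f"Single-token decode intensity: {intensity:.0f} FLOPs/byte")
# Far below the ridge point, so this kind of decoding is memory-bound,
# which is why the announcement emphasizes bandwidth and data movement
# alongside raw FLOPS.
```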
The launch comes as Microsoft, Google, and Amazon invest heavily in custom silicon alongside Nvidia GPUs. Google’s TPU family and Amazon’s Trainium chips offer alternatives within their cloud services, and Microsoft has long signaled that it wants greater control over costs and capacity in its AI infrastructure. Maia 200 follows Maia 100, introduced in 2023, and the company is positioning the new chip as an inference-focused workhorse for its AI products.
Microsoft said Maia 200 will support multiple models, including “the latest GPT-5.2 models from OpenAI,” and will be used to deliver a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The company also said its Microsoft Superintelligence team plans to use Maia 200 for synthetic data generation and reinforcement learning as it develops in-house models. Guthrie wrote that, for synthetic data pipelines, Maia 200’s design can accelerate the generation and filtering of “high-quality, domain-specific data.”
The chip is also an effort to compete on headline performance with hyperscaler rivals. Guthrie wrote that Maia 200 is “the most performant, first-party silicon from any hyperscaler,” adding that it offers “three times the FP4 performance of the third generation Amazon Trainium” and “FP8 performance above Google’s seventh generation TPU.” Such comparisons typically rest on vendor-provided benchmarks, and Microsoft’s post did not include full test configurations for those claims.