Microsoft Unveils Maia 200 Inference Chip to Cut AI Serving Costs

  • By John K. Waters
  • 02/05/26

Microsoft recently introduced Maia 200, a custom accelerator aimed at lowering the cost of running artificial intelligence workloads at cloud scale, as major providers look to curb soaring inference costs and reduce their reliance on Nvidia graphics processors.

The chip is designed specifically for inference, the stage in which trained models produce text, images, and other outputs. As AI services shift from pilots to daily production use, the cost of generating tokens has become an increasingly significant share of overall spending. Microsoft said Maia 200 is intended to address those economics through lower-precision compute, high-bandwidth memory, and networking optimized for large AI clusters.

“Today, we’re excited to introduce Maia 200, a breakthrough inference accelerator designed to dramatically improve the economics of AI token generation,” Scott Guthrie, Microsoft’s executive vice president for Cloud and AI, wrote in a blog post announcing the chip.

Maia 200 is built on TSMC’s 3-nanometer process and is designed around the lower-precision math used in modern inference workloads. Microsoft said each chip contains more than 140 billion transistors and delivers more than 10 petaFLOPS at 4-bit precision (FP4) and more than 5 petaFLOPS at 8-bit precision (FP8), within a 750-watt thermal envelope. The chip includes 216 gigabytes of HBM3e memory with 7 terabytes per second of bandwidth, 272 megabytes of on-chip SRAM, and data-movement engines to reduce the bottlenecks that can limit real-world throughput even when raw compute is high.

“Crucially, FLOPS aren’t the only ingredient for faster AI,” Guthrie wrote. “Feeding data is equally important.”
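That tradeoff can be made concrete with a rough roofline-style estimate using only the figures Microsoft cites. The sketch below (in Python, purely for illustration; the inputs are vendor claims, not independent measurements) computes the arithmetic intensity at which Maia 200 would cross from memory-bound to compute-bound:

```python
# Back-of-envelope roofline estimate using the figures Microsoft cites
# for Maia 200. These are vendor claims, not independent measurements.

PEAK_FP4_FLOPS = 10e15   # "more than 10 petaFLOPS" at 4-bit precision (FP4)
PEAK_FP8_FLOPS = 5e15    # "more than 5 petaFLOPS" at 8-bit precision (FP8)
HBM_BANDWIDTH = 7e12     # 7 terabytes per second of HBM3e bandwidth

# Arithmetic intensity (FLOPs per byte moved) at which the chip crosses
# from memory-bound to compute-bound. Workloads below this ratio are
# limited by memory bandwidth rather than raw FLOPS.
balance_fp4 = PEAK_FP4_FLOPS / HBM_BANDWIDTH
balance_fp8 = PEAK_FP8_FLOPS / HBM_BANDWIDTH

print(f"FP4 balance point: ~{balance_fp4:.0f} FLOPs per byte")  # ~1429
print(f"FP8 balance point: ~{balance_fp8:.0f} FLOPs per byte")  # ~714
```

Autoregressive decoding streams most of a model’s weights for every generated token, so its arithmetic intensity typically sits far below those balance points; that is why the announcement emphasizes memory bandwidth, on-chip SRAM, and data-movement engines alongside raw compute.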

The launch comes as Microsoft, Google, and Amazon invest heavily in custom silicon alongside Nvidia GPUs. Google’s TPU family and Amazon’s Trainium chips offer alternatives within their respective cloud platforms, and Microsoft has long signaled that it wants greater control over cost and capacity in its AI infrastructure. Maia 200 follows Maia 100, introduced in 2023, and the company is positioning the new chip as an inference-focused workhorse for its AI products.

Microsoft said Maia 200 will support numerous models, including “the latest GPT-5.2 models from OpenAI,” and will be used to deliver a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The company also said its Microsoft Superintelligence team plans to use Maia 200 for synthetic data generation and reinforcement learning as it develops in-house models. Guthrie wrote that, for synthetic data pipelines, Maia 200’s design can accelerate the generation and filtering of “high-quality, domain-specific data.”
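Microsoft’s post does not describe how those pipelines are implemented. Purely as a hypothetical sketch of the generate-and-filter pattern Guthrie alludes to, the loop might look like the following, where `generate_candidates` and `quality_score` are invented placeholders, not Microsoft APIs:

```python
# Hypothetical generate-and-filter loop for synthetic training data.
# None of these functions are Microsoft APIs; they are placeholders
# illustrating the general pattern described in the announcement.

def generate_candidates(prompt: str, n: int) -> list[str]:
    """Placeholder: call an inference endpoint n times for one prompt."""
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def quality_score(text: str) -> float:
    """Placeholder: a learned or heuristic filter scoring each sample."""
    return float(len(text) % 10) / 10.0

def build_synthetic_dataset(prompts: list[str], per_prompt: int,
                            threshold: float) -> list[str]:
    """Generate many candidates per prompt, then keep high-scoring ones.

    Cheap, high-throughput inference makes the generate step inexpensive,
    which is the economic argument for running it on an inference chip.
    """
    kept = []
    for prompt in prompts:
        for candidate in generate_candidates(prompt, per_prompt):
            if quality_score(candidate) >= threshold:
                kept.append(candidate)
    return kept

if __name__ == "__main__":
    data = build_synthetic_dataset(["explain HBM3e"], per_prompt=8,
                                   threshold=0.5)
    print(f"kept {len(data)} samples")
```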

The chip is also an effort to compete on headline performance with hyperscaler rivals. Guthrie wrote that Maia 200 is “the most performant, first-party silicon from any hyperscaler,” adding that it offers “three times the FP4 performance of the third generation Amazon Trainium” and “FP8 performance above Google’s seventh generation TPU.” Comparisons like these often hinge on vendor-provided benchmarks, and Microsoft did not, in its post, supply full test configurations for those claims.
