Deep X XM2 NPU: 80 TOPS Generative AI Accelerator at 5W

Deep X showcased its edge AI hardware and software solutions at Computex 2026, highlighting the next-generation XM2 silicon alongside its established system-on-chip and accelerator module portfolio. Head of Sales for Deep X Taiwan, Jack Horn, introduced the company’s Neural Processing Unit (NPU) technologies designed for low-power, high-performance edge computing applications. The company, headquartered in South Korea and founded by former Apple systems researcher Lokwon Kim, specializes in designing dedicated accelerators optimized for matrix operations to support machine learning workloads on-device.

The upcoming XM2 silicon, scheduled for tape-out around March 2027, is a five-watt NPU designed to deliver 80 TOPS. It targets edge-based Large Language Models (LLMs) and generative AI, supporting models from 20 billion up to 100 billion parameters that use Mixture of Experts (MoE) architectures. Within that five-watt budget, the XM2 is projected to generate 20 to 30 tokens per second, and it supports memory configurations of up to 64 gigabytes so that large models, such as Gemma, can run directly on edge devices.
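The quoted XM2 figures can be turned into a few back-of-the-envelope numbers. The sketch below is only arithmetic on the values stated above (5 W, 80 TOPS, 20 to 30 tokens per second, 64 GB). The weight precisions of 8 and 4 bits are assumptions for illustration, since the presentation did not state which precision XM2 uses for LLM weights, and the helper names are hypothetical.

```python
# Rough arithmetic on the XM2 figures quoted above.
# Assumption: 8-bit and 4-bit weights are illustrative choices only;
# the source does not say which weight precision XM2 uses.

POWER_W = 5.0      # stated power budget
TOPS = 80.0        # stated peak throughput
MEMORY_GB = 64.0   # stated maximum memory configuration


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores the KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy spent per generated token at a constant power draw."""
    return power_watts / tokens_per_second


tops_per_watt = TOPS / POWER_W
print(f"Peak efficiency: {tops_per_watt:.0f} TOPS/W")

for tps in (20, 30):
    print(f"{tps} tokens/s -> {joules_per_token(POWER_W, tps):.3f} J per token")

for params in (20, 100):
    for bits in (8, 4):
        gb = weight_memory_gb(params, bits)
        verdict = "fits in" if gb <= MEMORY_GB else "exceeds"
        print(f"{params}B params at {bits}-bit: {gb:.0f} GB, {verdict} {MEMORY_GB:.0f} GB")
```

Under these assumptions, a 100-billion-parameter model would need weights narrower than 8 bits to fit in 64 GB, while a 20-billion-parameter model fits at either precision.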

In addition to the XM2, Deep X demonstrated its current hardware, including the DX-H1 PCIe accelerator board and the DX-M1 series. The DX-H1 board integrates four DX-M1 chips for a combined 100 TOPS, with 16 gigabytes of memory allocated as four gigabytes per chip. The M1M variant integrates two gigabytes of RAM on-chip. These modules are built into industrial PCs and edge servers through partnerships with local IPC companies such as Advantech, AAEON, Avalue, and DFI, using standard M.2 slots to extend the host system's capabilities.

One storage-focused integration is a collaboration with Apacer that combines a four-terabyte PCIe SSD and two DX-M1 processors on a single M.2 board, delivering storage and AI processing at the same time. The hardware ecosystem also features the DX-M1+ M.2 module, which comes in a quad-chip 100 TOPS configuration or a dual-chip 50 TOPS version that pairs two NPU chips with two Rockchip SoCs for video transcoding and decoding. Software demonstrations at the booth showed compatibility across Linux, Windows, and Android, including mobile OS integration for physical AI and medical diagnostic tablets built in partnership with Powertip.

Architecturally, the NPU is optimized for matrix operations, which are central to modern machine learning workloads, and Deep X positions this as the basis for competitive performance per watt. To support deployment, Deep X offers a custom compiler and quantizer that converts models from FP32 to INT8 precision. Quantization shrinks each weight from four bytes to one, which helps reduce processing latency and power consumption at the edge. The tools also let developers fine-tune quantized models to recover the accuracy that is typically lost when precision is reduced.
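To make the FP32-to-INT8 step concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic textbook scheme, not Deep X's quantizer, whose internals were not described; the function names and sample weights are hypothetical.

```python
# Generic symmetric per-tensor INT8 quantization (illustrative only;
# this is not Deep X's quantizer, whose internals are not public here).


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical FP32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("integer codes:", quantized)
print("scale:", scale)
print("max abs error:", max_error)
```

The smallest weight rounds to zero here, which illustrates the kind of accuracy loss that fine-tuning a quantized model is meant to recover.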

source https://www.youtube.com/watch?v=iuHR0PNQ1TE

ARMdevices.net
