Mobilint ARIES and REGULUS edge AI, MLA400 LLM inference and multi-camera vision

Posted by – March 15, 2026
Category: Exclusive videos

Mobilint frames its edge AI story around efficiency rather than headline TOPS alone. In this booth conversation, the focus is on local inference, cost per watt, and practical deployment formats: USB devices, standalone edge boxes, low-profile PCIe cards, MXM modules, and SoC-class hardware for embedded designs. That fits Mobilint's broader product stack: the ARIES NPU family, the REGULUS low-power SoC line, and the qb SDK for model conversion and deployment. https://www.mobilint.com/

The demo is really about what edge AI looks like when it is treated as an appliance instead of a cloud extension. Mobilint shows multi-stream computer vision running fully offline, with real-time inference on several video feeds and no dependency on a datacenter link. That makes the pitch relevant for AI security, industrial monitoring, smart city analytics, and other latency-sensitive workloads where privacy, bandwidth, and predictable operating cost matter at the edge.
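To make that concrete, here is a minimal sketch of what a fully offline multi-stream loop looks like, using generic OpenCV and Python threading rather than Mobilint's actual runtime, which the video does not show. The stream URLs and the infer() callback are hypothetical placeholders for local camera feeds and an on-device NPU call.

```python
# Minimal multi-stream sketch: every frame is handled locally, with no
# datacenter round trip. Stream URLs and infer() are hypothetical.
import threading
import cv2

STREAMS = ["rtsp://cam1/stream", "rtsp://cam2/stream"]  # placeholder local feeds

def infer(frame) -> None:
    # Stand-in for an on-device NPU inference call (e.g., a detector).
    pass

def run_stream(url: str) -> None:
    cap = cv2.VideoCapture(url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        infer(frame)  # per-frame, fully on-device
    cap.release()

threads = [threading.Thread(target=run_stream, args=(u,), daemon=True)
           for u in STREAMS]
for t in threads:
    t.start()
for t in threads:
    t.join()
```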

A big part of the discussion is about scaling from vision to LLM workloads. The speaker describes an MLA400-class configuration built from four accelerators, aimed at running multiple small language models concurrently and pushing into the roughly 35 to 36 billion parameter range with quantization. That lines up with Mobilint's current direction: the MLA100 card is positioned around 80 TOPS with 16 GB of LPDDR4X and a 25 W TDP, while the upcoming MLA400 is presented as a quad-ARIES architecture for higher-throughput workstation and on-prem inference. In that context, the video is less about raw benchmark theater and more about usable local AI for mixed vision and language workloads.
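A quick back-of-envelope check makes the 35 to 36 billion parameter figure plausible. Assuming INT8 weights and four accelerators each carrying 16 GB, extrapolated from the MLA100's memory (the MLA400's actual memory configuration is not stated in the video), the weights fit with headroom:

```python
# Rough memory math for a ~35B-parameter model under INT8 quantization.
# The 4 x 16 GB figure and the 20% overhead allowance are assumptions.
params = 35e9                     # ~35B parameters, the range cited in the video
bytes_per_weight = 1.0            # INT8: one byte per weight
weights_gb = params * bytes_per_weight / 1e9      # = 35.0 GB of weights
overhead_gb = 0.2 * weights_gb    # rough allowance for KV cache and activations
needed_gb = weights_gb + overhead_gb              # = 42.0 GB
available_gb = 4 * 16             # four accelerators x 16 GB each (assumed)
print(f"need ~{needed_gb:.0f} GB, have {available_gb} GB")  # need ~42 GB, have 64 GB
```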

What makes the booth interesting is the software angle behind the hardware. Mobilint keeps coming back to quantization, compiler tooling, runtime integration, and model adaptation, because edge NPUs live or die by how well they map real models to the hardware rather than synthetic demos. Its qb SDK takes models from PyTorch, TensorFlow, TFLite, and ONNX, applying optimization and INT8-oriented deployment aimed at preserving accuracy while fitting tighter memory and power budgets. That is the practical layer that turns AI silicon into deployable embedded compute.
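The video does not show qb's actual API, so the sketch below illustrates the general shape of that flow with standard open tooling instead: export a trained model to ONNX, then apply post-training INT8 quantization. In a real deployment, Mobilint's own compiler and runtime would replace these steps.

```python
# Generic framework-to-INT8 path using PyTorch and ONNX Runtime tooling;
# this is an illustration, not Mobilint's qb SDK.
import torch
import torchvision.models as models
from onnxruntime.quantization import quantize_dynamic, QuantType

model = models.resnet18(weights=None).eval()   # any trained FP32 model
dummy = torch.randn(1, 3, 224, 224)

# Step 1: export from the training framework to a portable ONNX graph.
torch.onnx.export(model, dummy, "model_fp32.onnx", opset_version=13)

# Step 2: post-training quantization to INT8 weights, cutting memory and
# bandwidth, the same trade-off INT8-oriented edge deployment targets.
quantize_dynamic("model_fp32.onnx", "model_int8.onnx",
                 weight_type=QuantType.QInt8)
```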

There is also a broader roadmap underneath the interview. Mobilint has recently been talking up both the ARIES and REGULUS NPU families, with REGULUS targeting compact on-device AI at about 10 TOPS under 3 W with support for 4K video pipelines, while products such as the MLX-A1 package the accelerator into a more complete edge box. Seen from Embedded World 2026 in Nuremberg, the message is clear: Mobilint wants to compete where offline inference, multi-camera analytics, quantized LLMs, and power-aware embedded deployment matter more than a brute-force datacenter silicon roadmap.

Source: https://www.youtube.com/watch?v=ylvPT1Mlv_g