Vrch Agentic VJ System: real-time audio-to-diffusion visuals on local GPU + MIDI control

Posted by – January 6, 2026
Category: Exclusive videos

Vrch’s Agentic VJ System is a compact, backpack-friendly VJ computer that listens to a live audio feed (and can also ingest camera input) to generate visuals in real time, so a performance doesn’t depend on pre-rendered clips or a huge media library. https://www.vrch.io/aivj

Under the hood it chains several local AI components. A custom live-audio analysis model extracts tempo (BPM) plus higher-level cues like genre and mood; a language-model agent turns those parameters into scene prompts (the demo mentions a model from Alibaba’s Qwen family); and a diffusion model renders the frames with low latency on a discrete GPU (shown with RTX 4080-class hardware, with an upgrade path to 4090-class).
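To make the audio-to-prompt step concrete, here is a minimal Python sketch of the idea, not Vrch’s actual code. It assumes beat timestamps are already available from some detector, estimates tempo from the median gap between beats, and uses a plain string template as a stand-in for the language agent. The names `AudioCues`, `estimate_bpm` and `build_scene_prompt`, and the BPM thresholds, are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class AudioCues:
    """Cues the analysis stage is described as extracting (field names are assumed)."""
    bpm: float
    genre: str
    mood: str


def estimate_bpm(beat_times):
    """Estimate tempo from beat timestamps in seconds, using the median inter-beat interval."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beat timestamps")
    intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


def build_scene_prompt(cues):
    """Template stand-in for the language agent; the thresholds are illustrative only."""
    if cues.bpm < 100:
        pace = "slow, drifting"
    elif cues.bpm >= 130:
        pace = "fast, pulsing"
    else:
        pace = "steady, flowing"
    return f"{cues.mood} {cues.genre} visuals, {pace} motion, {cues.bpm:.0f} BPM"


# Beats every 0.5 s correspond to 120 BPM.
cues = AudioCues(bpm=estimate_bpm([0.0, 0.5, 1.0, 1.5]), genre="techno", mood="dark")
print(build_scene_prompt(cues))
```

In the real system the prompt would come from the language model and be handed to the diffusion renderer; the template only shows what kind of data flows between the stages.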

The operator experience is closer to a visual synthesizer than “AI asset generation”: a touchscreen UI shows the auto-analysis, you can override or steer the prompt on the fly, and control can come from DJ gear via MIDI/OSC, gamepads, or other controllers. The hardware is designed around a swappable GPU and a tight parallel pipeline, aiming for high on-device throughput without cloud dependency.
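As a rough illustration of steering visuals from DJ gear, the Python sketch below decodes a raw 3-byte MIDI Control Change message (status byte 0xB0 to 0xBF, then controller number and value, each 0 to 127) and scales it into a visual parameter. The controller assignments and parameter names in `CC_MAP` are hypothetical; the video does not describe how Vrch maps controls.

```python
def parse_control_change(message):
    """Return (channel, controller, value) for a 3-byte MIDI Control Change, else None."""
    if len(message) != 3:
        return None
    status, controller, value = message
    # Control Change status bytes are 0xB0..0xBF; the low nibble is the channel.
    if (status & 0xF0) != 0xB0 or controller > 127 or value > 127:
        return None
    return status & 0x0F, controller, value


def scale(value, low, high):
    """Map a 7-bit MIDI value (0..127) linearly onto the range [low, high]."""
    return low + (high - low) * (value / 127)


# Hypothetical mapping: controller number -> (parameter name, min, max).
CC_MAP = {
    1: ("prompt_strength", 0.0, 1.0),
    74: ("color_shift", -180.0, 180.0),
}


def apply_midi(params, message):
    """Return a copy of params updated by the message, or params unchanged if it is not mapped."""
    parsed = parse_control_change(message)
    if parsed is None:
        return params
    _, controller, value = parsed
    if controller not in CC_MAP:
        return params
    name, low, high = CC_MAP[controller]
    return {**params, name: scale(value, low, high)}


# A mod-wheel style message (controller 1) at full value on channel 0.
print(apply_midi({}, bytes([0xB0, 1, 127])))
```

Returning a new dictionary instead of mutating in place keeps each control update easy to log or replay, which suits a live setting where an operator may want to undo a change.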

For bigger stages, the system can scale out by running multiple nodes and stitching outputs over WebSockets, so each box renders a tile of a larger canvas for higher-resolution projection. In the interview they reference tests at London’s Outernet immersive venue and note that early prototypes are being rented frequently, with most interest coming from outside China. The clip itself was filmed in Eureka Park at CES 2026 in Las Vegas.
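The tiling idea can be sketched in a few lines of Python. This is an assumption-laden illustration: the grid split, the `tile_layout` and `tile_message` helpers, and the JSON fields are all hypothetical, and the actual WebSocket transport is left out. It only shows how a large canvas could be divided so that each node knows which rectangle to render.

```python
import json


def tile_layout(canvas_w, canvas_h, cols, rows):
    """Split a canvas into cols x rows rectangles, one per render node.

    Integer boundaries are computed per column and row so that tiles cover
    the canvas exactly, even when the size does not divide evenly.
    """
    tiles = []
    for row in range(rows):
        y0 = row * canvas_h // rows
        y1 = (row + 1) * canvas_h // rows
        for col in range(cols):
            x0 = col * canvas_w // cols
            x1 = (col + 1) * canvas_w // cols
            tiles.append({
                "node": row * cols + col,
                "x": x0,
                "y": y0,
                "width": x1 - x0,
                "height": y1 - y0,
            })
    return tiles


def tile_message(tile, frame_index):
    """Serialize a hypothetical per-node render request as JSON text."""
    return json.dumps({"type": "render_tile", "frame": frame_index, **tile}, sort_keys=True)


# Four nodes sharing a 3840x2160 canvas, each rendering a 1920x1080 tile.
for tile in tile_layout(3840, 2160, cols=2, rows=2):
    print(tile_message(tile, frame_index=0))
```

Each JSON string is the kind of payload that could be sent over a WebSocket connection to the matching node, with a frame index included so the stitched output stays in sync.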

Today it’s an x86 Linux build packaged as a 3D-printed prototype, with a target mass-production price around USD $2,000–$3,000 depending on performance tier. They also acknowledge a future path toward ARM/NPU acceleration, but the current stack leans on NVIDIA CUDA, which keeps the real-time render path straightforward while the product roadmap takes shape.

I’m publishing 100+ videos from CES 2026, uploading about 4 videos per day at 5AM/11AM/5PM/11PM CET/EST. Check out all my CES 2026 videos in my playlist here: https://www.youtube.com/playlist?list=PL7xXqJFxvYvjaMwKMgLb6ja_yZuano19e

This video was filmed using the DJI Pocket 3 ($669 at https://amzn.to/4aMpKIC ) with the dual wireless DJI Mic 2 microphones and the DJI lapel microphone ( https://amzn.to/3XIj3l8 ). Watch all my DJI Pocket 3 videos here: https://www.youtube.com/playlist?list=PL7xXqJFxvYvhDlWIAxm_pR9dp7ArSkhKK

Click the “Super Thanks” button below the video to send a highlighted comment under the video! Brands I film are welcome to support my work in this way 😁

Check out my video with Daylight Computer about their revolutionary Sunlight Readable Transflective LCD Display for Healthy Learning: https://www.youtube.com/watch?v=U98RuxkFDYY

source https://www.youtube.com/watch?v=PJtupyq1R6g