Startup: Hi7o real-time voice translation: 300ms latency, voice cloning, multilingual video calls

Posted by – December 15, 2025
Category: Exclusive videos

Hi7o is building a real-time multilingual conversation translator that works across text chat, phone calls, and live video calls, aiming for speech-to-speech translation that feels like a normal conversation rather than a “record → transcribe → translate later” workflow. The pitch is simple: speak in your native language, and the other person hears it immediately in theirs, with a claimed ~300 ms end-to-end delay and output that can keep your own vocal identity instead of switching to a generic synthetic narrator. https://www.hi7o.com/


Under the hood, the interesting part is the architecture: instead of one monolithic AI model doing everything, they describe a microservices pipeline where speech is captured, run through low-latency ASR + language ID, then translated (NMT/LLM-style), and finally re-synthesized with TTS and voice-cloning (speaker embedding + prosody/tone matching). That split matters because each stage can be optimized independently for latency budgets, scaling, and failure modes, which is how you even attempt “near real time” in a production voice flow.
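To make the latency-budget idea concrete, here is a minimal sketch of how a ~300 ms end-to-end target could be split across the four stages described above. The stage names and the per-stage millisecond figures are my own illustrative assumptions, not numbers from Hi7o; only the ~300 ms total comes from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    budget_ms: float  # hypothetical allowance, not a Hi7o figure


# Claimed end-to-end target from the interview.
TARGET_MS = 300

# Assumed split across the pipeline stages; the numbers are illustrative only.
STAGES = [
    Stage("capture_and_transport", 60),
    Stage("asr_and_language_id", 90),
    Stage("translation", 60),
    Stage("tts_and_voice_cloning", 90),
]


def total_budget(stages):
    """Sum of the per-stage allowances in milliseconds."""
    return sum(stage.budget_ms for stage in stages)


def over_budget(stages, measured_ms):
    """Return the names of stages whose measured latency exceeds their allowance.

    Stages with no measurement are treated as within budget.
    """
    return [
        stage.name
        for stage in stages
        if measured_ms.get(stage.name, 0.0) > stage.budget_ms
    ]


if __name__ == "__main__":
    measured = {"asr_and_language_id": 70, "translation": 80}
    print("total budget:", total_budget(STAGES), "ms of", TARGET_MS)
    print("over budget:", over_budget(STAGES, measured))
```

Because each stage has its own allowance, a slow stage can be spotted and tuned or scaled on its own without touching the rest of the pipeline, which is the practical payoff of the split.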

The demo focus is on live meetings: translated voice inside video calls, plus multilingual group calls where multiple participants can speak different languages at once (they mention up to 50 participants). Compared with tools that mainly add subtitles, the technical claim here is full speech-to-speech with fast turn-taking, so the “translation layer” becomes part of the audio channel rather than an afterthought layered on top of the call.
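One way to picture the group-call case is as a fan-out problem: each utterance only needs to be translated once per distinct listener language, not once per listener. The sketch below shows that grouping under my own assumptions; the function name, the participant IDs, and the data layout are hypothetical and say nothing about how Hi7o actually routes audio.

```python
from collections import defaultdict


def plan_fanout(speaker_id, participants):
    """Group listeners by language for a single utterance.

    participants maps a participant ID to a language code (hypothetical layout).
    Listeners who share the speaker's language get the original audio;
    everyone else is grouped so each target language is translated once.
    """
    source_lang = participants[speaker_id]
    listeners_by_lang = defaultdict(list)
    for participant_id, lang in participants.items():
        if participant_id != speaker_id:
            listeners_by_lang[lang].append(participant_id)

    passthrough = sorted(listeners_by_lang.pop(source_lang, []))
    translate = {lang: sorted(ids) for lang, ids in listeners_by_lang.items()}
    return {"passthrough": passthrough, "translate": translate}


if __name__ == "__main__":
    call = {"ana": "pt", "ben": "en", "chloe": "fr", "dan": "en", "eva": "pt"}
    print(plan_fanout("ana", call))
```

With this grouping, the translation and synthesis work per utterance grows with the number of distinct languages in the call rather than with the number of participants, which matters if a call really has dozens of people in it.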

Hi7o also positions itself as something you can integrate, not only a standalone app: they talk about exposing the capability as an SDK/API and running it as SaaS, where infrastructure cost (compute + realtime media) is a big driver today. Pricing in the interview is described as subscription-based (around 49.50/month), with extra costs potentially tied to usage or added languages, and they emphasize EU-hosted data and security posture like end-to-end encryption as a product requirement rather than marketing noise.

This interview was filmed at Web Summit Lisbon 2025, and it frames Hi7o as an early-stage team that is still hiring and actively looking to expand partnerships and investor contacts. The most concrete takeaway is the engineering target: low-latency, microservices-based speech translation with voice cloning for calls and meetings. A consumer app, which they say is planned for around the end of January 2026, will be the real test of whether the latency and quality hold up at scale in the wild.

I’m publishing 90+ videos from Embedded World North America 2025, uploading about 4 videos per day at 5AM/11AM/5PM/11PM CET/EST. Join https://www.youtube.com/charbax/join for Early Access to all 90 videos (once they’re all queued in the next few days). Check out all my Embedded World North America videos in my Embedded World playlist here: https://www.youtube.com/playlist?list=PL7xXqJFxvYvjgUpdNMBkGzEWU6YVxR8Ga

This video was filmed using the DJI Pocket 3 ($669 at https://amzn.to/4aMpKIC) with the dual wireless DJI Mic 2 microphones and the DJI lapel microphone (https://amzn.to/3XIj3l8). Watch all my DJI Pocket 3 videos here: https://www.youtube.com/playlist?list=PL7xXqJFxvYvhDlWIAxm_pR9dp7ArSkhKK

Check out my video with Daylight Computer about their revolutionary Sunlight Readable Transflective LCD Display for Healthy Learning: https://www.youtube.com/watch?v=U98RuxkFDYY

source https://www.youtube.com/watch?v=qgjvv8yUTLQ