How on-device live conversation translation works
One real-time translation loop powered by on-device Voice AI and Language SDKs
On-device live conversation translation combines Cheetah Streaming Speech-to-Text, Zebra Translate, and Orca Streaming Text-to-Speech in a single local pipeline that runs in a loop. Cheetah converts Speaker 1's speech into text, Zebra translates that text into Speaker 2's language, and Orca reads the translation aloud. When Speaker 2 hears the translation and responds, the same loop runs in reverse: Cheetah transcribes Speaker 2's response, Zebra translates it back, and Orca speaks it in Speaker 1's language.

Live conversation translation removes the language barrier, but only if the delay between speaker turns is short enough to preserve natural conversational flow. The latency introduced by cloud translation APIs (typically hundreds of milliseconds per round trip) compounds with every turn and breaks that flow. Lightweight on-device voice and language SDKs eliminate the round trip entirely, which is why on-device processing is critical for real-world adoption.
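The loop above can be sketched as a single function that chains the three stages and is simply called with the speakers swapped for the return direction. This is a minimal illustration, not the Picovoice SDK API: the `transcribe`, `translate`, and `speak` callables are hypothetical stand-ins for Cheetah, Zebra, and Orca, and the stub implementations exist only so the sketch runs end to end.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    lang: str  # e.g. "en", "es"


def translate_turn(audio, source, target, transcribe, translate, speak):
    """One pass of the loop: STT in source's language -> MT -> TTS in target's language."""
    text = transcribe(audio, source.lang)                    # Cheetah's role: speech -> text
    translated = translate(text, source.lang, target.lang)   # Zebra's role: text -> text
    return speak(translated, target.lang)                    # Orca's role: text -> speech


# Stub engines (hypothetical) so the sketch runs without any SDK or audio I/O.
def stub_transcribe(audio, lang):
    return audio  # pretend the "audio" is already its own transcript


def stub_translate(text, src, dst):
    return f"[{src}->{dst}] {text}"


def stub_speak(text, lang):
    return f"<pcm:{lang}:{text}>"


if __name__ == "__main__":
    speaker1 = Speaker("Speaker 1", "en")
    speaker2 = Speaker("Speaker 2", "es")

    # Forward direction: Speaker 1 talks, Speaker 2 hears their language.
    print(translate_turn("hello", speaker1, speaker2,
                         stub_transcribe, stub_translate, stub_speak))

    # Reverse direction: the same function with the speakers swapped.
    print(translate_turn("hola", speaker2, speaker1,
                         stub_transcribe, stub_translate, stub_speak))
```

Because each direction is the same pipeline with source and target exchanged, a real implementation only needs one loop body plus a turn-taking signal (for example, Cheetah's endpoint detection) to decide when to swap.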