On-device Live Captioning and Translation

Live captioning and translation, running entirely on-device

Stream audio for live transcription and real-time translation, and watch source-language and translated captions appear together on screen. Sub-second latency end-to-end. Audio never leaves the device.

Platforms supported
Android, iOS, Linux, macOS, Windows, Chrome, Edge, Firefox, Safari, Raspberry Pi
How live captioning and translation is built

On-device Voice AI and Language SDKs in a single pipeline

On-device live captioning and translation combines streaming speech-to-text with machine translation in a single local pipeline. Most implementations split these across a cloud STT API and a separate translation API, paying network round-trip latency twice and sending audio off the device at every step. Picovoice runs both stages on-device: the streaming STT engine produces source-language captions as the speaker talks, and the translation engine converts them into the target language in the same pipeline, with no audio or text ever transmitted to a third-party service.

Microphone (audio in) → Cheetah (streaming STT) → Zebra (translation) → Captions (bilingual UI)
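In essence, each audio frame passes through both stages inside a single loop, with no network hop between them. A conceptual sketch only; the helper names and the translate call here are illustrative, not the actual API (the complete, runnable version appears in the recipe steps below):

```python
# Conceptual loop: both engines run in-process on the same audio frames.
while streaming:
    partial, is_endpoint = stt.process(next_audio_frame())  # streaming STT stage
    show_source_caption(partial)
    if is_endpoint:                                         # phrase boundary reached
        show_translated_caption(translator.translate(finalize_phrase()))
```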
Why Cheetah Streaming Speech-to-Text?

Lowest latency. Lowest compute. No accuracy tradeoff.

10.1% WER (English) vs. 11.9% Google and 10.6% Moonshine Medium
0.08 CPU core-hours vs. 3.36 Moonshine Medium (40x less)
8.6% WER (Spanish) vs. 11.6% Google and 9.4% Azure

Cheetah Streaming Speech-to-Text beats Google Cloud STT in word error rate and word emission latency across all tested languages, and outperforms Azure STT in several benchmarks, per the open-source real-time transcription benchmark — even before it's customized for the use case. It emits words at 590 ms median latency, typically one word behind the speaker. Cheetah requires less compute than any other local engine tested. Result: no tradeoff on accuracy, latency, or privacy, and no minimum hardware requirements.

English Word Error Rate (lower is better)
Amazon Streaming: 5.6%
Azure Real-time: 8.2%
Cheetah Streaming: 10.1%
Moonshine Streaming Medium: 10.6%
Vosk Streaming Large: 11.5%
Google Streaming: 11.9%
Whisper.cpp Streaming Base: 19.8%
English Punctuation Error Rate (lower is better)
Cheetah Streaming: 16.1%
Azure Real-time: 16.4%
Amazon Streaming: 24.4%
Google Streaming: 36.0%
Moonshine Streaming Medium: 45.1%
Whisper.cpp Streaming Base: 54.1%
Why Zebra Translate?

120 words per second. Opus-level accuracy. Zero network latency.

<100 MB peak memory usage
<80 MB model size per language pair
1:1 accuracy match with Opus MT by Helsinki NLP

Zebra, Picovoice's on-device translation SDK, returns up to 120 words per second, far faster than the roughly 2 words per second a person speaks or the 5 words per second a person reads. The speed does not come at the cost of accuracy: Zebra matches Opus MT, one of the best-known open translation models. On-device processing adds a structural advantage that cloud translation APIs such as Google Translate and DeepL cannot match: zero network latency.

Translation accuracy (BLEU; higher is better)
Zebra (DE → EN): 51
Opus (DE → EN): 51
Zebra (EN → FR): 55
Opus (EN → FR): 55
Zebra (ES → IT): 58
Opus (ES → IT): 57
Translation speed (words/sec; higher is better)
Zebra (DE → EN): 112
Opus (DE → EN): 45
Zebra (EN → FR): 105
Opus (EN → FR): 41
Zebra (ES → IT): 90
Opus (ES → IT): 36
Live captioning and translation use case examples

From broadcasts to cross-platform CART captioning

Broadcasts & streaming

Live captions and translation for broadcasts

Broadcasts — whether for a company-wide meeting, a city-wide council discussion, or educational content — require live captioning and translation for accessibility and reach. They need captions that track audio within seconds and survive flaky uplinks. On-device live captioning and translation meets these requirements and captions the same feed in multiple languages simultaneously.

CART captioning & accessibility

Cross-platform CART captioning

Apple Live Captions and Google Live Captions work well within their own infrastructure. If you are shipping CART captioning or accessibility features in a cross-platform product — including lecture capture systems, hospital communication tools, or public-sector apps — Picovoice gives you the same on-device live captioning capability across Android, iOS, web, embedded, Linux, macOS, and Windows.

Get started

Build an on-device live captioning and translation app in 3 steps

A complete working recipe in Python. Open-source on GitHub. Runs 100% on-device.

recipe · live-captioning-and-translation
Difficulty
Beginner
Runtime
100% on-device
Language
Python
Platforms supported
Android, iOS, Linux, macOS, Windows, Chrome, Edge, Firefox, Safari, Raspberry Pi

Prerequisites

A Picovoice AccessKey (obtained from Picovoice Console) and a local clone of the GitHub repo.

Usage

These instructions assume your current working directory is recipes/live-captioning-and-translation/python.
Step 1: Create a virtual environment

Isolate the recipe's dependencies from your system Python.
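The standard Python tooling command, with no recipe-specific assumptions:

```console
python3 -m venv .venv
```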
Step 2: Activate the virtual environment

Activation makes pip install into .venv instead of system Python. The command differs between Linux/macOS/Raspberry Pi and Windows, as shown below.
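The standard activation commands for each platform:

```console
# Linux, macOS, or Raspberry Pi
source .venv/bin/activate

# Windows
.venv\Scripts\activate
```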
Step 3: Install dependencies

Pulls in the Cheetah and Zebra Python SDKs along with audio I/O.
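Assuming the recipe ships a requirements.txt, as pico-cookbook recipes typically do:

```console
pip3 install -r requirements.txt
```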
Step 4: Download the required models

Run the setup script to download the models for Cheetah Streaming Speech-to-Text and Zebra Translate.
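The script name below is a placeholder, not the actual file name; check the recipe's README for the real entry point:

```console
python3 download_models.py  # hypothetical name; see the recipe README
```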
Step 5: Run the captioning pipeline

Cheetah Streaming Speech-to-Text streams partial captions in the source language. Phrases are shown as captions and also fed to Zebra Translate for translation. Both run in the same process, on the same audio frames, on the same machine.
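A condensed sketch of that loop: the pvcheetah and PvRecorder calls follow Picovoice's published Python APIs, while the pvzebra module name and its create/translate calls are assumptions for illustration only; consult the recipe source for the exact Zebra Translate API.

```python
import pvcheetah
import pvzebra  # ASSUMED module name; see the Zebra Translate docs for the real one
from pvrecorder import PvRecorder

ACCESS_KEY = "${ACCESS_KEY}"  # from Picovoice Console

# Streaming STT: emits partial transcripts frame by frame, with punctuation.
cheetah = pvcheetah.create(
    access_key=ACCESS_KEY,
    enable_automatic_punctuation=True)

# Hypothetical Zebra handle for one language pair (e.g., English -> French).
zebra = pvzebra.create(access_key=ACCESS_KEY, source_language="en", target_language="fr")

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

phrase = ""
try:
    while True:
        # One mic frame feeds both stages: caption now, translate at phrase end.
        partial, is_endpoint = cheetah.process(recorder.read())
        phrase += partial
        print(partial, end="", flush=True)  # source-language caption
        if is_endpoint:
            phrase += cheetah.flush()  # finalize the phrase
            print("\n>> " + zebra.translate(phrase))  # translated caption
            phrase = ""
except KeyboardInterrupt:
    pass
finally:
    recorder.stop()
    recorder.delete()
    cheetah.delete()
```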
Have questions or looking for implementations in other languages? Visit the Live Captioning & Translation recipe in the pico-cookbook repository on GitHub, where you can find the code and open an issue for demo-related technical questions.
Frequently asked questions

FAQ

What is live captioning?
Live captioning is the real-time conversion of spoken audio into on-screen text, generated continuously as someone speaks. Unlike offline transcription, which produces a finished transcript after the audio ends, live captioning streams partial captions within a fraction of a second of each word being spoken. Live captioning is used in broadcast TV, video conferencing, accessibility tools, in-vehicle assistants, and customer support apps.
What is live translation captioning?
Live translation captioning combines live captioning with machine translation: a speaker's words are transcribed in their language and immediately translated into one or more target languages, with both the source and translated captions appearing on screen in real time. This recipe builds exactly that — the streaming STT engine produces source-language captions, and the translation engine converts them into the target language as the speaker continues.
What is CART captioning?
CART stands for Communication Access Realtime Translation. It is a professional live captioning service for deaf and hard-of-hearing individuals, commonly used in legal, educational, and medical settings. Automated CART replaces the human stenographer with a speech recognition engine.
How is this different from Microsoft Live Captions or Apple Live Captions?
Microsoft Live Captions and Apple Live Captions are excellent on-device features for Windows/Copilot+ PCs and Apple devices respectively, but they ship as OS-level features, not licensable SDKs. If you are building a product that needs to run on Android, embedded hardware, or in a web browser, those options do not apply. Picovoice exposes the same on-device capability as a cross-platform SDK that any developer or OEM can integrate.
Does the captioning work offline?
Yes. Cheetah Streaming Speech-to-Text and Zebra Translate both run 100% on-device. Audio is processed locally and never sent to any server.
What languages are supported?
Cheetah Streaming Speech-to-Text currently supports English, French, German, Italian, Portuguese, and Spanish, and Zebra Translate supports a wide set of language pairs across English, French, German, Korean, Japanese, Italian, Spanish, and Portuguese. For enterprises with specific needs, Picovoice offers custom model training for language pairs not in the standard catalog. For the up-to-date language list, see the Cheetah Streaming Speech-to-Text and Zebra Translate product pages.
Can I customize the transcription with industry vocabulary?
Yes. Cheetah Streaming Speech-to-Text supports custom vocabulary: add brand names, jargon, technical terms, and proper nouns that matter for your domain. (Orca, Picovoice's text-to-speech engine, separately supports custom pronunciations.)
How does this compare to running Whisper for live captioning?
Whisper is excellent for offline transcription, but was not designed for streaming. Whisper.cpp's streaming mode emits words at roughly 1.2–2.0 second latency vs Cheetah's 590 ms, uses substantially more compute, and ships base models in the 70–290 MB range vs Cheetah's 34 MB. For live captioning under tight latency and memory budgets, purpose-built Cheetah is a better choice.
How can I get technical support for the live captioning and translation recipe?
Open an issue on the pico-cookbook GitHub repository, where the recipe's source code lives; demo-related technical questions are answered there.