Cheetah Streaming Speech-to-Text

Real-time transcription for the conversational AI era

Lightweight, on-device streaming speech-to-text that transcribes naturally spoken language, with the speed, privacy, and control real-time applications demand.

What is Cheetah Streaming Speech-to-Text?

Cheetah Streaming Speech-to-Text is on-device transcription software that transcribes voice data in real time, without network delay and without compromising accuracy. It processes voice data locally, enabling live transcription on-device, on mobile, in web browsers, on-premises, or in the cloud.

Get started with just a few lines of code

Python:

    import pvcheetah

    o = pvcheetah.create(access_key)

    partial_transcript, is_endpoint = o.process(get_next_audio_frame())

Node.js:

    const { Cheetah } = require("@picovoice/cheetah-node");

    const o = new Cheetah(accessKey);

    const [partialTranscript, isEndpoint] =
      o.process(audioFrame);

Android (Java):

    Cheetah o = new Cheetah.Builder()
        .setAccessKey(accessKey)
        .setModelPath(modelPath)
        .build(appContext);

    CheetahTranscript partialResult =
        o.process(getNextAudioFrame());

iOS (Swift):

    let cheetah = try Cheetah(
        accessKey: accessKey,
        modelPath: modelPath)

    // process returns a tuple, so destructure it with parentheses
    let (partialTranscript, isEndpoint) =
        try cheetah.process(
            getNextAudioFrame())

Java:

    Cheetah o = new Cheetah.Builder()
        .setAccessKey(accessKey)
        .build();

    CheetahTranscript r =
        o.process(getNextAudioFrame());

.NET (C#):

    Cheetah o =
        Cheetah.Create(accessKey);

    CheetahTranscript partialResult =
        o.Process(GetNextAudioFrame());

React:

    const {
      result,
      isLoaded,
      isListening,
      error,
      init,
      start,
      stop,
      release,
    } = useCheetah();

    await init(
      accessKey,
      model
    );

    await start();
    await stop();

    useEffect(() => {
      if (result !== null) {
        // Handle transcript
      }
    }, [result]);

Flutter (Dart):

    _cheetah = await Cheetah.create(
        accessKey,
        modelPath);

    CheetahTranscript partialResult =
        await _cheetah.process(
            getAudioFrame());

React Native:

    const cheetah = await Cheetah.create(
      accessKey,
      modelPath);

    const partialResult =
      await cheetah.process(
        getAudioFrame());

C:

    pv_cheetah_t *cheetah = NULL;
    pv_cheetah_init(
        access_key,
        model_file_path,
        endpoint_duration_sec,
        enable_automatic_punctuation,
        &cheetah);

    const int16_t *pcm = get_next_audio_frame();

    char *partial_transcript = NULL;
    bool is_endpoint = false;
    const pv_status_t status = pv_cheetah_process(
        cheetah,
        pcm,
        &partial_transcript,
        &is_endpoint);

Web (JavaScript):

    const cheetah =
      await CheetahWorker.create(
        accessKey,
        (cheetahTranscript) => {
          // callback
        },
        {
          base64: cheetahParams,
          // or
          publicPath: modelPath,
        }
      );

    WebVoiceProcessor.subscribe(cheetah);

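
Each snippet above processes a single audio frame. In an application the same call typically sits in a loop: partial transcripts are appended as they arrive, and when the engine reports an endpoint the utterance is closed out. The Python sketch below illustrates that loop under stated assumptions: the engine exposes `process(frame)` returning `(partial_transcript, is_endpoint)` as in the Python snippet, plus a `flush()` method returning any remaining text. `FakeEngine` and `transcribe_stream` are hypothetical names introduced for illustration so the example runs without an access key or a microphone.

```python
class FakeEngine:
    """Hypothetical stand-in for a streaming engine (not the real SDK).

    It ignores the audio and replays scripted (partial_text, is_endpoint) results.
    """

    def __init__(self, scripted_results):
        self._results = iter(scripted_results)

    def process(self, frame):
        return next(self._results)

    def flush(self):
        # Assumed to return text the engine is still holding; nothing here.
        return ""


def transcribe_stream(engine, frames):
    """Yield one finished utterance each time the engine signals an endpoint."""
    parts = []
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        parts.append(partial)
        if is_endpoint:
            parts.append(engine.flush())
            utterance = "".join(parts).strip()
            parts = []
            if utterance:
                yield utterance
    # End of audio: emit whatever was transcribed after the last endpoint.
    parts.append(engine.flush())
    tail = "".join(parts).strip()
    if tail:
        yield tail


# Three dummy frames of silence; the frame size here is an arbitrary placeholder.
frames = [[0] * 512 for _ in range(3)]
engine = FakeEngine([("hello ", False), ("world", True), ("bye", False)])
utterances = list(transcribe_stream(engine, frames))
print(utterances)  # ['hello world', 'bye']
```

With a real engine, `frames` would come from a microphone or audio file reader instead of a list of zeros.
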
"It felt like we tried every available solution on the market, and only Picovoice provided the stability, processing speed, excellent accuracy out of the box, and flexible training capabilities that we required. They are truly on the cutting edge of voice technology."
Jocelyn Kang, CTO, Knowtex
Why Cheetah Streaming Speech-to-Text is the best transcription engine for real-time applications

Cloud-based transcription APIs introduce latency by sending audio to external servers, making them vulnerable to network delays, throttling, and outages. On-device engines with large model sizes or without true streaming architectures can also lag, adding compute-related latency.

Cheetah Streaming Speech-to-Text avoids both pitfalls: it’s a lightweight, on-device streaming engine that processes audio instantly at the point of capture, delivering real-time transcription with guaranteed response time.

Why choose Cheetah Streaming Speech-to-Text over other Transcription Engines?

Get started with Cheetah Streaming Speech-to-Text
Does Cheetah Streaming Speech-to-Text sound too good to be true? See for yourself!
Start Free
  • Pre-trained transcription models
  • Custom vocabulary
  • Keyword boosting
  • Intuitive SDKs
  • Truecasing and punctuation
  • English, French, German, Italian, Portuguese, and Spanish

Frequently asked questions

What are the use cases and applications of Streaming Speech-to-Text?
  1. Conversational AI use cases that demand real-time performance
    • Customer service automation: AI agents that can handle complex support calls with natural interruptions and follow-up questions.
    • Voice assistants and AI companions: Responsive voice interfaces that feel natural without awkward delays.
    • Meeting and conference AI: Transcription systems that capture rapid exchanges, enabling live captions, automated note-taking, and instant action item extraction.
  2. Mission-critical applications where delays aren't acceptable
    • Healthcare and medical dictation: Medical transcription that keeps pace with a physician's speech during patient consultations. Delays disrupt clinical workflows and reduce the quality of patient care. Voice data is also never shared with third parties: thanks to on-device speech processing, it doesn't even leave the physician's device.
    • Financial services and trading: Voice commands for trading platforms must execute instantly. A few hundred milliseconds can mean the difference between profit and loss. Thanks to on-device speech processing, all audio processing happens within secured financial networks.
    • Manufacturing and industrial control: Hands-free voice control for quality assurance, inventory management, and equipment operation. Workers don’t wait for network delays when operating machinery or conducting safety inspections.
  3. Productivity and collaboration tools
    • Dictation: Voice-driven workflows where employees can dictate code, comments, and documentation.
    • Documentation and content creation: Transcription of discussions, meetings, and brainstorming sessions into structured documentation. Immediate transcription enables real-time editing and collaboration.
What is a real-time transcription engine?

Real-time transcription, also known as real-time speech-to-text, streaming transcription, streaming speech-to-text, live transcription, or live speech-to-text, refers to the technology and tools that convert audio streams to text synchronously with audio generation.
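
One way to make "synchronously with audio generation" concrete is the real-time factor: processing time divided by the duration of the audio processed. A value below 1 means the engine keeps up with incoming audio. The sketch below measures it for any per-frame callable. `real_time_factor` is a hypothetical helper, and the 512-sample frame length and 16 kHz sample rate are illustrative assumptions, not figures taken from this page.

```python
import time


def real_time_factor(process, frames, frame_length, sample_rate,
                     clock=time.perf_counter):
    """Return processing_time / audio_duration for the given frames."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one frame")
    start = clock()
    for frame in frames:
        process(frame)
    elapsed = clock() - start
    # Each frame carries frame_length samples recorded at sample_rate Hz.
    audio_seconds = len(frames) * frame_length / sample_rate
    return elapsed / audio_seconds


# 100 frames * 512 samples / 16000 Hz = 3.2 seconds of (silent) audio.
frames = [[0] * 512 for _ in range(100)]
# The lambda is a placeholder workload standing in for an engine's process call.
rtf = real_time_factor(lambda frame: sum(frame), frames, 512, 16000)
print(f"real-time factor: {rtf:.4f}")
```

The `clock` parameter exists only so the measurement can be driven by a fake clock in tests.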

How does on-device real-time transcription differ from cloud-based real-time transcription APIs?

Cloud-based real-time transcription APIs record voice data and send it to the vendor's servers, where the transcription engine converts it into text. On-device real-time transcription brings the transcription engine to where the voice data is, offering a guaranteed real-time experience by eliminating unpredictable network delays.

What are the benefits of on-device real-time transcription over cloud-based real-time transcription?

Cloud-based real-time transcription converts voice data into text with a delay caused by network latency and connectivity issues. On-device real-time transcription eliminates these inherent latency and reliability limitations by processing voice data on the device without sending it to a third-party cloud. For time-sensitive applications, such as agent assistance, medical dictation, or meeting transcription, delays hurt both the experience and productivity. A recent study on delays in virtual communication describes internet lag as a wrench in mental gears.

Can I use Cheetah Streaming Speech-to-Text in the cloud?

Yes. You can run Cheetah Streaming Speech-to-Text in the cloud, whether private, public, or hybrid. Picovoice on-device voice recognition technology allows enterprises to decide where to run the transcription engine instead of making the Picovoice cloud mandatory for voice processing.

What are the key metrics for evaluating real-time transcription engines?

Key metrics for evaluating real-time transcription engines are latency, reliability & resiliency, accuracy, availability of features, the total cost of ownership, and data privacy and governance. Each metric may have different weights in different projects of the same company.
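
Of these metrics, accuracy is commonly reported as word error rate (WER): the word-level edit distance between the engine's output and a reference transcript, divided by the number of reference words. A minimal sketch follows. `word_error_rate` is a hypothetical helper, and the lowercase-and-split normalization is a simplifying assumption; benchmarks usually normalize text more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("the") and one substitution ("lights" -> "light"): 2 / 5.
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(wer)  # 0.4
```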

Which platforms does Cheetah Streaming Speech-to-Text support?
Can Cheetah handle noisy environments and accents?

Yes. Cheetah Streaming Speech-to-Text is trained on diverse audio conditions including background noise, multiple speakers, and various accents. For specialized environments or specific accent patterns, custom training is available for Enterprise Plan customers via Picovoice Consulting to optimize performance for specific use cases.

How does deployment work for high-availability applications?

Cheetah Streaming Speech-to-Text runs entirely within your infrastructure, eliminating external dependencies that could cause outages. You can deploy across multiple instances, regions, or availability zones using standard load balancing and failover strategies.
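
As a rough illustration of the load balancing and failover idea, the sketch below assigns new transcription sessions round-robin across instances and skips any instance marked unhealthy. It is a generic pattern, not part of the Cheetah SDK: `InstancePool` and the zone names are hypothetical. Because a streaming session is stateful, this sketch assumes a session stays on the instance it was assigned to.

```python
class InstancePool:
    """Round-robin assignment over healthy instances (illustrative only)."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._next = 0

    def mark_down(self, name):
        self._healthy.discard(name)

    def mark_up(self, name):
        if name in self._instances:
            self._healthy.add(name)

    def assign(self):
        """Return the next healthy instance, or raise if none are available."""
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next % len(self._instances)]
            self._next += 1
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


pool = InstancePool(["zone-a", "zone-b", "zone-c"])
before = [pool.assign() for _ in range(3)]
pool.mark_down("zone-b")  # e.g. a failed health check
after = [pool.assign() for _ in range(4)]
print(before)  # ['zone-a', 'zone-b', 'zone-c']
print(after)   # ['zone-a', 'zone-c', 'zone-a', 'zone-c']
```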

Can I customize Cheetah Streaming Speech-to-Text for domain-specific vocabulary?

Yes, you can train custom speech-to-text models on Picovoice Console to optimize Cheetah Streaming Speech-to-Text for specific industries, terminologies, or use cases. This includes medical terminology, legal language, technical jargon, or company-specific vocabulary.

What happens if I need to process multiple languages in the same application?

Cheetah Streaming Speech-to-Text can be configured to handle multiple languages through separate instances, each using a language-specific model. Enterprise Plan customers can work with Picovoice Consulting to have custom multilingual models or automatic language detection capabilities trained for them.
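
The separate-instances approach can be sketched as a small registry that lazily creates one engine per language and hands back the right one for each audio stream. Everything named here is an assumption for illustration: `EngineRegistry`, `make_engine`, and the model file names are hypothetical. With the Python SDK the factory might wrap `pvcheetah.create(access_key, model_path=...)`, but a stub factory is used so the example runs without the SDK.

```python
from typing import Callable, Dict


class EngineRegistry:
    """Creates and caches one engine per language code."""

    def __init__(self, model_paths: Dict[str, str],
                 make_engine: Callable[[str], object]):
        self._model_paths = model_paths
        self._make_engine = make_engine
        self._engines: Dict[str, object] = {}

    def get(self, language: str):
        if language not in self._model_paths:
            raise KeyError(f"no model configured for language {language!r}")
        if language not in self._engines:
            # Create the engine on first use so unused languages cost nothing.
            self._engines[language] = self._make_engine(self._model_paths[language])
        return self._engines[language]


created = []


def stub_factory(model_path):
    """Hypothetical factory; a real one would load the model at model_path."""
    created.append(model_path)
    return {"model_path": model_path}


# Placeholder model file names, one per language.
registry = EngineRegistry(
    {"en": "cheetah_params.pv", "fr": "cheetah_params_fr.pv"},
    stub_factory,
)
english = registry.get("en")
print(english["model_path"])  # cheetah_params.pv
```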

Which languages does Cheetah Streaming Speech-to-Text support?

Cheetah Streaming Speech-to-Text currently supports English, French, German, Italian, Portuguese, and Spanish.

What should I do to request Cheetah Streaming Speech-to-Text to support other languages?

Reach out to Picovoice Sales to tell us about your commercial endeavor.

How do I get technical support for Cheetah Streaming Speech-to-Text?

Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice AI, Picovoice technology, and how to start building transcription products. Enterprise customers get dedicated support specific to their applications from Picovoice Product & Engineering teams. While Picovoice customers reach out to their contacts, prospects can also purchase Enterprise Support before committing to any paid plan.

How can I get informed about updates and upgrades?

Version changes appear in the and LinkedIn. Subscribing to GitHub is the best way to get notified of patch releases. If you enjoy building with Cheetah Streaming Speech-to-Text, show it by giving a GitHub star!