Question 1

What are the use cases and applications of Speech-to-Text?

Accepted Answer

Call Center QA & Compliance: Transcribe customer service recordings in bulk to assess agent performance, check compliance with regulations such as HIPAA, PCI DSS, and GDPR, and flag risky interactions securely and at scale.

Legal & Law Enforcement Recordings: Process police interviews, depositions, and court recordings offline to preserve evidence integrity. On-device transcription avoids the chain-of-custody concerns that come with uploading recordings to the cloud.

Enterprise Knowledge Management: Automatically transcribe internal training sessions, workshops, and engineering reviews into searchable knowledge bases or wiki entries.
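Once call recordings are transcribed, flagging risky interactions can start as a rule-based scan of the text. The sketch below is illustrative only: the rule names, keywords, and the card-number pattern are hypothetical placeholders (not part of any Picovoice SDK), and the card rule assumes the transcript renders spoken numbers as digits.

```python
import re

# Hypothetical review rules; a real QA program would define its own.
RISK_PATTERNS = {
    # 13 to 16 digits, optionally separated by spaces or hyphens (assumes digits in the transcript)
    "possible_card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "escalation_request": re.compile(r"\b(?:supervisor|manager|complaint)\b", re.IGNORECASE),
    "cancellation_intent": re.compile(r"\bcancel\w*\b", re.IGNORECASE),
}


def flag_transcript(transcript: str) -> list[str]:
    """Return the names of the rules a transcript triggers, sorted for stable output."""
    return sorted(name for name, pattern in RISK_PATTERNS.items() if pattern.search(transcript))


def flag_batch(transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Map each call ID to its triggered rules, keeping only calls that need review."""
    flagged: dict[str, list[str]] = {}
    for call_id, text in transcripts.items():
        hits = flag_transcript(text)
        if hits:
            flagged[call_id] = hits
    return flagged
```

Keyword rules are cheap and auditable, which suits compliance review, but they only surface candidates for a human reviewer; they do not replace one.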

Question 2

What is speech-to-text?

Accepted Answer

Speech-to-text (STT), also known as Automatic Speech Recognition (ASR) and Open-Domain Large Vocabulary Speech Recognition (LVSR), refers to the technology and methodologies that convert voice data into text.

Question 3

How does on-device speech-to-text differ from cloud-based speech-to-text?

Accepted Answer

Cloud-based speech-to-text APIs send voice data to the vendor's servers, where the transcription engine runs, and return the transcript over the network. On-device speech-to-text runs the recognition engine where the voice data already resides, removing the upload, the network round trip, and the dependency on a third-party server.

Question 4

What are the benefits of on-device speech-to-text over cloud speech-to-text APIs?

Accepted Answer

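On-device speech-to-text lets enterprises retain ownership of and control over their data and products. Sending voice data to the cloud has privacy, latency, reliability, and cost implications: audio leaves the enterprise's control, every request adds a network round trip, transcription stops when connectivity or the vendor's service fails, and usage-based fees grow with audio volume. On-device speech-to-text avoids these issues by keeping processing local, bringing control back to enterprises.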

Question 5

How is Leopard Speech-to-Text faster and better than other on-device speech-to-text models like Whisper?

Accepted Answer

Most on-device speech-to-text solutions rely on pre-trained models or on third-party runtimes such as PyTorch, ONNX Runtime, or TensorFlow. This reliance limits low-level optimization and adds runtime overhead, restricting performance and adaptability.

In contrast, Picovoice's team built Leopard Speech-to-Text end to end, using its own proprietary training frameworks and inference engines. This control over the full stack allows deep optimization, enabling Leopard to deliver cloud-level accuracy on-device without the latency, power, and memory costs typical of deep learning models.

As a result, Leopard Speech-to-Text is:

Fully customizable
Highly efficient
Exceptionally accurate

Question 6

Does Leopard Speech-to-Text support real-time transcription?

Accepted Answer

No. Leopard Speech-to-Text is designed for batch transcription of recorded audio files or buffers. For real-time transcription, use Cheetah Streaming Speech-to-Text, Picovoice's on-device streaming speech-to-text engine, which returns text incrementally as audio arrives.
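The practical difference is in how audio is consumed: a batch engine returns one transcript after it receives all the audio, while a streaming engine consumes fixed-size frames and emits partial text as it goes. The sketch below illustrates that streaming loop against an assumed interface (`frame_length`, `process` returning partial text plus an endpoint flag, and `flush`), modeled loosely on Cheetah's Python SDK as we understand it. Treat the interface as an assumption and check the Cheetah documentation for the exact API.

```python
from typing import Iterator, Protocol, Sequence


class StreamingEngine(Protocol):
    """Assumed interface of a frame-based streaming engine, modeled loosely on Cheetah."""

    frame_length: int  # number of samples the engine expects per frame

    def process(self, pcm: Sequence[int]) -> tuple[str, bool]:
        """Consume one frame; return (new partial text, whether an endpoint was detected)."""
        ...

    def flush(self) -> str:
        """Return any text still buffered inside the engine."""
        ...


def frames(pcm: Sequence[int], frame_length: int) -> Iterator[list[int]]:
    """Split PCM samples into fixed-size frames, zero-padding the last one with silence."""
    for start in range(0, len(pcm), frame_length):
        frame = list(pcm[start:start + frame_length])
        frame.extend([0] * (frame_length - len(frame)))
        yield frame


def transcribe_stream(engine: StreamingEngine, pcm: Sequence[int]) -> str:
    """Feed audio frame by frame, collecting partial text as soon as it is emitted."""
    pieces: list[str] = []
    for frame in frames(pcm, engine.frame_length):
        partial, is_endpoint = engine.process(frame)
        pieces.append(partial)  # a live app would display this immediately
        if is_endpoint:
            pieces.append(engine.flush())  # finalize the utterance at a detected pause
    pieces.append(engine.flush())  # drain whatever remains at the end of the audio
    return "".join(pieces)
```

In a live application the frames would come from a microphone recorder rather than a pre-recorded buffer, which is exactly the case a batch engine cannot serve.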

Question 7

Can I use Leopard Speech-to-Text in the cloud?

Accepted Answer

Yes. You can run Leopard Speech-to-Text in a private, public, or hybrid cloud. Because the engine runs wherever you deploy it, data does not have to leave the enterprise's own infrastructure on any platform, and the cloud becomes an option rather than a requirement. For examples, see the tutorials on serverless speech-to-text with AWS Lambda and on building a transcription microservice with gRPC.

Question 8

Does Leopard Speech-to-Text support Speaker Diarization?

Accepted Answer

Yes. Leopard Speech-to-Text embeds an optimized version of Falcon Speaker Diarization, so you can attribute transcribed words to speakers without integrating a separate engine. Please check the Leopard Speech-to-Text documentation for more information.
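With a speaker label on each word, a common next step is merging consecutive words from the same speaker into turns. The sketch below uses a hypothetical `Word` dataclass as a stand-in for the engine's word-level output; its field names (`word`, `start_sec`, `end_sec`, `speaker_tag`) are assumptions, so map them to whatever the Leopard SDK for your platform actually returns.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for one word-level result; field names are assumptions."""
    word: str
    start_sec: float
    end_sec: float
    speaker_tag: int  # assumed integer label identifying the speaker


def speaker_turns(words: list[Word]) -> list[tuple[int, float, float, str]]:
    """Merge consecutive words from the same speaker into (speaker, start, end, text) turns."""
    turns: list[tuple[int, float, float, str]] = []
    for w in words:
        if turns and turns[-1][0] == w.speaker_tag:
            # Same speaker as the previous turn: extend its end time and text.
            speaker, start, _, text = turns[-1]
            turns[-1] = (speaker, start, w.end_sec, f"{text} {w.word}")
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append((w.speaker_tag, w.start_sec, w.end_sec, w.word))
    return turns
```

Each turn can then be printed as, for example, `Speaker 1 [0.0-0.8]: Hello there`.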

Question 9

Does Leopard Speech-to-Text perform Truecasing and Punctuation?

Accepted Answer

Yes. Leopard Speech-to-Text performs Truecasing and Punctuation. Please refer to the Leopard Speech-to-Text documentation to learn how to enable or disable automatic punctuation.

Question 10

Does Leopard Speech-to-Text return Word-level Confidence Scores?

Accepted Answer

Leopard Speech-to-Text returns Word-level Confidence Scores. Please refer to the Leopard Speech-to-Text documentation for more information.
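Word-level confidence scores are most useful for routing uncertain words to human review, which matters in regulated workflows. The sketch below uses a hypothetical `Word` dataclass as a stand-in for the SDK's output, assumes confidence lies in [0, 1], and uses an arbitrary example threshold of 0.5 that you would tune on your own data.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for one word-level result."""
    word: str
    confidence: float  # assumed to lie in [0, 1]


def low_confidence_words(words: list[Word], threshold: float = 0.5) -> list[tuple[int, str, float]]:
    """Return (index, word, confidence) for every word scored below the threshold."""
    return [(i, w.word, w.confidence) for i, w in enumerate(words) if w.confidence < threshold]


def mark_uncertain(words: list[Word], threshold: float = 0.5) -> str:
    """Rebuild the transcript, bracketing low-confidence words for a human reviewer."""
    return " ".join(f"[{w.word}?]" if w.confidence < threshold else w.word for w in words)
```

For example, a transcript of "refill metoprolol tomorrow" where the drug name scored 0.42 would render as `refill [metoprolol?] tomorrow`.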

Question 11

Does Leopard Speech-to-Text generate Word-level Timestamps?

Accepted Answer

Leopard Speech-to-Text generates Word-level Timestamps. Please refer to the Leopard Speech-to-Text documentation for more information.
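Word-level timestamps make it straightforward to produce subtitles. The sketch below groups words into SubRip (SRT) cues of a fixed maximum length; the `Word` dataclass is a hypothetical stand-in for the SDK's output, and the seven-words-per-cue default is an arbitrary illustrative choice.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for one word-level result with timestamps in seconds."""
    word: str
    start_sec: float
    end_sec: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group words into numbered SRT cues spanning the first word's start to the last word's end."""
    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        text = " ".join(w.word for w in chunk)
        cues.append(f"{n}\n{srt_time(chunk[0].start_sec)} --> {srt_time(chunk[-1].end_sec)}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues
```

Splitting on a fixed word count is the simplest policy; splitting at pauses between words (large gaps between one word's `end_sec` and the next word's `start_sec`) usually produces more natural cues.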

Question 12

How do I choose the best speech-to-text for my project?

Accepted Answer

"Best" is a subjective term. Every use case has different business requirements. Factors such as accuracy, the availability of features, total cost of ownership, and data privacy and governance carry different weight from one use case to another.
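One way to make those trade-offs explicit is a weighted scoring matrix: score each candidate engine per criterion, weight the criteria for your use case, and compare weighted averages. Everything below is a hypothetical placeholder (the engine names, the 1 to 5 scores, and the weights are not benchmark results); the point is only that the same scores can rank candidates differently under different weights.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(candidates, key=lambda name: weighted_score(candidates[name], weights), reverse=True)


# Hypothetical team-assigned scores (1 = poor, 5 = excellent).
CANDIDATES = {
    "engine_a": {"accuracy": 4, "features": 4, "tco": 2, "privacy": 5},
    "engine_b": {"accuracy": 4, "features": 3, "tco": 5, "privacy": 2},
}

# Hypothetical weights for two different use cases.
PRIVACY_FIRST = {"accuracy": 2, "features": 1, "tco": 1, "privacy": 4}
COST_FIRST = {"accuracy": 2, "features": 1, "tco": 4, "privacy": 1}
```

Under `PRIVACY_FIRST`, `engine_a` scores 4.25 against 3.0 for `engine_b`; under `COST_FIRST`, the order flips (3.125 against 4.125).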

Question 13

Which platforms does Leopard Speech-to-Text support?

Accepted Answer

Desktops and Servers: Linux, macOS, and Windows
Web Browsers: Chrome, Safari, Edge, and Firefox
Mobile Devices: Android and iOS
Single Board Computers: Raspberry Pi

Question 14

Which languages does Leopard Speech-to-Text support?

Accepted Answer

Leopard Speech-to-Text supports English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish.

Question 15

What should I do if I need support for other languages?

Accepted Answer

Reach out to Picovoice Sales and tell us about your commercial project and the languages it requires.

Question 16

How do I get technical support for Leopard Speech-to-Text?

Accepted Answer

Picovoice docs, blog, Medium posts, and GitHub are great resources for learning about voice AI and Picovoice technology, and for getting started with building transcription products. Enterprise customers get dedicated, application-specific support from the Picovoice Product & Engineering teams. Existing customers can reach out to their Picovoice contacts, and prospects can purchase Enterprise Support before committing to a paid plan.

Question 17

How can I get informed about updates and upgrades?

Accepted Answer

Version changes are announced in the Picovoice Newsletter and on LinkedIn. Watching the Leopard repository on GitHub is the best way to get notified of patch releases. If you enjoy building with Leopard Speech-to-Text, show it by giving the repository a GitHub star!
