Lightweight, enterprise-grade on-device STT that converts voice to text with cloud-level accuracy while meeting the strictest compliance requirements.
Leopard Speech-to-Text is software that converts audio and video recordings into text with cloud-level accuracy without sending them to the cloud. Because it processes voice data offline and does not transmit it to third-party platforms, Leopard Speech-to-Text helps meet data privacy regulations such as GDPR and HIPAA.
# Python
import pvleopard

o = pvleopard.create(access_key)

# Returns the transcript and word-level metadata.
transcript, words = o.process_file(path)
// Node.js
const o = new Leopard(accessKey);

const { transcript, words } = o.processFile(path);
// Android
Leopard o = new Leopard.Builder()
    .setAccessKey(accessKey)
    .setModelPath(modelPath)
    .build(appContext);

LeopardTranscript r = o.processFile(path);
// iOS (Swift)
let o = Leopard(
    accessKey: accessKey,
    modelPath: modelPath)

let r = o.processFile(path)
// Java
Leopard o = new Leopard.Builder()
    .setAccessKey(accessKey)
    .build();

LeopardTranscript r = o.processFile(path);
// .NET (C#)
Leopard o = Leopard.Create(accessKey);

LeopardTranscript result = o.ProcessFile(path);
// React
const {
  result,
  isLoaded,
  error,
  init,
  processFile,
  startRecording,
  stopRecording,
  isRecording,
  recordingElapsedSec,
  release,
} = useLeopard();

await init(
  accessKey,
  model
);

await processFile(audioFile);

useEffect(() => {
  if (result !== null) {
    // Handle transcript
  }
}, [result])
// Flutter (Dart)
Leopard o = await Leopard.create(
    accessKey,
    modelPath);

LeopardTranscript result = await o.processFile(path);
// React Native
const o = await Leopard.create(
  accessKey,
  modelPath)

const {transcript, words} = await o.processFile(path)
/* C */
pv_leopard_t *leopard = NULL;
pv_leopard_init(
    access_key,
    model_path,
    enable_automatic_punctuation,
    &leopard);

char *transcript = NULL;
int32_t num_words = 0;
pv_word_t *words = NULL;
pv_leopard_process_file(
    leopard,
    path,
    &transcript,
    &num_words,
    &words);
// Web
const leopard = await LeopardWorker.fromPublicDirectory(
  accessKey,
  modelPath
);

const { transcript, words } = await leopard.process(pcm);
Enterprises in regulated industries, such as healthcare, finance, and defense, must meet strict data privacy and retention policies. Cloud-dependent speech-to-text APIs require enterprises to send their data to a third-party cloud, giving up control over their data and products.
Leopard converts voice to text while keeping all voice data local to the device, which supports compliance with HIPAA, GDPR, SOC 2, and other regulations.
Speech-to-text (STT), also known as Automatic Speech Recognition (ASR) and Open-Domain Large Vocabulary Speech Recognition (LVSR), refers to the technology and methodologies that convert voice data into text.
Cloud-based speech-to-text APIs send voice data to vendors’ servers, where the transcription engine resides. On-device voice processing brings voice recognition to where the voice data resides, eliminating the upload and network round trip that cloud processing requires.
On-device speech-to-text empowers enterprises to retain ownership and control over their data and products. Sending voice data to the cloud has privacy, latency, reliability, and cost implications. On-device speech-to-text overcomes these challenges, bringing control back to enterprises.
Most on-device speech-to-text solutions rely on pre-trained models or on third-party frameworks like PyTorch, ONNX, or TensorFlow at runtime. This reliance limits fine-grained optimization and adds unnecessary overhead, restricting performance and adaptability.
In contrast, Leopard Speech-to-Text is built end-to-end by Picovoice’s team, who develop proprietary training frameworks and inference engines. This complete control allows for deep optimization, enabling Leopard to deliver cloud-level accuracy directly on-device, without the typical latency, power, or memory costs of deep learning models.
As a result, Leopard Speech-to-Text is:
Leopard Speech-to-Text doesn’t support real-time transcription, but Cheetah Streaming Speech-to-Text does. Cheetah is Picovoice’s on-device streaming speech-to-text engine that provides text output in real time.
Yes. You can run Leopard Speech-to-Text in the cloud, whether private, public, or hybrid. Because Picovoice’s voice recognition technology runs wherever it is deployed, the cloud is optional rather than mandatory, and data doesn’t have to leave the enterprise’s premises regardless of the platform. See the tutorials for serverless speech-to-text with AWS Lambda and for a transcription microservice with gRPC.
Leopard Speech-to-Text embeds an optimized version of Falcon Speaker Diarization to simplify development. Please check the Leopard Speech-to-Text documentation for more information.
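As a rough illustration of how diarized output might be consumed, the sketch below collapses consecutive words from the same speaker into turns. It assumes each word carries an integer `speaker_tag`, as in the Python SDK when diarization is enabled; the `Word` tuple and the sample words are stand-ins made up for this example, not real engine output.

```python
from collections import namedtuple
from itertools import groupby

# Stand-in for Leopard's word objects. The speaker_tag field name is an
# assumption based on the Python SDK; check the documentation for your version.
Word = namedtuple("Word", ["word", "speaker_tag"])


def speaker_turns(words):
    """Collapse consecutive words with the same speaker_tag into turns."""
    return [
        (tag, " ".join(w.word for w in group))
        for tag, group in groupby(words, key=lambda w: w.speaker_tag)
    ]


# Illustrative data only.
sample = [
    Word("hi", 1), Word("there", 1),
    Word("hello", 2),
    Word("thanks", 1),
]

for tag, text in speaker_turns(sample):
    print(f"Speaker {tag}: {text}")
```

Grouping only consecutive words keeps the conversation order intact, so a speaker who talks twice appears as two separate turns.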
Leopard Speech-to-Text performs Truecasing and Punctuation. Please refer to the Leopard Speech-to-Text documentation to enable or disable automatic punctuation.
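A minimal sketch of toggling automatic punctuation from Python, assuming the `enable_automatic_punctuation` option of `pvleopard.create` (the same name appears in the C snippet above). The helper names `leopard_options` and `transcribe` are hypothetical, and the documentation remains the authority on the available options.

```python
def leopard_options(punctuate=True):
    """Build keyword arguments for pvleopard.create() (hypothetical helper)."""
    return {"enable_automatic_punctuation": punctuate}


def transcribe(path, access_key, punctuate=True):
    """Transcribe a file with automatic punctuation switched on or off."""
    # Imported lazily so leopard_options() can be used without the SDK installed.
    import pvleopard

    leopard = pvleopard.create(access_key=access_key, **leopard_options(punctuate))
    try:
        transcript, _words = leopard.process_file(path)
        return transcript
    finally:
        leopard.delete()  # release the engine's native resources
```

Calling `transcribe(path, access_key, punctuate=False)` would be one way to get an unpunctuated transcript for downstream text processing.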
Leopard Speech-to-Text returns Word-level Confidence Scores. Please refer to the Leopard Speech-to-Text documentation for more information.
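One possible use of word-level confidence is flagging uncertain words for human review. The sketch below assumes each word exposes a `confidence` value between 0 and 1, as in the Python SDK; the `Word` tuple, the sample words, and the 0.5 threshold are illustrative choices rather than recommendations.

```python
from collections import namedtuple

# Stand-in for Leopard's word objects; "confidence" is assumed to be a
# float in [0, 1].
Word = namedtuple("Word", ["word", "confidence"])


def flag_low_confidence(words, threshold=0.5):
    """Return the transcript with low-confidence words wrapped as [?word]."""
    marked = [
        w.word if w.confidence >= threshold else f"[?{w.word}]"
        for w in words
    ]
    return " ".join(marked)


def mean_confidence(words):
    """Average word confidence, or 0.0 for an empty transcript."""
    if not words:
        return 0.0
    return sum(w.confidence for w in words) / len(words)


# Illustrative data only.
sample = [Word("turn", 0.95), Word("on", 0.90), Word("lights", 0.35)]
print(flag_low_confidence(sample))  # turn on [?lights]
```

The right threshold depends on the audio quality and on how costly a missed error is in the application.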
Leopard Speech-to-Text generates Word-level Timestamps. Please refer to the Leopard Speech-to-Text documentation for more information.
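Word-level timestamps make it straightforward to build captions. The sketch below groups words into SRT subtitle cues, assuming each word exposes `start_sec` and `end_sec` as in the Python SDK; the `Word` tuple, the sample words, and the `max_words` grouping rule are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-in for Leopard's word objects; field names are assumed to match
# the Python SDK (word, start_sec, end_sec).
Word = namedtuple("Word", ["word", "start_sec", "end_sec"])


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group consecutive words into numbered SRT cues of up to max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w.word for w in chunk)
        start = srt_time(chunk[0].start_sec)
        end = srt_time(chunk[-1].end_sec)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Illustrative data only.
sample = [
    Word("hello", 0.00, 0.42),
    Word("world", 0.50, 1.00),
    Word("again", 1.25, 1.80),
]
print(words_to_srt(sample, max_words=2))
```

A production captioner would more likely break cues at pauses or punctuation than at a fixed word count; the fixed count keeps the sketch short.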
“Best” is a subjective term. Every use case has different business requirements. Several factors, such as accuracy, availability of features, the total cost of ownership, and data privacy and governance, have different weights in different use cases.
Leopard Speech-to-Text supports English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish.
Reach out to Picovoice Sales to tell us about your commercial endeavor.
Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice AI, Picovoice technology, and how to start building transcription products. Enterprise customers get dedicated support specific to their applications from Picovoice Product & Engineering teams. Existing customers can reach out to their Picovoice contacts, and prospects can purchase Enterprise Support before committing to any paid plan.