The Story
| | TYPICAL EDGE STACK | PICOVOICE STACK |
| --- | --- | --- |
| ⚪ DATA | 3rd-party / open datasets ⚠️ No or limited visibility and quality control | Proprietary pipelines & custom curation 👍 Ensures diversity, fairness, and edge optimization |
| ⚪ MODEL | Cloud-pretrained (e.g., Whisper) ⚠️ Retrofitted, not edge-native | Edge-first proprietary training framework 👍 Efficiency built in from the start |
| ⚪ RUNTIME | Generic runtimes (e.g., PyTorch, ONNX) ⚠️ No access to core tech for full optimization | Proprietary inference engine 👍 Memory- and compute-optimized, zero dependencies |
| ⚪ OPTIMIZE | Post-training and development ⚠️ Restricted scope, performance trade-offs | Full-stack control 👍 End-to-end optimization at every layer |
| ⚫ RESULT | ❌ Trade-off: accuracy vs. resource utilization ❌ Cannot match cloud-level accuracy ❌ Introduces compute latency | ✅ Cloud-level accuracy with no compromises ✅ Low latency ✅ Reliable real-time operation |
Customer Stories
Performance
Voice AI is a complex and rapidly evolving technology. Vendor claims like "the best," "revolutionary," and "most accurate" often fail to help enterprises make informed decisions. Recognizing the lack of scientific methods for choosing the best wake word engine, we developed an open-source wake word benchmark. Because it addressed a real need, researchers in both industry and academia adopted it. As we introduced new products, we open-sourced our internal benchmarks, which were originally built to ensure that Picovoice's voice AI technology is always on par with, or better than, cloud-dependent voice AI APIs.
- The open-source text-to-speech latency benchmark compares the response times of different voice generators when used in LLM-based voice assistants. [Amazon Polly, Azure Text-to-Speech, ElevenLabs, OpenAI TTS, Picovoice Orca Text-to-Speech]
- The open-source speech-to-text benchmark is a scalable framework for comparing Amazon Transcribe, Azure Speech-to-Text, Google Speech-to-Text, IBM Watson Speech-to-Text, OpenAI Whisper Speech-to-Text, Cheetah Streaming Speech-to-Text, and Picovoice Leopard Speech-to-Text.
- The open-source speech enhancement and noise suppression comparison brings a scientific, transparent, and objective framework to comparing noise cancellation solutions. [Mozilla RNNoise Noise Suppression, Koala Noise Suppression]
- The open-source speaker diarization comparison evaluates the speaker diarization capabilities of Amazon Transcribe, Azure Speech-to-Text, and Google Speech-to-Text against those of Falcon Speaker Diarization and pyannote Speaker Diarization.
- The open-source speaker recognition comparison enables data-driven decision-making when choosing a speaker verification and identification SDK. [pyannote Speaker Recognition, SpeechBrain Speaker Recognition, WeSpeaker Speaker Recognition, Eagle Speaker Recognition]
- The open-source wake word benchmark evaluates the performance of freely available wake word detection engines; enterprises can add other alternatives to the comparison framework. [PocketSphinx Wake Word, Snowboy Wake Word, Porcupine Wake Word]
- The open-source natural language understanding benchmark is a scalable framework for comparing the voice command acceptance performance of Amazon Lex, Google Dialogflow, IBM Watson, Microsoft LUIS, and Picovoice Rhino Speech-to-Intent.
Offerings
Each Picovoice offering has a unique advantage, creating new opportunities for enterprises to bring their vision to life.
Picovoice Voice Recorders eliminate one of the biggest problems in building voice AI: capturing audio.

Voice AI engines receive audio streams and process them to generate the desired output. Voice AI vendors focus on processing those streams; creating them is a challenge left to developers. Finding a solution for real-time audio capture, in particular, blocks many developers.

We initially built voice recorders for Picovoice engines to simplify our own development process. Recognizing that other developers face the same challenges, we released them as standalone libraries that anyone can use freely.
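What "processing" a captured frame looks like depends on the engine, but the general pattern is the same: the recorder hands the application fixed-size frames of 16-bit samples, and the application runs some computation on each one. As a minimal, self-contained sketch (plain Python, not part of any Picovoice SDK; `frame_rms` and `process_stream` are hypothetical helpers), here is a loop that computes the RMS energy of each 512-sample frame:

```python
import array
import math

FRAME_LENGTH = 512  # samples per frame, matching the capture examples below


def frame_rms(frame):
    """Root-mean-square energy of one frame of 16-bit PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def process_stream(samples, frame_length=FRAME_LENGTH):
    """Split a sample stream into fixed-size frames and process each one."""
    energies = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        energies.append(frame_rms(frame))
    return energies


# a silent stream of two frames yields zero energy for both
silence = array.array("h", [0] * (2 * FRAME_LENGTH))
print(process_stream(silence))  # → [0.0, 0.0]
```

In a real application the body of the loop would be a call into a voice AI engine rather than an energy calculation, but the frame-by-frame structure is identical.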
Incorporating audio output capabilities into your software can be an equally challenging endeavour. Most developers have limited experience with digital audio beyond voice assistant apps or simple audio file playback. To make matters worse, audio library usability and platform support vary significantly depending on the framework you're working with.
To make life easier for developers, we have created a collection of open-source SDKs designed to streamline audio processing and output, making them as straightforward as possible.
Python:

```python
recorder = PvRecorder(frame_length=512)
recorder.start()

while True:
    frame = recorder.read()
    # process audio frame
```
Node.js:

```javascript
const frameLength = 512;
const recorder = new PvRecorder(frameLength);
recorder.start();

while (true) {
  const frame = await recorder.read();
  // process audio frame
}
```
Android (Java):

```java
VoiceProcessor voiceProcessor = VoiceProcessor.getInstance();

final VoiceProcessorFrameListener frameListener = frame -> {
    // process audio frame
};

voiceProcessor.addFrameListener(frameListener);

final int frameLength = 512;
final int sampleRate = 16000;

voiceProcessor.start(frameLength, sampleRate);
```
iOS (Swift):

```swift
let voiceProcessor = VoiceProcessor.instance

let frameListener = VoiceProcessorFrameListener { frame in
    // process audio frame
}

voiceProcessor.addFrameListener(frameListener)

try voiceProcessor.start(frameLength: 512, sampleRate: 16000)
```
Web:

```javascript
const worker = new Worker("${WORKER_PATH}");
const engine = {
  onmessage: function (e) {
    // process audio frame
  }
};

await WebVoiceProcessor.subscribe([engine, worker]);
```
.NET:

```csharp
PvRecorder recorder = PvRecorder.Create(frameLength: 512);
recorder.Start();

while (true) {
    short[] frame = recorder.Read();
    // process audio frame
}
```
Flutter (Dart):

```dart
VoiceProcessor? _voiceProcessor = VoiceProcessor.instance;

VoiceProcessorFrameListener frameListener = (List<int> frame) {
  // process audio frame
};

_voiceProcessor?.addFrameListener(frameListener);

final int frameLength = 512;
final int sampleRate = 16000;
if (await _voiceProcessor?.hasRecordAudioPermission() ?? false) {
  try {
    await _voiceProcessor?.start(frameLength, sampleRate);
  } on PlatformException catch (ex) {
    // handle start error
  }
} else {
  // user did not grant permission
}
```
React Native (TypeScript):

```typescript
let voiceProcessor = VoiceProcessor.instance;

voiceProcessor.addFrameListener((frame: number[]) => {
  // process audio frame
});

const frameLength = 512;
const sampleRate = 16000;

try {
  if (await voiceProcessor.hasRecordAudioPermission()) {
    await voiceProcessor.start(frameLength, sampleRate);
  } else {
    // user did not grant permission
  }
} catch (e) {
  // handle start error
}
```
C:

```c
const int32_t frame_length = 512;
const int32_t device_index = -1; // -1 == default device
const int32_t buffered_frame_count = 10;

pv_recorder_t *recorder = NULL;
pv_recorder_status_t status = pv_recorder_init(
    frame_length,
    device_index,
    buffered_frame_count,
    &recorder);
if (status != PV_RECORDER_STATUS_SUCCESS) {
    // handle PvRecorder init error
}

status = pv_recorder_start(recorder);
if (status != PV_RECORDER_STATUS_SUCCESS) {
    // handle PvRecorder start error
}

// must have length equal to `frame_length` that was given to `pv_recorder_init()`
int16_t *frame = malloc(frame_length * sizeof(int16_t));

while (true) {
    status = pv_recorder_read(recorder, frame);
    if (status != PV_RECORDER_STATUS_SUCCESS) {
        // handle PvRecorder read error
    }

    // use frame of audio data
    // ...
}
```
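One practical consequence of frame-based capture is that the frame length and sample rate fix the cadence of the read loop. With the 512-sample frames and 16 kHz sample rate used in the capture examples above, each `read()` returns 32 ms of audio, so the loop runs about 31 times per second. A quick back-of-the-envelope check (plain Python, no SDK required):

```python
frame_length = 512     # samples per frame, as in the capture examples
sample_rate = 16000    # Hz, as in the capture examples

# duration of audio delivered per read, and reads per second
frame_duration_ms = 1000 * frame_length / sample_rate
frames_per_second = sample_rate / frame_length

print(frame_duration_ms)  # → 32.0
print(frames_per_second)  # → 31.25
```

This is also the lower bound on how quickly a downstream engine can react to what was just said, so smaller frames trade lower latency for more frequent callbacks.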
Python:

```python
speaker = PvSpeaker(sample_rate=22050, bits_per_sample=16)

speaker.start()

def get_next_audio_frame():
    # get audio frame
    ...

written_length = speaker.write(get_next_audio_frame())
```
Node.js (TypeScript):

```typescript
const sampleRate = 22050;
const bitsPerSample = 16;
const speaker = new PvSpeaker(sampleRate, bitsPerSample);

speaker.start();

function getNextAudioFrame(): ArrayBuffer {
  // get audio frame
}

const writtenLength = speaker.write(getNextAudioFrame());
```
.NET:

```csharp
var speaker = new PvSpeaker(
    sampleRate: 22050,
    bitsPerSample: 16);

speaker.Start();

public static byte[] GetNextAudioFrame() { /* get audio frame */ }

int writtenLength = speaker.Write(GetNextAudioFrame());
```
C:

```c
const int32_t sample_rate = 22050;
const int16_t bits_per_sample = 16;
const int32_t buffer_size_secs = 20;
const int32_t device_index = -1; // -1 == default device

pv_speaker_t *speaker = NULL;
pv_speaker_status_t status = pv_speaker_init(
    sample_rate,
    bits_per_sample,
    buffer_size_secs,
    device_index,
    &speaker);
if (status != PV_SPEAKER_STATUS_SUCCESS) {
    // handle PvSpeaker init error
}

status = pv_speaker_start(speaker);
if (status != PV_SPEAKER_STATUS_SUCCESS) {
    // handle PvSpeaker start error
}

int32_t num_samples;
int8_t *pcm = get_pcm_data(&num_samples);
int32_t written_length = 0;

status = pv_speaker_write(speaker, pcm, num_samples, &written_length);
if (status != PV_SPEAKER_STATUS_SUCCESS) {
    // handle PvSpeaker write error
}
```
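The `get_next_audio_frame()` and `get_pcm_data()` placeholders above are left to the application: the player only consumes raw PCM. As a purely illustrative sketch (not part of any Picovoice SDK; `sine_pcm` is a hypothetical helper), here is one way to produce such data in plain Python by rendering a sine tone as 16-bit signed little-endian PCM at the 22050 Hz rate used in the examples:

```python
import math
import struct


def sine_pcm(frequency_hz, duration_s, sample_rate=22050, amplitude=0.5):
    """Render a sine tone as 16-bit signed little-endian PCM bytes."""
    num_samples = int(duration_s * sample_rate)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency_hz * t / sample_rate))
        for t in range(num_samples)
    )
    return struct.pack("<%dh" % num_samples, *samples)


pcm = sine_pcm(440.0, 0.1)  # 100 ms of A4: 2205 samples × 2 bytes
print(len(pcm))  # → 4410
```

In practice the PCM would come from a text-to-speech engine or a decoded audio file rather than a synthesizer, but the byte layout the player expects is the same.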