Accelerating the adoption of voice AI through innovation

The Story

Cloud vs. Edge deployment of voice AI: trade-offs in performance, privacy, and cost

TYPICAL EDGE STACK vs. PICOVOICE STACK

DATA
Typical edge stack: 3rd-party / open datasets (⚠️ no or limited visibility and quality control)
Picovoice stack: proprietary pipelines & custom curation (👍 ensures diversity, fairness, and edge optimization)

MODEL
Typical edge stack: models pretrained for the cloud, e.g., Whisper (⚠️ retrofitted, not edge-native)
Picovoice stack: edge-first proprietary training framework (👍 efficiency built in from the start)

RUNTIME
Typical edge stack: generic runtimes such as PyTorch or ONNX (⚠️ no access to core tech for full optimization)
Picovoice stack: proprietary inference engine (👍 memory & compute optimized, zero dependencies)

OPTIMIZE
Typical edge stack: optimization limited to post-training and development (⚠️ restricted scope, performance trade-offs)
Picovoice stack: full-stack control (👍 end-to-end optimization at every layer)

RESULT
Typical edge stack: ❌ trade-off between accuracy and resource utilization; ❌ cannot match cloud-level accuracy; ❌ introduces compute latency
Picovoice stack: ✅ cloud-level accuracy with no compromises; ✅ low latency; ✅ reliable real-time operation

Customer Stories

Learn about Picovoice's real-world impact

Warehouse Management

Voice-directed order fulfillment boosts worker productivity. A major warehousing company adopted Picovoice for its accuracy, low latency, and low power consumption.

Fortune 500 Communications Tech Provider

Hands-free "panic button" deployed across large campuses, e.g., schools, enhancing safety. A Fortune 500 critical communications technology provider chose Picovoice for its highly performant technology that effectively runs on embedded systems.

Laptop Manufacturer

Hands-free voice AI companions elevate the user experience. A leading laptop manufacturer deployed Picovoice's on-device technology on AI PCs for its low latency and cost-effectiveness.

Dashcam Manufacturer

Hands-free control enhances the driving experience. A leading dashcam manufacturer chose Picovoice over Alexa for branded and custom voice commands.

Performance

Building & choosing the best voice AI technology

Voice AI is a complex and rapidly evolving technology. Vendor claims like "the best," "revolutionary," and "most accurate" rarely help enterprises make informed decisions. Recognizing the lack of a scientific method for choosing a wake word engine, we developed an open-source wake word benchmark. Because it addressed a real need, it was adopted by researchers in both industry and academia. As we introduced new products, we open-sourced our internal benchmarks, which were originally built to ensure that Picovoice's voice AI technology stays on par with, or ahead of, cloud-dependent voice AI APIs.

The open-source text-to-speech latency benchmark compares the response times of speech synthesis engines when used in LLM-based voice assistants: Amazon Polly, Azure Text-to-Speech, ElevenLabs, OpenAI TTS, and Picovoice Orca Text-to-Speech.
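
To make the latency metric concrete, the sketch below measures time-to-first-audio: the delay between handing LLM-generated text to a speech synthesizer and receiving the first chunk of audio. The synthesize_stream() call is a hypothetical streaming interface used only for illustration, not any particular vendor's API.

import time


def time_to_first_audio(tts, text):
    # `tts.synthesize_stream(text)` is assumed to be a generator that yields
    # audio chunks as they are produced; the benchmark's "response time" is
    # essentially how long the first chunk takes to arrive.
    start = time.perf_counter()
    for _chunk in tts.synthesize_stream(text):
        return time.perf_counter() - start
    return None  # the engine produced no audio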

The open-source wake word benchmark evaluates the performance of freely available wake word detection engines: PocketSphinx, Snowboy, and Porcupine Wake Word. Enterprises can add other alternatives to the comparison framework.
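
For context on what such a benchmark exercises, the sketch below feeds prerecorded audio frames to a wake word engine and records where detections fire, using Picovoice's Porcupine Wake Word as the example engine. The AccessKey value, the built-in "picovoice" keyword, and the frames argument are assumptions made for illustration.

import pvporcupine


def detection_frames(frames, access_key):
    # `frames`: an iterable of 16-bit PCM frames at 16 kHz, each
    # porcupine.frame_length (512) samples long. Returns the indices of the
    # frames where the wake word fires, which an accuracy benchmark would
    # compare against ground-truth labels.
    porcupine = pvporcupine.create(
        access_key=access_key,
        keywords=["picovoice"])  # a built-in keyword, chosen for illustration
    try:
        return [i for i, frame in enumerate(frames) if porcupine.process(frame) >= 0]
    finally:
        porcupine.delete()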

The open-source LLM compression benchmark compares techniques, such as GPTQ and picoLLM Compression, used to reduce the size and memory usage of large language models (LLMs) while preserving quality.

Offerings

On-device Voice AI Offerings

Each Picovoice offering has a unique advantage, creating new opportunities for enterprises to bring their vision to life.

Products

Services

Tools

Picovoice Voice Recorders eliminate one of the biggest problems in voice AI development: capturing and handling audio in real time.

Voice AI engines receive audio streams and process them to generate the desired output. Voice AI vendors focus on processing those streams; creating them is a challenge left to developers. Real-time audio capture, in particular, blocks many developers.

We initially built voice recorders for Picovoice engines to simplify our own development process. Recognizing the broader challenge, we released them as separate libraries that developers can use freely.

Incorporating audio output capabilities into your software can be just as challenging. Most developers have limited experience with digital audio beyond voice assistant apps or simple audio file playback. To make matters worse, audio library usability and platform support vary significantly depending on the framework you're working with.

To make life easier for developers, we have created a collection of open-source SDKs designed to streamline audio processing and output, making them as straightforward as possible.

Python

from pvrecorder import PvRecorder

recorder = PvRecorder(frame_length=512)
recorder.start()

while True:
    frame = recorder.read()
    # process audio frame

Node.js

const { PvRecorder } = require("@picovoice/pvrecorder-node");

const frameLength = 512;
const recorder = new PvRecorder(frameLength);
recorder.start();

while (true) {
  const frame = await recorder.read();
  // process audio frame
}

Android (Java)

VoiceProcessor voiceProcessor = VoiceProcessor.getInstance();

final VoiceProcessorFrameListener frameListener = frame -> {
    // process audio frame
};

voiceProcessor.addFrameListener(frameListener);

final int frameLength = 512;
final int sampleRate = 16000;

voiceProcessor.start(frameLength, sampleRate);

iOS (Swift)

let voiceProcessor = VoiceProcessor.instance

let frameListener = VoiceProcessorFrameListener { frame in
    // process audio frame
}

voiceProcessor.addFrameListener(frameListener)

try voiceProcessor.start(frameLength: 512, sampleRate: 16000)

Web (JavaScript)

import { WebVoiceProcessor } from "@picovoice/web-voice-processor";

const worker = new Worker("${WORKER_PATH}");
const engine = {
  onmessage: function (e) {
    // process audio frame
  }
};

await WebVoiceProcessor.subscribe([engine, worker]);

.NET (C#)

PvRecorder recorder = PvRecorder.Create(frameLength: 512);
recorder.Start();

while (true)
{
    short[] frame = recorder.Read();
    // process audio frame
}

Flutter (Dart)

VoiceProcessor? _voiceProcessor = VoiceProcessor.instance;

VoiceProcessorFrameListener frameListener = (List<int> frame) {
  // process audio frame
};

_voiceProcessor?.addFrameListener(frameListener);

final int frameLength = 512;
final int sampleRate = 16000;

if (await _voiceProcessor?.hasRecordAudioPermission() ?? false) {
  try {
    await _voiceProcessor?.start(frameLength, sampleRate);
  } on PlatformException catch (ex) {
    // handle start error
  }
} else {
  // user did not grant permission
}

React Native (TypeScript)

let voiceProcessor = VoiceProcessor.instance;

voiceProcessor.addFrameListener((frame: number[]) => {
  // process audio frame
});

const frameLength = 512;
const sampleRate = 16000;

try {
  if (await voiceProcessor.hasRecordAudioPermission()) {
    await voiceProcessor.start(frameLength, sampleRate);
  } else {
    // user did not grant permission
  }
} catch (e) {
  // handle start error
}

C

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#include "pv_recorder.h"

const int32_t frame_length = 512;
const int32_t device_index = -1;  // -1 == default device
const int32_t buffered_frame_count = 10;

pv_recorder_t *recorder = NULL;
pv_recorder_status_t status = pv_recorder_init(
        frame_length,
        device_index,
        buffered_frame_count,
        &recorder);
if (status != PV_RECORDER_STATUS_SUCCESS) {
    // handle PvRecorder init error
}

status = pv_recorder_start(recorder);
if (status != PV_RECORDER_STATUS_SUCCESS) {
    // handle PvRecorder start error
}

// must have length equal to `frame_length` that was given to `pv_recorder_init()`
int16_t *frame = malloc(frame_length * sizeof(int16_t));

while (true) {
    status = pv_recorder_read(recorder, frame);
    if (status != PV_RECORDER_STATUS_SUCCESS) {
        // handle PvRecorder read error
    }

    // use frame of audio data
    // ...
}
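
To make the "process audio frame" placeholders above concrete, a captured frame can be handed straight to a Picovoice engine. Below is a minimal Python sketch, assuming the Cobra Voice Activity Detection engine, a placeholder AccessKey, and a detection threshold of 0.5 chosen purely for illustration.

import pvcobra
from pvrecorder import PvRecorder

# Cobra consumes the same 512-sample, 16 kHz frames that PvRecorder produces,
# so the recorder's output can be passed to the engine without conversion.
cobra = pvcobra.create(access_key="${ACCESS_KEY}")  # placeholder AccessKey

recorder = PvRecorder(frame_length=cobra.frame_length)
recorder.start()

try:
    while True:
        voice_probability = cobra.process(recorder.read())
        if voice_probability > 0.5:  # illustrative threshold
            pass  # voice detected in this frame
finally:
    recorder.stop()
    recorder.delete()
    cobra.delete()

The snippets below cover the playback side with PvSpeaker.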

Python

from pvspeaker import PvSpeaker

speaker = PvSpeaker(sample_rate=22050, bits_per_sample=16)

speaker.start()

def get_next_audio_frame():
    # get audio frame
    ...

written_length = speaker.write(get_next_audio_frame())

Node.js

const sampleRate = 22050;
const bitsPerSample = 16;
const speaker = new PvSpeaker(sampleRate, bitsPerSample);

speaker.start();

function getNextAudioFrame(): ArrayBuffer {
  // get audio frame
}

const writtenLength = speaker.write(getNextAudioFrame());

.NET (C#)

var speaker = new PvSpeaker(
    sampleRate: 22050,
    bitsPerSample: 16);

speaker.Start();

public static byte[] GetNextAudioFrame() {
    // get audio frame
}

int writtenLength = speaker.Write(GetNextAudioFrame());

C

#include <stdint.h>

#include "pv_speaker.h"

const int32_t sample_rate = 22050;
const int16_t bits_per_sample = 16;
const int32_t buffer_size_secs = 20;
const int32_t device_index = -1;  // -1 == default device

pv_speaker_t *speaker = NULL;
pv_speaker_status_t status = pv_speaker_init(
        sample_rate,
        bits_per_sample,
        buffer_size_secs,
        device_index,
        &speaker);
if (status != PV_SPEAKER_STATUS_SUCCESS) {
    // handle PvSpeaker init error
}

status = pv_speaker_start(speaker);
if (status != PV_SPEAKER_STATUS_SUCCESS) {
    // handle PvSpeaker start error
}

int32_t num_samples;
int8_t *pcm = get_pcm_data(&num_samples);  // application-defined PCM source
int32_t written_length = 0;

status = pv_speaker_write(speaker, pcm, num_samples, &written_length);
if (status != PV_SPEAKER_STATUS_SUCCESS) {
    // handle PvSpeaker write error
}