Design, develop, and ship useful voice features

The end-to-end platform for embedding private voice AI into any software in a few lines of code

Loved by developers, trusted by enterprises

What is the end-to-end on-device voice AI platform?

The Picovoice end-to-end on-device voice AI platform consists of voice AI engines and models that empower enterprises to design, develop, and ship voice products without sacrificing user privacy or experience.

The platform features a complete set of modular voice AI engines delivered as cross-platform SDKs, plus a no-code platform for instantly training bespoke voice AI models that boost accuracy and efficiency.

End-to-end voice AI platform

Design

Design with no limits on top of a modular platform. Create use-case-specific voice AI models in seconds.

Develop

Develop voice features with a few lines of code using intuitive and cross-platform SDKs.

Ship

Deliver voice AI everywhere: on-device, mobile, web browsers, on-premise, or cloud.

Iterate

Measure adoption, learn, and iterate. Continuously re-design and re-train to optimize engagement.

Offerings

picoLLM

LLM Quantization & Inference

An end-to-end platform that compresses any LLM without sacrificing accuracy and runs across Linux, macOS, Windows, Android, iOS, Chrome, Safari, Edge, Firefox, and Raspberry Pi, supporting both CPU and GPU.

Why choose Picovoice?

Building accurate, responsive, and private voice technology is difficult.
We learned the hard way, so you don't have to.

Innovative

Picovoice heavily invests in R&D to offer superior voice AI that surpasses even Big Tech in accuracy and efficiency. Picovoice researchers do not follow recent frameworks and techniques but build them.

Efficient

Performant on-device AI with minimal resource requirements is challenging. Picovoice's efficient models make cloud accuracy possible across platforms with no compromises.

Private

Picovoice returns control to enterprises as voice data never leaves the premises. Enterprises enjoy high accuracy without compromising privacy and reliability.

Don't just take our word!

Put us to the test, for free

What does the On-device Voice AI Platform offer?

Speech-to-Text

A transcription engine that automatically converts audio and video recordings into text with high accuracy without sacrificing privacy.

Leopard Speech-to-Text

Streaming Speech-to-Text

A real-time transcription engine that automatically converts conversations into text with zero latency.

Cheetah Streaming Speech-to-Text

Noise Suppression and Cancellation

Noise cancellation software that removes background noise from audio in real time while preserving human speech.

Koala Noise Suppression

Speaker Recognition and Identification

Speaker recognition and identification software that distinguishes individuals using their unique voice characteristics.

Eagle Speaker Recognition

Falcon Speaker Diarization

A speaker diarization engine that identifies “who spoke when” in an audio stream by finding speaker changes and grouping them.

Falcon Speaker Diarization

Wake Word Detection

A wake word detection engine that recognizes unique signals to transition software from passive to active listening.

Porcupine Wake Word

Speech-to-Intent

A natural language understanding engine fused with speech-to-text that lets users interact with applications via voice commands.

Rhino Speech-to-Intent

Voice Activity Detection

Voice activity detection (VAD) software scans audio streams to identify the presence of human speech in real time.

Cobra Voice Activity Detection

Text-to-Speech

A voice generator that converts written text into spoken audio output without network latency or jeopardizing user privacy.

Orca Text-to-Speech

The Enterprise Voice AI

Secure and Flexible Deployment

Secure and flexible deployment with embedded, mobile, web browsers, on-premise, and cloud options. Expert help to choose the best deployment and platform for unique needs.

Picovoice Consulting

Customizable Voice AI Models

Performant out-of-the-box voice AI models, proven by open-source benchmarks. Purpose-built models for specialized applications, use cases, domains, and industries.

Picovoice Consulting

Enterprise-Grade Support

Easy-to-follow docs covering 99% of questions and an active GitHub community addressing technical issues. Dedicated support for enterprise customers through paid plans.

Enterprise Support

The Edge Voice AI Platform

Private, reliable and powerful
voice products

More from Picovoice

How to Create Subtitles for any Video with Python

Speech Recognition on Raspberry Pi

How to Record Audio using Python

Speech-to-Text using Node.js

Python Wake Word Detection Tutorial

How to Record Audio from a Web Browser

FAQ

Feature

What does the Picovoice on-device voice AI platform offer?

The on-device voice AI platform offers everything developers need to design, develop, and ship voice products: a complete set of modular voice AI engines delivered as cross-platform SDKs, plus a no-code platform for instantly training bespoke voice AI models that boost accuracy and efficiency.

What should I use to transcribe real-time conversations such as live events, conferences, and meetings, or to enable note-taking and voice typing?

We recommend Cheetah Streaming Speech-to-Text to transcribe real-time conversations such as live events, conferences, and meetings, or to enable note-taking and voice typing.

Please note that every use case is unique, and the nuances may affect the performance of your product. If you're a Picovoice customer, please reach out to your Picovoice contact for dedicated support. If you are not a customer yet, you can purchase Enterprise Support to discuss your use case and have your technical questions answered by the experts.

What should I use to convert audio and video files such as recordings of interviews, meetings, calls, podcasts, and voicemails into text?

We recommend Leopard Speech-to-Text to convert audio and video files such as recordings of interviews, meetings, calls, podcasts, and voicemails into text.

What should I use to achieve crisp and clear conversations by removing background noise and enhancing speech?

We recommend Koala Noise Suppression to achieve crisp and clear conversations by removing background noise and enhancing speech.

What should I use to diarize speakers in conversations to make transcripts readable and analyzable?

We recommend Falcon Speaker Diarization to diarize speakers in conversations and make transcripts readable and analyzable.

What should I use to identify and verify speakers, and personalize experiences simply by recognizing the user's voice?

We recommend Eagle Speaker Recognition to identify and verify speakers and personalize experiences simply by recognizing the user's voice.

What should I use to convert written text into spoken audio output?

We recommend Orca Streaming Text-to-Speech to convert written text into spoken audio output.

What should I use to add voice to an LLM-powered application to build an AI agent?

We recommend Orca Streaming Text-to-Speech to convert streaming LLM text output into voice.

What should I use to detect wake words, recognize always-listening commands, and monitor conversations for specific keywords?

We recommend Porcupine Wake Word to detect wake words (Alexa), recognize always-listening commands (turn the lights on), and monitor conversations for specific keywords (product name).

What should I use to add custom voice commands to software, create voicebots and IVRs, and navigate menus?

We recommend Rhino Speech-to-Intent to add custom voice commands to software (set the brightness at 60%), create voicebots and IVRs, and navigate menus (2022 Hyundai IONIQ 5 AWD).

What should I use to activate software when someone starts or stops speaking?

We recommend Cobra Voice Activity Detection to detect when someone starts or stops speaking and trigger action accordingly.

What should I use to detect and clean silence in audio and video data?

We recommend Cobra Voice Activity Detection to detect and clean silence in audio and video data.

What should I use to record and process audio files to use Picovoice voice AI engines?

We recommend Picovoice Voice Recorders to record and process audio files to create audio streams and use Picovoice Voice AI engines.

How can I train custom voice AI models?

You can train voice AI models on the Picovoice Console. Picovoice Console is a no-code platform with a web-based type-and-train interface. You can create a Picovoice Console account immediately and start building without engaging with the Picovoice team.

Before signing up for the Console, you can watch our tutorials to learn how to train custom voice AI models.

Usage

What are the hardware and software platforms supported by Picovoice voice AI engines?

Desktop & Server: Linux, Windows & macOS
Mobile: Android & iOS
Web Browsers: Chrome, Safari, Edge and Firefox
Single Board Computers: Raspberry Pi
Cloud Providers: AWS, Azure, Google, IBM, Oracle, and others.

Do Picovoice voice AI engines run in the cloud?

Yes, you can run all Picovoice voice AI engines (Speech-to-Text, Streaming Speech-to-Text, Noise Suppression, Speaker Recognition, Speaker Diarization, Text-to-Speech, Wake Word, Speech-to-Intent, Voice Activity Detection, Speech-to-Index) in the cloud.

Do Picovoice voice AI engines run on-prem?

Do Picovoice voice AI engines run in serverless environments?

Do Picovoice voice AI engines run on mobile devices?

Do Picovoice voice AI engines run within web browsers?

Do Picovoice voice AI engines run on embedded devices?

Do Picovoice voice AI engines need a GPU?

No, Picovoice voice AI engines do not require a GPU.

Which SDKs are supported by Picovoice Voice AI?

The Picovoice on-device voice AI platform provides SDKs for Android, C, .NET, Flutter, iOS, Java, Node.js, Python, React, React Native, Rust, Unity, and Web. If you need another SDK, you can check our open-source SDKs and build it yourself or contact Picovoice Consulting. Picovoice Consulting experts can create a private library for the SDK of your choice and maintain it.

Why do Picovoice Voice AI engines need an AccessKey (i.e., internet connectivity) if engines process data offline?

Picovoice uses the AccessKey, and therefore internet connectivity, to offer its services according to your plan limits. Picovoice engines contact Picovoice servers to validate the AccessKey and check your plan limits.

How does Picovoice track Voice AI engine usage?

Picovoice tracks usage by the amount of data processed (in hours or characters) or by the number of users, depending on the engine and project setup.

When does my Voice AI engine usage reset?

Your usage automatically resets every 30 days.

Picovoice tracks the usage accumulated in the last 30 days. You can see the real-time consumption on your Picovoice Console Profile.

How does Picovoice track custom Voice AI model training?

Downloaded models are counted toward your monthly model training usage. Once you hit download, your training usage will increase.

When does my Voice AI model training reset?

Picovoice tracks the training usage accumulated in the last 30 days. You can see the total number of models you trained in the last 30 days on your Picovoice Console Profile.

Can I use Picovoice Voice AI engines for research, non-commercial or commercial purposes?

Yes, you can use Picovoice Voice AI engines for research, non-commercial, and commercial purposes as long as you are within your plan limits and compliant with the Picovoice Terms of Use.

Technical Questions

Is Picovoice open-source?

Picovoice voice AI SDKs, voice recorders, and benchmarks are open-source and free to use.

How accurate is Picovoice?

To enable data-driven decision-making and communicate its engines' accuracy, Picovoice publishes open-source benchmarks for each engine. You can reproduce them or run them with your own data.

How are Picovoice's small voice AI models more accurate than large, cloud-dependent AI models?

Picovoice researchers continuously improve techniques and frameworks used to train algorithms. Picovoice applies transfer learning, hardware-aware training, and neural compression principles, resulting in efficient models competing with cloud-dependent AI models.

How fast is Picovoice?

It depends on your tech stack and design. Given the number of engines Picovoice offers and the platforms it supports, it's hard to communicate one number. We encourage developers to do their own tests and evaluations in their real environments.

Does Picovoice technology work across various accents and dialects?

Yes, Picovoice technology works well across accents and dialects. The best way to learn about it is to test Picovoice engines with your dataset. Picovoice offers a Free Plan that allows enterprises to evaluate and become familiar with the technology, as well as a Foundation Plan to run thorough tests before committing to an Enterprise Plan.

Picovoice aims to provide realistic benchmarks by leveraging various accents and noise. Yet, we encourage developers to test the engines in their real-world environments.

Can I use Picovoice software for telephony applications?

Picovoice engines expect audio with a 16kHz sampling rate. PSTN networks usually sample at 8kHz. It is possible to upsample but the frequency content above 4kHz is gone, and performance will be suboptimal. It is possible to train acoustic models for telephony applications for enterprise customers. Engage with Picovoice Consulting to find the best solution that works for you.
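To see why upsampling does not recover the lost content, consider this minimal sketch of doubling an 8kHz signal to 16kHz by linear interpolation. It is an illustration only, not Picovoice code: the function name `upsample_by_two` and the sample values are hypothetical. Every new sample is computed from its two neighbours, so the output carries no information that was not already in the 8kHz input.

```python
from typing import List, Sequence


def upsample_by_two(samples: Sequence[int]) -> List[int]:
    """Double the sampling rate by inserting the midpoint between neighbours.

    Linear interpolation only estimates values between existing samples, so
    the result has no real content above the original 4kHz limit.
    """
    if not samples:
        return []
    out: List[int] = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + following) // 2)  # interpolated midpoint
    out.append(samples[-1])
    out.append(samples[-1])  # repeat the last sample so the length is exactly 2x
    return out


# Hypothetical 16-bit PCM values sampled at 8kHz.
pcm_8k = [0, 100, 200, 100]
pcm_16k = upsample_by_two(pcm_8k)
```

The output has twice as many samples and the format that a 16kHz engine expects, but accuracy is still bounded by what the telephone network captured.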

My audio source is 48kHz/44.1kHz. Does Picovoice software support that?

Picovoice software expects a 16kHz sampling rate, so you will need to downsample. Typically, operating systems or sound cards (audio codecs) provide such functionality; otherwise, you will need to implement it.
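If your platform does not resample for you, integer-ratio downsampling is simple to sketch. The following is a minimal illustration, not Picovoice code: the function name `downsample_by_integer` is hypothetical, and block averaging is a crude low-pass filter chosen for brevity. It handles 48kHz to 16kHz (a factor of 3). 44.1kHz is not an integer multiple of 16kHz, so it needs a proper rational resampler from an audio library.

```python
from typing import List, Sequence


def downsample_by_integer(samples: Sequence[int], factor: int) -> List[int]:
    """Reduce the sampling rate by an integer factor.

    Each output sample is the average of `factor` consecutive input samples.
    Averaging acts as a crude low-pass filter that limits aliasing; a
    production resampler would use a better filter.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - (len(samples) % factor)  # drop the incomplete tail
    return [
        int(round(sum(samples[i:i + factor]) / factor))
        for i in range(0, usable, factor)
    ]


# 48kHz -> 16kHz is a factor of 3 (hypothetical 16-bit PCM values).
pcm_48k = [0, 300, 600, 900, 1200, 1500, 1800]
pcm_16k = downsample_by_integer(pcm_48k, 48000 // 16000)
```

For anything beyond a quick experiment, a resampler with a well-designed anti-aliasing filter is likely the safer choice, since aliasing artifacts can hurt recognition accuracy.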

Why does Picovoice use a 16kHz sampling rate?

Picovoice software expects a 16kHz sampling rate because it strikes a balance between audio quality and file size, which is why it is widely used in voice command and speech recognition technologies. First, at 16kHz, audio files are small enough to store and transmit while offering reasonable audio quality. Second, the most critical frequencies of the human voice lie between 300Hz and 3400Hz. The Nyquist-Shannon sampling theorem states that a sampling rate of at least twice the highest frequency is required to represent a signal accurately. 16kHz is more than twice 3400Hz and is therefore sufficient for processing the human voice. That's why 16kHz has become a standard in applications that process human speech.
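The arithmetic behind this answer can be checked in a few lines. This is an illustrative sketch: the helper name `nyquist_limit_hz` is hypothetical, and the 3400Hz figure is the one quoted above.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


SPEECH_UPPER_HZ = 3400  # upper edge of the critical speech band quoted above

limit_16k = nyquist_limit_hz(16_000)  # 8000.0
limit_8k = nyquist_limit_hz(8_000)    # 4000.0

# Headroom between the representable limit and the critical speech band.
headroom_16k = limit_16k - SPEECH_UPPER_HZ  # 4600.0
headroom_8k = limit_8k - SPEECH_UPPER_HZ    # 600.0
```

At 16kHz the representable range extends to 8kHz, well above the critical speech band, whereas 8kHz telephony audio stops at 4kHz.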

What are the other factors that affect the performance of voice AI engines?

Several factors affect the performance of voice AI engines: the quality of the audio data, the environment (noise, echo, reverberation), the tech stack, and the design.

Custom Models & Support

How can I fine-tune Picovoice models?

You can leverage the self-service Picovoice Console to fine-tune voice AI models or engage with Picovoice Consulting for further improvement.

See how to fine-tune models on the Picovoice Console.

How do custom speech recognition models compare with general models?

Custom speech recognition models are created for specific tasks, specific use cases, and sometimes specific environments. General-purpose models are jacks-of-all-trades and masters of none. For example, if you are building a medical dictation app, you need a fine-tuned speech-to-text model to capture the jargon correctly. If you're building a sales enablement app, just as you train your salesforce to learn your product names, you should adapt the general speech recognition model accordingly.

We need a new voice AI engine or model that the Picovoice voice AI platform doesn't offer. How can we get a new engine/model developed?

You can engage with Picovoice Consulting to discuss the opportunity.

My platform is not currently supported by Picovoice. How can I get Picovoice to support it?

Picovoice voice AI engines support the most popular and widely-used hardware and software out-of-the-box - from web, mobile, desktop, and on-prem to private cloud. However, there may be certain chipsets we do not currently support. (There are so many of them, yet only so much time and money, making it impossible to support everything.) You can engage with Picovoice Consulting and get any Picovoice voice AI engine ported to the platform of your choice.

Picovoice Voice AI engines don't offer the SDK we're using in production. How can I get a new SDK added?

Picovoice voice AI engines support the most popular and widely used SDKs. If you need another SDK, you can check our open-source SDKs and build it yourself or contact Picovoice Consulting. Picovoice Consulting experts can create a public or private library for the SDK of your choice and maintain it.

Current Picovoice Voice AI dictionaries do not include the words that I need. How can I add a new word?

Picovoice engines have lexicons of hundreds of thousands of words. However, there might be some special words we missed. Once you become a Picovoice Enterprise Plan customer, you can reach out to your Picovoice contacts to get a new word added to the lexicon.

I am using official Picovoice voice AI demos, but I get an error. How do I report bugs?

You can create a GitHub issue under the relevant repository/demo.

I need help with developing my PoC and product. How do I get help?

Enterprises face several challenges while building PoCs. Finding talent experienced in machine learning is one of the biggest. We learned this the hard way and experience it every day. On top of that, executives and clients may set unrealistic deadlines.

Experts at Picovoice Consulting help enterprises build PoCs and develop their AI strategy, working hand-in-hand with them and offering the guidance they need.

Data Security & Privacy

Where does Picovoice process data?

Picovoice voice AI engines process data in your environment, whether it's public or private cloud, on-prem, web, mobile, desktop, or embedded.

For how long do Picovoice Voice AI engines retain user data, audio, or text files?

Picovoice is private by design and has no access to user data. Picovoice doesn't retain user data because it never tracks or stores it in the first place.

Are Picovoice Voice AI engines HIPAA-compliant?

Yes. Enterprises using Picovoice don't need to share their user data with Picovoice or any other third party to run voice AI models, making the Picovoice voice AI platform intrinsically HIPAA-compliant.

Are Picovoice Voice AI engines CCPA-compliant?

Yes. Enterprises using Picovoice don't need to share their user data with Picovoice or any other third party to run voice AI models, making the Picovoice voice AI platform intrinsically CCPA-compliant.

Building with Picovoice

Can I use Picovoice Voice AI engines with picoLLM to build voice AI agents?

Yes, you can use voice AI with local LLMs to create private, accurate, and reliable AI agents. You can find examples of voice AI agents running across Linux, macOS, Windows, Android, iOS, Chrome, Safari, Edge, Firefox, and Raspberry Pi on the picoLLM Inference platform page.

What are the best practices to develop and deploy voice AI engines and models?

The answer is “it depends”. Voice AI is a complex technology, and building products for production requires diligent work. The best approach depends on your use case, your other tools, your tech stack, and your hardware and software choices. Given the variables, it can be challenging. You can experiment with different scenarios using Picovoice's free resources or engage with experts from Picovoice Consulting to find the best approach to deploying voice AI models in production.

Can I use multiple Picovoice products together?

Yes! Picovoice engines are modular and work with other Picovoice products or with competing products. Check the Picovoice Blog or GitHub for more information, tutorials, and demos.

Which languages does Picovoice support?

Picovoice currently supports eight languages: English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish.

Please visit the product page to check the public availability of languages for the engines you’re interested in. If you have a commercial opportunity requiring another language, you can engage with Picovoice Consulting once you become an Enterprise Plan customer.

Picovoice Voice AI engines don’t offer the languages that we’re interested in. How can I get a new language added?

Picovoice currently supports eight languages: English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish.

Please check the product page if you’re looking for engine-specific information. If you have an opportunity requiring another language, engage with Picovoice Consulting to get a custom model trained for you!
