Embedded AI Voice Assistant

Build embedded AI voice assistants that bring complex voice control to MCUs

Wake word detection and speech-to-intent recognition running on a microcontroller. Adds voice command and control to any device with no cloud dependency.

Platforms supported
Arduino · STM32 · ARM Cortex-M
How Embedded AI Voice Assistant Works

Two on-device voice AI SDKs. One voice interface.

An embedded AI voice assistant listens for a wake word and infers the user's intent directly from the follow-on spoken command, all on a microcontroller with no cloud connection. Porcupine Wake Word and Rhino Speech-to-Intent compose into a lightweight pipeline that fits on ARM Cortex-M devices. Porcupine is always listening with minimal power draw. After the user says the wake word, Rhino processes the follow-on command and extracts a structured intent with slots, ready for the application to act on. No speech-to-text transcription, no NLU parsing, no cloud round-trip.

User speaks a command: "Hey Barista, make a large cappuccino with oat milk"
→ Porcupine (wake word) detects "Hey Barista"
→ Rhino (speech-to-intent) returns a structured intent:
  { intent: "orderDrink", size: "large", drink: "cappuccino", milk: "oat" }
→ Device executes the command (makes the cappuccino) and returns to listening for the wake word
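To make the hand-off concrete, here is a minimal sketch of how application code might consume that structured result. It assumes the pv_inference_t layout used by the Picovoice MCU SDK (is_understood, intent, slots, values); start_brew is a hypothetical stand-in for your device's own control logic, and the header name may vary by SDK version.

```c
#include <string.h>
#include "pv_picovoice.h"  // assumed MCU SDK header; name may differ by version

// Hypothetical placeholder for the device's own control logic.
extern void start_brew(const char *size, const char *drink, const char *milk);

static void on_inference(pv_inference_t *inference) {
    if (inference->is_understood && (strcmp(inference->intent, "orderDrink") == 0)) {
        const char *size = NULL;
        const char *drink = NULL;
        const char *milk = NULL;
        // Slots arrive as parallel name/value string arrays.
        for (int32_t i = 0; i < inference->num_slots; i++) {
            if (strcmp(inference->slots[i], "size") == 0) {
                size = inference->values[i];
            } else if (strcmp(inference->slots[i], "drink") == 0) {
                drink = inference->values[i];
            } else if (strcmp(inference->slots[i], "milk") == 0) {
                milk = inference->values[i];
            }
        }
        start_brew(size, drink, milk);  // act immediately; no cloud round-trip
    }
    pv_inference_delete(inference);  // release the inference object when done
}
```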
Why Porcupine Wake Word?

Always-on, low-power wake word detection for embedded devices.

Single-core CPU utilization on Raspberry Pi 3: 3.8%
Accuracy at 1 false alarm per 10 hours: 97.1%
Custom wake words trained and deployed in 2025: ~250K

Porcupine Wake Word provides always-on, low-power wake word detection for embedded devices. It listens continuously on the MCU with minimal CPU usage and triggers Rhino only when the wake word is detected, keeping power consumption low between activations. Custom wake words can be trained in seconds using the Picovoice Console and exported for ARM Cortex-M, Arduino, and all supported platforms.
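In code, "always listening" is a frame loop: the application feeds fixed-length PCM frames to Porcupine and checks the returned keyword index. Below is a minimal sketch assuming the Porcupine C API (pv_porcupine_process); read_pcm_frame is a placeholder for the board-specific microphone driver.

```c
#include <stdbool.h>
#include <stdint.h>
#include "pv_porcupine.h"  // assumed MCU SDK header; name may differ by version

// Placeholder: board-specific mic driver returning pv_porcupine_frame_length()
// samples of 16-bit, 16 kHz PCM per call.
extern const int16_t *read_pcm_frame(void);

// One iteration of the always-on loop; returns true when a wake word is heard.
static bool listen_once(pv_porcupine_t *porcupine) {
    int32_t keyword_index = -1;  // stays -1 when no wake word is in this frame
    const int16_t *frame = read_pcm_frame();
    const pv_status_t status = pv_porcupine_process(porcupine, frame, &keyword_index);
    return (status == PV_STATUS_SUCCESS) && (keyword_index != -1);
}
```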

Wake Word Detection Accuracy (higher is better)
Porcupine: 97.1%
Snowboy: 68%
PocketSphinx: 52%

CPU Utilization (lower is better)
Porcupine: 3.8%
Snowboy: 24.8%
PocketSphinx: 31.8%
Why Rhino Speech-to-Intent?

Structured voice commands with no cloud NLU.

Accuracy vs. the Big Tech average: 6x higher
Accuracy tested across 6 to 24 dB signal-to-noise ratios: 97.3%
Voice interactions per user: unlimited

Rhino Speech-to-Intent infers structured intents directly from spoken commands without an intermediate speech-to-text step, achieving higher accuracy than Big Tech alternatives such as Google Dialogflow and Amazon Lex. This joint optimization produces a much smaller model than a separate STT + NLU pipeline, which is why Rhino fits on microcontrollers with tight memory constraints. Rhino's models are domain-specific: you define the intents and slots for your product (e.g., a coffee maker, a thermostat, an industrial controller) in the Picovoice Console, and Rhino recognizes only commands within that domain, which improves accuracy and eliminates hallucinations and out-of-domain misrecognition.
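As an illustration of what defining the domain looks like, here is a small coffee-maker context in Rhino's YAML syntax, using the intent and slot names from the diagram above. Square brackets mark required alternatives and parentheses mark optional ones; treat this as a sketch and consult the Rhino syntax documentation for the authoritative grammar.

```yaml
context:
  expressions:
    orderDrink:
      - "[make, brew] (me) (a, an) $size:size $drink:drink (with $milk:milk milk)"
  slots:
    size:
      - small
      - medium
      - large
    drink:
      - cappuccino
      - latte
      - espresso
    milk:
      - oat
      - almond
      - whole
```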

Voice Command Acceptance Accuracy (higher is better)
Rhino: 97.3%
Amazon Lex: 84.3%
Google Dialogflow: 77.3%

Voice Command Acceptance Accuracy at 21 dB SNR (higher is better)
Rhino: 99%
Amazon Lex: 87%
Google Dialogflow: 83%
Embedded AI Voice Assistant Use Cases

From kitchen appliances to factory floors: embedded voice control for product teams shipping real hardware

Home Appliances

Voice-controlled kitchen and home appliances

Appliance manufacturers can add voice commands directly to ovens, coffee makers, washing machines, and thermostats. The user says "Hey Barista, make a large latte" or "Preheat to 220 degrees," and the device acts immediately. Porcupine and Rhino run on the appliance's embedded MCU without requiring a companion app.

Wearables

Voice interfaces for smart glasses, earbuds, or helmets

Manufacturers of wearables and hearables can add hands-free voice commands to smart glasses, helmets, and earbuds for moments when users' hands and eyes are occupied. "Check air quality," "Log exercise," "Call supervisor." The pipeline runs within the power and memory budget of battery-operated wearable hardware.

Healthcare

Voice commands for clinical and diagnostic equipment

Medical device manufacturers can embed voice control into diagnostic kiosks, infusion pumps, or patient monitors. "Start test," "Pause infusion," "Show vitals." Clinicians interact with equipment by voice while their hands stay with the patient, the chart, or the procedure. All processing stays on the device, simplifying regulatory compliance.

Industrial

Hands-free control for machinery and instruments

Equipment manufacturers can add voice commands to CNC machines, test instruments, packaging lines, robotic arms, and lab equipment: "Start cycle," "Set speed to 1200," "Emergency stop." Operators keep their hands on the controls while issuing commands. No network infrastructure required on the factory floor. No out-of-domain misrecognition.

Get started

Embedded AI voice assistant on MCUs: Code example

Train a wake word and voice commands, flash to your MCU, and run. No cloud required.

recipe · embedded-ai-voice-assistant
Difficulty: Beginner
Runtime: 100% on-device
Language: C (Arduino)
Platforms supported: Arduino · STM32 · ARM Cortex-M*
* Contact sales for other ARM Cortex-M libraries.

Prerequisites

A Picovoice AccessKey from the Picovoice Console and a clone of the GitHub repo.
The Arduino IDE, for installing the Picovoice library via the Arduino Library Manager.

Usage

These instructions assume your current working directory is recipes/intent-based-voice-assistant/mcu/arduino.
1. Train a custom wake word

In the Picovoice Console, go to Porcupine Wake Word, enter your wake phrase, select ARM Cortex-M as the platform, enter your board type (Arduino Nano 33 BLE Sense), enter your board UUID, and download the .ppn model file.
2. Train custom voice commands

In the Picovoice Console, go to Rhino Speech-to-Intent and use one of the templates or create an empty one to define your intents and slots. Train the model for ARM Cortex-M, select your board type (Arduino Nano 33 BLE Sense), enter your board UUID, and download the .rhn model file.
3. Open IntentBasedVoiceAssistantExample

Open File > Examples > Picovoice_{LANGUAGE_CODE} > IntentBasedVoiceAssistantExample.
4. Import the custom wake word

Decompress the zip file. The Porcupine wake word model ships as two files: a binary .ppn file and a .h header file containing a C array version of the binary model. Copy the contents of the array inside the .h header file and update the DEFAULT_KEYWORD_ARRAY values in pv_params.h (illustrated in the sketch after step 6).
5. Import the custom context

Decompress the zip file. The Rhino speech-to-intent model ships as two files: a binary .rhn file and a .h header file containing a C array version of the binary model. Copy the contents of the array inside the .h header file and update the CONTEXT_ARRAY values in pv_params.h.
6. Flash and run

Replace ACCESS_KEY in the source with the AccessKey obtained from the Picovoice Console. Press Upload and check the Serial Monitor for outputs. The sketches below illustrate the resulting pv_params.h and a condensed version of the example.
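After steps 4 and 5, pv_params.h ends up with roughly the following shape. This is illustrative only; the real array contents are generated by the Picovoice Console for your board UUID.

```c
// pv_params.h (illustrative excerpt)
#include <stdint.h>

// C array version of the .ppn wake word model (step 4)
static const uint8_t DEFAULT_KEYWORD_ARRAY[] = {
    0x00  // replace with the array contents from the downloaded .h file
};

// C array version of the .rhn context model (step 5)
static const uint8_t CONTEXT_ARRAY[] = {
    0x00  // replace with the array contents from the downloaded .h file
};
```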
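And here is a condensed view of what the example sketch does, modeled on the Picovoice Arduino SDK examples. The pv_picovoice_init signature and the pv_audio_rec_* helpers follow the MCU SDK but can differ between versions, so treat this as a sketch rather than a drop-in replacement for the bundled example.

```cpp
#include <Picovoice_EN.h>
#include "pv_params.h"

#define MEMORY_BUFFER_SIZE (70 * 1024)  // models plus runtime working memory

static const char *ACCESS_KEY = "YOUR_ACCESS_KEY";  // from Picovoice Console

static pv_picovoice_t *handle = NULL;
static int8_t memory_buffer[MEMORY_BUFFER_SIZE] __attribute__((aligned(16)));

static void wake_word_callback(void) {
    Serial.println("wake word detected");
}

static void inference_callback(pv_inference_t *inference) {
    if (inference->is_understood) {
        Serial.print("intent: ");
        Serial.println(inference->intent);
        // inference->slots / inference->values carry the slot name/value pairs
    }
    pv_inference_delete(inference);
}

void setup() {
    Serial.begin(9600);
    while (!Serial);

    pv_status_t status = pv_picovoice_init(
        ACCESS_KEY,
        MEMORY_BUFFER_SIZE, memory_buffer,
        sizeof(DEFAULT_KEYWORD_ARRAY), DEFAULT_KEYWORD_ARRAY,
        0.75f,                 // Porcupine sensitivity
        wake_word_callback,
        sizeof(CONTEXT_ARRAY), CONTEXT_ARRAY,
        0.5f,                  // Rhino sensitivity
        1.0f,                  // endpoint duration in seconds
        true,                  // require a pause (endpoint) after the command
        inference_callback,
        &handle);
    if (status != PV_STATUS_SUCCESS) {
        Serial.println("Picovoice init failed");
        while (true);
    }

    // Audio recorder helpers ship with the library's example code.
    if (pv_audio_rec_init() != PV_STATUS_SUCCESS || pv_audio_rec_start() != PV_STATUS_SUCCESS) {
        Serial.println("audio recorder init failed");
        while (true);
    }
}

void loop() {
    const int16_t *buffer = pv_audio_rec_get_new_buffer();
    if (buffer != NULL) {
        pv_picovoice_process(handle, buffer);
    }
}
```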
Have questions or looking for implementations in other languages? Visit the Intent-Based Voice Assistant Recipe in the pico-cookbook GitHub repo, where you can find the open-source demo code and create an issue for demo-related technical questions.
Frequently asked questions

What is an embedded AI voice assistant?
An embedded AI voice assistant is a voice interface running on a microcontroller that detects a wake word and infers the user's intent from a spoken command, all on-device with no cloud dependency. The device acts on structured intents directly without sending audio to an external server.

What hardware does this run on?
The pipeline runs on ARM Cortex-M microcontrollers, including the Arduino Nano 33 BLE Sense and STM32 boards. For broader platform support, covering Raspberry Pi, mobile (Android, iOS), desktop (Linux, macOS, Windows), and browser (Chrome, Edge, Firefox, Safari), check our docs and our other recipes.

How is speech-to-intent different from speech-to-text?
Speech-to-text transcribes audio into free-form text, which then requires a separate NLU step to extract meaning. Speech-to-intent (Rhino) skips the transcription step and infers structured intents directly from the audio. This joint optimization produces a smaller model that fits on microcontrollers and is more accurate within the target domain because it only recognizes commands you define. Learn more about why speech-to-intent is a better fit than speech-to-text for voice assistants.

Can I customize the wake word and voice commands?
Yes. Train a custom wake word in seconds using the Picovoice Console and export it for your target platform. Design custom intents and slots in the Console or using YAML files. Both models are exported as lightweight files that run on MCUs.

Does it work without an internet connection?
Yes. Both Porcupine and Rhino run entirely on the device. No audio is transmitted externally.

How many voice commands can Rhino handle?
There is no hard technical limit on the number of intents or expressions. On MCUs, the total number of commands is constrained by available flash memory. Given the low compute requirements of Rhino, the memory budget on supported MCUs accommodates thousands of command variations.

Can the device talk back to the user?
Yes. Add Orca Streaming Text-to-Speech to the pipeline so the device confirms actions by voice ("Making a large cappuccino with oat milk"). Currently, Orca runs on Raspberry Pi and above; a custom Orca for resource-constrained MCUs can be built depending on the platform.

Does the embedded voice assistant store or transmit audio?
No. All audio is processed on the device and discarded. It is never transmitted to Picovoice or any third-party cloud. Picovoice has no data controller relationship with your end users.

How can I get technical support?