Building voice control into C applications requires real-time audio processing and consistent performance across Linux, Windows, macOS, and Raspberry Pi. Many developers use cloud-based solutions like Google Dialogflow, Amazon Lex, and IBM Watson—but all require sending audio to remote servers, introducing latency, connectivity dependencies, and privacy concerns.
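For custom voice control in domain-specific applications, Speech-to-Intent offers a better approach. Instead of transcribing "Turn on the bedroom lights" to text and then parsing that text, Speech-to-Intent extracts structured meaning directly from the audio: an intent (what the user wants done, here switching a light) together with its slots (the details, here the state "on" and the location "bedroom").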
This tutorial shows you how to build cross-platform voice command recognition in C using Rhino Speech-to-Intent, an on-device Speech-to-Intent engine. Rhino Speech-to-Intent processes voice commands locally and maps spoken phrases directly to actionable intents—no cloud round-trips, transcription overhead, or external audio transmission required. An open-source benchmark demonstrates that Rhino is six times more accurate than Google Dialogflow, Amazon Lex, and IBM Watson.
By the end of this guide, you'll have a working C application that recognizes custom voice commands entirely on-device with better privacy, reliability, and accuracy than cloud alternatives.
Important: This tutorial builds on How to Record Audio in C. Ensure your audio capture environment is ready before continuing.
Prerequisites
- C99-compatible compiler
- Windows: MinGW
Supported Platforms
- Linux (x86_64)
- macOS (x86_64, arm64)
- Windows (x86_64, arm64)
- Raspberry Pi (Zero, 3, 4, 5)
Project Setup
The tutorial will use the following directory structure:
For instructions on setting up audio capture (with pvrecorder), see: How to Record Audio in C
Step 1: Add Rhino library files
- Create a folder named rhino/.
- Download the Rhino header files from GitHub and place them in:
- Download a Rhino model file and place it in:
Dynamic Loading Infrastructure
Rhino Speech-to-Intent ships as a shared library (.so, .dylib, .dll). Instead of linking at compile time, we'll load the library at runtime.
We'll build helpers to:
- open a shared library
- fetch function pointers
- close it gracefully
These helpers remain identical whether you're using PvRecorder, Cheetah Streaming Speech-to-Text, Porcupine Wake Word, Rhino Speech-to-Intent, or other Picovoice engines.
Step 2: Platform-specific headers
Explaining the headers
- On Windows systems, windows.h provides the LoadLibrary function to load a shared library and GetProcAddress to retrieve individual function pointers.
- On Unix-based systems, dlopen and dlsym from the dlfcn.h header provide the same functionality.
- Lastly, signal.h allows us to handle Ctrl-C later in this example.
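A minimal sketch of the include block this step describes, together with the Ctrl-C flag that signal.h makes possible. The names `is_interrupted` and `interrupt_handler` are illustrative choices for this sketch, not part of the Rhino API.

```c
#include <signal.h>
#include <stdio.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h> /* LoadLibrary, GetProcAddress, FreeLibrary */
#else
#include <dlfcn.h>   /* dlopen, dlsym, dlclose, dlerror */
#endif

/* Set from the signal handler and polled by the audio loop so the program
 * can leave the loop and release its resources. */
static volatile sig_atomic_t is_interrupted = 0;

static void interrupt_handler(int signum) {
    (void) signum; /* only SIGINT is registered, so the value is not needed */
    is_interrupted = 1;
}
```

Registering the handler is a single call, `signal(SIGINT, interrupt_handler);`, made before the audio loop starts.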
Step 3: Define dynamic loading helper functions
3a. Open the shared library
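One possible shape for this helper, assuming the platform headers from Step 2. The name `open_dl` is chosen for this sketch.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Opens a shared library and returns an opaque handle, or NULL on failure. */
static void *open_dl(const char *dl_path) {
#if defined(_WIN32) || defined(_WIN64)
    return (void *) LoadLibraryA(dl_path);
#else
    /* RTLD_NOW resolves every symbol at load time, so a broken library
     * fails here rather than on the first call into it. */
    return dlopen(dl_path, RTLD_NOW);
#endif
}
```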
3b. Load function symbols
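A sketch of the symbol lookup, again assuming the Step 2 headers. `load_symbol` is a name picked for illustration.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Looks up an exported symbol by name; returns NULL if it is not found. */
static void *load_symbol(void *handle, const char *symbol) {
#if defined(_WIN32) || defined(_WIN64)
    return (void *) GetProcAddress((HMODULE) handle, symbol);
#else
    return dlsym(handle, symbol);
#endif
}
```

The caller casts the result to a typed function pointer before using it. ISO C does not guarantee that a `void *` converts to a function pointer, but POSIX requires it for `dlsym` results and the same cast is common practice with `GetProcAddress`.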
3c. Close the library
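Closing could look like the following sketch. The NULL check is an addition made here so the helper is safe to call on a failed load; `close_dl` is an illustrative name.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Releases a handle obtained from LoadLibrary/dlopen. Does nothing for NULL. */
static void close_dl(void *handle) {
    if (handle == NULL) {
        return;
    }
#if defined(_WIN32) || defined(_WIN64)
    FreeLibrary((HMODULE) handle);
#else
    dlclose(handle);
#endif
}
```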
3d. Print platform-correct errors
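Windows and Unix-like systems report loader failures differently: a numeric code from `GetLastError` versus a message string from `dlerror`. A hedged sketch of a helper that hides the difference (the name `print_dl_error` and the message format are assumptions of this example):

```c
#include <stdio.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Prints `message` followed by the platform's description of the most
 * recent dynamic-loading failure. */
static void print_dl_error(const char *message) {
#if defined(_WIN32) || defined(_WIN64)
    /* GetLastError() returns a DWORD, printed here as unsigned long. */
    fprintf(stderr, "%s with code '%lu'.\n", message, (unsigned long) GetLastError());
#else
    /* dlerror() returns NULL when no error is pending, so guard the %s. */
    const char *reason = dlerror();
    fprintf(stderr, "%s with '%s'.\n", message, reason != NULL ? reason : "unknown error");
#endif
}
```

Call it immediately after the failing load or lookup, because `dlerror` clears the pending error once it has been read.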
Implement Speech-to-Intent Detection
Now that loading infrastructure is in place, it's time to initialize Rhino Speech-to-Intent, start capturing audio, and pass frames into the engine.
Step 4: Load the Rhino library
Download the correct library file for your OS and point library_path to it.
Step 5: Initialize Rhino
- Sign up for a free account on Picovoice Console and obtain your AccessKey.
- Replace ${ACCESS_KEY} with your AccessKey.
- Create a custom context using the Picovoice Console and download the context file (.rhn).
You can also download one of the existing example context files instead of creating your own custom context.
Call pv_rhino_init to create a Rhino instance:
Refer to pv_rhino_init for a detailed explanation of the parameters.
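The sketch below shows one way to call `pv_rhino_init` through a runtime-loaded function pointer. Treat the parameter list as an assumption: it follows the Rhino 3.x header as best recalled here, and other releases may differ, so check the pv_rhino.h that ships with your library before copying it. The wrapper `init_rhino`, the stand-in typedefs, and the sensitivity and endpoint values are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins so the snippet compiles on its own; a real build includes
 * pv_rhino.h instead. The real status enum has more values. */
typedef struct pv_rhino pv_rhino_t;
typedef enum { PV_STATUS_SUCCESS = 0 } pv_status_t;

/* ASSUMPTION: parameter list as in the Rhino 3.x header. Verify against the
 * header for the release you downloaded. */
typedef pv_status_t (*pv_rhino_init_func_t)(
        const char *access_key,
        const char *model_path,
        const char *context_path,
        float sensitivity,
        float endpoint_duration_sec,
        bool require_endpoint,
        pv_rhino_t **object);

/* Hypothetical wrapper: returns a Rhino handle, or NULL on any failure. */
static pv_rhino_t *init_rhino(
        pv_rhino_init_func_t init_func,
        const char *access_key,
        const char *model_path,
        const char *context_path) {
    if (init_func == NULL) {
        fprintf(stderr, "failed to load 'pv_rhino_init'\n");
        return NULL;
    }

    pv_rhino_t *rhino = NULL;
    /* Illustrative settings: mid-range sensitivity, 1 s of trailing silence,
     * and waiting for that silence before finalizing. */
    pv_status_t status = init_func(
            access_key, model_path, context_path, 0.5f, 1.0f, true, &rhino);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "'pv_rhino_init' failed with status %d\n", (int) status);
        return NULL;
    }
    return rhino;
}
```

The first argument would come from whatever symbol-loading helper you wrote in Step 3b, cast to `pv_rhino_init_func_t`. Passing the pointer in, rather than looking it up inside the wrapper, keeps the NULL check for a missing symbol in one place.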
Step 6: See context info (optional)
You may want to see the context info so you know what commands you can say:
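A small sketch of printing the context description through a runtime-loaded `pv_rhino_context_info` pointer. The typedefs are stand-ins for the declarations in pv_rhino.h, and `print_context_info` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins so the snippet compiles on its own; use pv_rhino.h in a real build. */
typedef struct pv_rhino pv_rhino_t;
typedef enum { PV_STATUS_SUCCESS = 0 } pv_status_t;

typedef pv_status_t (*pv_rhino_context_info_func_t)(
        const pv_rhino_t *object,
        const char **context_info);

/* Prints the expressions, intents, and slots the loaded context accepts.
 * Returns false if the function pointer or handle is missing, or the call fails. */
static bool print_context_info(
        pv_rhino_context_info_func_t context_info_func,
        const pv_rhino_t *rhino) {
    if (context_info_func == NULL || rhino == NULL) {
        return false;
    }

    const char *context_info = NULL;
    if (context_info_func(rhino, &context_info) != PV_STATUS_SUCCESS) {
        return false;
    }
    printf("%s\n", context_info);
    return true;
}
```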
Step 7: Start listening for commands
- Rhino requires int16 PCM frames of a specific length. Query this frame length with pv_rhino_frame_length and configure your recorder to produce frames of that size.
- Continuously feed the recorded audio frames into pv_rhino_process.
- pv_rhino_process sets the output flag is_finalized to true once an inference is complete.
- When is_finalized is true, use pv_rhino_is_understood to determine if the spoken command was recognized.
- If recognized, call pv_rhino_get_intent to retrieve the intent and associated slots.
- Release memory allocated for slots and values using pv_rhino_free_slots_and_values.
- Call pv_rhino_reset before listening for the next command.
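The per-frame sequence above can be sketched as a single function that works on runtime-loaded function pointers. The `rhino_api_t` struct and `handle_frame` are hypothetical helpers, and the typedefs are stand-ins for pv_rhino.h. A fake engine is included only so the control flow can be exercised without the shared library; its intent name and slot values are made up.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins so the snippet compiles on its own; use pv_rhino.h in a real build. */
typedef struct pv_rhino pv_rhino_t;
typedef enum { PV_STATUS_SUCCESS = 0 } pv_status_t;

/* Function pointers resolved at runtime (hypothetical grouping). */
typedef struct {
    pv_status_t (*process)(pv_rhino_t *, const int16_t *pcm, bool *is_finalized);
    pv_status_t (*is_understood)(const pv_rhino_t *, bool *is_understood);
    pv_status_t (*get_intent)(const pv_rhino_t *, const char **intent,
                              int32_t *num_slots, const char ***slots, const char ***values);
    pv_status_t (*free_slots_and_values)(const pv_rhino_t *, const char **slots,
                                         const char **values);
    pv_status_t (*reset)(pv_rhino_t *);
} rhino_api_t;

/* Feeds one frame. Returns 1 when an inference was finalized and handled,
 * 0 when more audio is needed, -1 on an engine error. */
static int handle_frame(const rhino_api_t *api, pv_rhino_t *rhino, const int16_t *pcm) {
    bool is_finalized = false;
    if (api->process(rhino, pcm, &is_finalized) != PV_STATUS_SUCCESS) return -1;
    if (!is_finalized) return 0;

    bool is_understood = false;
    if (api->is_understood(rhino, &is_understood) != PV_STATUS_SUCCESS) return -1;

    if (is_understood) {
        const char *intent = NULL;
        int32_t num_slots = 0;
        const char **slots = NULL;
        const char **values = NULL;
        if (api->get_intent(rhino, &intent, &num_slots, &slots, &values) != PV_STATUS_SUCCESS) return -1;
        printf("intent: %s\n", intent);
        for (int32_t i = 0; i < num_slots; i++) {
            printf("  %s: %s\n", slots[i], values[i]);
        }
        if (api->free_slots_and_values(rhino, slots, values) != PV_STATUS_SUCCESS) return -1;
    } else {
        printf("command not understood\n");
    }

    /* Reset so the engine is ready for the next command. */
    return (api->reset(rhino) == PV_STATUS_SUCCESS) ? 1 : -1;
}

/* ---- Fake engine for dry runs only: finalizes on every third frame. ---- */
static int fake_frames = 0;
static const char *fake_slots[] = {"state", "location"};
static const char *fake_values[] = {"on", "bedroom"};

static pv_status_t fake_process(pv_rhino_t *o, const int16_t *pcm, bool *is_finalized) {
    (void) o; (void) pcm;
    *is_finalized = (++fake_frames == 3);
    return PV_STATUS_SUCCESS;
}
static pv_status_t fake_is_understood(const pv_rhino_t *o, bool *understood) {
    (void) o; *understood = true;
    return PV_STATUS_SUCCESS;
}
static pv_status_t fake_get_intent(const pv_rhino_t *o, const char **intent,
                                   int32_t *n, const char ***s, const char ***v) {
    (void) o;
    *intent = "changeLightState"; /* made-up intent name */
    *n = 2; *s = fake_slots; *v = fake_values;
    return PV_STATUS_SUCCESS;
}
static pv_status_t fake_free(const pv_rhino_t *o, const char **s, const char **v) {
    (void) o; (void) s; (void) v; /* nothing was allocated by the fake */
    return PV_STATUS_SUCCESS;
}
static pv_status_t fake_reset(pv_rhino_t *o) {
    (void) o; fake_frames = 0;
    return PV_STATUS_SUCCESS;
}
static const rhino_api_t fake_api = {
    fake_process, fake_is_understood, fake_get_intent, fake_free, fake_reset
};
```

In the real application, the struct would be filled with pointers loaded from the Rhino library, and `handle_frame` would be called inside the recorder loop with each frame of `pv_rhino_frame_length()` samples until the Ctrl-C flag is set.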
Step 8: Clean up resources
When done, delete Rhino to free memory and close the library:
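A hedged sketch of the engine half of the cleanup, using a runtime-loaded `pv_rhino_delete` pointer. `delete_rhino` is a hypothetical wrapper and the typedef is a stand-in for pv_rhino.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in so the snippet compiles on its own; use pv_rhino.h in a real build. */
typedef struct pv_rhino pv_rhino_t;

typedef void (*pv_rhino_delete_func_t)(pv_rhino_t *object);

/* Deletes the engine and clears the caller's pointer so it cannot be
 * deleted twice. Returns false if there was nothing to delete. */
static bool delete_rhino(pv_rhino_delete_func_t delete_func, pv_rhino_t **rhino) {
    if (delete_func == NULL || rhino == NULL || *rhino == NULL) {
        return false;
    }
    delete_func(*rhino);
    *rhino = NULL;
    return true;
}
```

Order matters here: delete the engine first and close the shared library last, because every function pointer obtained from the library becomes invalid once the library is unloaded.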
Complete Example: On-device Speech-to-Intent Detection in C
Here is the complete rhino_tutorial.c you can copy, build, and run, complete with proper error handling and PvRecorder implementation:
- Replace ${ACCESS_KEY} with your AccessKey from Picovoice Console
- Update model_path to point to the Rhino model file (.pv)
- Update library_path to point to the correct Rhino library for your OS
- Update context_path to point to your chosen context file (.rhn)
- Update pv_recorder_library_path to point to the correct PvRecorder library for your OS
This is a simplified example but includes all the necessary components to get started. Check out the Rhino C demo on GitHub for a complete demo application.
Build & Run
Build and run the application:
Linux (gcc) and Raspberry Pi (gcc)
macOS (clang)
Windows (MinGW)
Troubleshooting Common Issues
1. Voice Commands Never Trigger Inference Detection
Ensure that audio is coming from the intended microphone. If you're using PvRecorder for audio capture, verify that it's functioning correctly before troubleshooting Rhino Speech-to-Intent.
Tips:
- Confirm the microphone is not muted.
- Make sure your application is reading audio frames at the exact size returned by pv_rhino_frame_length.
- Check that your sample rate and PCM format match the engine's requirements (the rate returned by pv_sample_rate, single-channel, 16-bit samples).
2. Commands Are Frequently Missed
If Rhino Speech-to-Intent rarely detects commands, especially in noisy environments, your sensitivity setting may be too low.
Solution:
Increase the sensitivity value (range 0.0–1.0) during engine initialization. A higher sensitivity reduces missed detections but may slightly increase false positives.
3. High Rate of False Inferences
If Rhino Speech-to-Intent triggers in response to background speech or unrelated sounds, sensitivity may be too high.
Solution:
- Lower the sensitivity during pv_rhino_init.
- Ensure your microphone is not capturing unintended audio sources.
- Avoid overlapping speech or loud background noise near the microphone.
4. Rhino Fails to Initialize
Initialization errors usually indicate mismatched files or platform issues. Common causes include using the wrong library, model, or context file for your system.
Solution:
- Download the correct binaries for your OS and architecture from the Rhino repository.
- Match the context file (.rhn) and model file (.pv) to the same language.
- Ensure the context file and shared library (.so, .dylib, .dll) are compatible with your target platform.
Example (English "coffee_maker" context on Linux x86_64):
- context file: coffee_maker_linux.rhn
- model file: rhino_params.pv
- library file: libpv_rhino.so







