
Building voice control into C applications requires real-time audio processing and consistent performance across Linux, Windows, macOS, and Raspberry Pi. Many developers use cloud-based solutions like Google Dialogflow, Amazon Lex, and IBM Watson—but all require sending audio to remote servers, introducing latency, connectivity dependencies, and privacy concerns.

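For custom voice control in domain-specific applications, Speech-to-Intent offers a better approach. Instead of transcribing "Turn on the bedroom lights" to text and then parsing that text, Speech-to-Intent directly extracts structured meaning: an intent (the action to perform) plus slots (details such as the location and the desired state).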
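
As a rough illustration, the result for that phrase could be held in C as an intent name plus slot/value pairs. The struct, the helper, and the names `changeLightState`, `location`, and `state` below are hypothetical and are not taken from a real context file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one inference result: an intent name plus
 * parallel arrays of slot names and slot values. */
typedef struct {
    const char *intent;
    size_t num_slots;
    const char *const *slots;
    const char *const *values;
} inference_t;

/* Return the value of the named slot, or NULL if the slot is absent. */
static const char *slot_value(const inference_t *inference, const char *slot) {
    for (size_t i = 0; i < inference->num_slots; i++) {
        if (strcmp(inference->slots[i], slot) == 0) {
            return inference->values[i];
        }
    }
    return NULL;
}

/* "Turn on the bedroom lights" expressed as structured data
 * (illustrative names only). */
static const char *const EXAMPLE_SLOTS[] = {"location", "state"};
static const char *const EXAMPLE_VALUES[] = {"bedroom", "on"};
static const inference_t EXAMPLE_INFERENCE = {
    "changeLightState", 2, EXAMPLE_SLOTS, EXAMPLE_VALUES};
```

Application code can then branch on the intent and read `slot_value(&EXAMPLE_INFERENCE, "location")` without any text parsing.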

This tutorial shows you how to build cross-platform voice command recognition in C using Rhino Speech-to-Intent, an on-device Speech-to-Intent engine. Rhino Speech-to-Intent processes voice commands locally and maps spoken phrases directly to actionable intents—no cloud round-trips, transcription overhead, or external audio transmission required. An open-source benchmark demonstrates that Rhino is six times more accurate than Google Dialogflow, Amazon Lex, and IBM Watson.

By the end of this guide, you'll have a working C application that recognizes custom voice commands entirely on-device with better privacy, reliability, and accuracy than cloud alternatives.

Important: This tutorial builds on How to Record Audio in C. Ensure your audio capture environment is ready before continuing.

Prerequisites

  • C99-compatible compiler
  • Windows: MinGW

Supported Platforms

  • Linux (x86_64)
  • macOS (x86_64, arm64)
  • Windows (x86_64, arm64)
  • Raspberry Pi (Zero, 3, 4, 5)

Project Setup

The tutorial will use the following directory structure:

For instructions on setting up audio capture (with pvrecorder), see: How to Record Audio in C

Step 1: Add Rhino library files

  1. Create a folder named rhino/.
  2. Download the Rhino header files from GitHub and place them in:
  3. Download a Rhino model file and place it in:

Dynamic Loading Infrastructure

Rhino Speech-to-Intent ships as a shared library (.so, .dylib, .dll). Instead of linking at compile time, we'll load the library at runtime.

We'll build helpers to:

  1. open a shared library
  2. fetch function pointers
  3. close it gracefully

These helpers remain identical whether you're using PvRecorder, Cheetah Streaming Speech-to-Text, Porcupine Wake Word, Rhino Speech-to-Intent, or other Picovoice engines.

Step 2: Platform-specific headers

Explaining the headers

  • On Windows systems, windows.h provides the LoadLibrary function to load a shared library and GetProcAddress to retrieve individual function pointers.
  • On Unix-based systems, dlopen and dlsym from the dlfcn.h header provide the same functionality.
  • Lastly, signal.h allows us to handle Ctrl-C later in this example.
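
A minimal sketch of the includes described above, together with a Ctrl-C handler of the kind used later. The names `is_interrupted`, `interrupt_handler`, and `install_interrupt_handler` are our own choices for illustration.

```c
#include <signal.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>   /* LoadLibrary, GetProcAddress, FreeLibrary */
#else
#include <dlfcn.h>     /* dlopen, dlsym, dlclose */
#endif

/* Set by the handler when Ctrl-C is pressed; the main loop polls it. */
static volatile sig_atomic_t is_interrupted = 0;

static void interrupt_handler(int signum) {
    (void) signum;
    is_interrupted = 1;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
static int install_interrupt_handler(void) {
    return (signal(SIGINT, interrupt_handler) == SIG_ERR) ? -1 : 0;
}
```

The handler only sets a flag, which keeps it safe to run asynchronously; the audio loop can check `is_interrupted` between frames and exit cleanly.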

Step 3: Define dynamic loading helper functions

3a. Open the shared library
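
One possible shape for this helper, assuming the name `open_dl` (our choice). It wraps LoadLibraryA on Windows and dlopen elsewhere and returns an opaque handle.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Open a shared library by path.
 * Returns an opaque handle, or NULL if the library could not be loaded. */
static void *open_dl(const char *dl_path) {
#if defined(_WIN32) || defined(_WIN64)
    return (void *) LoadLibraryA(dl_path);
#else
    /* RTLD_NOW resolves all symbols up front, so problems surface here
     * rather than on the first call into the library. */
    return dlopen(dl_path, RTLD_NOW);
#endif
}
```

Callers should always compare the result with NULL before using it.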

3b. Load function symbols
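
A sketch of the symbol lookup, assuming the helper name `load_symbol` (our choice). It returns the address of a named function inside an already opened library.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Look up a symbol by name in an opened library.
 * Returns its address, or NULL if the symbol is not found. */
static void *load_symbol(void *handle, const char *symbol) {
#if defined(_WIN32) || defined(_WIN64)
    return (void *) GetProcAddress((HMODULE) handle, symbol);
#else
    return dlsym(handle, symbol);
#endif
}
```

The returned pointer is untyped, so the caller casts it to a function pointer type that matches the prototype in the library's header. ISO C does not define that conversion, but POSIX requires it to work for addresses returned by dlsym.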

3c. Close the library
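
A sketch of the matching close helper, assuming the name `close_dl` (our choice). Returning a status is a design choice made here so that callers can report failures.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Close a library handle obtained from LoadLibrary or dlopen.
 * A NULL handle is ignored. Returns 0 on success, -1 on failure. */
static int close_dl(void *handle) {
    if (handle == NULL) {
        return 0;
    }
#if defined(_WIN32) || defined(_WIN64)
    return FreeLibrary((HMODULE) handle) ? 0 : -1;
#else
    return (dlclose(handle) == 0) ? 0 : -1;
#endif
}
```

Function pointers fetched from the library must not be called after it has been closed.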

3d. Print platform-correct errors
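
The two platforms report loader errors differently: Windows exposes a numeric code through GetLastError, while Unix-like systems return a message string from dlerror. A sketch, assuming the helper name `print_dl_error` (our choice):

```c
#include <stdio.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Print `message` followed by the platform's most recent loader error.
 * Returns the number of characters written, as reported by fprintf. */
static int print_dl_error(const char *message) {
#if defined(_WIN32) || defined(_WIN64)
    return fprintf(stderr, "%s with code '%lu'.\n", message,
                   (unsigned long) GetLastError());
#else
    const char *detail = dlerror();   /* NULL when no error is pending */
    return fprintf(stderr, "%s with '%s'.\n", message,
                   detail ? detail : "unknown error");
#endif
}
```

Call it immediately after a failed open or lookup, because later loader calls can overwrite the stored error.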

Implement Speech-to-Intent Detection

Now that loading infrastructure is in place, it's time to initialize Rhino Speech-to-Intent, start capturing audio, and pass frames into the engine.

Step 4: Load the Rhino library

Download the correct library file for your OS and point library_path to it.
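
Once the library is open, one way to fail early is to confirm that every function the tutorial calls can be resolved. The sketch below is an assumption-laden illustration: `find_missing_symbol` and `RHINO_SYMBOLS` are our own names, and `pv_rhino_delete` is listed although the text above does not name it.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Functions used in this tutorial. pv_rhino_delete is our assumption for
 * the cleanup function; check the names against your pv_rhino.h. */
static const char *const RHINO_SYMBOLS[] = {
    "pv_rhino_init",
    "pv_rhino_frame_length",
    "pv_rhino_process",
    "pv_rhino_is_understood",
    "pv_rhino_get_intent",
    "pv_rhino_free_slots_and_values",
    "pv_rhino_reset",
    "pv_rhino_delete",
};
#define RHINO_SYMBOL_COUNT (sizeof(RHINO_SYMBOLS) / sizeof(RHINO_SYMBOLS[0]))

/* Return the first name that cannot be resolved in `library`,
 * or NULL if every name resolves. */
static const char *find_missing_symbol(
        void *library, const char *const *names, size_t count) {
    for (size_t i = 0; i < count; i++) {
#if defined(_WIN32) || defined(_WIN64)
        void *address = (void *) GetProcAddress((HMODULE) library, names[i]);
#else
        void *address = dlsym(library, names[i]);
#endif
        if (address == NULL) {
            return names[i];
        }
    }
    return NULL;
}
```

A non-NULL result usually means library_path points at the wrong file or at a build for another platform.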

Step 5: Initialize Rhino

  1. Sign up for an account on Picovoice Console for free and obtain your AccessKey
  2. Replace ${ACCESS_KEY} with your AccessKey
  3. Create a custom context using the Picovoice Console and download the context file (.rhn)

You can also download one of the existing example context files instead of creating your own custom context.

Call pv_rhino_init to create a Rhino instance:

Refer to pv_rhino_init for detailed explanation of parameters.
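
The sketch below shows the general shape of an initialization call made through a dynamically loaded function pointer. Treat it as an assumption: the parameter list is written from memory of one Rhino release and may not match yours, `rhino_config_t` and `create_rhino` are hypothetical helpers, and `fake_init` is only a stand-in so the sketch can be exercised without the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Opaque engine handle. */
typedef struct pv_rhino pv_rhino_t;

/* ASSUMPTION: parameter list recalled from one Rhino release. Other
 * releases may take different or additional arguments, so copy the real
 * prototype from your pv_rhino.h. int32_t stands in for pv_status_t,
 * where 0 means success. */
typedef int32_t (*rhino_init_fn)(
        const char *access_key, const char *model_path,
        const char *context_path, float sensitivity,
        float endpoint_duration_sec, bool require_endpoint,
        pv_rhino_t **object);

/* Hypothetical bundle of the initialization arguments. */
typedef struct {
    const char *access_key;
    const char *model_path;
    const char *context_path;
    float sensitivity;            /* 0.0 to 1.0 */
    float endpoint_duration_sec;
    bool require_endpoint;
} rhino_config_t;

/* Call the init function and return the engine handle, or NULL on failure. */
static pv_rhino_t *create_rhino(rhino_init_fn init, const rhino_config_t *c) {
    pv_rhino_t *rhino = NULL;
    int32_t status = init(c->access_key, c->model_path, c->context_path,
                          c->sensitivity, c->endpoint_duration_sec,
                          c->require_endpoint, &rhino);
    if (status != 0) {
        fprintf(stderr, "Rhino init failed with status %d\n", (int) status);
        return NULL;
    }
    return rhino;
}

/* Stand-in for the real function, for trying the sketch without the SDK. */
static int fake_engine;
static int32_t fake_init(
        const char *access_key, const char *model_path,
        const char *context_path, float sensitivity,
        float endpoint_duration_sec, bool require_endpoint,
        pv_rhino_t **object) {
    (void) endpoint_duration_sec;
    (void) require_endpoint;
    if (!access_key || !model_path || !context_path ||
        sensitivity < 0.0f || sensitivity > 1.0f) {
        return 1;   /* any non-zero status signals failure in this fake */
    }
    *object = (pv_rhino_t *) &fake_engine;
    return 0;
}
```

In the real application, the first argument to `create_rhino` would be the pointer fetched from the Rhino library for pv_rhino_init, cast to the prototype declared in the header.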

Step 6: See context info (optional)

You may want to see the context info so you know what commands you can say:
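
A sketch of printing the context description through a loaded function pointer. The typedef assumes that pv_rhino_context_info takes the engine handle and an output string pointer and returns a status where 0 means success; verify this against your pv_rhino.h. `print_context_info` and `fake_context_info` are hypothetical names, and the fake exists only so the sketch runs without the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct pv_rhino pv_rhino_t;

/* ASSUMPTION: signature recalled from the Rhino header; int32_t stands in
 * for pv_status_t, where 0 means success. */
typedef int32_t (*rhino_context_info_fn)(
        const pv_rhino_t *object, const char **context_info);

/* Print the context description. Returns 0 on success, -1 on failure. */
static int print_context_info(
        rhino_context_info_fn context_info_func, const pv_rhino_t *rhino) {
    const char *info = NULL;
    if (context_info_func(rhino, &info) != 0 || info == NULL) {
        fprintf(stderr, "Failed to read context info\n");
        return -1;
    }
    printf("%s\n", info);
    return 0;
}

/* Stand-in that returns a made-up description. */
static int32_t fake_context_info(
        const pv_rhino_t *object, const char **context_info) {
    (void) object;
    *context_info = "(context description would appear here)";
    return 0;
}
```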

Step 7: Start listening for commands

  1. Rhino requires int16 PCM frames of a specific length. Query this frame length with pv_rhino_frame_length and configure your recorder to produce frames of that size.
  2. Continuously feed the recorded audio frames into pv_rhino_process.
  3. pv_rhino_process sets the is_finalized flag to true once an inference is complete.
  4. When is_finalized is true, use pv_rhino_is_understood to determine if the spoken command was recognized.
  5. If recognized, call pv_rhino_get_intent to retrieve the intent and associated slots.
  6. Release memory allocated for slots and values using pv_rhino_free_slots_and_values.
  7. Call pv_rhino_reset before listening for the next command.
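
The steps above can be sketched as a function that handles one audio frame. Everything here is an assumption to check against your pv_rhino.h: the function pointer signatures are recalled from the Rhino header, int32_t stands in for pv_status_t (0 meaning success), and `rhino_api_t`, `process_frame`, and the `fake_*` stand-ins are hypothetical names that let the sketch run without the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct pv_rhino pv_rhino_t;

/* Function pointers fetched from the Rhino library (assumed signatures). */
typedef struct {
    int32_t (*process)(pv_rhino_t *, const int16_t *pcm, bool *is_finalized);
    int32_t (*is_understood)(const pv_rhino_t *, bool *is_understood);
    int32_t (*get_intent)(const pv_rhino_t *, const char **intent,
                          int32_t *num_slots, const char ***slots,
                          const char ***values);
    int32_t (*free_slots_and_values)(const pv_rhino_t *, const char **slots,
                                     const char **values);
    int32_t (*reset)(pv_rhino_t *);
} rhino_api_t;

/* Feed one frame. Returns 0 while still listening, 1 once an inference
 * has been finalized and handled, -1 on error. */
static int process_frame(const rhino_api_t *api, pv_rhino_t *rhino,
                         const int16_t *pcm) {
    bool is_finalized = false;
    if (api->process(rhino, pcm, &is_finalized) != 0) return -1;
    if (!is_finalized) return 0;

    bool is_understood = false;
    if (api->is_understood(rhino, &is_understood) != 0) return -1;

    if (is_understood) {
        const char *intent = NULL;
        int32_t num_slots = 0;
        const char **slots = NULL;
        const char **values = NULL;
        if (api->get_intent(rhino, &intent, &num_slots, &slots, &values) != 0)
            return -1;
        printf("intent: %s\n", intent);
        for (int32_t i = 0; i < num_slots; i++) {
            printf("  %s: %s\n", slots[i], values[i]);
        }
        /* Release the slot arrays before the next inference. */
        if (api->free_slots_and_values(rhino, slots, values) != 0) return -1;
    } else {
        printf("command not understood\n");
    }
    /* Reset so the engine is ready for the next command. */
    return (api->reset(rhino) == 0) ? 1 : -1;
}

/* Stand-ins: finalize on every third frame with a fixed, made-up intent. */
static int fake_frames_seen = 0;
static int fake_reset_calls = 0;
static const char *fake_slots[] = {"location", "state"};
static const char *fake_values[] = {"bedroom", "on"};
static int32_t fake_process(pv_rhino_t *o, const int16_t *pcm, bool *done) {
    (void) o; (void) pcm;
    *done = (++fake_frames_seen % 3 == 0);
    return 0;
}
static int32_t fake_is_understood(const pv_rhino_t *o, bool *understood) {
    (void) o; *understood = true; return 0;
}
static int32_t fake_get_intent(const pv_rhino_t *o, const char **intent,
        int32_t *num_slots, const char ***slots, const char ***values) {
    (void) o;
    *intent = "changeLightState"; *num_slots = 2;
    *slots = fake_slots; *values = fake_values;
    return 0;
}
static int32_t fake_free(const pv_rhino_t *o, const char **s, const char **v) {
    (void) o; (void) s; (void) v; return 0;
}
static int32_t fake_reset(pv_rhino_t *o) {
    (void) o; fake_reset_calls++; return 0;
}
static const rhino_api_t FAKE_API = {
    fake_process, fake_is_understood, fake_get_intent, fake_free, fake_reset};
```

In the real application, the recorder loop would call `process_frame` once per frame with a struct filled from the loaded library, and stop when Ctrl-C is pressed or an error is returned.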

Step 8: Clean up resources

When done, delete Rhino to free memory and close the library:
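
A sketch of the teardown order, assuming the engine's delete function takes the handle and returns nothing (check your pv_rhino.h). `cleanup` and `fake_delete` are hypothetical names; the fake only counts calls so the sketch can run without the library.

```c
#include <stddef.h>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

typedef struct pv_rhino pv_rhino_t;

/* ASSUMPTION: shape of the delete function exported by the library. */
typedef void (*rhino_delete_fn)(pv_rhino_t *object);

/* Delete the engine first, then unload the library. The order matters:
 * delete_func lives inside the library and cannot be called after the
 * library has been closed. */
static void cleanup(rhino_delete_fn delete_func, pv_rhino_t *rhino,
                    void *library) {
    if (rhino != NULL && delete_func != NULL) {
        delete_func(rhino);
    }
    if (library != NULL) {
#if defined(_WIN32) || defined(_WIN64)
        FreeLibrary((HMODULE) library);
#else
        dlclose(library);
#endif
    }
}

/* Stand-in that records how often it was called. */
static int fake_engine;
static int fake_delete_calls = 0;
static void fake_delete(pv_rhino_t *object) {
    (void) object;
    fake_delete_calls++;
}
```

If you also loaded a recorder library, stop and delete the recorder in the same way before closing its library.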

Complete Example: On-device Speech-to-Intent Detection in C

Here is the complete rhino_tutorial.c you can copy, build, and run, complete with proper error handling and PvRecorder implementation:

  • Replace ${ACCESS_KEY} with your AccessKey from Picovoice Console
  • Update model_path to point to the Rhino model file (.pv)
  • Update library_path to point to the correct Rhino library for your OS
  • Update context_path to point to your chosen context file (.rhn)
  • Update pv_recorder_library_path to point to the correct PvRecorder library for your OS

This is a simplified example but includes all the necessary components to get started. Check out the Rhino C demo on GitHub for a complete demo application.

Build & Run

Build and run the application:

Linux (gcc) and Raspberry Pi (gcc)

macOS (clang)

Windows (MinGW)

Troubleshooting Common Issues

1. Voice Commands Never Trigger Inference Detection

Ensure that audio is coming from the intended microphone. If you're using PvRecorder for audio capture, verify that it's functioning correctly before troubleshooting Rhino Speech-to-Intent.

Tips:

  • Confirm the microphone is not muted.
  • Make sure your application is reading audio frames at the exact size returned by pv_rhino_frame_length.
  • Check that your sample rate and PCM format match the engine's requirements (the rate returned by pv_sample_rate, single-channel int16 PCM).
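
If your audio source delivers buffers of arbitrary size, they need to be cut into frames of exactly the required length before reaching the engine. A minimal sketch, where `feed_in_frames`, `frame_callback_t`, and `count_frame` are hypothetical names and the callback stands in for the call into Rhino:

```c
#include <stddef.h>
#include <stdint.h>

/* Called once per complete frame. */
typedef void (*frame_callback_t)(
        const int16_t *frame, size_t frame_length, void *user_data);

/* Deliver as many complete frames as `samples` contains.
 * Returns the number of samples consumed; the caller should keep the
 * remaining samples and prepend them to the next buffer. */
static size_t feed_in_frames(
        const int16_t *samples, size_t num_samples, size_t frame_length,
        frame_callback_t on_frame, void *user_data) {
    if (frame_length == 0) {
        return 0;
    }
    size_t offset = 0;
    while (num_samples - offset >= frame_length) {
        on_frame(samples + offset, frame_length, user_data);
        offset += frame_length;
    }
    return offset;
}

/* Example callback: counts the frames it receives. */
static void count_frame(
        const int16_t *frame, size_t frame_length, void *user_data) {
    (void) frame;
    (void) frame_length;
    (*(size_t *) user_data)++;
}
```

With a frame length of 512, a buffer of 1300 samples yields two frames and leaves 276 samples to carry over.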

2. Commands Are Frequently Missed

If Rhino Speech-to-Intent rarely detects commands, especially in noisy environments, your sensitivity settings may be too low.

Solution:

Increase the sensitivity value (range 0.0–1.0) during engine initialization. A higher sensitivity reduces missed detections but may slightly increase false positives.
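
When sensitivity comes from a configuration file or command-line flag, it can help to keep it inside the valid range before passing it to the engine. A small sketch with a hypothetical helper name:

```c
/* Clamp a requested sensitivity to the valid range [0.0, 1.0]. */
static float clamp_sensitivity(float requested) {
    if (requested < 0.0f) {
        return 0.0f;
    }
    if (requested > 1.0f) {
        return 1.0f;
    }
    return requested;
}
```

Adjust the value in small steps and retest with real recordings, since each increase trades fewer misses for more false positives.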

3. High Rate of False Inferences

If Rhino Speech-to-Intent triggers in response to background speech or unrelated sounds, sensitivity may be too high.

Solution:

  • Lower the sensitivity during pv_rhino_init.
  • Ensure your microphone is not capturing unintended audio sources.
  • Avoid overlapping speech or loud background noise near the microphone.

4. Rhino Fails to Initialize

Initialization errors usually indicate mismatched files or platform issues. Common causes include using the wrong library, model, or context file for your system.

Solution:

  • Download the correct binaries for your OS and architecture from the Rhino repository.
  • Match the context file (.rhn) and model file (.pv) to the same language.
  • Ensure the context file and shared library (.so, .dylib, .dll) are compatible with your target platform.

Example (English "coffee_maker" context on Linux x86_64):

Frequently Asked Questions

What is Rhino Speech-to-Intent and how is it different from STT?

Rhino is an on-device Speech-to-Intent engine. Unlike standard speech-to-text (STT), it directly converts spoken commands into structured intents without sending audio to the cloud. This makes it faster, privacy-friendly, and suitable for offline voice control applications.

How do I add custom commands or intents?

Use the Picovoice Console to create custom context files (.rhn) tailored to your application. You can define multiple intents and expressions, then download the context for use in your C project.

Does Rhino require an internet connection?

Rhino processes voice commands entirely on-device and does not send audio to the cloud. Internet is only required for licensing and usage tracking.

What programming languages can I use with Rhino?

Rhino provides C, C++, Python, JavaScript, Java bindings, and more. Refer to the official Rhino documentation for the full list.