Rhino Speech-to-Intent

Build voice user interfaces with on-device intent detection from speech

The Rhino Speech-to-Intent engine fuses Natural Language Understanding with Speech-to-Text, outperforming Google Dialogflow, Amazon Lex, and IBM Watson.

What is Rhino Speech-to-Intent?

Rhino Speech-to-Intent infers user intents from utterances, allowing users to interact with applications via voice. Rhino Speech-to-Intent understands complex voice commands, such as "find the maintenance checklist for Boeing 707" or "call 987 655 4433".
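
For a sense of what "inferring an intent" produces, the result for a command like "call 987 655 4433" can be pictured as an intent name plus slot values. The names below (makeCall, phoneNumber) are hypothetical; they depend entirely on how the context is designed:

# Hypothetical example: the shape of the result for "call 987 655 4433".
# The intent name (makeCall) and slot name (phoneNumber) are not fixed by
# Rhino; they are whatever the context designer chose.
inference = {
    "is_understood": True,
    "intent": "makeCall",
    "slots": {"phoneNumber": "987 655 4433"},
}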

Get started with just a few lines of code

Python

o = pvrhino.create(
    access_key,
    context_path)

while not o.process(audio()):
    pass

inference = o.get_inference()

JavaScript

let o = new Rhino(
    accessKey,
    contextPath);

while (!o.process(audio())) { }

let inference = o.getInference();

Android (Java)

RhinoManagerCallback callback =
    new RhinoManagerCallback() {
        @Override
        public void invoke(RhinoInference inference) {
            // Inference callback
        }
    };

RhinoManager o = new RhinoManager.Builder()
    .setAccessKey(accessKey)
    .setContextPath(contextPath)
    .build(appContext, callback);

o.start();

iOS (Swift)

let o = RhinoManager(
    accessKey: accessKey,
    contextPath: contextPath,
    onInferenceCallback: { inference in
        // Inference callback
    })

try o.start()

React

const {
  inference,
  contextInfo,
  isLoaded,
  isListening,
  error,
  init,
  process,
  release,
} = useRhino();

await init(
  accessKey,
  context,
  model
);

await process();

useEffect(() => {
  if (inference !== null) {
    // Handle inference
  }
}, [inference]);

Flutter (Dart)

RhinoManager o = await RhinoManager.create(
    accessKey,
    contextPath,
    (inference) {
      // Inference callback
    });

await o.process();

React Native (JavaScript)

let o = await RhinoManager.create(
    accessKey,
    contextPath,
    (inference) => {
      // Inference callback
    });

await o.process();

C# (RhinoManager)

RhinoManager o = RhinoManager.Create(
    accessKey,
    contextPath,
    (inference) => {
        // Inference callback
    });

o.Start();

C# (Rhino)

Rhino o = Rhino.Create(
    accessKey,
    contextPath);

while (!o.Process(AudioFrame())) { }

Inference inference = o.GetInference();

Java

Rhino o = new Rhino.Builder()
    .setAccessKey(accessKey)
    .setContextPath(contextPath)
    .build();

while (!o.process(audioFrame())) { }

RhinoInference inference = o.getInference();

C

pv_rhino_init(
    access_key,
    model_path,
    context_path,
    sensitivity,
    require_endpoint,
    &rhino);

while (true) {
    pv_rhino_process(
        rhino,
        audio_frame(),
        &is_finalized);

    if (is_finalized) {
        pv_rhino_get_intent(
            rhino,
            &intent,
            &num_slots,
            &slots,
            &values);
    }
}

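The snippets above leave the audio source abstract (audio(), audioFrame(), AudioFrame(), audio_frame()). A minimal end-to-end sketch in Python, assuming the pvrecorder package for microphone capture and the same placeholder access_key and context_path variables as above:

import pvrhino
from pvrecorder import PvRecorder

# access_key and context_path are placeholders, as in the snippets above.
rhino = pvrhino.create(access_key=access_key, context_path=context_path)

# Rhino expects single-channel, 16-bit, 16 kHz audio in frames of
# rhino.frame_length samples; PvRecorder delivers exactly that.
recorder = PvRecorder(frame_length=rhino.frame_length, device_index=-1)
recorder.start()

try:
    while not rhino.process(recorder.read()):
        pass  # keep feeding frames until the end of the utterance is detected
    inference = rhino.get_inference()
    if inference.is_understood:
        print(inference.intent, inference.slots)
finally:
    recorder.stop()
    recorder.delete()
    rhino.delete()
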
Why Rhino Speech-to-Intent is the best choice for domain-specific voice assistants and AI agents

Traditional voice systems use a two-step process: Speech-to-Text (convert voice to text) and Natural Language Understanding (analyze text for intent).

Rhino Speech-to-Intent uses an innovative single-step approach, Direct Speech-to-Intent, which fuses speech recognition and intent detection into one optimized process. This unified approach eliminates the accuracy loss and latency issues common in conventional two-step systems.
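
As a rough sketch of the difference (the function names below are placeholders, not actual APIs):

# Conventional two-step pipeline (placeholder functions): errors in the
# transcript propagate into intent detection, and each stage adds latency.
def two_step(audio):
    text = speech_to_text(audio)   # step 1: transcribe the audio
    return text_to_intent(text)    # step 2: classify the transcript

# Single-step approach: one jointly trained model maps audio straight to
# an intent and its slot values, with no intermediate text.
def single_step(audio):
    return speech_to_intent(audio)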

Why choose Rhino Speech-to-Intent over other Intent Detection Engines?

Get started with Rhino Speech-to-Intent
The best way to see how Rhino Speech-to-Intent differs from other natural language understanding solutions is to try it!
Start Free
  • Custom Voice Commands
  • Platform-optimized model training
  • Intuitive SDKs
  • Unlimited interactions per user
  • English, French, German, Italian, Japanese, Korean, Chinese (Mandarin), Portuguese, and Spanish

Frequently asked questions

What are the use cases and applications of Natural Language Understanding?
Enterprise Voice Automation
  • Manufacturing floor voice commands
  • Warehouse management systems
  • Quality control voice interfaces
  • Industrial equipment control
Healthcare Applications
  • Medical device voice control
  • Patient data entry systems
  • Hands-free clinical workflows
  • HIPAA-compliant voice interfaces
Smart Home and IoT
  • Voice-controlled appliances
  • Home automation systems
  • IoT device integration
  • Custom voice assistants and AI agents
Automotive and Transportation
  • In-vehicle voice commands
  • Fleet management systems
  • Navigation voice control
  • Driver assistance interfaces
What is Natural Language Understanding (NLU)?

Natural Language Understanding deals with meaning, i.e., comprehending users' intent. Researchers initially focused on understanding user intents from text. While spoken language understanding is the more specific term for understanding user intent from speech, many people in both industry and research still use natural language understanding for both text and speech data. This is mainly due to the conventional approach of running Speech-to-Text and Natural Language Understanding engines in sequence.

What is intent detection?

Intent detection is a subtask of natural language processing and a critical component of any task-oriented system. Natural language understanding solutions match a user's utterance with one of the predefined classes by understanding the user's goal (i.e., intention). After matching an utterance with an intent, the software can initiate a task to achieve the user's goal. For example, users with the intention to turn the lights off may say "Turn the lights off.", "Switch off the lights.", or "Can you please turn the lights off?". Intent detection captures the user's goal, "change the state of the lights from on to off", despite the different ways of communicating it.
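
With the Python SDK from the quick start above, all three phrasings would surface as the same intent. The intent name changeLightState, the slot name state, and the next_audio_frame() helper below are assumptions for illustration, not part of the SDK:

import pvrhino

# Assumed: a context that defines a "changeLightState" intent with a "state" slot.
rhino = pvrhino.create(access_key=access_key, context_path=context_path)

while not rhino.process(next_audio_frame()):  # next_audio_frame() is a placeholder
    pass

inference = rhino.get_inference()
if inference.is_understood and inference.intent == "changeLightState":
    # Any of the phrasings above resolves to the same intent and slot value.
    turn_lights_on = inference.slots["state"] == "on"  # False for "off"
else:
    print("Utterance not covered by this context.")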

Can I use Rhino Speech-to-Intent to overcome the limitations of Amazon Lex and Google Dialogflow?

Rhino Speech-to-Intent is a more accurate, resource-efficient, and faster alternative to Amazon Lex, Google Dialogflow, and other NLU engines for use-case-specific intent detection. Picovoice's Free Trial allows enterprises to evaluate Rhino Speech-to-Intent and compare it with the alternatives. However, if you're still not sure how to overcome the limitations of Amazon Lex, Google Dialogflow, and other NLU engines with Rhino Speech-to-Intent, or need help with migration, leverage Picovoice's Consulting Services!

How does Rhino Speech-to-Intent differ from Natural Language Understanding (NLU) solutions such as Amazon Lex, Google Dialogflow, IBM Watson Natural Language Understanding, or Microsoft LUIS?

Rhino Speech-to-Intent, as the name suggests, converts speech directly into intent, eliminating the need for an intermediate text representation. Rhino Speech-to-Intent uses a modern end-to-end approach to infer intents and intent details directly from spoken commands. This enables developers to train jointly optimized automatic speech recognition (ASR) and natural language understanding (NLU) engines tailored to their specific domain, achieving higher accuracy.

Rhino Speech-to-Intent excels in use-case-specific applications, such as voice-enabled coffee machines or surgical robots, that involve a limited set of commands, offering high accuracy with minimal resources. In contrast, open-domain applications like a voice-enabled ChatGPT must handle a wide range of topics and phrasings; for those, we recommend Cheetah Streaming Speech-to-Text and picoLLM.

How do I learn more about the terminology used for Natural Language Understanding (NLU) Engines?

Intents, expressions, and slots are commonly used in conversational AI and across various engines such as Amazon Lex, IBM Watson, Google Dialogflow, or Rasa NLU. They’re used to build voice assistants or bots. You can check out the Rhino Speech-to-Intent Syntax Cheat Sheet to start building or the Picovoice Glossary to learn the terminology.
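
As a conceptual illustration only, not the actual Rhino context syntax (see the Syntax Cheat Sheet for that), the three terms relate roughly as follows:

# Conceptual picture only; the real Rhino context format is documented in
# the Rhino Speech-to-Intent Syntax Cheat Sheet.
context = {
    "intents": {
        "changeLightState": {
            # Expressions: spoken phrasings that map to this intent.
            "expressions": [
                "turn the lights $state",
                "switch $state the lights",
            ],
        },
    },
    # Slots: the variable parts of an expression and their allowed values.
    "slots": {
        "state": ["on", "off"],
    },
}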

Does Rhino Speech-to-Intent process voice data locally on the device?

Yes. Rhino Speech-to-Intent processes all voice data locally on the device; audio never leaves the device.

Which platforms does Rhino Speech-to-Intent support?
  1. Single Board Computers: Raspberry Pi
  2. Desktop and Servers: Linux, macOS, and Windows
  3. Mobile Devices: Android and iOS
  4. Web Browsers: Chrome, Safari, Edge, and Firefox
  5. Microcontrollers: Arm Cortex-M, STM32, and Arduino
How do I get technical support for Rhino Speech-to-Intent?

The Picovoice docs, blog, Medium posts, and GitHub repositories are great resources for learning about voice AI, Picovoice technology, and how to start building with voice commands. Enterprise customers get dedicated support specific to their applications from the Picovoice Product & Engineering teams. Existing customers can reach out to their dedicated contacts, and prospects can purchase Enterprise Support before committing to any paid plan.

Which languages does Rhino Speech-to-Intent support?

Rhino Speech-to-Intent supports English, French, German, Italian, Japanese, Korean, Chinese (Mandarin), Portuguese, and Spanish.

What should I do if I need support for other languages?

Reach out to the Picovoice Consulting team to have a custom language model trained for your use case.

How can I get informed about updates and upgrades?

Version changes are announced on LinkedIn, and subscribing to the GitHub repository is the best way to get notified of patch releases. If you enjoy building with Rhino Speech-to-Intent, show your support by giving it a star on GitHub!