AI Call Screening

Build an On-Device AI Call Screening App

Screen calls in real time using streaming speech recognition, natural language understanding, and AI voice synthesis. Runs entirely on the device, with no audio ever leaving it.

Platforms supported
Android · iOS · Linux · macOS · Windows · Chrome · Edge · Firefox · Safari · Raspberry Pi
How on-device AI call screening is built

Three SDKs. One pipeline.

On-device AI call screening automatically answers incoming calls, transcribes what the caller says in real time, understands their intent, and presents the phone owner with action options — with no audio ever sent to a cloud service. Picovoice's Cheetah Streaming Speech-to-Text, Rhino Speech-to-Intent, and Orca Text-to-Speech compose into a self-contained pipeline. All three run locally on the device.
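The pipeline above can be sketched in Python, assuming the publicly available Picovoice SDKs on PyPI (pvcheetah, pvrhino, pvorca, pvrecorder). The intent names and the action routing here are hypothetical placeholders, and option names may differ slightly across SDK versions; treat this as a shape, not a drop-in implementation.

```python
# Sketch of the on-device screening loop. Engines run locally; no audio leaves
# the device. Intent names and routing below are illustrative only.
import os

def action_for(intent: str) -> str:
    """Pure routing step: map a Rhino intent to a screening action."""
    return {
        "deliveryNotice": "notify",    # package at the door -> alert the owner
        "appointmentCall": "connect",  # known business -> offer to connect
        "salesPitch": "block",         # unsolicited sales -> offer to block
    }.get(intent, "ask_owner")         # anything else -> let the owner decide

def screen_call(access_key: str, context_path: str) -> None:
    # SDK imports are local so the routing logic above works without them.
    import pvcheetah, pvrhino, pvorca
    from pvrecorder import PvRecorder

    cheetah = pvcheetah.create(access_key=access_key)  # streaming STT
    rhino = pvrhino.create(access_key=access_key, context_path=context_path)
    orca = pvorca.create(access_key=access_key)  # speaks the AI response

    # Cheetah and Rhino both consume fixed-size audio frames (512 samples
    # in current SDKs), so one recorder can feed both engines.
    recorder = PvRecorder(frame_length=cheetah.frame_length)
    recorder.start()
    try:
        while True:
            frame = recorder.read()
            partial, _ = cheetah.process(frame)  # live transcript for the owner
            if partial:
                print(partial, end="", flush=True)
            if rhino.process(frame):             # end-to-end intent, no transcript
                inference = rhino.get_inference()
                if inference.is_understood:
                    print("\naction:", action_for(inference.intent))
                    break
    finally:
        recorder.stop()
        for engine in (cheetah, rhino, orca):
            engine.delete()

if __name__ == "__main__" and os.environ.get("PICOVOICE_ACCESS_KEY"):
    screen_call(os.environ["PICOVOICE_ACCESS_KEY"], "call_screening.rhn")
```

The routing table is the only application-specific part; everything else is the generic three-engine loop.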

[Diagram: Picovoice on-device pipeline, no network required. Incoming caller audio feeds Cheetah Streaming STT, whose transcript is shown on screen to the phone owner; Rhino Speech-to-Intent maps the caller's speech to an action (connect, block, etc.) presented to the owner; the selected action triggers Orca Streaming TTS, which speaks the AI response back to the caller, and the loop repeats.]
Why Cheetah Streaming Speech-to-Text?

Lowest latency. Lowest compute. No accuracy tradeoff.

<20ms
first token latency
CPU
no GPU needed
12+
platforms

Cheetah Streaming Speech-to-Text beats Google Cloud STT in both word error rate and word emission latency across all tested languages, and outperforms Azure STT in several benchmarks, per Picovoice's open-source real-time transcription benchmark. That is before any customization for the use case. Cheetah also requires less compute than any other local engine tested. Cheetah Streaming Speech-to-Text →

Word Emission Latency (lower is better)
Azure Real-time: 530 ms
Cheetah Streaming: 590 ms
Moonshine Streaming Medium: 640 ms
Google Streaming: 830 ms
Amazon Streaming: 920 ms
Whisper.cpp Streaming Base: 1,240 ms
Vosk Streaming Large: 2,000 ms

Core Hour Ratio (lower is better)
Cheetah Streaming: 0.083x
Vosk Streaming Large: 0.12x
Whisper.cpp Streaming Base: 1.67x
Moonshine Streaming Medium: 3.36x

English Word Error Rate (lower is better)
Amazon Streaming: 5.6%
Azure Real-time: 8.2%
Cheetah Streaming: 10.1%
Moonshine Streaming Medium: 10.6%
Vosk Streaming Large: 11.5%
Google Streaming: 11.9%
Whisper.cpp Streaming Base: 19.8%

English Punctuation Error Rate (lower is better)
Cheetah Streaming: 16.1%
Azure Real-time: 16.4%
Amazon Streaming: 24.4%
Google Streaming: 36%
Moonshine Streaming Medium: 45.1%
Whisper.cpp Streaming Base: 54.1%
Why Rhino Speech-to-Intent?

End-to-end intent. No transcript. No hallucinations.

6x
Higher accuracy than Big Tech average
97.3%
Accuracy tested across 6 to 24 dB Signal-to-Noise Ratio
Unlimited voice interactions per user

Most voice command systems run a two-step pipeline: speech-to-text converts audio to a transcript, then a separate NLU model parses that transcript for intent. Every step accumulates error and compounds latency. Rhino Speech-to-Intent is an end-to-end engine with a single model that maps spoken audio directly to a structured intent with typed slot values. Higher accuracy even in noisy environments. No hallucinations. No intermediate transcript. Rhino Speech-to-Intent →
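The structured result a Rhino-style engine returns can be illustrated with a small mock. The field names (is_understood, intent, slots) follow the pvrhino Python SDK's inference object; the intent, slot names, and response strings are hypothetical.

```python
# Illustrative shape of an end-to-end intent result and how an app acts on it.
from dataclasses import dataclass, field

@dataclass
class Inference:  # mirrors the fields of pvrhino's inference result
    is_understood: bool
    intent: str = ""
    slots: dict = field(default_factory=dict)

def describe(inference: Inference) -> str:
    """Turn a structured intent into the AI's next spoken line."""
    if not inference.is_understood:
        return "Sorry, could you repeat that?"
    if inference.intent == "leaveMessage":
        return f"Taking a message for {inference.slots.get('recipient', 'the owner')}."
    return f"Handling intent '{inference.intent}'."

print(describe(Inference(True, "leaveMessage", {"recipient": "Dr. Lee"})))
# prints: Taking a message for Dr. Lee.
```

Because the engine emits typed slot values directly, there is no transcript to parse and no second model to misread it.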

Voice Command Acceptance Accuracy (higher is better)
Rhino: 97.3%
Amazon Lex: 84.3%
Google Dialogflow: 77.3%

Acceptance Accuracy at 6 dB SNR (higher is better)
Rhino: 94%
Amazon Lex: 76%
Google Dialogflow: 67%
Why Orca Text-to-Speech?

Natural-sounding TTS at 29 MB peak memory.

29 MB
Peak Memory Usage
130 ms
First-token-to-speech latency
7 MB
Model Size

Most high-quality TTS solutions require hundreds of megabytes of RAM. Orca TTS peaks at 29 MB, 10–50× less than every other on-device alternative tested except ESpeak, which is far from natural-sounding. That makes Orca the only natural-sounding TTS deployable in any environment, including browser tabs, mobile apps with strict out-of-memory limits, and embedded devices. Orca Text-to-Speech →
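Streaming synthesis is what keeps first-token-to-speech latency low: audio starts playing before the full response text exists. A sketch, assuming the pvorca Python SDK (pip install pvorca); the stream_open/synthesize/flush method names follow recent pvorca releases, so verify them against your installed version.

```python
# Streaming TTS sketch: feed text in small chunks, receive PCM incrementally.
import os

def chunk_text(text: str, size: int = 16):
    """Yield text in small chunks, as a script or LLM would produce it."""
    for i in range(0, len(text), size):
        yield text[i:i + size]

def speak(access_key: str, text: str) -> None:
    import pvorca
    orca = pvorca.create(access_key=access_key)
    stream = orca.stream_open()
    try:
        for chunk in chunk_text(text):
            pcm = stream.synthesize(chunk)  # audio may arrive mid-sentence
            if pcm:
                pass  # hand pcm to your audio output device here
        stream.flush()  # drain any buffered audio at the end of the response
    finally:
        stream.close()
        orca.delete()

if os.environ.get("PICOVOICE_ACCESS_KEY"):
    speak(os.environ["PICOVOICE_ACCESS_KEY"],
          "The caller says they are delivering a package.")
```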

TTS Latency (lower is better)
Orca TTS Streaming: 128 ms
ElevenLabs TTS Streaming: 335 ms
ESpeak TTS: 1,430 ms
ElevenLabs TTS: 1,470 ms
Audio Quality
[Interactive audio comparison: listen and compare samples grouped by peak memory usage. Under 30 MB: ESpeak, Orca.]
Built for enterprise applications

From mobile OEMs to embedded hardware

Mobile OEMs & device makers

Ship beyond Pixel and iPhone

Mobile OEMs and device manufacturers can ship the same capabilities as Google Pixel Call Screening and Apple's native call screening, or more advanced ones. Picovoice SDKs run on the end-user device: no backend to operate, no privacy agreements to sign. Just a competitive edge over Big Tech.

Telcos & carriers

Beyond STIR/SHAKEN

On-device AI call screening goes beyond STIR/SHAKEN, which only validates caller ID: it understands what the caller actually says and classifies their intent in real time. Deploy it as a differentiated feature in your dialer app, or as an upgrade to legacy IVR trees that still cost you DTMF licensing fees.

Healthcare, legal, FSI

Compliance without the friction

Healthcare, legal, and FSI teams can improve employee productivity without extra security steps. Keeping caller audio on-device eliminates an entire category of compliance obligation: no BAA for voice data, no processing agreement with a cloud provider, no breach surface through Picovoice systems.

Embedded hardware

Smart intercoms & access control

Smart intercoms and access control panels can screen visitor voice queries directly on embedded devices such as Raspberry Pi. No latency from network hops, no dependency on external uptime, no breach risks that contradict the premise of a physical security application. Full call screening capability with no cloud footprint.

Get started

On-device AI call screen code example

A complete working recipe in Python. Open-source on GitHub. Runs 100% on-device.

recipe · on-device-ai-call-screening
Difficulty
Beginner
Runtime
100% on-device
Language
Python
Platforms supported
Android · iOS · Linux · macOS · Windows · Chrome · Edge · Firefox · Safari · Raspberry Pi

Prerequisites

A Picovoice AccessKey from the Picovoice Console, and a local clone of the GitHub repo.
1

Create a virtual environment

Isolate the recipe's dependencies from your system Python.
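A minimal sketch of this step, assuming Python 3 is on your PATH (the venv directory name .venv is a convention, not a requirement):

```shell
# Create an isolated environment in .venv inside the recipe directory.
python3 -m venv .venv
```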
2

Activate the virtual environment

Activation makes pip install into .venv instead of system Python.
Linux, macOS, or Raspberry Pi
Windows
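The activation commands, per platform. The first line recreates the environment so the snippet stands alone; skip it if you already completed the previous step.

```shell
# Create the environment if it does not exist yet (no-op if it does).
python3 -m venv .venv
# Linux, macOS, or Raspberry Pi:
. .venv/bin/activate
# Windows (cmd):        .venv\Scripts\activate.bat
# Windows (PowerShell): .venv\Scripts\Activate.ps1
```

After activation, `pip` and `python` resolve to the copies inside .venv.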
3

Install dependencies

Pulls in the Cheetah, Rhino, and Orca Python SDKs along with audio I/O.
4

Train the Speech-to-Intent model

Open the Picovoice Console, go to Rhino Speech-to-Intent, create an empty context, and import the Rhino context YAML for this recipe. Download the generated .rhn file for your target platform.
5

Run the AI Call Screening demo

Pass your AccessKey and the path to the .rhn file you just downloaded.
Have questions or looking for implementations in other languages? Visit the GitHub pico-cookbook Call Screen Recipe, where you can find the code and create an issue for demo-related technical questions.
Frequently asked questions

FAQ

What is AI call screening?
AI call screening automatically answers an incoming call, transcribes what the caller says, understands their intent, and presents the recipient with action options — without picking up. On-device AI call screening runs entirely on the device with no audio sent to a cloud service, meaning it works offline and no caller data is ever transmitted.
How does call screening work on non-Pixel Android phones?
Google's Call Screen is exclusive to Pixel devices and depends on Google's cloud. Picovoice provides Android SDKs that any manufacturer or developer can embed. No dependency on Google services. No cloud round-trip.
Can I add call screening to an iOS app?
Yes. Picovoice SDKs support iOS natively. VoIP apps and business phone apps such as Grasshopper, OpenPhone, or Dialpad — and private healthcare communication apps — can use Cheetah Streaming Speech-to-Text, Rhino Speech-to-Intent, and Orca Text-to-Speech to add an on-device call screen even if Apple doesn't share its infrastructure with them.
Does on-device call screening work without an internet connection?
Yes. Cheetah Streaming Speech-to-Text, Rhino Speech-to-Intent, and Orca Text-to-Speech run locally, so call screening works fully offline with no dependency on cloud uptime.
How is the on-device AI call screen app different from Google Pixel Call Screen?
Pixel Call Screen is proprietary, cloud-dependent, and Pixel-only. Picovoice's pipeline runs entirely on-device using licensable SDKs that work on Android, iOS, Linux, Raspberry Pi, and embedded hardware — with no Google dependency.
Can this on-device AI call screen app replace a traditional IVR system?
Yes. Traditional IVR relies on DTMF tones and rigid menu trees. Rhino Speech-to-Intent understands natural spoken phrases and maps them to structured intents without requiring exact phrasing — a conversational IVR replacement that runs on-device with no telephony cloud backend.
How is this on-device AI call screen app different from STIR/SHAKEN?
STIR/SHAKEN authenticates caller ID at the carrier level to reduce spoofing. On-device AI call screening is a complementary application-layer capability that operates after the call connects — understanding what the caller says and inferring their intent. The two approaches address different parts of the spam call problem and can be deployed together.
Does the on-device AI call screen app store or transmit audio anywhere?
No. Caller and phone owner audio is processed in memory on the device and discarded. It is never transmitted to Picovoice or any third-party cloud. Picovoice has no data controller relationship with your end users, which removes cloud voice data compliance obligations, including BAAs under HIPAA.
Can I customize the intents and responses?
Yes. The Rhino context YAML defines all intents and accepted spoken phrases — you can add, remove, or modify it freely in the Picovoice Console. The Orca TTS response text is fully configurable, including custom pronunciation and speech speed control. You can also customize Cheetah Streaming Speech-to-Text to add industry jargon and proper nouns.
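An illustrative fragment of what a Rhino context YAML can look like. The syntax (brackets for alternatives, parentheses for optional words, $slotType:slotName for slots) follows Rhino's exported context format, but the intents, phrases, and slot values here are hypothetical, not the recipe's actual context.

```yaml
context:
  expressions:
    connectCall:
      - "(please) connect [me, the call]"
    blockCaller:
      - "(please) block [this caller, this number]"
    leaveMessage:
      - "[leave, take] a message for $contact:recipient"
  slots:
    contact:
      - "the owner"
      - "the office"
```

Edit the context in the Picovoice Console, retrain, and download a new .rhn file; no application code changes are needed.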
How can I get technical support for the on-device AI Call Screen App demo?
Visit the GitHub pico-cookbook Call Screen Recipe where you can find the open-source demo code and create an issue for demo-related technical questions.