AI Call Assist

On-Device AI Call Assist with Local LLM Reasoning

Build on-device AI call assist that answers calls, identifies who's calling and why using a local LLM, and gives users smart action options — entirely on-device, no call audio leaves the phone. Licensable SDKs for Android, iOS, Linux, and embedded.

Platforms supported
Android · iOS · Linux · macOS · Windows · Chrome · Edge · Firefox · Safari · Raspberry Pi
How AI call assist is built

On-device Voice AI and Language SDKs in a single pipeline

On-device AI call assist combines Cheetah Streaming Speech-to-Text, Orca Streaming Text-to-Speech, picoLLM Inference, and Rhino Speech-to-Intent in a single local pipeline. Orca greets the caller, Cheetah transcribes what they say in real time, and picoLLM analyzes the transcript, declining the call if it looks suspicious. Otherwise, Rhino captures the phone owner's spoken decision as a structured intent and directs Orca to respond accordingly. Most implementations require the user to select a source language upfront or still route at least one stage through a cloud API; this pipeline eliminates both constraints, because every stage runs on the device.

Pipeline: caller audio (input) → Cheetah (streaming STT) → picoLLM (reasoning) → Orca (streaming TTS) → summary & actions (UI). picoLLM and Rhino loop until the caller's identity and reason are clear. 0 ms network round-trip · Audio never leaves the device.
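In Python, the shape of that loop is roughly the following (a minimal sketch using the public pvcheetah, picollm, and pvorca packages; keyword arguments and return shapes can differ between SDK versions, the prompt is illustrative, and the Rhino step for the owner's decision is elided):

```python
import picollm
import pvcheetah
import pvorca
from pvrecorder import PvRecorder

ACCESS_KEY = "..."  # from Picovoice Console

cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)
orca = pvorca.create(access_key=ACCESS_KEY)
pllm = picollm.create(access_key=ACCESS_KEY, model_path="llama-3.2-1b-instruct-385.pllm")

recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

transcript = ""
while True:
    # Cheetah emits words in real time while the caller is still speaking.
    partial, is_endpoint = cheetah.process(recorder.read())
    transcript += partial
    if is_endpoint:
        transcript += cheetah.flush()
        # Local reasoning: neither the audio nor the prompt leaves the device.
        answer = pllm.generate(
            "Identify who is calling and why from: '%s'. "
            "If unclear, reply with one follow-up question." % transcript,
            completion_token_limit=64).completion
        pcm, _ = orca.synthesize(answer)  # play pcm back to the caller via PvSpeaker
        transcript = ""
```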
Why Cheetah Streaming Speech-to-Text?

Lowest latency. Lowest compute. No accuracy tradeoff.

10.1%
WER (English) vs. 11.9% Google and 10.6% Moonshine Medium
0.08
CPU Core-Hour vs. 3.36 Moonshine Medium, 40x less
8.6%
WER (Spanish) vs. 11.6% Google and 9.4% Azure

Cheetah Streaming Speech-to-Text beats Google Cloud STT in word error rate and word emission latency across all tested languages, and outperforms Azure STT in several benchmarks, per the open-source real-time transcription benchmark — even before it's customized for the use case. It emits words at a 590 ms median latency, typically one word behind the speaker. Cheetah requires less compute than any other local engine tested.
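A quick way to see the streaming behavior (a sketch assuming the pvcheetah and pvrecorder packages; the wall-clock timestamps are purely illustrative):

```python
import time

import pvcheetah
from pvrecorder import PvRecorder

cheetah = pvcheetah.create(access_key="...")  # AccessKey from Picovoice Console
recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

t0 = time.monotonic()
while True:
    partial, is_endpoint = cheetah.process(recorder.read())
    if partial:
        # Words stream out while the speaker is still mid-sentence.
        print("[%6.2fs] %s" % (time.monotonic() - t0, partial), flush=True)
    if is_endpoint:
        print("final:", cheetah.flush())
        break
```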

English Word Error Rate
Lower is better
Amazon Streaming: 5.6%
Azure Real-time: 8.2%
Cheetah Streaming: 10.1%
Moonshine Streaming Medium: 10.6%
Vosk Streaming Large: 11.5%
Google Streaming: 11.9%
Whisper.cpp Streaming Base: 19.8%
English Punctuation Error Rate
Lower is better
Cheetah Streaming: 16.1%
Azure Real-time: 16.4%
Amazon Streaming: 24.4%
Google Streaming: 36%
Moonshine Streaming Medium: 45.1%
Whisper.cpp Streaming Base: 54.1%
Why picoLLM?

Local LLM reasoning with no accuracy tradeoff.

99.9%
Accuracy retained at 3-bit vs. 83.1% with GPTQ on Llama-3-8b
94.5%
Accuracy retained at 2-bit vs. 38.7% with GPTQ on Llama-3-8b
Any
Any transformer architecture on any platform

picoLLM Compression quantizes language models so they run on phones, browsers, and embedded boards with no cloud while preserving task accuracy. Its minimal memory footprint allows multiple models to run alongside Cheetah, Rhino, and Orca during a live call without introducing network latency or privacy risk.
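For instance, the reasoning step can be as small as this (a sketch; the prompt and the JSON shape are illustrative, not the recipe's actual prompt):

```python
import json

import picollm

pllm = picollm.create(
    access_key="...",  # AccessKey from Picovoice Console
    model_path="llama-3.2-1b-instruct-385.pllm")  # downloaded from the Console

prompt = (
    'You screen phone calls. Reply with JSON only: '
    '{"caller": str, "reason": str, "suspicious": bool}\n'
    'Caller said: "Hi, this is Sam from Acme Dental about tomorrow\'s appointment."\n'
    'JSON: ')

res = pllm.generate(prompt, completion_token_limit=96)
summary = json.loads(res.completion)  # real code should validate before parsing
if summary["suspicious"]:
    print("decline the call")
```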

3-bit Quantized Llama-3-8b MMLU
Higher is better
Float16 (Original Model)64.9
picoLLM64.8
GPTQ53.9
2-bit Quantized Llama-3-8b MMLU
Higher is better
Float16 (Original Model)64.9
picoLLM61.3
GPTQ25.1
Why Rhino Speech-to-Intent?

End-to-end intent. No transcript. No hallucinations.

6x
Fewer errors than the Big Tech average
97.3%
Accuracy tested across 6 to 24 dB Signal-to-Noise Ratio
Unlimited voice interactions per user

Most voice command systems run a two-step pipeline: speech-to-text converts audio to a transcript, then a separate NLU model parses that transcript for intent. Every step accumulates error and compounds latency. Rhino Speech-to-Intent is an end-to-end engine with a single model that maps spoken audio directly to a structured intent with typed slot values. Higher accuracy even in noisy environments. No hallucinations. No intermediate transcript.
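In the call assist flow, capturing the owner's decision is a handful of lines (a sketch assuming the pvrhino package and this recipe's context; the intent name shown is an example):

```python
import pvrhino
from pvrecorder import PvRecorder

rhino = pvrhino.create(
    access_key="...",  # AccessKey from Picovoice Console
    context_path="call_assist.rhn")  # trained on the Console (see Get started)

recorder = PvRecorder(frame_length=rhino.frame_length)
recorder.start()

# Audio maps straight to a typed intent; no transcript is ever produced.
while not rhino.process(recorder.read()):
    pass

inference = rhino.get_inference()
if inference.is_understood:
    print(inference.intent, inference.slots)  # e.g. "askToText", {}
```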

Voice Command Acceptance Accuracy
Higher is better
Rhino: 97.3%
Amazon Lex: 84.3%
Google Dialogflow: 77.3%
Voice Command Acceptance Accuracy at 21 dB SNR
Higher is better
Rhino: 99%
Amazon Lex: 87%
Google Dialogflow: 83%
Why Orca Streaming Text-to-Speech?

Natural-sounding TTS at 29 MB peak memory.

29 MB
Peak Memory Usage
130 ms
First-token-to-speech latency
7 MB
Model Size

Most high-quality TTS solutions require hundreds of megabytes of RAM. Orca TTS peaks at 29 MB, 10–50× less than any other on-device alternative except ESpeak. That makes Orca the only natural-sounding TTS deployable in any environment, including browser tabs, mobile apps with strict out-of-memory limits, and embedded devices. Orca Streaming TTS is built for real-time LLM applications: it starts speaking a response as soon as the LLM has produced a meaningful word or phrase.
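The hand-off looks like this (a sketch assuming pvorca's stream_open API and picollm's stream_callback; argument names can vary by SDK version):

```python
import picollm
import pvorca
from pvspeaker import PvSpeaker

ACCESS_KEY = "..."  # from Picovoice Console

orca = pvorca.create(access_key=ACCESS_KEY)
pllm = picollm.create(access_key=ACCESS_KEY, model_path="llama-3.2-1b-instruct-385.pllm")

speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)
speaker.start()

stream = orca.stream_open()

def speak(token):
    # Orca buffers tokens and emits PCM as soon as a chunk is speakable.
    pcm = stream.synthesize(token)
    if pcm:
        speaker.write(pcm)

pllm.generate("Greet the caller politely.", stream_callback=speak)

pcm = stream.flush()  # drain any text still buffered in the stream
if pcm:
    speaker.write(pcm)
```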

TTS Latency
Lower is better
Orca TTS Streaming: 128 ms
ElevenLabs TTS Streaming: 335 ms
ESpeak TTS: 1,430 ms
ElevenLabs TTS: 1,470 ms
Audio Quality
Listen-and-compare samples, grouped by peak memory usage: under 30 MB, ESpeak vs. Orca.
Built for enterprise applications

From mobile OEMs to small-business receptionists

Mobile OEMs

Ship beyond Pixel and iPhone

Google Pixel Call Assist and Apple Call Screening are exclusive to Pixel and iPhone hardware. Picovoice SDKs let any OEM ship a comparable or more advanced AI call assist as a built-in dialer feature on Android or as a third-party iOS app. No Google or Apple dependency. No backend to operate.

Telcos & carriers

On-device call assist for carrier dialer apps

STIR/SHAKEN authenticates the caller ID. AI call assist understands what the caller actually says and reasons about it. Carriers can deploy on-device call assist as a differentiated feature in their dialer app — better than any cloud-based answering bot, because it works even when the network is congested.

VoIP & business phone

AI receptionist for VoIP and business phone apps

VoIP and business phone apps can embed an on-device AI receptionist that screens unknown calls, identifies prospects, and presents action options. The same SDKs work on iOS even though Apple does not expose Call Screening to third parties. No cloud API costs or privacy issues.

Healthcare, legal, FSI

Private call screening for HIPAA, GDPR, regulated industries

Cloud AI receptionists like Goodcall, Smith.ai, and Rosie process customer audio through their own servers, requiring business associate and data-processing agreements under HIPAA, GDPR, and similar regulations. Picovoice has no data controller relationship with your callers: caller audio stays on the device. Compliance reviewers approve faster.

Get started

Build an on-device AI call assist in 3 steps: code example

A complete working recipe in Python. Open-source on GitHub. Runs 100% on-device.

recipe · ai-call-assist
Difficulty: Beginner
Runtime: 100% on-device
Language: Python
Platforms supported: Android · iOS · Linux · macOS · Windows · Chrome · Edge · Firefox · Safari · Raspberry Pi

Prerequisites

A Picovoice AccessKey from the Picovoice Console and a local clone of the GitHub repo.

Usage

These instructions assume your current working directory is recipes/call-assist/python.
1. Create a virtual environment

Isolate the recipe's dependencies from your system Python.
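For example, using the standard library venv module:

```console
python3 -m venv .venv
```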
2. Activate the virtual environment

Activation makes pip install into .venv instead of your system Python. The command differs between Linux/macOS/Raspberry Pi and Windows, as shown below.
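```console
# Linux, macOS, or Raspberry Pi
source .venv/bin/activate

# Windows (PowerShell; from cmd use .venv\Scripts\activate.bat)
.venv\Scripts\Activate.ps1
```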
3. Install dependencies

Install the Cheetah, Orca, picoLLM, and Rhino Python SDKs along with audio I/O (PvRecorder and PvSpeaker).
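Assuming the recipe ships a requirements.txt (the SDKs are also published individually on PyPI):

```console
pip install -r requirements.txt
# or install the packages directly:
# pip install pvcheetah pvorca picollm pvrhino pvrecorder pvspeaker
```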
4. Download the on-device LLM

Download llama-3.2-1b-instruct-385.pllm or a similar model from the Picovoice Console. The reasoning loop runs entirely on your machine using picoLLM Inference.
5. Train the Speech-to-Intent model

Open the Picovoice Console, go to Rhino Speech-to-Intent, create an empty context named call assist, and import the Rhino context YAML for this recipe. Download the generated .rhn file for your target platform. Actions in the recipe include: greet, connect call, decline call, ask for details, ask to text, ask to email, ask to call back, and block caller.
6. Run the call assist demo

Pass your AccessKey, the picoLLM model file, and the path to the .rhn file you just downloaded. You can also pass --username to set the recipient name and --username_pronunciation to control how Orca pronounces it.
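The script name and exact flag spellings below are illustrative; check the recipe's README for the real invocation:

```console
python3 call_assist_demo.py \
    --access_key ${YOUR_ACCESS_KEY} \
    --picollm_model_path llama-3.2-1b-instruct-385.pllm \
    --context_path call_assist.rhn \
    --username "Ada" \
    --username_pronunciation "AY-duh"
```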
Have questions or looking for implementations in other languages? Visit the GitHub pico-cookbook Call Assist Recipe, where you can find the code and create an issue for demo-related technical questions.
Frequently asked questions

FAQ

What is AI call assist?
AI call assist automatically answers an incoming call on the user's behalf, transcribes what the caller says, uses a local LLM to identify who is calling and why, asks follow-up questions if information is missing, and finally presents the user with a structured summary and a set of actions — connect, decline, ask to call back, ask to text or email, or block. On-device AI call assist runs entirely on the device with no audio or call content sent to a cloud.
How is call assist different from call screening?
Call screening transcribes a caller's message and shows it to the user. Call assist goes further: it actively reasons about the caller using a local LLM, asks follow-up questions if the caller has not given enough information, and produces a structured summary the user can act on without listening to the full transcript. Both run on-device with Picovoice.
How is on-device AI call assist different from Pixel Call Assist or iPhone Call Screening?
Apple's Call Screening and Google's Call Assist are exclusive to their hardware and tied to their platforms. Picovoice provides licensable SDKs that any OEM, carrier, or app developer can embed into Android, iOS, Linux, or Raspberry Pi builds — with no Google or Apple dependency, no cloud round-trip, and no per-call API charge.
Is the LLM running locally?
Yes. picoLLM Inference runs compressed LLMs such as Llama 3.2 1B directly on-device. No prompt or response is sent to any external server. The same applies to Cheetah Streaming Speech-to-Text, Rhino Speech-to-Intent, and Orca Streaming Text-to-Speech.
Can AI call assist work without an internet connection?
Yes. Cheetah, Orca, Rhino, and picoLLM all run locally, so the entire call assist pipeline processes data offline with no dependency on cloud uptime — useful for areas with weak coverage, in-flight, or any environment where carrier signal is unreliable.
Can I add AI call assist to a VoIP or business phone app?
Yes. Picovoice SDKs support iOS, Android, and major desktop platforms natively. VoIP and business phone apps such as OpenPhone, Dialpad, Grasshopper, Aircall, RingCentral, or healthcare VoIP apps can embed an on-device call assist flow even on iOS where Apple does not expose Call Screening to third parties.
Does on-device AI call assist store or transmit call audio?
No. Caller and user audio is processed in memory on the device and discarded. It is never transmitted to Picovoice or any third-party cloud. Picovoice has no data controller relationship with end users, which removes cloud voice data compliance obligations including BAAs under HIPAA.
Can I customize what the AI says and how it reasons?
Yes. The Orca TTS responses are fully configurable, including custom pronunciation of names, speaking rate, and tone. The picoLLM prompt and reasoning behavior are yours to define. The Rhino context YAML defines all the actions the user can pick: connect, decline, ask to text, ask to email, ask to call back, block, and so on. Cheetah Streaming Speech-to-Text can be customized with industry jargon and proper nouns.
How can I get technical support for the on-device AI Call Assist demo?
Visit the GitHub pico-cookbook Call Assist Recipe, where you can find the open-source demo code and create an issue for demo-related technical questions, or reach out to your Picovoice contact.