
TLDR: Build a smart IVR system for AI call center automation in Python using on-device voice AI. This tutorial shows how to implement wake word detection, intent recognition, local LLM reasoning, and low-latency conversational IVR without cloud dependencies.

Contact center AI depends on fast, accurate voice processing to handle customer queries effectively. On-device processing eliminates network latency entirely, enabling conversational IVR that responds faster than cloud-based alternatives.

Traditional call center IVR (interactive voice response) systems frustrate customers with rigid menu trees and slow cloud processing. Callers navigate numbered options, repeat information multiple times, and wait through network round-trips that add 1-2 seconds of latency per interaction. Cloud-based voice APIs compound this latency: speech recognition takes 920ms with Amazon Transcribe Streaming, and text-to-speech adds another 340ms with services like ElevenLabs. Classifying intent from transcribed text also increases processing time while reducing accuracy compared to inferring intent directly from speech.

This tutorial shows how to build a Python IVR system for an AI call center that routes customer service queries between intent recognition and LLM reasoning. The implementation uses Porcupine Wake Word for voice activation, Rhino Speech-to-Intent for intent recognition, Cheetah Streaming Speech-to-Text and picoLLM for complex user queries, and Orca Streaming Text-to-Speech for natural responses.

What You'll Build:

A conversational IVR system that:

  • Activates with a custom wake word
  • Handles common queries instantly using speech-to-intent recognition
  • Routes unrecognized queries to a local language model for reasoning
  • Responds with natural speech synthesis

What You'll Need:

  • Python 3.9+
  • A desktop or laptop with microphone and speakers
  • Picovoice AccessKey from the Picovoice Console

To learn more about the advantages and challenges of voice AI agents in customer service, see: Voice AI Agents in Customer Service.

Smart IVR Architecture: Intelligent Call Routing with Speech Recognition

The smart IVR system uses a two-tier approach to handle customer queries efficiently:

Intent Recognition: When a customer speaks, Rhino Speech-to-Intent processes the audio directly. If the customer service voicebot recognizes a known intent with required parameters (e.g., "check order status for order 12345"), it responds immediately. This handles the majority of routine customer service queries with minimal latency.

LLM Reasoning: If Rhino returns is_understood=False for ambiguous or complex queries (e.g., "why was I charged twice when I cancelled my order?"), the system prompts the customer to provide more details, then uses Cheetah Streaming Speech-to-Text to transcribe the explanation and routes it to picoLLM for intelligent reasoning.

This AI IVR architecture optimizes for common cases while handling edge cases flexibly.

Train a Wake Word for Voice Activation

  1. Navigate to the Porcupine page in Picovoice Console.
  2. Enter your wake phrase (e.g., "Hey Assistant") and test it.
  3. Click "Train", select your platform, and download the .ppn model file.

Create Custom Voice Commands for Customer Service Automation

Rhino requires a context file that defines the specific intents the smart IVR will handle. A context specifies the phrases customers might say and what structured data to extract.

  1. Sign up for a Picovoice Console account and navigate to the Rhino page.
  2. Click "Create New Context" and name it CustomerService.
  3. Click the "Import YAML" button in the top-right corner and paste your context definition (a minimal example follows this list).
  4. Test the context in the browser using the microphone button.
  5. Download the .rhn context file for your target platform.
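The snippet below is an illustrative starting point, not an official Picovoice context: the intent names (checkOrderStatus, businessHours, speakToHuman), the phrasings, and the use of the pv.TwoDigitInteger built-in slot are assumptions to adapt to your own use case. Square brackets mark optional words and parentheses mark alternatives:

```yaml
context:
  expressions:
    checkOrderStatus:
      - "check [the] status (of, for) [my] order $pv.TwoDigitInteger:orderNumber"
      - "where is [my] order $pv.TwoDigitInteger:orderNumber"
    businessHours:
      - "what are (your, the) [business] hours"
      - "when (are you, do you) open"
    speakToHuman:
      - "[I] (want, need) to (speak, talk) (to, with) (an agent, a human, a representative)"
      - "(connect, transfer) me to (an agent, a human, a representative)"
```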

For production-ready customer service voicebots, expand the context to cover 10-15 common intents. Rhino's expression syntax supports optional phrases, synonyms, and slot types like numbers and dates. See the Rhino Expression Syntax Cheat Sheet for details.

Set up a Local LLM

picoLLM runs compressed language models efficiently on-device for call center automation. Download a model from the picoLLM Console:

  1. Sign in to Picovoice Console and navigate to picoLLM.
  2. Select a model. This tutorial uses llama-3.2-3b-instruct-505.pllm.
  3. Click "Download" and place it in your project directory.

Set Up the Python Environment

Install all required SDKs:
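Each engine used in this tutorial ships as its own PyPI package:

```
pip3 install pvporcupine pvrhino pvcheetah picollm pvorca pvrecorder pvspeaker
```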

Implement Wake Word Detection

The smart IVR begins by listening for the wake word before processing customer queries:
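A minimal sketch using the Porcupine and PvRecorder Python SDKs. The `ACCESS_KEY` placeholder and the `hey-assistant.ppn` path are stand-ins for your own key and the wake word model trained above:

```python
import pvporcupine
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"  # from Picovoice Console

porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=["hey-assistant.ppn"],  # path to the trained .ppn file
)

recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

print("Listening for wake word...")
try:
    while True:
        pcm = recorder.read()
        if porcupine.process(pcm) >= 0:  # returns keyword index, -1 if none
            print("Wake word detected")
            break
finally:
    recorder.stop()
```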

Implement Real-Time Speech-to-Text and Intent Recognition

After wake word detection, the conversational IVR captures audio and processes it through Rhino Speech-to-Intent to detect known intents:
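A sketch of the Rhino processing loop, reusing `recorder` and `ACCESS_KEY` from the previous step (Porcupine and Rhino consume audio frames of the same length, so one recorder serves both). The `customer-service.rhn` path is a placeholder for the context file downloaded earlier:

```python
import pvrhino

rhino = pvrhino.create(
    access_key=ACCESS_KEY,
    context_path="customer-service.rhn",
)

recorder.start()
print("Listening for a command...")
while True:
    pcm = recorder.read()
    if rhino.process(pcm):  # returns True once the utterance is finalized
        inference = rhino.get_inference()
        break
recorder.stop()

if inference.is_understood:
    print(f"Intent: {inference.intent}, slots: {inference.slots}")
else:
    print("Not understood; routing to the LLM path")
```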

Build Intelligent Call Routing Logic for AI IVR Systems

The intelligent call routing logic determines whether to handle the query with intent recognition or route to picoLLM for reasoning:
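One way to express the two-tier routing. The helpers `speak()`, `transcribe_followup()`, and `handle_with_llm()` are defined in the sections below, and the mock responses in `handle_structured_intent()` are placeholders (see "Database Integration" for replacing them with real lookups):

```python
def route_query(inference):
    """Route a finalized Rhino inference to the right handler."""
    if inference.is_understood:
        # Tier 1: known intent -> answer immediately from structured data.
        response = handle_structured_intent(inference.intent, inference.slots)
    else:
        # Tier 2: unknown intent -> ask for details, then reason with picoLLM.
        speak("I didn't catch that. Could you describe your issue in more detail?")
        transcript = transcribe_followup()  # Cheetah, defined below
        response = handle_with_llm(transcript)
    speak(response)


def handle_structured_intent(intent, slots):
    # Mock responses keyed on the example context's intents.
    if intent == "checkOrderStatus":
        order = slots.get("orderNumber", "your order")
        return f"Order {order} is out for delivery and should arrive tomorrow."
    if intent == "businessHours":
        return "We are open Monday through Friday, nine to five."
    if intent == "speakToHuman":
        return "Transferring you to the next available agent."
    return "Sorry, I can't help with that yet."
```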

Handle Complex Queries with Speech-to-Text

When Rhino Speech-to-Intent doesn't recognize an intent, prompt the customer for more details and use Cheetah Streaming Speech-to-Text to transcribe their explanation:
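A sketch of the follow-up transcription with Cheetah, reusing the shared recorder; `endpoint_duration_sec` controls how much trailing silence ends the capture:

```python
import pvcheetah

cheetah = pvcheetah.create(
    access_key=ACCESS_KEY,
    endpoint_duration_sec=1.0,  # stop after ~1 second of silence
)


def transcribe_followup():
    """Record until the caller pauses, then return the full transcript."""
    recorder.start()
    transcript = ""
    while True:
        partial, is_endpoint = cheetah.process(recorder.read())
        transcript += partial
        if is_endpoint:
            transcript += cheetah.flush()  # drain any buffered text
            break
    recorder.stop()
    return transcript
```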

Add Local LLM Reasoning for Complex Queries

When Rhino Speech-to-Intent cannot extract a structured intent, picoLLM provides intelligent reasoning while maintaining on-device processing:
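A sketch of the LLM fallback. The system prompt and the 128-token completion limit are illustrative choices, and the model path assumes the file downloaded earlier:

```python
import picollm

pllm = picollm.create(
    access_key=ACCESS_KEY,
    model_path="llama-3.2-3b-instruct-505.pllm",
)

SYSTEM_PROMPT = (
    "You are a concise customer service assistant. "
    "If you cannot answer, offer to transfer the caller to a human agent."
)


def handle_with_llm(transcript):
    """Generate a short answer for a query Rhino could not classify."""
    dialog = pllm.get_dialog(system=SYSTEM_PROMPT)
    dialog.add_human_request(transcript)
    result = pllm.generate(
        prompt=dialog.prompt(),
        completion_token_limit=128,  # keep spoken answers short
    )
    dialog.add_llm_response(result.completion)
    return result.completion
```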

Add Text-to-Speech for Conversational IVR

The conversational IVR converts text responses into natural speech using Orca:
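A sketch of single-shot synthesis played through PvSpeaker. Orca also offers a streaming API that can begin speaking while picoLLM is still generating, which is the lower-latency option for production:

```python
import pvorca
from pvspeaker import PvSpeaker

orca = pvorca.create(access_key=ACCESS_KEY)

speaker = PvSpeaker(
    sample_rate=orca.sample_rate,
    bits_per_sample=16,
)


def speak(text):
    """Synthesize text with Orca and play it through the default speaker."""
    pcm, _alignments = orca.synthesize(text)
    speaker.start()
    speaker.write(pcm)
    speaker.flush()  # block until playback finishes
    speaker.stop()
```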

Complete Python Code for Call Center Automation

This complete implementation combines all components into a smart IVR for call center automation:
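The sketch below stitches the previous pieces into a single loop. It assumes the engine handles and helper functions defined above, so treat it as a starting point rather than production code:

```python
def main():
    try:
        while True:
            # 1. Block until the wake word is heard.
            recorder.start()
            print("Listening for wake word...")
            while porcupine.process(recorder.read()) < 0:
                pass
            recorder.stop()
            speak("Hi! How can I help you today?")

            # 2. Capture the query and try speech-to-intent first.
            recorder.start()
            while not rhino.process(recorder.read()):
                pass
            inference = rhino.get_inference()
            recorder.stop()

            # 3. Answer from the context or fall back to picoLLM.
            route_query(inference)
    except KeyboardInterrupt:
        print("Shutting down...")
    finally:
        recorder.delete()
        speaker.delete()
        porcupine.delete()
        rhino.delete()
        cheetah.delete()
        orca.delete()
        pllm.release()


if __name__ == "__main__":
    main()
```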

Run the Smart IVR System

To run the Smart IVR system in Python, update the model paths to match your local files and have your Picovoice AccessKey ready:
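Assuming the combined script is saved as smart_ivr.py (a name chosen here for illustration):

```
python3 smart_ivr.py
```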

The customer service voicebot will listen for your wake word, process customer queries with intelligent call routing, and respond with natural speech.

Extending the AI Customer Service Voicebot

Connect to Phone Systems:

  • Integrate with VoIP platforms like Twilio or Asterisk.
  • Route PvSpeaker output back to the phone line for two-way conversations.

Add Multilingual Support:

  • Create Speech-to-Intent contexts for multiple languages and use Porcupine's multilingual wake word detection to automatically route to the appropriate language context.
  • Orca Streaming Text-to-Speech also supports multiple languages for voice responses.

Database Integration:

  • Replace the mock responses in handle_structured_intent() with actual database queries to retrieve real customer data, order statuses, and account information.
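For example, with a hypothetical SQLite table orders(order_number, status, eta), the order status branch might become:

```python
import sqlite3


def lookup_order_status(order_number):
    """Fetch a real order status instead of returning a mock response."""
    conn = sqlite3.connect("orders.db")
    try:
        row = conn.execute(
            "SELECT status, eta FROM orders WHERE order_number = ?",
            (order_number,),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return f"I couldn't find order {order_number}."
    status, eta = row
    return f"Order {order_number} is {status}, expected {eta}."
```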

Conversation Analytics:

  • Log all transcripts, detected intents, and LLM responses to track common queries, measure resolution rates, and identify areas where the context needs expansion or LLM responses need refinement.

Human Handoff:

  • Implement a queue system for the speakToHuman intent that connects to your existing call center software or creates tickets for callback scheduling.

You can start building your own commercial or non-commercial call center automation projects using Picovoice's self-service Console.


Frequently Asked Questions

What happens when picoLLM can't answer a question?
Design your system prompt to instruct picoLLM to acknowledge its limitations and offer human escalation. You can also implement confidence thresholding on LLM responses or pattern matching to detect phrases like "I don't know" and automatically route to the speakToHuman intent.
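A simple version of that pattern check might look like the following; the marker phrases are examples to tune against your model's actual fallback wording:

```python
FALLBACK_MARKERS = ("i don't know", "i'm not sure", "cannot help")


def needs_human(llm_response):
    """Escalate when the LLM signals it cannot answer."""
    text = llm_response.lower()
    return any(marker in text for marker in FALLBACK_MARKERS)


# Usage:
# if needs_human(response):
#     response = handle_structured_intent("speakToHuman", {})
```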
How do I customize the voice and speaking style for my conversational IVR?
Orca Text-to-Speech supports multiple voices with different characteristics. You can adjust the speech rate and select voices optimized for voice customer service interactions.