
TL;DR: Add voice control to a smart TV with on-device voice AI in Python. Voice searches route through speech-to-intent for instant results, while open-ended requests go to Picovoice's local LLM. All voice models stay on-device, keeping latency low and user data private.

Smart TV voice search works best when responses feel instant. Cloud-based pipelines tend to add network latency at each processing stage, which can make the experience feel slower than expected. On-device voice AI keeps speech processing local to the device, cutting out network round-trips and keeping user data private.

This tutorial covers building a smart TV voice assistant in Python. A custom wake word activates the voice search hands-free, structured content commands route through speech-to-intent for instant catalog lookups, and open-ended requests go to a local LLM for personalized recommendations, all on-device.

What You'll Build:

A smart TV voice assistant that:

  • Activates using two custom wake phrases — one for voice commands (e.g., "Hey TV") and one for personalized recommendations (e.g., "Hey Assistant")
  • Searches the local content catalog instantly for structured queries
  • Routes open-ended requests to a local LLM for intelligent content matching
  • Responds with natural speech synthesis

The voice search system's fully on-device architecture ensures that it:

  • Delivers low-latency responses with all speech processing running locally on the device's hardware.
  • Keeps all user audio and viewing preferences on-device, meeting GDPR and CCPA privacy compliance expectations for in-home devices.

What You'll Need:

  • Python 3.9+
  • Laptop or desktop with a microphone and speakers for testing
  • Picovoice AccessKey from the Picovoice Console

Smart TV Voice Search Architecture

This Python-based voice search system uses an on-device architecture designed for instant content discovery and personalized recommendations:

  1. Always-Listening Activation — The voice search system sits in a low-power, idle state using Porcupine Wake Word to monitor the audio stream for two distinct wake phrases. Detecting "Hey TV" routes to instant content search, while "Hey Assistant" routes to the personalized recommendation assistant. This dual-keyword approach lets viewers choose the right path upfront.

  2. Intent Recognition for Content Search — When "Hey TV" is detected, the audio is analyzed by Rhino Speech-to-Intent. Instead of transcribing words one by one, it maps the speech directly to a structured content query — like "search action movies" or "resume watching." The system queries the local content catalog and returns results immediately without further processing.

  3. Speech-to-Text for Personalized Requests — When "Hey Assistant" is detected, the system routes directly to Cheetah Streaming Speech-to-Text. This engine captures the full detail of open-ended requests that can't be matched to a fixed intent structure.

  4. On-Device Language Model — The transcribed request is passed to picoLLM along with the device's content catalog. The local language model interprets what the viewer is looking for and matches it against available titles, returning structured recommendations without any cloud processing.

  5. Voice Response Generation — Orca Streaming Text-to-Speech converts the response into natural speech, completing the hands-free loop from query to recommendation.

Content Search Workflow: "Hey TV" → Porcupine Wake Word → Rhino Speech-to-Intent → local catalog lookup → Orca Streaming Text-to-Speech

Personalized Recommendation Workflow: "Hey Assistant" → Porcupine Wake Word → Cheetah Streaming Speech-to-Text → picoLLM → Orca Streaming Text-to-Speech

All Picovoice models — Porcupine Wake Word, Rhino Speech-to-Intent, Cheetah Streaming Speech-to-Text, and Orca Streaming Text-to-Speech — support multiple languages including English, Spanish, German and more. Build multilingual voice search to serve international markets by training models in the languages your target regions speak.

Train Custom Wake Words

  1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
  2. Enter your first wake phrase for content commands (e.g., "Hey TV") and test it using the microphone button.
  3. Click "Train," select the target platform, and download the .ppn model file.
  4. Repeat steps 2 & 3 to train an additional wake word for personalized recommendations (e.g., "Hey Assistant").

Porcupine can detect multiple wake words simultaneously. For instance, support both "Hey TV" and "Hey Assistant" for different interaction modes. For tips on designing an effective wake word, review the choosing a wake word guide.

Define Voice Commands for Content Discovery

  1. Create an empty Rhino Speech-to-Intent Context.
  2. Click the "Import YAML" button in the top-right corner of the console and paste the YAML provided below to define intents for structured content search commands.
  3. Test the model with the microphone button and download the .rhn context file for your target platform.

You can refer to the Rhino Syntax Cheat Sheet for more details on building custom contexts.

YAML Context for Smart TV Content Discovery:
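The context below is a starting point; the intent names, expressions, and slot values are illustrative, so adapt them to your catalog:

```yaml
context:
  expressions:
    searchContent:
      - "(search, find, show) [me] [some] $genre:genre (movies, shows, series)"
      - "(search, find) $genre:genre"
    resumePlayback:
      - "resume (watching, playback)"
      - "continue [watching] my (show, movie)"
    topRated:
      - "[show] [me] [the] top rated $genre:genre (movies, shows)"
  slots:
    genre:
      - "action"
      - "comedy"
      - "drama"
      - "documentary"
      - "horror"
      - "thriller"
```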

This context handles the most common structured content search commands. For open-ended requests like "something relaxing to watch tonight" or "a movie similar to what I watched last night," the assistant will use the picoLLM recommendation path.

Set Up Local Large Language Model

  1. Navigate to the picoLLM page in Picovoice Console.
  2. Select a model. This tutorial uses llama-3.2-3b-instruct-505.pllm.
  3. Download the .pllm file and place it in your project directory.

Install Dependencies

Install all required Python SDKs and dependencies using pip:
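The packages below cover each engine used in this tutorial, plus pvrecorder for cross-platform microphone capture:

```bash
pip install pvporcupine pvrhino pvcheetah picollm pvorca pvrecorder
```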

Add Wake Word Detection for Hands-Free Activation

The following code captures audio from your microphone and detects the custom wake word locally:
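A minimal sketch; substitute your AccessKey and the paths to the .ppn files trained earlier:

```python
import pvporcupine
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"  # from Picovoice Console

# Listen for both wake words at once; process() returns the index of the
# detected keyword, or -1 if the frame contains no wake word.
porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=["hey_tv.ppn", "hey_assistant.ppn"])

recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

try:
    while True:
        keyword_index = porcupine.process(recorder.read())
        if keyword_index == 0:
            print("'Hey TV' detected: route to content search")
        elif keyword_index == 1:
            print("'Hey Assistant' detected: route to recommendations")
finally:
    recorder.stop()
    porcupine.delete()
```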

Porcupine Wake Word processes each audio frame on-device with acoustic models optimized for living room environments. By listening for multiple wake words simultaneously, it routes viewers to the right system path instantly — content search or personalized recommendations — without continuous cloud streaming.

Process Content Search Commands

Once the wake word is detected, Rhino Speech-to-Intent listens for structured content queries:
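A sketch of the intent-capture step, assuming the context file from the console is saved as smart_tv.rhn:

```python
import pvrhino
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

rhino = pvrhino.create(access_key=ACCESS_KEY, context_path="smart_tv.rhn")

recorder = PvRecorder(frame_length=rhino.frame_length)
recorder.start()

try:
    # Feed audio frames until Rhino finalizes an inference.
    while not rhino.process(recorder.read()):
        pass
    inference = rhino.get_inference()
    if inference.is_understood:
        print(f"intent: {inference.intent}, slots: {inference.slots}")
    else:
        print("Command not understood.")
finally:
    recorder.stop()
    rhino.delete()
```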

Rhino Speech-to-Intent directly infers intent from speech without requiring a separate transcription step, enabling instant content catalog lookups for structured queries.

Handle Personalized Recommendations with AI

When viewers say "Hey Assistant," the system routes directly to streaming speech-to-text and local LLM for open-ended content discovery:
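A sketch of the recommendation path; the catalog string and prompt format are illustrative placeholders:

```python
import pvcheetah
import picollm
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

# Transcribe the open-ended request until Cheetah detects end of speech.
cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)
recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()

transcript = ""
try:
    while True:
        partial, is_endpoint = cheetah.process(recorder.read())
        transcript += partial
        if is_endpoint:
            transcript += cheetah.flush()
            break
finally:
    recorder.stop()
    cheetah.delete()

# Ask the local LLM to match the request against the catalog.
pllm = picollm.create(
    access_key=ACCESS_KEY,
    model_path="llama-3.2-3b-instruct-505.pllm")

catalog = "Skyline Pursuit (action), Quiet Shores (documentary), Laugh Track (comedy)"
prompt = (
    f"You are a smart TV assistant. Available titles: {catalog}.\n"
    f"Viewer request: {transcript}\n"
    "Recommend the best matching titles and briefly explain why.")
print(pllm.generate(prompt, completion_token_limit=256).completion)
pllm.release()
```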

This approach uses Cheetah Streaming Speech-to-Text to capture the viewer's open-ended request, then picoLLM to match it against the local content catalog and generate structured recommendations — all without leaving the device.

Add Voice Response Generation for Smart TV

Transform text responses into natural speech for TV playback:
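A sketch using Orca's streaming API; here the audio lands in a WAV file, while a real TV would hand the PCM straight to its audio output:

```python
import struct
import wave

import pvorca

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

orca = pvorca.create(access_key=ACCESS_KEY)
stream = orca.stream_open()

# Feed text incrementally; Orca starts returning PCM before the full
# response text is available, which keeps perceived latency low.
pcm = []
for text_chunk in ["Here are the top ", "action movies ", "in your catalog."]:
    chunk = stream.synthesize(text_chunk)
    if chunk is not None:
        pcm.extend(chunk)
chunk = stream.flush()
if chunk is not None:
    pcm.extend(chunk)
stream.close()

with wave.open("response.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # 16-bit samples
    f.setframerate(orca.sample_rate)
    f.writeframes(struct.pack(f"{len(pcm)}h", *pcm))

orca.delete()
```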

Orca Streaming Text-to-Speech generates natural voice responses with first audio output in under 130ms, providing immediate verbal feedback when a viewer speaks a command.

Route Content Search Commands to Local Catalog

Map structured intents to content catalog queries and format results for voice delivery:
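A sketch with a hypothetical in-memory catalog; the intent names match the YAML context above, and a production TV would query its real content service instead:

```python
CATALOG = [
    {"title": "Skyline Pursuit", "genre": "action", "rating": 8.1, "in_progress": False},
    {"title": "Quiet Shores", "genre": "documentary", "rating": 7.9, "in_progress": True},
    {"title": "Laugh Track", "genre": "comedy", "rating": 7.2, "in_progress": False},
]

def handle_inference(inference):
    """Turn a Rhino inference into a spoken response string."""
    if inference.intent == "searchContent":
        genre = inference.slots.get("genre", "")
        titles = [c["title"] for c in CATALOG if c["genre"] == genre]
        if titles:
            return f"I found {len(titles)} {genre} titles: " + ", ".join(titles)
        return f"Sorry, I couldn't find any {genre} titles."
    if inference.intent == "resumePlayback":
        for c in CATALOG:
            if c["in_progress"]:
                return f"Resuming {c['title']}."
        return "Nothing is in progress right now."
    if inference.intent == "topRated":
        genre = inference.slots.get("genre", "")
        ranked = sorted(
            (c for c in CATALOG if c["genre"] == genre),
            key=lambda c: c["rating"],
            reverse=True)
        if ranked:
            return f"The top rated {genre} title is {ranked[0]['title']}."
        return f"No {genre} titles found."
    return "Sorry, I didn't catch that."
```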

This implementation combines all components for a smart TV voice search system:
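A condensed main loop, assuming the model files from earlier steps and the handle_inference() helper above; error handling and resource cleanup are omitted for brevity:

```python
import pvporcupine
import pvrhino
import pvcheetah
import picollm
import pvorca
from pvrecorder import PvRecorder

ACCESS_KEY = "${YOUR_ACCESS_KEY}"

porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=["hey_tv.ppn", "hey_assistant.ppn"])
rhino = pvrhino.create(access_key=ACCESS_KEY, context_path="smart_tv.rhn")
cheetah = pvcheetah.create(access_key=ACCESS_KEY, endpoint_duration_sec=1.0)
pllm = picollm.create(
    access_key=ACCESS_KEY,
    model_path="llama-3.2-3b-instruct-505.pllm")
orca = pvorca.create(access_key=ACCESS_KEY)

# Porcupine, Rhino, and Cheetah all consume 16 kHz, 512-sample frames,
# so a single recorder can feed whichever engine is active.
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()

def recommend(transcript):
    prompt = (
        "You are a smart TV assistant. Recommend titles from the catalog "
        f"for this request: {transcript}")
    return pllm.generate(prompt, completion_token_limit=256).completion

def speak(text):
    # Single-shot synthesis for brevity; see the streaming example above.
    orca.synthesize_to_file(text, "response.wav")

while True:
    keyword_index = porcupine.process(recorder.read())
    if keyword_index == 0:  # "Hey TV": structured content search
        while not rhino.process(recorder.read()):
            pass
        response = handle_inference(rhino.get_inference())
    elif keyword_index == 1:  # "Hey Assistant": open-ended recommendation
        transcript = ""
        while True:
            partial, is_endpoint = cheetah.process(recorder.read())
            transcript += partial
            if is_endpoint:
                transcript += cheetah.flush()
                break
        response = recommend(transcript)
    else:
        continue
    speak(response)
```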

Run the Smart TV Voice Assistant

To run the voice search system, update the model paths to match your local files and have your Picovoice AccessKey ready:
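Assuming the combined script above is saved as smart_tv_assistant.py:

```bash
python smart_tv_assistant.py
```

Then say "Hey TV" followed by a content command, or "Hey Assistant" followed by an open-ended request.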

Example Interactions

The exchanges below are illustrative and use the sample catalog from earlier; actual responses depend on your content library.

Content Search:
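  Viewer: "Hey TV, search action movies."
  TV: "I found one action title: Skyline Pursuit."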

Resume Playback:
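  Viewer: "Hey TV, resume watching."
  TV: "Resuming Quiet Shores."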

Personalized Recommendation:
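  Viewer: "Hey Assistant, something relaxing to watch tonight."
  TV: "Based on your catalog, you might enjoy Quiet Shores, a calm nature documentary."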

You can start building your own commercial or non-commercial projects using Picovoice's self-service Console.


Frequently Asked Questions

Will the voice search work accurately in noisy environments, with different accents, or with varied content titles?
Yes. Porcupine Wake Word, Rhino Speech-to-Intent, and Cheetah Streaming Speech-to-Text are designed to work reliably with background noise and various accents across supported languages.
Can I use different wake words instead of 'Hey TV' and 'Hey Assistant'?
Yes. Train any custom wake phrases using Picovoice Console in seconds without collecting training data. Simply enter your desired phrases and download the trained models. Porcupine detects multiple wake words simultaneously with no added runtime footprint, so both activation paths stay responsive. The wake word guide covers best practices for choosing effective wake phrases.
When should I use Rhino Speech-to-Intent versus picoLLM for content queries?
Use Rhino Speech-to-Intent for structured, predictable content searches like genre filters, resume commands, and top-rated lists. Use picoLLM for open-ended requests where viewers might phrase things in unpredictable ways. The dual wake word architecture lets viewers choose the appropriate path upfront — "Hey TV" for direct searches and "Hey Assistant" for personalized recommendations.