Build a Voice-Powered Marketing & Customer Feedback Survey for Web

TL;DR: Build a voice survey for the web that captures Net Promoter Scores (NPS), multiple-choice selections, and open-ended customer feedback. This tutorial uses client-side spoken language understanding and streaming speech-to-text via WebAssembly to ensure privacy and accuracy without the latency or cost of cloud-based LLM APIs.

Build a Voice Survey Web App for Marketing and Customer Feedback

Voice surveys collect spoken responses instead of typed answers: product ratings, satisfaction scores, multiple-choice selections, and open-ended feedback captured through speech. Respondents dictate their answers, select choices, and explain their reasoning out loud, which can yield richer qualitative data and higher completion rates than text-only surveys. This tutorial builds a voice survey web app that runs entirely in the browser via WebAssembly, so audio never leaves the device and user responses stay private.

How do Voice Surveys Work?

A voice survey needs to handle two fundamentally different response types: structured answers with known valid values (product ratings, multiple choices, yes/no selections), and open-ended feedback where the user provides detailed reasoning.

Most implementations handle both with a single speech-to-text pipeline, then route the transcript through regex, Natural Language Processing (NLP), or an LLM to extract structured values. With regex or NLP parsing, this two-step pipeline has a critical failure mode: the transcription layer doesn't know the valid response options, so it can transcribe "four" as "for" or "eight" as "ate". When these pipelines fail silently, they leave behind corrupted data that's difficult to detect or correct after the session ends.

Routing the transcripts through an LLM can resolve these ambiguities from context, but for a survey with a fixed set of valid responses, an LLM adds unnecessary complexity, cost, and latency.
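
The homophone failure described above can be reproduced with a few lines of code. The sketch below is a hypothetical post-processing step for a generic transcript (parseRatingFromTranscript and NUMBER_WORDS are illustrative names, not part of any SDK); it shows how a single misheard word silently turns a valid answer into a missing one.

```javascript
// Hypothetical transcript parser for a 1-5 rating question.
const NUMBER_WORDS = new Map([
  ["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
]);

function parseRatingFromTranscript(transcript) {
  // Lowercase, then split on anything that is not a letter or digit.
  const tokens = transcript.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  for (const token of tokens) {
    if (NUMBER_WORDS.has(token)) return NUMBER_WORDS.get(token);
    if (/^[1-5]$/.test(token)) return Number(token);
  }
  return null; // no recognizable rating in the transcript
}

const correct = parseRatingFromTranscript("I'd give it a four");
const misheard = parseRatingFromTranscript("I'd give it a for");
console.log(correct, misheard); // 4 null
```

The parser itself is behaving correctly in both cases; the error was introduced one layer earlier, which is why it is hard to detect after the fact.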

This tutorial replaces that approach with a deterministic architecture: spoken language understanding for structured questions and streaming speech-to-text for open-ended ones. A domain-specific speech-to-intent model matches audio directly against the survey's valid response options, so a spoken "four" resolves to the slot value 4 rather than the word "for". Because there is no intermediate transcription layer to widen the surface area for errors, this architecture is simpler and more reliable than the multi-step STT-then-parse pipeline.

What You'll Build:

A customer feedback survey with four question types: an NPS-style rating on a 1–5 scale, a multiple-choice satisfaction question, a yes/no question, and an open-ended follow-up. The same architecture extends to event feedback, patient satisfaction, employee engagement, and market research surveys.

What You'll Need:

Node.js (download page)
Picovoice AccessKey from the Picovoice Console
A microphone-equipped laptop or desktop for testing

How to Process Voice Survey Responses Without an LLM

This tutorial uses four on-device voice models sharing a single microphone stream through the Web Voice Processor:

Cobra Voice Activity Detection runs continuously on the microphone stream, tracking voice probability in real time. Rather than keeping all voice models active at once, Cobra acts as the trigger — when it detects speech above the probability threshold, it activates the right model for the question type on screen.

Rhino Speech-to-Intent handles NPS ratings, multiple-choice selections, and yes/no answers. It matches audio directly against a predefined set of valid responses and outputs structured data:

Voice Input: "I'd give it a four"
JSON Output: { "intent": "giveRating", "slots": { "score": "4" } }

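Hallucinated choices are structurally impossible: the model either returns an intent and slot values defined in its context or reports that it did not understand. For ratings, the built-in digit slot accepts any single digit, so the application code additionally rejects scores outside the 1–5 range.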
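
As a rough illustration of consuming that structured output, the sketch below turns an inference object into a validated rating. The field names (isUnderstood, intent, slots) follow the callbacks used later in this tutorial; readRating and its default bounds are hypothetical helpers, not SDK functions.

```javascript
// Sketch: extract a 1-5 rating from a Rhino-style inference object.
// readRating is an illustrative helper, not part of the Rhino SDK.
function readRating(inference, min = 1, max = 5) {
  if (!inference || !inference.isUnderstood) return null;
  if (inference.intent !== "giveRating") return null;
  const raw = inference.slots ? inference.slots.score : undefined;
  const score = Number.parseInt(raw, 10); // slot values arrive as strings
  if (Number.isNaN(score) || score < min || score > max) return null;
  return score;
}

const accepted = readRating({ isUnderstood: true, intent: "giveRating", slots: { score: "4" } });
const outOfRange = readRating({ isUnderstood: true, intent: "giveRating", slots: { score: "9" } });
console.log(accepted, outOfRange); // 4 null
```

Returning null for every rejected case keeps the caller simple: anything that is not a number means the respondent should be prompted again.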

Cheetah Streaming Speech-to-Text handles open-ended questions. It transcribes the respondent's speech in real time, displaying words as they speak.

Porcupine Wake Word listens continuously, alongside Cobra Voice Activity Detection, for three navigation phrases: "Next Page", "Previous Page", and "Submit Survey".

To go further with the captured responses, e.g., summarizing feedback trends or analyzing responses, a local LLM like picoLLM can process the data without sending it to the cloud.

Set Up the Voice Survey Project

Initialize a new project:

mkdir voice-survey && cd voice-survey
npm init -y

Install the speech SDKs and a local development server:

npm install http-server @picovoice/cobra-web @picovoice/porcupine-web @picovoice/rhino-web @picovoice/cheetah-web @picovoice/web-voice-processor

@picovoice/cobra-web: Voice activity detection model
@picovoice/porcupine-web: Keyword detection model for voice control
@picovoice/rhino-web: Speech-to-intent model for structured responses
@picovoice/cheetah-web: Streaming speech-to-text for open-ended questions
@picovoice/web-voice-processor: Shared microphone audio pipeline
http-server: Local server

Train Custom Keywords for Voice Control

1. Sign up for a Picovoice Console account and navigate to the Porcupine page.
2. Enter your keyword, such as "Submit Survey", and test it using the microphone button.
3. Click "Train", select "Web (WASM)" as the target platform, and download the .ppn model file to the project root.
4. Repeat steps 2 and 3 for the additional keywords:

"Next Page"
"Previous Page"

For tips on designing effective keywords, review the choosing a wake word guide.

Define Voice Commands for Survey Responses

The speech-to-intent model needs a context that maps spoken phrases to structured data. This section defines valid responses for each structured question in the survey.

In the Rhino section of Picovoice Console, create a new context for your survey.
Click the "Import YAML" button in the top-right corner of the Console. Paste the YAML below to add the survey response commands.
Train the context for the "Web (WASM)" platform and download the .rhn model file.

YAML Context for a Customer Feedback Survey:

context:
 expressions:
   giveRating:
     - "(@prefix) $pv.SingleDigitInteger:score (out of five)"
     - "[my, the] [score, rating, number] [is, would be] (a) $pv.SingleDigitInteger:score"
     - "[probably, maybe, definitely] (a) $pv.SingleDigitInteger:score"

   selectChoice:
     - "(@prefix) $satisfactionLevel:choice (about it)"
     - "[it's, it was, the experience was] $satisfactionLevel:choice"
     - "[I feel, I felt] $satisfactionLevel:choice (about it)"

   answerYesNo:
     - "(@prefix) $yesNo:answer"
     - "[definitely, absolutely, probably] $yesNo:answer"

 slots:
   satisfactionLevel:
     - "very satisfied"
     - "satisfied"
     - "neutral"
     - "dissatisfied"
     - "very dissatisfied"

   yesNo:
     - "yes"
     - "no"

 macros:
   prefix:
     - "I would"
     - "I'd"
     - "I would say"
     - "I'd say"
     - "I would give it"
     - "I'd give it"
     - "I would give it a"
     - "I'd give it a"

This defines three response intents: giveRating for NPS scores, selectChoice for multiple-choice answers, and answerYesNo for binary questions. The giveRating intent uses Rhino's built-in pv.SingleDigitInteger slot, which recognizes spoken single-digit numbers and returns the numeric value as a string (e.g., "four" → "4"). Because that slot is not limited to the survey's scale, the application code accepts only scores between 1 and 5.

The bracket syntax handles how people naturally speak in a survey context. "Four out of five", "I'd give it a four", "probably a four", and just "four" all resolve to the same intent with slot value 4. The voice model handles variation deterministically. To support additional phrasings, add more expressions to the YAML and retrain.

Refer to the Rhino Syntax Cheat Sheet for details on expression syntax, optional words, and slot types.
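
To get a feel for how many phrasings a single expression covers, the toy expander below enumerates them. It is only an illustration of the bracket and parenthesis notation used in the YAML above, not Rhino's actual parser; expandExpression is a hypothetical helper, and the slot is left as a literal $score token.

```javascript
// Toy expander (not Rhino's parser): [a, b] picks one alternative,
// (x) marks an optional phrase, anything else is a literal token.
function expandExpression(expression) {
  const tokenPattern = /\[([^\]]+)\]|\(([^)]+)\)|(\S+)/g;
  let phrases = [[]];
  for (const match of expression.matchAll(tokenPattern)) {
    let choices;
    if (match[1] !== undefined) {
      choices = match[1].split(",").map((s) => s.trim()); // alternatives
    } else if (match[2] !== undefined) {
      choices = ["", match[2].trim()]; // optional: absent or present
    } else {
      choices = [match[3]]; // literal word or slot token
    }
    phrases = phrases.flatMap((words) =>
      choices.map((choice) => (choice ? [...words, choice] : words))
    );
  }
  return phrases.map((words) => words.join(" "));
}

const phrasings = expandExpression("[my, the] [score, rating] [is, would be] (a) $score");
console.log(phrasings.length); // 16
```

Three two-way alternatives and one optional word already yield sixteen phrasings, which is why a handful of expressions can cover most of what respondents actually say.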

Download Default Voice Models

The web application requires default model files to initialize the specialized voice models locally via WebAssembly. Download the following parameter files and place them in the project root:

Cobra VAD: cobra_params.pv
Porcupine Wake Word: porcupine_params.pv
Rhino Speech-to-Intent: rhino_params.pv
Cheetah Streaming STT: cheetah_params.pv

Define Questions for Web Survey

Before building the UI, define the survey as a data structure. Each question specifies its type, the text shown to the respondent, and the response mode — either speech-to-intent for structured answers or speech-to-text for open-ended responses:

const SURVEY_QUESTIONS = [
 {
   id: "nps",
   type: "rating",
   text: "On a scale of 1 to 5, how likely are you to recommend us to a friend or colleague?",
   responseMode: "intent",
   intentName: "giveRating",
   slotName: "score"
 },
 {
   id: "satisfaction",
   type: "choice",
   text: "How would you describe your overall experience?",
   options: ["Very Satisfied", "Satisfied", "Neutral", "Dissatisfied", "Very Dissatisfied"],
   responseMode: "intent",
   intentName: "selectChoice",
   slotName: "choice"
 },
 {
   id: "recommend",
   type: "yesno",
   text: "Would you use our service again?",
   options: ["Yes", "No"],
   responseMode: "intent",
   intentName: "answerYesNo",
   slotName: "answer"
 },
 {
   id: "feedback",
   type: "open",
   text: "What could we do to improve your experience?",
   responseMode: "transcription"
 }
];

This structure makes it easy to add, remove, or reorder questions without changing any voice logic. Questions with responseMode: "intent" use the speech-to-intent model; questions with responseMode: "transcription" use the speech-to-text model. To build a different survey — event feedback, patient satisfaction, market research — update this array and the Rhino YAML context.
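
Because the voice logic is driven entirely by this array, a small sanity check can catch configuration mistakes before any model is loaded. The sketch below is one possible check; findConfigProblems is a hypothetical helper and the rules it enforces are assumptions drawn from how the fields are used in this tutorial.

```javascript
// Sketch: report questions whose fields do not match their responseMode.
function findConfigProblems(questions) {
  const problems = [];
  const seenIds = new Set();
  for (const q of questions) {
    if (seenIds.has(q.id)) problems.push(`duplicate id: ${q.id}`);
    seenIds.add(q.id);
    if (q.responseMode === "intent") {
      // Intent questions need both an intent and a slot to read.
      if (!q.intentName) problems.push(`${q.id}: missing intentName`);
      if (!q.slotName) problems.push(`${q.id}: missing slotName`);
    } else if (q.responseMode !== "transcription") {
      problems.push(`${q.id}: unknown responseMode "${q.responseMode}"`);
    }
  }
  return problems;
}

const sampleQuestions = [
  { id: "nps", responseMode: "intent", intentName: "giveRating", slotName: "score" },
  { id: "feedback", responseMode: "transcription" },
  { id: "broken", responseMode: "intent", intentName: "selectChoice" },
];
const problems = findConfigProblems(sampleQuestions);
console.log(problems); // [ 'broken: missing slotName' ]
```

Running a check like this at startup makes a typo in the question list fail loudly instead of producing a question that never accepts a spoken answer.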

Create Voice Survey HTML File

Create an index.html file in the project root. The application loads all SDKs from node_modules:

<!DOCTYPE html>
<html lang="en">
<head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>Voice-Powered Customer Feedback Survey</title>


 <script src="node_modules/@picovoice/cobra-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/porcupine-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/rhino-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/cheetah-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/web-voice-processor/dist/iife/index.js"></script>
</head>
<body>
 <!-- Survey UI goes here -->
 <script>
   // Voice logic goes here
 </script>
</body>
</html>

The survey UI consists of three screens: a start screen with a button to begin, a question screen that renders each question type (rating bar, choice chips, yes/no buttons, transcription text area), and an end screen with a response summary. The complete HTML, CSS, and JavaScript are included at the end of this tutorial.

Add Voice Activity Detection for Automatic Activation

Cobra Voice Activity Detection runs continuously and detects when the user starts speaking. Based on the current question type, it activates either Rhino (for structured questions) or Cheetah (for open-ended questions):

const VAD_THRESHOLD = 0.5;
const VAD_FRAMES_REQUIRED = 1;


let voiceActivityCounter = 0;


function cobraVoiceCallback(voiceProbability) {
 if (!surveyActive) return;
 if (rhinoActive) return;
 if (isTranscribing) return;
 if (Date.now() < keywordCooldownUntil) return;


 if (voiceProbability > VAD_THRESHOLD) {
   voiceActivityCounter++;
   if (voiceActivityCounter >= VAD_FRAMES_REQUIRED) {
     activateEngineForCurrentQuestion();
     voiceActivityCounter = 0;
   }
 } else {
   voiceActivityCounter = 0;
 }
}

The activateEngineForCurrentQuestion function checks the question's responseMode and subscribes the appropriate voice model:

async function activateEngineForCurrentQuestion() {
 const question = SURVEY_QUESTIONS[currentQuestionIndex];
 if (!question) return;


 if (question.responseMode === "intent") {
   if (rhinoActive) return;
   rhinoActive = true;
   await WebVoiceProcessor.WebVoiceProcessor.subscribe(rhino);
   setStatus("listening", "Listening — speak your answer");
 } else if (question.responseMode === "transcription") {
   if (isTranscribing) return;
   isTranscribing = true;
   await WebVoiceProcessor.WebVoiceProcessor.subscribe(cheetah);
   setStatus("listening", "Listening — speak freely");
 }
}

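With VAD_THRESHOLD set to 0.5 and VAD_FRAMES_REQUIRED set to 1, a single audio frame whose voice probability exceeds 0.5 is enough to activate a model, which keeps activation responsive. Raising VAD_FRAMES_REQUIRED requires that many consecutive frames above the threshold, trading some responsiveness for fewer false triggers. The keywordCooldownUntil check prevents Cobra Voice Activity Detection from immediately reactivating a model after a keyword is detected.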
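
The consecutive-frame counter can be isolated from the survey state to see how the two constants interact. The sketch below mirrors the counting logic of cobraVoiceCallback as a standalone function; createVadTrigger is a hypothetical name and the probability values are made up for illustration.

```javascript
// Sketch: fire once a run of frames exceeds the voice-probability threshold.
function createVadTrigger(threshold, framesRequired) {
  let consecutive = 0;
  return function onFrame(voiceProbability) {
    if (voiceProbability > threshold) {
      consecutive += 1;
      if (consecutive >= framesRequired) {
        consecutive = 0; // reset so the next run is counted from scratch
        return true;
      }
    } else {
      consecutive = 0; // any quiet frame breaks the run
    }
    return false;
  };
}

const strictTrigger = createVadTrigger(0.5, 3);
const fired = [0.9, 0.2, 0.8, 0.7, 0.6].map((p) => strictTrigger(p));
console.log(fired); // [ false, false, false, false, true ]
```

With framesRequired set to 3, the isolated spike at the start is ignored and only the sustained run at the end fires, whereas a value of 1 would have fired on the very first frame.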

Add Voice Control to Web Survey

Porcupine Wake Word listens for three navigation keywords continuously alongside Cobra. When a keyword is detected, it triggers the corresponding survey action:

const porcupineModel = {
 publicPath: "porcupine_params.pv",
};


const keywordModels = [
 { publicPath: "${NEXT_PAGE_PPN}.ppn", label: "nextPage" },
 { publicPath: "${PREVIOUS_PAGE_PPN}.ppn", label: "previousPage" },
 { publicPath: "${SUBMIT_SURVEY_PPN}.ppn", label: "submitSurvey" },
];


const porcupine = await PorcupineWeb.PorcupineWorker.create(
 ACCESS_KEY,
 keywordModels,
 porcupineKeywordCallback,
 porcupineModel,
);

await WebVoiceProcessor.WebVoiceProcessor.subscribe(porcupine);

Replace ${NEXT_PAGE_PPN}, ${PREVIOUS_PAGE_PPN}, and ${SUBMIT_SURVEY_PPN} with the names of your downloaded .ppn files, without the .ppn extension, which the code already appends.

The callback routes each keyword to its survey action. A short cooldown prevents Cobra from immediately reactivating a model after a keyword is processed:

const KEYWORD_COOLDOWN_MS = 300;
let keywordCooldownUntil = 0;


async function porcupineKeywordCallback(detection) {
 console.log("Porcupine detected:", detection.label);
 keywordCooldownUntil = Date.now() + KEYWORD_COOLDOWN_MS;


 if (rhinoActive) {
   rhinoActive = false;
   await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino);
 }


 if (isTranscribing) {
   await stopTranscription();
   cleanKeywordFromTranscription();
 }


 switch (detection.label) {
   case "nextPage":
     goToNext();
     break;
   case "previousPage":
     goToPrevious();
     break;
   case "submitSurvey":
     submitSurvey();
     break;
 }
}

When a keyword is detected while Rhino is active, the callback unsubscribes Rhino first. If detected during transcription, the callback stops Cheetah and cleans any keyword text that may have been transcribed.
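
The cooldown itself is just a timestamp comparison. The sketch below wraps it in a small object with an injectable clock so the behavior can be checked without waiting; createCooldown is a hypothetical helper, and the tutorial code keeps the same state in the keywordCooldownUntil variable instead.

```javascript
// Sketch: a time-based gate equivalent to the keywordCooldownUntil check.
function createCooldown(durationMs, now = Date.now) {
  let blockedUntil = 0;
  return {
    start() { blockedUntil = now() + durationMs; },
    isActive() { return now() < blockedUntil; },
  };
}

// A fake clock makes the timing deterministic for the example.
let fakeTime = 1000;
const cooldown = createCooldown(300, () => fakeTime);
cooldown.start();            // keyword detected at t = 1000
fakeTime = 1200;
const duringCooldown = cooldown.isActive();
fakeTime = 1300;
const afterCooldown = cooldown.isActive();
console.log(duringCooldown, afterCooldown); // true false
```

While the gate is active, voice activity from the tail end of the spoken keyword is ignored, so "Next Page" is not mistaken for the start of an answer to the next question.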

Clean Keyword Text from Transcription

Since Porcupine and Cheetah run simultaneously during open-ended questions, Cheetah may transcribe the keyword phrase (e.g., "next page") before Porcupine detects it. The cleanKeywordFromTranscription function strips these phrases from the end of the transcription:

function cleanKeywordFromTranscription() {
 const el = document.getElementById("transcriptionBox");
 const keywords = ["next page", "previous page", "submit survey"];
 let text = el.value;
 for (const kw of keywords) {
   const regex = new RegExp("\\s*" + kw + "\\s*\\.?\\s*$", "i");
   text = text.replace(regex, "");
 }
 el.value = text.trimEnd();
 responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = el.value.trim();
}
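
The string handling can be tried without the DOM by pulling it into a pure function. The sketch below uses the same regular expression as cleanKeywordFromTranscription; stripTrailingKeyword and NAV_KEYWORDS are illustrative names.

```javascript
// Sketch: DOM-free version of the trailing-keyword cleanup.
const NAV_KEYWORDS = ["next page", "previous page", "submit survey"];

function stripTrailingKeyword(text, keywords = NAV_KEYWORDS) {
  let cleaned = text;
  for (const keyword of keywords) {
    // Match the keyword only at the very end, with an optional period.
    const pattern = new RegExp("\\s*" + keyword + "\\s*\\.?\\s*$", "i");
    cleaned = cleaned.replace(pattern, "");
  }
  return cleaned.trimEnd();
}

const trimmed = stripTrailingKeyword("The checkout was slow. Next page.");
const untouched = stripTrailingKeyword("The next page loaded slowly");
console.log(trimmed);   // The checkout was slow.
console.log(untouched); // The next page loaded slowly
```

Anchoring the pattern to the end of the text is what keeps legitimate uses of the same words inside the feedback intact.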

Route Survey Responses with Spoken Language Understanding

When Cobra detects speech on a structured question, Rhino Speech-to-Intent activates and listens for a response. When it recognizes a complete utterance, the callback fires with the inference result:

function rhinoInferenceCallback(inference) {
 if (!inference.isFinalized) return;


 rhinoActive = false;
 WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino);
 console.log("Rhino inference:", JSON.stringify(inference));


 if (inference.isUnderstood) {
   handleSurveyIntent(inference.intent, inference.slots);
 } else {
   setStatus("listening", "Didn't catch that — try again");
 }
}

The handleSurveyIntent function routes each intent to the correct response handler, updating the stored answer and highlighting the selected option in the UI:

function handleSurveyIntent(intent, slots) {
 console.log("handleSurveyIntent:", intent, "slots:", JSON.stringify(slots));
 const question = SURVEY_QUESTIONS[currentQuestionIndex];
 if (!question) return;


 if (question.intentName && intent === question.intentName) {
   if (question.type === "rating") {
     const slotValue = slots[question.slotName];
      const numericScore = slotValue ? parseInt(slotValue, 10) : null;
     if (numericScore && numericScore >= 1 && numericScore <= 5) {
       responses[question.id] = numericScore;
       highlightRating(numericScore);
       setStatus("listening", `Got it — ${numericScore} out of 5`);
     } else {
       setStatus("listening", "Please say a number between 1 and 5");
     }
   } else {
     const slotValue = slots[question.slotName];
     if (slotValue) {
       responses[question.id] = slotValue;
       highlightChoice(slotValue);
       setStatus("listening", `Got it — "${slotValue}"`);
     }
   }
 }
}

For the giveRating intent, the built-in pv.SingleDigitInteger slot returns numeric strings directly — saying "four" produces { score: "4" }. The handler validates that only scores between 1 and 5 are accepted; values outside this range prompt the respondent to try again. Cobra Voice Activity Detection will reactivate Rhino Speech-to-Intent if the respondent speaks again, so they can correct their answer before navigating.
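
For choice and yes/no questions, the slot values defined in the YAML context are lowercase ("very satisfied") while the option labels in SURVEY_QUESTIONS are title case ("Very Satisfied"), so a UI helper such as highlightChoice needs some way to map one to the other. The sketch below shows one possible case-insensitive lookup; findOptionLabel is a hypothetical helper, not a function from the tutorial code.

```javascript
// Sketch: map a lowercase slot value to the matching display label.
function findOptionLabel(slotValue, options) {
  if (typeof slotValue !== "string") return null;
  const wanted = slotValue.trim().toLowerCase();
  const match = options.find((label) => label.toLowerCase() === wanted);
  return match === undefined ? null : match;
}

const satisfactionOptions = ["Very Satisfied", "Satisfied", "Neutral", "Dissatisfied", "Very Dissatisfied"];
const label = findOptionLabel("very satisfied", satisfactionOptions);
console.log(label); // Very Satisfied
```

Comparing whole labels rather than substrings matters here, since "satisfied" is contained in four of the five options.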

Transcribe Open-Ended Responses in Real Time

When Cobra Voice Activity Detection detects speech on an open-ended question, Cheetah Streaming Speech-to-Text subscribes to the microphone and transcribes the user's speech as they speak:

function cheetahTranscriptCallback(cheetahTranscript) {
 if (!isTranscribing) return;
 const el = document.getElementById("transcriptionBox");
 if (cheetahTranscript.transcript) {
   el.value += cheetahTranscript.transcript;
   responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = el.value.trim();
 }
 if (cheetahTranscript.isEndpoint) {
   el.value += "\n";
 }
}

The callback receives partial transcripts as each audio frame is processed. Words appear in the text area in real time, giving the respondent immediate visual confirmation that their speech is being captured. When Cheetah detects a pause (endpoint), a line break is inserted.
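
The accumulation rule is easy to check in isolation. The sketch below applies the same append-and-break logic to a plain string instead of the text area; createTranscriptBuffer is a hypothetical helper, and the input objects only imitate the transcript and isEndpoint fields used by the callback above.

```javascript
// Sketch: accumulate partial transcripts, breaking the line at each endpoint.
function createTranscriptBuffer() {
  let text = "";
  return {
    push({ transcript, isEndpoint }) {
      if (transcript) text += transcript;
      if (isEndpoint) text += "\n";
      return text;
    },
    value() { return text.trim(); }, // what gets stored as the response
  };
}

const buffer = createTranscriptBuffer();
buffer.push({ transcript: "The checkout ", isEndpoint: false });
buffer.push({ transcript: "was slow.", isEndpoint: true });
buffer.push({ transcript: "Support was great.", isEndpoint: false });
console.log(JSON.stringify(buffer.value())); // "The checkout was slow.\nSupport was great."
```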

Transcription is stopped and flushed when navigating away from the question:

async function stopTranscription() {
 if (!isTranscribing) return;
 isTranscribing = false;
 const flushed = await cheetah.flush();
 await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(cheetah);


 const el = document.getElementById("transcriptionBox");
 if (flushed && flushed.transcript) {
   el.value += flushed.transcript;
 }
 el.value = el.value.trimEnd();
 responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = el.value.trim();
}

Replace Placeholders in the Code

Locate and replace the following values in the index.html file:

AccessKey: Replace ${ACCESS_KEY_HERE} with your AccessKey obtained from the Picovoice Console dashboard.
Navigation Keywords: Replace ${NEXT_PAGE_PPN}, ${PREVIOUS_PAGE_PPN}, and ${SUBMIT_SURVEY_PPN} with the names of your downloaded Porcupine Wake Word model files, without the .ppn extension.
Survey Context: Replace ${CONTEXT_FILE_NAME} with the name of your custom Rhino Speech-to-Intent context file, without the .rhn extension.

Run the Voice Survey

Your project directory should contain:

voice-survey/
├── index.html
├── package.json
├── cobra_params.pv
├── porcupine_params.pv
├── ${NEXT_PAGE_PPN}.ppn
├── ${PREVIOUS_PAGE_PPN}.ppn
├── ${SUBMIT_SURVEY_PPN}.ppn
├── rhino_params.pv
├── ${CONTEXT_FILE_NAME}.rhn
├── cheetah_params.pv
└── node_modules/

Start the local server with the required cross-origin headers:

npx http-server -a localhost -p 5000 \
 --cors \
 -c-1 \
 --header "Cross-Origin-Opener-Policy: same-origin" \
 --header "Cross-Origin-Embedder-Policy: require-corp"

Open http://localhost:5000 in your browser. Click "Start Survey" to initialize the voice models and begin.
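
When both cross-origin headers are served, browsers mark the page as cross-origin isolated, which is what makes SharedArrayBuffer available to WebAssembly code. If the models fail to load, a quick check along the following lines can help confirm whether the headers took effect. This is a sketch only: describeIsolation is a hypothetical helper, and the env parameter exists so the function can be exercised outside a browser.

```javascript
// Sketch: report whether the page appears to be cross-origin isolated.
// In a browser, call describeIsolation() with no arguments.
function describeIsolation(env = globalThis) {
  const isolated = env.crossOriginIsolated === true;
  const hasSharedMemory = typeof env.SharedArrayBuffer === "function";
  return { isolated, hasSharedMemory };
}

// Simulated environments standing in for a page with and without the headers.
const withHeaders = describeIsolation({ crossOriginIsolated: true, SharedArrayBuffer: function () {} });
const withoutHeaders = describeIsolation({});
console.log(withHeaders, withoutHeaders);
```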

The click interface works in parallel with voice. Every rating, choice, and navigation action is available via mouse or touch. Respondents can mix both — speak a rating, then click "Next Page."

Complete Voice Survey Code

Here is the complete index.html with all HTML, CSS, and JavaScript:

<!DOCTYPE html>
<html lang="en">
<head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>Voice-Powered Customer Feedback Survey</title>


 <script src="node_modules/@picovoice/cobra-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/porcupine-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/rhino-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/cheetah-web/dist/iife/index.js"></script>
 <script src="node_modules/@picovoice/web-voice-processor/dist/iife/index.js"></script>


 <link rel="preconnect" href="https://fonts.googleapis.com">
 <link href="https://fonts.googleapis.com/css2?family=DM+Sans:opsz,wght@9..40,400;9..40,500;9..40,600;9..40,700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">


 <style>
   *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }


   :root {
     --bg: #f5f7fa;
     --surface: #ffffff;
     --border: #e8ecf1;
     --text: #1a1a2e;
     --text-secondary: #6b7280;
     --text-muted: #9ca3af;
     --accent: #377DFF;
     --accent-hover: #2563db;
     --accent-light: #eef4ff;
     --green: #22c55e;
     --green-light: #f0fdf4;
     --radius: 8px;
     --radius-lg: 16px;
   }


   body {
     font-family: 'DM Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
     background: var(--bg);
     min-height: 100vh;
     display: flex;
     align-items: center;
     justify-content: center;
     color: var(--text);
     -webkit-font-smoothing: antialiased;
   }


   .survey-container {
     background: var(--surface);
     border-radius: var(--radius-lg);
     box-shadow: 0 4px 24px rgba(0,0,0,0.08);
     max-width: 640px;
     width: 100%;
     padding: 48px 40px;
     text-align: center;
   }


   .survey-title {
     font-size: 22px;
     font-weight: 700;
     color: var(--text);
     margin-bottom: 8px;
   }


   .survey-progress {
     font-size: 13px;
     color: var(--text-muted);
     margin-bottom: 24px;
   }


   .progress-bar {
     height: 4px;
     background: var(--border);
     border-radius: 2px;
     margin-bottom: 32px;
     overflow: hidden;
   }
   .progress-fill {
     height: 100%;
     background: var(--accent);
     border-radius: 2px;
     transition: width 0.4s ease;
   }


   .question-text {
     font-size: 20px;
     font-weight: 600;
     color: var(--text);
     line-height: 1.5;
     margin-bottom: 28px;
   }


   /* ── Rating Bar ── */
   .rating-bar {
     display: flex;
     gap: 8px;
     justify-content: center;
     margin-bottom: 24px;
   }
   .rating-num {
     width: 44px;
     height: 44px;
     border: 2px solid var(--border);
     border-radius: 50%;
     display: flex;
     align-items: center;
     justify-content: center;
     font-size: 15px;
     font-weight: 600;
     color: var(--text-secondary);
     cursor: pointer;
     transition: all 0.2s;
   }
   .rating-num:hover { border-color: var(--accent); color: var(--accent); }
   .rating-num.selected {
     background: var(--accent);
     border-color: var(--accent);
     color: #fff;
   }


   /* ── Choice Chips ── */
   .response-options {
     display: flex;
     flex-wrap: wrap;
     gap: 10px;
     justify-content: center;
     margin-bottom: 24px;
   }
   .option-chip {
     padding: 10px 20px;
     border: 2px solid var(--border);
     border-radius: 24px;
     font-size: 14px;
     font-weight: 500;
     color: var(--text-secondary);
     cursor: pointer;
     transition: all 0.2s;
   }
   .option-chip:hover { border-color: var(--accent); color: var(--accent); }
   .option-chip.selected {
     background: var(--accent);
     border-color: var(--accent);
     color: #fff;
   }


   /* ── Transcription Box ── */
   .transcription-box {
     width: 100%;
     min-height: 120px;
     border: 2px solid var(--border);
     border-radius: 12px;
     padding: 16px;
     font-family: 'DM Sans', sans-serif;
     font-size: 15px;
     line-height: 1.6;
     color: var(--text);
     text-align: left;
     margin-bottom: 24px;
     resize: vertical;
     outline: none;
     transition: border-color 0.15s;
   }
   .transcription-box:focus { border-color: var(--accent); }
   .transcription-box::placeholder { color: var(--text-muted); }


   /* ── Voice Status ── */
   .voice-status {
     display: inline-flex;
     align-items: center;
     gap: 8px;
     padding: 8px 16px;
     border-radius: 20px;
     font-size: 13px;
     font-weight: 600;
     margin-bottom: 24px;
   }
   .status-idle { background: #f0f0f0; color: var(--text-muted); }
   .status-listening { background: var(--green-light); color: #16a34a; }
   .status-dot {
     width: 8px;
     height: 8px;
     border-radius: 50%;
     background: currentColor;
   }
   .status-listening .status-dot { animation: pulse 1.2s infinite; }
   @keyframes pulse {
     0%, 100% { opacity: 1; }
     50% { opacity: 0.3; }
   }


   /* ── Navigation ── */
   .nav-buttons {
     display: flex;
     gap: 12px;
     justify-content: center;
     margin-top: 16px;
   }
   .btn {
     padding: 12px 28px;
     border: none;
     border-radius: var(--radius);
     font-family: 'DM Sans', sans-serif;
     font-size: 15px;
     font-weight: 600;
     cursor: pointer;
     transition: all 0.2s;
   }
   .btn-primary { background: var(--accent); color: #fff; }
   .btn-primary:hover { background: var(--accent-hover); }
   .btn-secondary { background: #f0f0f0; color: var(--text-secondary); }
   .btn-secondary:hover { background: #e5e5e5; }
   .btn:disabled { opacity: 0.4; cursor: not-allowed; }


   /* ── Start / End Screens ── */
   .start-screen, .end-screen {
     display: flex;
     flex-direction: column;
     align-items: center;
     gap: 16px;
   }
   .start-screen p, .end-screen p {
     color: var(--text-secondary);
     font-size: 15px;
     line-height: 1.6;
   }
   .mic-icon {
     width: 64px;
     height: 64px;
     background: var(--accent-light);
     border-radius: 50%;
     display: flex;
     align-items: center;
     justify-content: center;
     font-size: 28px;
     margin-bottom: 8px;
   }


   /* ── Summary ── */
   .response-summary {
     text-align: left;
     width: 100%;
     margin: 16px 0;
   }
   .summary-item {
     padding: 12px 0;
     border-bottom: 1px solid var(--border);
   }
   .summary-label { font-size: 13px; color: var(--text-muted); margin-bottom: 4px; }
   .summary-value { font-size: 15px; color: var(--text); font-weight: 500; }


   /* ── Voice Hint ── */
   .voice-hint {
     font-size: 12px;
     color: var(--text-muted);
     margin-top: 8px;
   }
   .voice-hint kbd {
     font-family: 'JetBrains Mono', monospace;
     font-size: 11px;
     background: #f0f0f0;
     padding: 1px 6px;
     border-radius: 4px;
   }


   @media (max-width: 640px) {
     .survey-container { padding: 32px 20px; }
     .rating-bar { gap: 4px; }
     .rating-num { width: 36px; height: 36px; font-size: 13px; }
   }
 </style>
</head>
<body>
<div class="survey-container">


 <!-- Start Screen -->
 <div id="startScreen" class="start-screen">
   <div class="mic-icon"><svg width="28" height="28" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="9" y="1" width="6" height="11" rx="3"/><path d="M5 10a7 7 0 0 0 14 0"/><line x1="12" y1="17" x2="12" y2="21"/><line x1="8" y1="21" x2="16" y2="21"/></svg></div>
   <h1 class="survey-title">Customer Feedback Survey</h1>
   <p>Answer four quick questions using your voice. You can also click to select answers.</p>
   <button class="btn btn-primary" id="btnStart">Start Survey</button>
 </div>


 <!-- Question Screen -->
 <div id="questionScreen" style="display:none;">
   <div class="survey-progress" id="progressText">Question 1 of 4</div>
   <div class="progress-bar"><div class="progress-fill" id="progressFill" style="width:25%"></div></div>


   <div class="question-text" id="questionText"></div>


   <!-- Rating UI -->
   <div class="rating-bar" id="ratingBar" style="display:none;"></div>


   <!-- Multiple Choice UI -->
   <div class="response-options" id="choiceOptions" style="display:none;"></div>


   <!-- Transcription UI -->
   <textarea class="transcription-box" id="transcriptionBox" style="display:none;" placeholder="Your spoken response will appear here…"></textarea>


   <div class="voice-status status-idle" id="voiceStatus">
     <span class="status-dot"></span>
     <span id="statusText">Waiting…</span>
   </div>


   <div class="nav-buttons">
     <button class="btn btn-secondary" id="btnBack" disabled>Previous Page</button>
     <button class="btn btn-primary" id="btnNext">Next Page</button>
   </div>


   <div class="voice-hint">
     Say <kbd>Next Page</kbd> <kbd>Previous Page</kbd> or <kbd>Submit Survey</kbd> to navigate
   </div>
 </div>


 <!-- End Screen -->
 <div id="endScreen" style="display:none;" class="end-screen">
   <div class="mic-icon"><svg width="28" height="28" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polyline points="20 6 9 17 4 12"/></svg></div>
   <h1 class="survey-title">Thank You!</h1>
   <p>Your feedback has been recorded.</p>
   <div class="response-summary" id="responseSummary"></div>
   <button class="btn btn-primary" id="btnRestart">Start New Survey</button>
 </div>


</div>


<script>


// Configuration
const ACCESS_KEY = "${ACCESS_KEY_HERE}";


const VAD_THRESHOLD = 0.5;
const VAD_FRAMES_REQUIRED = 1;
const KEYWORD_COOLDOWN_MS = 300;


// Survey Questions
const SURVEY_QUESTIONS = [
 {
   id: "nps",
   type: "rating",
   text: "On a scale of 1 to 5, how likely are you to recommend us to a friend or colleague?",
   responseMode: "intent",
   intentName: "giveRating",
   slotName: "score"
 },
 {
   id: "satisfaction",
   type: "choice",
   text: "How would you describe your overall experience?",
   options: ["Very Satisfied", "Satisfied", "Neutral", "Dissatisfied", "Very Dissatisfied"],
   responseMode: "intent",
   intentName: "selectChoice",
   slotName: "choice"
 },
 {
   id: "recommend",
   type: "yesno",
   text: "Would you use our service again?",
   options: ["Yes", "No"],
   responseMode: "intent",
   intentName: "answerYesNo",
   slotName: "answer"
 },
 {
   id: "feedback",
   type: "open",
   text: "What could we do to improve your experience?",
   responseMode: "transcription"
 }
];


// State
let cobra = null;
let porcupine = null;
let rhino = null;
let cheetah = null;
let currentQuestionIndex = 0;
let responses = {};
let isTranscribing = false;
let rhinoActive = false;
let surveyActive = false;
let voiceActivityCounter = 0;
let keywordCooldownUntil = 0;


// DOM References
const startScreen = document.getElementById("startScreen");
const questionScreen = document.getElementById("questionScreen");
const endScreen = document.getElementById("endScreen");
const progressText = document.getElementById("progressText");
const progressFill = document.getElementById("progressFill");
const questionText = document.getElementById("questionText");
const ratingBar = document.getElementById("ratingBar");
const choiceOptions = document.getElementById("choiceOptions");
const transcriptionBox = document.getElementById("transcriptionBox");
const voiceStatus = document.getElementById("voiceStatus");
const statusText = document.getElementById("statusText");
const btnStart = document.getElementById("btnStart");
const btnBack = document.getElementById("btnBack");
const btnNext = document.getElementById("btnNext");
const btnRestart = document.getElementById("btnRestart");
const responseSummary = document.getElementById("responseSummary");




// Initialize Engines
async function initEngines() {
 // 1. Cobra VAD
 cobra = await CobraWeb.CobraWorker.create(ACCESS_KEY, cobraVoiceCallback);


 // 2. Porcupine for navigation keywords
 const porcupineModel = { publicPath: "porcupine_params.pv" };
 const keywordModels = [
   { publicPath: "next-page.ppn", label: "nextPage" },
   { publicPath: "prev-page.ppn", label: "previousPage" },
   { publicPath: "submit-survey.ppn", label: "submitSurvey" },
 ];
 porcupine = await PorcupineWeb.PorcupineWorker.create(
   ACCESS_KEY, keywordModels, porcupineKeywordCallback, porcupineModel);


 // 3. Rhino Speech-to-Intent
 const rhinoModel = { publicPath: "rhino_params.pv" };
 const rhinoContext = { publicPath: "voice-survey.rhn" };
 rhino = await RhinoWeb.RhinoWorker.create(
   ACCESS_KEY, rhinoContext, rhinoInferenceCallback, rhinoModel);


 // 4. Cheetah Streaming STT
 const cheetahModel = { publicPath: "cheetah_params.pv" };
 cheetah = await CheetahWeb.CheetahWorker.create(
   ACCESS_KEY, cheetahTranscriptCallback, cheetahModel);


 // Subscribe always-on engines
 await WebVoiceProcessor.WebVoiceProcessor.subscribe(cobra);
 await WebVoiceProcessor.WebVoiceProcessor.subscribe(porcupine);
}


// Cobra VAD
function cobraVoiceCallback(voiceProbability) {
 if (!surveyActive) return;
 if (rhinoActive) return;
 if (isTranscribing) return;
 if (Date.now() < keywordCooldownUntil) return;


 if (voiceProbability > VAD_THRESHOLD) {
   voiceActivityCounter++;
   if (voiceActivityCounter >= VAD_FRAMES_REQUIRED) {
     activateEngineForCurrentQuestion();
     voiceActivityCounter = 0;
   }
 } else {
   voiceActivityCounter = 0;
 }
}


async function activateEngineForCurrentQuestion() {
 const question = SURVEY_QUESTIONS[currentQuestionIndex];
 if (!question) return;


 if (question.responseMode === "intent") {
   if (rhinoActive) return;
   rhinoActive = true;
   await WebVoiceProcessor.WebVoiceProcessor.subscribe(rhino);
   setStatus("listening", "Listening — speak your answer");
 } else if (question.responseMode === "transcription") {
   if (isTranscribing) return;
   isTranscribing = true;
   await WebVoiceProcessor.WebVoiceProcessor.subscribe(cheetah);
   setStatus("listening", "Listening — speak freely");
 }
}


// Porcupine Keywords
async function porcupineKeywordCallback(detection) {
 console.log("Porcupine detected:", detection.label);
 keywordCooldownUntil = Date.now() + KEYWORD_COOLDOWN_MS;


 if (rhinoActive) {
   rhinoActive = false;
   await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino);
 }


 if (isTranscribing) {
   await stopTranscription();
   cleanKeywordFromTranscription();
 }


 switch (detection.label) {
   case "nextPage":
     goToNext();
     break;
   case "previousPage":
     goToPrevious();
     break;
   case "submitSurvey":
     submitSurvey();
     break;
 }
}


function cleanKeywordFromTranscription() {
 const el = document.getElementById("transcriptionBox");
 const keywords = ["next page", "previous page", "submit survey"];
 let text = el.value;
 for (const kw of keywords) {
   const regex = new RegExp("\\s*" + kw + "\\s*\\.?\\s*$", "i");
   text = text.replace(regex, "");
 }
 el.value = text.trimEnd();
 responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = el.value.trim();
}


// Rhino Voice Commands
function rhinoInferenceCallback(inference) {
 if (!inference.isFinalized) return;


 rhinoActive = false;
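 // unsubscribe() returns a promise; log failures instead of leaving an unhandled rejection.
 WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino).catch(console.error);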
 console.log("Rhino inference:", JSON.stringify(inference));


 if (inference.isUnderstood) {
   handleSurveyIntent(inference.intent, inference.slots);
 } else {
   setStatus("listening", "Didn't catch that — try again");
 }
}


function handleSurveyIntent(intent, slots) {
 console.log("handleSurveyIntent:", intent, "slots:", JSON.stringify(slots));
 const question = SURVEY_QUESTIONS[currentQuestionIndex];
 if (!question) return;


 if (question.intentName && intent === question.intentName) {
   if (question.type === "rating") {
     const slotValue = slots[question.slotName];
     // Parse as base 10; a missing or non-numeric slot yields NaN and fails the range check.
     const numericScore = parseInt(slotValue, 10);
     if (numericScore >= 1 && numericScore <= 5) {
       responses[question.id] = numericScore;
       highlightRating(numericScore);
       setStatus("listening", `Got it — ${numericScore} out of 5`);
     } else {
       setStatus("listening", "Please say a number between 1 and 5");
     }
   } else {
     const slotValue = slots[question.slotName];
     if (slotValue) {
       responses[question.id] = slotValue;
       highlightChoice(slotValue);
       setStatus("listening", `Got it — "${slotValue}"`);
     }
   }
 }
}


// Cheetah Transcription
function cheetahTranscriptCallback(cheetahTranscript) {
 if (!isTranscribing) return;
 const el = document.getElementById("transcriptionBox");
 if (cheetahTranscript.transcript) {
   el.value += cheetahTranscript.transcript;
   responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = el.value.trim();
 }
 if (cheetahTranscript.isEndpoint) {
   el.value += "\n";
 }
}


async function stopTranscription() {
 if (!isTranscribing) return;
 isTranscribing = false;
 const flushed = await cheetah.flush();
 await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(cheetah);


 const el = document.getElementById("transcriptionBox");
 if (flushed && flushed.transcript) {
   el.value += flushed.transcript;
 }
 el.value = el.value.trimEnd();
 responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = el.value.trim();
}


// Survey Flow
async function startSurvey() {
 surveyActive = true;
 currentQuestionIndex = 0;
 responses = {};


 startScreen.style.display = "none";
 endScreen.style.display = "none";
 questionScreen.style.display = "block";


 showQuestion(0);
}


function showQuestion(index) {
 const question = SURVEY_QUESTIONS[index];
 if (!question) return;


 currentQuestionIndex = index;


 progressText.textContent = `Question ${index + 1} of ${SURVEY_QUESTIONS.length}`;
 progressFill.style.width = `${((index + 1) / SURVEY_QUESTIONS.length) * 100}%`;


 questionText.textContent = question.text;


 // Hide all response UIs
 ratingBar.style.display = "none";
 choiceOptions.style.display = "none";
 transcriptionBox.style.display = "none";


 // Show appropriate response UI
 if (question.type === "rating") {
   buildRatingUI();
   ratingBar.style.display = "flex";
   if (responses[question.id]) highlightRating(responses[question.id]);
 } else if (question.type === "choice" || question.type === "yesno") {
   buildChoiceUI(question.options);
   choiceOptions.style.display = "flex";
   if (responses[question.id]) highlightChoice(responses[question.id]);
 } else if (question.type === "open") {
   transcriptionBox.style.display = "block";
   transcriptionBox.value = responses[question.id] || "";
 }


 btnBack.disabled = (index === 0);
 btnNext.textContent = (index === SURVEY_QUESTIONS.length - 1) ? "Submit Survey" : "Next Page";


 setStatus("listening", question.responseMode === "transcription"
   ? "Listening — speak freely"
   : "Listening — speak your answer");
}


async function goToNext() {
 if (isTranscribing) {
   await stopTranscription();
   cleanKeywordFromTranscription();
 }
 if (rhinoActive) {
   rhinoActive = false;
   try { await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino); } catch(e) {}
 }


 if (currentQuestionIndex < SURVEY_QUESTIONS.length - 1) {
   showQuestion(currentQuestionIndex + 1);
 } else {
   submitSurvey();
 }
}


async function goToPrevious() {
 if (isTranscribing) {
   await stopTranscription();
   cleanKeywordFromTranscription();
 }
 if (rhinoActive) {
   rhinoActive = false;
   try { await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino); } catch(e) {}
 }


 if (currentQuestionIndex > 0) {
   showQuestion(currentQuestionIndex - 1);
 }
}


async function submitSurvey() {
 surveyActive = false;


 if (isTranscribing) {
   await stopTranscription();
   cleanKeywordFromTranscription();
 }
 if (rhinoActive) {
   rhinoActive = false;
   try { await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino); } catch(e) {}
 }


 questionScreen.style.display = "none";
 endScreen.style.display = "block";


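 // Build the summary with textContent so transcribed text is never parsed as HTML.
 responseSummary.innerHTML = "";
 SURVEY_QUESTIONS.forEach(q => {
   const value = responses[q.id] || "—";
   const label = q.type === "rating" ? "NPS Rating"
     : q.type === "choice" ? "Overall Experience"
     : q.type === "yesno" ? "Would Use Again"
     : "Open Feedback";
   const item = document.createElement("div");
   item.className = "summary-item";
   const labelEl = document.createElement("div");
   labelEl.className = "summary-label";
   labelEl.textContent = label;
   const valueEl = document.createElement("div");
   valueEl.className = "summary-value";
   valueEl.textContent = value;
   item.append(labelEl, valueEl);
   responseSummary.appendChild(item);
 });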


 console.log("Survey responses:", responses);
}


// UI Builders
function buildRatingUI() {
 ratingBar.innerHTML = "";
 for (let i = 1; i <= 5; i++) {
   const el = document.createElement("div");
   el.className = "rating-num";
   el.textContent = i;
   el.addEventListener("click", () => {
     responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = i;
     highlightRating(i);
   });
   ratingBar.appendChild(el);
 }
}


function buildChoiceUI(options) {
 choiceOptions.innerHTML = "";
 options.forEach(opt => {
   const el = document.createElement("div");
   el.className = "option-chip";
   el.textContent = opt;
   el.addEventListener("click", () => {
     responses[SURVEY_QUESTIONS[currentQuestionIndex].id] = opt.toLowerCase();
     highlightChoice(opt.toLowerCase());
   });
   choiceOptions.appendChild(el);
 });
}


function highlightRating(score) {
 document.querySelectorAll(".rating-num").forEach(el => {
   el.classList.toggle("selected", parseInt(el.textContent) === score);
 });
}


function highlightChoice(value) {
 document.querySelectorAll(".option-chip").forEach(el => {
   el.classList.toggle("selected", el.textContent.toLowerCase() === value.toLowerCase());
 });
}


function setStatus(state, text) {
 voiceStatus.className = `voice-status status-${state}`;
 statusText.textContent = text;
}


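// Event Listeners
// Engines are created once. Restarting the survey reuses them; re-running
// initEngines() would subscribe duplicate workers and fire every callback twice.
let enginesReady = false;

btnStart.addEventListener("click", async () => {
 btnStart.disabled = true;
 btnStart.textContent = "Loading voice engines…";


 try {
   if (!enginesReady) {
     await initEngines();
     enginesReady = true;
   }
   startSurvey();
 } catch (error) {
   console.error("Failed to initialize:", error);
   btnStart.textContent = "Error — check console";
 }
});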


btnNext.addEventListener("click", goToNext);
btnBack.addEventListener("click", goToPrevious);
btnRestart.addEventListener("click", () => {
 endScreen.style.display = "none";
 startScreen.style.display = "flex";
 btnStart.disabled = false;
 btnStart.textContent = "Start Survey";
});


// Cleanup
window.addEventListener("beforeunload", async () => {
 try {
   await WebVoiceProcessor.WebVoiceProcessor.reset();
   if (cobra) { await cobra.release(); await cobra.terminate(); }
   if (porcupine) { await porcupine.release(); await porcupine.terminate(); }
   if (rhino) { await rhino.release(); await rhino.terminate(); }
   if (cheetah) { await cheetah.release(); await cheetah.terminate(); }
 } catch (e) {}
});
</script>
</body>
</html>

Adapt the Survey for Different Use Cases

The data-driven question structure makes it straightforward to adapt this survey for different marketing and feedback scenarios. Update the SURVEY_QUESTIONS array and the Rhino YAML context to match your domain.

Event Feedback: Replace the questions with event-specific prompts — "How would you rate today's event?", "Which session was most valuable?", "What topics would you like to see next time?" Add session names as slot values so respondents can reference specific sessions by voice.

Patient Satisfaction: In healthcare settings, on-device processing simplifies compliance. No patient audio leaves the device. Questions can include satisfaction with wait times, care quality, and staff communication. The open-ended follow-up captures qualitative feedback that structured scales miss.

In-Product Feedback: Embed the survey inside a web application and trigger it after a user completes a workflow. Speaking an answer is typically faster than typing one, and all processing stays client-side.

Kiosk / Point-of-Sale: A tablet at checkout asks three quick questions after purchase. Because speech is processed on the device, the terminal keeps working without reliable Wi-Fi once the page and models have loaded.

For each use case, the changes are the same: update the questions array, update the YAML with new intents and slots, retrain the Rhino Speech-to-Intent context, and add handling for any new intent names in the handleSurveyIntent function.
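As a rough sketch of the first step, the event-feedback variant could look like the array below. The intent names (rateEvent, pickSession), slot names, and session titles are hypothetical placeholders; they must match whatever you define in your own Rhino context. The validateQuestions helper is not part of the tutorial code. It is an assumed convenience for catching a question that is missing the fields the survey code reads.

```javascript
// Hypothetical event-feedback questions. Intent and slot names are placeholders
// and must match the intents and slots in your own Rhino context.
const EVENT_QUESTIONS = [
  { id: "eventRating", type: "rating", text: "How would you rate today's event?",
    responseMode: "intent", intentName: "rateEvent", slotName: "score" },
  { id: "bestSession", type: "choice", text: "Which session was most valuable?",
    options: ["Keynote", "Workshop", "Panel"],
    responseMode: "intent", intentName: "pickSession", slotName: "session" },
  { id: "nextTopics", type: "open", text: "What topics would you like to see next time?",
    responseMode: "transcription" }
];

// Returns a list of human-readable problems; an empty list means the array
// has the fields the survey code expects.
function validateQuestions(questions) {
  const problems = [];
  const seenIds = new Set();
  for (const q of questions) {
    if (seenIds.has(q.id)) problems.push(`duplicate id: ${q.id}`);
    seenIds.add(q.id);
    if (q.responseMode === "intent" && (!q.intentName || !q.slotName)) {
      problems.push(`${q.id}: intent questions need intentName and slotName`);
    }
    if ((q.type === "choice" || q.type === "yesno") && !(q.options && q.options.length)) {
      problems.push(`${q.id}: choice questions need options`);
    }
  }
  return problems;
}
```

Running validateQuestions on the array before starting the survey surfaces configuration mistakes early, rather than as a question that silently never accepts an answer.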

Frequently Asked Questions

Is voice survey data private if speech is processed in the browser?

With on-device processing via WebAssembly, microphone audio is processed locally and never leaves the browser. No audio is uploaded, stored, or accessible to any third party. Only the structured response data (e.g., a rating of 4 or a transcribed sentence) is available to your application. This simplifies compliance with GDPR, CCPA, HIPAA, and other privacy regulations.
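To make the last point concrete, here is a minimal sketch of the payload an application might assemble from the responses object before sending it to its own backend. The function name buildSubmission and the payload shape are assumptions for illustration, not part of the tutorial code.

```javascript
// Sketch: assemble a submission from the questions array and the responses object.
// Only answers are included; there is no audio to serialize.
function buildSubmission(questions, responses) {
  return {
    submittedAt: new Date().toISOString(),
    answers: questions.map(q => ({
      id: q.id,
      type: q.type,
      // Unanswered questions are recorded as null rather than omitted.
      value: responses[q.id] ?? null
    }))
  };
}
```

The resulting object can be serialized with JSON.stringify and posted to whatever endpoint your application uses.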

Can a voice survey work offline on a kiosk or tablet?

Yes. After the initial page load downloads the WebAssembly models, all speech processing runs locally with no network requests. This makes voice surveys suitable for kiosk deployments, retail terminals, and locations with unreliable connectivity. The Picovoice AccessKey requires an initial online validation, after which it caches for offline use.

How accurate is speech recognition for collecting NPS scores and ratings by voice?

For structured responses like NPS scores and multiple-choice answers, speech-to-intent is generally more accurate than transcribing speech and then parsing the text. Because every valid response is defined in advance, the model only needs to distinguish between known options, not transcribe arbitrary speech. It cannot return a value outside the defined set, which rules out invalid responses such as "for" in place of "four" and makes misheard values less likely.
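The same closed-set idea can be applied on the application side as a defensive check. The helper below is a sketch, not part of the tutorial code: matchOption is a hypothetical name, and it assumes the slot values in your context are spelled like the entries in the question's options array, ignoring case.

```javascript
// Sketch: accept a slot value only if it matches one of the question's options.
// Returns the option as displayed in the UI, or null when nothing matches.
function matchOption(slotValue, options) {
  if (typeof slotValue !== "string") return null;
  const wanted = slotValue.trim().toLowerCase();
  return options.find(opt => opt.toLowerCase() === wanted) ?? null;
}
```

A null result would indicate that the questions array and the speech-to-intent context have drifted apart, which is worth logging rather than storing.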

Can I customize a voice survey for different industries or question types?

Yes. The survey is data-driven — update the questions array and the speech-to-intent YAML context to match your domain. For healthcare patient satisfaction, add medical terminology. For event feedback, add session names as slot values. For product research, add product-specific terms. Open-ended questions use general speech-to-text and require no YAML changes.