Build a Voice-Powered Inspection Form for the Web

🎯 Voice AI Consulting

Get dedicated support and consultation to ensure your specific needs are met.

TLDR: Build a voice-powered inspection form for the web with hands-free voice form filling and voice data entry. Structured voice commands map directly to form fields without an LLM, and all processing runs locally in the browser via WebAssembly. Adapt this template for inspection reporting, safety audits, maintenance logs, and insurance documentation.

Hands-Free Voice Data Entry for Inspections and Field Reporting

Inspection and field reporting workflows often require capturing structured data while hands are busy, gloved, or focused on equipment. To reduce this friction, users can fill forms by voice, setting dropdowns, toggling checklist items, and dictating notes in real time.

This guide shows how to build a voice-powered inspection form for the web that combines:

Voice activity detection to automatically activate voice input when speech is detected
Spoken language understanding for deterministic voice data entry without routing commands through an LLM pipeline
Keyword detection for form actions like starting notes, clearing form, and submitting form
Streaming speech-to-text for free-form voice notes

All speech processing runs locally in the browser using WebAssembly, so microphone audio does not need to be streamed to a cloud speech API. This eliminates network latency for real-time voice input and helps keep voice data private and GDPR/CCPA compliant in regulated environments.

What You'll Build

As a working example, this tutorial builds a roof inspection reporting form that can be completed entirely by voice. The resulting interface supports:

Hands-free voice data entry activated automatically by voice activity detection
Multi-checkbox toggling from a single voice command
Keyword-triggered actions for starting notes, clearing the form, and submitting the form
Real-time dictation for free-form voice notes
Full inspection completion without any keyboard interaction

This same architecture applies to field inspections, equipment maintenance logs, insurance claims, safety audits, healthcare intake workflows, and any form with structured fields and free-form notes.

What You'll Need

Node.js (download page)
Picovoice AccessKey from the Picovoice Console
A microphone-equipped device for testing (laptop or desktop)

How to Fill Web Forms by Voice without an LLM

For inspection reporting, the goal is simple: user speech should map to the correct form field, and notes should stay open-ended. Many voice-enabled forms take a speech-to-text first approach, then use natural language understanding (NLU) or a large language model (LLM) to map the transcript into dropdown values, checkboxes, and select fields. This pipeline adds extra steps (transcription + parsing), which increase latency and create avoidable edge cases like misheard values, invalid dropdown options, or routing a value to the wrong field.

For voice form filling with known fields and fixed option values, an LLM is unnecessary overhead. A cleaner pattern is speech-to-intent, which extracts intent and slot values directly from audio without any intermediate text:

Voice input: "set priority to urgent"
JSON output: { intent: "setPriority", slots: { priority: "urgent" } }

Since valid commands and field values are defined in advance, results are deterministic — the same command always produces the same structured output. This tutorial uses speech-to-intent for all structured fields (dropdowns, checkboxes) and reserves streaming speech-to-text for free-form notes, where open-ended input is expected. To go further, a local LLM like picoLLM can process the captured notes for summarization, report generation, and more advanced features.

Voice Inspection Form System Architecture

The application uses four specialized voice engines working together through a shared audio stream:

Cobra Voice Activity Detection: Monitors the microphone continuously and activates spoken language understanding or speech-to-intent when voice probability exceeds a threshold.
Rhino Speech-to-Intent: Activates when Cobra detects speech, outputting intent and slot values directly from user audio, achieving 97%+ accuracy in noisy, real-world environments.
Porcupine Wake Word: Listens continuously for action keywords e.g., "Start Notes", "Clear Form", and "Submit Form" to trigger the corresponding form action when detected.
Cheetah Streaming Speech-to-Text: Transcribes open-ended notes in real time, appending words to the notes field as the inspector speaks.

All four voice engines share a single microphone stream through Web Voice Processor.

Set Up the Web Voice Form Project

Initialize a new project and install the required packages:

mkdir voice-inspection-form && cd voice-inspection-form
npm init -y

Install the speech SDKs and a local development server:

npm install http-server @picovoice/cobra-web @picovoice/porcupine-web @picovoice/rhino-web @picovoice/cheetah-web @picovoice/web-voice-processor

@picovoice/cobra-web: Cobra Voice Activity Detection SDK
@picovoice/porcupine-web: Porcupine Wake Word detection SDK
@picovoice/rhino-web: Rhino Speech-to-Intent SDK
@picovoice/cheetah-web: Cheetah Streaming Speech-to-Text SDK
@picovoice/web-voice-processor: Voice Processor SDK for microphone audio
http-server: Local server for testing

Train Custom Wake Words for Voice Form

Porcupine Wake Word detects three action keywords that control the form: "Start Notes", "Clear Form", and "Submit Form". Each keyword is trained as a separate .ppn model.

Sign up for a Picovoice Console account and navigate to the Porcupine page.
Enter your keyword such as "Start Notes" and test it using the microphone button.
Click "Train", select "Web (WASM)" as the target platform, and download the .ppn model file in the project root.
Repeat steps 2 & 3 for additional keywords:
- "Clear Form"
- "Submit Form"

For tips on designing effective keywords, review the choosing a wake word guide.

Define Voice Commands for the Inspection Form

Rhino Speech-to-Intent needs a context that maps spoken phrases to intents and slot values. Unlike an LLM prompt, this context is deterministic: every phrase you define will always produce the same structured output.

In the Rhino section of Picovoice Console, create a new context for your voice form.
Click the "Import YAML" button in the top-right corner of the Console. Paste the YAML provided below to add the inspection form voice commands.
Train the context for the "Web (WASM)" platform and download the .rhn model file.
Download the Rhino default model (rhino_params.pv) and place both files in the project root.

Train Custom Voice Commands to Fill Web Form using YAML Context:

context:
  expressions:
    setInspectionType:
      - (@polite) set (the) inspection (type) to $inspectionType:inspectionType
      - (@polite) make it [a, an] $inspectionType:inspectionType inspection
      - $inspectionType:inspectionType inspection please
      - "[it's, this is] [a, an] $inspectionType:inspectionType inspection"
    setPriority:
      - (@polite) set (the) priority [to, level] $priority:priority
      - (@polite) mark [it, this] (as) $priority:priority priority
      - $priority:priority priority
      - priority is $priority:priority
    setRoofType:
      - (@polite) set (the) roof (type) to $roofType:roofType
      - "[it's, this is] (a) $roofType:roofType roof"
      - roof type is $roofType:roofType
    setCondition:
      - (@polite) set (the) condition to $condition:condition
      - (the) condition is $condition:condition
      - "[it's, this is] in $condition:condition condition"
    toggleDamage:
      - (@polite) [mark, add] $damageType:damageType (damage)
      - "[there is, I see] $damageType:damageType"
      - "[there are, I see] $damageType:damageType and $damageType2:damageType2
        (in the roof)"
      - "[there are, I see] $damageType:damageType $damageType2:damageType2 and
        $damageType3:damageType3"
      - (@polite) [mark, add] $damageType:damageType and $damageType2:damageType2
      - $damageType:damageType (damage)
  slots:
    inspectionType:
      - initial
      - follow up
      - emergency
      - annual
      - pre purchase
      - insurance
    priority:
      - low
      - medium
      - high
      - urgent
    roofType:
      - shingle
      - metal
      - tile
      - flat
      - slate
      - wood shake
    condition:
      - good
      - fair
      - poor
      - critical
    damageType:
      - cracks
      - water damage
      - rust
      - mold
    damageType2:
      - cracks
      - water damage
      - rust
      - mold
    damageType3:
      - cracks
      - water damage
      - rust
      - mold
  macros:
    polite:
      - please
      - can you
      - could you

This speech-to-intent context defines five intents:

Four dropdown intents (setInspectionType, setPriority, setRoofType, setCondition) map voice commands directly to form dropdown values.
One checkbox intent (toggleDamage) toggles damage checkboxes. It supports multiple damage types in a single utterance using separate slot names and slot types (damageType, damageType2, damageType3), so saying "I see water damage and cracks" checks both boxes at once.

The bracket syntax handles natural phrasing variations — "set priority to high", "mark it as high", "high priority", and "priority is high" all resolve to the same intent with the same slot value. You define the vocabulary once, and the voice model handles the matching deterministically. To support additional phrasing, add more expressions to the YAML.

Refer to the Rhino Syntax Cheat Sheet for details on expression syntax, optional words, and slot types.

Download the Streaming Speech-to-Text Model

Cheetah Streaming Speech-to-Text requires a default language model file. Download cheetah_params.pv from the Cheetah repository and place it in the project root.

Create the Inspection Form HTML

Create an index.html file in the project root. The application loads all SDKs from node_modules:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Voice-Powered Roof Inspection Form</title>

  <script src="node_modules/@picovoice/cobra-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/porcupine-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/rhino-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/cheetah-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/web-voice-processor/dist/iife/index.js"></script>
</head>
<body>
  <!-- Form HTML goes here -->
  <script>
    // Voice logic goes here
  </script>
</body>
</html>

The complete HTML and CSS for the form UI is included in the full code at the end of this tutorial.

Add Voice Activity Detection for Automatic Voice Activation

Cobra Voice Activity Detection runs continuously and detects when someone is speaking. When the voice probability crosses a threshold, the system activates Rhino Speech-to-Intent to listen for a command:

const cobra = await CobraWeb.CobraWorker.create(
  ACCESS_KEY,
  cobraVoiceCallback,
);

await WebVoiceProcessor.WebVoiceProcessor.subscribe(cobra);

The callback monitors voice probability and activates Rhino when it detects speech:

let voiceActivityCounter = 0;

function cobraVoiceCallback(voiceProbability) {
  if (isDictating) return;
  if (rhinoActive) return;
  if (Date.now() < keywordCooldownUntil) return;

  if (voiceProbability > VAD_THRESHOLD) {
    voiceActivityCounter++;
    if (voiceActivityCounter >= VAD_FRAMES_REQUIRED) {
      activateRhino();
      voiceActivityCounter = 0;
    }
  } else {
    voiceActivityCounter = 0;
  }
}

The VAD_THRESHOLD and VAD_FRAMES_REQUIRED values prevent false activations from background noise. A threshold of 0.5 in a frame of detected speech provides a responsive activation trigger. The keywordCooldownUntil check prevents Cobra from immediately reactivating Rhino after a Porcupine keyword is detected.

Add Keyword Detection for Form Actions

Porcupine Wake Word listens for three action keywords continuously alongside Cobra Voice Activity Detection. When a keyword is detected, it triggers the corresponding form action.

const porcupineModel = {
  publicPath: "porcupine_params.pv",
};

const keywordModels = [
  { publicPath: "${START_NOTES_PPN}.ppn", label: "startNotes" },
  { publicPath: "${CLEAR_FORM_PPN}.ppn", label: "clearForm" },
  { publicPath: "${SUBMIT_FORM_PPN}.ppn", label: "submitForm" },
];

const porcupine = await PorcupineWeb.PorcupineWorker.create(
  ACCESS_KEY,
  keywordModels,
  porcupineKeywordCallback,
  porcupineModel,
);

await WebVoiceProcessor.WebVoiceProcessor.subscribe(porcupine);

A short cooldown prevents Cobra from immediately reactivating Rhino after a keyword is detected:

const KEYWORD_COOLDOWN_MS = 300;
let keywordCooldownUntil = 0;

async function porcupineKeywordCallback(detection) {
  console.log("Porcupine detected:", detection.label);
  keywordCooldownUntil = Date.now() + KEYWORD_COOLDOWN_MS;

  if (rhinoActive) {
    rhinoActive = false;
    await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino);
  }

  switch (detection.label) {
    case "startNotes":
      if (!isDictating) {
        await startDictation();
      }
      break;
    case "clearForm":
      if (isDictating) await stopDictation();
      cleanKeywordFromNotes();
      resetForm();
      break;
    case "submitForm":
      if (isDictating) await stopDictation();
      cleanKeywordFromNotes();
      submitForm();
      break;
  }
}

When a keyword is detected while Rhino is active, the callback unsubscribes Rhino first to avoid conflicting inferences.

Strip Keywords from Dictated Notes

Since Porcupine Wake Word and Cheetah Streaming Speech-to-Text run simultaneously during dictation, Cheetah may transcribe the keyword phrase (e.g., "submit form") before Porcupine detects it. The cleanKeywordFromNotes function strips these keyword phrases from the end of the notes field:

function cleanKeywordFromNotes() {
  const el = document.getElementById("f-notes");
  const keywords = ["submit form", "submit from", "clear form", "start notes", "stop notes"];
  let text = el.value;
  for (const kw of keywords) {
    const regex = new RegExp("\\s*" + kw + "\\s*\\.?\\s*$", "i");
    text = text.replace(regex, "");
  }
  el.value = text.trimEnd();
}

Fill Form Fields with Voice Commands

When Cobra Voice Activity Detection detects speech, Rhino Speech-to-Intent activates and listens for a voice command. The endpointDurationSec is set to 0.5 seconds for fast responses after the user finishes speaking.

const rhinoModel = {
  publicPath: "rhino_params.pv",
};

const rhinoContext = {
  publicPath: "${CONTEXT_FILE_NAME}.rhn",
};

const rhinoOptions = {
  endpointDurationSec: 0.5,
};

const rhino = await RhinoWeb.RhinoWorker.create(
  ACCESS_KEY,
  rhinoContext,
  rhinoInferenceCallback,
  rhinoModel,
  rhinoOptions,
);

Start Rhino when Cobra detects voice activity, and stop after the inference is finalized:

let rhinoActive = false;

async function activateRhino() {
  if (rhinoActive) return;
  rhinoActive = true;
  await WebVoiceProcessor.WebVoiceProcessor.subscribe(rhino);
}

function rhinoInferenceCallback(inference) {
  if (!inference.isFinalized) return;

  rhinoActive = false;
  WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino);
  console.log("Rhino inference:", JSON.stringify(inference));

  if (inference.isUnderstood) {
    handleIntent(inference.intent, inference.slots);
  }
}

The handleIntent function routes each intent to the correct form field. The toggleDamage intent iterates over all damage-related slot values and toggles each one:

function handleIntent(intent, slots) {
  console.log("handleIntent:", intent, "slots:", JSON.stringify(slots));

  switch (intent) {
    case "setInspectionType":
      if (slots.inspectionType) setSelectField("f-inspectionType", slots.inspectionType);
      break;
    case "setPriority":
      if (slots.priority) setSelectField("f-priority", slots.priority);
      break;
    case "setRoofType":
      if (slots.roofType) setSelectField("f-roofType", slots.roofType);
      break;
    case "setCondition":
      if (slots.condition) setSelectField("f-condition", slots.condition);
      break;
    case "toggleDamage":
      const damageKeys = Object.keys(slots);
      console.log("Damage slot keys:", damageKeys);
      for (const key of damageKeys) {
        toggleDamageByVoice(slots[key]);
      }
      break;
  }
}

function setSelectField(id, val) {
  const el = document.getElementById(id);
  for (const o of el.options) {
    if (o.value.toLowerCase() === val.toLowerCase()) {
      el.value = o.value;
      return;
    }
  }
}

For example, saying "I see water damage and cracks" activates Rhino Speech-to-Intent via Cobra Voice Activity Detection, which toggles both checkboxes at once and returns:

{ intent: "toggleDamage", slots: { damageType: "water damage", damageType2: "cracks" } }

Saying "set priority to urgent" updates the priority dropdown and returns:

{ intent: "setPriority", slots: { priority: "urgent" } }

Add Real-Time Speech-to-Text for Voice Notes

Cheetah Streaming Speech-to-Text handles free-form voice notes. It requires a default language model file:

const cheetahModel = {
  publicPath: "cheetah_params.pv",
};

const cheetah = await CheetahWeb.CheetahWorker.create(
  ACCESS_KEY,
  cheetahTranscriptCallback,
  cheetahModel,
);

The callback appends transcribed text to the notes field in real time:

function cheetahTranscriptCallback(cheetahTranscript) {
  if (!isDictating) return;
  const el = document.getElementById("f-notes");
  if (cheetahTranscript.transcript) {
    el.value += cheetahTranscript.transcript;
  }
  if (cheetahTranscript.isEndpoint) {
    el.value += "\n";
  }
}

Dictation is controlled with explicit start and stop functions. When stopping, cheetah.flush() is called to capture any remaining buffered audio:

async function startDictation() {
  if (isDictating) return;
  isDictating = true;
  await WebVoiceProcessor.WebVoiceProcessor.subscribe(cheetah);
}

async function stopDictation() {
  if (!isDictating) return;
  isDictating = false;
  const flushed = await cheetah.flush();
  await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(cheetah);

  const el = document.getElementById("f-notes");
  if (flushed && flushed.transcript) {
    el.value += flushed.transcript;
  }
  el.value = el.value.trimEnd();
}

The inspector starts notes by saying "Start Notes". Saying "Clear Form" or "Submit Form" while notes are active automatically stops dictation first and cleans any keyword text from the notes, since Porcupine keywords are always listening.

Complete Code Example: Voice Inspection Form

Here is the complete index.html with all HTML, CSS, and JavaScript:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Voice-Powered Roof Inspection Form</title>

  <script src="node_modules/@picovoice/cobra-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/porcupine-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/rhino-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/cheetah-web/dist/iife/index.js"></script>
  <script src="node_modules/@picovoice/web-voice-processor/dist/iife/index.js"></script>

  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">

  <style>
    *, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }

    :root {
      --bg: #f8f9fb;
      --surface: #ffffff;
      --border: #e5e7eb;
      --border-strong: #d1d5db;
      --text: #111827;
      --text-secondary: #6b7280;
      --text-muted: #9ca3af;
      --accent: #4f46e5;
      --accent-light: #eef2ff;
      --accent-hover: #4338ca;
      --green: #059669;
      --green-light: #ecfdf5;
      --blue: #2563eb;
      --blue-light: #eff6ff;
      --red: #dc2626;
      --red-light: #fef2f2;
      --orange: #ea580c;
      --orange-light: #fff7ed;
      --radius: 10px;
      --radius-lg: 16px;
      --shadow: 0 1px 3px rgba(0,0,0,0.06), 0 1px 2px rgba(0,0,0,0.04);
      --shadow-lg: 0 10px 25px rgba(0,0,0,0.08);
    }

    body {
      font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
      background: var(--bg);
      color: var(--text);
      min-height: 100vh;
      line-height: 1.5;
      -webkit-font-smoothing: antialiased;
    }

    /* ── Header ── */
    .header {
      background: var(--surface);
      border-bottom: 1px solid var(--border);
      padding: 16px 24px;
      display: flex; align-items: center; justify-content: space-between;
    }
    .header-left { display: flex; align-items: center; gap: 12px; }
    .header-icon {
      width: 36px; height: 36px; border-radius: 10px;
      background: var(--accent-light); color: var(--accent);
      display: flex; align-items: center; justify-content: center;
      font-size: 18px;
    }
    .header h1 { font-size: 16px; font-weight: 700; letter-spacing: -0.3px; }
    .header-date { font-size: 13px; color: var(--text-muted); }

    /* ── Floating Mic Visualizer ── */
    .mic-fab {
      position: fixed; bottom: 28px; right: 28px; z-index: 200;
      width: 56px; height: 56px; border-radius: 50%;
      background: var(--surface); border: 2px solid var(--border);
      display: flex; align-items: center; justify-content: center;
      box-shadow: var(--shadow-lg);
      transition: all 0.3s;
    }
    .mic-fab.listening {
      border-color: var(--blue);
      box-shadow: 0 0 0 6px rgba(37,99,235,0.12), var(--shadow-lg);
    }
    .mic-fab.dictating {
      border-color: var(--blue);
      box-shadow: 0 0 0 6px rgba(37,99,235,0.15), var(--shadow-lg);
      animation: mic-pulse 1.5s ease-in-out infinite;
    }
    .mic-fab.error { border-color: var(--red); }

    @keyframes mic-pulse {
      0%, 100% { box-shadow: 0 0 0 6px rgba(37,99,235,0.15), var(--shadow-lg); }
      50% { box-shadow: 0 0 0 12px rgba(37,99,235,0.08), var(--shadow-lg); }
    }

    .mic-icon {
      width: 22px; height: 22px; transition: transform 0.2s;
      color: var(--text-muted);
    }
    .mic-fab.listening .mic-icon { color: var(--blue); transform: scale(1.1); }
    .mic-fab.dictating .mic-icon { color: var(--blue); transform: scale(1.1); }

    .mic-rings {
      position: absolute; inset: -4px; border-radius: 50%;
      border: 2px solid transparent; transition: all 0.15s;
    }
    .mic-fab.listening .mic-rings { border-color: rgba(37,99,235,0.2); }
    .mic-fab.dictating .mic-rings { border-color: rgba(37,99,235,0.25); }

    .mic-vad {
      position: absolute; bottom: -10px; left: 50%; transform: translateX(-50%);
      width: 32px; height: 3px; border-radius: 2px;
      background: var(--border); overflow: hidden;
    }
    .mic-vad-fill {
      height: 100%; width: 0%; border-radius: 2px;
      background: var(--blue); transition: width 0.08s;
    }

    /* ── Layout ── */
    .container { max-width: 720px; margin: 0 auto; padding: 24px 20px 100px; }

    /* ── Section ── */
    .section {
      background: var(--surface);
      border: 1px solid var(--border);
      border-radius: var(--radius-lg);
      margin-bottom: 16px;
      box-shadow: var(--shadow);
      overflow: hidden;
    }
    .section-header {
      padding: 16px 20px 12px;
      display: flex; align-items: center; justify-content: space-between;
    }
    .section-title {
      font-size: 15px; font-weight: 700; color: var(--text);
      display: flex; align-items: center; gap: 8px;
    }
    .section-body { padding: 0 20px 20px; }

    /* ── Form Fields ── */
    .field { margin-bottom: 16px; }
    .field:last-child { margin-bottom: 0; }

    .field-label {
      display: flex; align-items: center; justify-content: space-between;
      margin-bottom: 6px;
    }
    .field-label span {
      font-size: 13px; font-weight: 600; color: var(--text);
    }

    select, textarea {
      width: 100%; padding: 10px 14px;
      font-family: 'Inter', sans-serif; font-size: 14px;
      color: var(--text); background: var(--bg);
      border: 1.5px solid var(--border); border-radius: var(--radius);
      outline: none; transition: all 0.15s;
      appearance: none;
      background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 12 12'%3E%3Cpath d='M3 4.5L6 7.5L9 4.5' stroke='%239ca3af' stroke-width='1.5' fill='none' stroke-linecap='round'/%3E%3C/svg%3E");
      background-repeat: no-repeat;
      background-position: right 12px center;
    }
    textarea {
      background-image: none;
      resize: vertical; min-height: 100px; line-height: 1.6;
    }
    select:focus, textarea:focus {
      border-color: var(--accent);
      background-color: var(--surface);
      box-shadow: 0 0 0 3px rgba(79,70,229,0.08);
    }
    textarea::placeholder { color: var(--text-muted); }

    .voice-filled {
      border-color: var(--green) !important;
      background-color: var(--green-light) !important;
      animation: field-flash 0.3s ease;
    }
    @keyframes field-flash {
      0% { box-shadow: 0 0 0 4px rgba(5,150,105,0.15); }
      100% { box-shadow: 0 0 0 0 rgba(5,150,105,0); }
    }

    /* ── Damage Grid ── */
    .damage-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 8px; }
    .damage-item {
      display: flex; align-items: center; gap: 8px;
      padding: 10px 12px; background: var(--bg);
      border: 1.5px solid var(--border); border-radius: var(--radius);
      font-size: 13px; font-weight: 500;
      cursor: pointer; transition: all 0.15s; user-select: none;
    }
    .damage-item:hover { border-color: var(--border-strong); background: var(--surface); }
    .damage-item.checked {
      border-color: var(--accent); background: var(--accent-light); color: var(--accent);
      animation: field-flash 0.5s ease;
    }
    .damage-item input[type="checkbox"] {
      width: 0; height: 0; opacity: 0; position: absolute; pointer-events: none;
    }
    .damage-check {
      width: 16px; height: 16px;
      border: 1.5px solid var(--border-strong); border-radius: 4px;
      display: flex; align-items: center; justify-content: center;
      flex-shrink: 0; font-size: 10px; color: transparent; transition: all 0.15s;
    }
    .damage-item.checked .damage-check {
      background: var(--accent); border-color: var(--accent); color: white;
    }

    /* ── Toast ── */
    .toast-container {
      position: fixed; top: 16px; left: 50%; transform: translateX(-50%);
      z-index: 300; display: flex; flex-direction: column; gap: 8px; align-items: center;
    }
    .toast {
      background: var(--text); color: white;
      padding: 10px 20px; border-radius: var(--radius);
      font-size: 13px; font-weight: 600;
      opacity: 0; transform: translateY(-8px);
      transition: all 0.20s; pointer-events: none;
      white-space: nowrap; box-shadow: var(--shadow-lg);
    }
    .toast.show { opacity: 1; transform: translateY(0); }
    .toast.success { background: var(--green); }
    .toast.action { background: var(--accent); }

    /* ── Buttons ── */
    .form-actions {
      display: flex; gap: 10px; margin-top: 4px;
    }
    button {
      font-family: 'Inter', sans-serif; font-size: 14px; font-weight: 600;
      cursor: pointer; border: none; border-radius: var(--radius);
      padding: 11px 22px; transition: all 0.15s;
    }
    .btn-primary {
      background: var(--accent); color: white; flex: 1;
    }
    .btn-primary:hover { background: var(--accent-hover); }
    .btn-secondary {
      background: var(--surface); color: var(--text);
      border: 1.5px solid var(--border);
    }
    .btn-secondary:hover { background: var(--bg); }

    /* ── Action Chips ── */
    .actions-bar {
      display: flex; gap: 8px; margin-bottom: 16px;
    }
    .action-chip {
      display: flex; align-items: center; gap: 6px;
      padding: 8px 14px; border-radius: 100px;
      background: var(--surface); border: 1.5px solid var(--border);
      font-size: 12px; font-weight: 600; color: var(--text-secondary);
      cursor: default; transition: all 0.15s;
    }
    .action-chip .chip-icon { font-size: 14px; }
    .action-chip .chip-key {
      font-family: 'JetBrains Mono', monospace;
      font-size: 10px; font-weight: 500;
      background: var(--bg); padding: 1px 6px; border-radius: 4px;
      color: var(--text-muted);
    }

    /* ── Confirmation Page ── */
    .confirmation {
      display: none; text-align: center; padding: 48px 24px;
    }
    .confirmation.show { display: block; }
    .confirmation-icon {
      width: 64px; height: 64px; border-radius: 50%;
      background: var(--green-light); color: var(--green);
      display: inline-flex; align-items: center; justify-content: center;
      font-size: 32px; margin-bottom: 16px;
    }
    .confirmation h2 {
      font-size: 20px; font-weight: 700; margin-bottom: 4px;
    }
    .confirmation .subtitle {
      font-size: 14px; color: var(--text-secondary); margin-bottom: 28px;
    }
    .summary-card {
      background: var(--surface); border: 1px solid var(--border);
      border-radius: var(--radius-lg); text-align: left;
      max-width: 480px; margin: 0 auto 20px; overflow: hidden;
      box-shadow: var(--shadow);
    }
    .summary-card-title {
      padding: 12px 20px; border-bottom: 1px solid var(--border);
      font-size: 12px; font-weight: 700; text-transform: uppercase;
      letter-spacing: 0.6px; color: var(--text-muted); background: var(--bg);
    }
    .summary-row {
      display: flex; justify-content: space-between; align-items: baseline;
      padding: 10px 20px; border-bottom: 1px solid var(--border);
      font-size: 14px;
    }
    .summary-row:last-child { border-bottom: none; }
    .summary-label { color: var(--text-secondary); font-size: 13px; }
    .summary-value { font-weight: 600; text-align: right; max-width: 60%; }
    .summary-value.empty { color: var(--text-muted); font-weight: 400; font-style: italic; }
    .summary-notes {
      padding: 12px 20px; font-size: 13px; line-height: 1.6;
      white-space: pre-wrap; word-break: break-word;
    }
    .summary-notes.empty { color: var(--text-muted); font-style: italic; }

    @media (max-width: 640px) {
      .damage-grid { grid-template-columns: 1fr 1fr; }
      .container { padding: 16px 16px 100px; }
      .actions-bar { flex-wrap: wrap; }
    }
  </style>
</head>
<body>

<div class="toast-container" id="toastContainer"></div>

<!-- Floating Mic -->
<div class="mic-fab" id="micFab">
  <div class="mic-rings"></div>
  <svg class="mic-icon" id="micIcon" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
    <rect x="9" y="1" width="6" height="12" rx="3"/>
    <path d="M19 10v1a7 7 0 0 1-14 0v-1"/>
    <line x1="12" y1="19" x2="12" y2="23"/>
    <line x1="8" y1="23" x2="16" y2="23"/>
  </svg>
  <div class="mic-vad"><div id="vadFill" class="mic-vad-fill"></div></div>
</div>

<div class="header">
  <div class="header-left">
    <div class="header-icon">🏠</div>
    <h1>Roof Inspection Form</h1>
  </div>
  <span class="header-date" id="dateDisplay"></span>
</div>

<div class="container">
<div id="formView">

  <!-- Voice Action Chips -->
  <div class="actions-bar">
    <div class="action-chip"><span class="chip-icon">📝</span> Notes <span class="chip-key">say "Start Notes"</span></div>
    <div class="action-chip"><span class="chip-icon">🗑</span> Reset <span class="chip-key">say "Clear Form"</span></div>
    <div class="action-chip"><span class="chip-icon">✓</span> Done <span class="chip-key">say "Submit Form"</span></div>
  </div>

  <!-- Assessment -->
  <div class="section">
    <div class="section-header">
      <span class="section-title">Assessment</span>
    </div>
    <div class="section-body">
        <div class="field">
          <div class="field-label"><span>Inspection Type</span></div>
          <select id="f-inspectionType">
            <option value="">Select type…</option>
            <option value="initial">Initial</option>
            <option value="follow up">Follow-up</option>
            <option value="emergency">Emergency</option>
            <option value="annual">Annual</option>
            <option value="pre purchase">Pre-purchase</option>
            <option value="insurance">Insurance Claim</option>
          </select>
        </div>
        <div class="field">
          <div class="field-label"><span>Priority</span></div>
          <select id="f-priority">
            <option value="">Select priority…</option>
            <option value="low">Low</option>
            <option value="medium">Medium</option>
            <option value="high">High</option>
            <option value="urgent">Urgent</option>
          </select>
        </div>
        <div class="field">
          <div class="field-label"><span>Roof Type</span></div>
          <select id="f-roofType">
            <option value="">Select roof type…</option>
            <option value="shingle">Asphalt Shingle</option>
            <option value="metal">Metal</option>
            <option value="tile">Tile</option>
            <option value="flat">Flat / Built-up</option>
            <option value="slate">Slate</option>
            <option value="wood shake">Wood Shake</option>
          </select>
        </div>
        <div class="field">
          <div class="field-label"><span>Condition</span></div>
          <select id="f-condition">
            <option value="">Select condition…</option>
            <option value="good">Good</option>
            <option value="fair">Fair</option>
            <option value="poor">Poor</option>
            <option value="critical">Critical</option>
          </select>
        </div>
    </div>
  </div>

  <!-- Damage -->
  <div class="section">
    <div class="section-header">
      <span class="section-title">Damage Observed</span>
    </div>
    <div class="section-body">
      <div class="damage-grid">
        <div class="damage-item" onclick="toggleDmg(this)"><input type="checkbox" value="cracks"><span class="damage-check">✓</span> Cracks</div>
        <div class="damage-item" onclick="toggleDmg(this)"><input type="checkbox" value="water damage"><span class="damage-check">✓</span> Water Damage</div>
        <div class="damage-item" onclick="toggleDmg(this)"><input type="checkbox" value="rust"><span class="damage-check">✓</span> Rust</div>
        <div class="damage-item" onclick="toggleDmg(this)"><input type="checkbox" value="mold"><span class="damage-check">✓</span> Mold</div>
      </div>
    </div>
  </div>

  <!-- Notes -->
  <div class="section">
    <div class="section-header">
      <span class="section-title">Inspector Notes</span>
    </div>
    <div class="section-body">
      <textarea id="f-notes" placeholder="Dictate your findings or type here…"></textarea>
    </div>
  </div>

  <div class="form-actions">
    <button class="btn-secondary" onclick="resetForm()">Clear</button>
    <button class="btn-primary" onclick="submitForm()">Submit Inspection →</button>
  </div>
</div>

  <!-- Confirmation -->
  <div id="confirmView" class="confirmation">
    <div class="confirmation-icon">✓</div>
    <h2>Inspection Submitted</h2>
    <p class="subtitle">Your report has been recorded.</p>

    <div class="summary-card">
      <div class="summary-card-title">Summary</div>
      <div id="summaryRows"></div>
    </div>

    <div class="summary-card" id="summaryNotesCard" style="display:none;">
      <div class="summary-card-title">Notes</div>
      <div id="summaryNotes" class="summary-notes"></div>
    </div>

    <button class="btn-primary" onclick="startNewInspection()" style="margin-top: 8px; max-width: 480px;">New Inspection →</button>
  </div>
</div>

<script>
const ACCESS_KEY = "${YOUR_ACCESS_KEY_HERE}";

const VAD_THRESHOLD = 0.5;
const VAD_FRAMES_REQUIRED = 1;
const KEYWORD_COOLDOWN_MS = 300;

let cobra = null;
let porcupine = null;
let rhino = null;
let cheetah = null;
let isDictating = false;
let rhinoActive = false;
let voiceActivityCounter = 0;
let keywordCooldownUntil = 0;

document.getElementById("dateDisplay").textContent =
  new Date().toLocaleDateString("en-US", { weekday: "short", month: "short", day: "numeric", year: "numeric" });

// ── UI Helpers ──
function setMicState(state) {
  const fab = document.getElementById("micFab");
  fab.className = "mic-fab" + (state ? " " + state : "");
}

function showToast(message, type) {
  const container = document.getElementById("toastContainer");
  const toast = document.createElement("div");
  toast.className = "toast " + (type || "");
  toast.textContent = message;
  container.appendChild(toast);
  requestAnimationFrame(() => toast.classList.add("show"));
  setTimeout(() => {
    toast.classList.remove("show");
    setTimeout(() => toast.remove(), 300);
  }, 1200);
}

function flashField(id) {
  const el = document.getElementById(id);
  el.classList.remove("voice-filled");
  void el.offsetWidth;
  el.classList.add("voice-filled");
  setTimeout(() => el.classList.remove("voice-filled"), 2000);
}

function toggleDmg(el) {
  el.classList.toggle("checked");
  const cb = el.querySelector("input[type='checkbox']");
  if (cb) cb.checked = el.classList.contains("checked");
}

// ── Initialize all engines ──
async function startApp() {
  try {
    // 1. Cobra VAD
    cobra = await CobraWeb.CobraWorker.create(ACCESS_KEY, cobraVoiceCallback);

    // 2. Porcupine for action keywords
    const porcupineModel = { publicPath: "porcupine_params.pv" };
    const keywordModels = [
      { publicPath: "${START_NOTES_PPN}.ppn", label: "startNotes" },
      { publicPath: "${CLEAR_FORM_PPN}.ppn", label: "clearForm" },
      { publicPath: "${SUBMIT_FORM_PPN}.ppn", label: "submitForm" },
    ];
    porcupine = await PorcupineWeb.PorcupineWorker.create(
      ACCESS_KEY, keywordModels, porcupineKeywordCallback, porcupineModel);

    // 3. Rhino Speech-to-Intent
    const rhinoModel = { publicPath: "rhino_params.pv" };
    const rhinoContext = { publicPath: "${CONTEXT_FILE_NAME}.rhn" };
    const rhinoOptions = { endpointDurationSec: 0.5 };
    rhino = await RhinoWeb.RhinoWorker.create(
      ACCESS_KEY, rhinoContext, rhinoInferenceCallback, rhinoModel, rhinoOptions);

    // 4. Cheetah Streaming STT
    const cheetahModel = { publicPath: "cheetah_params.pv" };
    cheetah = await CheetahWeb.CheetahWorker.create(
      ACCESS_KEY, cheetahTranscriptCallback, cheetahModel);

    // Subscribe always-on engines
    await WebVoiceProcessor.WebVoiceProcessor.subscribe(cobra);
    await WebVoiceProcessor.WebVoiceProcessor.subscribe(porcupine);
    setMicState("");
  } catch (err) {
    console.error("Init error:", err);
    setMicState("error");
  }
}

// ── Cobra VAD ──
function cobraVoiceCallback(voiceProbability) {
  const fill = document.getElementById("vadFill");
  fill.style.width = (voiceProbability * 100) + "%";

  if (isDictating) {
    setMicState("dictating");
  } else if (voiceProbability > VAD_THRESHOLD) {
    setMicState("listening");
  } else if (!rhinoActive) {
    setMicState("");
  }

  if (isDictating) return;
  if (rhinoActive) return;
  if (Date.now() < keywordCooldownUntil) return;

  if (voiceProbability > VAD_THRESHOLD) {
    voiceActivityCounter++;
    if (voiceActivityCounter >= VAD_FRAMES_REQUIRED) {
      activateRhino();
      voiceActivityCounter = 0;
    }
  } else {
    voiceActivityCounter = 0;
  }
}

// ── Porcupine Keywords ──
async function porcupineKeywordCallback(detection) {
  console.log("Porcupine detected:", detection.label);
  keywordCooldownUntil = Date.now() + KEYWORD_COOLDOWN_MS;

  if (rhinoActive) {
    rhinoActive = false;
    await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino);
  }

  switch (detection.label) {
    case "startNotes":
      if (!isDictating) {
        showToast("📝 Notes started", "action");
        await startDictation();
      }
      break;
    case "clearForm":
      if (isDictating) await stopDictation();
      cleanKeywordFromNotes();
      if (document.getElementById("confirmView").classList.contains("show")) {
        startNewInspection();
      }
      resetForm();
      showToast("🗑 Form cleared", "");
      setMicState("");
      break;
    case "submitForm":
      if (isDictating) await stopDictation();
      cleanKeywordFromNotes();
      submitForm();
      showToast("✅ Form submitted", "success");
      break;
  }
}

function cleanKeywordFromNotes() {
  const el = document.getElementById("f-notes");
  const keywords = ["submit form", "submit from", "clear form", "start notes", "stop notes"];
  let text = el.value;
  for (const kw of keywords) {
    const regex = new RegExp("\\s*" + kw + "\\s*\\.?\\s*$", "i");
    text = text.replace(regex, "");
  }
  el.value = text.trimEnd();
}

// ── Rhino Voice Commands ──
async function activateRhino() {
  if (rhinoActive) return;
  rhinoActive = true;
  setMicState("listening");
  await WebVoiceProcessor.WebVoiceProcessor.subscribe(rhino);
}

function rhinoInferenceCallback(inference) {
  if (!inference.isFinalized) return;

  rhinoActive = false;
  WebVoiceProcessor.WebVoiceProcessor.unsubscribe(rhino);
  console.log("Rhino inference:", JSON.stringify(inference));

  if (inference.isUnderstood) {
    handleIntent(inference.intent, inference.slots);
    showToast("✓ " + formatIntent(inference.intent, inference.slots), "success");
  }
  setMicState("");
}

function handleIntent(intent, slots) {
  console.log("handleIntent:", intent, "slots:", JSON.stringify(slots));

  switch (intent) {
    case "setInspectionType":
      if (slots.inspectionType) {
        setSelectField("f-inspectionType", slots.inspectionType);
        flashField("f-inspectionType");
      }
      break;
    case "setPriority":
      if (slots.priority) {
        setSelectField("f-priority", slots.priority);
        flashField("f-priority");
      }
      break;
    case "setRoofType":
      if (slots.roofType) {
        setSelectField("f-roofType", slots.roofType);
        flashField("f-roofType");
      }
      break;
    case "setCondition":
      if (slots.condition) {
        setSelectField("f-condition", slots.condition);
        flashField("f-condition");
      }
      break;
    case "toggleDamage":
      const damageKeys = Object.keys(slots);
      console.log("Damage slot keys:", damageKeys);
      for (const key of damageKeys) {
        toggleDamageByVoice(slots[key]);
      }
      break;
  }
}

function formatIntent(intent, slots) {
  const vals = Object.values(slots).join(", ");
  const labels = {
    setInspectionType: "Inspection type",
    setPriority: "Priority",
    setRoofType: "Roof type",
    setCondition: "Condition",
    toggleDamage: "Damage",
  };
  return (labels[intent] || intent) + ": " + vals;
}

function setSelectField(id, val) {
  const el = document.getElementById(id);
  for (const o of el.options) {
    if (o.value.toLowerCase() === val.toLowerCase()) {
      el.value = o.value;
      return;
    }
  }
}

function toggleDamageByVoice(val) {
  const items = document.querySelectorAll(".damage-item");
  for (const item of items) {
    const cb = item.querySelector("input[type='checkbox']");
    if (cb && cb.value.toLowerCase() === val.toLowerCase()) {
      if (!item.classList.contains("checked")) {
        toggleDmg(item);
      }
      return;
    }
  }
}

// ── Cheetah Notes ──
function cheetahTranscriptCallback(cheetahTranscript) {
  if (!isDictating) return;
  const el = document.getElementById("f-notes");
  if (cheetahTranscript.transcript) {
    el.value += cheetahTranscript.transcript;
  }
  if (cheetahTranscript.isEndpoint) {
    el.value += "\n";
  }
}

async function startDictation() {
  if (isDictating) return;
  isDictating = true;
  await WebVoiceProcessor.WebVoiceProcessor.subscribe(cheetah);
  setMicState("dictating");
}

async function stopDictation() {
  if (!isDictating) return;
  isDictating = false;
  const flushed = await cheetah.flush();
  await WebVoiceProcessor.WebVoiceProcessor.unsubscribe(cheetah);

  const el = document.getElementById("f-notes");
  if (flushed && flushed.transcript) {
    el.value += flushed.transcript;
  }
  el.value = el.value.trimEnd();
  setMicState("");
}

// ── Form ──
function resetForm() {
  document.querySelectorAll("select").forEach(el => el.selectedIndex = 0);
  document.querySelectorAll("textarea").forEach(el => el.value = "");
  document.querySelectorAll(".damage-item").forEach(el => {
    el.classList.remove("checked");
    const cb = el.querySelector("input[type='checkbox']");
    if (cb) cb.checked = false;
  });
}

function submitForm() {
  const data = {};
  ["inspectionType","priority","roofType","condition","notes"]
    .forEach(k => { data[k] = document.getElementById("f-" + k).value; });
  data.damage = [];
  document.querySelectorAll(".damage-item.checked input")
    .forEach(cb => data.damage.push(cb.value));
  console.log("Inspection submitted:", data);

  const labels = {
    inspectionType: "Inspection Type",
    priority: "Priority",
    roofType: "Roof Type",
    condition: "Condition",
    damage: "Damage",
  };
  const rows = document.getElementById("summaryRows");
  rows.innerHTML = "";
  for (const key of ["inspectionType","priority","roofType","condition","damage"]) {
    const val = key === "damage"
      ? (data.damage.length ? data.damage.join(", ") : "")
      : data[key];
    const display = val
      ? val.charAt(0).toUpperCase() + val.slice(1)
      : "Not set";
    rows.innerHTML += '<div class="summary-row"><span class="summary-label">'
      + labels[key] + '</span><span class="summary-value'
      + (val ? "" : " empty") + '">' + display + '</span></div>';
  }

  const notesCard = document.getElementById("summaryNotesCard");
  const notesEl = document.getElementById("summaryNotes");
  if (data.notes.trim()) {
    notesEl.textContent = data.notes.trim();
    notesEl.className = "summary-notes";
    notesCard.style.display = "";
  } else {
    notesEl.textContent = "No notes recorded";
    notesEl.className = "summary-notes empty";
    notesCard.style.display = "";
  }

  document.getElementById("formView").style.display = "none";
  document.getElementById("confirmView").classList.add("show");
}

function startNewInspection() {
  resetForm();
  document.getElementById("confirmView").classList.remove("show");
  document.getElementById("formView").style.display = "";
  window.scrollTo(0, 0);
}

window.addEventListener("DOMContentLoaded", startApp);

window.addEventListener("beforeunload", async () => {
  try {
    await WebVoiceProcessor.WebVoiceProcessor.reset();
    if (cobra) { await cobra.release(); await cobra.terminate(); }
    if (porcupine) { await porcupine.release(); await porcupine.terminate(); }
    if (rhino) { await rhino.release(); await rhino.terminate(); }
    if (cheetah) { await cheetah.release(); await cheetah.terminate(); }
  } catch (e) {}
});
</script>
</body>
</html>

Configure Access Key and Model Files

Open index.html and replace the following placeholders in the <script> block:

${YOUR_ACCESS_KEY_HERE}: Your AccessKey from the Picovoice Console main dashboard.
${START_NOTES_PPN}: Filename of your trained "Start Notes" .ppn model (e.g., start-notes).
${CLEAR_FORM_PPN}: Filename of your trained "Clear Form" .ppn model (e.g., clear-form).
${SUBMIT_FORM_PPN}: Filename of your trained "Submit Form" .ppn model (e.g., submit-form).
${CONTEXT_FILE_NAME}: Filename of your trained Rhino .rhn context (e.g., inspection).

Run the Voice-Powered Inspection Form

Your project directory should now contain:

voice-inspection-form/
├── index.html
├── package.json
├── porcupine_params.pv
├── ${START_NOTES_PPN}.ppn
├── ${CLEAR_FORM_PPN}.ppn
├── ${SUBMIT_FORM_PPN}.ppn
├── rhino_params.pv
├── ${CONTEXT_FILE_NAME}.rhn
├── cheetah_params.pv
└── node_modules/

Start the local server with the required cross-origin headers:

npx http-server -a localhost -p 5000 \
  --cors \
  -c-1 \
  --header "Cross-Origin-Opener-Policy: same-origin" \
  --header "Cross-Origin-Embedder-Policy: require-corp"

Open http://localhost:5000 in your browser. The voice engines initialize automatically on page load.

Voice-Powered Inspection & Reporting: Alternative Use-Cases

The voice AI pipeline of voice activity detection, speech-to-intent, keyword detection, and streaming speech-to-text supports voice form filling and voice data entry for inspection reporting and any workflow with structured fields and free-form notes. To adapt for a different use case, update the Rhino Speech-to-Intent context with your intents and slot values, change the form fields to match, and update the handleIntent function:

Insurance claims: Adjusters speak damage categories, severity levels, and coverage types on site
Construction punch lists: Workers call out defect types, locations, and priority while walking a job site
Equipment maintenance: Technicians log equipment IDs, fault codes, and service actions while testing machinery
Safety audits: Inspectors complete compliance checklists by voice while keeping hands free for measurements and tools
Healthcare intake: Clinicians select symptom categories and severity via voice while evaluating patients

Frequently Asked Questions

How does speech-to-intent differ from using an LLM to parse voice commands?

Speech-to-intent processes audio directly against a predefined context and outputs structured data (intent + slots) without any intermediate text. An LLM-based approach transcribes speech to text first, then sends that text to a language model to extract structured fields. The speech-to-intent approach is deterministic, meaning the same command always produces the same output, and adds no LLM inference latency. LLM-based approaches are more flexible for unconstrained speech but can misroute data or hallucinate values that don't exist in the form. For forms with known fields and known option values, speech-to-intent is more reliable.

Can I add custom voice commands to any web form?

Yes. With Rhino Speech-to-Intent, you define your form's fields as intents and valid values as slots in a YAML context file. For example, a "setPriority" intent with slots like "low", "medium", "high", and "urgent" maps directly to a priority dropdown. Train the model on Picovoice Console, then route each intent to the corresponding form element in JavaScript — dropdowns, checkboxes, radio buttons, or any other input. The same pattern works for voice form filling in any HTML form with known fields and known values.

Can I add domain-specific terms to improve speech-to-text accuracy?

For domain-specific terminology (roofing materials, building codes), you can add custom vocabulary and boost words to improve accuracy for your specific field.