Hands-free Field Reporting

Build a hands-free field reporting app that runs on-device

Voice prompts guide responders through each report field, and responders answer in natural speech. The app captures unit ID, incident type, patient condition, and destination as structured slots, along with free-form narrative dictation. Everything runs entirely on the device.

Platforms supported
Android · iOS · Linux · macOS · Windows · Chrome · Edge · Firefox · Safari · Raspberry Pi
How on-device voice ePCR and field reporting works

Wake word, voice commands, dictation, speech synthesis, and noise suppression

The on-device voice guided reporting pipeline runs five Picovoice SDKs in a loop on the responder's device: Porcupine Wake Word, Rhino Speech-to-Intent, Cheetah Streaming Speech-to-Text, Orca Text-to-Speech, and Koala Noise Suppression. Porcupine listens for the wake word. Koala suppresses sirens, traffic, and ambient field noise. Orca speaks each report field aloud. Rhino captures structured slots — unit ID, incident type, patient condition, destination — directly from speech. Cheetah transcribes free-form narrative dictation. Audio never leaves the device.

[Pipeline diagram] Microphone (responder audio) → Koala noise suppression → Porcupine wake word ("Start report") → Orca text-to-speech prompt ("What's the patient's condition?") → responder speaks the response → Rhino speech-to-intent (report slots) and Cheetah streaming STT (narrative dictation) → structured field report (unit ID + incident + patient condition + narrative) → next prompt in the loop.
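
The following is a minimal Python sketch of that loop, assuming the documented Picovoice Python SDK APIs (pvporcupine, pvkoala, pvrhino, pvcheetah, pvorca, pvrecorder, pvspeaker). The file paths, prompt text, single-slot flow, and frame-size handling are illustrative and simplified, not the recipe's actual code.

import pvcheetah
import pvkoala
import pvorca
import pvporcupine
import pvrhino
from pvrecorder import PvRecorder
from pvspeaker import PvSpeaker

ACCESS_KEY = "${ACCESS_KEY}"  # from the Picovoice Console

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keyword_paths=["start_report.ppn"])
koala = pvkoala.create(access_key=ACCESS_KEY)
rhino = pvrhino.create(access_key=ACCESS_KEY, context_path="field_reporting.rhn")
cheetah = pvcheetah.create(access_key=ACCESS_KEY)
orca = pvorca.create(access_key=ACCESS_KEY)

recorder = PvRecorder(frame_length=porcupine.frame_length)  # 512 samples @ 16 kHz
speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)
speaker.start()

def denoise(frame):
    # Koala's frame length (256) divides the 512-sample recorder frame:
    # suppress noise chunk by chunk and re-join.
    out = []
    for i in range(0, len(frame), koala.frame_length):
        out.extend(koala.process(frame[i:i + koala.frame_length]))
    return out

def speak(text):
    # NOTE: a production app would pause capture during playback
    # so the pipeline does not hear its own prompts.
    pcm, _ = orca.synthesize(text)
    speaker.write(pcm)
    speaker.flush()

recorder.start()
report = {}

# 1. Porcupine: block until the wake word ("Start report") is heard.
while porcupine.process(denoise(recorder.read())) < 0:
    pass

# 2. Orca prompts; Rhino captures the structured slot.
speak("What's the patient's condition?")
while not rhino.process(denoise(recorder.read())):
    pass
inference = rhino.get_inference()
if inference.is_understood:
    report.update(inference.slots)  # e.g. {'condition': 'stable'}

# 3. Cheetah transcribes free-form narrative until an endpoint (silence).
speak("Go ahead with the narrative.")
narrative = ""
while True:
    partial, is_endpoint = cheetah.process(denoise(recorder.read()))
    narrative += partial
    if is_endpoint:
        narrative += cheetah.flush()
        break
report["narrative"] = narrative

recorder.stop()
print(report)

A real app repeats step 2 for every report field (unit ID, incident type, destination) before moving on to the narrative.
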
Why Koala Noise Suppression?

2× more effective at field noise. Same footprint.

17.3×
More effective than RNNoise at 0 dB SNR
5.4×
More effective than RNNoise at 5 dB SNR
4.3×
More effective than RNNoise on average

Ambulance bays, scene perimeters, and field sites are loud. Koala Noise Suppression cleans sirens, traffic, generator hum, and crowd noise before the audio reaches Porcupine, Rhino, and Cheetah. On the STOI benchmark below, Koala leaves roughly half the residual distance to clean speech that RNNoise does at the same compute footprint. That headroom lets the rest of the pipeline run on embedded devices, legacy phones, and rugged tablets from Honeywell or Zebra.

STOI Distance to Clean Speech at 0 dB
Lower is better
Original: 0.232
RNNoise: 0.226
Koala: 0.128
STOI Distance to Clean Speech at 5 dB
Lower is better
Original: 0.156
RNNoise: 0.142
Koala: 0.080
Why Porcupine Wake Word?

Always-on reporting trigger at low CPU and battery cost.

3.8%
Single-Core CPU Utilization on Raspberry Pi 3
97.1%
Accuracy at 1 false alarm per 10 hours
~250K
Custom wake words trained and deployed in 2025

Porcupine Wake Word starts the reporting workflow when the responder says the chosen phrase. Crews can train a branded wake word or always-listening commands, such as “New incident” or “Begin charting”, in the Picovoice Console and deploy them across mobile and embedded devices. Porcupine runs always-on at low CPU and battery cost, so the rest of the pipeline only spins up when needed, which is critical for EMS shifts that run twelve hours on battery.

Wake Word Detection Accuracy
Higher is better
Porcupine: 97.1%
Snowboy: 68%
PocketSphinx: 52%
CPU Utilization
Lower is better
Porcupine: 3.8%
Snowboy: 24.8%
PocketSphinx: 31.8%
Why Rhino Speech-to-Intent?

Structured ePCR slots directly from speech.

6×
Higher accuracy than Big Tech average
97.3%
Accuracy tested across 6 to 24 dB Signal-to-Noise Ratio
Unlimited voice interactions per user

Rhino Speech-to-Intent captures structured ePCR slots, such as unit ID, incident type, patient condition, transport destination, handoff status, and handoff time, directly from speech. Most voice command systems run a two-step pipeline: speech-to-text produces a transcript, then a separate NLU model parses that transcript for intent. Each step accumulates error and compounds latency. Rhino infers intent and typed slot values directly from audio, holding higher accuracy in noisy environments without hallucinations or compounding errors.
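
For instance, a finalized Rhino inference arrives as an intent plus slot values that can be written straight into the report record. A minimal sketch, assuming the pvrhino and pvrecorder Python SDKs; the context file, intent, and slot names are hypothetical:

import pvrhino
from pvrecorder import PvRecorder

rhino = pvrhino.create(access_key="${ACCESS_KEY}", context_path="field_reporting.rhn")
recorder = PvRecorder(frame_length=rhino.frame_length)
recorder.start()
# Feed 16 kHz frames until Rhino decides the utterance is complete.
while not rhino.process(recorder.read()):
    pass
inference = rhino.get_inference()
if inference.is_understood:
    # e.g. intent='setPatientCondition', slots={'condition': 'stable'}
    print(inference.intent, inference.slots)
recorder.stop()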

Voice Command Acceptance Accuracy
Higher is better
Rhino: 97.3%
Amazon Lex: 84.3%
Google Dialogflow: 77.3%
Voice Command Acceptance Accuracy at 21 dB SNR
Higher is better
Rhino: 99%
Amazon Lex: 87%
Google Dialogflow: 83%
Why Cheetah Streaming Speech-to-Text?

Free-form narrative dictation transcribed in real time.

10.1%
WER (English) vs. 11.9% Google and 10.6% Moonshine Medium
0.08
CPU Core-Hour vs. 3.36 Moonshine Medium, 40x less
8.6%
WER (Spanish) vs. 11.6% Google and 9.4% Azure

Cheetah Streaming Speech-to-Text transcribes the responder's free-form narrative dictation in real time, including patient details, scene description, treatments administered, and witness statements. Per the open-source real-time transcription benchmark, Cheetah beats Google Cloud STT on word error rate and word emission latency across all tested languages, and outperforms Azure STT on several. It emits words at 590 ms median latency, typically one word behind the speaker, and requires less compute than any other local engine tested. Cheetah accepts custom vocabulary for clinical terminology, drug names, hospital destinations, and proper nouns, which raises accuracy further on the words responders actually say.
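
A minimal dictation sketch, assuming the pvcheetah and pvrecorder Python SDKs; the endpoint duration is an illustrative choice:

import pvcheetah
from pvrecorder import PvRecorder

cheetah = pvcheetah.create(access_key="${ACCESS_KEY}", endpoint_duration_sec=1.5)
recorder = PvRecorder(frame_length=cheetah.frame_length)
recorder.start()
narrative = ""
while True:
    partial, is_endpoint = cheetah.process(recorder.read())
    narrative += partial  # words stream in roughly one behind the speaker
    if is_endpoint:       # ~1.5 s of silence ends the narrative
        narrative += cheetah.flush()
        break
recorder.stop()
print(narrative)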

English Word Error Rate
Lower is better
Amazon Streaming: 5.6%
Azure Real-time: 8.2%
Cheetah Streaming: 10.1%
Moonshine Streaming Medium: 10.6%
Vosk Streaming Large: 11.5%
Google Streaming: 11.9%
Whisper.cpp Streaming Base: 19.8%
English Punctuation Error Rate
Lower is better
Cheetah Streaming: 16.1%
Azure Real-time: 16.4%
Amazon Streaming: 24.4%
Google Streaming: 36%
Moonshine Streaming Medium: 45.1%
Whisper.cpp Streaming Base: 54.1%
Why Orca Text-to-Speech?

Natural-sounding TTS at 29 MB peak memory.

29 MB
Peak Memory Usage
130 ms
First-token-to-speech latency
7 MB
Model Size

Orca Streaming Text-to-Speech reads each report prompt aloud, for example, “What's your unit?”, “What's the patient's condition?”, or “What's the transport destination?”, so the responder never has to look at the screen while driving, scene-managing, or treating a patient. Most high-quality TTS engines require hundreds of megabytes of RAM. Orca uses 29 MB peak memory, 10 to 50 times less than any natural-sounding on-device alternative, which leaves enough headroom to run all five engines on a single rugged tablet without OOM crashes. First-token latency is 130 ms, fast enough that prompts feel conversational rather than robotic.
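
Speaking a prompt takes a few lines. A sketch assuming the pvorca and pvspeaker Python SDKs:

import pvorca
from pvspeaker import PvSpeaker

orca = pvorca.create(access_key="${ACCESS_KEY}")
pcm, _ = orca.synthesize("What's the patient's condition?")

speaker = PvSpeaker(sample_rate=orca.sample_rate, bits_per_sample=16)
speaker.start()
speaker.write(pcm)
speaker.flush()  # block until playback completes
speaker.stop()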

TTS Latency
Lower is better
Orca TTS Streaming: 128 ms
ElevenLabs TTS Streaming: 335 ms
ESpeak TTS: 1,430 ms
ElevenLabs TTS: 1,470 ms
Audio Quality
Listen and compare — grouped by peak memory usage.
Peak memory usage < 30 MB: eSpeak, Orca
On-device voice ePCR and field reporting use cases

From EMS units to insurance adjusters

EMS & ePCR

Voice ePCR for paramedics and EMTs

Electronic Patient Care Reports are the run records EMS crews complete for every patient encounter. Voice ePCR apps capture unit ID, incident type, patient condition, transport destination, and free-form narrative dictation while crews are en route, ready to map to ESO, ImageTrend, ZOLL RescueNet, HealthEMS, or a NEMSIS-compatible schema.
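
As a sketch of that mapping step (the keys below are illustrative placeholders, not actual NEMSIS element IDs or vendor field names):

def to_epcr_record(slots: dict, narrative: str) -> dict:
    # Rename the captured Rhino slots and Cheetah narrative to match
    # whatever schema the target ePCR system accepts.
    return {
        "unit_id": slots.get("unitId"),
        "incident_type": slots.get("incidentType"),
        "patient_condition": slots.get("condition"),
        "destination": slots.get("destination"),
        "narrative": narrative,
    }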

Police & fire

Hands-free incident reports for public safety

Officers and firefighters document incidents from the scene without typing. Voice reporting apps capture call type, location, person of interest, response time, and handoff status as structured slots, plus witness statements as free-form dictation. Output can map to municipal incident-report formats and CAD-RMS schemas.

Field service

Voice field reporting for utilities, telecom, and oil & gas

Utility crews, telecom field technicians, and oil & gas inspectors capture structured observations hands-free where cell signal drops. Each Rhino context is tuned per asset class: pole IDs and circuit numbers for utilities, tower IDs for telecom, and well or pipeline segment IDs for oil & gas.

Remote work

Claims, surveys, conservation, and humanitarian reporting

Insurance claims at incident scenes, environmental surveys, wildlife observations, and humanitarian field reporting need a structured-plus-narrative voice flow that does not depend on a cell tower. The on-device pipeline runs in basements, canyons, and offshore platforms where Wi-Fi and cellular do not reach.

Get started

On-device voice ePCR and field reporting Python code example

A complete working recipe in Python. Open-source on GitHub. Runs 100% on-device.

recipe · voice-field-reporting
Difficulty
Beginner
Runtime
100% on-device
Language
Python
Platforms supported
Android · iOS · Linux · macOS · Windows · Chrome · Edge · Firefox · Safari · Raspberry Pi

Prerequisites

A Picovoice AccessKey from the Picovoice Console and a local clone of the GitHub repo.

Usage

These instructions assume your current working directory is recipes/voice-guided-field-reporting/python.
1

Create a virtual environment

Isolate the recipe's dependencies from your system Python.
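
A standard invocation, assuming python3 is on your PATH:

python3 -m venv .venv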
2

Activate the virtual environment

Activation makes pip install into .venv instead of system Python.
Linux, macOS, or Raspberry Pi
Windows
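
Assuming the environment was created as .venv:

Linux, macOS, or Raspberry Pi: source .venv/bin/activate
Windows: .venv\Scripts\activate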
3

Install dependencies

Pulls in the Porcupine, Rhino, Cheetah, Orca, and Koala Python SDKs along with PvRecorder and PvSpeaker.
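
If the recipe ships a requirements.txt, pip3 install -r requirements.txt does this in one step; installing the SDKs directly would look like:

pip3 install pvporcupine pvrhino pvcheetah pvorca pvkoala pvrecorder pvspeaker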
4

Pick or train a wake word

Open the Picovoice Console and train any phrase your responders will say in the field, such as “Start report”, “New incident”, or “Begin charting”. Download the .ppn file for your target platform.
5

Train the Speech-to-Intent model

Open the Picovoice Console, go to Rhino Speech-to-Intent, create an empty context, and import the Rhino context YAML for this recipe. Download the generated .rhn file for your target platform. The context defines slots such as unit ID, incident type, patient condition, transport destination, handoff status, and handoff time.
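
An illustrative fragment of a Rhino context YAML (the intents, phrasings, and slot values below are examples, not the recipe's actual context):

context:
  expressions:
    setPatientCondition:
      - "patient is $condition:condition"
    setDestination:
      - "[transport, transporting] to $destination:destination"
  slots:
    condition:
      - stable
      - critical
    destination:
      - general hospital
      - county medical center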
6

Run the field reporting demo

Pass your AccessKey and the paths to the .ppn and .rhn files.
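
For example (the script name and flags are illustrative; check the recipe's README for the exact invocation):

python3 main.py \
  --access_key ${ACCESS_KEY} \
  --keyword_path start_report.ppn \
  --context_path field_reporting.rhn
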
Have questions or looking for implementations in other languages? Visit the GitHub pico-cookbook Voice Guided Field Reporting Recipe, where you can find the open-source demo code and create an issue for demo-related technical questions.
Frequently asked questions

FAQ

What is voice ePCR and voice guided field reporting?
Voice ePCR and voice guided field reporting are hands-free reporting workflows where a wake word activates the app, the app reads each report field aloud, the responder answers in natural speech, and the app captures structured slots, such as unit ID, incident type, patient condition, and transport destination, plus free-form narrative dictation.
Does voice ePCR work without an internet connection?
Yes. Porcupine Wake Word, Rhino Speech-to-Intent, Cheetah Streaming Speech-to-Text, Orca Streaming Text-to-Speech, and Koala Noise Suppression all run locally. The full reporting workflow runs offline, which is useful for rural EMS calls, wilderness rescue, basement and tunnel responses, and field service at remote sites where cellular coverage drops.
Is voice ePCR HIPAA compliant?
Yes. No patient audio leaves the device. Audio is processed in memory on the responder's device and discarded. Picovoice cannot access end-user audio, which removes the need for a Business Associate Agreement under HIPAA for Picovoice processing, a processing agreement under GDPR Article 28, and the third-party breach surface through Picovoice systems.
Can I integrate voice ePCR with ImageTrend, ESO, ZOLL RescueNet, or HealthEMS?
Yes. The recipe captures structured slots, such as unit ID, incident type, patient condition, transport destination, handoff status, and handoff time, plus free-form narrative dictation. The output maps to the NEMSIS schema or any internal ePCR schema your ImageTrend, ESO, ZOLL RescueNet, or HealthEMS instance accepts. Voice replaces the typing, not the system of record.
What happens in a noisy ambulance, scene perimeter, or field environment?
Koala Noise Suppression cleans audio before it reaches Cheetah Streaming Speech-to-Text and Rhino Speech-to-Intent, including sirens, traffic, generator hum, and crowd noise. Rhino Speech-to-Intent is end-to-end and significantly more accurate than STT-plus-NLU stacks in noisy environments, with intent accuracy that holds up where transcript-based pipelines collapse.
How does the wake word work in voice ePCR and field reporting apps?
Porcupine Wake Word listens continuously on-device with very low CPU and battery usage, and only triggers the rest of the pipeline when the user speaks the chosen wake phrase, such as “Start report”, “New incident”, or “Begin charting”. The wake phrase is fully customizable in the Picovoice Console, in any supported language.
Can I customize the ePCR fields, prompts, and clinical vocabulary?
Yes. The Rhino context YAML defines the report fields and accepted phrasings, including unit IDs, incident types, drug names, hospital destinations, and triage codes. The Orca Text-to-Speech prompts the responder hears are fully configurable text. Cheetah Streaming Speech-to-Text accepts custom vocabulary for clinical terminology, drug names, hospital destinations, and proper nouns.
What hardware does the on-device voice ePCR pipeline run on?
The full five-engine pipeline runs on commodity Android phones, iOS devices, rugged tablets from Honeywell and Zebra, and Linux-based field hardware. It also runs on Raspberry Pi for embedded telematics installs. No GPU, no NPU, and no dedicated voice hardware required.
Which industries beyond EMS use voice guided field reporting?
Public safety, field service technicians on industrial sites, utility crews after storm response, telecom field engineers, oil and gas inspectors, claims adjusters at incident scenes, environmental and wildlife surveys, conservation crews, and humanitarian field reporting. Any role where a person needs to capture structured slots and free-form narrative hands-free in the field uses the same pipeline.
How can I get technical support for the voice guided field reporting demo?
Visit the GitHub pico-cookbook Voice Guided Field Reporting Recipe, where you can find the open-source demo code and create an issue for demo-related technical questions, or reach out to your Picovoice contact.