TLDR: Automatic punctuation and truecasing transform raw speech-to-text into readable transcripts. This tutorial shows how to enable automatic punctuation and truecasing in Python for batch transcription and real-time speech-to-text.
What Are Automatic Punctuation and Truecasing in Speech-to-Text?
Automatic punctuation inserts punctuation marks such as periods, commas, and question marks into transcripts based on speech patterns and pauses. Truecasing (automatic capitalization) restores proper casing for sentence starts, proper nouns, and acronyms. Together, they transform raw speech-to-text output into readable, properly formatted text without manual editing. For example:
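Without automatic formatting (illustrative): hi john can you send the report to acme by friday thanks
With automatic formatting (illustrative): Hi John, can you send the report to Acme by Friday? Thanks.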
Punctuation and Capitalization for Batch & Streaming STT
This tutorial covers automatic punctuation and truecasing for both batch and real-time speech-to-text in Python:
- Batch Audio File Transcription with Leopard Speech-to-Text - For pre-recorded audio files (meeting recordings, documentation, interviews, podcasts)
- Real-Time Transcription with Cheetah Streaming Speech-to-Text - For live transcription (dictation software, live captions, real-time meeting transcription)
Both engines enable automatic formatting with a single parameter, enable_automatic_punctuation=True, and both process audio on-device for low-latency transcription.
Prerequisites
- Python 3.9+
- Audio file (WAV, MP3, FLAC, or other common formats) for batch transcription or microphone access for real-time transcription
- Picovoice AccessKey from Picovoice Console
Batch Transcription with Punctuation and Capitalization
Leopard Speech-to-Text transcribes pre-recorded audio files with automatic punctuation and truecasing.
Install the Speech-to-Text SDK
Install the Leopard Speech-to-Text SDK (the pvleopard package) using pip:
pip3 install pvleopard
Full Code: Audio File Transcription with Automatic Punctuation and Truecasing
Here's the complete working code that transcribes an audio file with automatic formatting:
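A minimal sketch of such a script, assuming the pvleopard package exposes `create`, `process_file`, and `delete` as used below. The `transcribe_file` helper and its `engine_factory` parameter are hypothetical names introduced for illustration, not part of the SDK.

```python
ACCESS_KEY = "${ACCESS_KEY}"           # AccessKey from Picovoice Console
AUDIO_PATH = "path/to/your/audio.wav"  # pre-recorded audio file


def transcribe_file(audio_path, access_key, engine_factory=None):
    """Return the punctuated, truecased transcript of one audio file.

    engine_factory is a hypothetical hook (not part of the SDK) that lets a
    stand-in engine be supplied; by default pvleopard.create is used.
    """
    if engine_factory is None:
        import pvleopard  # imported lazily so the helper loads without the SDK
        engine_factory = pvleopard.create

    leopard = engine_factory(
        access_key=access_key,
        enable_automatic_punctuation=True,  # punctuation and truecasing
    )
    try:
        # process_file is assumed to return (transcript, word metadata)
        transcript, _words = leopard.process_file(audio_path)
        return transcript
    finally:
        leopard.delete()  # release native resources


if __name__ == "__main__":
    if ACCESS_KEY == "${ACCESS_KEY}":
        print("Replace ${ACCESS_KEY} and the audio path before running.")
    else:
        print(transcribe_file(AUDIO_PATH, ACCESS_KEY))
```

Wrapping the engine in `try`/`finally` is a design choice that releases the engine even if the audio file cannot be processed.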
Setting enable_automatic_punctuation=True enables automatic punctuation insertion and truecasing.
Run the Batch Transcription Script
Before running the script, replace ${ACCESS_KEY} with your AccessKey from Picovoice Console and path/to/your/audio.wav with the path to your audio file.
Leopard Speech-to-Text also offers production-ready features such as speaker diarization, word-level confidence scores, and timestamps. Explore all speech-to-text features to get reliable, high-quality transcriptions.
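As a rough sketch of how that word-level metadata might be inspected, the helper below formats the word list that Leopard's `process_file` is assumed to return alongside the transcript. The attribute names (`word`, `start_sec`, `end_sec`, `confidence`, `speaker_tag`) are assumptions about the pvleopard word objects, and `format_words` is a hypothetical helper; speaker tags are only meaningful if diarization was enabled when the engine was created.

```python
def format_words(words):
    """Render one line per word with timestamps, speaker, and confidence.

    Each item is assumed to expose word, start_sec, end_sec, confidence,
    and speaker_tag attributes (an assumption about pvleopard's word type).
    """
    lines = []
    for w in words:
        lines.append(
            f"{w.start_sec:.2f}-{w.end_sec:.2f}s  speaker {w.speaker_tag}  "
            f"{w.word}  (confidence {w.confidence:.2f})"
        )
    return lines
```

Low-confidence words surfaced this way are natural candidates for manual review.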
Real-Time Streaming Transcription with Punctuation and Capitalization
Cheetah Streaming Speech-to-Text transcribes audio streams in real time with automatic punctuation and truecasing.
Install the Required Python Libraries
Install the Cheetah Streaming Speech-to-Text Python SDK, pvcheetah, and the Picovoice Python recorder library, pvrecorder, using pip:
pip3 install pvcheetah pvrecorder
Full Code: Real-Time Transcription with Automatic Punctuation and Truecasing
Here's the complete code for real-time streaming transcription with automatic formatting:
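A minimal sketch of such a script, assuming pvcheetah exposes `create`, `process` (returning a partial transcript and an endpoint flag), `flush`, `frame_length`, and `delete`, and that pvrecorder's `PvRecorder` provides `start`, `read`, `stop`, and `delete`. The `stream_text` and `main` helpers are hypothetical names introduced for illustration.

```python
ACCESS_KEY = "${ACCESS_KEY}"  # AccessKey from Picovoice Console


def stream_text(cheetah, frames):
    """Yield (text, segment_done) pairs for an iterable of PCM frames."""
    for pcm in frames:
        partial, is_endpoint = cheetah.process(pcm)
        if partial:
            yield partial, False
        if is_endpoint:
            # A pause was detected: flush the remaining text and close the segment.
            yield cheetah.flush(), True


def main():
    # Imported here so stream_text can be used without the SDKs installed.
    import pvcheetah
    from pvrecorder import PvRecorder

    cheetah = pvcheetah.create(
        access_key=ACCESS_KEY,
        enable_automatic_punctuation=True,  # punctuation and truecasing
    )
    recorder = PvRecorder(frame_length=cheetah.frame_length)
    recorder.start()
    print("Listening... press Ctrl+C to stop.")

    def microphone_frames():
        while True:
            yield recorder.read()

    try:
        for text, segment_done in stream_text(cheetah, microphone_frames()):
            print(text, end="\n" if segment_done else "", flush=True)
    except KeyboardInterrupt:
        print(cheetah.flush())  # emit whatever text is still buffered
    finally:
        recorder.stop()
        recorder.delete()
        cheetah.delete()


if __name__ == "__main__":
    if ACCESS_KEY == "${ACCESS_KEY}":
        print("Replace ${ACCESS_KEY} before running.")
    else:
        main()
```

Keeping the frame loop in `stream_text` separate from the microphone setup means the same logic can be fed frames from a file or a network stream.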
Setting enable_automatic_punctuation=True enables automatic punctuation insertion and truecasing, while the is_endpoint flag detects natural pauses in speech so that output can be structured into readable segments.
Run the Real-Time Transcription Script
Before running the script, replace ${ACCESS_KEY} with your AccessKey from Picovoice Console.
Real-Time Punctuation Accuracy and Performance
In batch transcription, speech-to-text models can use the full audio context to generate transcripts, which often leads to higher overall accuracy for word recognition, truecasing, and punctuation. In contrast, real-time transcription models must emit text as audio arrives, so they have to make punctuation and capitalization decisions without seeing future speech.
A common way to evaluate punctuation quality in streaming speech-to-text is Punctuation Error Rate (PER), which measures how closely predicted punctuation aligns with reference transcripts. A lower PER reflects more reliable sentence boundaries and capitalization while preserving real-time responsiveness, a key requirement for live captions, meeting notes, and interactive voice applications.
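The exact PER formula behind the comparison is not spelled out here, so the following is only a simplified sketch of the idea. It assumes the reference and the hypothesis contain the same words in the same order, treats only `.`, `,`, and `?` as punctuation, and divides the number of mismatched punctuation slots by the number of slots where either transcript has a mark. The function names are hypothetical.

```python
PUNCTUATION = ".,?"


def split_punct(token):
    """Split a token such as 'Friday?' into ('friday', '?')."""
    word = token.rstrip(PUNCTUATION)
    mark = token[len(word):]
    return word.lower(), mark


def punctuation_error_rate(reference, hypothesis):
    """Simplified PER: mismatched punctuation slots / punctuated slots."""
    ref = [split_punct(t) for t in reference.split()]
    hyp = [split_punct(t) for t in hypothesis.split()]
    if [w for w, _ in ref] != [w for w, _ in hyp]:
        raise ValueError("this sketch assumes identical word sequences")

    # Keep only positions where the reference or the hypothesis has a mark.
    slots = [(r, h) for (_, r), (_, h) in zip(ref, hyp) if r or h]
    if not slots:
        return 0.0
    errors = sum(1 for r, h in slots if r != h)
    return errors / len(slots)
```

For example, comparing the reference "Hi John, how are you? I am fine." with the hypothesis "Hi John how are you? I am fine." leaves one of three punctuation slots mismatched, so this sketch returns 1/3. A real evaluation would also need to align transcripts whose words differ.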
Cheetah Streaming Speech-to-Text delivers a lower Punctuation Error Rate than major cloud-based streaming STT services, with less than half the PER of Google Streaming Speech-to-Text.
In practice, even small improvements in PER can significantly reduce manual cleanup and improve readability in live transcription workflows.
Start building speech recognition applications with automatic punctuation and truecasing for professional-quality transcripts today!