On-device streaming text-to-speech that reads dynamic LLM responses out loud as they emerge - with no latency or pause
Orca Streaming Text-to-Speech is a voice generator developed for LLM applications. It concurrently synthesizes speech as LLMs compose their responses.
Orca Streaming Text-to-Speech eliminates the latency between LLMs' text output and TTS's audio output, enabling humanlike interactions with no awkward pauses.
Orca Streaming Text-to-Speech can handle streaming text input by synthesizing audio while an LLM is still producing the response.
OpenAI TTS, ElevenLabs TTS Streaming, IBM Real-time Speech Synthesis, Amazon Polly, and others start converting text to audio only after receiving the entire LLM output.
Starting to convert text to voice much sooner allows Orca Streaming Text-to-Speech to finish reading in tandem with LLMs before other TTS engines can even begin.
import pvorca
orca = pvorca.create(access_key)
stream = orca.stream_open()
speech = stream.synthesize(get_next_text_chunk())
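In practice, the synthesize call sits inside a loop that runs for as long as the LLM keeps producing text. The sketch below shows one way that loop might look. It assumes the stream object returns audio samples from synthesize when it has enough text, returns None when it needs more, and offers a flush method for the text still buffered at the end. FakeStream is a hypothetical stand-in so the example runs without an access key; it is not the Orca engine, and speak_stream is an illustrative helper, not part of any SDK. Check the Picovoice docs for the actual API.

```python
from typing import Iterable, List, Optional, Sequence


class FakeStream:
    """Hypothetical stand-in for a streaming TTS handle (not the Orca engine).

    It buffers text and emits one fake sample per character each time a
    complete word (terminated by a space) is available.
    """

    def __init__(self) -> None:
        self._buffer = ""

    def synthesize(self, text: str) -> Optional[Sequence[int]]:
        self._buffer += text
        cut = self._buffer.rfind(" ")
        if cut < 0:
            return None  # not enough text yet, so no audio is produced
        ready, self._buffer = self._buffer[: cut + 1], self._buffer[cut + 1:]
        return [ord(c) for c in ready]

    def flush(self) -> Optional[Sequence[int]]:
        ready, self._buffer = self._buffer, ""
        return [ord(c) for c in ready] if ready else None


def speak_stream(stream, text_chunks: Iterable[str]) -> List[int]:
    """Feed text chunks to the stream as they arrive and collect the audio."""
    audio: List[int] = []
    for chunk in text_chunks:
        pcm = stream.synthesize(chunk)
        if pcm is not None:  # audio may lag behind the text input
            audio.extend(pcm)  # a real app would hand this to an audio player
    tail = stream.flush()  # synthesize whatever text is still buffered
    if tail is not None:
        audio.extend(tail)
    return audio
```

Calling speak_stream(FakeStream(), ["Hel", "lo ", "wor", "ld"]) produces audio for "Hello " as soon as the second chunk arrives, and the remaining "world" only when the stream is flushed.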
Don’t ruin user experience with awkward silences
Embed Orca Streaming Text-to-Speech into your product in less than 10 minutes.
Streaming Text-to-Speech shines the most when developers build AI agents and assistants, enabling human-like interactions. AI agents can work in several industries:
Streaming text-to-speech (TTS) is the technology that converts written text into spoken words in real time as the text is generated. Traditional TTS systems process pre-defined text. In other words, they require the full text before they can start processing. Once the text is processed, they either generate audio as a complete file or stream the audio by playing it back incrementally. The latter is called “audio output streaming”. Streaming Text-to-Speech does not just stream audio output but also processes streaming text input. Unlike traditional TTS, streaming TTS doesn’t require the full text to start playing.
The terms "streaming" and "real-time" Text-to-Speech (TTS) have been used excessively and often inappropriately, leading to confusion about their true meaning and capabilities. Much like a person reading text aloud as it appears, Orca Streaming Text-to-Speech reads streaming text input out loud as it arrives.
The current “real-time” TTS solutions can stream audio as it is generated, before the full audio file has been created. They were designed for legacy Natural Language Processing (NLP) techniques that generate output all at once. In the pre-LLM era, this was sufficient. Today's Large Language Models (LLMs) work differently: they produce text incrementally, token by token. Traditional TTS solutions have not caught up with token-by-token generation and still wait for the whole text to be generated. Orca Streaming Text-to-Speech, besides streaming audio output like traditional TTS solutions, can process streaming text input that is generated on a token-by-token basis.
Imagine attending an event with two interpreters: one translates as the speaker speaks (simultaneous translation) and the other waits for the speaker to pause (consecutive translation). Orca Streaming Text-to-Speech works like the former and processes data simultaneously, whereas other “real-time” Text-to-Speech engines work like the latter.
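To put rough numbers on the difference, the sketch below estimates how long each kind of engine waits before it has enough text to start synthesizing. The function name, the token rate, and the five-token threshold are illustrative assumptions, not measurements of Orca or any other engine, and the model ignores synthesis and network time.

```python
def time_to_first_audio(num_tokens: int, seconds_per_token: float,
                        tokens_needed: int) -> float:
    """Seconds until a TTS engine has enough text to start synthesizing.

    tokens_needed is how many tokens must have arrived before synthesis can
    begin: all of them for an engine that waits for the full response, only
    the first few for an engine that accepts streaming text input.
    """
    if not 1 <= tokens_needed <= num_tokens:
        raise ValueError("tokens_needed must be between 1 and num_tokens")
    return tokens_needed * seconds_per_token


# Illustrative numbers only: a 200-token answer generated at 20 tokens/second.
NUM_TOKENS = 200
SECONDS_PER_TOKEN = 0.05

# Engine that waits for the entire LLM output before synthesizing.
wait_for_full_text = time_to_first_audio(NUM_TOKENS, SECONDS_PER_TOKEN, NUM_TOKENS)

# Engine that accepts streaming text and (by assumption) needs 5 tokens.
streaming_input = time_to_first_audio(NUM_TOKENS, SECONDS_PER_TOKEN, 5)
```

Under these assumptions, the engine that waits for the full text cannot start for about 10 seconds, while the one that accepts streaming text input can start after about a quarter of a second.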
Orca Streaming Text-to-Speech works with any large language model, whether closed-source or free and open.
Some examples of closed-source large language models Orca Streaming TTS supports:
Some examples of open large language models Orca Streaming TTS supports:
Yes, Orca has async processing capability: it can process predefined (static) text and convert it into audio streams or audio recordings. Developers can convert pre-defined text into audio as a complete file or stream the audio incrementally. Please visit our docs for more information.
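For the complete-file case, the sketch below shows how synthesized audio might be saved as a recording using only the Python standard library. It assumes the engine hands back 16-bit mono PCM samples as a sequence of integers, along with a sample rate. Both are assumptions for illustration, and save_pcm_as_wav is a hypothetical helper, not part of the Orca SDK.

```python
import struct
import wave
from typing import Sequence


def save_pcm_as_wav(pcm: Sequence[int], sample_rate: int, path: str) -> None:
    """Write 16-bit mono PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)  # samples per second
        # Pack the samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

The same helper also works for streamed audio: collect the chunks as they arrive, then write them out once the stream has been flushed.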
Orca Streaming Text-to-Speech runs across platforms:
Orca Streaming Text-to-Speech supports English, with many more languages on the roadmap, including French, German, Hindi, Italian, Japanese, Korean, Portuguese, and Spanish. Reach out to the Picovoice Consulting team with the details of your project if you have an immediate need.
Picovoice Consulting customizes Orca Streaming Text-to-Speech for brands that want to represent their “voice” via unique, custom voices.
The Orca Streaming Text-to-Speech base model allows developers to adjust the speed of the selected voice. Custom Orca Streaming Text-to-Speech models can be leveraged for further voice tuning. Contact Picovoice Consulting with your project requirements to get a custom Text-to-Speech model that fits your needs.
Orca Streaming Text-to-Speech can be application, company, domain, or industry-specific with custom vocabulary.
Custom Orca Streaming Text-to-Speech models generate voices with emotions and styles, including joy, anger, whispering, and shouting. Contact Picovoice Consulting with your project requirements to get your custom text-to-speech model trained.
Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice AI, Picovoice technology, and how to add AI-generated voice to your product. Enterprise customers get dedicated support specific to their applications from Picovoice Product & Engineering teams. Existing Picovoice customers can reach out to their contacts, and prospects can purchase the Support Add-on before committing to any paid plan.