Orca Streaming Text‑to‑Speech

Human-like AI voice generator built for the LLM era

The enterprise choice for adding voice to next-generation, LLM-powered AI assistants that sound natural and feel responsive.

What is Orca Streaming Text-to-Speech?

Orca Streaming Text-to-Speech is a voice generator developed for LLM applications.

It synthesizes speech while LLMs are still composing their responses, enabling human-like interactions with no awkward pauses.

Get started with just a few lines of code

Python

    orca = pvorca.create(access_key)

    stream = orca.stream_open()

    speech = stream.synthesize(get_next_text_chunk())

Node.js

    const o = new Orca(accessKey);

    const s = o.streamOpen();

    const pcm = s.synthesize(getNextTextChunk());

Android (Java)

    Orca o = new Orca.Builder()
        .setAccessKey(accessKey)
        .setModelPath(modelPath)
        .build(appContext);

    OrcaStream stream = o.streamOpen(
        new OrcaSynthesizeParams.Builder().build());

    short[] speech = stream.synthesize(getNextTextChunk());

iOS (Swift)

    let orca = Orca(
        accessKey: accessKey,
        modelPath: modelPath)

    let stream = orca.streamOpen()

    let speech = stream.synthesize(getNextTextChunk())

Web (JavaScript)

    const o = await OrcaWorker.create(accessKey, modelPath);

    const s = await o.streamOpen();

    const speech = await s.synthesize(getNextTextChunk());

.NET (C#)

    Orca o = Orca.Create(accessKey);

    Orca.OrcaStream stream = o.StreamOpen();

    short[] speech = stream.Synthesize(textChunk);

Why Orca Streaming Text-to-Speech is the best for building low-latency LLM-powered AI applications

AI conversations feel unnatural because traditional text-to-speech engines add an awkward delay while they wait for LLMs to complete their responses.

Orca Streaming Text-to-Speech handles LLMs’ streaming text input and synthesizes audio while the LLM is still producing its response. Because it starts converting text to voice much sooner, Orca Streaming Text-to-Speech can finish reading in tandem with the LLM, before other TTS engines have even begun.
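
The loop this describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Picovoice's reference implementation: `speak_while_generating`, `TextToSpeechStream`, and `play` are hypothetical names, and the stream object is assumed to offer `synthesize(text)` (returning audio samples, or nothing while it still needs more text) and `flush()` (returning any remaining audio). Check the Orca docs for the exact behavior of the real stream object.

```python
from typing import Callable, Iterable, Optional, Protocol, Sequence


class TextToSpeechStream(Protocol):
    """Assumed shape of a streaming synthesis object (hypothetical interface)."""

    def synthesize(self, text: str) -> Optional[Sequence[int]]: ...

    def flush(self) -> Optional[Sequence[int]]: ...


def speak_while_generating(
    stream: TextToSpeechStream,
    text_chunks: Iterable[str],
    play: Callable[[Sequence[int]], None],
) -> int:
    """Feed text chunks to the stream as they arrive and play audio as soon as it exists.

    Returns the total number of audio samples handed to `play`.
    """
    total_samples = 0
    for chunk in text_chunks:
        # The stream may hold text back until it has enough context to speak.
        pcm = stream.synthesize(chunk)
        if pcm:
            play(pcm)
            total_samples += len(pcm)
    # The LLM is done: ask the stream for whatever audio is still buffered.
    pcm = stream.flush()
    if pcm:
        play(pcm)
        total_samples += len(pcm)
    return total_samples
```

Here `text_chunks` can be any iterator over an LLM's streamed output, so playback begins with the first chunk that yields audio rather than after the full response.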


Why choose Orca Streaming Text-to-Speech over other AI Voice Generators?

Get started with Orca Streaming Text-to-Speech
Embed Orca Streaming Text-to-Speech into your product in less than 10 minutes.
  • Real-time
  • Production-ready
  • Cross-platform SDKs
  • English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish Voice Models

Frequently asked questions

What are the use cases and applications of Streaming Text-to-Speech?

Streaming Text-to-Speech shines most when developers build AI agents and assistants that need human-like interactions. Such agents can work in several industries:

  • Healthcare: AI-Powered Patient Communication with Orca Streaming Text-to-Speech. Text-to-speech enables natural conversations between patients and AI health assistants. Orca Streaming TTS differentiates itself with quick responses, avoiding the unexpected pauses that add to patient anxiety and stress.
  • Education & E-Learning: Interactive AI Tutoring Systems with Orca Streaming Text-to-Speech. Text-to-speech has transformed online education with AI tutors that speak as naturally as human teachers. Yet students are easily distracted when a tutor does not respond promptly. Orca Streaming TTS increases student engagement time, resulting in higher success rates for everyone!
  • Customer Service & Support: 24/7 AI Customer Representatives with Orca Streaming Text-to-Speech. Text-to-speech is crucial for AI agents that handle tasks such as order support (instant voice responses to order-status questions) and technical troubleshooting (guiding customers through solutions with continuous voice instructions). Orca Streaming TTS streamlines these conversations by eliminating network latency and minimizing compute latency.
  • Financial Services & Fintech: AI-Powered Financial Advisors with Orca Streaming Text-to-Speech. The choice of text-to-speech engine is crucial when developing AI-powered financial advisors, which must build trust with clients through natural voice interactions in sensitive financial discussions. Examples include portfolio reviews, with real-time streamed explanations of market changes and investment performance, and loan applications, where applicants are guided through complex forms with immediate voice assistance.

What is Streaming Text-to-Speech?

Streaming text-to-speech (TTS) is technology that converts written text into spoken words in real time, as the text is generated. Traditional TTS systems process pre-defined text; in other words, they need the full text before they can start processing. Once the text is processed, they either generate the audio as a complete file or stream it by playing it back incrementally. The latter is called “audio output streaming”. Streaming Text-to-Speech streams the audio output and also processes streaming text input, so unlike traditional TTS, it doesn’t need the full text before it starts playing.
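
A rough way to see the difference is to compare how long a listener waits before any audio can start. The sketch below is a simplified model, not a benchmark: the function name, its parameters, and the idea that a streaming engine needs a fixed number of tokens before it can begin speaking are all assumptions made for illustration.

```python
def time_to_first_audio(
    token_interval_s: float,
    total_tokens: int,
    tokens_needed_to_start: int,
    synthesis_delay_s: float,
) -> dict:
    """Compare when audio can start for full-text input vs. streaming text input."""
    # Full-text input: the engine cannot start until the last token has arrived.
    full_text_s = total_tokens * token_interval_s + synthesis_delay_s
    # Streaming text input: the engine starts once it has enough tokens to speak.
    streaming_s = tokens_needed_to_start * token_interval_s + synthesis_delay_s
    return {"full_text_s": full_text_s, "streaming_s": streaming_s}
```

With hypothetical numbers (one token every 0.05 s, a 100-token response, 5 tokens needed to start, 0.2 s of synthesis delay), `time_to_first_audio(0.05, 100, 5, 0.2)` gives about 5.2 s for full-text input and about 0.45 s for streaming text input.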

How does Orca Streaming Text-to-Speech differ from other Text-to-Speech engines offered for real-time interactions?

The terms "streaming" and "real-time" Text-to-Speech (TTS) have been used excessively and often inappropriately, leading to confusion about their true meaning and capabilities. Orca Streaming Text-to-Speech reads streaming text input out loud as it appears, much as a human continuously processes text while reading aloud.

Current “real-time” TTS solutions can stream audio as it is generated, before the full audio file has been created. They were designed for legacy Natural Language Processing (NLP) techniques that generate output all at once, which was sufficient in the pre-LLM era. Today's Large Language Models (LLMs) work differently: they produce text incrementally, token by token. Traditional TTS solutions have not caught up with token-by-token processing and still wait for the whole text to be generated. Orca Streaming Text-to-Speech streams audio output like traditional TTS solutions and can also process streaming text input that is generated token by token.

Imagine attending an event with two interpreters: one translates as the speaker speaks (simultaneous translation) and the other waits for the speaker to pause (consecutive translation). Orca Streaming Text-to-Speech works like the former and processes data simultaneously, whereas other “real-time” Text-to-Speech engines work like the latter.
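
The "simultaneous interpreter" pattern maps onto a producer and a consumer running side by side. The following is a minimal sketch using only the Python standard library. It does not show how Orca works internally: `run_pipeline` is a hypothetical helper, and `synthesize` is a stand-in for whatever turns a piece of text into audio.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel marking the end of the LLM response


def run_pipeline(tokens: Iterable[str], synthesize: Callable[[str], str]) -> List[str]:
    """Run a token producer and a speech consumer concurrently, like a simultaneous interpreter."""
    pending: queue.Queue = queue.Queue()
    spoken: List[str] = []

    def llm_producer() -> None:
        # Stand-in for an LLM emitting its response token by token.
        for token in tokens:
            pending.put(token)
        pending.put(_END)

    def tts_consumer() -> None:
        # Handles each token as soon as it is available instead of waiting for the end.
        while True:
            item = pending.get()
            if item is _END:
                break
            spoken.append(synthesize(item))

    threads = [
        threading.Thread(target=llm_producer),
        threading.Thread(target=tts_consumer),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return spoken
```

A "consecutive" engine would instead collect every token first and call `synthesize` once at the end. The queue lets the consumer begin work while the producer is still running.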

Which LLMs does Orca Streaming Text-to-Speech support?

Orca Streaming Text-to-Speech works with all large language models, whether closed-source or free and open.

Some examples of closed-source large language models Orca Streaming TTS supports:

  • OpenAI GPT-4, OpenAI GPT-3.5, OpenAI GPT-3.5 Turbo, OpenAI GPT-3
  • Anthropic Claude, Anthropic Claude 2, Anthropic Claude 3 Sonnet, Anthropic Claude 3 Opus, Anthropic Claude 3 Haiku
  • Cohere Coral
  • Google Gemini

Some examples of open large language models Orca Streaming TTS supports:

  • Llama
  • Falcon
  • Gemma
  • Grok
  • Mistral
  • Mixtral
  • Phi
  • DBRX

Does Orca Streaming Text-to-Speech support async processing?

Yes. Orca also supports async processing: it can take pre-defined (static) text and convert it into audio streams or audio recordings. Developers can convert pre-defined text into audio either as a complete file or by streaming the audio incrementally. Please visit our docs for more information.
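
For the "complete file" case, the audio samples returned for a static text can be saved with Python's standard `wave` module. This is a sketch under assumptions: `write_pcm_to_wav` is a hypothetical helper, the samples are assumed to be 16-bit mono PCM (consistent with the `short[]` return types in the SDK samples), and the sample rate must come from the engine rather than being hard-coded.

```python
import struct
import wave
from typing import Sequence


def write_pcm_to_wav(path: str, pcm: Sequence[int], sample_rate: int) -> None:
    """Save 16-bit mono PCM samples as a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumption)
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        # Pack the integer samples as little-endian signed 16-bit values.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

The same samples could be passed to an audio player chunk by chunk instead, which corresponds to the incremental streaming option.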

Which platforms does Orca Streaming Text-to-Speech support?

Orca Streaming Text-to-Speech runs across platforms and ships with cross-platform SDKs.

Which languages does Orca Streaming Text-to-Speech support?

Orca Streaming Text-to-Speech offers voice models for English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish. If you have an immediate need for another language, reach out to the Picovoice Consulting team with the details of your project.

How can I generate custom voices using Orca Streaming Text-to-Speech?

Picovoice Consulting customizes Orca Streaming Text-to-Speech for brands that want to represent their “voice” via unique, custom voices.

Does Orca Streaming Text-to-Speech allow voice tuning?

The base Orca Streaming Text-to-Speech model allows developers to adjust the speed of the selected voice. Custom Orca Streaming Text-to-Speech models can be used for further voice tuning. Contact Picovoice Consulting with your project requirements to get a custom Text-to-Speech model that fits your needs.

Can I add custom industry jargon and terminology to Orca Streaming Text-to-Speech?

Yes. Orca Streaming Text-to-Speech can be tailored to a specific application, company, domain, or industry with custom vocabulary.

Does Orca Streaming Text-to-Speech generate voices with different emotions?

Custom Orca Streaming Text-to-Speech models generate voices with emotions and styles, including joy, anger, whispering, and shouting. Contact Picovoice Consulting with your project requirements to get your custom text-to-speech model trained.

How do I get technical support for Orca Streaming Text-to-Speech?

The Picovoice docs, blog, Medium posts, and GitHub are great resources for learning about voice AI, Picovoice technology, and how to add AI-generated voice to your product. Enterprise customers get dedicated, application-specific support from the Picovoice Product & Engineering teams. Existing Picovoice customers can reach out to their contacts, while prospects can purchase Enterprise Support before committing to any paid plan.

How can I get informed about updates and upgrades?

Version changes appear in the and LinkedIn. Subscribing to GitHub is the best way to get notified of patch releases. If you enjoy building with Orca Streaming Text-to-Speech, show it by giving a GitHub star!