Question 1

What are the use cases and applications of Streaming Text-to-Speech?

Accepted Answer

Streaming Text-to-Speech shines the most when developers build AI agents and assistants, enabling human-like interactions. AI agents can work in several industries:

Healthcare: AI-Powered Patient Communication with Orca Streaming Text-to-Speech. Text-to-Speech enables natural conversations between patients and AI health assistants. Orca Streaming TTS differentiates itself by responding quickly, avoiding the unexpected pauses that can increase patient anxiety and stress.

Education & E-Learning: Interactive AI Tutoring Systems with Orca Streaming Text-to-Speech. Text-to-Speech has transformed online education with AI tutors that speak as naturally as human teachers. Yet students can easily get distracted when tutors do not respond promptly. Orca Streaming TTS increases student engagement time, which can lead to higher success rates.

Customer Service & Support: 24/7 AI Customer Representatives with Orca Streaming Text-to-Speech. Text-to-Speech is crucial for creating AI agents that handle tasks such as order support, with instant voice responses about order status, and technical troubleshooting, with continuous voice instructions that guide customers through solutions. Orca Streaming TTS streamlines conversations by eliminating network latency and minimizing compute latency.

Financial Services & Fintech: AI-Powered Financial Advisors with Orca Streaming Text-to-Speech. The choice of Text-to-Speech engine is crucial when developing AI-powered financial advisors, because natural voice interactions build client trust in sensitive financial discussions. Examples include portfolio reviews, with real-time streamed explanations of market changes and investment performance, and loan applications, with immediate voice assistance that guides applicants through complex forms.

Question 2

What's Streaming Text-to-Speech?

Accepted Answer

Streaming text-to-speech (TTS) is a technology that converts written text into spoken words in real time, as the text is generated. Traditional TTS systems process pre-defined text; in other words, they require the full text before they start processing. Once the text is processed, they either generate audio as a complete file or stream the audio, playing it back incrementally. The latter is called "audio output streaming". Streaming Text-to-Speech does not just stream audio output; it also processes streaming text input. Unlike traditional TTS, streaming TTS doesn't require the full text to start playing.
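
The idea of streaming text input can be illustrated with a small, self-contained sketch. The class below is a hypothetical toy, not Orca's API: it models "audio" as a list of words and assumes speech can be produced as soon as a word boundary has arrived. A real engine makes far more sophisticated decisions about when enough context is available.

```python
from typing import Iterable, Iterator, List, Optional


class ToyStreamingSynthesizer:
    """Hypothetical stand-in for a streaming TTS engine (not Orca's API).

    "Audio" is modeled as a list of words so the buffering logic is visible.
    """

    def __init__(self) -> None:
        self._pending = ""  # text received so far that has not been spoken

    def synthesize(self, chunk: str) -> Optional[List[str]]:
        """Accept a text chunk; return audio for completed words, if any."""
        self._pending += chunk
        # Everything before the last space consists of complete words.
        head, sep, tail = self._pending.rpartition(" ")
        if not sep:
            return None  # no word boundary yet, keep buffering
        self._pending = tail
        words = head.split()
        return words or None

    def flush(self) -> Optional[List[str]]:
        """Emit whatever is still buffered once the text stream ends."""
        words = self._pending.split()
        self._pending = ""
        return words or None


def speak(chunks: Iterable[str]) -> Iterator[List[str]]:
    """Yield audio segments as soon as they become available."""
    tts = ToyStreamingSynthesizer()
    for chunk in chunks:
        audio = tts.synthesize(chunk)
        if audio is not None:
            yield audio
    tail = tts.flush()
    if tail is not None:
        yield tail


# Simulated LLM output arriving token by token (illustrative only).
tokens = ["Your", " order", " ship", "ped", " today", "."]
segments = list(speak(tokens))
```

In this sketch the first segment is produced after the second token arrives, without waiting for the sentence to finish. The final `flush` call covers text that never received a trailing word boundary.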

Question 3

How does Orca Streaming Text-to-Speech differ from other Text-to-Speech engines offered for real-time interactions?

Accepted Answer

The terms "streaming" and "real-time" Text-to-Speech (TTS) have been used excessively and often inappropriately, leading to confusion about their true meaning and capabilities. Like a human reading aloud, Orca Streaming Text-to-Speech continuously processes streaming text input and speaks it as it appears.

Current "real-time" TTS solutions can stream audio as it is generated, before the full audio file has been created. They were designed for legacy Natural Language Processing (NLP) techniques that generate output all at once. In the pre-LLM era, this was sufficient. Today's Large Language Models (LLMs) work differently: they produce text incrementally, token by token. Traditional TTS solutions have not caught up with token-by-token generation; they still wait for the whole text to be generated. Orca Streaming Text-to-Speech, besides streaming audio output like traditional TTS solutions, can process streaming text input that is generated on a token-by-token basis.

Imagine attending an event with two interpreters: one translates as the speaker speaks (simultaneous translation) and the other waits for the speaker to pause (consecutive translation). Orca Streaming Text-to-Speech works like the former and processes data simultaneously, whereas other "real-time" Text-to-Speech engines work like the latter.
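
The difference can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration: the function names are hypothetical, and the token rate, response length, and synthesis delay are assumed numbers, not measurements of Orca or any other engine.

```python
def first_audio_ms_wait_for_full_text(
    num_tokens: int, ms_per_token: int, synthesis_delay_ms: int
) -> int:
    """Synthesis starts only after the LLM has produced every token."""
    return num_tokens * ms_per_token + synthesis_delay_ms


def first_audio_ms_streaming_input(
    tokens_needed: int, ms_per_token: int, synthesis_delay_ms: int
) -> int:
    """Synthesis starts once the first few tokens have arrived."""
    return tokens_needed * ms_per_token + synthesis_delay_ms


# Assumed, illustrative numbers: a 100-token response, 50 ms per token,
# 200 ms for the engine to produce its first audio, and 3 tokens of
# context needed before streaming synthesis can begin.
consecutive = first_audio_ms_wait_for_full_text(100, 50, 200)
simultaneous = first_audio_ms_streaming_input(3, 50, 200)

print(f"wait for full text: {consecutive} ms")   # 5200 ms
print(f"streaming input:    {simultaneous} ms")  # 350 ms
```

Under these assumptions, the wait for the first audio grows with the length of the response in the first case and stays constant in the second, which is why the gap widens for longer LLM outputs.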

Question 4

Which LLMs does Orca Streaming Text-to-Speech support?

Accepted Answer

Orca Streaming Text-to-Speech works with all large language models, whether closed-source or free and open.

Some examples of closed-source large language models Orca Streaming TTS supports:

OpenAI GPT-4, OpenAI GPT-3.5, OpenAI GPT-3.5 Turbo, OpenAI GPT-3
Anthropic Claude, Anthropic Claude 2, Anthropic Claude 3 Sonnet, Anthropic Claude 3 Opus, Anthropic Claude 3 Haiku
Cohere Coral
Google Gemini

Some examples of open large language models Orca Streaming TTS supports:

Llama
Falcon
Gemma
Grok
Mistral
Mixtral
Phi
DBRX

Question 5

Does Orca Streaming Text-to-Speech support async processing?

Accepted Answer

Yes. Orca supports async processing; in other words, it can convert pre-defined (static) text into audio. Developers can generate the audio as a complete file or stream it incrementally. Please visit our docs for more information.

Question 6

Which platforms does Orca Streaming Text-to-Speech support?

Accepted Answer

Orca Streaming Text-to-Speech runs across platforms:

Web Browsers: Chrome, Safari, Edge, and Firefox
Single Board Computers: Raspberry Pi
Mobile Devices: Android and iOS
Desktop and Servers: Linux, macOS, and Windows

Question 7

Which languages does Orca Streaming Text-to-Speech support?

Accepted Answer

Orca Streaming Text-to-Speech supports English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish voice models. If you have an immediate need for another language, reach out to the Picovoice Consulting team with the details of your project.

Question 8

How can I generate custom voices using Orca Streaming Text-to-Speech?

Accepted Answer

Picovoice Consulting customizes Orca Streaming Text-to-Speech for brands that want to represent their "voice" via unique, custom voices.

Question 9

Does Orca Streaming Text-to-Speech allow voice tuning?

Accepted Answer

The base Orca Streaming Text-to-Speech model allows developers to adjust the speed of the selected voice. Custom Orca Streaming Text-to-Speech models can be used for further voice tuning. Contact Picovoice Consulting with your project requirements to get a custom Text-to-Speech model that fits your needs.

Question 10

Can I add custom industry jargon and terminology to Orca Streaming Text-to-Speech?

Accepted Answer

Yes. Orca Streaming Text-to-Speech can be tailored to a specific application, company, domain, or industry with custom vocabulary.

Question 11

Does Orca Streaming Text-to-Speech generate voices with different emotions?

Accepted Answer

Custom Orca Streaming Text-to-Speech models generate voices with emotions and styles, including joy, anger, whispering, and shouting. Contact Picovoice Consulting with your project requirements to get your custom text-to-speech model trained.

Question 12

How do I get technical support for Orca Streaming Text-to-Speech?

Accepted Answer

Picovoice docs, blog, Medium posts, and GitHub are great resources to learn about voice AI, Picovoice technology, and how to add AI-generated voice to your product. Enterprise customers get dedicated, application-specific support from the Picovoice Product & Engineering teams. Existing Picovoice customers can reach out to their contacts, and prospects can purchase Enterprise Support before committing to any paid plan.

Question 13

How can I get informed about updates and upgrades?

Accepted Answer

Version changes appear in the Picovoice Newsletter and LinkedIn. Subscribing to GitHub is the best way to get notified of patch releases. If you enjoy building with Orca Streaming Text-to-Speech, show it by giving a GitHub star!
