Cheetah Speech-to-Text
Python API
API Reference for the Python Cheetah SDK (PyPI).
pvcheetah.create()
Factory method for Cheetah Speech-to-Text engine.
Parameters
- access_key str : AccessKey obtained from Picovoice Console.
- model_path Optional[str] : Absolute path to the file containing model parameters.
- device Optional[str] : String representation of the device (e.g., CPU or GPU) to use. If set to best, the most suitable device is selected automatically. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
- library_path Optional[str] : Absolute path to Cheetah's dynamic library.
- endpoint_duration_sec Optional[float] : Duration of endpoint in seconds. A speech endpoint is detected when there is a chunk of audio (with a duration specified herein) after an utterance without any speech in it. Set to None to disable endpoint detection.
- enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
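The device forms described above (best, gpu, gpu:${GPU_INDEX}, cpu, cpu:${NUM_THREADS}) can be assembled with a small helper. This is a minimal sketch: device_string is a hypothetical name and is not part of pvcheetah, and the range checks are assumptions rather than rules stated in this reference.

```python
from typing import Optional


def device_string(kind: str, value: Optional[int] = None) -> str:
    """Build a value for the `device` argument (hypothetical helper).

    `kind` is "best", "gpu", or "cpu". For "gpu", `value` is the GPU index;
    for "cpu", `value` is the number of threads.
    """
    if kind not in ("best", "gpu", "cpu"):
        raise ValueError(f"unsupported device kind: {kind!r}")
    if value is None:
        # Plain "best", "gpu", or "cpu" with default behavior.
        return kind
    if kind == "best":
        raise ValueError("'best' does not take an index or thread count")
    # Assumption: GPU indices start at 0 and thread counts start at 1.
    if value < 0 or (kind == "cpu" and value == 0):
        raise ValueError(f"invalid value for {kind!r}: {value}")
    return f"{kind}:{value}"
```

The resulting string would then be passed as the device argument, for example device=device_string("cpu", 4).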
Returns
Cheetah: An instance of Cheetah Speech-to-Text engine.
Throws
pvcheetah.available_devices()
Lists all available devices that Cheetah can use for inference. Each entry in the list can be passed as the device argument
of the create() factory method or the Cheetah constructor.
Parameters
- library_path Optional[str] : Absolute path to Cheetah's dynamic library. If not set, the default location is used.
Returns
- Sequence[str]: List of all available devices that Cheetah can use for inference.
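One way to use the returned list is to choose a preferred device and fall back to best when none matches. The sketch below assumes the entries follow the same forms as the device argument (for example gpu:0 or cpu); this reference does not show the exact strings, and pick_device is a hypothetical helper, not part of pvcheetah.

```python
from typing import Sequence


def pick_device(devices: Sequence[str], prefer: str = "gpu") -> str:
    """Return the first entry of the preferred kind, else "best".

    Assumption: entries look like "gpu", "gpu:0", "cpu", or "cpu:4".
    """
    for name in devices:
        if name == prefer or name.startswith(prefer + ":"):
            return name
    # "best" lets the engine select the most suitable device itself.
    return "best"
```

In practice the input would be the result of pvcheetah.available_devices(), and the return value would be passed as the device argument.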
Throws
pvcheetah.Cheetah
Class for the Cheetah Speech-to-Text engine.
Cheetah can be initialized either using the module level create() function
or directly using the class __init__() method.
When you are done, release resources by calling the delete() method.
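To make sure delete() runs even if an exception is raised, the engine can be wrapped in a context manager. This is a sketch only: managed is a hypothetical helper, not part of pvcheetah, and it works with any object that exposes a delete() method.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine with `factory` and always call its delete()."""
    engine = factory()
    try:
        yield engine
    finally:
        # Release native resources whether or not the body raised.
        engine.delete()


# Possible usage (requires pvcheetah and a valid AccessKey):
#   with managed(lambda: pvcheetah.create(access_key=my_key)) as cheetah:
#       ...
```

Taking a factory instead of an instance keeps creation and cleanup in one place, so a failure during creation never triggers delete() on a missing object.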
pvcheetah.Cheetah.version
The version string of the Cheetah library.
pvcheetah.Cheetah.frame_length
The number of audio samples per frame that Cheetah accepts.
pvcheetah.Cheetah.sample_rate
The audio sample rate that Cheetah accepts.
pvcheetah.Cheetah.__init__()
__init__ method for Cheetah Speech-to-Text engine.
Parameters
- access_key str : AccessKey obtained from Picovoice Console.
- model_path str : Absolute path to the file containing model parameters.
- library_path str : Absolute path to Cheetah's dynamic library.
- device str : String representation of the device (e.g., CPU or GPU) to use. If set to best, the most suitable device is selected automatically. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
- endpoint_duration_sec float : Duration of endpoint in seconds.
- enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
Returns
Cheetah: An instance of Cheetah Speech-to-Text engine.
Throws
pvcheetah.Cheetah.delete()
Releases resources acquired by Cheetah.
pvcheetah.Cheetah.process()
Processes a frame of audio and returns newly-transcribed text and a flag indicating if an endpoint has been detected. Upon detection of an endpoint, the client may invoke .flush() to retrieve any remaining transcription.
The number of samples per frame can be obtained from .frame_length. The incoming audio must have a sample rate equal to .sample_rate and be 16-bit linearly-encoded. Furthermore, Cheetah operates on single-channel audio.
Parameters
- pcm Sequence[int] : A frame of audio samples.
Returns
Tuple[str, bool]: Any newly-transcribed speech (if none is available then an empty string is returned) and a flag indicating if an endpoint has been detected.
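Because process() takes one frame at a time, a longer buffer of samples has to be split into frames of .frame_length samples first. The sketch below shows one way to do that. iter_frames is a hypothetical helper, not part of pvcheetah; dropping the incomplete trailing frame is one possible policy, and the range check assumes signed 16-bit samples.

```python
from typing import Iterator, List, Sequence


def iter_frames(pcm: Sequence[int], frame_length: int) -> Iterator[List[int]]:
    """Yield consecutive frames of exactly `frame_length` samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        frame = list(pcm[start:start + frame_length])
        # Assumption: samples are signed 16-bit integers.
        if any(s < -32768 or s > 32767 for s in frame):
            raise ValueError("samples must fit in the signed 16-bit range")
        yield frame
```

Each yielded frame could then be passed to process(), with frame_length taken from the engine's .frame_length property.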
Throws
pvcheetah.Cheetah.flush()
Marks the end of the audio stream, flushes internal state of the object, and returns any remaining transcribed text.
Returns
str: Any remaining transcribed text. If none is available then an empty string is returned.
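The interplay of process() and flush() can be sketched as a streaming loop: accumulate partial text, call flush() when an endpoint is reported, and call it once more at the end of the stream. The transcribe function is a hypothetical helper, not part of pvcheetah; it only assumes an object with process() and flush() methods that behave as described in this reference.

```python
from typing import Iterable, List, Sequence


def transcribe(engine, frames: Iterable[Sequence[int]]) -> List[str]:
    """Return one transcript string per detected utterance."""
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        current += partial
        if is_endpoint:
            # An endpoint was detected: collect any remaining text.
            current += engine.flush()
            if current:
                utterances.append(current)
            current = ""
    # End of stream: flush whatever is still buffered.
    current += engine.flush()
    if current:
        utterances.append(current)
    return utterances
```

Splitting on endpoints is a design choice for this sketch; a caller that wants a single transcript could join the returned strings instead.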
Throws
pvcheetah.CheetahError
Error thrown if an error occurs within Cheetah Speech-to-Text engine.
Exceptions