Leopard Speech-to-Text
C Quick Start

Platforms

Linux (x86_64)
macOS (x86_64, arm64)
Windows (x86_64, arm64)
Raspberry Pi (3, 4, 5)

Requirements

C99-compatible compiler
CMake (3.13+)
Windows only: MinGW is required to build the demo

Picovoice Account & AccessKey

Sign up or log in to Picovoice Console to get your AccessKey. Keep your AccessKey secret.

Quick Start

Setup

Clone the repository:

git clone --recurse-submodules https://github.com/Picovoice/leopard.git

Usage

Include the public header files (picovoice.h and pv_leopard.h).
Link the project to an appropriate precompiled library for the target platform and load it.
Download a custom model from Picovoice Console or use a default language model.
Construct the Leopard Speech-to-Text object:

static const char* ACCESS_KEY = "${ACCESS_KEY}";
const char *model_file_path = "${MODEL_FILE_PATH}";
const char *device = "best";
bool enable_automatic_punctuation = false;
bool enable_diarization = false;

pv_leopard_t *leopard;

const pv_status_t status = pv_leopard_init(
    ACCESS_KEY,
    model_file_path,
    device,
    enable_automatic_punctuation,
    enable_diarization,
    &leopard);

if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Pass the path of an audio file to the pv_leopard_process_file function:

static const char* audio_path = "${AUDIO_FILE_PATH}";

char *transcript = NULL;
int32_t num_words = 0;
pv_word_t *words = NULL;

const pv_status_t process_status = pv_leopard_process_file(leopard, audio_path, &transcript, &num_words, &words);
if (process_status != PV_STATUS_SUCCESS) {
    // error handling logic
}
fprintf(stdout, "%s\n", transcript);
pv_leopard_transcript_delete(transcript); // make sure to free transcript result
pv_leopard_words_delete(words); // make sure to free words result

Release resources explicitly when done with Leopard Speech-to-Text:

pv_leopard_delete(leopard);
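
The "load it" step above can be done at run time instead of at link time. The following is a minimal sketch for POSIX systems only (Linux, macOS, Raspberry Pi), using dlopen and dlsym from <dlfcn.h>. The helper names open_shared_library and load_symbol are hypothetical and are not part of the Leopard API; Windows would need LoadLibrary and GetProcAddress instead.

```c
#include <dlfcn.h>
#include <stdio.h>

// Hypothetical helper (not part of the Leopard API).
// Opens a shared library at run time; returns its handle, or NULL after
// printing the dynamic loader's error message.
static void *open_shared_library(const char *library_path) {
    void *handle = dlopen(library_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "failed to load '%s': %s\n", library_path, dlerror());
    }
    return handle;
}

// Hypothetical helper (not part of the Leopard API).
// Looks up a symbol in an opened library; returns NULL after printing the
// loader's error message if the lookup fails.
static void *load_symbol(void *handle, const char *symbol_name) {
    dlerror(); // clear any stale error state before the lookup
    void *symbol = dlsym(handle, symbol_name);
    const char *error = dlerror();
    if (error != NULL) {
        fprintf(stderr, "failed to resolve '%s': %s\n", symbol_name, error);
        return NULL;
    }
    return symbol;
}
```

With these helpers, a function such as pv_leopard_init would be resolved by name with load_symbol(handle, "pv_leopard_init") and the result cast to a function pointer whose type matches the declaration in pv_leopard.h. Call dlclose on the handle once Leopard has been deleted.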

Word Metadata

Along with the transcript, Leopard Speech-to-Text returns metadata for each transcribed word. Available metadata items are:

Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
Confidence: Leopard Speech-to-Text's confidence that the transcribed word is accurate. It is a number within [0, 1].
Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
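
As an illustration of how these four items might be consumed, here is a minimal sketch that formats one word's metadata as a line of text. The struct example_word_t is a hypothetical stand-in defined only so the snippet compiles on its own; the real type is pv_word_t, and its exact field names should be checked in pv_leopard.h. The helper format_word is likewise an assumption, not part of the SDK.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical stand-in for the per-word metadata described above.
// The real type is pv_word_t; check pv_leopard.h for its exact fields.
typedef struct {
    const char *word;
    float start_sec;      // start time, in seconds
    float end_sec;        // end time, in seconds
    float confidence;     // within [0, 1]
    int32_t speaker_tag;  // -1: diarization off, 0: unknown, >0: speaker id
} example_word_t;

// Writes a one-line summary of `w` into `buffer` and returns snprintf's result.
static int format_word(const example_word_t *w, char *buffer, size_t size) {
    char speaker[32];
    if (w->speaker_tag < 0) {
        snprintf(speaker, sizeof(speaker), "n/a");      // diarization disabled
    } else if (w->speaker_tag == 0) {
        snprintf(speaker, sizeof(speaker), "unknown");  // reserved tag
    } else {
        snprintf(speaker, sizeof(speaker), "%d", (int) w->speaker_tag);
    }
    return snprintf(
        buffer,
        size,
        "%s [%.2f s - %.2f s] confidence=%.2f speaker=%s",
        w->word,
        w->start_sec,
        w->end_sec,
        w->confidence,
        speaker);
}
```

With the real API, the same formatting would run in a loop over the num_words entries of words, before pv_leopard_words_delete is called.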

Demo

For the Leopard Speech-to-Text SDK, we offer demo applications that demonstrate how to use the Speech-to-Text engine on audio recordings.

Setup

Clone the Leopard Speech-to-Text repository from GitHub using HTTPS:

git clone --recurse-submodules https://github.com/Picovoice/leopard.git

Build the demo:

cd leopard
cmake -S demo/c/. -B demo/c/build
cmake --build demo/c/build --target leopard_demo

Usage

To see the usage options for the demo:

./demo/c/build/leopard_demo

Run the demo from the root of the repository. The command below passes a .so library, as used on Linux; on other platforms, point -l at the corresponding library file under lib/:

./demo/c/build/leopard_demo \
-a ${ACCESS_KEY} \
-m ${MODEL_FILE_PATH} \
-l lib/${PLATFORM}/${ARCH}/libpv_leopard.so \
${AUDIO_PATH1} ${AUDIO_PATH2} ...

For more information on our Leopard Speech-to-Text demos for C, head over to our GitHub repository.
