# NVIDIA Nemotron Speech

> Text-to-speech service implementation using NVIDIA Nemotron Speech

## Overview

NVIDIA Nemotron Speech provides three TTS service implementations:

* **`NvidiaTTSService`** -- High-quality TTS using both locally deployed and cloud-based NVIDIA TTS models. Supports multilingual synthesis, configurable quality settings, per-sentence and stitched synthesis modes, and zero-shot voice cloning.
* **`NvidiaSageMakerHTTPTTSService`** -- Single HTTP invocation to an AWS SageMaker endpoint, streaming raw PCM audio back for each text segment.
* **`NvidiaSageMakerTTSService`** -- Persistent HTTP/2 bidi-stream to an AWS SageMaker endpoint with full interruption support via `InterruptibleTTSService`.

<CardGroup cols={2}>
  <Card title="NVIDIA Nemotron Speech TTS API Reference" icon="code" href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.nvidia.tts.html">
    Pipecat's API methods for NVIDIA Nemotron Speech TTS integration
  </Card>

  <Card title="Example Implementation" icon="play" href="https://github.com/pipecat-ai/pipecat/blob/main/examples/voice/voice-nvidia.py">
    Complete example with Nemotron Speech NIM
  </Card>

  <Card title="NVIDIA TTS NIM Documentation" icon="book" href="https://docs.nvidia.com/nim/speech/latest/tts/">
    Official NVIDIA TTS NIM documentation
  </Card>

  <Card title="NVIDIA Developer Portal" icon="microphone" href="https://developer.nvidia.com/">
    Access API keys and Nemotron Speech services
  </Card>
</CardGroup>

## Installation

To use NVIDIA Nemotron Speech services, install the required dependencies:

```bash
uv add "pipecat-ai[nvidia]"
```

## Prerequisites

### NVIDIA Nemotron Speech Setup

Before using Nemotron Speech TTS services, you need:

1. **NVIDIA Developer Account**: Sign up at [NVIDIA Developer Portal](https://developer.nvidia.com/)
2. **API Key**: Generate an NVIDIA API key for Nemotron Speech services (required for the cloud endpoint)
3. **Nemotron Speech Access**: Ensure access to NVIDIA Nemotron Speech TTS services

For local deployments, see the [NVIDIA TTS NIM documentation](https://docs.nvidia.com/nim/speech/latest/tts/).

### Required Environment Variables

* `NVIDIA_API_KEY`: Your NVIDIA API key for authentication (required for the cloud endpoint, not needed for local deployments)

## Configuration

### NvidiaTTSService

<ParamField path="api_key" type="str" default="None">
  NVIDIA API key for authentication. Required when using the cloud endpoint. Not
  needed for local deployments.
</ParamField>

<ParamField path="server" type="str" default="grpc.nvcf.nvidia.com:443">
  gRPC server endpoint. Defaults to NVIDIA's cloud endpoint. For local
  deployments, pass the local address (e.g. `localhost:50051`).
</ParamField>

<ParamField path="voice_id" type="str" default="Magpie-Multilingual.EN-US.Aria" deprecated>
  Voice model identifier.

  *Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="model_function_map" type="dict" default="{&#x22;function_id&#x22;: &#x22;877104f7-e885-42b9-8de8-f6e4c6303969&#x22;, &#x22;model_name&#x22;: &#x22;magpie-tts-multilingual&#x22;}">
  Dictionary containing `function_id` and `model_name` for the TTS model.
</ParamField>

<ParamField path="use_ssl" type="bool" default="True">
  Whether to use SSL for the gRPC connection. Defaults to True for the NVIDIA
  cloud endpoint. Set to False for local deployments.
</ParamField>

<ParamField path="custom_dictionary" type="dict" default="None">
  Custom pronunciation dictionary mapping words (graphemes) to IPA phonetic
  representations (phonemes), e.g. `{"NVIDIA": "ɛn.vɪ.diː.ʌ"}`. See [NVIDIA TTS
  NIM phoneme
  support](https://docs.nvidia.com/nim/speech/latest/tts/phoneme-support.html)
  for the list of supported IPA phonemes.
</ParamField>

<ParamField path="zero_shot_audio_prompt_file" type="str | os.PathLike[str]" default="None">
  Optional audio prompt file for Magpie zero-shot voice cloning. NVIDIA
  recommends a 16-bit mono WAV prompt, sample rate 22.05 kHz or higher, and
  duration 3 to 10 seconds. Access to NVIDIA's hosted zero-shot models requires
  approval through [NVIDIA Riva TTS Zero-Shot
  Models](https://developer.nvidia.com/riva-tts-zeroshot-models).
</ParamField>

<ParamField path="audio_prompt_encoding" type="AudioEncoding" default="AudioEncoding.ENCODING_UNSPECIFIED">
  Audio encoding for `zero_shot_audio_prompt_file`. Use this when the server
  expects a specific prompt encoding for Magpie zero-shot voice cloning.
</ParamField>

<ParamField path="encoding" type="AudioEncoding" default="AudioEncoding.LINEAR_PCM">
  Output audio encoding format. Defaults to `AudioEncoding.LINEAR_PCM`.
</ParamField>

<ParamField path="params" type="InputParams" default="None" deprecated>
  Runtime-configurable synthesis settings. See [Settings](#settings)
  below.

  *Deprecated in v0.0.105. Use `settings=NvidiaTTSService.Settings(...)` instead.*
</ParamField>

<ParamField path="settings" type="NvidiaTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter        | Type                     | Default     | Description                                                                                  |
| ---------------- | ------------------------ | ----------- | -------------------------------------------------------------------------------------------- |
| `model`          | `str`                    | `None`      | Model identifier. *(Inherited.)*                                                             |
| `voice`          | `str`                    | `None`      | Voice identifier. *(Inherited.)*                                                             |
| `language`       | `Language \| str`        | `None`      | Language for synthesis. *(Inherited.)*                                                       |
| `quality`        | `int`                    | `NOT_GIVEN` | Audio quality setting (0-100). For Magpie zero-shot, NVIDIA expects values from 1 to 40.     |
| `synthesis_mode` | `NvidiaTTSSynthesisMode` | `NOT_GIVEN` | Whether to synthesize one sentence per request or stitch multiple sentences in one stream.   |
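
The `quality` ranges in the table can be enforced before a value reaches `Settings`. The sketch below is a minimal illustration: `clamp_quality` is a hypothetical helper, not part of Pipecat, and it assumes that out-of-range values should be clamped rather than rejected.

```python
def clamp_quality(value: int, zero_shot: bool = False) -> int:
    """Clamp a requested quality value into the documented range.

    Hypothetical helper: 0-100 in general, 1-40 for Magpie zero-shot.
    """
    low, high = (1, 40) if zero_shot else (0, 100)
    return max(low, min(high, value))


# Example: a UI slider that goes up to 100, used with a zero-shot model.
requested = 80
quality = clamp_quality(requested, zero_shot=True)
```

The result can then be passed as `NvidiaTTSService.Settings(quality=quality)`.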

## Usage

### Basic Setup

```python
import os

from pipecat.services.nvidia.tts import NvidiaTTSService

# Cloud endpoint: the API key is read from the environment.
tts = NvidiaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
)
```

### With Custom Voice and Quality

```python
import os

from pipecat.services.nvidia.tts import NvidiaTTSService
from pipecat.transcriptions.language import Language

tts = NvidiaTTSService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    # The model is fixed at construction time and cannot be changed later.
    model_function_map={
        "function_id": "877104f7-e885-42b9-8de8-f6e4c6303969",
        "model_name": "magpie-tts-multilingual",
    },
    # Runtime-configurable settings.
    settings=NvidiaTTSService.Settings(
        voice="Magpie-Multilingual.EN-US.Aria",
        language=Language.EN_US,
        quality=40,
    ),
)
```

<Tip>
  The `InputParams` / `params=` pattern is deprecated as of v0.0.105. Use
  `Settings` / `settings=` instead. See the [Service Settings
  guide](/pipecat/fundamentals/service-settings) for migration details.
</Tip>
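
### Choosing Between Cloud and Local Endpoints

The `server`, `use_ssl`, and `api_key` parameters described under Configuration differ between cloud and local deployments. One way to keep that choice out of application code is to build the constructor arguments from the environment. This is a minimal sketch: `nvidia_tts_kwargs` and the `NVIDIA_TTS_SERVER` variable are hypothetical names chosen for illustration, not part of Pipecat.

```python
import os


def nvidia_tts_kwargs(env=None) -> dict:
    """Build NvidiaTTSService keyword arguments for a local or cloud endpoint.

    NVIDIA_TTS_SERVER is a hypothetical variable holding a local gRPC
    address such as "localhost:50051". When it is unset, the cloud
    endpoint defaults are used and NVIDIA_API_KEY is required.
    """
    env = os.environ if env is None else env
    local_server = env.get("NVIDIA_TTS_SERVER")
    if local_server:
        # Local deployment: no API key needed, SSL turned off.
        return {"server": local_server, "use_ssl": False}

    api_key = env.get("NVIDIA_API_KEY")
    if not api_key:
        raise RuntimeError("NVIDIA_API_KEY is required for the cloud endpoint")
    # Cloud endpoint: keep the default server and SSL settings.
    return {"api_key": api_key}
```

With Pipecat installed, the result can be unpacked into the constructor: `tts = NvidiaTTSService(**nvidia_tts_kwargs())`.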

## Notes

* **gRPC-based**: `NvidiaTTSService` communicates with the TTS service over gRPC, not HTTP or WebSocket. The SageMaker variants described later on this page use HTTP instead.
* **Synthesis modes**: The service supports two synthesis modes via the `synthesis_mode` setting:
  * `PER_SENTENCE` (default): Opens a separate `SynthesizeOnline` call for each sentence. Compatible with all NVIDIA TTS NIMs, including Chatterbox, Magpie multilingual, and Magpie zero-shot.
  * `STITCHED`: Reuses one `SynthesizeOnline` stream across multiple sentences within the same LLM response for improved multi-sentence synthesis quality. Only use with models that support cross-sentence stitching, such as Magpie multilingual and Magpie zero-shot v1.7.0 or later.
* **Zero-shot voice cloning**: Magpie zero-shot models support voice cloning via the `zero_shot_audio_prompt_file` parameter. NVIDIA recommends a 16-bit mono WAV prompt (22.05 kHz or higher, 3-10 seconds duration). Access to hosted zero-shot models requires approval.
* **Runtime settings updates**: Voice, language, quality, and synthesis mode can be updated mid-conversation with `TTSUpdateSettingsFrame`. New values take effect on the next synthesis turn, not for the current turn's in-flight requests.
* **Model cannot be changed after initialization**: The model and function ID must be set during construction via `model_function_map`. Calling `set_model()` after initialization will log a warning and have no effect.
* **SSL enabled by default**: The service connects to NVIDIA's cloud endpoint with SSL. Set `use_ssl=False` only for local or custom Nemotron Speech deployments.
* **Metrics generation**: This service supports metrics generation via `can_generate_metrics()`. Metrics are automatically stopped when an audio context is interrupted.
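
The zero-shot prompt recommendations above (16-bit mono WAV, 22.05 kHz or higher, 3 to 10 seconds) can be checked before a file is passed as `zero_shot_audio_prompt_file`. The following is a sketch using only Python's standard `wave` module. `check_zero_shot_prompt` is a hypothetical helper, not part of Pipecat, and it only handles uncompressed WAV files.

```python
import wave


def check_zero_shot_prompt(path) -> list[str]:
    """Return a list of problems with a WAV prompt file.

    An empty list means the file matches the recommendations quoted
    in this page. Hypothetical helper for illustration only.
    """
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if rate < 22050:
        problems.append(f"expected 22050 Hz or higher, found {rate} Hz")
    if not 3.0 <= duration <= 10.0:
        problems.append(f"expected 3 to 10 seconds, found {duration:.2f} s")
    return problems
```

A caller might log the returned problems and fall back to a stock voice when the list is not empty.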

## NvidiaSageMakerHTTPTTSService

NVIDIA Magpie TTS service that calls a SageMaker HTTP endpoint for each text segment. Sends JSON to the endpoint's `/invocations` path and streams raw PCM audio back.

### Configuration

<ParamField path="endpoint_name" type="str" required>
  Name of the deployed SageMaker endpoint.
</ParamField>

<ParamField path="region" type="str" default="us-west-2">
  AWS region where the endpoint is deployed.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="settings" type="NvidiaSageMakerHTTPTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings-2) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaSageMakerHTTPTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default                          | Description                         |
| ---------- | ----------------- | -------------------------------- | ----------------------------------- |
| `model`    | `str`             | `magpie`                         | Model identifier. *(Inherited.)*    |
| `voice`    | `str`             | `Magpie-Multilingual.EN-US.Aria` | Voice identifier. *(Inherited.)*    |
| `language` | `Language \| str` | `en-US`                          | BCP-47 language code for synthesis. |

### Usage

```python
import os

from pipecat.services.nvidia.sagemaker.tts import NvidiaSageMakerHTTPTTSService

tts = NvidiaSageMakerHTTPTTSService(
    endpoint_name=os.getenv("SAGEMAKER_MAGPIE_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION", "us-west-2"),
    settings=NvidiaSageMakerHTTPTTSService.Settings(
        voice="Magpie-Multilingual.EN-US.Aria",
        language="en-US",
    ),
)
```

### Notes

* **AWS SageMaker deployment required**: This service requires a deployed SageMaker endpoint running NVIDIA Magpie TTS NIM. See the [deployment example](https://github.com/pipecat-ai/pipecat-examples/tree/main/deployment/aws-sagemaker-nvidia) for setup instructions.
* **AWS credentials**: Requires `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables for SageMaker authentication.
* **Environment variables**: The usage example above reads the endpoint name from `SAGEMAKER_MAGPIE_ENDPOINT_NAME` and the region from `AWS_REGION`.
* **HTTP-based**: Each text segment triggers a new HTTP POST request to the SageMaker endpoint.
* **Metrics support**: This service supports metrics generation (`can_generate_metrics()` returns `True`).
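
Because `os.getenv` returns `None` for unset variables, a missing endpoint name or credential may only surface later as a connection error. The sketch below checks the variables named in the notes above before the service is constructed. `missing_sagemaker_env` is a hypothetical helper, not part of Pipecat. The same variables are listed for `NvidiaSageMakerTTSService`.

```python
import os

# Variables named in the notes for the SageMaker-based services.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "SAGEMAKER_MAGPIE_ENDPOINT_NAME",
)


def missing_sagemaker_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_sagemaker_env(env=None) -> None:
    """Raise a clear error if any required variable is missing."""
    missing = missing_sagemaker_env(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_sagemaker_env()` at startup fails fast with the names of the missing variables.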

## NvidiaSageMakerTTSService

NVIDIA Magpie TTS service using SageMaker bidirectional streaming. Maintains a persistent HTTP/2 bidi-stream connection for the lifetime of the pipeline with full interruption support.

### Configuration

<ParamField path="endpoint_name" type="str" required>
  Name of the deployed SageMaker endpoint.
</ParamField>

<ParamField path="region" type="str" default="us-west-2">
  AWS region where the endpoint is deployed.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
  rate.
</ParamField>

<ParamField path="settings" type="NvidiaSageMakerTTSService.Settings" default="None">
  Runtime-configurable settings. See [Settings](#settings-3) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `NvidiaSageMakerTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.

| Parameter  | Type              | Default                          | Description                         |
| ---------- | ----------------- | -------------------------------- | ----------------------------------- |
| `model`    | `str`             | `magpie`                         | Model identifier. *(Inherited.)*    |
| `voice`    | `str`             | `Magpie-Multilingual.EN-US.Aria` | Voice identifier. *(Inherited.)*    |
| `language` | `Language \| str` | `en-US`                          | BCP-47 language code for synthesis. |

### Usage

```python
import os

from pipecat.services.nvidia.sagemaker.tts import NvidiaSageMakerTTSService

tts = NvidiaSageMakerTTSService(
    endpoint_name=os.getenv("SAGEMAKER_MAGPIE_ENDPOINT_NAME"),
    region=os.getenv("AWS_REGION", "us-west-2"),
    settings=NvidiaSageMakerTTSService.Settings(
        voice="Magpie-Multilingual.EN-US.Aria",
        language="en-US",
    ),
)
```

### Notes

* **AWS SageMaker deployment required**: This service requires a deployed SageMaker endpoint running NVIDIA Magpie TTS NIM. See the [deployment example](https://github.com/pipecat-ai/pipecat-examples/tree/main/deployment/aws-sagemaker-nvidia) for setup instructions.
* **AWS credentials**: Requires `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables for SageMaker authentication.
* **Environment variables**: The usage example above reads the endpoint name from `SAGEMAKER_MAGPIE_ENDPOINT_NAME` and the region from `AWS_REGION`.
* **Persistent connection**: Maintains a single HTTP/2 bidi-stream session for the pipeline's lifetime, reconnecting automatically on error.
* **Interruption support**: Extends `InterruptibleTTSService` for proper handling of user interruptions.
* **Metrics support**: This service supports metrics generation (`can_generate_metrics()` returns `True`).
