Google - Pipecat

Overview

Google Cloud Text-to-Speech provides high-quality speech synthesis with two service implementations: GoogleTTSService (WebSocket-based) for streaming with the lowest latency, and GoogleHttpTTSService (HTTP-based) for simpler integration. GoogleTTSService is recommended for real-time applications. A third service, GeminiTTSService, uses Gemini's TTS-specific models and is documented in its own section below.

Google TTS API Reference

Pipecat’s API methods for Google Cloud TTS integration

Example Implementation

Complete example with Chirp 3 HD voice

Google Cloud Documentation

Official Google Cloud Text-to-Speech documentation

Voice Gallery

Browse available voices and languages

Installation

To use Google services, install the required dependencies:

uv add "pipecat-ai[google]"

Prerequisites

Google Cloud Setup

Before using Google Cloud TTS services, you need:

Google Cloud Account: Sign up at Google Cloud Console
Project Setup: Create a project and enable the Text-to-Speech API
Service Account: Create a service account with TTS permissions
Authentication: Set up credentials via service account key or Application Default Credentials

Required Environment Variables

GOOGLE_APPLICATION_CREDENTIALS: Path to your service account key file (recommended)
Or use Application Default Credentials for cloud deployments
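
As a rough illustration of the two options above, the following sketch checks whether GOOGLE_APPLICATION_CREDENTIALS points at an existing file before a service is constructed. The helper name find_credentials_path is hypothetical and not part of Pipecat; a None result only signals that the variable is unset, in which case Application Default Credentials would be relied on instead.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_credentials_path(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the service account key path from the environment, if set.

    Hypothetical helper, not a Pipecat API. Returns None when the variable
    is unset (the Application Default Credentials case) and raises if the
    variable is set but does not point at an existing file.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        return None
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
        )
    return str(path)
```

The returned path could then be passed as credentials_path, or omitted entirely when the result is None.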

Configuration

GoogleTTSService

Streaming service optimized for Chirp 3 HD and Journey voices.

credentials

str

default:"None"

JSON string containing Google Cloud service account credentials.

credentials_path

str

default:"None"

Path to Google Cloud service account JSON file.

location

str

default:"None"

Google Cloud location for regional endpoint (e.g., "us-central1").

voice_id

str

default:"en-US-Chirp3-HD-Charon"

deprecated

Google TTS voice identifier. Deprecated in v0.0.105. Use settings=GoogleTTSService.Settings(voice=...) instead.

voice_cloning_key

str

default:"None"

Voice cloning key for Chirp 3 custom voices.

sample_rate

int

default:"None"

Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.

params

InputParams

default:"InputParams()"

deprecated

Deprecated in v0.0.105. Use settings=GoogleTTSService.Settings(...) instead.

settings

GoogleTTSService.Settings

default:"None"

Runtime-configurable settings. See GoogleTTSService Settings below.

GoogleTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.

Parameter	Type	Default	Description
`model`	`str`	`None`	Model identifier. (Inherited.)
`voice`	`str`	`None`	Voice identifier. (Inherited.)
`language`	`Language | str`	`None`	Language for synthesis. (Inherited.)
`speaking_rate`	`float`	`NOT_GIVEN`	Speaking rate in the range [0.25, 2.0].
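
The documented range for speaking_rate can be checked before a value is placed into the settings. The sketch below is a hypothetical helper (validate_speaking_rate is not a Pipecat function), and whether the service performs its own validation is not stated here.

```python
def validate_speaking_rate(rate: float) -> float:
    """Return the rate as a float if it lies in the documented [0.25, 2.0] range.

    Hypothetical helper for illustration; raises ValueError otherwise.
    """
    if not 0.25 <= rate <= 2.0:
        raise ValueError(f"speaking_rate must be within [0.25, 2.0], got {rate}")
    return float(rate)
```

A validated value could then be passed as GoogleTTSService.Settings(speaking_rate=...).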

GoogleHttpTTSService

HTTP service with full SSML support for all voice types.

credentials

str

default:"None"

JSON string containing Google Cloud service account credentials.

credentials_path

str

default:"None"

Path to Google Cloud service account JSON file.

location

str

default:"None"

Google Cloud location for regional endpoint.

voice_id

str

default:"en-US-Chirp3-HD-Charon"

deprecated

Google TTS voice identifier. Deprecated in v0.0.105. Use settings=GoogleHttpTTSService.Settings(voice=...) instead.

sample_rate

int

default:"None"

Output audio sample rate in Hz.

params

InputParams

default:"None"

deprecated

Deprecated in v0.0.105. Use settings=GoogleHttpTTSService.Settings(...) instead.

settings

GoogleHttpTTSService.Settings

default:"None"

Runtime-configurable settings. See GoogleHttpTTSService Settings below.

GoogleHttpTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.

Parameter	Type	Default	Description
`model`	`str`	`None`	Model identifier. (Inherited.)
`voice`	`str`	`None`	Voice identifier. (Inherited.)
`language`	`Language | str`	`None`	Language for synthesis. (Inherited.)
`pitch`	`str`	`NOT_GIVEN`	Voice pitch adjustment (e.g., `"+2st"`, `"-50%"`).
`rate`	`str`	`NOT_GIVEN`	Speaking rate for SSML prosody (non-Chirp voices, e.g., `"slow"`, `"fast"`, `"125%"`).
`speaking_rate`	`float`	`NOT_GIVEN`	Speaking rate for AudioConfig (Chirp/Journey voices). Range [0.25, 2.0].
`volume`	`str`	`NOT_GIVEN`	Volume adjustment (e.g., `"loud"`, `"soft"`, `"+6dB"`).
`emphasis`	`Literal`	`NOT_GIVEN`	Emphasis level: `"strong"`, `"moderate"`, `"reduced"`, `"none"`.
`gender`	`Literal`	`NOT_GIVEN`	Voice gender preference: `"male"`, `"female"`, `"neutral"`.
`google_style`	`Literal`	`NOT_GIVEN`	Google-specific voice style: `"apologetic"`, `"calm"`, `"empathetic"`, `"firm"`, `"lively"`.

GeminiTTSService

Streaming service using Gemini’s TTS-specific models with natural voice control. Supports two backends: the Google Cloud backend (service account credentials, with prompt style instructions and multi-speaker support) and the Gemini Developer API backend (google-genai, with simpler API key authentication).

model

str

default:"gemini-3.1-flash-tts-preview"

deprecated

Gemini TTS model to use. Options: "gemini-3.1-flash-tts-preview", "gemini-2.5-flash-tts", "gemini-2.5-pro-tts". Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(model=...) instead.

api_key

str

default:"None"

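Google AI API key for authentication with the GenAI backend. When provided, automatically selects the GenAI backend. If omitted while the GenAI backend is active (for example, with use_genai=True), the GOOGLE_API_KEY environment variable is used instead; the environment variable alone does not select the backend.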

credentials

str

default:"None"

JSON string containing Google Cloud service account credentials for the Google Cloud backend.

credentials_path

str

default:"None"

Path to Google Cloud service account JSON file for the Google Cloud backend.

location

str

default:"None"

Google Cloud location for regional endpoint (Google Cloud backend only).

voice_id

str

default:"Kore"

deprecated

Voice name from available Gemini voices (e.g., "Kore", "Charon", "Puck", "Zephyr"). Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(voice=...) instead.

sample_rate

int

default:"None"

Output audio sample rate in Hz. Gemini TTS outputs audio at 24 kHz; a mismatched rate produces a warning.

params

InputParams

default:"None"

deprecated

Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(...) instead.

settings

GeminiTTSService.Settings

default:"None"

Runtime-configurable settings. See GeminiTTSService Settings below.

use_genai

bool

default:"None"

Force use of the google-genai backend when True, or the Google Cloud backend when False. If not provided, backend is selected automatically based on whether api_key is passed.
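
The selection rule described for use_genai can be written out as a small function. This is a sketch of the documented behavior only, not the library's actual code; the function name select_backend and the returned labels "genai" and "google_cloud" are made up for illustration.

```python
from typing import Optional


def select_backend(api_key: Optional[str] = None,
                   use_genai: Optional[bool] = None) -> str:
    """Mirror the documented backend selection for GeminiTTSService.

    Hypothetical illustration: an explicit use_genai wins; otherwise the
    backend depends on whether api_key was passed.
    """
    if use_genai is not None:
        return "genai" if use_genai else "google_cloud"
    return "genai" if api_key is not None else "google_cloud"
```

Note that the function never reads the environment, which matches the statement that GOOGLE_API_KEY alone does not switch backends.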

http_options

HttpOptions

default:"None"

HTTP client options for the google-genai client. Only applicable when using the GenAI backend.

GeminiTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GeminiTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.

Parameter	Type	Default	Description
`model`	`str`	`None`	Model identifier. (Inherited.)
`voice`	`str`	`None`	Voice identifier. (Inherited.)
`language`	`Language | str`	`None`	Language for synthesis. (Inherited.)
`prompt`	`str`	`NOT_GIVEN`	Style instructions for how to synthesize the content.
`multi_speaker`	`bool`	`NOT_GIVEN`	Enable multi-speaker support.
`speaker_configs`	`list[dict]`	`NOT_GIVEN`	Speaker configurations for multi-speaker mode. Each dict should have `speaker_alias` and optionally `speaker_id`.
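
To make the shape of speaker_configs concrete, the sketch below builds the list of dicts from an alias-to-speaker mapping. The helper make_speaker_configs is hypothetical, and the aliases and the use of Gemini voice names such as "Kore" as speaker_id values are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def make_speaker_configs(speakers: Dict[str, Optional[str]]) -> List[dict]:
    """Build a speaker_configs list from {alias: speaker_id or None}.

    Hypothetical helper. Each entry always has speaker_alias, and
    speaker_id is included only when one is given.
    """
    configs = []
    for alias, speaker_id in speakers.items():
        if not alias:
            raise ValueError("speaker_alias must be a non-empty string")
        config = {"speaker_alias": alias}
        if speaker_id is not None:
            config["speaker_id"] = speaker_id
        configs.append(config)
    return configs


# Example: two speakers, one with an explicit (assumed) voice name.
configs = make_speaker_configs({"Host": "Kore", "Guest": None})
```

The resulting list could be passed as speaker_configs together with multi_speaker=True on the Google Cloud backend.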

Usage

Basic Setup (Streaming)

from pipecat.services.google import GoogleTTSService
from pipecat.transcriptions.language import Language

tts = GoogleTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GoogleTTSService.Settings(
        voice="en-US-Chirp3-HD-Charon",
        language=Language.EN_US,
    )
)

HTTP Service with SSML

from pipecat.services.google import GoogleHttpTTSService
from pipecat.transcriptions.language import Language

tts = GoogleHttpTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GoogleHttpTTSService.Settings(
        voice="en-US-Standard-A",
        language=Language.EN_US,
        rate="110%",
        pitch="+2st",
    ),
)

Gemini TTS with GenAI Backend (API Key)

import os

from pipecat.services.google import GeminiTTSService

tts = GeminiTTSService(
    api_key=os.environ["GOOGLE_API_KEY"],
    settings=GeminiTTSService.Settings(
        model="gemini-3.1-flash-tts-preview",
        voice="Puck",
    )
)

Gemini TTS with Google Cloud Backend (Style Prompt)

from pipecat.services.google import GeminiTTSService
from pipecat.transcriptions.language import Language

tts = GeminiTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GeminiTTSService.Settings(
        model="gemini-3.1-flash-tts-preview",
        voice="Kore",
        language=Language.EN_US,
        prompt="Say this in a friendly and helpful tone"
    )
)

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

Streaming vs HTTP: GoogleTTSService uses the streaming API for low latency and only supports Chirp 3 HD and Journey voices. GoogleHttpTTSService supports all Google voices including Standard and WaveNet, with full SSML support.
Chirp/Journey voices and SSML: Chirp and Journey voices do not support SSML. The HTTP service automatically uses plain text input for these voices.
Speaking rate: For Chirp and Journey voices, use speaking_rate (float, 0.25-2.0) in settings. For other voices, use rate (string) for SSML prosody control.
Gemini TTS sample rate: Gemini TTS always outputs audio at 24 kHz. Setting a different sample rate will produce a warning and may cause audio issues.
Gemini TTS backends: GeminiTTSService supports two backends:
- GenAI backend (google-genai): Simpler authentication with API key. Automatically selected when api_key is provided. Does not support prompt or multi_speaker settings.
- Google Cloud backend: Uses service account credentials. Supports prompt for style instructions and multi_speaker for multi-voice conversations.
Backend selection: Pass api_key to use the GenAI backend, or credentials/credentials_path for Google Cloud. The GOOGLE_API_KEY environment variable alone does not switch backends; it is only used once the GenAI backend is active. Use use_genai=True to force the GenAI backend explicitly.
Gemini multi-speaker: Use multi_speaker=True with speaker_configs to generate conversations between multiple voices (Google Cloud backend only). Markup text with speaker aliases to control which voice speaks.
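
The speaking-rate note above implies a choice between two settings depending on the voice family. A minimal sketch of that choice follows. It assumes the voice family can be recognized from the substrings "Chirp" or "Journey" in the voice name, which is a guess based on names like en-US-Chirp3-HD-Charon, and rate_settings is a hypothetical helper rather than a Pipecat function.

```python
def rate_settings(voice: str, multiplier: float) -> dict:
    """Pick the rate setting that matches the voice family.

    Hypothetical helper. Chirp/Journey voices get a float speaking_rate
    (documented range [0.25, 2.0]); other voices get an SSML prosody
    rate string expressed as a percentage.
    """
    # Assumption: the family is detectable from the voice name.
    if "Chirp" in voice or "Journey" in voice:
        if not 0.25 <= multiplier <= 2.0:
            raise ValueError("speaking_rate must be within [0.25, 2.0]")
        return {"speaking_rate": multiplier}
    return {"rate": f"{round(multiplier * 100)}%"}
```

For example, rate_settings("en-US-Standard-A", 1.25) yields a rate of "125%", and the resulting dict could be unpacked into GoogleHttpTTSService.Settings(...) alongside voice and language.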
