Overview

Google Cloud Text-to-Speech provides high-quality speech synthesis with two service implementations: GoogleTTSService (WebSocket-based) for streaming with the lowest latency, and GoogleHttpTTSService (HTTP-based) for simpler integration. GoogleTTSService is recommended for real-time applications. A third service, GeminiTTSService, streams speech from Gemini’s TTS-specific models and is documented alongside them below.

  • Google TTS API Reference: Pipecat’s API methods for Google Cloud TTS integration
  • Example Implementation: Complete example with Chirp 3 HD voice
  • Google Cloud Documentation: Official Google Cloud Text-to-Speech documentation
  • Voice Gallery: Browse available voices and languages

Installation

To use Google services, install the required dependencies:
uv add "pipecat-ai[google]"

Prerequisites

Google Cloud Setup

Before using Google Cloud TTS services, you need:
  1. Google Cloud Account: Sign up at Google Cloud Console
  2. Project Setup: Create a project and enable the Text-to-Speech API
  3. Service Account: Create a service account with TTS permissions
  4. Authentication: Set up credentials via service account key or Application Default Credentials

Required Environment Variables

  • GOOGLE_APPLICATION_CREDENTIALS: Path to your service account key file (recommended)
  • Or use Application Default Credentials for cloud deployments
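
The two credential options above can be resolved once at startup, before a service is constructed. The following is a minimal sketch and not part of Pipecat: `credentials_kwargs` is a hypothetical helper that returns a `credentials_path` keyword argument when `GOOGLE_APPLICATION_CREDENTIALS` points at an existing file. It returns an empty dict otherwise, on the assumption that the service then falls back to Application Default Credentials.

```python
import os


def credentials_kwargs(env=None):
    """Build credential kwargs for a Google TTS service constructor.

    Hypothetical helper (not part of Pipecat). Returns
    {"credentials_path": ...} when GOOGLE_APPLICATION_CREDENTIALS is set,
    or {} so that Application Default Credentials can be used instead
    (assumed fallback behavior).
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        # Nothing configured: leave credential discovery to ADC.
        return {}
    if not os.path.isfile(path):
        # Fail early with a clear message instead of at first synthesis.
        raise FileNotFoundError(f"Service account key file not found: {path}")
    return {"credentials_path": path}
```

The result can be unpacked into a constructor, for example `GoogleTTSService(**credentials_kwargs())`.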

Configuration

GoogleTTSService

Streaming service optimized for Chirp 3 HD and Journey voices.
credentials
str
default:"None"
JSON string containing Google Cloud service account credentials.
credentials_path
str
default:"None"
Path to Google Cloud service account JSON file.
location
str
default:"None"
Google Cloud location for regional endpoint (e.g., "us-central1").
voice_id
str
default:"en-US-Chirp3-HD-Charon"
deprecated
Google TTS voice identifier. Deprecated in v0.0.105. Use settings=GoogleTTSService.Settings(voice=...) instead.
voice_cloning_key
str
default:"None"
Voice cloning key for Chirp 3 custom voices.
sample_rate
int
default:"None"
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
params
InputParams
default:"InputParams()"
deprecated
Deprecated in v0.0.105. Use settings=GoogleTTSService.Settings(...) instead.
settings
GoogleTTSService.Settings
default:"None"
Runtime-configurable settings. See GoogleTTSService Settings below.

GoogleTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| speaking_rate | float | NOT_GIVEN | Speaking rate in the range [0.25, 2.0]. |

GoogleHttpTTSService

HTTP service with full SSML support for all voice types.
credentials
str
default:"None"
JSON string containing Google Cloud service account credentials.
credentials_path
str
default:"None"
Path to Google Cloud service account JSON file.
location
str
default:"None"
Google Cloud location for regional endpoint.
voice_id
str
default:"en-US-Chirp3-HD-Charon"
deprecated
Google TTS voice identifier. Deprecated in v0.0.105. Use settings=GoogleHttpTTSService.Settings(voice=...) instead.
sample_rate
int
default:"None"
Output audio sample rate in Hz.
params
InputParams
default:"None"
deprecated
Deprecated in v0.0.105. Use settings=GoogleHttpTTSService.Settings(...) instead.
settings
GoogleHttpTTSService.Settings
default:"None"
Runtime-configurable settings. See GoogleHttpTTSService Settings below.

GoogleHttpTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GoogleHttpTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| pitch | str | NOT_GIVEN | Voice pitch adjustment (e.g., "+2st", "-50%"). |
| rate | str | NOT_GIVEN | Speaking rate for SSML prosody (non-Chirp voices, e.g., "slow", "fast", "125%"). |
| speaking_rate | float | NOT_GIVEN | Speaking rate for AudioConfig (Chirp/Journey voices). Range [0.25, 2.0]. |
| volume | str | NOT_GIVEN | Volume adjustment (e.g., "loud", "soft", "+6dB"). |
| emphasis | Literal | NOT_GIVEN | Emphasis level: "strong", "moderate", "reduced", "none". |
| gender | Literal | NOT_GIVEN | Voice gender preference: "male", "female", "neutral". |
| google_style | Literal | NOT_GIVEN | Google-specific voice style: "apologetic", "calm", "empathetic", "firm", "lively". |
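
Because `rate` and `speaking_rate` apply to different voice families, it can help to pick the field from the voice name. The sketch below is illustrative only: `rate_settings` is a hypothetical helper, and recognizing Chirp and Journey voices by a substring of the voice name is an assumption rather than a documented rule.

```python
def rate_settings(voice, multiplier):
    """Return the settings field that controls speaking rate for a voice.

    Hypothetical helper. Assumption: Chirp and Journey voices can be
    recognized by "chirp" or "journey" appearing in the voice name.
    The [0.25, 2.0] range is documented for speaking_rate; applying it to
    the SSML branch as well is a simplification made for this sketch.
    """
    if not 0.25 <= multiplier <= 2.0:
        raise ValueError("multiplier must be within [0.25, 2.0]")
    name = voice.lower()
    if "chirp" in name or "journey" in name:
        # Chirp/Journey voices take a float (AudioConfig speaking rate).
        return {"speaking_rate": multiplier}
    # Other voices take an SSML prosody rate string such as "125%".
    return {"rate": f"{round(multiplier * 100)}%"}
```

The returned dict is meant to be merged into the keyword arguments of `GoogleHttpTTSService.Settings(...)` together with `voice` and `language`.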

GeminiTTSService

Streaming service using Gemini’s TTS-specific models with natural voice control. Supports two backends: the Google Cloud backend (service account credentials, with prompts for style instructions and multi-speaker support) and the Gemini Developer API (google-genai) backend (simpler API key authentication).
model
str
default:"gemini-3.1-flash-tts-preview"
deprecated
Gemini TTS model to use. Options: "gemini-3.1-flash-tts-preview", "gemini-2.5-flash-tts", "gemini-2.5-pro-tts". Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(model=...) instead.
api_key
str
default:"None"
Google AI API key for authentication with the GenAI backend. When provided, automatically selects the GenAI backend. Alternatively set GOOGLE_API_KEY environment variable.
credentials
str
default:"None"
JSON string containing Google Cloud service account credentials for the Google Cloud backend.
credentials_path
str
default:"None"
Path to Google Cloud service account JSON file for the Google Cloud backend.
location
str
default:"None"
Google Cloud location for regional endpoint (Google Cloud backend only).
voice_id
str
default:"Kore"
deprecated
Voice name from available Gemini voices (e.g., "Kore", "Charon", "Puck", "Zephyr"). Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(voice=...) instead.
sample_rate
int
default:"None"
Output audio sample rate in Hz. Google TTS outputs at 24kHz; mismatched rates will produce a warning.
params
InputParams
default:"None"
deprecated
Deprecated in v0.0.105. Use settings=GeminiTTSService.Settings(...) instead.
settings
GeminiTTSService.Settings
default:"None"
Runtime-configurable settings. See GeminiTTSService Settings below.
use_genai
bool
default:"None"
Force use of the google-genai backend when True, or the Google Cloud backend when False. If not provided, backend is selected automatically based on whether api_key is passed.
http_options
HttpOptions
default:"None"
HTTP client options for the google-genai client. Only applicable when using the GenAI backend.
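
The way `api_key` and `use_genai` interact, as described in the parameter list above, can be summarized as a small decision function. This is a sketch of the documented rule, not Pipecat's implementation; the function name and the returned labels are chosen for illustration.

```python
def select_gemini_backend(api_key=None, use_genai=None):
    """Mirror the documented backend choice for GeminiTTSService.

    Illustrative only: the strings "genai" and "google-cloud" are labels
    chosen for this sketch, not values used by Pipecat.
    """
    if use_genai is not None:
        # An explicit use_genai always wins.
        return "genai" if use_genai else "google-cloud"
    # Otherwise the choice depends only on whether api_key was passed.
    return "genai" if api_key is not None else "google-cloud"
```

Under this reading, passing only service account credentials selects the Google Cloud backend, and `use_genai=False` keeps the Google Cloud backend even when an API key is supplied.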

GeminiTTSService Settings

Runtime-configurable settings passed via the settings constructor argument using GeminiTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | Model identifier. (Inherited.) |
| voice | str | None | Voice identifier. (Inherited.) |
| language | Language \| str | None | Language for synthesis. (Inherited.) |
| prompt | str | NOT_GIVEN | Style instructions for how to synthesize the content. |
| multi_speaker | bool | NOT_GIVEN | Enable multi-speaker support. |
| speaker_configs | list[dict] | NOT_GIVEN | Speaker configurations for multi-speaker mode. Each dict should have speaker_alias and optionally speaker_id. |
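
The shape of `speaker_configs` is easier to see with concrete data. The sketch below builds the two multi-speaker fields from a plain mapping. `build_multi_speaker_settings` is a hypothetical helper, the aliases "Host" and "Guest" are made up, and using a Gemini voice name such as "Kore" as `speaker_id` is an assumption.

```python
def build_multi_speaker_settings(speakers):
    """Build the multi-speaker fields for GeminiTTSService.Settings.

    `speakers` maps a speaker alias to an optional speaker id (or None).
    Hypothetical helper, not part of Pipecat.
    """
    if not speakers:
        raise ValueError("at least one speaker is required")
    configs = []
    for alias, speaker_id in speakers.items():
        # speaker_alias is always present; speaker_id only when given.
        config = {"speaker_alias": alias}
        if speaker_id is not None:
            config["speaker_id"] = speaker_id
        configs.append(config)
    return {"multi_speaker": True, "speaker_configs": configs}


# Assumed example values: aliases are arbitrary, "Kore" is a Gemini voice name.
settings_kwargs = build_multi_speaker_settings({"Host": "Kore", "Guest": None})
```

The resulting dict could be unpacked into `GeminiTTSService.Settings(**settings_kwargs)` when the Google Cloud backend is in use.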

Usage

Basic Setup (Streaming)

from pipecat.services.google import GoogleTTSService
from pipecat.transcriptions.language import Language

tts = GoogleTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GoogleTTSService.Settings(
        voice="en-US-Chirp3-HD-Charon",
        language=Language.EN_US,
    )
)

HTTP Service with SSML

from pipecat.services.google import GoogleHttpTTSService
from pipecat.transcriptions.language import Language

tts = GoogleHttpTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GoogleHttpTTSService.Settings(
        voice="en-US-Standard-A",
        language=Language.EN_US,
        rate="110%",
        pitch="+2st",
    ),
)

Gemini TTS with GenAI Backend (API Key)

import os

from pipecat.services.google import GeminiTTSService

tts = GeminiTTSService(
    api_key=os.environ["GOOGLE_API_KEY"],
    settings=GeminiTTSService.Settings(
        model="gemini-3.1-flash-tts-preview",
        voice="Puck",
    )
)

Gemini TTS with Google Cloud Backend (Style Prompt)

from pipecat.services.google import GeminiTTSService
from pipecat.transcriptions.language import Language

tts = GeminiTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GeminiTTSService.Settings(
        model="gemini-3.1-flash-tts-preview",
        voice="Kore",
        language=Language.EN_US,
        prompt="Say this in a friendly and helpful tone"
    )
)
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
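
For code that still passes the deprecated constructor arguments, the renames stated on this page (`voice_id` to `Settings(voice=...)` and `model` to `Settings(model=...)`) can be applied mechanically. The following is a rough sketch under that assumption. `split_legacy_kwargs` is a hypothetical helper, and it deliberately refuses to translate `params`, whose fields are not listed here.

```python
def split_legacy_kwargs(kwargs):
    """Split constructor kwargs into (constructor_kwargs, settings_kwargs).

    Hypothetical helper covering only the renames stated in this document:
    voice_id -> Settings(voice=...) and model -> Settings(model=...).
    The deprecated `params` object is rejected so it is migrated by hand.
    """
    if "params" in kwargs:
        raise ValueError("migrate params= to settings= manually")
    renames = {"voice_id": "voice", "model": "model"}
    constructor_kwargs = {}
    settings_kwargs = {}
    for key, value in kwargs.items():
        if key in renames:
            # Deprecated constructor argument: move it into Settings.
            settings_kwargs[renames[key]] = value
        else:
            constructor_kwargs[key] = value
    return constructor_kwargs, settings_kwargs
```

A caller would then write something like `GoogleTTSService(**constructor_kwargs, settings=GoogleTTSService.Settings(**settings_kwargs))`.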

Notes

  • Streaming vs HTTP: GoogleTTSService uses the streaming API for low latency and only supports Chirp 3 HD and Journey voices. GoogleHttpTTSService supports all Google voices including Standard and WaveNet, with full SSML support.
  • Chirp/Journey voices and SSML: Chirp and Journey voices do not support SSML. The HTTP service automatically uses plain text input for these voices.
  • Speaking rate: For Chirp and Journey voices, use speaking_rate (float, 0.25-2.0) in settings. For other voices, use rate (string) for SSML prosody control.
  • Gemini TTS sample rate: Google TTS always outputs at 24kHz. Setting a different sample rate will produce a warning and may cause audio issues.
  • Gemini TTS backends: GeminiTTSService supports two backends:
    • GenAI backend (google-genai): Simpler authentication with API key. Automatically selected when api_key is provided. Does not support prompt or multi_speaker settings.
    • Google Cloud backend: Uses service account credentials. Supports prompt for style instructions and multi_speaker for multi-voice conversations.
  • Backend selection: Pass api_key to use the GenAI backend, or credentials/credentials_path for Google Cloud. The GOOGLE_API_KEY environment variable alone does not switch backends; it is only used once the GenAI backend is active. Use use_genai=True to force the GenAI backend explicitly.
  • Gemini multi-speaker: Use multi_speaker=True with speaker_configs to generate conversations between multiple voices (Google Cloud backend only). Markup text with speaker aliases to control which voice speaks.
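
Several of the Gemini notes above describe combinations that will not work as intended. A pre-flight check along the following lines can surface them before a service is built. This is a sketch, not Pipecat behavior: the function name, the backend labels, and the rule that `speaker_configs` needs `multi_speaker=True` are assumptions drawn from the notes.

```python
GEMINI_OUTPUT_RATE_HZ = 24000  # output rate stated in the notes above


def check_gemini_config(backend, settings, sample_rate=None):
    """Return a list of human-readable problems; empty means none found.

    `backend` is "genai" or "google-cloud" (labels for this sketch only).
    `settings` is a plain dict of the Settings fields you plan to pass.
    """
    problems = []
    if backend == "genai":
        # The notes say the GenAI backend lacks prompt and multi_speaker.
        for field in ("prompt", "multi_speaker"):
            if settings.get(field):
                problems.append(f"{field} is not supported by the GenAI backend")
    if settings.get("speaker_configs") and not settings.get("multi_speaker"):
        problems.append("speaker_configs requires multi_speaker=True")
    if sample_rate is not None and sample_rate != GEMINI_OUTPUT_RATE_HZ:
        problems.append(f"sample_rate {sample_rate} differs from 24000 Hz output")
    return problems
```

An empty list means none of these documented pitfalls were detected; it does not guarantee that the service will accept the configuration.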