Voice Module - Doubao TTS Configuration Guide ✨
This module connects to the text-to-speech (TTS) synthesis service provided by Doubao!
First, copy the configuration template from template_configs/Doubao_tts_template.toml to configs/Doubao_tts.toml.
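The copy step can also be scripted. The following is a minimal sketch, not part of the module itself: the helper name `ensure_config` is hypothetical, and it assumes you do not want to overwrite a config file you have already edited.

```python
import shutil
from pathlib import Path


def ensure_config(template: Path, target: Path) -> bool:
    """Copy the template to the target path unless the target already exists.

    Returns True if a copy was made, False if the target was left untouched.
    (Hypothetical helper for illustration only.)
    """
    if target.exists():
        return False  # keep the user's edited config
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return True
```

Called as `ensure_config(Path("template_configs/Doubao_tts_template.toml"), Path("configs/Doubao_tts.toml"))` from the project root, it creates the config only on the first run.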
# Note: All items marked with [*] should be configured and interpreted by referring to the official documentation https://www.volcengine.com/docs/6561/1257584
[app] # Doubao TTS app configuration
base_url = "https://openspeech.bytedance.com/api/v1/tts" # API address
# The following parts are from official documentation, please refer to Q1 section of https://www.volcengine.com/docs/6561/196768
appid = ""
token = ""
cluster = ""
[audio] # Audio configuration
voice_type = "" # [*]
emotion = "" # Currently ineffective, please keep empty [*]
enable_emotion = false # Currently ineffective, please keep as false [*]
emotion_scale = 0 # Currently ineffective, please keep as 0 [*]
speed_ratio = 1.0 # Speech speed, range 0.8-2.0, default 1.0
explicit_language = "" # Explicit language [*]
context_language = "" # Reference language [*]
loudness_ratio = 1.0 # Volume, range 0.5-2.0, default 1.0 [*]
[request]
silence_duration = 0 # End-of-sentence silence duration, unit ms, default 0, maximum 300000 [*]

Configuration Item Details
All configuration content here is written based on the official documentation, so if you have any questions, remember to check the official documentation~
app
For how to obtain these values, please refer to the Q1 section of the official document at https://www.volcengine.com/docs/6561/196768.
- base_url: The API address of the Doubao TTS service. In most cases, keep the default.
- appid: The application ID (AppID) you applied for.
- token: The access token.
- cluster: The cluster information.
audio Audio Parameters
- voice_type: Voice setting! The Doubao TTS service provides many different voices. Check the official Doubao voice list, find the code of your favorite voice, and fill in that code here.
- emotion: Currently not used, must be left empty!
- enable_emotion: Currently not used, must be kept as false!
- emotion_scale: Currently not used, must be kept as 0!
- speed_ratio: Speech speed setting. The range is 0.8 to 2.0, and the default value is 1.0.
- explicit_language: Explicit language. This tells the large model which language to read in.
Bailing's Tip (2025/05/22 version)
According to the official documentation, the accepted values are as follows:
- Leave empty: Normal Chinese-English mixed
- crosslingual: Multilingual mixed (including zh/en/ja/es-mx/id/pt-br)
- zh: Mainly Chinese, supports Chinese-English mixed
- en: English only
- ja: Japanese only
- es-mx: Mexican Spanish only
- id: Indonesian only
- pt-br: Brazilian Portuguese only
- context_language: Reference language provided to the model.
Bailing's Tip (2025/05/22 version)
According to the official documentation, the accepted values are now as follows:
- Leave empty: Western European languages use English.
- id: Western European languages use Indonesian.
- es: Western European languages use Mexican Spanish.
- pt: Western European languages use Brazilian Portuguese.
- loudness_ratio: Volume adjustment. Any decimal between 0.5 and 2.0 is accepted, and the default is 1.0.
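The constraints listed for the audio section can be checked in one place before the config is used. The sketch below is only an illustration: `audio_problems` is a hypothetical helper, the allowed language values are copied from the 2025/05/22 tips in this guide (the official documentation may have changed since), and it assumes the numeric fields really are numbers.

```python
# Allowed values as listed in this guide's 2025/05/22 tips (may be outdated).
EXPLICIT_LANGUAGES = {"", "crosslingual", "zh", "en", "ja", "es-mx", "id", "pt-br"}
CONTEXT_LANGUAGES = {"", "id", "es", "pt"}


def audio_problems(audio: dict) -> list[str]:
    """Return human-readable problems found in the [audio] table (hypothetical helper)."""
    problems = []
    if not audio.get("voice_type"):
        problems.append("voice_type is empty")
    # The three emotion settings are currently ineffective and should stay at defaults.
    if audio.get("emotion", "") != "":
        problems.append("emotion must be left empty")
    if audio.get("enable_emotion", False) is not False:
        problems.append("enable_emotion must be false")
    if audio.get("emotion_scale", 0) != 0:
        problems.append("emotion_scale must be 0")
    if not 0.8 <= audio.get("speed_ratio", 1.0) <= 2.0:
        problems.append("speed_ratio must be between 0.8 and 2.0")
    if not 0.5 <= audio.get("loudness_ratio", 1.0) <= 2.0:
        problems.append("loudness_ratio must be between 0.5 and 2.0")
    if audio.get("explicit_language", "") not in EXPLICIT_LANGUAGES:
        problems.append("explicit_language is not a listed value")
    if audio.get("context_language", "") not in CONTEXT_LANGUAGES:
        problems.append("context_language is not a listed value")
    return problems
```

Passing in the `audio` table of the parsed config returns an empty list when everything matches the guide, and one message per violated rule otherwise.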
request
This content is categorized under request because that's how the official parameter list writes it~
- silence_duration: Sets how much additional silence to add after each sentence is spoken, in milliseconds. The default is 0, meaning no additional silence is added. The maximum is 300000 milliseconds.
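A range check for this value might look like the sketch below. The function name `check_silence_duration` is hypothetical, and treating the value as a whole number of milliseconds is an assumption; the guide only states the unit, the default, and the maximum.

```python
MAX_SILENCE_MS = 300_000  # maximum stated in this guide


def check_silence_duration(value) -> int:
    """Validate silence_duration and return it unchanged (hypothetical helper)."""
    # Assumption: the value is an integer number of milliseconds.
    # bool is rejected explicitly because it is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("silence_duration must be an integer (milliseconds)")
    if not 0 <= value <= MAX_SILENCE_MS:
        raise ValueError(f"silence_duration must be between 0 and {MAX_SILENCE_MS}")
    return value
```

For example, `check_silence_duration(500)` returns 500, while a value above 300000 raises a ValueError.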