TTS(1)

tts - text-to-speech converter using the ElevenLabs API, by jtgi

source: https://github.com/jtgi/tts

INSTALLATION

curl -fsSL https://raw.githubusercontent.com/jtgi/tts/main/install.sh | sh

EXAMPLES

Convert simple text to speech

echo "Hello world" | tts > hello.mp3

Process a large book with custom chunk size

tts -c 10000 < book.txt > book.mp3

Play audio directly (macOS only)

echo "Hello" | tts --play

Extract and convert chapter headings

grep "^Chapter" book.txt | tts -v EXAVITQu4vr4xnSDxMaL > chapters.mp3

SYNOPSIS

tts [OPTIONS] < input > output.mp3

OPTIONS:
  -k, --api-key KEY           API key
  -v, --voice ID              Voice ID
  -m, --model ID              Model ID
  -f, --format FORMAT         Output format
  -c, --chunk-size SIZE       Chunk size
  -d, --debug                 Debug mode
  -p, --play                  Play directly
  -l, --latency LEVEL         Latency optimization
  --stability FLOAT           Voice stability
  --similarity-boost FLOAT    Similarity boost
  --style FLOAT               Style intensity
  --speaker-boost BOOL        Speaker boost
  --seed INTEGER              Reproducible seed
  --previous-text TEXT        Previous context
  --next-text TEXT            Next context
  --previous-request-ids IDS  Previous request IDs
  --next-request-ids IDS      Next request IDs
  -h, --help                  Display help

DESCRIPTION

tts reads text from standard input, converts it to speech with the ElevenLabs API, and writes the audio to standard output. Input is processed in chunks (see --chunk-size), so large files can be converted without sending the whole text in a single request.

OPTIONS

-k, --api-key KEY
ElevenLabs API key. If not specified, uses the ELEVENLABS_API_KEY environment variable.
-v, --voice ID
Voice ID to use for speech synthesis. Default: 21m00Tcm4TlvDq8ikWAM (Rachel)
-m, --model ID
Model ID to use for speech synthesis. Default: eleven_monolingual_v1
-f, --format FORMAT
Output audio format. Default: mp3_44100_128
-c, --chunk-size SIZE
Number of characters to process per chunk. Useful for large files. Default: 5000
-d, --debug
Enable debug output to stderr. Shows API requests, responses, and detailed error information.
-p, --play
Play audio directly without saving (macOS only).
-l, --latency LEVEL
Optimize streaming latency (0-4, higher = lower latency). Default: 1
--stability FLOAT
Voice stability (0.0-1.0). Higher values make voice more consistent but less expressive. Default: 0.5
--similarity-boost FLOAT
Voice similarity boost (0.0-1.0). Enhances similarity to the original voice. Default: 0.75
--style FLOAT
Speaking style intensity (0.0-1.0). Controls how expressive the voice is. Default: 0.0
--speaker-boost BOOL
Enable speaker boost (true/false). Enhances voice clarity and distinction. Default: true
--seed INTEGER
Seed for reproducible generation. Reusing a seed with the same text and settings should yield the same audio, although the ElevenLabs API treats this as best effort rather than a guarantee.
--previous-text TEXT
Previous text context for better pronunciation and intonation.
--next-text TEXT
Next text context for better pronunciation and intonation.
--previous-request-ids IDS
Comma-separated previous request IDs for context continuity.
--next-request-ids IDS
Comma-separated next request IDs for context continuity.
-h, --help
Display help message and exit.

ENVIRONMENT

ELEVENLABS_API_KEY              Default API key for ElevenLabs service
TTS_VOICE_ID                    Default voice ID
TTS_MODEL_ID                    Default model ID
TTS_OUTPUT_FORMAT               Default output format
TTS_DEBUG                       Set to 1 to enable debug mode by default
TTS_CHUNK_SIZE                  Default chunk size for processing
TTS_OPTIMIZE_STREAMING_LATENCY  Default latency optimization level (0-4)
TTS_STABILITY                   Default voice stability (0.0-1.0)
TTS_SIMILARITY_BOOST            Default similarity boost (0.0-1.0)
TTS_STYLE                       Default speaking style intensity (0.0-1.0)
TTS_USE_SPEAKER_BOOST           Default speaker boost setting (true/false)
TTS_SEED                        Default seed for reproducible generation
TTS_PREVIOUS_TEXT               Default previous text context
TTS_NEXT_TEXT                   Default next text context
TTS_PREVIOUS_REQUEST_IDS        Default previous request IDs
TTS_NEXT_REQUEST_IDS            Default next request IDs
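
As a sketch of how these variables might be combined, the snippet below sets defaults for one shell session. The values come from the VOICES and FORMATS sections of this page; the API key is a placeholder. It assumes that, as documented for --api-key, an explicit flag takes precedence over the matching variable. That precedence is not stated here for the other options.

```shell
# Session-wide defaults. The API key below is a placeholder, not a real key.
export ELEVENLABS_API_KEY="your-api-key-here"
export TTS_VOICE_ID="EXAVITQu4vr4xnSDxMaL"   # Bella
export TTS_OUTPUT_FORMAT="mp3_44100_192"
export TTS_CHUNK_SIZE=8000

# With these set, a plain invocation would use them:
#   tts < notes.txt > notes.mp3
# Assumption: a flag overrides the variable for a single run:
#   tts -v TxGEqnHWrfWFTfGW9XjX < notes.txt > notes.mp3
```

Placing the export lines in a shell profile makes the defaults persistent; keeping the API key out of shared or version-controlled files is advisable.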

VOICES

Common voice IDs include:

21m00Tcm4TlvDq8ikWAM Rachel (default)
EXAVITQu4vr4xnSDxMaL Bella
ErXwobaYiN019PkySvjV Antoni
MF3mGyEYCl7XYWbV9V6O Elli
TxGEqnHWrfWFTfGW9XjX Josh

FORMATS

mp3_44100_128  MP3 44.1kHz, 128kbps (default)
mp3_44100_192  MP3 44.1kHz, 192kbps
pcm_16000      PCM 16kHz
pcm_22050      PCM 22.05kHz
pcm_24000      PCM 24kHz
pcm_44100      PCM 44.1kHz
ulaw_8000      μ-law 8kHz
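
Most players will not open the pcm_* output directly. The sketch below wraps it in a WAV container with ffmpeg(1). It assumes the PCM stream is raw, headerless, signed 16-bit little-endian mono audio; this page does not say so, so check the ElevenLabs API reference before relying on it. The helper names pcm_rate and pcm_to_wav are hypothetical and not part of tts.

```shell
# pcm_rate FORMAT: print the sample rate encoded in a format name,
# e.g. pcm_16000 -> 16000. Fails for non-PCM format names.
pcm_rate() {
    case $1 in
        pcm_*) printf '%s\n' "${1#pcm_}" ;;
        *)     echo "not a PCM format: $1" >&2; return 1 ;;
    esac
}

# pcm_to_wav FORMAT INPUT OUTPUT: wrap raw PCM samples in a WAV container.
# Assumes signed 16-bit little-endian, single-channel input.
pcm_to_wav() {
    rate=$(pcm_rate "$1") || return 1
    ffmpeg -loglevel error -f s16le -ar "$rate" -ac 1 -i "$2" "$3"
}
```

A possible use: tts -f pcm_16000 < in.txt > out.pcm, followed by pcm_to_wav pcm_16000 out.pcm out.wav.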

EXIT STATUS

0
Success
1
Error occurred (API failure, missing input, authentication error, etc.)
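
Because audio is written to standard output, a failed run can leave a partial or empty file behind. The sketch below uses the exit status to keep the output only on success. The function name convert and the ".part" suffix are illustrative choices, not part of tts.

```shell
# convert INPUT OUTPUT: run tts and keep OUTPUT only if tts exits with 0.
# Audio goes to a temporary ".part" file first, so a failed run never
# leaves a truncated file under the final name.
convert() {
    if tts < "$1" > "$2.part"; then
        mv "$2.part" "$2"
    else
        status=$?
        rm -f "$2.part"
        echo "tts failed for $1 (exit status $status)" >&2
        return "$status"
    fi
}
```

For example, convert chapter1.txt chapter1.mp3 leaves chapter1.mp3 untouched if the API call fails.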

NOTES

Requires curl(1) to be installed.

The tool processes text in chunks to handle large files efficiently, making it suitable for converting entire books or long documents.
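
For a rough idea of how many chunks a file will produce, divide its character count by the chunk size and round up. This is only an estimate: it assumes every chunk is filled to the limit, and this page does not say whether tts splits earlier (for example at sentence boundaries). The helper chunk_count is hypothetical.

```shell
# chunk_count CHARS SIZE: ceiling of CHARS / SIZE using integer arithmetic.
chunk_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# A 12,001-character text at the default chunk size of 5000:
chunk_count 12001 5000    # prints 3
```

With a real file, the character count can come from wc(1): chunk_count "$(wc -m < book.txt)" 5000.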

API errors are reported with HTTP status codes and helpful hints for common issues like authentication failures or rate limiting.
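
Transient failures such as rate limiting can often be handled by retrying after a pause. The wrapper below is a generic sketch, not a feature of tts: the function name retry and the RETRY_DELAY variable are hypothetical. Since every error exits with status 1, the wrapper cannot tell a rate limit from a bad API key and retries both.

```shell
# retry MAX COMMAND...: run COMMAND up to MAX times, doubling the pause
# between attempts. RETRY_DELAY sets the first pause in seconds (default 1).
retry() {
    max=$1
    shift
    attempt=1
    delay=${RETRY_DELAY:-1}
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts (exit status $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}
```

Because tts reads standard input, the retried command has to reopen its input on each attempt, for example: retry 3 sh -c 'tts < book.txt > book.mp3'.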

SEE ALSO

curl(1), ffmpeg(1)

open source - made by jtgi