Linux CUDA first, WSL2 and cloud GPU as fallback routes
Start with the official Linux x86_64 and NVIDIA CUDA path, then use the command generator to choose the least painful setup for your machine.
Command generator
Pick Linux CUDA for the cleanest route. Use WSL2 only if you already know how to debug NVIDIA passthrough. Use a cloud GPU when getting to multilingual ZONOS2 TTS, Japanese voice cloning, and Mandarin Chinese speech tests quickly matters more than having a local setup.
System requirements
Local inference is aimed at Linux x86_64 with NVIDIA CUDA. Use this quick checker to choose local, WSL2, or cloud GPU.
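The choice between local, WSL2, and cloud GPU can be roughed out in a few lines of shell. This is a minimal sketch, not an official checker: the function name `recommend_route` and its decision rules are assumptions drawn from the guidance on this page (Linux x86_64 plus a working NVIDIA driver means local, the same under WSL means WSL2, anything else means cloud).

```shell
# Hypothetical helper: suggest a setup route from basic facts about the machine.
# Arguments: kernel name (uname -s), machine arch (uname -m),
#            "yes"/"no" for a usable NVIDIA GPU, "yes"/"no" for running under WSL.
recommend_route() {
  kernel="$1"; arch="$2"; has_gpu="$3"; is_wsl="$4"
  if [ "$kernel" != "Linux" ] || [ "$arch" != "x86_64" ] || [ "$has_gpu" != "yes" ]; then
    echo "cloud"
  elif [ "$is_wsl" = "yes" ]; then
    echo "wsl2"
  else
    echo "local"
  fi
}

# nvidia-smi exits non-zero when no driver or GPU is usable.
has_gpu="no"
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  has_gpu="yes"
fi

# WSL kernels identify themselves with "microsoft" in /proc/version.
is_wsl="no"
if grep -qi microsoft /proc/version 2>/dev/null; then
  is_wsl="yes"
fi

echo "Suggested route: $(recommend_route "$(uname -s)" "$(uname -m)" "$has_gpu" "$is_wsl")"
```

Keeping the decision in a function that takes plain strings makes it easy to try out combinations by hand, for example `recommend_route Linux x86_64 yes yes`.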
Copyable setup
The shortest official path is Linux plus NVIDIA CUDA. Windows users should consider WSL2 only if they are comfortable debugging GPU passthrough.
git clone https://github.com/Zyphra/ZONOS2.git
cd ZONOS2
uv sync
uv run python -m minisgl --model-path Zyphra/ZONOS2 --tts-default-voices-dir ./default_voices/

curl -X POST http://localhost:1919/tts/generate \
-H "Content-Type: application/json" \
-d '{"text":"Hello from ZONOS2","stream":true}' \
--output output.pcm
ffmpeg -f f32le -ar 44100 -ac 1 -i output.pcm output.wav

wsl --install
# Install an NVIDIA driver with WSL CUDA support on Windows.
# Inside Ubuntu on WSL2:
nvidia-smi
uv --version
# Then follow the Linux CUDA commands.

Multilingual ZONOS2 workflow
This install page covers ZONOS2 multilingual TTS and voice cloning, including Japanese voice cloning, Mandarin Chinese speech, and English narration. Keep tests short, compare language output side by side, and verify consent before using any cloned voice.
Use ZONOS2 TTS for product explainers, YouTube voiceovers, podcasts, API demos, and developer documentation where clear English pacing matters.
Use ZONOS2 Japanese TTS for anime-style dialogue tests, game character lines, VTuber scripts, localization drafts, and language-learning examples.
Use ZONOS2 Mandarin Chinese speech for bilingual demos, creator narration, app onboarding, education content, and Chinese voice cloning experiments.
Store language, reference voice, prompt text, consent status, and output settings together so every ZONOS2 multilingual generation is traceable.
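One way to keep those fields together is a small JSON sidecar written next to each output file. The sketch below is an assumption, not part of ZONOS2: the function name `write_manifest`, the field names, and the example reference voice path are all hypothetical, and the output settings string simply repeats the format used in the ffmpeg conversion step.

```shell
# Hypothetical helper: write a JSON sidecar describing one generation.
# Assumes the values contain no double quotes or backslashes,
# because no JSON escaping is performed here.
write_manifest() {
  out_file="$1"
  printf '{\n  "language": "%s",\n  "reference_voice": "%s",\n  "prompt_text": "%s",\n  "consent_status": "%s",\n  "output_settings": "%s"\n}\n' \
    "$2" "$3" "$4" "$5" "$6" > "$out_file"
}

# Example call; the reference voice path is a placeholder.
write_manifest output.manifest.json \
  "en" "default_voices/example.wav" "Hello from ZONOS2" "granted" "f32le, 44100 Hz, mono"
cat output.manifest.json
```

If prompts may contain quotes or non-ASCII text, a JSON-aware tool is a safer way to build the file than `printf`.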
FAQ
Use Linux x86_64 with an NVIDIA GPU and matching CUDA toolkit when possible.
Windows users should treat WSL2 as an advanced route. Cloud GPU is often faster to debug.
The documented local server starts on localhost port 1919 by default.
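Because the model can take a while to load, it may help to wait until the port answers before sending a request. The following is a minimal sketch under stated assumptions: `wait_for_server` is a hypothetical helper, and it only checks that something responds over HTTP at the URL, not that the TTS endpoint itself is ready.

```shell
# Hypothetical helper: poll until something answers HTTP at the given URL.
# Returns 0 once curl receives any HTTP response, 1 after the attempts run out.
wait_for_server() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Without --fail, curl exits 0 on any HTTP status,
    # so even a 404 shows that the port is open.
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}

# Usage (not run here):
#   wait_for_server http://localhost:1919/ && echo "server is up"
```

The second argument caps the number of attempts, so a server that never starts produces a clear failure instead of an endless wait.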