Windows Spark-TTS Installation Guide

Spark-TTS is a text-to-speech system that uses large language models (LLMs) to produce accurate, natural-sounding speech. It is designed to be efficient and flexible enough for both research and production use.

Spark-TTS Installation (Windows Guide)

1. Install Conda (if you haven’t already)

  • Download Miniconda and install it.
  • During installation, check the option that adds Miniconda to your PATH so that the conda command works from Command Prompt.

Download Spark-TTS

You have two options to get the files:

Option 1 (Recommended for Windows): Download ZIP manually

  • Go to the Spark-TTS GitHub page: https://github.com/SparkAudio/Spark-TTS
  • Click “Code” > “Download ZIP”, then extract it.

Option 2: Use Git (Optional)

If you prefer using Git, install Git and run:

git clone https://github.com/SparkAudio/Spark-TTS.git

2. Create a Conda Environment

Open Command Prompt (cmd) and run:

conda create -n sparktts python=3.12 -y
conda activate sparktts

This creates and activates a Python 3.12 environment for Spark-TTS.
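
To confirm that the new environment is actually the one in use, a small check like the following can help. It is only a sketch: the function name describe_interpreter is made up for this guide, and it relies on the CONDA_DEFAULT_ENV variable that conda activate normally sets.

```python
import os
import sys


def describe_interpreter(expected=(3, 12)):
    """Return a short report on the running Python interpreter."""
    major_minor = sys.version_info[:2]
    # CONDA_DEFAULT_ENV is set by `conda activate`; it is absent outside Conda.
    env_name = os.environ.get("CONDA_DEFAULT_ENV", "(no Conda environment active)")
    return {
        "python": f"{major_minor[0]}.{major_minor[1]}",
        "matches_expected": major_minor == tuple(expected),
        "conda_env": env_name,
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If conda_env does not show sparktts, or the Python version is not 3.12, run conda activate sparktts again before continuing.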

3. Install Dependencies

Inside the Spark-TTS folder (whether from ZIP or Git), run:

pip install -r requirements.txt
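
If pip reported errors, it can be useful to see which packages from requirements.txt actually ended up installed. The script below is a rough sketch, not part of Spark-TTS: the helper names are hypothetical, and the parser only handles plain "name==version" style lines, so unusual entries such as direct URLs may be misread.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract distribution names from requirements.txt content."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        # Skip blanks and pip options such as "-r other.txt" or "--index-url ...".
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def installed_versions(names):
    """Map each name to its installed version, or None if it is missing."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        names = requirement_names(req_file.read_text(encoding="utf-8"))
        for name, version in installed_versions(names).items():
            print(f"{name}: {version or 'MISSING'}")
    else:
        print("requirements.txt not found; run this from the Spark-TTS folder.")
```

Any package listed as MISSING can be installed again with pip, or the whole requirements file can be re-run.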

4. Install PyTorch (CUDA or CPU)

pip does not detect your GPU, so choose the package index that matches your hardware and run one of the following:

# NVIDIA GPU with CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# NVIDIA GPU with CUDA 11.8 (older GPUs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# CPU only
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
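
After installing PyTorch, a quick check can show which build you ended up with and whether it can see your GPU. This is a minimal sketch; torch_report is a name invented here, and the script returns a message instead of crashing when PyTorch is missing.

```python
import importlib.util


def torch_report():
    """Describe the installed PyTorch build without failing if it is absent."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch

    lines = [f"PyTorch {torch.__version__}"]
    if torch.cuda.is_available():
        lines.append(f"CUDA build {torch.version.cuda}, GPU: {torch.cuda.get_device_name(0)}")
    else:
        lines.append("CUDA not available; inference will run on the CPU.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(torch_report())
```

If you have an NVIDIA GPU but the report says CUDA is not available, you most likely installed a CPU-only build and should reinstall using one of the CUDA index URLs.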

5. Download the Model

There are two ways to get the model files. Pick one:

Option 1 (Recommended): Using Python
Create a new file in the Spark-TTS folder called download_model.py, paste this inside, and run it:

from huggingface_hub import snapshot_download
import os

# Set download path
model_dir = "pretrained_models/Spark-TTS-0.5B"

# Check if model already exists
if os.path.exists(model_dir) and len(os.listdir(model_dir)) > 0:
    print("Model files already exist. Skipping download.")
else:
    print("Downloading model files...")
    # Recent huggingface_hub versions resume partial downloads on their own;
    # the resume_download argument is deprecated there, so it is omitted.
    snapshot_download(
        repo_id="SparkAudio/Spark-TTS-0.5B",
        local_dir=model_dir,
    )
    print("Download complete!")

Run it with:

python download_model.py

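Option 2: Using Git (If You Installed It)

Hugging Face stores large model files with Git LFS, so make sure Git LFS is installed and enable it before cloning:

mkdir pretrained_models
git lfs install
git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B

Either method works; choose whichever is easier for you.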
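
Whichever option you used, it is worth checking that the model folder really contains the files before launching anything. The sketch below lists the largest files under pretrained_models/Spark-TTS-0.5B; the function name summarize_model_dir is hypothetical, and hidden folders such as .git or .cache are skipped on the assumption that they only hold download metadata.

```python
from pathlib import Path


def summarize_model_dir(model_dir):
    """Return (relative_path, size_in_bytes) for every file, largest first."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    files = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # Skip files inside hidden folders (for example .git or .cache).
        if any(part.startswith(".") for part in parts[:-1]):
            continue
        files.append(("/".join(parts), path.stat().st_size))
    return sorted(files, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    entries = summarize_model_dir("pretrained_models/Spark-TTS-0.5B")
    if not entries:
        print("No model files found.")
    for name, size in entries[:10]:
        print(f"{size / 1_000_000:10.1f} MB  {name}")
```

If the folder is empty, or every file is only a few hundred bytes, the download did not complete. With the Git option this usually means Git LFS was not active and only pointer files were cloned.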

6. Run Spark-TTS

Web UI (Recommended)

For an interactive browser-based interface, run:

python webui.py

This launches a local web server where you can enter text and generate speech or clone a voice.

7. Troubleshooting & Common Questions

🔎 Before Asking for Help
Many common issues are already covered in existing discussions, documentation, or online resources. Please:

  • Search GitHub issues first 🕵️‍♂️
  • Check the documentation 📖
  • Google or use AI tools (ChatGPT, DeepSeek, etc.)

If you still need help, please explain what you’ve already tried so we can assist you better!
