A minimal, educational text-to-speech (TTS) system developed for the
Speech Synthesis and Voice Cloning course during the Independent Study Period 2025 (ISP'25) at Skoltech.
The model components and a training example are provided in the following demonstration notebooks:
- `inference.ipynb`: a demo of TTS inference using the pre-trained models
- `training.ipynb`: code for fine-tuning the pre-trained model on custom data
The model architecture takes inspiration from FastPitch and Matcha-TTS and introduces a few modifications and simplifications. Its modules are:
- Transformer-based `TextEncoder` with ALiBi embeddings
- `Aligner` between text and mel spectrograms with CUDA-supported Monotonic Alignment Search
- Flow Matching and Transformer-based `TemporalAdaptor` for modeling the distribution of token duration, pitch, and energy
- Transformer-based `MelDecoder` with ALiBi embeddings
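
Both the text encoder and the mel decoder rely on ALiBi (Attention with Linear Biases), which replaces learned positional embeddings with a head-specific linear penalty on the attention logits. Below is a minimal sketch of the standard ALiBi bias in PyTorch, using the symmetric distance variant common for non-causal encoders; it is an illustration of the technique, not this repository's exact code:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Standard ALiBi bias: a per-head linear penalty on attention logits.

    Returns a (num_heads, seq_len, seq_len) tensor to be added to the
    pre-softmax attention scores; no positional embeddings are required.
    """
    # Head slopes form a geometric sequence 2^(-8/num_heads), ..., 2^(-8),
    # following the ALiBi paper (Press et al., 2021).
    slopes = torch.tensor(
        [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]
    )
    # Absolute distance |i - j| between query position i and key position j
    # (the symmetric form suitable for bidirectional attention).
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).abs().float()  # (seq_len, seq_len)
    # Broadcast: (heads, 1, 1) * (1, T, T) -> (heads, T, T).
    return -slopes[:, None, None] * distance[None, :, :]

# Hypothetical usage inside an attention layer:
# scores = q @ k.transpose(-2, -1) / d_head**0.5 + alibi_bias(n_heads, T)
```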
The dataset for training the models should have the following structure:
```
DATASET_ROOT
    wavs
        audio_1.wav
        audio_2.wav
        ...
        audio_N.wav
    meta.csv
```
The metadata file should have the following structure:
```
wavs/audio_1.wav|This is the sample text.
wavs/audio_2.wav|The second audio св+язано с +этим т+екстом.
...
wavs/audio_N.wav|нижний текст.
```
In other words, each line of the metadata file contains the path to an audio file (relative to the dataset root) and its transcript, separated by "|".
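
For reference, here is a minimal sketch of reading such a metadata file; the function name and return type are illustrative, not the repository's actual loader:

```python
from pathlib import Path

def read_metadata(dataset_root: str) -> list[tuple[Path, str]]:
    """Parse meta.csv into (absolute audio path, transcript) pairs."""
    root = Path(dataset_root)
    pairs = []
    with open(root / "meta.csv", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split only on the first "|" so transcripts may contain the character.
            rel_path, text = line.split("|", maxsplit=1)
            audio_path = root / rel_path
            # Sanity check: every listed audio file should exist on disk.
            assert audio_path.is_file(), f"missing audio: {audio_path}"
            pairs.append((audio_path, text))
    return pairs
```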
Prepared for academic and non-commercial use.
Inspired by open-source projects and educational resources in speech synthesis research.