🎤 F5-TTS: Vietnamese Text-to-Speech Synthesis

The model was trained for 500,000 steps on approximately 150 hours of data using a single RTX 3090 GPU.

Enter text and upload a reference voice sample to generate natural-sounding speech.