abus-aikorea/voice-pro

9.1k

+409/day

1.2k

Python

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

From the README

Voice-Pro

The best AI speech recognition, translation, and multilingual dubbing solution 🚀

🎙️ An AI-powered web application for speech recognition, translation, and dubbing

Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.

🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
📢 Multilingual text-to-speech: Edge-TTS, kokoro (Paid version includes Azure TTS)
🎥 YouTube processing & audio extraction: yt-dlp
🌍 Instant translation for 100+ languages: Deep-Translator (Paid version includes Azure Translator)
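The features above describe a chain: download, vocal isolation, transcription, translation, and speech synthesis. As a rough illustration of how such stages might be composed, here is a minimal sketch. It is not Voice-Pro's actual code: the `DubbingPipeline` class, the job dictionary keys, and all stage functions are hypothetical, and the stages are stubs that only mark where yt-dlp, Demucs, Whisper, Deep-Translator, and a TTS engine would be called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signature: each stage takes the job dict and returns an updated copy.
Stage = Callable[[Dict[str, str]], Dict[str, str]]


@dataclass
class DubbingPipeline:
    """Runs stages in order, handing each one the output of the previous stage."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, job: Dict[str, str]) -> Dict[str, str]:
        for stage in self.stages:
            job = stage(dict(job))  # pass a copy so the caller's dict is never mutated
        return job


# Stub stages (assumptions for illustration). A real stage would call the named backend.
def download(job: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for a yt-dlp download and audio extraction step.
    return {**job, "audio": f"{job['url']}.wav"}


def isolate_vocals(job: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for Demucs vocal isolation.
    return {**job, "vocals": job["audio"].replace(".wav", "_vocals.wav")}


def transcribe(job: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for a Whisper transcription; the text here is fixed dummy output.
    return {**job, "text": "hello world"}


def translate(job: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for a Deep-Translator call; it only tags the text with the target language.
    return {**job, "translated": f"[{job['target_lang']}] {job['text']}"}


def synthesize(job: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for Edge-TTS or a voice-cloning model writing the dubbed audio file.
    suffix = f"_{job['target_lang']}.wav"
    return {**job, "dubbed": job["vocals"].replace("_vocals.wav", suffix)}


pipeline = DubbingPipeline([download, isolate_vocals, transcribe, translate, synthesize])
result = pipeline.run({"url": "clip", "target_lang": "ko"})
print(result["dubbed"])
```

Keeping each stage behind the same signature is one way to swap backends, for example Faster-Whisper for Whisper, without touching the rest of the chain.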

A ro
