Prof. David Harel Group, Department of Computer Science & Applied Mathematics

Whisper Emotional Speech Synthesis (WESS)

WESS architecture: T2S (text-to-semantic) → S2A (semantic-to-acoustic) → Vocoder, with an emotion/dominance prefix and a speaker embedding

Interactive Audio Demo

WS (before fine-tuning)

WESS (ours)

GPT-4o mini TTS (SOTA)

Emotion-ID accuracy by emotion (95% CI, FDR-adjusted p-values)
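
The caption above mentions 95% confidence intervals and FDR-adjusted p-values but does not say how they were computed. As a minimal sketch of one common way to get such numbers, the Python below assumes a Wilson score interval for each per-emotion accuracy and the Benjamini-Hochberg procedure for the FDR adjustment. Both choices, and the function names `wilson_interval` and `benjamini_hochberg`, are assumptions made for illustration and are not taken from the page.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 gives an approximate 95% interval. Using the Wilson interval
    here is an assumption; the page does not name the CI method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Benjamini-Hochberg is assumed; the page only says "FDR-adjusted".
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, `wilson_interval(correct, total)` would be called once per emotion, and `benjamini_hochberg` would be applied to the list of raw p-values from the per-emotion comparisons, so that each reported p-value accounts for the number of emotions tested.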