Fricatives in modern Text-to-Speech synthesizers
[Conference] 13th Speech Synthesis Workshop (SSW 2025), Leeuwarden, Netherlands, 2025
Authors:
Sriyugesh Bhyravajulla, Ayushi Pandey, Arun Baby
Abstract:
This study examines how neural TTS systems model voiceless fricatives. Previous work identified acoustic-phonetic deviations in the synthesized speech of a single speaker; this study extends the analysis to the LJSpeech dataset and compares multiple architectures. Flow-based models such as GradTTS and GlowTTS show improved fricative modeling, while the end-to-end VITS architecture shows persistent deviations across acoustic-phonetic features.
Cite:
@inproceedings{bhyravajulla25_ssw,
  title = {{Fricatives in modern Text-to-Speech synthesizers}},
  author = {Sriyugesh Bhyravajulla and Ayushi Pandey and Arun Baby},
  year = {2025},
  booktitle = {{Proc. 13th edition of the Speech Synthesis Workshop}},
  pages = {222--227},
  doi = {10.21437/SSW.2025-34}
}