[Conference] 13th Speech Synthesis Workshop (SSW 2025), Leeuwarden, Netherlands, 2025

Authors:

Sriyugesh Bhyravajulla, Ayushi Pandey, Arun Baby

Abstract:

This study examines how neural text-to-speech (TTS) systems model voiceless fricatives. Previous work identified acoustic-phonetic deviations for a single speaker; this study extends the analysis to the LJSpeech dataset and compares multiple architectures. Flow-based models such as GradTTS and GlowTTS show improved fricative modeling, while the end-to-end VITS architecture shows persistent deviations across acoustic-phonetic features.

Cite:

@inproceedings{bhyravajulla25_ssw,
  title = {{Fricatives in modern Text-to-Speech synthesizers}},
  author = {Sriyugesh Bhyravajulla and Ayushi Pandey and Arun Baby},
  year = {2025},
  booktitle = {{Proc. 13th edition of the Speech Synthesis Workshop}},
  pages = {222--227},
  doi = {10.21437/SSW.2025-34}
}
