Anonymous EMNLP Submission
Recent advances in speech synthesis have significantly improved the audio quality and pronunciation of synthesized speech. To move further toward human-like conversational speech synthesis, this paper presents FillerSpeech, a novel speech synthesis framework that enables natural filler insertion and control over filler style. To this end, we construct a filler-inclusive speech dataset derived from an open-source large-scale speech corpus. The dataset annotates each filler with pitch and duration information. For the generation and style control of natural fillers, we propose a method that tokenizes filler style and applies cross-attention between the style tokens and the input text. Furthermore, we introduce a large language model-based filler prediction method that inserts fillers naturally even when only text input is provided. The experimental results demonstrate that the constructed dataset is valid and that our proposed methods for filler style control and filler prediction are effective. Our code and demo are available at https://fillerspeech.github.io/main.
Official implementation of FillerSpeech: https://github.com/FillerSpeech/main
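As a rough illustration of what "tokenizing filler style" could look like, the sketch below extracts the `<Filler>` tags used in the sample texts on this page and maps a filler's pitch and duration to one of three levels each, matching the High/Medium/Low and Long/Medium/Short labels in the style-control tables. This is not the authors' implementation: the function names, the token string format, and the numeric cut-offs are all hypothetical and chosen only to make the idea concrete.

```python
import re

# Level names mirror the labels in the style-control tables on this page.
PITCH_LEVELS = ("low", "medium", "high")
DURATION_LEVELS = ("short", "medium", "long")

# Matches inline filler tags such as <Well> or <Huh> in the sample texts.
FILLER_TAG = re.compile(r"<(\w+)>")


def find_fillers(text):
    """Return the filler words tagged in a sentence, in order of appearance."""
    return FILLER_TAG.findall(text)


def bucket(value, low_cut, high_cut, labels):
    """Map a scalar to one of three labels using two cut-offs."""
    if value < low_cut:
        return labels[0]
    if value < high_cut:
        return labels[1]
    return labels[2]


def filler_style_token(filler, mean_f0_hz, duration_s,
                       f0_cuts=(140.0, 200.0), dur_cuts=(0.25, 0.50)):
    """Build a style token string for one filler.

    The cut-offs are placeholder assumptions, not values from the paper.
    """
    pitch = bucket(mean_f0_hz, f0_cuts[0], f0_cuts[1], PITCH_LEVELS)
    dur = bucket(duration_s, dur_cuts[0], dur_cuts[1], DURATION_LEVELS)
    return f"<{filler}|pitch={pitch}|dur={dur}>"


if __name__ == "__main__":
    sentence = '"<Well>, what are you having to drink?" "<Huh>? Look, Pal, this is on me."'
    print(find_fillers(sentence))
    print(filler_style_token("Well", mean_f0_hz=220.0, duration_s=0.6))
```

In a real system the cut-offs would more plausibly be derived from the dataset's own pitch and duration statistics, and the resulting tokens would be embedded before being attended to by the text encoder.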
Text: "<Well>, what are you having to drink?" "Beer," said Matheny without hesitation. "<Huh>? Look, Pal, this is on me." | |||
---|---|---|---|
GT |
Vocoded |
Matcha-TTS |
FillerSpeech w/o PP and CA |
FillerSpeech w/o CA |
FillerSpeech |
Text: He rose and stood swaying, showing a stricken face. "<Eh>?" Ducroy insisted with an accent of exasperation. | |||
---|---|---|---|
GT |
Vocoded |
Matcha-TTS |
FillerSpeech w/o PP and CA |
FillerSpeech w/o CA |
FillerSpeech |
Text: "<Ha>!" There was a sensation without. I could see that Emmeline recoiled from the side of her companion. | |||
---|---|---|---|
GT |
Vocoded |
Matcha-TTS |
FillerSpeech w/o PP and CA |
FillerSpeech w/o CA |
FillerSpeech |
Text: "There are no bad shots on Mars. Survival of the fittest, you know." Doran wet his lips. "<Uh>, no hard feelings. No, none at all." | |||
---|---|---|---|
GT |
Vocoded |
Matcha-TTS |
FillerSpeech w/o PP and CA |
FillerSpeech w/o CA |
FillerSpeech |
Text: "But what?" He asked, moodily. "What were you going to say?" Her eyes closed with pain. "<Eh>?" He said. | |||
---|---|---|---|
GT |
Vocoded |
Matcha-TTS |
FillerSpeech w/o PP and CA |
FillerSpeech w/o CA |
FillerSpeech |
Text: <Ah>, the license might be easy to obtain but how about his forgiveness? That must be obtained first. | |||
---|---|---|---|
Duration \ Pitch | High | Medium | Low |
Long | | | |
Medium | | | |
Short | | | |
Text: "<Well>, what are you having to drink?" "Beer," said Matheny without hesitation. "<Huh>? Look, Pal, this is on me." | |||
---|---|---|---|
Duration \ Pitch | High | Medium | Low |
Long | | | |
Medium | | | |
Short | | | |
Text: "<Well>, we shall give it to you," said Mother Fisher. Then she went over to the bed and dropped a kiss on Polly's brown hair. | |||
---|---|---|---|
Duration \ Pitch | High | Medium | Low |
Long | | | |
Medium | | | |
Short | | | |