Text To Speech
The Text to Speech (TTS) tab allows you to configure and use either Local TTS or NovelAI's own Streamed TTS technology to voice over your written text.
Text to Speech Source
The first thing you need to do is choose whether to use Streamed or Local TTS:
Streamed: Uses NovelAI's own remote TTS service to generate voice lines. This option offers higher quality and more customization, but it requires an active subscription (free trials include 100 voice line generations).
Local: Uses your browser's built-in speech synthesis. Quality will vary depending on which local TTS voices you have available, and NovelAI's TTS model features will be unavailable.
A note about Streamed TTS: Some browsers lack a feature that lets streamed audio start playing as soon as possible, so depending on your browser, voice lines may start playing slightly later. If this applies to your browser, a warning is shown when you select Streamed TTS.
Automatic Speech Options
Next, you can select when NovelAI will automatically activate TTS to read specific text.
Speak Outputs: The AI always voices over the text it generates when you hit Send.
Speak Inputs: The AI always voices over the new text you write as input (blue text on the default theme), reading it before its next output when you hit Send.
Speak HypeBot Comments: The AI always voices over each HypeBot comment once it has finished generating it.
If you have multiple automatic speech options selected, TTS reads your input first, then the AI's output, and then the HypeBot comment, seamlessly in one voice-over file.
You can also have TTS read any text in your story on demand by using the right-click menu in the Editor.
Click the option to instantly generate a voice line with your default settings, or use the arrow to pick any other voice. Clicking the download icon saves the voice line as a file instead of playing it.
Streamed TTS Settings
Model
When using Streamed TTS, you have the option to use TTS v1 or TTS v2.
TTS v1 is a bit older and has fewer features than TTS v2, but it is simpler to use.
If you want more control over how the voices sound, v2 is recommended.
For both Streamed TTS models, the play button instantly plays your selected voice, while the download button saves a sound file containing the voice line.
v1 Settings
For v1, you can select one of the default voices or enter a custom seed. Use the field on the right to type any text you want to test the voices with.
By selecting the last option in the voice dropdown, 'Custom Seed', you can type any string of text as a seed to get a voice other than the default ones. Clicking the Randomize button fills in a random seed.
Using common first names as seeds tends to reliably influence the TTS AI's pitch and intonation. For example, the seed Maria results in a feminine-sounding voice.
Finally, you can change the Volume and Speed at which the voice lines are read using their respective sliders.
Keep in mind that Volume already defaults to the maximum, and the Speed setting does not affect downloaded sound files.
v2 Settings
As for v2, there are many more options and features to explore, as it is a much more recent and robust TTS AI.
The first feature that sets it apart from v1 is that, besides offering a larger and more varied library of default voices, it lets you save voices you "craft" from seeds as "custom defaults", which you can easily select and edit later.
Furthermore, v2 has a deeper seed system that lets you mix and match different seeds with a special seed syntax, giving you finer control over how your final voice sounds.
Seedmixing
By starting your seed with the string seedmix:, you can use the + and - signs to combine one or more seeds, or to apply a negative effect to the final result based on one or more seeds.
For example, to mix the seeds Kayra (which sounds masculine) and Clio (which sounds feminine), and then subtract some of that feminine quality by applying a negative Calliope to the mix, you would write the seed as:
seedmix:Kayra+Clio-Calliope
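To make the + and - notation concrete, here is a minimal Python sketch that splits a simple seedmix into signed seed terms. It only illustrates how the syntax reads: the function name parse_seedmix_terms is hypothetical, it assumes each seed counts as +1 (added) or -1 (subtracted), and it is not how NovelAI actually processes seeds.

```python
import re

SEEDMIX_PREFIX = "seedmix:"


def parse_seedmix_terms(seed: str) -> list[tuple[int, str]]:
    """Split a simple seedmix (no |style:/|intonation:/|cadence: parts) into
    (sign, seed) pairs: +1 for added seeds, -1 for subtracted ones.

    Hypothetical illustration only; not NovelAI's own parser.
    """
    if not seed.startswith(SEEDMIX_PREFIX):
        raise ValueError("a seedmix must start with 'seedmix:'")
    body = seed[len(SEEDMIX_PREFIX):]
    # Each term is an optional sign followed by characters that are not signs.
    # The first term usually has no sign and is treated as added.
    return [(-1 if sign == "-" else 1, name)
            for sign, name in re.findall(r"([+-]?)([^+-]+)", body)]


print(parse_seedmix_terms("seedmix:Kayra+Clio-Calliope"))
# [(1, 'Kayra'), (1, 'Clio'), (-1, 'Calliope')]
```

Reading the mix this way shows that Kayra and Clio both push the voice in their direction, while Calliope pulls it away.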
Also note the Style, Intonation, and Cadence labels under the selected seed. They are there because a TTS v2 voice is actually made up of these three separate parameters, and each one can use a different seed.
Style: Influences the overall tone of the voice, but tends to have a rather subtle effect. Its most noticeable effect is making the final voice sound a bit deeper or higher, depending on the seed.
Intonation: Determines what the voice itself sounds like. This is the parameter that most influences how the end result sounds, as a different intonation seed sounds like a different person speaking.
Cadence: Adjusts how quickly or slowly the voice reads certain phonemes, changing how it emphasizes words. Its effect is easiest to notice on questions and exclamations.
As mentioned before, a special syntax lets you use an individual seed for each of those parameters. You just need to separate them with the | (pipe) character.
You can add |style:, |intonation:, or |cadence: to your seedmix, each followed by another seed combination, to use a different seed for each parameter of your choice.
You can also write your seedmix so that it sets each parameter individually, for example:
seedmix:|style:Kayra+Clio-Calliope|intonation:Krake+Euterpe-Sigurd|cadence:Genji+Snek
This gives a voice whose Style comes from Kayra+Clio-Calliope, whose Intonation comes from Krake+Euterpe-Sigurd, and whose Cadence comes from Genji+Snek.
A note on seedmix syntax: while TTS seeds usually support spaces, seedmixes will NOT work with spaces. Avoid spaces in seeds, even single seeds, in case you want to use the seed in a seedmix later.
Finally, v2 also has the Volume and Speed sliders, and the same limitations as in v1 apply.
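The per-parameter syntax and the no-spaces rule can be sketched as a small Python parser that maps Style, Intonation, and Cadence to their own signed seed terms. This is a hypothetical illustration, not NovelAI's implementation: the names parse_seedmix and split_terms are invented here, and it assumes that the mix before the first | applies to every parameter, while each |style:, |intonation:, or |cadence: section overrides just that parameter.

```python
import re

PARAMETERS = ("style", "intonation", "cadence")


def split_terms(mix: str) -> list[tuple[int, str]]:
    """Turn 'Kayra+Clio-Calliope' into [(1, 'Kayra'), (1, 'Clio'), (-1, 'Calliope')]."""
    return [(-1 if sign == "-" else 1, name)
            for sign, name in re.findall(r"([+-]?)([^+-]+)", mix)]


def parse_seedmix(seed: str) -> dict[str, list[tuple[int, str]]]:
    """Map each v2 parameter to the signed seed terms it would use.

    Assumption: the mix before the first '|' applies to every parameter,
    and each '|style:', '|intonation:' or '|cadence:' section overrides
    only that parameter.
    """
    prefix = "seedmix:"
    if not seed.startswith(prefix):
        raise ValueError("a seedmix must start with 'seedmix:'")
    if any(ch.isspace() for ch in seed):
        # Seedmixes do not work with spaces, so reject them up front.
        raise ValueError("seedmixes must not contain spaces")
    base, *sections = seed[len(prefix):].split("|")
    # Start every parameter from the shared base mix (empty if none is given).
    result = {param: split_terms(base) for param in PARAMETERS}
    for section in sections:
        param, sep, mix = section.partition(":")
        if not sep or param not in PARAMETERS:
            raise ValueError(f"unknown seedmix section: {section!r}")
        result[param] = split_terms(mix)
    return result


mix = parse_seedmix("seedmix:Kayra+Clio|cadence:Genji+Snek")
print(mix["intonation"])  # [(1, 'Kayra'), (1, 'Clio')]
print(mix["cadence"])     # [(1, 'Genji'), (1, 'Snek')]
```

Rejecting spaces before parsing mirrors the rule above: a seed like "Jean Luc" may work on its own but would break once placed inside a seedmix.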
Goose Tip: Be careful not to let your seedmix sum out to neutral or negative. For example, seedmix:Goose-Goose will NOT sound good!
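One rough way to catch this before generating is to add up the signs in each section of a seedmix. The Python sketch below does that under a loud assumption: every seed is treated as weighing exactly +1 or -1, which is a simplification and not how the TTS model actually blends voices. The function name risky_sections is hypothetical.

```python
import re


def risky_sections(seed: str) -> list[str]:
    """Return labels of seedmix sections whose signed total is zero or negative.

    Assumption: each seed counts as +1 (added) or -1 (subtracted); this is
    only a heuristic for spotting mixes like 'Goose-Goose' that cancel out.
    """
    body = seed.removeprefix("seedmix:")
    risky = []
    for index, section in enumerate(body.split("|")):
        if index == 0:
            label, mix = "base", section
        else:
            # Later sections look like 'style:Kayra+Clio'.
            label, _, mix = section.partition(":")
        if not mix:
            continue  # an empty base mix, as in 'seedmix:|style:...', is fine
        total = sum(-1 if sign == "-" else 1
                    for sign, _name in re.findall(r"([+-]?)([^+-]+)", mix))
        if total <= 0:
            risky.append(label)
    return risky


print(risky_sections("seedmix:Goose-Goose"))          # ['base']
print(risky_sections("seedmix:Kayra+Clio-Calliope"))  # []
```

Because it only counts signs, this check will also flag mixes of different seeds that happen to cancel numerically, so treat a warning as a hint to listen carefully rather than a hard rule.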
Local TTS Settings
There aren't many settings to customize Local TTS voice-over compared to Streamed TTS.
All you can do is select which local voice to use, test the voice in-browser, and adjust the sliders.
Local TTS has a unique Pitch slider, but you can't download voice files with Local TTS at all.