Using eSpeak to make a Raspberry Pi talk

Talking Raspberry Pi using eSpeak text to speech synthesiser

eSpeak is a compact, open-source text-to-speech synthesiser for Windows and Linux, and a great piece of software for creating a talking Raspberry Pi. It can synthesise speech from text in English and other languages (including Afrikaans).

This is an ongoing post. Please suggest corrections, explanations, etc. in the comment section at the bottom of this page.

At the time of writing, eSpeak is newer than some other software text-to-speech synthesisers and is free to use. Apart from its installation, it does not need an internet connection to operate and it is fairly small in size. It is also very easy to get up and running with the default Raspbian settings, easy to use and very customisable. eSpeak commands are run from the terminal using Bash.

On the downside, eSpeak wails a little and sounds very much like an alien or a robot (which can also be why you would like to incorporate it into a project). The text-to-speech (TTS) conversion is also not that accurate and, due to bad pronunciation, some words might be difficult to make out.

Assumptions

For this post, a fully installed Raspberry Pi with the latest version of Raspbian was used. The default sound output from either the 3.5mm audio jack or the HDMI cable needs to be audible. During the installation process a connection to the internet will be required. Without a screen, keyboard and mouse, PuTTY and/or WinSCP can be used to do the testing and coding.

Sound output

eSpeak should work out of the box with Raspbian's default sound settings. The only requirement is to choose the desired sound output (i.e. HDMI or audio jack). To test the sound output on Raspbian, the following terminal command can be used:

aplay /usr/share/scratch/Media/Sounds/Vocals/Singer2.wav

You should hear a clip playing a short, “haaa” singing voice. If this clip can be heard, then eSpeak should also be heard.
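
The same sound test can be wrapped in a small script that first checks whether the player and the clip are actually present. This is only a sketch: the function name play_test_clip is made up for this example, and the clip path is simply the one used above, which may not exist on every Raspbian image.

```shell
#!/bin/sh
# play_test_clip is a hypothetical helper, not part of Raspbian or eSpeak.
# It plays a WAV file with aplay, but only if both aplay and the file exist.
TEST_CLIP="/usr/share/scratch/Media/Sounds/Vocals/Singer2.wav"

play_test_clip() {
    clip="$1"
    if ! command -v aplay >/dev/null 2>&1; then
        echo "aplay not found"
        return 1
    fi
    if [ ! -f "$clip" ]; then
        echo "test clip not found: $clip"
        return 2
    fi
    aplay "$clip"
}

play_test_clip "$TEST_CLIP" || echo "Sound test skipped or failed"
```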

Alternatively, eSpeak can be used with the -w option to write wave files containing the speech instead of playing it on the sound device. More on this below.

Installing eSpeak on a Raspberry Pi

While connected to the internet, the following terminal command is used to install eSpeak:

sudo apt-get install espeak

To see if eSpeak has been installed correctly use:

espeak -h

or

espeak --help
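
A script that relies on eSpeak can do a similar check before it tries to speak. The sketch below only assumes that espeak ends up on the PATH after installation; the function name espeak_status is invented for this example.

```shell
#!/bin/sh
# espeak_status is a hypothetical helper name, not part of eSpeak.
# It reports where espeak is installed, or how to install it if it is missing.
espeak_status() {
    if command -v espeak >/dev/null 2>&1; then
        echo "eSpeak found at: $(command -v espeak)"
    else
        echo "eSpeak not found; install it with: sudo apt-get install espeak"
        return 1
    fi
}

espeak_status || echo "Skipping speech output"
```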

Using the eSpeak command

On the Raspberry Pi, eSpeak is run from the terminal. The espeak command can be used in a couple of ways. The simplest is to type the desired speech as text input (text within double quotes) after the espeak command:

espeak "Hello, world"

To read text from a text file, use:

espeak -f <text file>

If no text is entered after the espeak command, the program will take its text from stdin, with each line treated as a separate sentence. I.e. by just typing:

espeak

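followed by text on subsequent lines, each line is spoken when Enter/RETURN is pressed. Pressing Ctrl + D (end of input) or Ctrl + C exits eSpeak and returns to the command prompt. Ctrl + Z also brings back the prompt, but it only suspends eSpeak rather than ending it.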
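
The three ways of feeding text to eSpeak described above can be put side by side in one script. This is a rough sketch rather than a recommended layout: the speak wrapper is made up for this example (it prints the command instead of running it when eSpeak is not installed), and the text file is a temporary file created just for the demonstration.

```shell
#!/bin/sh
# speak is a hypothetical wrapper: it calls espeak when it is installed and
# otherwise just prints the command it would have run.
speak() {
    if command -v espeak >/dev/null 2>&1; then
        espeak "$@" || echo "espeak exited with an error" >&2
    else
        echo "would run: espeak $*"
    fi
}

# 1. Text given directly as an argument.
speak "Hello, world"

# 2. Text read from a file (a temporary file created for this example).
textfile=$(mktemp)
printf 'This line comes from a text file.\n' > "$textfile"
speak -f "$textfile"
rm -f "$textfile"

# 3. Text piped in on stdin; each line is treated as a separate sentence.
printf 'First line.\nSecond line.\n' | speak --stdin
```

Piping text in like this is handy when the text is produced by another command, since no intermediate file is needed.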

eSpeak command options

eSpeak has plenty of handy command line options which will alter its default use. These include changing the accent/language, gendered tone, pitch, speed, etc. of the spoken voice. Command line options can be ‘stacked’ onto each other. To see the version of eSpeak and all the command line options:

espeak -h

or

espeak --help

The voice used by eSpeak is determined by the voice accent/language file and a variant which sets its tone (e.g. male or female). To change the voice accent/language, the correct voice file needs to be used. To see a list of the available voice files, the following command is used:

espeak --voices

To use the Afrikaans accent/language, for example, the following command option is used:

espeak -v af

where af is the corresponding entry in the Language column of the list of available voices.
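
Before relying on a particular language in a script, the voice list can be searched for its code. The sketch below assumes that the language code is the second column of the espeak --voices listing; the helper name voice_listed is invented for this example.

```shell
#!/bin/sh
# voice_listed is a hypothetical helper. It reads a voice listing on stdin and
# succeeds if the given code appears in the second (Language) column.
# Assumption: the listing has the language code as its second field.
voice_listed() {
    awk -v code="$1" '$2 == code { found = 1 } END { exit !found }'
}

if command -v espeak >/dev/null 2>&1; then
    if espeak --voices | voice_listed af; then
        echo "Afrikaans voice is available"
    else
        echo "Afrikaans voice is not listed"
    fi
fi
```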

By default all languages/accents are generated in a male tone. With eSpeak the tone of a voice can also be changed using an additional variant property:

-v <voice filename>[+<variant>]

According to the official documentation, the variants are “+m1 +m2 +m3 +m4 +m5 +m6 for male voices and +f1 +f2 +f3 +f4 which simulate female voices by using higher pitches. Other variants are +croak and +whisper.”

To use the Afrikaans accent/language with a mid-tone female voice, for example, the following command option is used:

-v af+f2
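
To hear how the variants differ, the same sentence can be spoken with a few of them in a loop. This is only a sketch: voice_arg is a made-up helper that joins a voice file name and a variant into the form shown above, and the Afrikaans test phrase is just an example.

```shell
#!/bin/sh
# voice_arg is a hypothetical helper that joins a voice file name and an
# optional variant into the form expected by -v, e.g. "af" or "af+f2".
voice_arg() {
    if [ -n "${2:-}" ]; then
        printf '%s+%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Speak the same sentence with a few of the documented variants.
for variant in m3 f2 croak whisper; do
    v=$(voice_arg af "$variant")
    if command -v espeak >/dev/null 2>&1; then
        espeak -v "$v" "Goeie dag" || echo "espeak failed for $v"
    else
        echo "would run: espeak -v $v \"Goeie dag\""
    fi
done
```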

Some of the other more useful command line options include:

-f <text file> speaks a text file.
--stdin takes the text input from stdin.
-a <integer> sets amplitude (volume) in a range of 0 to 200. The default is 100.
-p <integer> adjusts the pitch in a range of 0 to 99. The default is 50.
-s <integer> sets the speed in words-per-minute (approximate values for the default English voice, others may differ slightly). The default value is 170. Range 80 to 390.
-g <integer> inserts a pause between words. The value is the length of the pause, in units of 10 ms (at the default speed of 170 wpm).
-l <integer> sets the line-break length, default value 0. If set, then lines which are shorter than this are treated as separate clauses and spoken separately with a break between them. This can be useful for some text files, but bad for others.
-w <wave file> writes the speech output to a file in WAV format, rather than speaking it.
-z removes the end-of-sentence pause which normally occurs at the end of the text.
--stdout writes the speech output to stdout as it is produced, rather than speaking it. The data starts with a WAV file header which indicates the sample rate and format of the data. The length field is set to zero because the length of the data is unknown when the header is produced.
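
As an illustration of stacking, the sketch below combines a voice, speed, pitch and amplitude, first writing the result to a WAV file with -w and then streaming it to aplay with --stdout. The values, the output path and the in_range helper are all made up for this example; the ranges it checks are the ones quoted in the list above.

```shell
#!/bin/sh
# in_range is a hypothetical helper: succeeds if $1 lies between $2 and $3.
in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Example values and output path chosen for this sketch only.
speed=120
pitch=60
amplitude=150
wav_file="${TMPDIR:-/tmp}/espeak-demo.wav"

if ! { in_range "$amplitude" 0 200 && in_range "$pitch" 0 99 && in_range "$speed" 80 390; }; then
    echo "One of the option values is outside its range"
elif command -v espeak >/dev/null 2>&1; then
    # Stack the options and write the speech to a WAV file instead of playing it.
    espeak -v en+f2 -s "$speed" -p "$pitch" -a "$amplitude" \
        -w "$wav_file" "Options can be stacked" && echo "Wrote $wav_file"

    # Alternatively, stream the WAV data straight to a player via --stdout.
    if command -v aplay >/dev/null 2>&1; then
        espeak -s "$speed" --stdout "Streaming to aplay" | aplay \
            || echo "Playback failed"
    fi
else
    echo "espeak not installed; nothing written"
fi
```

Writing to a file first is useful when the same phrase is played often, since the speech only has to be synthesised once.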

See the eSpeak documentation for more information.

About the author
Renier busies himself with improving his English writing, creative web design and his websites, photoshopping, micro-electronics, multiple genres of music, superhero movies and badass series.