Voice banking tutorial: Acapela and ModelTalker

This 2023 updated voice banking tutorial will teach you how to make a digital copy of your voice. Our voice is such a key part of our identity that losing it can be very traumatic. Anyone can choose to create a digital copy of their voice for later use. Currently, there are two methods for creating a digital copy: voice banking and message banking.

This post was inspired by an ATiA 2019 presentation by John Costello and Meghan O’Brien on “Speech Synthesis, Voice/Message Banking: PAST, CURRENT, and FUTURE trends.” John and Meghan are both speech-language pathologists (SLPs) at Boston Children’s Hospital. John is the director of the Augmentative Communication Services.

Free DIRECT download: Updated guide to voice banking (patient handout). (Email subscribers get free access to all the resources in the Free Subscription Library.)

Tell me what voice banking is

Voice banking describes the process of creating a synthetic voice. Current technology requires recording hundreds or thousands of phrases and sentences. These recordings are fed through voice analysis software to obtain a synthetic voice. The synthetic voice can be added to any compatible speech-generating device, which will “speak” using that voice. (Check out a new app that makes voice banking easy.)

Who’s it for?

Anyone can bank their voice as insurance against a future loss. Sometimes a person can lose their voice very quickly, which may occur as a result of a stroke or surgery for laryngeal cancer. Other times, a person may lose their voice more slowly, such as with Parkinson’s disease or dementia. A motor neuron disease like ALS can go either way.

What are the pros of voice banking?

By creating a synthetic voice, the person can continue to communicate in a voice approximating their own. In addition, the user can type brand-new sentences they’ve never said before and the speech-generating communication device will “speak” the sentence in their own synthetic voice. This means the person doesn’t have to record every possible sentence they may want to say.

The technology is improving, too. It used to take a lot more work to digitally capture a voice. Now, a synthetic voice can be created with as few as 50 sentences (Acapela). The process is also more robust, meaning that even if someone has a noticeable voice impairment, it may still be possible to develop a decent synthetic copy.

Even if someone has already lost their voice, a family member or friend can create a voice for their loved one. This voice is likely to be more similar to the person’s lost voice than a generic voice that comes with the speech-generating device.

What are the cons of voice banking?

Well, the voice is synthetic, which means while it will sound close to the person’s natural voice, it will still sound somewhat artificial.

In addition, we don’t currently have the technology to replicate the full range of emotion and intonation in the synthetic voice, so sentences will tend to sound mechanical. The sentence “I love you” will have the same emotional quality as the sentence “I want you to leave.”

However, Acapela lets you upload custom messages that you record yourself. If you upload the recordings before your synthetic voice is created, they will be included in training your voice. Then, if you use the same words with your AAC device, those words will be “spoken” in your own recorded voice.

And finally, the synthetic voice won’t necessarily say the names of people and places correctly.

Now tell me how to do voice banking

I’m familiar with three companies that offer voice banking services, although there are likely others. This tutorial discusses Acapela’s My-Own-Voice (MOV) and ModelTalker. I wrote about the newer service The Voice Keeper in a different article. Anyone can do voice banking in their own home. A person may have better results with the support of a speech-language pathologist.

Acapela’s My Own Voice

Acapela’s My-Own-Voice (MOV) offers two types of technology to synthesize a voice. The original technology is “unit selection” and requires 1500 recorded sentences. The new technology uses Deep Neural Networks and may require as few as 50 sentences. MOV supports more than 20 languages.

How much does it cost?

It’s free to record and create the synthetic voice so that you can test how it sounds. There’s no charge until you actually download the synthesized voice to use.

When it’s time to download the voice, you have a choice between subscribing for $99/year or buying the full version for $999.
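
To compare the two pricing options, it can help to work out when the subscription overtakes the one-time price. The sketch below is a rough illustration, assuming the prices quoted above stay the same and that the subscription is billed in whole years. The function names are made up for this example and aren't part of anything Acapela provides.

```python
import math

SUBSCRIPTION_PER_YEAR = 99  # USD per year, as quoted above
FULL_VERSION = 999          # USD, one-time purchase, as quoted above


def subscription_cost(years: int) -> int:
    """Total paid after a whole number of subscription years."""
    return SUBSCRIPTION_PER_YEAR * years


def break_even_year() -> int:
    """First year in which subscribing has cost more than buying outright."""
    return math.floor(FULL_VERSION / SUBSCRIPTION_PER_YEAR) + 1


if __name__ == "__main__":
    year = break_even_year()
    print(f"Subscribing costs more than buying from year {year} "
          f"(${subscription_cost(year)} vs ${FULL_VERSION}).")
```

With these figures, ten years of subscribing costs $990, so the one-time purchase only comes out ahead from the eleventh year on.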

Many nonprofits, NGOs, charities, and other organizations, including Team Gleason and the MND Association, offer funding support for people who qualify. To request funding, simply select the funding choice from a drop-down box on Acapela’s registration form.

So, what equipment will I need?

You’ll need a computer, a good internet connection, and a good-quality directional microphone, preferably a head-mounted one. Ideally, the microphone would have noise cancellation and be able to record frequencies from 80 Hz to 15,000 Hz. Acapela recommends using the Sennheiser GSP 300 Headset*, or something similar.

*This is an Amazon affiliate link. As an Amazon associate, I may earn a small commission on qualifying purchases. There is no extra charge to you, and it will help keep Eat, Speak, & Think sustainable.

How exactly do I do it?

First, if you’ve applied for funding support, you’ll need to wait to be contacted by the organization. Once approved, they’ll provide you with your log-in credentials. If you’re not applying for funding, Acapela will send you your log-in information.

Once you have your log-in information, click on the small “Log in” icon in the top right corner. After entering your information, you’ll arrive at your home page. Check that you’re recording in the language that you want.

Next, select “Record” from the top menu and launch the Acapela Online Recorder. Then, enter your log-in information.

Select “microphone” as your recording device. Don’t use the sound card on your PC. You may have to go into your PC’s audio recording settings to select the headphone/microphone you’re using.

Next, you’ll be prompted to calibrate your microphone. Then you can start recording. I suggest having the sentences read to you first, so that you can get a feel for the pace and intonation. It’s important to follow any punctuation exactly.

Once you’re finished recording, Acapela will create your synthesized voice, which you can download and add to any speech-generating communication device.

ModelTalker

ModelTalker was created by the Nemours Speech Research Laboratory. Its primary goal is to provide realistic-sounding voices for children, especially those who have never had a voice of their own, but anyone can use the service.

How much does it cost?

It’s free to register and create your voice. Nemours is a non-profit pediatric healthcare system, but it does charge $100 to download your ModelTalker voice. The fee helps cover the costs associated with creating and storing the voice, as well as providing customer support.

Several organizations, such as Team Gleason, offer to pay the fee for qualified individuals. You can access the list of supporting organizations after registering.

The fee is waived for people registering to assist others with voice banking, as well as for those who have unimpaired speech and would like to donate their voice.

What equipment do I need?

You can use a PC or a laptop with a good-quality head-mounted microphone. Here are ModelTalker’s suggested headsets. They suggest something like the Sennheiser PC 36* as a lower-cost option.

For a higher-quality microphone, ModelTalker recommends the Jabra UC Voice 550 Duo*.

*These are Amazon affiliate links. As an Amazon associate, I may earn a small commission on qualifying purchases. There is no extra charge to you, and it will help keep Eat, Speak, & Think sustainable.

ModelTalker has a web-based recording tool which only works with Chrome, or you can download a Windows program called MTVR. They are working on an iOS app.

What’s the process?

Check out the training videos for more details, but here is the basic process:

After calibrating your microphone, you’ll record 10 screening sentences for the ModelTalker team to examine. They’ll let you know if it looks good, or they’ll make suggestions for how you can improve your recordings.

Next, you’ll record up to 3155 pre-determined sentences. Each sentence is recorded into a single file, and you may have to re-record sentences. The total amount of recorded speech is about one hour, but it takes much longer to get those recordings. Expect to spend at least six hours, and possibly much longer.

The company can create a voice with as few as 250 sentences, but they strongly encourage you to record at least 400. The chances of an acceptable voice increase when more sentences are recorded.
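
If you’re planning your recording sessions, a little arithmetic can help. The sketch below is only a rough planning aid. It assumes that the six-hour minimum for the full 3155-sentence set scales evenly with the number of sentences, which ModelTalker doesn’t state, and the function names are made up for illustration.

```python
import math

FULL_SET = 3155           # sentences in the full set, as quoted above
MIN_HOURS_FULL_SET = 6.0  # "at least six hours" for the full set


def min_hours_for(sentences: int) -> float:
    """Lower-bound estimate of recording effort, assuming linear scaling."""
    if not 0 <= sentences <= FULL_SET:
        raise ValueError(f"sentences must be between 0 and {FULL_SET}")
    return MIN_HOURS_FULL_SET * sentences / FULL_SET


def sentences_per_day(sentences: int, days: int) -> int:
    """Sentences to record each day to finish within the given number of days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return math.ceil(sentences / days)


if __name__ == "__main__":
    for target in (250, 400, FULL_SET):
        print(f"{target} sentences: at least {min_hours_for(target):.1f} hours, "
              f"{sentences_per_day(target, 4)} per day over 4 days")
```

Treat the hours as a floor rather than a forecast, since re-recording and rest breaks add time.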

ModelTalker doesn’t check your recordings along the way, so be sure to listen to at least the first few recordings at each sitting.

ModelTalker allows you to record names or other words that are important to you, but even these messages will be pronounced with the synthetic voice. Each word will be recorded in four different sentences that they design.

Once the recordings are complete, ModelTalker creates a few synthetic voice candidates. You can listen to these voices and decide which one you want. If you’re not satisfied with your options, ModelTalker will work with you to see if the quality can be improved.

Once you decide, your voice is ready for download within a week. ModelTalker provides the software to download the voice to your computer or speech-generating communication device.

How can I get the best quality possible?

Try to use the best head-mounted microphone possible. If you have access to a recording booth, that would be ideal. But if not, don’t worry. Just find a quiet place to record, preferably a small room with carpeting and curtains. If the sound quality isn’t good enough, you could place a blanket over a hardwood floor, place a screen around you, or even drape a blanket around you.

Eliminate as much background noise as possible. For instance, turn off the air conditioner and make sure there aren’t any ticking clocks. Close the windows and doors. Turn off notifications on your phone.

ModelTalker emphasizes that consistency is very important for the outcome of the voice. Record in the same place and with the same equipment each time. Try to record at the same time of day. Also, your speech should be at the same level of loudness, the same rate of speech, and the same “quality” each time.

By quality, they’re referring to how the sound of the voice changes depending on whether we’re relaxed, or excited, or angry. While synthesized voices sound best when the person sounds relaxed, it’s more important that the overall quality is consistent across recordings.
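
If you’re comfortable with a little scripting, you can get a rough check on loudness consistency yourself. The sketch below assumes you have copies of your recordings as 16-bit PCM WAV files on your computer, which neither company promises. It measures the average (RMS) level of each file and flags any file that differs from the median by more than a tolerance. The 3 dB tolerance and the function names are assumptions for illustration, not thresholds that ModelTalker or Acapela publish.

```python
import array
import math
import statistics
import sys
import wave


def rms_dbfs(path: str) -> float:
    """RMS level of a 16-bit PCM WAV file, in dB relative to full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()              # WAV data is little-endian
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def flag_outliers(levels: dict[str, float], tolerance_db: float = 3.0) -> list[str]:
    """Names whose level differs from the median by more than tolerance_db."""
    median = statistics.median(levels.values())
    return sorted(name for name, db in levels.items()
                  if abs(db - median) > tolerance_db)
```

To use it, build a dictionary such as `{name: rms_dbfs(name) for name in filenames}` and pass it to `flag_outliers`. A flagged file isn’t necessarily bad, but it’s worth listening to it again before moving on.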

If your speech is pretty normal, you can complete the recordings in three or four days. If your speech is starting to be affected, then it’s probably a good idea to limit your recording to the times of day when your speech is strongest.

It’s still possible to obtain a synthetic voice if you’re not able to record the full set of sentences. Both companies appear to be very supportive, as they want you to succeed. Don’t be shy about reaching out to them with any questions or concerns.

Here are the tips recommended by Acapela:

Please share this voice banking tutorial

Although voice banking is getting easier, only a small fraction of people who’d benefit are taking advantage. If you’re an SLP and you’re not helping your patients to bank their voice, I’d strongly encourage you to try it yourself. Once you do it, you’ll feel much more comfortable helping others.

Lisa earned her M.A. in Speech-Language Pathology from the University of Maryland, College Park and her M.A. in Linguistics from the University of California, San Diego.

She participated in research studies with the National Institute on Deafness and other Communication Disorders (NIDCD) and the University of Maryland in the areas of aphasia, Parkinson’s Disease, epilepsy, and fluency disorders.

Lisa has been working as a medical speech-language pathologist since 2008. She has a strong passion for evidence-based assessment and therapy, having earned five ASHA Awards for Professional Participation in Continuing Education.

She launched EatSpeakThink.com in June 2018 to help other clinicians be more successful working in home health, as well as to provide strategies and resources to people living with problems eating, speaking, or thinking.
