The dictionary I’ve been working on is really more of a four-language lexicon at this point. For a while the nationals I’ve worked with have asked for audio of the analysis languages in the dictionary app. Before being willing to consider adding recordings for the analysis languages, I wanted to be sure I could actually get the national audio to stream/download from the internet consistently in the dictionary app. In the process of getting that squared away over the last few weeks, I’ve become a lot more familiar with how FLEx interacts with audio and more proficient with the bulk edit tools both in and out of FLEx.
I LOVE that FLEx has added in-program recording, and analysis-language audio too. But I don’t particularly love the FLEx-generated file labeling system, the .wav files, or the quality of the recordings I get from the in-program recording, so I end up spending time post-processing and converting them that I’d rather not spend.
I was ready to hang up the dictionary for a bit when a random thing led me to think about how much more natural-sounding AI is making text-to-speech. With all my new bulk-replacing knowledge, I decided to see if there was a way to use text-to-speech to my advantage and bulk-add audio for my analysis languages. I figured I’d only attempt it on the languages that have solid voice documentation/resources (English and French). And for what I’m going for, it went great.
Here was the general process:
- Copy my whole reversal index with the vernacular language turned off
- Paste it into Excel and bulk-make any pronunciation tweaks needed for the TTS to read the words correctly (rough sketch of this step after the list).
- Paste the list into a TTS reader that can produce a downloadable file. Play it through and download the audio.
- Throw the download into Audacity and have it automatically separate the sounds with labels. This was the best part: because the audio was computer-generated, there were no background noises, hums, fluctuations in voice quality, or aspirations into the microphone. It was already essentially post-processed, and Audacity could find the word breaks on the first try.
- Import labels into Audacity that match up with the filenames I’ll use in the FLEx entries (see the label sketch after the list).
- Export the audio files at the sample rate (Hz), bitrate, and file type I want (conversion sketch after the list).
- Use the FLEx bulk edit tab to add the filenames to the analysis audio language column so the entries recognize the new audio files.
- Throw the audio files into the FLEx AudioVisual folder, and it’s done.
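In case it helps picture the pronunciation-tweak step, here’s a minimal Python sketch of the idea. The word list, respellings, and file names are made up for illustration; I actually just did it with find-and-replace in Excel.

```python
# Sketch: swap in phonetic respellings so the TTS reads tricky items correctly.
# The respellings and file names below are hypothetical examples.
respellings = {
    "read (past tense)": "red",   # force the past-tense pronunciation
    "lead (metal)": "led",
}

with open("reversal_index.txt", encoding="utf-8") as f:
    words = [line.strip() for line in f if line.strip()]

tts_lines = [respellings.get(w, w) for w in words]

with open("tts_input.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(tts_lines))
```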
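For the label step, the nice thing is that Audacity label tracks are plain tab-separated text (start, end, label), so once Audacity has auto-detected the word breaks you can export its labels, swap the generic label text for the filenames you want the clips to get, and re-import before Export Multiple. Here’s a sketch of that swap, reusing the hypothetical file names from the sketch above.

```python
# Sketch: rewrite Audacity's auto-generated labels so the label text becomes
# the filename each clip should get on "Export Multiple".
# Audacity label files are tab-separated: start <TAB> end <TAB> label.
with open("auto_labels.txt", encoding="utf-8") as f:
    labels = [line.rstrip("\n").split("\t") for line in f if line.strip()]

with open("reversal_index.txt", encoding="utf-8") as f:
    words = [line.strip() for line in f if line.strip()]

assert len(labels) == len(words), "expected one label per word"

with open("named_labels.txt", "w", encoding="utf-8") as f:
    for (start, end, _old), word in zip(labels, words):
        f.write(f"{start}\t{end}\t{word}_en\n")  # "_en" suffix is just an example
```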
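And for the export settings, if you’d rather batch-convert outside Audacity, something like pydub works. This is only a sketch, not what FLEx requires: the folder names, 22050 Hz, mono, and mp3/64k are all just example choices, and pydub needs ffmpeg installed for mp3 export.

```python
# Sketch: batch-convert exported clips to a target sample rate / format / bitrate.
# Requires: pip install pydub, plus ffmpeg on the PATH for mp3 export.
from pathlib import Path
from pydub import AudioSegment

src_dir = Path("audacity_exports")
out_dir = Path("for_flex_audiovisual")
out_dir.mkdir(exist_ok=True)

for wav in sorted(src_dir.glob("*.wav")):
    clip = AudioSegment.from_wav(str(wav))
    clip = clip.set_frame_rate(22050).set_channels(1)  # downsample, mono
    clip.export(str(out_dir / f"{wav.stem}.mp3"), format="mp3", bitrate="64k")
```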
There were definitely more specifics than this, and it took a bit of time to get the process set up. But at this point I can easily add audio for 500+ entries in an hour.
The only real con of this method is that sometimes the pronunciation is awkward (roughly 25 out of 1000 entries). The longer the item of text, the more likely that is to happen (say, if you wanted to do this with example sentences). But that will also depend on your purposes and your wallet. I used cloudtts.com with Microsoft-based TTS voices, which I can use without copyright issues. There are certainly other TTS services out there that will nail pronunciation far more naturally if you are willing to search for them… or pay for them.
Anyway… it just feels like this board is always filled with things people wish were part of DAB or specific problems someone is having, so I thought I’d post something that’s just a random idea.