An Audiobook Version of Blues Discovery: Audible’s Interesting Beta
I just published the audiobook version of my book Blues Discovery: Reaching Across the Divide (Rev. Ed., Dost Publishing, 2023). This audiobook was created with Amazon’s new text-to-voice software, used by Audible to allow authors to convert their Kindle books to Audible books.
For those of us who have been following the rather sudden explosion of interest in artificial intelligence since the emergence of ChatGPT and other LLMs, it’s interesting to see how these advances in AI result in changes in our daily lives. I know that there are various sorts of AIs in my phone. I know that there are AIs operating as recommendation tools in websites I use, which seem to be about as sophisticated as saying, “If you liked The Maltese Falcon, you might also like Casablanca.” (Thanks.) I’ve used some voice-to-text software when I needed to convert a podcast to text. But I can’t say that the advances in AI have had much of an impact on me otherwise. I mostly read about them and have talked to experts on a podcast I used to do for The Charleston Conference.
So when Amazon contacted me (and probably a million other authors) to try the beta version of their new “virtual voice narration” software, I was curious to see how it would work. Since I can’t afford to create an audiobook otherwise, this is basically my only chance. Of course, as a bit of a technophobe, I was worried that using the software would be too hard for me. In fact, it was quite simple and the results are really rather good.
Basically, all you have to do is go to the KDP Dashboard, click on the link to the virtual voice narration software, and go to the setup page. There you choose the voice you’d like: the options are a variety of American-accented male or female voices or two British female voices. For some reason they don’t have British male voices. Then you go into the text, press go, and listen to the voice as you watch the text. If the voice makes a mistake, you just pause the process, highlight the word or words in question, click either on a button for inserting a pause (short, medium, or long) or on a button to correct the pronunciation, and then proceed again in the text. There’s not a lot more to it.
So, what do you correct? Well, the virtual voice narration doesn’t always pause at the end of a paragraph or section, so instead of allowing the narration to run the paragraphs or sections together, you add a pause.
More complex is the pronunciation. The virtual voice doesn’t always know what to do with things like the names of radio stations and acronyms. The AI already knew that HUAC (House Un-American Activities Committee) is pronounced “who ak,” but Blues Discovery mentions a variety of old R&B stations, such as WAOK or WEEK, and the narrator wasn’t always sure what to do. The way to correct it is to highlight WAOK, go to “How it should be pronounced,” and then write in “double you ai oh kay.” The narrator then pronounces it correctly. WEEK was a bit odd because I corrected it to “double you ee ee kay” and, later in the text, it also converted the word “week” to the same pronunciation. That, however, is operator error, not a software problem.
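For the curious, here is a small Python sketch of why a correction for WEEK can spill over onto the ordinary word “week.” I have no idea how Audible’s software actually matches words, so everything here is an assumption for illustration: the `OVERRIDES` table, the `apply_overrides` function, and the `case_sensitive` switch are all made-up names. The only point is that a replacement that ignores capitalization treats the station and the word as the same thing, while one that respects capitalization leaves “week” alone.

```python
import re

# Hypothetical pronunciation overrides, keyed by the exact spelling in the text.
# These are the respellings described above, not anything taken from Audible.
OVERRIDES = {
    "WAOK": "double you ai oh kay",
    "WEEK": "double you ee ee kay",
}


def apply_overrides(text, overrides, case_sensitive=True):
    """Replace whole words found in `overrides` with their spoken respelling."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b keeps the match to whole words, so "WEEKLY" would not be touched.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in overrides) + r")\b",
        flags,
    )

    def spoken(match):
        word = match.group(0)
        # The keys are all upper case, so normalise when ignoring case.
        return overrides[word if case_sensitive else word.upper()]

    return pattern.sub(spoken, text)


sample = "He listened to WEEK every week."

# Ignoring case: the ordinary word "week" gets the station's pronunciation too.
print(apply_overrides(sample, OVERRIDES, case_sensitive=False))

# Respecting case: only the call letters are respelled.
print(apply_overrides(sample, OVERRIDES, case_sensitive=True))
```

In this sketch the first line printed has the respelling in both places, and the second keeps “every week” as written.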
There were also a couple of words I just couldn’t figure out how to represent phonetically, such as a couple of French words, and I was forced to leave them as is.
More difficult yet is representing language that is simply not standard English. When Will Shade sent Roger Brown a letter in 1962, the letter was written to the best of Shade’s ability, but his spelling, punctuation, and phrasing were non-standard. The virtual narrator couldn’t pronounce them, and I couldn’t see how to interpret them.
But the text-to-voice was a very impressive beta tool, as far as I’m concerned. The basic voice sounds like a person, not like a monotone robot, and while it isn’t perfect in intonation, phrasing, etc., it’s much, much better than similar versions I have heard in the past. Have a listen here at Amazon and see what you think: Blues Discovery.