Background⌗

Because I never learned phonetic symbols well, I don’t know how to pronounce many words, so I usually use macOS’s text-to-speech feature to practice pronunciation. However, after I updated to macOS 12 (Monterey), pressing the shortcut key (Option + Esc) no longer seemed to work. At first I thought the shortcut had been disabled or conflicted with something else. But 5-7 seconds after I pressed it, the computer finally spoke the text.

So the shortcut was working, but why was there such a long delay before the sound? Out of curiosity, I selected a word, right-clicked it, and chose Speech > Start Speaking to have it read aloud. I noticed two interesting things:

Speech option

The voice sounds different from the one the shortcut uses.
There is no delay when using the right-click Start Speaking option.

So I suspected that the shortcut and the right-click option might be calling different system services.

Investigation⌗

I found the cause in System Preferences > Accessibility > Spoken Content. When the System Voice is set to Siri Voice, speech becomes very laggy. I suspect this might be because every pronunciation has to request data over the network.

Accessibility

Then I changed the System Voice to Daniel (United Kingdom) and pressed Option + Esc again to see whether it would respond promptly, and it did. Why Daniel? Perhaps because its pronunciation sounds more human-like.
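If you want to hear a specific voice without changing the system setting, macOS also ships a `say` command-line tool that accepts a voice name via `-v` (running `say -v '?'` lists the installed voices). Below is a minimal Python sketch that wraps it. The helper names `build_say_command` and `speak` are my own, and I have not verified that `say` goes through the same path as the Option + Esc shortcut, so treat it as a convenient way to compare voices rather than a reproduction of the delay.

```python
import shutil
import subprocess


def build_say_command(text, voice=None):
    """Return the argv list for macOS's `say` tool, optionally with a voice."""
    command = ["say"]
    if voice:
        # -v selects the voice by name, e.g. "Daniel" or "Ting-Ting"
        command += ["-v", voice]
    command.append(text)
    return command


def speak(text, voice=None):
    """Speak the text if `say` is available; return True when it actually ran."""
    if shutil.which("say") is None:  # not on macOS, or `say` is not on PATH
        return False
    subprocess.run(build_say_command(text, voice), check=True)
    return True


if __name__ == "__main__":
    # Assumes the Daniel voice is installed on this machine.
    speak("pronunciation", voice="Daniel")
```

On a machine without `say`, `speak` simply returns `False` instead of raising, so the script is safe to run anywhere.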

Summary⌗

I noticed an interesting phenomenon: if I select Chinese characters and then use text-to-speech, the voice is not Daniel but switches to Ting-Ting (China mainland). I suspect this is because the Daniel voice cannot read Chinese characters, so the system falls back to the first system voice for the corresponding language, which is Ting-Ting.

Similarly, if I set the System Voice to Ting-Ting and have it read English words, the system uses the Alex (United States) voice.
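The fallback behavior I observed with the shortcut can be written down as a small model. This is only a sketch of my guess, not how macOS actually implements it: the function name `effective_voice`, the two lookup tables, and the rule "any CJK ideograph means Chinese" are all assumptions made for illustration.

```python
# Language of each voice mentioned in this post (assumed mapping).
VOICE_LANGUAGE = {"Daniel": "en", "Alex": "en", "Ting-Ting": "zh"}

# Voice the system appeared to fall back to for each language (observed, not documented).
FALLBACK_VOICE = {"en": "Alex", "zh": "Ting-Ting"}


def text_language(text):
    """Crude guess: 'zh' if the text has any CJK Unified Ideograph, else 'en'."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return "zh"
    return "en"


def effective_voice(text, system_voice):
    """Return the voice the shortcut seemed to use for this text."""
    voice_lang = VOICE_LANGUAGE.get(system_voice)
    if voice_lang is None:
        # Unknown voice: this model makes no claim, so keep the selection.
        return system_voice
    lang = text_language(text)
    if voice_lang == lang:
        return system_voice
    # The selected voice cannot read this language, so fall back.
    return FALLBACK_VOICE[lang]


if __name__ == "__main__":
    for text, voice in [("hello", "Daniel"), ("你好", "Daniel"), ("hello", "Ting-Ting")]:
        print(text, voice, "->", effective_voice(text, voice))
```

The model only covers the Option + Esc shortcut with the voices named in this post.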

However, regardless of which System Voice I select in Accessibility > Spoken Content, the right-click Start Speaking option always uses the Ting-Ting voice, whether the text is Chinese or English.

I hope this is helpful. Happy hacking…
