Zombie Theory
In the field of consciousnessology, a zombie is a hypothetical human being, or an entity as similar as possible to a human being, that is not conscious. Philosophers of consciousness ask themselves whether or not a zombie could do this thing or that thing, and "what it is like" to be a zombie.
The common-sense view of zombies is that consciousness is an essential component of the human mind/brain, and a human being without consciousness would suffer severe deficits in their information processing capabilities. This is fairly obvious to anyone who has ever watched a zombie movie.
The Perception of Consciousness in Others
Another common question that philosophers of consciousness ask is: even if I know that I am conscious and therefore not a zombie, how do I know that other people aren't zombies?
A common-sense answer to this question is that we perceive consciousness in other people indirectly from observation of their speech and their actions, which are such that they could not be produced by a non-conscious individual. Implicitly this assumes that we have some notion of which information processing capabilities are provided by our faculty of consciousness, and what sort of contribution these capabilities make to the decisions that we make about what to do and what to say.
But the super-stimulus theory of music raises a more radical possibility: that we do more than deduce the existence of consciousness from an individual's response to their circumstances, and that we actually monitor, directly and constantly, the level of activity of the conscious faculty within another person as they speak. In particular, the perceived level of musicality of speech provides a direct estimate of the current level of consciousness in the speaker.
Musicality and TTS
This has immediate consequences for anyone in the Text-To-Speech (TTS) business, especially anyone who wants a software application to produce human-like speech that sounds completely natural. If the perception of varying levels of musicality is a component of speech perception, and a component that tells the listener about the conscious nature of the speaker, then any TTS system that fails to add appropriate levels of musicality to its output will not sound natural, and risks sounding like a non-conscious zombie.
How Do I Add Musicality to my TTS System?
The long answer is: go and read my book What is Music? Solving a Scientific Mystery. The short answer is:
- Consider the different perceived aspects of speech that musicality applies to.
- Determine the level of consciousness appropriate to different portions of the output speech.
- Modulate the musicality of each of the perceived aspects in proportion to the determined level of consciousness.
Of course it may be very difficult for a software application to determine an appropriate level of consciousness for a given utterance, unless the application is itself conscious. However, in practice, any kind of variation in musicality, applied simultaneously to different aspects, even if it is partly random, may be enough to make generated speech sound more natural.
A second difficulty is determining all the perceived aspects of speech that musicality applies to. Even if you read my book, you will only learn about some of them, and the failure to identify all of them is a major cause of the incompleteness of the super-stimulus theory. But in the pragmatic world of TTS, something may be better than nothing, so it's worth trying to apply musicality to as many aspects of generated speech as possible, to see if it helps at all.
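The short answer above can be sketched roughly in Python. This is only an illustration under stated assumptions, not a tested TTS method: the aspect names "melody" and "rhythm" are placeholders rather than a list taken from the theory, the names Segment and musicality_plan are hypothetical, the level of consciousness is assumed to be a number between 0 and 1, and the linear mapping from that level to a musicality level is an arbitrary choice. When no level can be determined for a segment, the sketch falls back on a partly random value, applied to all aspects at once.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Placeholder aspect names, chosen only for illustration (an assumption,
# not a list taken from the theory). Extend this tuple to try more aspects.
ASPECTS = ("melody", "rhythm")


@dataclass
class Segment:
    text: str
    # Estimated level of consciousness in [0, 1]; None if it cannot be determined.
    consciousness: Optional[float] = None


def musicality_plan(segments, aspects=ASPECTS, base=0.25, span=0.5, seed=None):
    """Return one dict per segment, mapping each aspect to a musicality level."""
    rng = random.Random(seed)
    plan = []
    for seg in segments:
        level = seg.consciousness
        if level is None:
            # Fallback: partly random variation when no estimate is available.
            level = rng.random()
        level = min(1.0, max(0.0, level))  # clamp to [0, 1]
        # Musicality rises linearly with the level (the linear form is an assumption).
        musicality = base + span * level
        # The same value is applied to every aspect simultaneously.
        plan.append({aspect: musicality for aspect in aspects})
    return plan


segments = [Segment("Hello.", 0.0), Segment("I was thinking.", 1.0), Segment("Goodbye.")]
for seg, levels in zip(segments, musicality_plan(segments, seed=1)):
    print(seg.text, levels)
```

How each musicality level is then turned into actual changes in the generated audio depends entirely on the TTS system in question, and is left out of the sketch.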
Author's self-advertisement:
In my new book What is Music? Solving a Scientific Mystery, I explain the super-stimulus theory of music, which is possibly the first scientific theory of music to do all of:
- Explain music as an evolutionary adaptation, which benefits us now, and not just in some hypothetical prehistoric environment.
- Give detailed explanations for specific aspects of music, including scales, chords, regular beat and repetition.
- Provide a universal explanation for all aspects of music (based on geometrical patterns of neural activity in cortical maps, where the same rule applied to different cortical maps explains corresponding different aspects of music).
- Explain the emotional effect that music has.
- Explain the similarities and differences between music and speech.
- Explain all six symmetries of music perception.
