Yanny… or Laurel? That is the question that has been circulating around the web since a video clip emerged of a strange word being vocalised. Some listeners heard the first word, others the second. So should you distrust your own hearing?
The human voice is really a fancy instrument. We create sounds in the throat by pushing air from the lungs past the vocal cords. These vibrations then echo up and out of the mouth through a bent pipe (the vocal tract). Whenever we change the shape of this pipe, we change the way the vibrations echo.
You can see the anatomy at work in a video showing a real-time MRI of someone speaking German. The shape changes can be measured as bands of energy called formants (labelled F1, F2, F3, counting up from the lowest). So speech carries information about the position and movement of the different parts of the pipe. The evidence is right there in the sound waves – if you know where to look.
Image credit: Suzy Styles
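If you want to do some of that looking yourself, the sketch below shows one standard textbook way of pulling formant estimates out of a recording: fit a linear-prediction (LPC) model to a short stretch of a vowel and read the resonances off the model. It assumes Python with numpy and librosa installed; the file name and the frame position are placeholders, not anything specific to this clip.

```python
# A minimal sketch of estimating formants from a short, steady vowel.
# Assumes numpy and librosa are installed; "vowel.wav" is a placeholder file name.
import numpy as np
import librosa

y, sr = librosa.load("vowel.wav", sr=None)       # load audio at its native sample rate
frame = y[int(0.10 * sr):int(0.13 * sr)]         # take ~30 ms from the middle of the vowel
frame = frame * np.hamming(len(frame))           # taper the edges of the frame

# Fit an all-pole (linear prediction) model: its resonances approximate
# the resonances of the vocal tract, i.e. the formants.
a = librosa.lpc(frame, order=10)
roots = [r for r in np.roots(a) if np.imag(r) > 0]
freqs = sorted(f for f in np.angle(roots) * sr / (2 * np.pi) if f > 90)

print("Estimated formants (Hz):", [round(f) for f in freqs[:3]])   # F1, F2, F3
```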
Here’s a spectrogram showing the energy in the Yanny/Laurel clip first shared by Reddit user RolandCamry. Let’s look at the vowels: we expect to see just three dark bands in the lower part of the plot (F1, F2, F3), but instead there is a jumble of overlapping lines.
In fact, this mess holds the clue to why we are hearing different things. In red are three long wavy lines, with a big gap between the first and the second. Their location and shape are just what we would expect if the speech were mostly vowels, and the particular vowels were “ee”, “ah”, “ee”, as in Yanny.
In blue, the lines are short, and there are three dark bars just in the middle. This is what we would expect if there were an “oh” vowel in the middle, surrounded by quieter consonants like “l” and “r”, as in Laurel. So the audio contains both sets of cues, but they have to be combined in different ways to hear one name or the other.
Image credit: Suzy Styles
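For anyone who wants to reproduce a picture like this, here is a rough sketch using Python’s scipy and matplotlib. The file name is a placeholder for any copy of the clip (or any other speech recording), and the window settings are just sensible defaults, not the exact ones used for the figure above.

```python
# A rough sketch of producing a speech spectrogram like the one above.
# Assumes numpy, scipy and matplotlib; "yanny_laurel.wav" is a placeholder.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("yanny_laurel.wav")
samples = samples.astype(np.float64)
if samples.ndim > 1:                    # mix stereo down to mono
    samples = samples.mean(axis=1)

# Short-time Fourier analysis: how much energy sits in each frequency band over time
freqs, times, power = spectrogram(samples, fs=rate, nperseg=1024, noverlap=768)

plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12),
               cmap="Greys", shading="auto")
plt.ylim(0, 5000)                       # the formants of interest sit below ~5 kHz
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Dark bands are concentrations of energy (formants)")
plt.show()
```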
In short, the video clip is the audio equivalent of one of those line drawings that can be seen as either a face or a vase, but never both at the same time. The signal is a bit vague in places, which helps the audio work its magic: the human brain is extremely good at filling in missing information, so it reconstructs the parts of the speech it can’t make out.
We can also see that the Yanny pattern includes more energy at high frequencies, while the Laurel pattern is stronger at low frequencies. This is probably why, when people switch devices, they sometimes hear the other name: each device will “perform” the frequencies differently.

Importantly, each time the audio is heard, the brain has to decide which is the most reliable part of the signal – which “cues” to follow. This could be why some people simply can’t hear the other name: their brain prefers one set of cues over the other. When people do swap, their brain has switched cues. So relax, #TeamLaurel, it doesn’t mean you are losing your hearing for high notes just yet – it just means your brain has decided that the consonants are more reliable. And #TeamYanny, your brain favours the vowels.
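A quick way to get a feel for this cue-switching, if you have a copy of the clip, is to nudge the balance of high and low frequencies yourself. The sketch below uses Python’s scipy; the 1 kHz split point is only an illustrative choice, and the file names are placeholders.

```python
# A rough sketch of tilting the clip towards one percept or the other by
# emphasising high or low frequencies. File names and the 1 kHz cutoff are
# illustrative placeholders, not measured values.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

rate, samples = wavfile.read("yanny_laurel.wav")
samples = samples.astype(np.float64)
if samples.ndim > 1:
    samples = samples.mean(axis=1)                # mix stereo down to mono

def emphasise(signal, fs, band):
    """Keep mostly the high band (Yanny-ish cues) or the low band (Laurel-ish cues)."""
    sos = butter(4, 1000, btype="highpass" if band == "high" else "lowpass",
                 fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

for band, name in [("high", "towards_yanny.wav"), ("low", "towards_laurel.wav")]:
    filtered = emphasise(samples, rate, band)
    scaled = filtered / np.max(np.abs(filtered)) * 32767   # normalise to avoid clipping
    wavfile.write(name, rate, scaled.astype(np.int16))
```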
It might be possible for us to hear different things, but it isn’t possible for a human speaker to produce this pattern of noises – our speech-pipes simply can’t be in two places at once. We might be tempted to think that this audio clip was designed to trick us (I initially did). However, it seems it might just be the crummy speech synthesis on this online dictionary, most likely the product of a speech algorithm that combines sounds in a way that makes sense to a computer but can’t actually be produced by a human throat.

So there you have it – speech is a lot more complicated than you think. It’s no surprise that human babies take as long as they do to learn it. Just remember that next time you are agonising over whether you’re the only one on Team Yanny.
Suzy J Styles is an assistant professor at Nanyang Technological University, Singapore