16 May 2018, updated 09 Sep 2021 4:08pm

The “Yanny or Laurel” debate reveals more about our brains than our ears

The video clip circulating the web is the audio equivalent of an optical illusion. 

By Suzy J Styles

Yanny… or Laurel? This is the question circulating the web, after a video clip emerged of a strange word being vocalised. Some listeners heard the first word, others the second. So should you be distrusting your own hearing?

The human voice is really a fancy instrument. We create sounds in the throat by pushing air from the lungs past the vocal cords. These vibrations then echo up and out of the mouth, through a bent pipe (the vocal tract). Whenever we change the shape of this pipe, it changes the way the vibrations echo. 

You can see the anatomy at work in a video showing a real-time MRI of someone speaking German. The shape changes can be measured in energy bands called formants (which are noted in the style F1, F2, F3). So speech contains information about the position and motion of all three parts of the pipe. The evidence is right there in the sound waves – if you know where to look.
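To give a rough feel for where formants come from, the sketch below treats the vocal tract as a straight, uniform tube that is closed at the vocal cords and open at the lips. This is a textbook idealisation rather than anything measured in the clip: the real tract is bent and constantly changes shape, which is why real formants move. The function name, the 17 cm length and the 340 m/s speed of sound are all assumptions chosen for illustration.

```python
def tube_resonances(length_m, speed_of_sound=340.0, count=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end, open at the other."""
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, count + 1)]


# Assumed values: a 17 cm tract and a speed of sound of 340 m/s.
formants = tube_resonances(0.17)
for number, freq in enumerate(formants, start=1):
    print(f"F{number} is roughly {freq:.0f} Hz")
```

With these assumed values the three resonances land near 500, 1500 and 2500 Hz, roughly evenly spaced. Moving the tongue, jaw and lips breaks that even spacing, and those shifts are the information a listener picks up.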

Image credit: Suzy Styles

Here’s a spectrogram showing the energy in the Yanny/Laurel clip first shared by Reddit user RolandCamry. Let’s look at the vowels: We expect to see just three dark bands in the lower part (F1, F2, F3), but there is a jumble of overlapping lines.


In fact, this mess holds the clue to why we are hearing different things. In red are three long wavy lines, with a big gap between the first and the second. Their location and shape are just what we would expect if the speech was mostly vowels, and the particular vowels were “ee” “ah” “ee” as in Yanny.

In blue, the lines are short, and there are three dark bars just in the middle. This is what we would expect if there was an “oh” vowel in the middle, surrounded by quieter consonants like “l” and “r”, as in Laurel. So the audio contains both signals, but the information has to be combined in different ways.
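For readers curious how a spectrogram of this kind is built, here is a minimal sketch in Python using only the standard library. It slices a signal into short overlapping frames and measures the energy at each frequency in each frame. The signal is a synthetic stand-in (a 500 Hz tone followed by a 2000 Hz tone), not the actual clip, and the function names, sample rate and frame sizes are assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_LEN = 128     # samples per analysis frame (assumed)
HOP = 64            # step between the starts of successive frames (assumed)


def frame_spectrum(frame):
    """Magnitudes of the discrete Fourier transform of one frame (bins 0..N/2)."""
    n = len(frame)
    # A Hann window tapers the frame edges to reduce leakage between bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(samples):
    """One spectrum per frame: rows are time steps, columns are frequency bins."""
    return [frame_spectrum(samples[start:start + FRAME_LEN])
            for start in range(0, len(samples) - FRAME_LEN + 1, HOP)]


def peak_hz(magnitudes):
    """Frequency (Hz) of the strongest bin in one frame."""
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return k * SAMPLE_RATE / FRAME_LEN


def tone(freq_hz, n):
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


# Synthetic stand-in for speech: the energy band jumps from 500 Hz to 2000 Hz.
signal = tone(500, 1024) + tone(2000, 1024)
spec = spectrogram(signal)
print(peak_hz(spec[0]), peak_hz(spec[-1]))
```

Plotting each row of `spec` as a column of shaded pixels gives the familiar picture: dark bands where the energy sits, moving up or down as the sound changes. Real speech shows several such bands at once, which is what the red and blue tracings pick out.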

Image credit: Suzy Styles

In short, the video clip is the audio equivalent of one of those line drawings that is both a face and a vase, but can’t be both at the same time. The signal is a bit vague in places, which helps this audio work its magic: the human brain is extremely good at filling in the missing information, so it reconstructs the parts of the speech it can’t hear accurately.

We can also see that the Yanny pattern includes more energy at high frequencies, and the Laurel pattern is stronger at low frequencies. This is probably why, when people switch devices, they sometimes hear the other name, since each device will “perform” the frequencies differently. Importantly, each time the audio is heard, the brain has to decide which is the most reliable part of the signal – which “cues” to follow. This could be why some people simply can’t hear the other name – their brain prefers one set of cues over the other. When people swap, their brain has switched cues. So relax #TeamLaurel, it doesn’t mean you are losing your hearing for high notes just yet – it just means your brain has decided that the consonants are more reliable. And #TeamYanny, your brain favours the vowels.
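The device effect can be illustrated with a toy example. The sketch below, again in plain Python, mixes a low tone and a high tone at equal strength, then passes the mix through two very simple filters that stand in for a bass-heavy and a treble-heavy device. The filters, tone frequencies and function names are assumptions for illustration only; real speakers and headphones have far more complicated responses, and this says nothing about how the brain then weighs the cues.

```python
import math

SAMPLE_RATE = 8000            # samples per second (assumed)
LOW_HZ, HIGH_HZ = 300, 3000   # stand-ins for "Laurel-like" and "Yanny-like" energy


def tone(freq_hz, n=800):
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def band_energy(samples, freq_hz):
    """Energy at one frequency, found by correlating with a cosine and a sine."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * t / SAMPLE_RATE)
             for t, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
             for t, x in enumerate(samples))
    return (re * re + im * im) / len(samples) ** 2


def low_pass(samples, alpha=0.2):
    """One-pole smoothing filter: each output moves a fraction alpha toward the input."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples, alpha=0.2):
    """What is left after subtracting the smoothed (low-passed) signal."""
    return [x - smooth for x, smooth in zip(samples, low_pass(samples, alpha))]


def high_to_low_ratio(samples):
    return band_energy(samples, HIGH_HZ) / band_energy(samples, LOW_HZ)


mix = [a + b for a, b in zip(tone(LOW_HZ), tone(HIGH_HZ))]
print("original     :", high_to_low_ratio(mix))
print("bass-heavy   :", high_to_low_ratio(low_pass(mix)))
print("treble-heavy :", high_to_low_ratio(high_pass(mix)))
```

The same recording ends up with a different balance of high and low energy depending on which filter it passes through: the ratio falls well below one for the bass-heavy filter and rises above one for the treble-heavy filter. That shift in balance is the kind of change that could tip a listener from one name to the other.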

It might be possible for us to hear different things, but it isn’t possible for a human speaker to produce this pattern of noises – our speech-pipes simply can’t be in two places at once. We might be tempted to think that this audio clip was designed to trick us (I initially did). However, it seems that it might just be the crummy speech synthesis on this online dictionary. This in turn is most likely the product of a speech algorithm, which combines sounds in a way that makes sense to a computer, but can’t actually be done by a human throat. So there you have it – speech is a lot more complicated than you think. It’s no surprise that human babies take as long as they do to learn it. Just remember that, next time you are agonising over whether you’re the only one on Team Yanny. 

Suzy J Styles is an assistant professor at Nanyang Technological University, Singapore
