Groundbreaking research conducted by MIT neuroscientists reveals that our natural auditory environment has fundamentally shaped how humans perceive sound, optimizing our hearing for the specific types of sounds we regularly encounter in daily life.
In a study published on December 14 in the journal Nature Communications, a team of researchers led by Josh McDermott, an associate investigator at the McGovern Institute for Brain Research, employed advanced computational modeling to investigate the factors that shape human pitch perception. Their AI-based model demonstrated pitch perception remarkably similar to that of humans, but only when it was trained on music, speech, or other naturalistic sounds.
The human capacity to perceive pitch, essentially recognizing how rapidly a sound vibrates, gives melody to music and subtle meaning to spoken communication. Although pitch is arguably the most thoroughly examined aspect of human hearing, scientists continue to debate which factors determine its properties, and why it is more acute for some types of sounds than for others. McDermott, who also serves as an associate professor in MIT's Department of Brain and Cognitive Sciences and as an investigator with the Center for Brains, Minds, and Machines (CBMM), is particularly interested in how our nervous system processes pitch because cochlear implants, devices that transmit electrical sound signals to the brains of individuals with profound hearing loss, fail to effectively replicate this crucial aspect of human hearing.
“Cochlear implants can reasonably help people comprehend speech, particularly in quiet environments. However, they struggle to accurately reproduce pitch perception,” explains Mark Saddler, a graduate student and CBMM researcher who co-led the project and is an inaugural graduate fellow of the K. Lisa Yang Integrative Computational Neuroscience Center. “Understanding the detailed mechanisms of pitch perception in people with normal hearing is crucial for developing ways to artificially replicate this ability in prosthetic devices.”
Revolutionizing Artificial Hearing Technology
Pitch perception begins in the cochlea, the spiral-shaped structure in the inner ear where sound vibrations are transformed into electrical signals and relayed to the brain via the auditory nerve. The cochlea's structure and function strongly influence how and what we hear. McDermott's team suspected that our “auditory diet,” the sounds we regularly hear, might also shape our hearing abilities, though until now that hypothesis had not been experimentally testable.
To investigate how both our ears and our environment influence pitch perception, McDermott, Saddler, and research assistant Ray Gonzalez developed a computer model called a deep neural network. Neural networks are a machine-learning approach widely used in automatic speech recognition and other artificial intelligence applications. Although the architecture of an artificial neural network loosely resembles the connectivity of neurons in the brain, the models used in engineering applications do not actually hear sound the way humans do. The team therefore built a new model to replicate human pitch perception, coupling an artificial neural network with an existing model of the mammalian ear and uniting the power of machine learning with insights from biology. “These new machine-learning models represent the first that can be trained to perform complex auditory tasks and execute them effectively at human performance levels,” Saddler explains.
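To make that pairing concrete, here is a minimal Python sketch of the overall architecture: a simulated cochlear front-end whose output feeds a small network that classifies a sound's fundamental frequency (F0). The study paired its networks with a detailed biophysical model of the auditory nerve and trained far deeper architectures; the cochleagram function, the PitchNet class, the sample rate, and every parameter below are illustrative assumptions, not the authors' code.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import butter, lfilter

SR = 20000  # audio sample rate in Hz (assumed for this sketch)

def cochleagram(x, sr=SR, n_channels=40, pl_cutoff=3000.0):
    """Crude cochlear front-end: per-channel bandpass filtering (frequency
    decomposition), half-wave rectification (hair-cell transduction), and a
    lowpass whose cutoff limits how precisely each channel can follow fine
    timing (a stand-in for phase locking in the auditory nerve)."""
    cfs = np.geomspace(100.0, 6000.0, n_channels)        # center frequencies
    lp_b, lp_a = butter(2, pl_cutoff / (sr / 2), btype="low")
    rows = []
    for cf in cfs:
        lo, hi = cf / 2 ** 0.25, cf * 2 ** 0.25          # half-octave band
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        band = lfilter(b, a, x)
        rows.append(lfilter(lp_b, lp_a, np.maximum(band, 0.0)))
    return np.stack(rows)                                # (channels, time)

class PitchNet(nn.Module):
    """Small CNN mapping a cochleagram to one of n_bins F0 classes (the
    study's networks were far deeper; this is only illustrative)."""
    def __init__(self, n_bins=100):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(3, 32), stride=(1, 4)), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=(3, 16), stride=(1, 4)), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)), nn.Flatten(),
        )
        self.head = nn.Linear(32 * 4 * 8, n_bins)

    def forward(self, cgram):           # cgram: (batch, 1, channels, time)
        return self.head(self.conv(cgram))
```

The three front-end stages mirror the biology only loosely: bandpass filtering stands in for the cochlea's frequency decomposition, rectification for hair-cell transduction, and the lowpass stage for the limit on how precisely auditory nerve firing can track sound vibrations.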
The researchers trained the neural network to estimate pitch by instructing it to identify the repetition rate of sounds in a training dataset. This approach allowed them to modify the parameters under which pitch perception developed. They could manipulate both the sound types presented to the model and the properties of the ear that processed those sounds before transmitting them to the neural network.
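Continuing the sketch above (and reusing its cochleagram front-end and PitchNet), a toy version of that training setup might look like the following. The make_harmonic_tone generator, the F0 range, and the loop length are assumptions made for illustration; the point is that both the training sounds and the front-end parameters are experimenter-controlled knobs, exactly the manipulations described here.

```python
import numpy as np
import torch
import torch.nn.functional as F

F0_BINS = np.geomspace(80.0, 800.0, 100)      # candidate F0s in Hz (assumed)

def make_harmonic_tone(f0, dur=0.1, sr=SR, n_harmonics=10):
    """Synthetic stand-in for training sounds: a complex tone whose
    harmonics all repeat at rate f0."""
    t = np.arange(int(dur * sr)) / sr
    tone = sum(np.sin(2 * np.pi * f0 * h * t) / h
               for h in range(1, n_harmonics + 1))
    return tone / np.max(np.abs(tone))

net = PitchNet(n_bins=len(F0_BINS))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):                        # toy-scale training loop
    label = np.random.randint(len(F0_BINS))    # pick a ground-truth F0 bin
    wave = make_harmonic_tone(F0_BINS[label])
    cgram = torch.tensor(cochleagram(wave), dtype=torch.float32)[None, None]
    loss = F.cross_entropy(net(cgram), torch.tensor([label]))
    opt.zero_grad(); loss.backward(); opt.step()
```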
When the model was trained using sounds that are important to humans, such as speech and music, it learned to estimate pitch much as humans do. “We successfully replicated numerous characteristics of human perception... suggesting that the model uses similar cues from sounds and cochlear representations to perform the task,” Saddler notes.
However, when the model was trained using more artificial sounds or without background noise, its behavior diverged significantly. For instance, Saddler explains, “When optimized for an idealized world without competing noise sources, the model learns a pitch strategy markedly different from humans, suggesting that the human pitch system evolved to handle situations where noise sometimes obscures parts of the sound.”
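In terms of the same sketch, the noise manipulation can be as simple as mixing each training sound with background noise at a random signal-to-noise ratio before it reaches the ear model. The study used naturalistic background noise, so the white-noise helper below is only a hedged placeholder:

```python
def add_noise(signal, snr_db):
    """Mix white noise into signal at a given SNR in dB (the study used
    naturalistic background noise; white noise keeps the sketch short)."""
    noise = np.random.randn(len(signal))
    sig_pow, noise_pow = np.mean(signal ** 2), np.mean(noise ** 2)
    noise *= np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10.0)))
    return signal + noise

# Train on noisy_wave instead of wave to model a realistic acoustic world;
# skip this step to recreate the idealized, noise-free condition.
noisy_wave = add_noise(wave, snr_db=np.random.uniform(0.0, 20.0))
```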
The team also discovered that the timing of nerve signals originating in the cochlea is critical to pitch perception. In a healthy cochlea, McDermott explains, nerve cells fire precisely in sync with sound vibrations reaching the inner ear. When researchers altered this relationship in their model—causing nerve signal timing to correlate less precisely with incoming sound vibrations—pitch perception deviated from normal human hearing.
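In the sketch's front-end, the pl_cutoff parameter is a crude, assumed analogue of this timing precision: the study altered phase locking inside a detailed auditory nerve model, whereas here the cutoff simply controls how rapid a fluctuation each channel's output can follow.

```python
# Intact timing: channel outputs can follow fluctuations up to ~3 kHz.
normal_cgram = cochleagram(wave, pl_cutoff=3000.0)

# Degraded timing: a 50 Hz cutoff smears fine temporal structure, forcing a
# model trained on such inputs to rely on place-of-excitation cues alone.
degraded_cgram = cochleagram(wave, pl_cutoff=50.0)
```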
McDermott emphasizes that these findings will be crucial as researchers develop improved cochlear implants. “The research strongly indicates that for cochlear implants to produce normal pitch perception, there must be a way to reproduce the fine-grained timing information in the auditory nerve,” he states. “Currently, they don't achieve this, and there are technical challenges to making that happen, but the modeling results clearly suggest this is the necessary path forward.”