
How Deep Neural Networks Perceive Differently Than Humans: The Model Metamers Study


When your mother calls your name, you instantly recognize her voice regardless of volume or connection quality. Similarly, you can identify her face even from a distance or in poor lighting conditions. This remarkable ability to handle variations represents a fundamental aspect of human perception. However, humans are also susceptible to perceptual illusions, sometimes failing to distinguish between different sounds or images. While scientists have explained many of these illusions, our understanding of the invariances in our auditory and visual systems remains incomplete.

Deep neural networks have demonstrated impressive robustness in speech recognition and image classification tasks, showing resilience to variations in input stimuli. But are the invariances learned by these AI models similar to those in human perceptual systems? According to MIT researchers, the answer is no. These findings were presented at the 2019 Conference on Neural Information Processing Systems, revealing significant differences between how AI systems and humans process sensory information.

The researchers expanded on a classical concept called "metamers"—physically distinct stimuli that produce identical perceptual experiences. The most well-known examples occur in color vision, where humans have three types of cone cells in their retinas. Any single wavelength of light can be matched exactly by combining specific intensities of red, green, and blue light—the basis for all modern electronic displays. Similar phenomena exist in both visual and auditory domains. For instance, we might perceive different peripheral visual scenes as identical when focusing on an object, or find two insect swarms to sound alike despite different acoustic details. These metamers provide valuable insights into perceptual mechanisms and help constrain models of human sensory systems.
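The color case can be made concrete with a little linear algebra. The sketch below uses made-up Gaussian cone sensitivity curves (real analyses would use measured cone fundamentals); the point is that vision collapses a high-dimensional light spectrum onto just three cone responses, so many physically different spectra share one percept:

```python
import numpy as np

# Illustrative sketch with hypothetical Gaussian cone curves (real work
# would use measured cone fundamentals, e.g. CIE data).
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 700, 31)          # nm, coarse sampling
peaks = np.array([[570.0], [540.0], [440.0]])    # rough L, M, S cone peaks
S = np.exp(-((wavelengths[None, :] - peaks) / 40.0) ** 2)  # 3 x 31 sensitivities

x1 = rng.random(31)                              # an arbitrary light spectrum

# S maps a 31-d spectrum to 3 cone responses, so it has a large null space:
# adding any null-space vector changes the light but not the cone responses.
_, _, Vt = np.linalg.svd(S)
x2 = x1 + 0.1 * Vt[3]                            # a physically different spectrum

print(np.allclose(S @ x1, S @ x2))               # identical cone responses: metamers
print(np.abs(x1 - x2).max())                     # yet the spectra differ
```

The 28-dimensional null space is exactly why RGB displays work: three primaries suffice to match any cone response the eye can produce.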

In this research, the team selected natural images and spoken word samples from standard databases. They then synthesized new sounds and images designed to be classified identically to their natural counterparts by deep neural networks. This approach produced physically distinct stimuli that the AI systems grouped together, regardless of how humans perceived them. The researchers termed these "model metamers"—a generalization of the classical concept in which the human perceiver is replaced by a computer model. They then tested whether humans could identify these synthesized words and images.
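The synthesis procedure described above can be sketched in miniature. This is a hedged toy version, not the authors' code: a one-layer network stands in for a deep model, and gradient descent adjusts a noise input until its activations at the chosen layer match those evoked by a "natural" signal:

```python
import numpy as np

# Toy sketch of model-metamer synthesis (hypothetical network, not the
# study's models): optimize the INPUT, starting from noise, so that its
# activations at a chosen layer match those of a natural stimulus.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 32)) / np.sqrt(32)   # one softplus layer

def layer(x):
    return np.log1p(np.exp(W @ x))               # softplus activations

x_nat = rng.standard_normal(32)                  # stands in for a natural signal
target = layer(x_nat)                            # activations to be matched

x_syn = rng.standard_normal(32)                  # start from random noise
for _ in range(10_000):
    pre = W @ x_syn
    err = np.log1p(np.exp(pre)) - target
    # Chain rule for d/dx 0.5 * ||softplus(W x) - target||^2
    x_syn -= 0.2 * (W.T @ (err / (1.0 + np.exp(-pre))))

print(np.abs(layer(x_syn) - target).max())       # near-zero: activations match
print(np.linalg.norm(x_syn - x_nat))             # large: the signals differ
```

Because the layer discards most of the input's dimensions, the optimization lands on a signal the model treats as equivalent to the natural one even though it is physically different—the question the study asks is whether humans agree.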

"Participants heard a short speech segment and had to identify which word was in the middle from a list of options. While this task was easy with natural audio, humans struggled to recognize many of the model metamers," explains lead author Jenelle Feather, a graduate student in the MIT Department of Brain and Cognitive Sciences and member of the Center for Brains, Minds, and Machines. Humans typically wouldn't categorize these synthetic stimuli as the same as a spoken word like "bird" or an image of a bird. In fact, model metamers generated to match the deepest layers of neural networks were often completely unrecognizable to human subjects.

Josh McDermott, associate professor in BCS and investigator in CBMM, explains the significance: "The fundamental principle is that if we have an accurate model of human perception, say for speech recognition, then when two sounds are classified as identical by the model, human listeners should also perceive them as the same. If humans instead perceive these stimuli as different, it clearly indicates that our model's representations don't match human perception."


Feather and McDermott were joined by co-authors Alex Durango, a post-baccalaureate student, and Ray Gonzalez, a research assistant, both from BCS.

This research relates to another well-publicized limitation of deep networks: adversarial examples (as discussed in "Why did my classifier just mistake a turtle for a rifle?"). These are stimuli that appear similar to humans but are deliberately misclassified by AI models. They complement the current study's stimuli, which sound or look different to humans but are classified together by AI systems. The vulnerabilities exposed by adversarial attacks are widely recognized—from facial recognition errors to autonomous vehicles failing to detect pedestrians.

The significance of this research extends beyond identifying deep network limitations. While adversarial examples highlight differences between AI and human perception, the model metamers represent a more fundamental model failure—demonstrating that stimuli classified identically by neural networks can produce dramatically different human experiences.

The team also developed methods to modify neural networks to generate metamers that appeared more plausible to humans. As McDermott notes, "This gives us hope that we may eventually develop models that pass the metamer test and better capture human invariances."

"Model metamers reveal a significant failure of current neural networks to replicate the invariances in human visual and auditory systems," says Feather. "We believe this work provides a valuable behavioral benchmark to improve model representations and create more accurate models of human sensory processing."

tags: deep neural networks vs human perception, AI perception differences from humans, model metamers in artificial intelligence, how AI processes visual information differently, limitations of deep learning perception