Your ability to recognize objects is remarkable. Even when you encounter a mug under atypical lighting or from an unconventional angle, your brain can still identify it as a mug. Accurate object recognition of this kind is a key goal for artificial intelligence developers, including those improving the navigation systems of autonomous vehicles.
Although modeling primate object recognition in the visual cortex has transformed artificial vision systems, today's deep learning models remain oversimplified and fail to identify some objects that primates, including humans, recognize effortlessly.
In research published in Nature Neuroscience, McGovern Institute investigator James DiCarlo and colleagues report evidence that feedback improves the primate brain's recognition of hard-to-identify objects, and that adding feedback circuitry likewise improves the performance of artificial neural networks used in vision applications.
Deep convolutional neural networks (DCNNs) are currently the most successful models for accurately recognizing objects on fast timescales (under 100 milliseconds). Their general architecture is inspired by the primate ventral visual stream, the cascade of cortical areas that progressively builds an accessible and refined representation of viewed objects. Most DCNNs, however, are simple compared to the primate ventral stream.
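To make the feedforward idea concrete, here is a minimal sketch, not the authors' models, of a purely bottom-up pass: each stage transforms the output of the stage below exactly once, with no feedback. The weights and dimensions are made up for illustration, and simple linear maps stand in for convolutional stages.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Toy "layers": random linear maps standing in for convolutional stages.
layers = [rng.standard_normal((16, 32)) * 0.1,
          rng.standard_normal((32, 16)) * 0.1,
          rng.standard_normal((16, 8)) * 0.1]

def feedforward_pass(image_vec):
    """One bottom-up sweep: each stage feeds the next, with no feedback."""
    h = image_vec
    for W in layers:
        h = relu(h @ W)
    return h  # final representation, which a classifier would read out

image = rng.standard_normal(16)       # toy stand-in for an image
features = feedforward_pass(image)
print(features.shape)                 # (8,)
```

The key property is that information flows in one direction only, which is what makes such models fast but, as the study argues, too simple for some images.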
"For an extended period, we lacked comprehensive model-based understanding. Consequently, our field initiated this pursuit by modeling visual recognition as a feedforward process," explains principal investigator DiCarlo, who additionally serves as head of MIT's Department of Brain and Cognitive Sciences and research co-leader at the Center for Brains, Minds, and Machines (CBMM). "However, we recognize the existence of recurrent anatomical connections within brain regions associated with object recognition."
Think of feedforward DCNNs, and the initial object-capturing stage of the visual system, as a subway line running forward through a sequence of stations. The brain's additional, recurrent networks are instead like the interconnected streets above ground, which run in all directions. Because the brain needs only about 200 milliseconds to accurately identify an object, it was unclear whether these recurrent interconnections play any role in core object recognition. Perhaps they mainly keep the visual system calibrated over long timescales, the way street drainage slowly clears water and debris but is not needed to move people across town quickly. DiCarlo, along with lead author and CBMM postdoctoral fellow Kohitij Kar, set out to test whether a subtle role of recurrent operations in rapid visual object recognition had been overlooked.
Challenging Recognition Tasks
The researchers first needed to identify objects that primates decode effortlessly but artificial systems find difficult. Rather than speculating about why deep learning struggled with certain objects (image clutter? misleading shadows?), the authors took an unbiased approach that ultimately proved critical.
Kar explains: "We discovered that AI models don't actually struggle with every image of an occluded or cluttered object. Human attempts to predict which images would challenge the AI models actually impeded our progress."
Instead, the researchers showed the same images to the deep learning system and to monkeys and humans, and focused on 'challenge images': images in which primates readily recognized the objects but feedforward DCNNs struggled. When they, and others, added appropriate recurrent processing to these DCNNs, object recognition in challenge images suddenly became significantly easier.
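One common way to add recurrence, sketched here with hypothetical weights rather than the networks used in the study, is to keep the feedforward drive to a stage fixed while a lateral (recurrent) connection lets that stage refine its own activity over several timesteps:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical weights: a feedforward map, a recurrent (lateral) map that
# lets a stage revisit its own activity, and a readout map.
W_ff = rng.standard_normal((16, 32)) * 0.1
W_rec = rng.standard_normal((32, 32)) * 0.05
W_out = rng.standard_normal((32, 8)) * 0.1

def recurrent_pass(image_vec, timesteps=4):
    """Unroll one recurrent stage: the feedforward drive stays fixed while
    lateral connections iteratively update the representation."""
    drive = image_vec @ W_ff
    h = relu(drive)                      # t = 0: the pure feedforward sweep
    for _ in range(timesteps - 1):
        h = relu(drive + h @ W_rec)      # later steps mix in recurrence
    return relu(h @ W_out)

image = rng.standard_normal(16)
output = recurrent_pass(image)
print(output.shape)                      # (8,)
```

With `timesteps=1` this reduces to a plain feedforward pass, so the recurrent steps are strictly extra computation applied to the same input, which is the sense in which recurrence can rescue images a single sweep gets wrong.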
Processing Dynamics
To determine whether these images were genuinely as effortless for primates as they appeared, Kar used neural recording methods with very high spatial and temporal precision. Remarkably, the team found that although challenge images seem trivially easy to the human brain, they require about 30 milliseconds of additional neural processing time, suggesting that recurrent loops operate in our brains as well.
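In a recurrent model, extra processing time has a natural counterpart: the number of recurrent iterations a representation needs before it settles. The following toy sketch, with made-up weights and no claim about which images are hard, shows one way to measure that settling time:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up weights; the small recurrent scale keeps the iteration stable.
W_ff = rng.standard_normal((16, 32)) * 0.1
W_rec = rng.standard_normal((32, 32)) * 0.05

def settling_steps(image_vec, tol=1e-4, max_steps=100):
    """Count recurrent iterations until the representation stops changing,
    a rough stand-in for an image's neural 'solution time'."""
    drive = np.maximum(image_vec @ W_ff, 0.0)
    h = drive
    for step in range(1, max_steps + 1):
        h_new = np.maximum(drive + h @ W_rec, 0.0)
        if np.max(np.abs(h_new - h)) < tol:
            return step
        h = h_new
    return max_steps

image = rng.standard_normal(16)
steps = settling_steps(image)
print(steps)
```

Under this reading, the roughly 30 extra milliseconds for challenge images would correspond to a few additional recurrent iterations before the representation stabilizes.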
"What the computer vision community has recently accomplished by stacking increasingly numerous layers onto artificial neural networks, evolution has achieved through a brain architecture featuring recurrent connections," notes Kar.
Diane Beck, psychology professor and co-chair of the Intelligent Systems Theme at the Beckman Institute (and not involved in the study), provides additional context. "Since entirely feedforward deep convolutional networks now effectively predict primate brain activity, questions emerged regarding feedback connections' role in the primate brain. This research demonstrates that feedback connections are indeed likely playing a role in object recognition after all."
What implications does this hold for autonomous vehicles? It indicates that deep learning architectures involved in object recognition require recurrent components to match primate brain capabilities, while also demonstrating how to operationalize this process for the next generation of intelligent machines.
"Recurrent models provide predictions of neural activity and behavior across time," states Kar. "We may now be able to model more complex tasks. Perhaps eventually, these systems will not only recognize objects like people but also perform cognitive tasks that the human brain manages effortlessly, such as understanding others' emotions."
This research received support from the Office of Naval Research and the Center for Brains, Minds, and Machines through the National Science Foundation.