Revolutionizing AI: Teaching Machines to Understand Physics Like Infants

Humans naturally develop an early understanding of physical laws, with infants demonstrating expectations about object movement and interaction. When presented with physically impossible scenarios, such as objects disappearing in magic tricks, infants exhibit clear signs of surprise, indicating their innate grasp of physical reality.

Building on this concept, MIT researchers have developed an AI model that demonstrates an understanding of basic "intuitive physics" similar to that seen in human cognition. The work could lead to smarter AI systems while also providing insights into how human infants perceive and understand the physical world.

The model, named ADEPT (short for Approximate Derenderer, Extended Physics, and Tracking), observes objects moving within a scene and predicts how they should behave based on fundamental physical laws. As it tracks these objects, the model produces a signal at each video frame that corresponds to a level of "surprise" – the larger the signal, the greater the surprise. The signal spikes when objects violate expected physical behavior, for example by vanishing or teleporting across the scene.

When shown videos of objects moving in both physically plausible and implausible ways, ADEPT registered surprise levels that closely matched those reported by human observers who watched the same videos. This agreement suggests the model captures some of the same intuitive physics that people rely on.

"By three months of age, infants already understand that objects don't simply appear and disappear, can't move through each other, and don't teleport from one location to another," explains Kevin A. Smith, lead researcher from the Department of Brain and Cognitive Sciences and a member of the Center for Brains, Minds, and Machines. "Our goal was to capture and formalize this knowledge to incorporate infant cognition into artificial intelligence agents. We're now approaching near-human-like capabilities in how models can distinguish between basic implausible and plausible scenarios."

The research team also includes co-first author Lingjie Mei of the Department of Electrical Engineering and Computer Science and Brain and Cognitive Sciences research scientist Shunyu Yao, along with Jiajun Wu, Elizabeth Spelke, Joshua B. Tenenbaum, and Tomer D. Ullman.

Advanced Physics Understanding in AI

ADEPT relies on two core components: an "inverse graphics" module that captures object representations from raw images, and a "physics engine" that predicts future object behaviors as a distribution of possibilities. Together, they let the model reason about a scene in terms of objects rather than pixels.

The inverse graphics module extracts object information – including shape, pose, and velocity – from the pixels of each video frame. Importantly, it does not get bogged down in unnecessary detail. ADEPT requires only the approximate geometry of each shape to function, which helps the model generalize its predictions to new objects beyond its training data.

"It doesn't matter if an object is a rectangle or circle, or whether it's a truck or a duck," Smith notes. "ADEPT simply recognizes an object with a specific position moving in a particular way to make predictions. Interestingly, young infants also don't seem overly concerned with properties like shape when making physical predictions."

These simplified object descriptions are fed into a physics engine – software that simulates the behavior of physical systems, commonly used in films, video games, and computer graphics. The researchers' physics engine "projects the objects forward in time," creating a range of predictions or a "belief distribution" for what will happen to those objects in the next frame.
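The idea of projecting objects forward to get a "belief distribution" can be illustrated with a small sketch. This is not the researchers' code: the `ObjectState` class, the `project_forward` function, the Gaussian velocity noise, and all parameter values are assumptions chosen for illustration. It treats the belief distribution as a set of sampled candidate states for the next frame.

```python
import random
from dataclasses import dataclass


@dataclass
class ObjectState:
    """Coarse object description (hypothetical): position and velocity only."""
    x: float
    y: float
    vx: float
    vy: float


def project_forward(state, dt=1.0, noise=0.05, n_samples=100, rng=None):
    """Sample candidate next-frame states for one object.

    Each sample perturbs the velocity with Gaussian noise (an assumed
    noise model) and then moves the object in a straight line for dt.
    The list of samples stands in for the belief distribution.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    samples = []
    for _ in range(n_samples):
        vx = state.vx + rng.gauss(0.0, noise)
        vy = state.vy + rng.gauss(0.0, noise)
        samples.append(ObjectState(state.x + vx * dt, state.y + vy * dt, vx, vy))
    return samples


# An object at the origin moving right at one unit per frame.
beliefs = project_forward(ObjectState(x=0.0, y=0.0, vx=1.0, vy=0.0))
mean_x = sum(s.x for s in beliefs) / len(beliefs)
print(f"{len(beliefs)} samples, mean predicted x = {mean_x:.2f}")
```

Most samples land close to where straight-line motion would put the object, so a next-frame observation near that point fits the beliefs well, while one far from every sample does not.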

Next, the model observes the actual next frame, capturing object representations and aligning them with one of the predicted representations from its belief distribution. If an object follows the laws of physics, there will be minimal mismatch between the two representations. However, if an object behaves implausibly – for instance, vanishing from behind a wall – there will be a significant discrepancy.

ADEPT then resamples from its belief distribution and notes a very low probability that the object simply vanished. If the probability is sufficiently low, the model registers high "surprise" as a signal spike. Essentially, surprise is inversely proportional to the probability of an event occurring – the lower the probability, the higher the signal spike.

"When an object moves behind a wall, your physics engine maintains a belief that the object remains behind the wall," Ullman explains. "If the wall is removed and nothing is there, there's a mismatch. The model then recognizes, 'There's an object in my prediction, but I see nothing. The only explanation is that it disappeared, which is surprising.'"
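The link between probability and surprise described above can be sketched in a few lines. The article does not give the exact formula, so this example makes its own assumptions: it scores an observation by averaging a Gaussian-kernel match against the predicted positions, assigns a small fixed probability `p_vanish` to an object being absent, and defines surprise as the negative log of that score. The function names and all constants are hypothetical.

```python
import math


def observation_score(predicted_xs, observed_x, sigma=0.5, p_vanish=1e-4):
    """Probability-like score in (0, 1] for what was seen in the next frame.

    predicted_xs: positions sampled from the belief distribution.
    observed_x:   where the object was seen, or None if nothing was there.
    """
    if observed_x is None:
        # The beliefs said an object should be present, but none was seen.
        return p_vanish
    matches = [math.exp(-0.5 * ((observed_x - px) / sigma) ** 2)
               for px in predicted_xs]
    return (1.0 - p_vanish) * sum(matches) / len(matches)


def surprise(score, floor=1e-12):
    """Lower score means higher surprise (negative log, an assumed choice)."""
    return -math.log(max(score, floor))


predicted = [0.9, 1.0, 1.1]  # toy belief samples for the object's position
plausible = surprise(observation_score(predicted, 1.0))
vanished = surprise(observation_score(predicted, None))
teleported = surprise(observation_score(predicted, 10.0))
print(plausible < vanished < teleported)
```

With these toy settings, an object seen where it was expected produces almost no signal, while a vanished or teleported object produces a large one.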

Testing Physical Expectations

In developmental psychology, researchers conduct "violation of expectations" tests where infants watch pairs of videos. One video shows a plausible event with objects following expected physical behaviors, while the other is identical except that objects behave in ways that violate expectations. Researchers often measure how long infants stare at a scene after an implausible action occurs, hypothesizing that longer gaze indicates greater surprise or interest.

For their experiments, the researchers created several scenarios based on classical developmental research to examine the model's core object knowledge. They recruited 60 adults to watch 64 videos displaying both physically plausible and implausible scenarios. For example, objects would move behind a wall and either remain there when the wall was removed or have disappeared. Participants rated their surprise at various moments on a scale of 0 to 100. The researchers then showed the same videos to the model, examining its ability to understand object permanence (objects don't appear or disappear without reason), continuity (objects move along connected trajectories), and solidity (objects cannot pass through each other).

ADEPT's responses closely matched human reactions, particularly in scenarios where objects moved behind walls and disappeared when the wall was removed. Interestingly, the model also matched human surprise levels on videos that people were not surprised by but perhaps should have been. For instance, in a video where an object moving at a certain speed disappears behind a wall and immediately emerges on the other side, the object might have dramatically accelerated or teleported. Both humans and ADEPT were uncertain about whether such events were surprising. The researchers also found that traditional neural networks that learn physics from observations – but without explicitly representing objects – were far less accurate at distinguishing surprising from unsurprising scenes, and their assessments rarely aligned with human judgments.
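One simple way to score a model on paired videos like these is to check how often the implausible video in a pair receives the higher surprise value. The sketch below is an illustrative assumption, not the researchers' evaluation code, and the surprise values in it are made-up placeholders rather than results from the study.

```python
def pairwise_accuracy(pairs):
    """Fraction of pairs in which the implausible video is more surprising.

    pairs: list of (plausible_surprise, implausible_surprise) tuples,
           one tuple per matched pair of videos.
    """
    if not pairs:
        raise ValueError("need at least one pair of videos")
    correct = sum(1 for plausible, implausible in pairs
                  if implausible > plausible)
    return correct / len(pairs)


# Placeholder surprise values for four hypothetical video pairs.
toy_pairs = [(0.1, 0.9), (0.2, 0.8), (0.5, 0.4), (0.3, 0.7)]
print(pairwise_accuracy(toy_pairs))  # 0.75
```

The same function can be applied to averaged human ratings, which gives a common yardstick for comparing a model's judgments with people's.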

Looking ahead, the researchers plan to explore further how infants observe and learn about the world, with the goal of incorporating new findings into their model. Studies show, for example, that infants up to a certain age aren't particularly surprised when objects completely transform – such as when a truck disappears behind a wall but reemerges as a duck.

"We want to discover what else needs to be incorporated to understand the world more like infants do," Smith concludes. "By formalizing our knowledge of psychology, we aim to build better AI agents that can more effectively navigate and understand the physical world."
