If a machine learning model is trained on a skewed dataset, such as one that contains far more images of people with lighter skin than people with darker skin, there is a serious risk that its predictions will be unfair when it is deployed in the real world.
But that is only one part of the problem. MIT researchers have found that machine learning models widely used for image tasks encode bias when trained on imbalanced data, and that this bias cannot be fixed later, even with state-of-the-art fairness-boosting techniques, and not even by retraining the model on a perfectly balanced dataset.
So the researchers developed a technique that introduces fairness directly into the model's internal representation. This enables the model to produce fair outputs even when it is trained on biased data, which is especially important because very few well-balanced datasets exist for machine learning.
Their solution not only leads to more balanced predictions, but also improves performance on downstream tasks such as facial recognition and species classification.
"Within the machine learning community, it's commonplace to attribute model biases solely to training data. However, perfectly balanced datasets rarely exist in real-world scenarios. Consequently, we must develop methodologies that effectively address data imbalance challenges," explains Natalie Dullerud, lead researcher and graduate student within MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) Healthy ML Group.
Dullerud's co-authors include Kimia Hamidieh, a fellow graduate student in the Healthy ML Group; Karsten Roth, a former visiting researcher who is now a graduate student at the University of Tübingen; Nicolas Papernot, an assistant professor in the University of Toronto's Department of Electrical Engineering and Computer Science; and senior author Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group. The research will be presented at the International Conference on Learning Representations.
Defining Fairness in AI Systems
The machine learning method the researchers studied is known as deep metric learning, a broad form of representation learning. In deep metric learning, a neural network learns how similar objects are by mapping similar items close together and dissimilar items far apart. During training, the network places images in an "embedding space," where the similarity between two images corresponds to the distance between them.
If a deep metric learning model were being used to classify bird species, for instance, it would map photos of goldfinches to one part of the embedding space and photos of cardinals to another. Once trained, the model can measure the similarity of new images it has never seen before: it would cluster images of an unfamiliar bird species close together, but farther from the cardinals or goldfinches already in the embedding space. A minimal sketch of that setup appears below.
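To make the mechanics concrete, here is a minimal sketch of deep metric learning in PyTorch. It is illustrative only, not the researchers' code: the network, layer sizes, and toy data are assumptions, but the structure (an embedding network, a triplet loss that pulls similar items together, and distance-based comparison of new images) mirrors the approach described above.

```python
# A minimal, hypothetical sketch of deep metric learning (not the researchers'
# code). A network maps images into an embedding space, a triplet loss pulls
# images of the same species together and pushes different species apart, and
# an unseen image is judged by its distance to known clusters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Maps an image tensor to a point in a d-dimensional embedding space."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        # Unit-norm embeddings, so distance directly reflects (dis)similarity.
        return F.normalize(self.backbone(x), dim=-1)

model = EmbeddingNet()
triplet_loss = nn.TripletMarginLoss(margin=0.2)

# One training step on a toy batch: anchor and positive show the same species,
# the negative shows a different one (random tensors stand in for real photos).
anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
loss = triplet_loss(model(anchor), model(positive), model(negative))
loss.backward()

# At inference time, a new image is compared by distance to known clusters,
# for example the centroids of goldfinch and cardinal embeddings.
with torch.no_grad():
    query = model(torch.randn(1, 3, 64, 64))
    goldfinches = model(torch.randn(16, 3, 64, 64)).mean(0, keepdim=True)
    cardinals = model(torch.randn(16, 3, 64, 64)).mean(0, keepdim=True)
    distances = torch.cdist(query, torch.cat([goldfinches, cardinals]))
```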
The similarity metrics the model learns are very robust, which is why deep metric learning is so often used for facial recognition, Dullerud says. That led her and her colleagues to ask how they could tell whether a similarity metric is biased.
"We recognize that data inherently reflects societal biases and systemic inequalities. This reality necessitates shifting our focus toward developing methods better adapted to real-world conditions," notes Ghassemi.
The researchers defined two ways a similarity metric can be unfair. Using the example of facial recognition, the metric is unfair if it is more likely to embed two different individuals with darker skin closer together than it would two different individuals with lighter skin. Second, it is unfair if the features it learns for measuring similarity work better for the majority group than for the minority group. A sketch of how those two gaps could be measured follows.
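The sketch below is a hypothetical illustration of how those two notions could be checked per group, in the same PyTorch style as the earlier example. It is not the paper's evaluation protocol; the function names, the impostor-distance and recall-at-1 statistics, and the integer group encoding are assumptions made for illustration.

```python
# Hypothetical per-group checks for the two kinds of unfairness described
# above; an illustration, not the paper's evaluation protocol.
import torch

def impostor_distance(emb, person_id):
    """Mean embedding distance between images of *different* people."""
    dists = torch.cdist(emb, emb)
    different_person = person_id[:, None] != person_id[None, :]
    return dists[different_person].mean()

def recall_at_1(emb, person_id):
    """Fraction of images whose nearest neighbor shows the same person."""
    dists = torch.cdist(emb, emb)
    dists.fill_diagonal_(float("inf"))  # ignore self-matches
    nearest = dists.argmin(dim=1)
    return (person_id[nearest] == person_id).float().mean()

def per_group_report(emb, person_id, group):
    """Check 1: different people embedded too close together within one group.
    Check 2: retrieval quality noticeably worse for the minority group."""
    report = {}
    for g in group.unique():
        mask = group == g
        report[int(g)] = {
            "impostor_distance": float(impostor_distance(emb[mask], person_id[mask])),
            "recall_at_1": float(recall_at_1(emb[mask], person_id[mask])),
        }
    return report  # large cross-group differences signal an unfair metric
```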
In a series of experiments on models with biased similarity metrics, the researchers found that the bias embedded in the model's embedding space could not be overcome later.
"This finding is particularly concerning because many organizations commonly release these embedding models, which users subsequently fine-tune for specific downstream classification tasks. However, regardless of downstream modifications, addressing fairness problems originating in the embedding space remains impossible," Dullerud explains.
Even if a user retrains the model on a balanced dataset for the downstream task, which is the best-case scenario for fixing the fairness problem, performance gaps of at least 20 percent remain, she says.
The only way to solve the problem is to ensure the embedding space is fair from the start.
Developing Separate Metrics for Fairness
The researchers' solution, called Partial Attribute Decorrelation (PARADE), trains the model to learn a separate similarity metric for a sensitive attribute, such as skin tone, and then to decorrelate that sensitive-attribute metric from the targeted similarity metric. If the model is learning a similarity metric for different human faces, it will learn to map similar faces close together and dissimilar faces far apart using features other than skin tone.
Any number of sensitive attributes can be decorrelated from the targeted similarity metric in this way. And because the sensitive-attribute metric is learned in a separate embedding space, it is discarded after training, leaving only the targeted similarity metric in the final model. A rough sketch of this training setup follows.
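Continuing the earlier PyTorch sketch, the outline below shows the general shape of this idea: two embedding networks are trained, one for the target metric and one for the sensitive attribute, with a penalty that discourages statistical dependence between them. The cross-covariance penalty, the network sizes, and the `lam` weight are stand-ins chosen for illustration; this is not a reproduction of the published PARADE objective.

```python
# A hedged sketch of the decorrelation idea, not the authors' implementation.
# The sensitive-attribute metric lives in its own embedding space and is
# thrown away after training; only target_net is kept.
import torch
import torch.nn as nn

def make_embedder(dim):
    """Small convolutional embedder; illustrative only."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
    )

target_net = make_embedder(128)     # the only metric kept in the final model
sensitive_net = make_embedder(32)   # attribute metric, discarded after training
triplet_loss = nn.TripletMarginLoss(margin=0.2)
lam = 1.0                           # strength of decorrelation (tunable)

def cross_covariance(a, b):
    """Squared Frobenius norm of the cross-covariance between two embedding
    batches; a simple stand-in for a decorrelation penalty."""
    a = a - a.mean(0, keepdim=True)
    b = b - b.mean(0, keepdim=True)
    return ((a.T @ b) / a.shape[0]).pow(2).sum()

def training_step(anchor, positive, negative, same_attr, diff_attr):
    # Target metric: images of the same person pulled together, others pushed apart.
    z_anchor = target_net(anchor)
    target_loss = triplet_loss(z_anchor, target_net(positive), target_net(negative))
    # Sensitive metric: positives/negatives defined by the attribute
    # (e.g. same vs. different skin tone), learned in a separate space.
    s_anchor = sensitive_net(anchor)
    attr_loss = triplet_loss(s_anchor, sensitive_net(same_attr), sensitive_net(diff_attr))
    # Decorrelation: penalize dependence between the two embeddings so the
    # target metric stops leaning on the sensitive attribute.
    return target_loss + attr_loss + lam * cross_covariance(z_anchor, s_anchor)

# One step on toy data (random tensors stand in for face images).
loss = training_step(*(torch.randn(8, 3, 64, 64) for _ in range(5)))
loss.backward()
```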
The method is applicable in many settings because the user can control how much decorrelation there is between the metrics. For instance, if a model will be used to diagnose breast cancer from mammogram images, a clinician likely wants some information about biological sex to remain in the final embedding space, because women are much more likely than men to develop breast cancer, Dullerud explains.
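In terms of the hypothetical sketch above, that degree of control corresponds to the assumed `lam` weight on the decorrelation penalty: a smaller weight would deliberately leave more of the sensitive-attribute signal, such as biological sex in the mammography example, in the final embedding space.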
The team tested the method on two tasks, facial recognition and bird species classification, and found that it reduced performance gaps caused by bias, both in the embedding space and in the downstream task, regardless of the dataset used.
Moving forward, Dullerud is interested in studying how to push deep metric learning models to learn good features in the first place.
"How can we properly audit fairness in AI systems? This remains an open question. How can we determine whether a model will demonstrate fairness, or identify the specific contexts where it will perform equitably? These represent the questions I'm particularly interested in exploring moving forward," she concludes.