Modern society increasingly relies on algorithmic decision-making across critical sectors including judicial systems, financial institutions, and private enterprises. These automated systems profoundly affect people's lives, yet frequently demonstrate troubling biases—particularly against communities of color and lower socioeconomic groups when determining loan eligibility, employment opportunities, or even bail amounts during court proceedings.
MIT researchers have developed a new artificial intelligence programming language that can assess the fairness of algorithms more exactly, and more quickly, than available alternatives.
Their creation, known as Sum-Product Probabilistic Language (SPPL), represents a significant advancement in probabilistic programming systems. This emerging field bridges programming languages and artificial intelligence with the goal of simplifying AI system development, already demonstrating promising applications in computer vision, intelligent data processing, and automated data modeling. These specialized programming languages enable developers to define probabilistic models more efficiently and execute probabilistic inference—working backward to determine probable explanations for observed data.
"While several existing systems address various fairness concerns, our solution distinguishes itself through specialization and optimization for specific model classes, delivering performance improvements thousands of times faster," explains Feras Saad, a doctoral candidate in electrical engineering and computer science (EECS) and lead author of a recent publication detailing this research. Saad emphasizes these speed enhancements are substantial, with the system demonstrating performance up to 3,000 times faster than previous methodologies.
SPPL provides fast, exact answers to probabilistic inference questions such as "What is the likelihood of this model recommending loan approval for applicants over 40?" or "Generate 1,000 simulated loan applicants under 30 who would receive loan approval." These results are computed from SPPL programs that encode two things: a probabilistic model of which kinds of applicants are likely, and the rules used to classify them. Fairness questions that SPPL can answer include "Does a disparity exist between loan recommendation probabilities for immigrant and non-immigrant applicants with identical socioeconomic backgrounds?" or "Given a qualified candidate from an underrepresented group, what is their probability of being hired?"
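To make these query types concrete, here is a minimal sketch in plain Python. It does not use SPPL or its API; it answers the same kinds of questions by exhaustively enumerating a tiny discrete model with exact fractions. The population model, the probabilities, and the `approves` rule are all hypothetical values invented for illustration.

```python
import random
from fractions import Fraction

# Hypothetical population model; every number here is an illustrative assumption.
AGE = {"under_30": Fraction(3, 10), "30_to_40": Fraction(3, 10), "over_40": Fraction(4, 10)}
P_HIGH_INCOME_GIVEN_AGE = {"under_30": Fraction(1, 5), "30_to_40": Fraction(1, 2), "over_40": Fraction(3, 5)}
COLLATERAL = {True: Fraction(1, 4), False: Fraction(3, 4)}


def approves(income, collateral):
    """Hypothetical decision-tree classifier: high income, or else collateral."""
    if income == "high":
        return True
    return collateral


def joint():
    """Yield ((age, income, collateral), probability) for every outcome."""
    for age, p_age in AGE.items():
        p_high = P_HIGH_INCOME_GIVEN_AGE[age]
        for income, p_inc in (("high", p_high), ("low", 1 - p_high)):
            for collateral, p_col in COLLATERAL.items():
                yield (age, income, collateral), p_age * p_inc * p_col


def prob(event, given=lambda o: True):
    """Exact P(event | given), summed over the finite outcome space."""
    num = sum(p for o, p in joint() if given(o) and event(o))
    den = sum(p for o, p in joint() if given(o))
    return num / den


def is_approved(outcome):
    _, income, collateral = outcome
    return approves(income, collateral)


def sample_given(condition, n, seed=0):
    """Draw n outcomes from the exact conditional distribution."""
    support = [(o, p) for o, p in joint() if condition(o)]
    rng = random.Random(seed)
    outcomes = [o for o, _ in support]
    weights = [float(p) for _, p in support]
    return rng.choices(outcomes, weights=weights, k=n)


# "How likely is approval for applicants over 40?"
p_over_40 = prob(is_approved, given=lambda o: o[0] == "over_40")
# Fairness-style query: difference in approval probability between two groups.
p_under_30 = prob(is_approved, given=lambda o: o[0] == "under_30")
disparity = p_over_40 - p_under_30
# "Generate 1,000 simulated applicants under 30 who would be approved."
approved_under_30 = sample_given(lambda o: o[0] == "under_30" and is_approved(o), 1000)

print(p_over_40, p_under_30, disparity)
```

Enumeration like this only works because the toy model has a handful of outcomes. The point is the shape of the queries (a probability, a conditional probability, a constrained sample), not the method SPPL uses to answer them.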
Unlike most probabilistic programming languages, SPPL only allows users to write probabilistic programs for which it can automatically deliver exact probabilistic inference results. SPPL also lets users check how fast inference will be before running it, so they can avoid writing slow programs. In contrast, other probabilistic programming languages such as Gen and Pyro allow users to write programs for which the only known inference methods are approximate, meaning the results contain errors whose nature and magnitude can be hard to characterize.
Errors from approximate probabilistic inference are tolerable in many AI applications. They are far less acceptable in socially significant uses of AI such as automated decision-making, and especially in fairness analysis.
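The difference between exact and approximate inference can be illustrated with a small sketch. The code below is plain Python, not SPPL, Gen, or Pyro, and the two probabilities in it are hypothetical. It computes one approval probability in closed form and then estimates the same quantity by simple Monte Carlo sampling with several random seeds.

```python
import random
from fractions import Fraction

# Hypothetical, independent applicant attributes (illustrative numbers only).
P_HIGH_INCOME = Fraction(3, 5)
P_COLLATERAL = Fraction(1, 4)


def exact_approval_probability():
    """Closed form for P(high income OR collateral) under independence."""
    return P_HIGH_INCOME + (1 - P_HIGH_INCOME) * P_COLLATERAL


def monte_carlo_estimate(n, seed):
    """Estimate the same probability by simulating n applicants."""
    rng = random.Random(seed)
    p_high = float(P_HIGH_INCOME)
    p_col = float(P_COLLATERAL)
    hits = 0
    for _ in range(n):
        high_income = rng.random() < p_high
        collateral = rng.random() < p_col
        if high_income or collateral:
            hits += 1
    return hits / n


exact = exact_approval_probability()
estimates = [monte_carlo_estimate(1000, seed) for seed in range(5)]

print("exact:", exact)
for seed, estimate in enumerate(estimates):
    # Each run lands near the exact value, with an error that depends on the seed.
    print(f"seed {seed}: estimate={estimate:.3f} error={estimate - float(exact):+.3f}")
```

The exact answer is a single fraction that never changes. Each sampled estimate carries its own error, and a fairness audit that compares two such estimates has to account for the noise in both.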
Jean-Baptiste Tristan, an associate professor at Boston College and former research scientist at Oracle Labs who was not involved in this research, observes, "Having conducted fairness analysis in both academic and large-scale industrial environments, I recognize SPPL's superior flexibility and reliability compared to other PPLs when addressing this challenging and crucial problem domain. This advantage stems from the language's expressiveness, its precise and straightforward semantics, and the speed and soundness of its exact symbolic inference engine."
SPPL eliminates errors by restricting itself to a carefully designed model class that still encompasses a wide range of AI algorithms, including decision tree classifiers extensively employed in algorithmic decision-making. SPPL operates by compiling probabilistic programs into specialized data structures called "sum-product expressions." The system further builds upon the emerging concept of utilizing probabilistic circuits as representations enabling efficient probabilistic inference. This methodology extends previous work on sum-product networks to models and queries expressed through probabilistic programming languages. However, Saad acknowledges certain limitations: "SPPL offers substantially faster analysis for decision tree fairness evaluation, for instance, but cannot analyze models like neural networks. Alternative systems can examine both neural networks and decision trees, though they typically operate more slowly and provide approximate answers."
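As a rough illustration of the idea behind sum-product expressions, the sketch below builds a tiny expression tree from three node types: leaves (a distribution over one variable), products (independent factors), and sums (weighted mixtures). This is a simplified teaching example in plain Python, not SPPL's actual data structure or API, and the class names, the `prob` method, and all numbers are assumptions made for this sketch. Events are restricted to conjunctions of per-variable constraints, which is what lets a single recursive pass return an exact probability.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import prod


@dataclass
class Leaf:
    var: str
    dist: dict  # value -> probability

    def prob(self, event):
        allowed = event.get(self.var)
        if allowed is None:  # variable unconstrained by the event
            return Fraction(1)
        return sum((p for v, p in self.dist.items() if v in allowed), Fraction(0))


@dataclass
class Product:
    children: list  # factors over disjoint variables

    def prob(self, event):
        return prod((c.prob(event) for c in self.children), start=Fraction(1))


@dataclass
class Sum:
    weighted_children: list  # list of (weight, node); weights sum to 1

    def prob(self, event):
        return sum((w * c.prob(event) for w, c in self.weighted_children), Fraction(0))


def age_branch(age, p_high):
    """One mixture component: a fixed age group with its own income distribution."""
    return Product([
        Leaf("age", {age: Fraction(1)}),
        Leaf("income", {"high": p_high, "low": 1 - p_high}),
        Leaf("collateral", {True: Fraction(1, 4), False: Fraction(3, 4)}),
    ])


# Hypothetical applicant model as a mixture over age groups.
model = Sum([
    (Fraction(3, 10), age_branch("under_30", Fraction(1, 5))),
    (Fraction(3, 10), age_branch("30_to_40", Fraction(1, 2))),
    (Fraction(4, 10), age_branch("over_40", Fraction(3, 5))),
])

# Hypothetical rule: reject only when income is low AND there is no collateral.
rejected = {"income": {"low"}, "collateral": {False}}
p_approved = 1 - model.prob(rejected)

# Conditional query: approval probability among applicants over 40.
p_over_40 = model.prob({"age": {"over_40"}})
p_rejected_and_over_40 = model.prob({"age": {"over_40"}, **rejected})
p_approved_given_over_40 = 1 - p_rejected_and_over_40 / p_over_40

print(p_approved, p_approved_given_over_40)
```

Each query visits every node once, so its cost grows with the size of the expression rather than with the number of possible applicants. That property is one reason representations of this kind are attractive for exact inference.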
"SPPL demonstrates that exact probabilistic inference is practically achievable, not merely theoretically possible, across a broad spectrum of probabilistic programs," notes Vikash Mansinghka, MIT principal research scientist and senior paper author. "In our laboratory, we've observed symbolic inference driving speed and accuracy enhancements in other inference tasks previously approached through approximate Monte Carlo and deep learning algorithms. We've also applied SPPL to probabilistic programs derived from real-world databases, quantifying rare event probabilities, generating synthetic proxy data under constraints, and automatically screening data for probable anomalies."
SPPL was presented in June at the ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI), in a paper co-authored by Saad, MIT EECS Professor Martin Rinard, and Mansinghka. SPPL is implemented in Python and is available as open-source software.