Breaking Down AutoML Barriers: Expert Insights on Automated Machine Learning Challenges and Future Solutions

Updated on: 12/23/2025 05:44 PM

The explosive growth of big data across industries—from financial services to healthcare to environmental protection—has fueled unprecedented demand for machine learning solutions that help organizations extract valuable insights from their collected information.

This rising industry need has propelled researchers to investigate the potential of automated machine learning (AutoML), which aims to streamline the creation of ML solutions to make them accessible to non-specialists, enhance their efficiency, and accelerate innovation in the field. For instance, an advanced AutoML platform could empower medical professionals to leverage their expertise in interpreting EEG results to develop a predictive model identifying patients at elevated risk for epilepsy—without requiring these doctors to possess data science backgrounds.

However, despite more than ten years of dedicated research, scientists have yet to completely automate every phase of the machine learning development workflow. Even today's most sophisticated commercial AutoML platforms still necessitate extensive collaboration between domain experts (such as marketing directors or mechanical engineers) and data scientists, creating inefficiencies in the process.

Kalyan Veeramachaneni, a principal research scientist at the MIT Laboratory for Information and Decision Systems who has been investigating AutoML since 2010, has co-authored a paper published in ACM Computing Surveys that introduces a seven-tier framework for assessing AutoML tools based on their degree of autonomy.

A system at level zero offers no automation and requires a data scientist to build models from the ground up manually, while a tool at level six represents complete automation, allowing non-experts to use it effectively and effortlessly. Most commercial systems currently operate somewhere in the middle range of this spectrum.

Veeramachaneni recently sat down with MIT News to discuss the present landscape of AutoML, the obstacles hindering truly autonomous machine learning systems, and what lies ahead for researchers in this evolving field.

Q: How has automated machine learning evolved over the past decade, and what is the current state of AutoML systems?

A: Around 2010, we began witnessing a transformation, with enterprises seeking to derive additional value from their data beyond traditional business intelligence applications. This naturally led to the question: could we automate certain aspects of developing machine learning-based solutions? The initial phase of AutoML focused on making our own work as data scientists more efficient. Could we eliminate the repetitive tasks we perform daily and automate them through software systems? This research trajectory continued until approximately 2015, when we recognized we still hadn't significantly accelerated the development process.

Subsequently, another perspective emerged. Numerous problems could potentially be solved with data, originating from experts who understand these challenges intimately and engage with them daily. These individuals typically have minimal exposure to machine learning or software engineering. How can we effectively involve them in the process? This represents the next frontier in automated machine learning technology development.

There are three critical areas where domain experts contribute significantly to a machine learning system. First, they help define the problem itself and assist in framing it as a prediction task that an ML model can solve. Second, they understand how the data was collected, so they possess intuitive knowledge about processing it appropriately. Third, machine learning models provide only a small component of a complete solution—they merely generate predictions. The output of an ML model serves as just one input to help a domain expert reach a final decision or action.

Q: What steps of the machine learning pipeline are the most difficult to automate, and why has automating them been so challenging?

A: The problem-formulation phase presents exceptional challenges for automation. For example, consider a researcher aiming to secure additional government funding who possesses extensive data about research proposal content and funding outcomes. Can machine learning provide assistance in this scenario? We don't yet have a definitive answer. During problem formulation, I leverage my domain expertise to translate the challenge into something more concrete to predict, which requires someone with deep domain knowledge. This individual also understands how to utilize the information post-prediction. This aspect of the process continues to resist automation.

One component of problem formulation that could potentially be automated involves examining the data and mathematically expressing the possible prediction tasks. We could then present these prediction tasks to the domain expert to determine whether any of them would address their larger problem. Once the prediction task is selected, the many intermediate steps (feature engineering, modeling, and so on) are highly mechanical processes that are relatively straightforward to automate.
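As a rough illustration of what "expressing possible prediction tasks automatically" could look like, the minimal sketch below enumerates candidate tasks from a table schema by pairing each column with an aggregation and a time window. It is not the method from Veeramachaneni's paper. The column names, the `OPERATIONS` mapping, and the window lengths are all hypothetical, loosely modeled on the research-funding example above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Column:
    name: str
    kind: str  # "numeric", "categorical", or "datetime"


@dataclass(frozen=True)
class PredictionTask:
    target: str
    operation: str
    window_days: int

    def describe(self) -> str:
        # Plain-language phrasing a domain expert could review.
        return (
            f"Predict the {self.operation} of '{self.target}' "
            f"over the next {self.window_days} days"
        )


# Hypothetical mapping from column kind to the aggregations worth predicting.
# Kinds without an entry (e.g. "datetime") produce no candidate tasks.
OPERATIONS = {
    "numeric": ("sum", "mean", "max"),
    "categorical": ("most common value",),
}


def enumerate_tasks(columns, windows=(7, 30)):
    """List every (column, operation, window) combination as a candidate task."""
    tasks = []
    for col in columns:
        for op, window in product(OPERATIONS.get(col.kind, ()), windows):
            tasks.append(PredictionTask(col.name, op, window))
    return tasks


# Hypothetical schema for a table of research proposals and funding outcomes.
columns = [
    Column("funding_amount", "numeric"),
    Column("outcome", "categorical"),
    Column("submitted_at", "datetime"),
]

tasks = enumerate_tasks(columns)
for task in tasks:
    print(task.describe())
```

The point of the sketch is the division of labor: the software generates the list mechanically, and the domain expert only has to judge which phrasing, if any, matches the decision they actually need to make.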

However, defining prediction tasks has traditionally required collaboration between data scientists and domain experts because, without domain knowledge, one cannot effectively translate the domain problem into a prediction task. Additionally, domain experts often lack understanding of what "prediction" entails in this context. This creates significant back-and-forth communication in the process. If we could automate this step, machine learning adoption and the use of data to generate meaningful predictions would increase dramatically.

Similarly, what happens after a machine learning model delivers a prediction? While we can automate the software and technological aspects, ultimately, root cause analysis, human intuition, and decision-making remain essential. We can enhance these capabilities with various tools, but complete automation of these elements remains elusive.

Q: What do you hope to achieve with the seven-tiered framework for evaluating AutoML systems that you outlined in your paper?

A: My primary aspiration is that people begin recognizing which levels of automation have already been accomplished and which still require attention. Within the research community, we tend to focus on areas where we feel comfortable. We've become accustomed to automating certain steps and consequently remain within those boundaries. Automating other aspects of machine learning solution development is critically important, and that's where the most significant bottlenecks persist.

My second hope is that researchers will develop a clearer understanding of what domain expertise truly means. Much AutoML research continues to be conducted in academic settings, where we often don't engage in applied work. There isn't a precise definition of what constitutes a domain expert, and the term "domain expert" itself remains quite ambiguous. What we mean by domain expert is someone with expertise in the specific problem you're attempting to solve with machine learning. I hope everyone can unify around this definition, as it would bring much-needed clarity to the field.

I still believe we're unable to build as many models for as many problems as we should, and even for those models we do create, the majority aren't being deployed in everyday applications. The output of machine learning will simply become another data point—an augmented piece of information—in someone's decision-making process. How individuals make decisions based on this input, how it will change their behavior, and how they'll adapt their working methods remain significant, open questions. Once we achieve full automation, these will be the next challenges to address.

We must determine what fundamentally needs to change in the daily workflow of a loan officer at a bank, or an educator deciding whether to modify assignments in an online course. How will these professionals incorporate machine learning outputs into their processes? We need to focus on the essential elements we must develop to make machine learning more practical and usable in real-world scenarios.
