Understanding why a machine learning model makes certain decisions is often just as important as whether those decisions are correct. For example, a machine learning model might correctly predict that a skin lesion is cancerous, but it could have done so based on an unrelated blip in the clinical photo.
While tools exist to help experts make sense of a model’s reasoning, these methods often provide insight into only one decision at a time, and each must be evaluated manually. Models are commonly trained on millions of data inputs, making it almost impossible for a human to evaluate enough decisions to identify patterns.
Now researchers at MIT and IBM Research have developed a method that allows a user to aggregate, sort, and rank these individual explanations to rapidly analyze a machine learning model’s behavior. Their technique, called Shared Interest, incorporates quantifiable metrics that compare how well a model’s reasoning matches that of a human.
Shared Interest could help a user easily uncover concerning trends in a model’s decision-making; for example, perhaps the model often gets confused by distracting, irrelevant features, like background objects in photos. Aggregating these insights could help the user quickly and quantitatively determine whether a model is trustworthy and ready to be used in a real-world situation.
“In developing Shared Interest, our goal is to extend this analysis process so that you can understand on a more global scale how your model is behaving,” says lead author Angie Boggust, a graduate student in the Visualization Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Boggust co-authored the paper with her advisor Arvind Satyanarayan, an assistant professor of computer science who leads the Visualization Group, and Benjamin Hoover and senior author Hendrik Strobelt, both from IBM Research. The paper will be presented at the Conference on Human Factors in Computing Systems.
Boggust began work on this project during a summer internship at IBM under the supervision of Strobelt. Upon returning to MIT, Boggust and Satyanarayan expanded the project and continued to collaborate with Strobelt and Hoover, who helped provide the case studies showing how the technique could be used in practice.
Human-AI Alignment
Shared Interest leverages popular techniques that show how a machine learning model made a particular decision, known as saliency methods. When a model classifies images, saliency methods highlight the areas of an image that were important to the model when it made its decision. These areas are visualized as a type of heat map, called a saliency map, that is often overlaid on the original image. If the model classified the image as a dog and the dog’s head is highlighted, it means those pixels were important to the model when it decided the image contains a dog.
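As a rough illustration, one common saliency method computes this heat map from the classifier’s gradients. The sketch below is a minimal, hedged example assuming a PyTorch image classifier; the tiny untrained network and random input image are placeholders for the trained model and clinical photo discussed in the article, not the models or data used in the study.

```python
import torch
import torch.nn as nn

# Minimal sketch of a gradient-based saliency map (one common saliency method).
# The stand-in classifier and random image below are illustrative placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),                    # e.g., "benign" vs. "cancerous"
)
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in image batch
scores = model(image)                                    # class scores
scores[0, scores.argmax()].backward()                    # gradient of the top class w.r.t. pixels

# Pixel importance = largest absolute gradient across the RGB channels; this
# 224x224 map is what gets overlaid on the original image as a heat map.
saliency = image.grad.abs().max(dim=1).values.squeeze()
print(saliency.shape)   # torch.Size([224, 224])
```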
Shared Interest works by comparing saliency methods with ground-truth data. In an image dataset, ground-truth data is typically a human-generated annotation that surrounds the relevant parts of each image. In the previous example, the box would surround the entire dog in the photo. When evaluating an image classification model, Shared Interest compares the model-generated saliency data and the human-generated ground-truth data for the same image to see how well they align.
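The article does not spell out the exact metrics, but the comparison comes down to measuring overlap between a binarized saliency mask and the human-annotated region. The sketch below uses IoU-style overlap scores as an illustration; the score names and the example masks are assumptions made for this sketch, not definitions taken from the paper.

```python
import numpy as np

def alignment_scores(saliency_mask: np.ndarray, truth_mask: np.ndarray) -> dict:
    """Illustrative overlap scores between a saliency mask and a ground-truth mask,
    both H x W boolean arrays."""
    intersection = np.logical_and(saliency_mask, truth_mask).sum()
    union = np.logical_or(saliency_mask, truth_mask).sum()
    return {
        "iou": intersection / max(union, 1),                              # overall agreement
        "truth_coverage": intersection / max(truth_mask.sum(), 1),        # how much of the human box is salient
        "saliency_coverage": intersection / max(saliency_mask.sum(), 1),  # how much saliency falls inside the box
    }

# Example: saliency highlights only the dog's head, while the annotation boxes the whole dog.
truth = np.zeros((224, 224), dtype=bool); truth[60:200, 40:180] = True
sal = np.zeros((224, 224), dtype=bool);   sal[60:110, 80:140] = True
print(alignment_scores(sal, truth))
```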
The technique uses several metrics to quantify that alignment (or misalignment) and then sorts a given decision into one of eight categories. The categories range from fully human-aligned (the model makes a correct prediction and the highlighted area in the saliency map is identical to the human-generated box) to completely distracted (the model makes an incorrect prediction and relies on no image features found within the human-generated box).
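Continuing the sketch above, one way to picture the sorting step is a simple rule that combines prediction correctness with the overlap scores. The thresholds and the three coarse buckets below are illustrative only; the paper defines a finer eight-category taxonomy that this article does not enumerate.

```python
def coarse_category(correct: bool, scores: dict, lo: float = 0.1, hi: float = 0.9) -> str:
    """Illustrative bucketing of a single decision; not the paper's exact taxonomy."""
    if correct and scores["iou"] >= hi:
        return "human_aligned"   # right answer, saliency nearly matches the human annotation
    if not correct and scores["truth_coverage"] <= lo:
        return "distracted"      # wrong answer, saliency ignores the annotated region
    return "in_between"          # Shared Interest's full taxonomy has eight cases

# Example usage with made-up scores:
scores = {"iou": 0.95, "truth_coverage": 0.97, "saliency_coverage": 0.98}
print(coarse_category(correct=True, scores=scores))   # -> "human_aligned"
```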
“At one end of the spectrum, your model made the decision for exactly the same reason as a human, and at the other end of the spectrum, your model and the human are making that decision for vastly different reasons. By quantifying this for all images in your dataset, you can use that quantification to sort them,” explains Boggust.
The technique works similarly with text-based data, where keywords are highlighted instead of image regions.
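For text, a minimal version of the same comparison can be done over token sets rather than pixel regions; the keywords and rationale below are made-up examples, not data from the paper.

```python
# Model-highlighted keywords versus a human-annotated rationale (illustrative only).
model_keywords = {"terrible", "boring"}
human_rationale = {"terrible", "boring", "waste"}

overlap = model_keywords & human_rationale
iou = len(overlap) / len(model_keywords | human_rationale)   # 2/3 in this example
print(f"token IoU = {iou:.2f}")
```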
Fast analysis
The researchers used three case studies to show how Shared Interest could be useful to both laypersons and machine learning researchers.
In the first case study, they used Shared Interest to help a dermatologist determine whether to trust a machine learning model designed to diagnose cancer from photos of skin lesions. Shared Interest enabled the dermatologist to quickly see examples of the model’s correct and incorrect predictions. Ultimately, the dermatologist decided he could not trust the model because it made too many predictions based on image artifacts rather than actual lesions.
“The value here is that using Shared Interest, we can see these patterns emerge in our model’s behavior. In about half an hour, the dermatologist could make a confident decision about whether or not to trust the model and whether or not to use it,” says Boggust.
In the second case study, they worked with a machine learning researcher to show how Shared Interest can evaluate a particular saliency method by revealing previously unknown pitfalls in the model. Their technique enabled the researcher to analyze thousands of correct and incorrect decisions in a fraction of the time required by typical manual methods.
In the third case study, they used Shared Interest to dig deeper into a specific image classification example. By manipulating the ground-truth region of the image, they were able to perform a what-if analysis to see which image features were most important for particular predictions.
The researchers were impressed by how well Shared Interest performed in these case studies, but Boggust cautions that the technique is only as good as the saliency methods it is built upon. If those techniques are biased or inaccurate, Shared Interest will inherit those limitations.
Going forward, the researchers plan to apply Shared Interest to different types of data, particularly tabular data used in medical records. They also want to use Shared Interest to help improve current saliency techniques. Boggust hopes this research inspires further work that seeks to quantify machine learning model behavior in ways that make sense to humans.
This work is funded in part by the MIT-IBM Watson AI Lab, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator.