A Swarm of Insight
By Beth W. Orenstein
Radiology Today
Vol. 20 No. 1 P. 12
When it comes to diagnosing pneumonia from chest X-rays, humans and AI are better together.
Chest X-rays are among the most widely performed imaging studies in the United States, and more than 1 million adults are hospitalized with pneumonia each year. Yet diagnosing pneumonia on chest X-rays can be challenging.
In 2017, researchers at Stanford University School of Medicine in California made headlines when they claimed their CheXNet system, a deep learning algorithm, could detect pneumonia from chest X-rays with greater accuracy than experienced radiologists.
Now, a small group of doctors at Stanford has shown that when radiologists work together in real time and combine their expertise with AI, they can diagnose pneumonia more accurately than either humans or AI alone. The researchers found that their group of eight radiologists, using technology called Swarm AI, had an average error rate more than 30% lower than that of individual radiologists. They also found that the Swarm AI technology was 22% more accurate than the software-only CheXNet system.
The researchers presented their findings at the 2018 Society for Imaging Informatics in Medicine Conference on Machine Intelligence in Medical Imaging, which was held in September in San Francisco.
Although it was a small study, two of the Stanford radiology professors who were involved, Matthew Lungren, MD, MPH, and Safwan Halabi, MD, are impressed with the results they've seen thus far. "The results of the study are very exciting because they point toward a future where doctors and AI algorithms can work together in real time, rather than human practitioners being replaced by automated algorithms," Lungren says.
Adds Halabi, "This new technology may enable us to generate more accurate data sets and increase the accuracy of all systems that use machine learning to train on medical data sets." Eventually, they agree, Swarm AI technology could routinely be used to diagnose difficult cases and therefore improve patient outcomes and lower medical costs.
AI for the People
Here's how the study came about: Scientist and engineer Louis Rosenberg, PhD, founded the technology firm Unanimous AI in San Francisco in 2014. His idea was to copy nature and use its "hive mind" to improve human decision-making. There's a reason why birds flock in groups, fish swim in schools, and bees swarm: It amplifies their collective intelligence, Rosenberg says.
"We've seen Swarm AI amplify intelligence across a whole bunch of different fields, from financial forecasting to business decision-making to market research and political forecasting," he says.
Named Best AI Technology and Best in Show at the South by Southwest Festival's prestigious Innovation Awards in March 2018 in Austin, Texas, Swarm AI also has had notable success predicting the outcomes of sporting events, including the top four finishers in the Kentucky Derby and the Cubs winning the 2016 World Series.
Rosenberg was familiar with two studies that showed AI was as good as, if not better than, humans in detecting diseases. One was published in the field of dermatology in the Annals of Oncology in May 2018. It showed that a convolutional neural network outperformed a majority of human dermatologists in diagnosing melanoma. The other study, by Google DeepMind, was in the field of ophthalmology. Published in Nature Medicine in August 2018, it showed that algorithms trained by machine learning were as good as humans in detecting eye conditions.
Familiar with Stanford's CheXNet technology, Rosenberg approached the radiology department about testing Swarm AI against CheXNet. "The idea of applying Swarm AI to radiology was a natural step," he says.
When talking to radiologists, Rosenberg had heard some concern that AI was rivaling their job performance and that it could someday replace them. "I went to them and said, 'It's too early for AI to replace humans. There's another way to use AI. Let's use AI to amplify the intelligence of radiologists and take the crown back for the people,'" he recalls.
A Perfect Starting Place
While Rosenberg sees many potential applications in medicine, radiology was the perfect place to start, he says, because it's standard practice for radiologists to read at workstations, making implementation simple. "They're already using a computer," he says. The difference with Swarm AI is that they're connected to other radiologists and making decisions together.
Unanimous AI chose chest X-rays because of CheXNet, "but the same exact process could be used for any type of medical imaging," Rosenberg says. In fact, a mammography-centric Swarm AI project is in its early stages at UCLA.
For their study, Lungren, Halabi, and six other radiologists agreed to review 50 chest X-rays in real time. After a few seconds of individual assessment, the eight worked together as a swarm with the goal of agreeing on the likelihood that each patient had pneumonia. Each radiologist worked from his or her own location.
"It was almost as if we were working on a Ouija board with magnets," Lungren says.
They could see each other annotating the X-rays. Their goal was first to converge on a coarse range of probabilities and then to converge on a refined value within that range. They used only five probability levels (0% to 20%, 20% to 40%, 40% to 60%, 60% to 80%, and 80% to 100%), and it took them approximately 90 seconds to evaluate each X-ray. Rosenberg believes one reason they were able to work so quickly was that they weren't speaking to each other; their only communication was seeing what the others were highlighting.
"We were coming to one consensus based off this group, not groupthink but a group team effort and annotation," Halabi says.
At a different time, the same 50 chest X-rays were run through the CheXNet software algorithm to generate a probability that each patient had pneumonia. The two sets of probabilities were scored against the ground truth and compared using a variety of statistical techniques. The researchers found that the Swarm AI system was 22% more accurate in binary classification than the software-only CheXNet system.
"We found that the hybrid human-machine system was able to outperform individual human doctors as well as the state of the art in deep learning–derived algorithms," Rosenberg says.
Were the radiologists reading the chest X-rays influenced by what they could see their colleagues annotating? Yes and no, Halabi says. "We couldn't see who the other people were who were moving their magnets around, but we saw where it was trending. If we knew one of our group was an expert in chest imaging, would we be swayed by what he or she was doing? Perhaps. But it was anonymous.
"There is definitely a psychology to this," Halabi continues, "but that's the whole premise behind the swarm technology. If it's a swarm of birds or fish, do they follow the leader or go independent and go rogue?"
Lungren agrees. "If you're one guy or gal pulling to one diagnosis and everyone else is pulling to another, that's a pretty clear-cut case where you're probably going to join that side," he says. "But then there are cases where people are split or people will eventually say, 'We lean toward this diagnosis,' and maybe they will compromise. That's what happens in a swarm."
Finding a Role
Even though their results were impressive and the process was not too time-consuming, Lungren and Halabi don't believe Swarm AI is practical for reading the hundreds of plain chest X-rays that some radiologists face in a day. "It would be too cumbersome for day-in and day-out clinical practice," Halabi says.
"This is clearly meant for certain situations," Lungren adds.
Rosenberg sees a role for Swarm AI in ground truth data sets that then could be used to train AI systems in radiology. "It could be used to generate more accurate data sets and increase the accuracy of all systems that use machine learning to train on medical data sets," he says.
Another area where Swarm AI might play a significant role is for second opinions and legal cases, Halabi says. "You could use it to determine if something was missed or if you needed a second opinion," he says. "You could have a group look at an image and determine whether or not there is a consensus about what's present on the images."
It is highly likely that Unanimous AI's Swarm system excels in certain types of cases, while software-only machine learning systems or individual radiologists excel in others. The next step is more research that allows radiologists to identify those differences and determine which method is best suited to which cases. "Then each method could be applied to those cases where they are the most appropriate. If we are able to identify the cases that are best to go to a hive mind, then we should see even better performance with it," Rosenberg says, adding that it would be ideal if radiologists could triage cases so that the difficult ones went to a hive mind.
While Swarm AI isn't practical for daily clinical use, Rosenberg believes it could reduce the false-positive rate on difficult cases and thus help save patients from unnecessary worry and the cost of unnecessary procedures. "If you are able to reduce false-positive rates by a substantial percentage, it will be very cost-effective in terms of potential savings," he says.
Lungren shares additional thoughts on why Swarm AI makes sense: "One, if you're freeing up time by using AI to triage easy cases, you would have excess capacity to deal with difficult cases; and two, maybe you have better predictive power when you do it this way and that may lead to better diagnostics and therapy."
The radiology researchers are working on a follow-up study. The first project used public data sets from the National Institutes of Health; the second will draw on Stanford's own internal data.
"It's essentially the same experiment that was presented in the abstract," Lungren says, "but it will have a more robust ground truth and 100 chest X-rays. We will still be looking at clinically proven pneumonia cases, but we'll see which performs best in what we know is the ground truth—individual radiologists, machine learning algorithms, or radiologists in a swarm."
Lungren expects it will take only about a month for the team to go through the 100 chest X-rays, but it will take another four or five months until the results are ready for publication.
— Beth W. Orenstein of Northampton, Pennsylvania, is a freelance medical writer and regular contributor to Radiology Today.