ML4H 2021

Understanding the impact of class imbalance on the performance of chest x-ray image classifiers

Candelaria Mosquera, Luciana Ferrer, Diego H. Milone, Daniel Luna, Enzo Ferrante

Abstract: This work aims to understand the impact of class imbalance on the performance of chest x-ray classifiers, in light of the standard evaluation practices adopted by researchers in terms of discrimination and calibration performance. First, we conduct a literature study to analyze common scientific practices and confirm that: (1) even when dealing with highly imbalanced datasets, the community tends to use metrics that are dominated by the majority class; and (2) it is still uncommon to include calibration studies for chest x-ray classifiers, despite their importance in the context of healthcare. Second, we perform a systematic experiment on two major chest x-ray datasets to explore the behavior of several performance metrics under different class ratios, and show that widely adopted metrics can conceal the performance on the minority class. Finally, we propose the adoption of two alternative metrics, the precision-recall curve and the Balanced Brier score, which better reflect the performance of the system in such scenarios. Our results indicate that current evaluation practices adopted by the research community for chest x-ray classifiers may not reflect the performance of such systems for computer-aided diagnosis in real clinical scenarios, and we suggest alternatives to improve this situation.
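To make the contrast between a majority-dominated metric and a class-balanced one concrete, the following is a minimal sketch in Python. The abstract does not give the formula for the Balanced Brier score, so the definition used here (the unweighted mean of the Brier score computed separately on positive and negative cases) is an assumption made for illustration, and the function names `brier_score` and `balanced_brier_score` are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def brier_score(y_true, y_prob):
    """Standard Brier score: mean squared error between the predicted
    probability of the positive class and the binary label."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def balanced_brier_score(y_true, y_prob):
    """ASSUMED definition (not confirmed by the abstract): the unweighted
    mean of the Brier score on positives and the Brier score on negatives,
    so that each class contributes equally regardless of prevalence."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if y_true.all() or not y_true.any():
        raise ValueError("both classes must be present")
    bs_pos = np.mean((1.0 - y_prob[y_true]) ** 2)  # error on positive cases
    bs_neg = np.mean(y_prob[~y_true] ** 2)         # error on negative cases
    return float(0.5 * (bs_pos + bs_neg))


if __name__ == "__main__":
    # Toy data: 5% prevalence, and a model that always outputs 0.05,
    # i.e. it carries no information about which cases are positive.
    labels = np.array([1] * 5 + [0] * 95)
    probs = np.full(100, 0.05)
    print("Brier score:         ", brier_score(labels, probs))
    print("Balanced Brier score:", balanced_brier_score(labels, probs))
```

On this toy data the standard Brier score is small because the 95 negative cases dominate the average, while the class-balanced variant stays large because the five positive cases are all scored poorly. This is only an illustration of the effect described in the abstract under the stated assumption, not a reproduction of the paper's experiments.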



© 2021 ML4H Organization Committee