ML4H 2021

Improving the Fairness of Deep Chest X-ray Classifiers

Haoran Zhang, Natalie Dullerud, Karsten Roth, Stephen Pfohl, Marzyeh Ghassemi

Abstract: Deep learning models have reached or surpassed human-level performance in medical imaging, especially in disease diagnosis from chest X-rays. However, prior work has found that such classifiers can be biased, showing gaps in predictive performance across protected groups. In this paper, we benchmark nine methods for improving the fairness of these classifiers. We use the minimax definition of fairness, which focuses on maximizing the performance of the worst-case group. Our experiments show that certain methods improve worst-case performance for selected metrics and protected attributes, but the magnitude of these gains is limited. Finally, we provide best practices for selecting fairness definitions in clinical settings.
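
To make the minimax criterion in the abstract concrete, here is a minimal Python sketch, not the authors' code. It assumes accuracy as the metric and a single protected attribute; the function names (`per_group_accuracy`, `worst_group`, `minimax_select`), the toy labels, and the candidate names "baseline" and "debiased" are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately for each protected group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(yt == yp)
    return {g: correct[g] / total[g] for g in total}


def worst_group(metric_by_group):
    """Return (group, value) for the group with the lowest metric value."""
    g = min(metric_by_group, key=metric_by_group.get)
    return g, metric_by_group[g]


def minimax_select(candidates, y_true, groups):
    """Pick the candidate whose worst-group accuracy is highest."""
    def worst_case_score(name):
        by_group = per_group_accuracy(y_true, candidates[name], groups)
        return worst_group(by_group)[1]
    return max(candidates, key=worst_case_score)


# Toy data (hypothetical): two groups, two candidate sets of predictions.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
candidates = {
    "baseline": [1, 0, 1, 0, 0, 1, 1, 0],  # perfect on A, half wrong on B
    "debiased": [1, 0, 1, 1, 1, 0, 1, 1],  # one error in each group
}

for name, y_pred in candidates.items():
    print(name, per_group_accuracy(y_true, y_pred, groups))
print("selected:", minimax_select(candidates, y_true, groups))
```

Both toy candidates have the same overall accuracy (6 of 8 correct), yet their worst-group accuracies differ (0.5 versus 0.75), so the minimax criterion prefers the second. Swapping in another per-group metric only requires replacing `per_group_accuracy`.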



© 2021 ML4H Organization Committee