How Should We View Biased Clinical Data in Medical Machine Learning? A Call for an Archaeological Perspective
Researchers from MIT, Johns Hopkins University, and the Alan Turing Institute argue that dealing with biased medical data in AI systems isn't as simple as the saying "garbage in, garbage out" suggests. AI models are increasingly used in the healthcare industry, and when the data behind them is biased, the usual fix is technical: collect more data from underrepresented groups or generate synthetic data to balance things out. The researchers contend that this technical framing is too narrow, and that historical and present-day social factors must be considered as well; only then can bias in public health be tackled effectively. Too often, they observe, data problems are treated as mere technical annoyances. Instead, they compare biased data to a cracked mirror: it reflects our past actions, distorted but still informative. Once we understand our history through that data, we can work toward addressing and improving our practices in the future.
In the paper titled "Considering Biased Data as Informative Artifacts in AI-Assisted Health Care," the three researchers argue that biased medical data should be seen the way archaeologists and anthropologists see artifacts: objects that reveal the practices, beliefs, and cultural values that have produced healthcare inequalities. For example, a widely used algorithm wrongly concluded that sicker Black patients needed the same level of care as healthier white patients, because it used healthcare costs as a proxy for health need and did not account for unequal access to care. Instead of simply fixing biased data or discarding it, the researchers recommend this "artifacts" approach: recognizing how social and historical factors shape both data collection and clinical AI development. Because computer scientists may not fully grasp the social and historical context behind the data they use, collaboration across disciplines is essential if AI models are to work well for all groups in healthcare.
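To illustrate the mechanism behind that example, here is a minimal, hypothetical sketch in Python (the data, group labels, and numbers are simulated for illustration and are not drawn from the study): when spending is used as a stand-in for health need, a group that faces access barriers incurs lower costs and so appears healthier than it is.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated illustration of label-choice bias: each patient has a true
# health need, but one group's access barriers mean less care is
# delivered, so observed cost understates need for that group.
n = 10_000
group = rng.integers(0, 2, n)                     # 0 = well-served, 1 = underserved
need = rng.gamma(shape=2.0, scale=1.0, size=n)    # true (unobserved) health need
access = np.where(group == 1, 0.6, 1.0)           # underserved group receives less care
cost = need * access + rng.normal(0.0, 0.1, n)    # observed healthcare spending

# A model trained to predict cost effectively ranks patients by cost,
# not by need. Select the top 10% "highest-risk" patients by cost:
threshold = np.quantile(cost, 0.9)
selected = cost >= threshold

for g in (0, 1):
    mask = group == g
    print(
        f"group {g}: selection rate = {selected[mask].mean():.1%}, "
        f"mean need among selected = {need[mask & selected].mean():.2f}"
    )
# The underserved group is selected far less often despite equal need,
# and its members must be sicker than the other group's to be selected.
```

Reading the data as an artifact means flagging the label choice itself (cost as a proxy for need), rather than only rebalancing the dataset after the fact.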
The researchers acknowledge a challenge in the artifact-based approach: figuring out whether data have been "race-corrected," that is, adjusted on the assumption that white male bodies are the standard of comparison. They cite a kidney-function measurement equation that was corrected on the assumption that Black people have higher muscle mass, which inflated estimates of kidney function for Black patients; researchers need to be prepared to investigate such corrections during their work (a concrete sketch of this equation follows below). In another paper, researchers found that including self-reported race as a feature in machine learning models can worsen outcomes for minority groups. Self-reported race is a social construct and does not always carry useful signal, so whether to include it should depend on the evidence available.
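To make the kidney example concrete, below is a sketch of the 2009 CKD-EPI creatinine equation, whose published race coefficient multiplied the estimated GFR by 1.159 for Black patients. The function name and sample inputs are ours; the 2021 revision of the equation removed the race term.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age: int, female: bool, black: bool) -> float:
    """2009 CKD-EPI creatinine equation, including its race coefficient.

    The equation multiplied estimated GFR by 1.159 for Black patients,
    on the assumption of higher average muscle mass -- exactly the kind
    of "race correction" the artifact-based approach asks researchers
    to surface and interrogate. (Removed in the 2021 revision.)
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (
        141
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age
    )
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # race coefficient: inflates eGFR, which can delay diagnosis
    return egfr

# Identical labs, different reported race -> different estimated kidney function:
print(egfr_ckd_epi_2009(1.2, 55, female=False, black=False))  # ~68
print(egfr_ckd_epi_2009(1.2, 55, female=False, black=True))   # ~78
```

Because a higher eGFR signals healthier kidneys, the multiplier could push a Black patient above a treatment or transplant-referral threshold that an otherwise identical white patient would fall below.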
Biased datasets should not be left as they are, but they can be valuable when treated as artifacts. The researchers point out that the National Institutes of Health (NIH) has emphasized ethical data collection. Understanding how bias arises in different contexts can help developers build better AI for specific populations, and may also inform new policies aimed at eliminating bias. Above all, the researchers stress addressing the healthcare harms that biased data cause today rather than fearing hypothetical AI problems in the future.
Bhoumik Mhatre is a third-year undergraduate student at IIT Kharagpur pursuing a B.Tech + M.Tech program in Mining Engineering with a minor in Economics. He is a data enthusiast, currently a research intern at the National University of Singapore, and a partner at Digiaxx Company. 'I am fascinated by the recent developments in the field of Data Science and would like to research them.'