Bias in Machine Learning: The COMPAS Example

Authors

  • Ana Farič, University of Ljubljana
  • Ivan Bratko, University of Ljubljana

Abstract

Introduction

The 2016 article "There's software used across the country to predict future criminals. And it's biased against blacks" [1] sparked a debate about the use of COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) in the U.S. justice system. As one of the most high-profile cases of alleged bias, it remains a focus of ongoing research. Yet the question of bias remains unresolved, largely because of differing definitions of bias and the use of inconsistent evaluation metrics. We aim to address this unresolved question and to illustrate the challenges posed by the lack of a standardized definition of bias.

Methods

We conduct a literature review on bias in machine learning (ML), covering key definitions, sources, and approaches to its detection and measurement. We analyze how fairness is conceptualized across studies and how these concepts apply to COMPAS.

In the empirical section, we use the ProPublica dataset [1] and the Orange data mining toolkit to build and evaluate models. Model selection was informed by methodological and practical considerations. Logistic regression has been widely used in previous studies, allowing for comparability and replication. Decision trees offer a high degree of interpretability and can be visualized effectively. Neural networks are included because of their increasing relevance in ML practice.
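The following is a minimal sketch of the three-model comparison. It uses scikit-learn as a stand-in for the Orange workflow, assumes a numeric feature matrix X and binary two-year recidivism labels y prepared as in the data-loading sketch in the next paragraph, and uses illustrative hyperparameters rather than the settings of our actual experiments.

    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    # The three model families compared in the study (hyperparameters are illustrative).
    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(max_depth=5),
        "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000),
    }

    # X, y: prepared feature matrix and two-year recidivism labels (see the data sketch below).
    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
        print(f"{name}: mean 10-fold AUC = {auc:.3f}")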

The ProPublica dataset contains information on over 7000 individuals arrested in Broward County, Florida, between 2013 and 2014. It includes demographic variables, criminal history, COMPAS risk scores, and recidivism outcomes within a two-year follow-up period. 
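For reference, a sketch of how the public dataset can be loaded and filtered with pandas. The file name and column names follow the ProPublica release, the filtering rules mirror those described in their analysis, and the attribute subset shown here is illustrative rather than our final feature set.

    import pandas as pd

    # compas-scores-two-years.csv is the file distributed with the ProPublica analysis.
    df = pd.read_csv("compas-scores-two-years.csv")

    # Standard ProPublica filters: keep cases where the COMPAS screening falls within
    # 30 days of the arrest, the recidivism flag is known, and the charge is not an
    # ordinary traffic offence.
    df = df[
        df["days_b_screening_arrest"].between(-30, 30)
        & (df["is_recid"] != -1)
        & (df["c_charge_degree"] != "O")
        & (df["score_text"] != "N/A")
    ]

    # Example attributes; the target is recidivism within the two-year follow-up period.
    X = pd.get_dummies(
        df[["age", "sex", "race", "priors_count", "c_charge_degree"]],
        drop_first=True,  # one-hot encode categorical attributes
    )
    y = df["two_year_recid"]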

We assess performance using metrics such as the area under the ROC curve (AUC), false positive rate, and false negative rate, and compare our findings to prior studies.
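The group-wise error rates at the centre of the COMPAS debate can be computed along these lines. This is a sketch only: group_rates is a hypothetical helper, and y_true, y_pred, y_score, and groups are assumed NumPy arrays of held-out labels, thresholded predictions, model scores, and group membership (e.g. the race column).

    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    def group_rates(y_true, y_pred, y_score, groups):
        """Report AUC, false positive rate, and false negative rate per group."""
        for g in np.unique(groups):
            mask = groups == g
            tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask]).ravel()
            fpr = fp / (fp + tn)  # share of non-recidivists predicted high risk
            fnr = fn / (fn + tp)  # share of recidivists predicted low risk
            auc = roc_auc_score(y_true[mask], y_score[mask])
            print(f"{g}: AUC={auc:.3f}  FPR={fpr:.3f}  FNR={fnr:.3f}")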

Limitations

Bias and fairness are inherently subjective and context-dependent concepts, making universal definitions elusive. This limits how far our conclusions can be generalized beyond the specific definitions and metrics we examine.

Another issue is the ambiguity of the dataset: although it is publicly available, many variables are poorly documented, and prior studies often lack clarity on data filtering and attribute selection.

Expected Results

Literature suggests multiple valid but context-dependent definitions of bias. For instance, Hellstrom et al. [2] discuss inductive bias as a necessary aspect of the learning process, while Alelyani [3] focuses on data-driven biases.

So far, we have prepared and explored the dataset; the modeling analysis is currently in progress. Preliminary results align with some of the existing research.

Conclusion

Initial insights support the idea that different ML models yield similar biases regardless of their complexity, underscoring the importance of data quality and feature selection. Moreover, whether COMPAS is judged as biased depends heavily on which definition is applied. This reinforces the need for clear, context-appropriate bias criteria when evaluating ML systems in high-stakes domains like criminal justice.
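To make the dependence on the definition concrete, consider a binary predictor Ŷ of recidivism Y for two groups a and b of a protected attribute A. Two widely used fairness criteria, stated here in standard notation as an illustration rather than as the criteria adopted in our study, are:

    % Predictive parity: equal positive predictive value across groups
    P(Y = 1 \mid \hat{Y} = 1, A = a) = P(Y = 1 \mid \hat{Y} = 1, A = b)

    % Error-rate balance: equal false positive and false negative rates
    P(\hat{Y} = 1 \mid Y = 0, A = a) = P(\hat{Y} = 1 \mid Y = 0, A = b), \qquad
    P(\hat{Y} = 0 \mid Y = 1, A = a) = P(\hat{Y} = 0 \mid Y = 1, A = b)

A well-known result in the fairness literature shows that when the underlying recidivism rates differ between groups, an imperfect predictor cannot satisfy both criteria at once, so the same risk scores can appear fair under one criterion and biased under the other.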

References

[1] J. Angwin et al., “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks,” ProPublica, 2016.

[2] T. Hellstrom, V. Dignum, and S. Bensch, “Bias in Machine Learning - What is it Good For?,” 2020. [Online]. Available: https://arxiv.org/abs/2004.00686.

[3] S. Alelyani, “Detection and Evaluation of Machine Learning Bias,” Applied Sciences, vol. 11, no. 14, 2021.

Published

2025-06-10