Over the last decade, risk assessment tools have proliferated, driven mostly by attempts to achieve “evidence based sentencing” or more rational pretrial release decisions. More recently, notable skeptics have emerged, among them former Attorney General Eric Holder. But even if you don’t go quite as far as the skeptics do, there are judges who employ the instruments with such rigidity that they become a substitute for judgment rather than an aid to it.
As reported by Professor Doug Berman on his Sentencing Law & Policy blog, a research article in the latest issue of Science Advances offers a notable new perspective on the debate over risk assessment instruments. The article, by computer scientists Julia Dressel and Hany Farid, is titled “The accuracy, fairness, and limits of predicting recidivism.”
Here are parts of its introduction:
In the criminal justice system, predictive algorithms have been used to predict where crimes will most likely occur, who is most likely to commit a violent crime, who is likely to fail to appear at their court hearing, and who is likely to reoffend at some point in the future.
One widely used criminal risk assessment tool, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS; Northpointe, which rebranded itself to “equivant” in January 2017), has been used to assess more than 1 million offenders since it was developed in 1998. The recidivism prediction component of COMPAS — the recidivism risk scale — has been in use since 2000. This software predicts a defendant’s risk of committing a misdemeanor or felony within 2 years of assessment from 137 features about an individual and the individual’s past criminal record.
Although the data used by COMPAS do not include an individual’s race, other aspects of the data may be correlated to race that can lead to racial disparities in the predictions. In May 2016, writing for ProPublica, Angwin et al. analyzed the efficacy of COMPAS on more than 7000 individuals arrested in Broward County, Florida between 2013 and 2014. This analysis indicated that the predictions were unreliable and racially biased. COMPAS’s overall accuracy for white defendants is 67.0%, only slightly higher than its accuracy of 63.8% for black defendants. The mistakes made by COMPAS, however, affected black and white defendants differently: Black defendants who did not recidivate were incorrectly predicted to reoffend at a rate of 44.9%, nearly twice as high as their white counterparts at 23.5%; and white defendants who did recidivate were incorrectly predicted to not reoffend at a rate of 47.7%, nearly twice as high as their black counterparts at 28.0%. In other words, COMPAS scores appeared to favor white defendants over black defendants by under predicting recidivism for white and over predicting recidivism for black defendants….
While the debate over algorithmic fairness continues, we consider the more fundamental question of whether these algorithms are any better than untrained humans at predicting recidivism in a fair and accurate way. We describe the results of a study that shows that people from a popular online crowdsourcing marketplace — who, it can reasonably be assumed, have little to no expertise in criminal justice — are as accurate and fair as COMPAS at predicting recidivism. In addition, although Northpointe has not revealed the inner workings of their recidivism prediction algorithm, we show that the accuracy of COMPAS on one data set can be explained with a simple linear classifier. We also show that although COMPAS uses 137 features to make a prediction, the same predictive accuracy can be achieved with only two features. We further show that more sophisticated classifiers do not improve prediction accuracy or fairness. Collectively, these results cast significant doubt on the entire effort of algorithmic recidivism prediction.
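The distinction the excerpt draws between overall accuracy and the two kinds of mistakes (predicting reoffense for someone who does not reoffend, and predicting no reoffense for someone who does) can be made concrete with a short sketch. The function name `group_error_rates`, the group labels "A" and "B", and the toy records below are all hypothetical and invented for illustration; they are not the Broward County data and do not reproduce any figure from the article. The sketch only shows how two groups can have the same accuracy while the errors fall very differently on each.

```python
from collections import Counter


def group_error_rates(records):
    """Compute accuracy, false positive rate, and false negative rate per group.

    `records` is an iterable of (group, predicted_reoffend, did_reoffend)
    tuples. This layout is an assumption made for this sketch.
    """
    counts = {}
    for group, predicted, actual in records:
        c = counts.setdefault(group, Counter())
        if predicted and actual:
            c["tp"] += 1  # predicted to reoffend, did reoffend
        elif predicted and not actual:
            c["fp"] += 1  # predicted to reoffend, did not
        elif not predicted and actual:
            c["fn"] += 1  # predicted not to reoffend, did
        else:
            c["tn"] += 1  # predicted not to reoffend, did not

    rates = {}
    for group, c in counts.items():
        total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            # Share of non-reoffenders wrongly flagged as likely to reoffend.
            "false_positive_rate": (
                c["fp"] / did_not_reoffend if did_not_reoffend else float("nan")
            ),
            # Share of reoffenders wrongly predicted not to reoffend.
            "false_negative_rate": (
                c["fn"] / did_reoffend if did_reoffend else float("nan")
            ),
        }
    return rates


# Hypothetical toy data: ten made-up records per group.
toy_records = (
    [("A", True, True)] * 3
    + [("A", False, False)] * 3
    + [("A", True, False)] * 3
    + [("A", False, True)] * 1
    + [("B", True, True)] * 2
    + [("B", False, False)] * 4
    + [("B", True, False)] * 1
    + [("B", False, True)] * 3
)

if __name__ == "__main__":
    for group, r in sorted(group_error_rates(toy_records).items()):
        print(group, r)
```

In this made-up example both groups are classified correctly 60% of the time, yet group A's non-reoffenders are wrongly flagged far more often than group B's, while group B's reoffenders are missed far more often. That is the same shape of disparity the excerpt describes, shown here with invented numbers only.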