This week, I happened to read two starkly contrasting news stories about machine learning.
1. Data Science leveraged to stop human trafficking:
The “Science Against Slavery” Hackathon was an all-day event aimed at sharing ideas and creating science-based solutions to the problem of human trafficking. Data scientists, students, and hackers homed in on data that district attorneys would otherwise never find. Many focused on automating processes so that agencies could use the technology with little guidance. Some focused primarily on generating data that could lead to a conviction, which is much easier said than done. One effort, from EPIK Project founder Tom Perez, involved creating fake listings; the team could then gather information on respondents, including real-world coordinates. Other plans compared photos mined from escort ads and sites against those from missing-person reports. Web crawling could eventually enable geocoding of phone numbers, analysis of the distribution of buyers and sellers, and social network analysis.
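As a rough illustration of that last idea, here is a minimal sketch, not anything built at the hackathon itself: the listing data, the regex, and the field names are all my own assumptions. It shows how scraped ads could be linked by shared phone numbers, which is the starting point for the kind of network analysis the article mentions.

```python
import re
from collections import defaultdict

# Hypothetical scraped listings; in practice these would come from a web crawler.
listings = [
    {"id": "ad-001", "text": "Call 555-123-4567 for details", "city": "Boston"},
    {"id": "ad-002", "text": "Available now! 555.123.4567", "city": "Providence"},
    {"id": "ad-003", "text": "New in town, txt 555 987 6543", "city": "Boston"},
]

# Very loose US-style phone pattern: 3-3-4 digits with optional separators.
PHONE_RE = re.compile(r"\b(\d{3})[\s.-]?(\d{3})[\s.-]?(\d{4})\b")

def extract_phones(text):
    """Return phone numbers found in the text, normalized to digits only."""
    return {"".join(groups) for groups in PHONE_RE.findall(text)}

# Group listings by shared phone number: ads posted with the same contact
# number were likely placed by the same actor, even across cities.
ads_by_phone = defaultdict(list)
for ad in listings:
    for phone in extract_phones(ad["text"]):
        ads_by_phone[phone].append(ad["id"])

for phone, ads in ads_by_phone.items():
    if len(ads) > 1:
        print(f"{phone}: {ads}")  # e.g. 5551234567: ['ad-001', 'ad-002']
```

Shared phone numbers (or shared photos) become edges in a graph, and that graph is where the social network analysis would begin.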
2. Machine learning algorithm used to predict future criminals is biased against blacks:
We obtained the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 and checked to see how many were charged with new crimes over the next two years, the same benchmark used by the creators of the algorithm.
The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.
When a full range of crimes were taken into account — including misdemeanors such as driving with an expired license — the algorithm was somewhat more accurate than a coin flip. Of those deemed likely to re-offend, 61 percent were arrested for any subsequent crimes within two years.
We also turned up significant racial disparities, just as former U.S. Attorney General Eric Holder had feared. In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.
- The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
- White defendants were mislabeled as low risk more often than black defendants.
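To make the two error modes concrete, here is a minimal sketch of the metrics behind these claims. The labels and predictions below are invented purely for illustration; this is not ProPublica's data or code. Precision answers "of those flagged high risk, how many actually reoffended?" (the 20 percent and 61 percent figures quoted above are that kind of metric), while the per-group false positive and false negative rates correspond to the two bullets.

```python
# Hypothetical records: 1 = reoffended / flagged high risk, 0 = otherwise.
# The numbers are made up purely to show how the metrics are computed.
records = [
    # (group, actually_reoffended, predicted_high_risk)
    ("black", 0, 1), ("black", 1, 1), ("black", 0, 1), ("black", 1, 0),
    ("white", 0, 0), ("white", 1, 0), ("white", 0, 0), ("white", 1, 1),
]

def rates(rows):
    """Return (precision, false positive rate, false negative rate)."""
    tp = sum(1 for _, y, p in rows if y == 1 and p == 1)
    fp = sum(1 for _, y, p in rows if y == 0 and p == 1)
    fn = sum(1 for _, y, p in rows if y == 1 and p == 0)
    tn = sum(1 for _, y, p in rows if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else float("nan")  # "of those flagged, how many reoffended"
    fpr = fp / (fp + tn) if fp + tn else float("nan")        # "falsely flagged as future criminals"
    fnr = fn / (fn + tp) if fn + tp else float("nan")        # "mislabeled as low risk"
    return precision, fpr, fnr

print("overall:", rates(records))
for group in ("black", "white"):
    print(group, rates([r for r in records if r[0] == group]))
```

The point of the two bullets is that the false positive and false negative rates can differ sharply by group even when the overall error rate looks roughly the same for both, which is exactly the pattern ProPublica reported.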