The very thought that computers may be subject to some of the same biases and prejudices as humans can be frightening, but the unsettling reality is something we as a society need to confront before the problem proliferates. We have entered an era where things like big data, machine learning, and adaptive algorithms are becoming commonplace among businesses, learning institutions, healthcare providers, and more. In many cases, these learning algorithms and the use of data are meant to overcome some of the shortcomings of human thoughts and actions, eliminating the biases that creep into our minds without our even realizing it. Sadly, many of the same prejudices could be copied over into how computers solve problems. Even worse, those biases may actually be magnified, and having the best of intentions doesn’t prevent the problems from spreading.
On the surface, using big data and machine learning would seem like an excellent way to get rid of biases. After all, if a computer is only dealing with hard data, how could bias even factor into the equation? Unfortunately, the use of big data doesn’t always lead to impartial outcomes. In machine learning, algorithms learn to solve specific problems or produce novel solutions from the data they are given. While the process is automated, the algorithms are still written and trained by human programmers, and unless they are specifically monitored for bias, they may learn the wrong lessons, contrary to the intentions of their creators.
Let’s take a look at some examples of how this can happen. One area where people want to apply learning algorithms is what is often called predictive policing: using machine learning to decide where best to allocate police resources in an effort to reduce crime. In theory, police would deploy officers and other resources where the data indicates, not based on racial profiling. While the goal is admirable, the algorithms are only as good as the data they are trained on. If arrest records reflect a history of police disproportionately targeting a certain race, the algorithm will tell police to keep focusing on neighborhoods where that race is prevalent, which generates more arrests there and reinforces the pattern. Think of it as a form of confirmation bias, made worse because police now believe they’re focusing their attention on the right spots because a computer told them to.
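To make the feedback loop concrete, here is a minimal, purely hypothetical sketch: two neighborhoods with identical true crime rates, but a skewed starting patrol allocation. Because arrests are only recorded where officers patrol, and the next allocation is based on past arrests, the skew never corrects itself. All numbers and names here are illustrative assumptions, not real data.

```python
# Hypothetical sketch of a predictive-policing feedback loop.
# Two neighborhoods, "A" and "B", have the SAME true crime rate,
# but patrols start out skewed toward A.
TRUE_CRIME_RATE = {"A": 0.10, "B": 0.10}
patrol_share = {"A": 0.70, "B": 0.30}  # biased initial allocation

arrests = {"A": 0, "B": 0}
for _ in range(20):  # 20 allocation cycles
    for hood, share in patrol_share.items():
        # Arrests scale with patrol presence, not just underlying crime:
        # officers can only make arrests where they are deployed.
        arrests[hood] += int(1000 * share * TRUE_CRIME_RATE[hood])
    total = arrests["A"] + arrests["B"]
    # Naive "predictive" step: allocate next patrols by arrest counts.
    patrol_share = {h: arrests[h] / total for h in arrests}

# Despite equal crime rates, the allocation stays skewed toward A,
# because the data confirms the bias it was collected under.
print(patrol_share)
```

The point of the sketch is that the model is never “wrong” by its own lights: the arrest data really does show more arrests in neighborhood A, precisely because that is where the officers were sent.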
Another example comes from a British university that adopted a computer model to help with its admissions process. The program based its selection criteria on historical patterns of which candidates had been accepted and which rejected, allowing the school to filter out, in the first round of admissions, candidates judged to have little chance of acceptance. Again the goal was admirable, yet the results proved troubling: trained on that historical data, the algorithm showed significant bias against female candidates and those with non-European names. While the problem was discovered and addressed, the fact that a machine learning algorithm reached those conclusions should raise more than a few eyebrows.
There are many other examples that could be cited, but the news isn’t all doom and gloom. Many experts and groups are raising awareness of the problem before it becomes more widespread, and since the bias is accidental rather than intentional, there’s hope that steps can be taken to keep more of it from creeping into machine learning. First, it’s possible to test algorithms for bias, for example by comparing outcomes across demographic groups. Second, many are pushing for more engineers and programmers to become involved in these policy debates, since they are the ones who best understand the problem and how to solve it. Third, there’s a growing call for “algorithmic transparency”, which opens the underlying algorithmic mechanisms to review and scrutiny, ensuring that bias is kept out as much as possible.
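As a sketch of what testing for bias can look like, the snippet below computes one common fairness check, the gap in positive-outcome rates between groups (often called demographic parity). The function name and the sample data are illustrative assumptions; real audits use richer metrics, but the idea is the same: compare how the model treats each group.

```python
def demographic_parity_gap(predictions, groups):
    """Return the largest difference in positive-outcome rates
    between any two groups (0.0 means perfectly equal rates)."""
    counts = {}  # group -> (positives, total)
    for pred, group in zip(predictions, groups):
        pos, total = counts.get(group, (0, 0))
        counts[group] = (pos + (1 if pred else 0), total + 1)
    rates = [pos / total for pos, total in counts.values()]
    return max(rates) - min(rates)

# Illustrative data: the model accepts 3 of 4 applicants from group "x"
# but only 1 of 4 from group "y".
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]
print(demographic_parity_gap(preds, groups))  # 0.5
```

A gap near zero doesn’t prove a model is fair, but a large gap is a clear signal that the outcomes deserve scrutiny before the system is deployed.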
Perhaps it shouldn’t be surprising that computers can stereotype unwittingly: the major push is to develop artificial intelligence, which is meant to mimic the human mind in the first place. Bias and prejudice should therefore be expected, but also controlled for. As more organizations adopt big data tools like Apache Spark along with machine learning techniques, they’ll need to take care that the patterns they discover reflect reality rather than unintentional biases.