Algorithm Dysfunction

In her TED talk “How we're teaching computers to understand pictures” (Li, 2015), Fei-Fei Li demonstrates how an image recognition algorithm from Stanford's Vision Lab recognises and labels what it perceives in an image. She shows the algorithm working flawlessly: predicting multiple objects within images, generalising when it is not entirely sure what an object is, and even picking out specific models of cars from Google Street View imagery. She then goes on to demonstrate how her team is working towards the algorithm building full sentences to describe imagery, and the implications this technology will have in the near future.

“When machines can see, doctors and nurses will have extra pairs of tireless eyes to help them to diagnose and take care of patients. Cars will run smarter and safer on the road. Robots, not just humans, will help us to brave the disaster zones to save the trapped and wounded. We will discover new species, better materials, and explore unseen frontiers with the help of the machines.”

- Fei-Fei Li, Director of Stanford’s Artificial Intelligence Lab and Vision Lab

The software itself does things such as spotting cats in pictures (an example used by Li) through the use of artificial neural networks. An artificial neural network is a method by which algorithms improve a system's decision making; it is called artificial because it is modelled after the organic neural networks within our own brains. Artificial neural networks are fed data: for our example of the cat pictures, the network would be fed images paired with data telling the system what was in each image and where it was. That data is then stored and categorised for later interpretation. The system has no prior knowledge of what it is being fed; it only knows what it is told via the input sets. Once an appropriate amount of data has been added (in the case of Li's system, 15 million images with more than a billion tags of objects), the system can use algorithms to interpret new input data, in this example images, by referring back to what it has learnt and outputting what it thinks it is seeing (Nielsen, 2015). Artificial neural networks are quickly becoming commonplace when building systems that benefit from artificial intelligence, such as facial recognition software in policing, where CCTV can be used alongside such systems to target specific individuals over massive geographical areas (Nunn, 2001). Developments in neural networks are a promising step towards reliable and extremely useful AI, but as with all algorithms there are ways in which bias can influence their output.
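
To make this feed-labelled-data-then-predict cycle concrete, the sketch below is a minimal illustration in Python. It uses scikit-learn's small handwritten-digits dataset as a stand-in for Li's labelled photo corpus; the dataset, network size and labels are assumptions for illustration, not the Vision Lab system itself. A small artificial neural network is trained on labelled images and then asked to label images it has never seen.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Labelled images in, labels for unseen images out (digits stand in for cat photos).
    images, labels = load_digits(return_X_y=True)
    train_x, test_x, train_y, test_y = train_test_split(
        images, labels, test_size=0.2, random_state=0)

    # A small artificial neural network: it knows nothing beyond the labelled data it is fed.
    network = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    network.fit(train_x, train_y)

    print("Predicted label for an unseen image:", network.predict(test_x[:1])[0])
    print("Accuracy on unseen images:", round(network.score(test_x, test_y), 3))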

“Gorillas”

In 2015 Jacky Alciné, a black man, was using Google Photos to sort through some pictures that he had taken that day. He noticed that the platform had automatically sorted his photos into categories based on what it interpreted to be in the images. To his shock, he found a category labelled “Gorillas” containing selfies of him and his friend. The image recognition artificial neural network in this case had failed to produce an accurate result, not only mislabelling the images but also providing a label that has been used to persecute Alciné’s race for hundreds of years. He quickly took to Twitter to blow the whistle on Google, who responded saying that “We’re appalled and genuinely sorry that this happened. We are taking immediate action to prevent this type of result from appearing,” (Curtis, 2019). Alciné vented his anger about the situation, stating “Like I understand HOW this happens; the problem is moreso on the WHY. This is how you determine someone's target market.” (Alciné, 2015).

Being a computer programmer, Alciné understood the circumstances under which this could have happened, but was more shocked as to why a company as big as Google enabled it to happen in the first place. The training of the network is largely the “HOW” of this event. If the algorithm is not trained with enough data to pair against the input, then the system falls back to a vague guess at what it thinks it sees. This is a clear example of representation bias, where specific groups within a population are underrepresented within the training data and ultimately treated as secondary users or edge cases. Historical bias was also present in this example, where existing stereotypes were reinforced by the “Gorillas” label. Even if this was an edge case, the event still caused massive distress to Alciné and others who had similar experiences with Google Photos. We can also speculate that evaluation bias was present in the testing of this product, where false positives in the output went unnoticed. If Google Photos had been tested thoroughly, with a wide range of users covering key variables such as race, then examples like this could have been highlighted and dealt with prior to a public roll out.
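
The interplay of representation and evaluation bias can be shown with a toy sketch (entirely synthetic data and invented group names, not Google's system or data): a model trained on data dominated by one group performs well for that group and poorly for the underrepresented one, and only per-group evaluation reveals the gap.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_group(n, shift):
        # Two-class toy data for one group; `shift` moves the group's feature distribution.
        x = rng.normal(loc=shift, scale=1.0, size=(n, 2))
        y = (x[:, 0] + x[:, 1] + rng.normal(scale=0.5, size=n) > 2 * shift).astype(int)
        return x, y

    # Group A dominates the training set; group B is the under-represented "edge case".
    xa, ya = make_group(5000, shift=0.0)
    xb, yb = make_group(100, shift=2.0)
    model = LogisticRegression().fit(np.vstack([xa, xb]), np.concatenate([ya, yb]))

    # Evaluating per group (rather than only overall) is what reveals the gap.
    for name, shift in [("group A", 0.0), ("group B", 2.0)]:
        x, y = make_group(2000, shift)
        print(name, "accuracy:", round(model.score(x, y), 3))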

As part of Google's explanation of the situation, they stated: “Really interesting problems in image recognition here: obscured faces, different contrast processing needed for different skin tones and lighting, etc. We used to have a problem with people (of all races) being tagged as dogs, for similar reasons. We're also working on longer-term fixes around both linguistics (words to be careful about in photos of people [lang-dependent]) and image recognition itself. (e.g., better recognition of dark-skinned faces)” (Zunger, 2015). Wachter-Boettcher (2017, pp. 133-135) points out that this very matter could have been avoided if Google had learned from history. Kodak had similar issues back in the 1950s when they started issuing development kits to third parties. They did not provide correct colour grading measurements for black skin, instead giving labs only measurements calibrated for white skin. As a direct result, black people all over the US found that their photos were being incorrectly developed. It was not until the 1970s, when Kodak started releasing kits with black skin colour grading, that this problem was rectified. Even when questioned about this, Kodak stated that they had never really considered it an issue and had only made the addition due to complaints from chocolate manufacturers.

Another possible contributor to the “HOW” relates to ethnicity across Google’s workforce. Google’s annual diversity report (Brown, 2018) has been the measure of the tech giant’s progress towards equality since 2014. In their 2018 report, Google’s chief diversity and inclusion officer, Danielle Brown, stated that “We care deeply about improving workforce representation and creating an inclusive culture for everyone. While we’re moving in the right direction, we are determined to accelerate progress.” This acceleration comes in the form of a meagre 0.6% increase in black workers between 2014 and 2018, taking the proportion of black workers to only 2.5%. Compare this to the 53.1% of Google’s workforce that is white and you can start to see how issues such as the “Gorillas” example happen. Google were not consciously designing their photos application to be discriminatory, but if a workforce contains only around two black individuals for every hundred workers, then even with every persona consideration in place, designers will inherently design for themselves (Keates and Clarkson, 2002). This “HOW” can also be considered an example of representation bias.

As for “WHY” such workforce representation is allowed to persist, even with Google’s statements about “caring deeply”, Wachter-Boettcher (2017, pp. 22-25) tells us of a computer science sophomore named Kayla Thomas. Thomas was an exceptional student with perfect grades who had even built the app for Entertainment Weekly. Despite her strong CV, she found herself being shunned and ignored by internship recruiters from Silicon Valley tech companies while attending recruitment events. Thomas is black. When she asked friends not of her ethnicity whether they had the same experience, they told her they were “swimming in job opportunities”, while her friends of colour all had experiences similar to her own. Tech giants such as Facebook blame the pipeline of new employees coming from colleges for why their diversity statistics are so slow to improve (Wells, 2019). But the statistics do not reflect this claim: “4.5% of all new recipients of bachelor's degrees in computer science or computer engineering from prestigious research universities were African American” (Weise and Guynn, 2014). With approximately double the proportion of black individuals graduating than being hired, the pipeline argument does not hold up.

In the article where Thomas wrote about her disgust at finding out the aforementioned statistics (Thomas, 2016), she goes on to say that companies such as Facebook selectively hire not on qualification or experience, but on “cultural fit” within the company.

“I’m not interested in ping-pong, beer, or whatever other gimmick used to attract new grads. The fact that I don’t like those things shouldn’t mean I’m not a “culture fit.” I don’t want to work in tech to fool around, I want to create amazing things and learn from other smart people. That is the culture fit you should be looking for.”

- Kayla Thomas, sophomore Computer Science student at Dartmouth College

Evidence of this “cultural fit” can be seen in a satirical article by Sarah Cooper (Cooper, 2018), where she mocks how a hypothetical tech company’s diversity council is made up entirely of men, and how, on a pie chart of what it takes to get a job, education and experience are mere slivers while the majority is taken up by “ability to fit in with existing culture”.

To summarise, Google’s oversight of its technology, complacent testing and unconscious self-focused design combined to produce a product that discriminated against the individuals who use it on a day to day basis. No matter how fast the developers get back to their users on Twitter, it is still wrong for an individual to be racially stereotyped. Even with public-facing efforts to stop issues such as this from happening, these companies still treat such examples as edge cases while focusing on the majority of their users to streamline design and development. Conscious efforts to break the existing stereotypes of traditional tech workplaces are not being made, highlighting an example of system justification, whereby Silicon Valley tech companies such as Facebook and Google acknowledge that they fit a stereotype, yet implicitly work against their own efforts to change it. Weise and Guynn’s (2014) exposure of the fallacy of the “pipeline” illustrates that the system itself is rejecting potential candidates who would ultimately benefit the system, for fear of breaking the culture of said stereotype.

Predictive Policing

In 2013 Robert McDaniel, a black man, received a knock on his door from the police commander of his city. The commander warned McDaniel that he should not commit any more crimes, even though he did not have a violent criminal record. The reason for the commander’s visit was that McDaniel had been highlighted as one of 400 individuals placed on a “heat list” of people deemed likely to commit violent crime. The “heat list” itself was generated by a predictive machine learning algorithm utilising historical geographical and arrest data (Lum and Isaac, 2016). Police departments all over the world are now utilising black box machine learning algorithms to predict when and where crime will be committed, and who will commit it.

Private companies such as PredPol provide police departments with in-depth predictive outputs based upon historical data that the police department chooses to input into the system. Once the system is running, police react to its predictions, issuing patrols and highlighting areas that are statistically more likely to have higher crime rates. On paper this works, with examples from PredPol such as “The Alhambra, CA Police Department reported a 32% drop in burglaries and a 20% drop in vehicle theft since deploying in January 2013.” (PredPol, 2018). As highlighted in the previous example, if the input itself is biased then the output is going to be biased also. Unlike previous examples where the data sets are already inherently discriminatory, PredPol insist that their algorithms are neutral, stating that the system “uses ONLY 3 data points – crime type, crime location, and crime date/time – to create its predictions.” They go on to add, “This eliminates the possibility for privacy or civil rights violations seen with other intelligence-led or predictive policing models.” (PredPol, 2019). PredPol could here be distinguishing themselves from algorithmic systems used by judges in the US that utilise “race” as a data point when deciding how long an individual should stay in prison, or whether they should ultimately die for what they did (O'Neil, 2018, pp. 24-27). Unlike those systems, PredPol does not use race as a data point, but this does not prevent the data from being interpreted in ways that promote inequality; it is ultimately down to police staff themselves to interpret the data and act upon its predictions as they choose.
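
The sketch below illustrates the “only 3 data points” claim with a deliberately simplified stand-in (invented records and a plain frequency count, not PredPol’s published model): even when each record holds nothing but crime type, grid cell and timestamp, the ranking of areas simply mirrors where crime was previously recorded by police.

    from collections import Counter
    from datetime import datetime

    # Each record carries only the three published data points (illustrative values only).
    historical_reports = [
        ("burglary", "cell_04", datetime(2013, 1, 3, 22, 15)),
        ("drug", "cell_11", datetime(2013, 1, 4, 1, 30)),
        ("drug", "cell_11", datetime(2013, 1, 5, 23, 5)),
        ("burglary", "cell_02", datetime(2013, 1, 6, 20, 40)),
        ("drug", "cell_11", datetime(2013, 1, 7, 0, 55)),
    ]

    # Rank grid cells by how often crime has been *recorded* there in the past.
    counts = Counter(cell for _, cell, _ in historical_reports)
    patrol_priority = [cell for cell, _ in counts.most_common()]
    print("Cells flagged for patrol, most to least:", patrol_priority)
    # cell_11 tops the list purely because it was most often reported there in the past.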

Lum and Isaac (2016) propose that although these systems use neutral data sets, they create feedback loops that lead to discriminatory policing. This happens when police act upon PredPol’s predictions, arrest individuals or record crime in the predicted area, and then input the new arrest and crime data back into the system. The system then re-interprets the historical data along with this newly included data; this only reinforces the system’s initial prediction, causing it to highlight the same area in its next prediction and so creating a loop. Venkatasubramanian states that “Because this data is collected as a by-product of police activity, predictions made on the basis of patterns learned from this data do not pertain to future instances of crime on the whole,” and that “In this sense, predictive policing is aptly named: it is predicting future policing, not future crime.” (Ensign et al., 2018). In their video titled “American segregation, mapped at day and night”, Chang, Posner and Lee (2019) illustrate how, although white people and ethnic minorities work in the same locations during the day, there is very much a segregation when it comes to the places they call home, showing that segregation in cities is still very much a part of American culture. When the research from Ensign et al. and Chang, Posner and Lee is combined, data that points to possible racial discrimination can be surmised. Lum and Isaac (2016) found that, due to possible historical biases, the input “drug use” data published by Oakland police department indicated that black individuals were being arrested at twice the rate of white individuals. Upon applying a recently published version of the PredPol algorithm in simulation, the authors found that the bias in the sample data was only reaffirmed, with black individuals being targeted at twice the rate of white individuals. This targeting came in the form of known black residential areas being highlighted as likely areas of crime. Even though PredPol points to drug use taking place in majority-black and often poor areas, drug use in the greater population is almost equal across races, and across rich and poor (Knafo, 2013).
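
A toy simulation of this feedback loop (assumed parameters, not Lum and Isaac’s or Ensign et al.’s published models) makes the mechanism visible: two areas with identical true crime rates diverge in the records simply because patrols follow the predictions and discovered incidents are fed back in.

    import random

    random.seed(0)
    true_crime_rate = [0.3, 0.3]   # identical underlying rates in both areas
    recorded = [12, 10]            # historical records: area 0 slightly over-represented

    for day in range(200):
        # "Prediction": patrol the area with the most recorded crime so far.
        patrolled = 0 if recorded[0] >= recorded[1] else 1
        # Police only observe crime where they patrol; that observation is fed back in.
        if random.random() < true_crime_rate[patrolled]:
            recorded[patrolled] += 1

    print("Recorded incidents after 200 days:", recorded)
    # Area 0 accumulates nearly all new records and keeps being "predicted",
    # even though both areas have the same underlying crime rate.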

For PredPol to work without bias, one would first have to remove the representation bias within the sample data. This is difficult due to the previously mentioned historical bias linked to the stereotyping and persecution of black individuals over the past century (Black, 2003). Investigations into police communication with citizens show that police change their language, often to a more confrontational tone, when a citizen is of an ethnicity other than white, whereas tone data for white citizens was largely positive (Voigt et al., 2017). This highlights inequality at the very lowest level of a police department’s interaction with citizens. If police struggle to communicate neutrally face to face with ethnic minority citizens, it is not unlikely that they carry these conscious or unconscious biases into larger policing systems such as predictive policing. Looking back at PredPol’s own published “results” (PredPol, 2018), nearly all of the positive statistics point to a drop in burglary. This is an easy statistic with which to sell the system, and an example of measurement bias. Criminals such as burglars are aware of variables such as police presence in an area (Pearsall, 2010) and will not attempt to commit crimes there; as a direct result, burglary rates can quickly recede after the implementation of a predictive policing system. With this understanding, it is viable to surmise that system justification by police departments is taking place. Police have access to medical records of drug use against which to measure their data, yet still do not alter their policing methods even when doing so would reduce the overall crime rate in an area, promoting a negative reinforcement of their stereotype.

The algorithm within PredPol isn't inherently bad. Lum points out that it is actually relatively simple in its mathematics, stating: “In practice, at least for the data that I as a researcher have looked at, it reduced to, for the most part, not anything that was really significantly different than just a moving average.” (Haskins, 2019). For the algorithm to function without inequality, the data that is input and how police ultimately interpret it are the variables that matter when considering how PredPol can be used as a viable tool. Pearsall (2010) points out that predictive policing systems should not be treated as fact but instead used as a tool to aid police officers in making calculated decisions. The technology news website Motherboard (Haskins, 2019) proposes that, through “self-exciting point modelling”, PredPol perpetuates its own existence because police feed the loop and thus become dependent upon the software’s function. Upon questioning by Motherboard on the matter, PredPol refused to comment. Predictive policing is “currently being used to help protect one out of every 33 people in the United States” (PredPol, 2018), illustrating how quickly this potentially discriminatory software is being adopted by police departments. Only now, after seven years of active use, is PredPol starting to be actively examined by academics such as Lum and Isaac (2016) and Ensign et al. (2018), and also by the justice system as it begins to analyse unfair examples of prosecution and their possible causes (Winston, 2018). With cases such as these, discrimination and inequality are now coming to light, allowing greater scrutiny of algorithm-based predictive policing systems.
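
Lum’s observation that the predictions reduced to roughly a moving average can be pictured with a minimal baseline sketch (invented counts, not PredPol’s code): tomorrow’s expected count per area is simply the mean of the last few days of recorded counts, so the “prediction” merely re-ranks areas by their recent recorded history.

    from statistics import mean

    # Illustrative daily recorded counts per grid cell (made-up numbers).
    recorded_counts = {
        "cell_02": [1, 0, 2, 1, 1, 0, 2],
        "cell_04": [0, 1, 0, 0, 1, 0, 0],
        "cell_11": [3, 2, 4, 3, 2, 3, 4],
    }

    window = 5  # average over the most recent five days
    forecast = {cell: mean(days[-window:]) for cell, days in recorded_counts.items()}
    for cell, score in sorted(forecast.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{cell}: expected {score:.1f} incidents tomorrow")
    # The "prediction" simply re-ranks areas by their recent recorded history.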