Metacomment: This post is based on personal reflections. It’s not a scholarly post: I mostly just cite things that I had already read or that people suggested to me. This means that a lot of what I say here may have been said much better somewhere else, and there’s probably a lot of relevant literature that I don’t mention. I’m posting it because I want my blog to be a place where I feel comfortable posting casual musings, but I think it’s important to flag that these are casual musings. Suggestions of relevant and related literature are very welcome in the comments.
In this post I argue that attempts to reduce bias in AI decision-making face two ‘ethical locality’ problems. The first ethical locality problem is the problem of practical locality: we are limited in what we can do because the actions available to us depend on the society we find ourselves in. The second ethical locality problem is the problem of epistemic locality: we are limited in what we can do because ethical views evolve over time and vary across regions.
The practical locality problem implies that we can have relatively fair procedures whose outputs nonetheless reflect the biases of the society they are embedded in. The epistemic locality problem gives us reason to understand the problems of AI bias as instances of the broader problem of AI alignment: the problem of getting AI to act in accordance with our values. Given this, I echo others in saying that our goal should not be to ‘solve’ AI bias. Instead, our goal should be to build AI systems that mostly reflect current values on questions of bias and that facilitate and are responsive to the progress we make on these questions over time.
Jenny and the clock factory
You are a progressive factory owner in the 1860s. Your factory makes clocks and hires scientists to help develop the clocks, managers to oversee people, and workers to build the clocks. The scientists and managers are in low supply and the roles are paid well, while the workers are in higher supply and receive less compensation. You’ve already increased wages as much as you can, but you want to make sure your hiring practices are fair. So you hire a person called Jenny to find and recruit candidates to each role.
Jenny notes that in order to be a scientist or a manager, a person has to have many years of schooling and training. Women cannot currently receive this training and the factory cannot provide this training because it lacks the resources and expertise needed to do so. Many female candidates show at least as much promise as male candidates, but their lack of this crucial prior training makes them unsuited to any role except worker. Despite her best efforts, Jenny ends up hiring only men to the roles of scientist and manager, and hires both men and women as workers.
Jenny’s awareness of all the ways in which the factory’s hiring practices are unfair is limited, however, because there are sources of unfairness that have yet to be adequately recognized in the 1860s. For example, it is not considered unfair to reject candidates with physical disabilities for worker roles rather than trying to make adequate accommodations for these disabilities. Given this, Jenny rejects many candidates with physical disabilities rather than considering ways in which their disabilities could be accommodated.
The practical locality problem
How fair is Jenny being with respect to gender? To try to answer this, we need to think about the relations between three important variables: gender (G), training (T) and hiring (H).
Deciding to hire a candidate only if they have relevant training (T→H) seems fair since the training is necessary for the job. Deciding to hire a candidate based on their gender alone (G→H) seems unfair, since gender is irrelevant to the job. The fact that women cannot receive the training (G→T) also seems unfair. But, unlike the relationship between T and H and the relationship between G and H, the relationship between G and T is exogenous to Jenny’s decision: it is one that Jenny cannot affect.
To model the situation, we can use dashed arrows to represent exogenous causal relationships—in this case, the relationship between G and T—and solid arrows to represent endogenous causal relationships. We can use red arrows to indicate causal relationships that actually exist between G, T, and H and we can use grey arrows to highlight the fairness of possible causal relationships. Jenny’s situation is as follows:
In this case, there is an important sense in which Jenny’s decision not to hire is fair to each woman who applies, because Jenny would have made the same decision had a man with the same level of training applied. If women were given the necessary training, Jenny would hire them. If men were denied the necessary training, Jenny would not hire them. (Her decision therefore satisfies the counterfactual definition of fairness given by Kusner et al., though see Kohler-Hausmann for a critique of counterfactual causal models of discrimination.)
But there is also an important sense in which the fact that Jenny hires only men into scientist and manager roles is unfair. The unfairness is upstream of Jenny. The outcome is unfair because her options are limited by unfair societal practices, i.e. by the fact that women are denied the schooling and training necessary to become scientists and managers.
I’m going to use the term ‘procedurally unfair’ to refer to decisions that are unfair because of unfairness in the decision-making procedure being used. Chiappa and Gillam say that ‘a decision is fair toward an individual if it coincides with the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair pathways were different’. Building on this, I will say that a decision is procedurally unfair if it diverges from the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair endogenous pathways were different.
I’m going to use the term ‘reflectively unfair’ to refer to decisions that may or may not be procedurally unfair, but whose inputs are the result of unfair processes, and where the outcomes ‘reflect’ the unfairness of those processes. This is closely related to Chiappa and Isaac’s account of the fairness of a dataset as ‘the presence of an unfair causal path in the data-generation mechanism’. I will say that a decision is reflectively unfair if it diverges from the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair exogenous pathways were different.
Since decision-makers cannot always control or influence the process that generates the inputs to their decisions, the most procedurally fair options available to decision-makers can still be quite reflectively unfair. This is the situation Jenny finds herself in when it comes to hiring women as scientists and managers.
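To make the two definitions concrete, here is a minimal Python sketch of Jenny’s situation. It is a toy model and not a faithful implementation of any published method: I assume binary gender and training, a deterministic G→T relationship, and the function names (`training`, `jenny_rule`, `men_only_rule`, and so on) are made up for illustration. Procedural unfairness is checked by flipping gender only along the endogenous G→H pathway, and reflective unfairness by flipping it only along the exogenous G→T pathway.

```python
from typing import Callable

# A hiring rule maps (gender, trained) to a hiring decision for a skilled role.
HiringRule = Callable[[str, bool], bool]


def training(gender: str) -> bool:
    """Exogenous pathway G -> T: in this toy 1860s world only men get trained."""
    return gender == "man"


def jenny_rule(gender: str, trained: bool) -> bool:
    """Jenny's rule uses only T -> H; gender is ignored."""
    return trained


def men_only_rule(gender: str, trained: bool) -> bool:
    """Hypothetical contrast case: a rule that uses G -> H directly."""
    return gender == "man"


def flip(gender: str) -> str:
    return "man" if gender == "woman" else "woman"


def actual_decision(gender: str, rule: HiringRule = jenny_rule) -> bool:
    return rule(gender, training(gender))


def procedurally_unfair(gender: str, rule: HiringRule) -> bool:
    """Flip G along the endogenous pathway (G -> H) only.

    Training is held at its actual value, so any change in the decision
    is due to the rule itself responding to gender.
    """
    trained = training(gender)
    return rule(gender, trained) != rule(flip(gender), trained)


def reflectively_unfair(gender: str, rule: HiringRule) -> bool:
    """Flip G along the exogenous pathway (G -> T) only.

    The gender seen directly by the rule is unchanged; only the training
    the candidate would have received is different.
    """
    return rule(gender, training(gender)) != rule(gender, training(flip(gender)))


if __name__ == "__main__":
    print(procedurally_unfair("woman", jenny_rule))     # False
    print(reflectively_unfair("woman", jenny_rule))     # True
    print(procedurally_unfair("woman", men_only_rule))  # True
```

In this sketch Jenny’s rule comes out procedurally fair but reflectively unfair toward a woman applicant, while the hypothetical men-only rule is procedurally unfair. Real cases are messier, since causal relationships are rarely deterministic and it is often contested which pathways count as unfair.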
When it comes to hiring and gender, Jenny has encountered what I will call the practical locality problem. The options available to Jenny depend on the practices of the society she is embedded in. This means that even the most procedurally fair choice can reflect the unfair practices of this society. (What’s worse is that all of the options available to Jenny may not only reflect but to some degree reinforce those practices. Hiring women who cannot perform well in a given role and failing to hire any women into those roles could both be used to reinforce people’s belief that women are not capable of performing these roles.)
The epistemic locality problem
How fair is Jenny with respect to disability status? I think that Jenny is being unfair to candidates with physical disabilities. But the primary cause of her unfairness isn’t malice or negligence: it’s the fact that Jenny lives in a society that hasn’t yet recognized that her treatment of those with physical disabilities is unfair. Although we may wish Jenny would realize this, we can hardly call it negligent of Jenny to not have advanced beyond the moral understanding of almost all of her contemporaries.
If we use D to indicate disability status and a subscript to indicate the values and beliefs that a decision is considered fair or unfair with respect to (i.e. FAIR_X means ‘this was generally considered fair in year X’), the model of Jenny’s situation is fairly simple:
When it comes to hiring and disability, Jenny is facing what I will call the epistemic locality problem. As we learn more about the world and reflect more on our values, our ethical views become more well-informed and coherent. (For moral realists, they can get better simpliciter. For subjectivists, they can get better by our own lights.) The limits of our collective empirical knowledge and our collective ethical understanding can place limits on how ethical it is possible for us to be at a given time, even by our own lights. This is the epistemic locality problem.
I call these problems ‘ethical locality’ problems because they’re a bit like ethical analogs of the principle of locality in physics. The practical locality problem points to the fact that the set of actions available to us is directly impacted by the practices of those close to us in space and time. The epistemic locality problem points to the fact that our ethical knowledge is directly impacted by the ethical knowledge of those that are close to us in space and time. (But, as in physics, the causal chain that generated the local circumstances may go back a long way.)
Current AI systems and the problems of ethical locality
Are AI systems in the 2020s in a qualitatively different position than the one Jenny finds herself in? Do they have a way of avoiding these two ethical locality problems? It seems clear to me that they do not.
AI systems today face the practical locality problem because we continue to live in a society with a deeply unfair past that is reflected in current social institutions and practices. For example, there are still large differences in education across countries and social groups. This doesn’t mean that there isn’t a lot of work that we need to do to reduce procedural bias in existing AI systems. But AI systems with little or no procedural bias as defined above will still make decisions or perform in ways that are reflectively biased, just as Jenny does.
AI systems today also face the epistemic locality problem. Even if we think we have made a lot of progress on questions of bias since the 1860s, we are still making progress on what constitutes bias, who it is directed at, and how to mitigate it. And there are almost certainly attributes that we are biased against that aren’t currently legally or ethically recognized. In the future, the US may recognize social class and other attributes as targets of bias. The standards used to identify such attributes are also likely to change over time.
Future accounts of bias may also rely less on the concept of a sensitive attribute. Sensitive attributes like gender, race, etc. are features of people that are often used to discriminate against them. Although it makes sense to use these broad categories for legal purposes, it seems likely that more characteristics are discriminated against than the law currently recognizes (or can feasibly recognize). In the future, our concept of bias could be sensitive to bias against individuals for idiosyncratic reasons, such as bias against a job candidate because their parents didn’t donate to the right political party.
I hope it’s not controversial to say that we probably haven’t reached the end of moral progress on questions of bias. This means we can be confident that current AI systems, like Jenny, face the problem of epistemic locality.
Consequences of the practical locality problem for AI ethics
The practical locality problem shows that we can have procedurally fair systems whose outputs nonetheless reflect the biases of the society they are embedded in. Given this, I think that we should try to avoid implying that AI systems that are procedurally fair by our current standards are fair simpliciter. Suppose the factory owner were to point to Jenny and say ‘I know that I’ve only hired men as scientists and managers, but it’s Jenny that made the hiring decisions and she is clearly a fair decision-maker.’ By focusing on the procedural fairness of the decisions only, the owner’s statement downplays their reflective unfairness.
We therefore need to be aware of the ways in which AI systems can contribute to and reinforce existing unfair processes even if those systems are procedurally fair by our current standards.
The practical locality problem also indicates that employing more procedurally fair AI systems is not likely to be sufficient if our goal is to build a fair society. Getting rid of the unfairness that we have inherited from the past, such as different levels of investment in education and health across nations and social groups, may require proactive interventions. We may even want to make decisions that are less procedurally fair in the short term if doing so will reduce societal unfairness in the long term. For example, we could think that positive discrimination is procedurally unfair and yet all-things-considered justified.
Whether proactive interventions like positive discrimination are effective at reducing societal unfairness (as we currently understand it) is an empirical question. Regardless of how it lands, we should recognize that increasing procedural fairness may compete with other things we value, such as reducing societal unfairness. Building beneficial AI systems means building systems that make appropriate trade-offs between these competing values.
Consequences of the epistemic locality problem for AI ethics
If we think we have not reached the end of moral progress on ethical topics like bias, the language of ‘solving’ problems of bias in AI seems too ambitious. We can build AI systems that are less procedurally biased, but saying that we can ‘solve’ a problem implies that the problem is a fixed target. The ethical problems of bias are best thought of as moving targets, since our understanding of them updates over time. Rather than treating them like well-formed problems just waiting for solutions, I suspect we should aim to improve our performance with respect to the current best target. (This is consistent with the view that there are particular subproblems relating to AI bias that are fixed targets and can be solved.)
In general, I think a good rule of thumb is ‘if a problem hasn’t been solved despite hundreds of years of human attention, we probably shouldn’t build our AI systems in a way that presupposes finding the solution to that problem.’
If our values change over time (i.e. if they update as we get more information and engage in more ethical deliberation), then what is the ultimate goal of work on AI bias and AI ethics more generally? I think it should be to build AI systems that are aligned with our values, and that promote and are responsive to ongoing moral progress (or ‘moral change’ for those that don’t think the concept of progress is appropriate here). This includes changes in our collective views about bias.
What does it mean to say that we should build AI systems that ‘align with our values’? Am I saying that systems should align to actual preferences, ideal preferences, or partially ideal preferences? Am I saying that they should align to individual or group preferences and, if the latter, how do we aggregate those preferences and how do we account for differences in preferences? Moreover, how do we take into account problems like the tyranny of the majority or unjust preferences? These are topics that I will probably return to in other posts (see Gabriel (2020) for a discussion of them). For the purposes of this post, it is enough to say that building AI systems that align with our values means building AI systems that reflect current best practices on issues like bias.
Progress on AI alignment is imperative if we want to build systems that reflect our current and future values about bias.
Problems in AI bias also constitute concrete misalignment problems. Building systems that don’t conflict with our values on bias means giving the right weight to any values that conflict, figuring out how to respond to differences in values across different regions, and building systems that are consistent with local laws. These present us with very real, practical problems when it comes to aligning current AI systems with our values. More powerful AI systems will likely present novel alignment problems, but the work we do on problems today could help build out the knowledge and infrastructure needed to respond to the alignment problems that could arise as AI systems get more powerful.
If this picture is correct then the relationship between AI alignment and AI bias is bidirectional. Progress in AI alignment can help us to improve our work on AI bias, and progress in AI bias can help us to improve our work on the problem of AI alignment.
Thanks to Miles Brundage, Gretchen Krueger, Arram Sabeti and others that provided useful comments on the drafts of this post.