<h1>In AI ethics, “bad” isn’t good enough</h1>

<p>In ethics we use the term “pro tanto”, meaning “to that extent”, to refer to things that have some bearing on what we ought to do but that can be outweighed. The fact that your dog is afraid of the vet is a pro tanto reason not to take him. But perhaps you ought to take him despite this pro tanto reason not to, because keeping him in good health is worth the cost of a single unpleasant experience. If that’s true then we say you have “all things considered” reasons to take your dog to the vet.</p>
<p>In AI ethics, we often point to things that systems do that are harmful. A system might <a href="https://arxiv.org/abs/1908.09635">make biased decisions</a>, <a href="https://arxiv.org/abs/1906.02243">use a lot of energy in training</a>, <a href="https://arxiv.org/abs/2009.11462">produce toxic outputs</a>, and so on. These are all pro tanto harms. Noting that a system does these things doesn’t tell us about the overall benefits or harms of the system or what we have all things considered reasons to do. It just tells us about one particular harm the system causes.</p>
<p>It’s useful to identify pro tanto harms. Pro tanto harms give us pro tanto reasons to do things. When we identify a pro tanto harm we have a pro tanto reason to fix the problem, to analyze it more, to delay deploying the system, to train systems differently in the future, and so on.</p>
<p>But most things that have any significance in the world create some pro tanto harms. And identifying pro tanto harms often doesn’t give us all that much information about what we should do all things considered, including whether we should do anything to reduce the pro tanto harm.</p>
<p>To see why, suppose an article points out that some surgical procedure results in painful stitches. The article draws no conclusions from this: it merely notes that one bad thing about the surgery is that it results in these painful stitches, and describes the harm the stitches do in some detail.</p>
<p>There are three ways the harm of these stitches could be mitigated: by not performing the surgery, by giving patients stronger painkillers, or by reducing the length of the incision. But the surgery is essential for long-term health, the stronger painkillers are addictive, and a smaller incision is associated with worse outcomes. In fact, patients with larger incisions who are given fewer painkillers do much better than those from any other group.</p>
<p>In this case, although the pain of the surgery is a pro tanto harm, we actually have all things considered reasons to take actions that will <i>increase</i> that harm, since we ought to increase incision length and give fewer painkillers.</p>
<p>So it’s a mistake to assume that if we identify a pro tanto harm from an AI system, it must be the case that someone has done something wrong, something needs to be done to correct it, or the system shouldn’t be deployed. Maybe none of those things are true. Maybe all of them are. We can’t tell based solely on a discussion of the pro tanto harm alone.</p>
<p>While pro tanto harms don’t entail that we have all-things-considered reasons to do things differently, they do <a href="https://xkcd.com/552/">waggle their eyebrows suggestively while mouthing ‘look over there’</a>. In order to know whether a pro tanto harm is waggling its eyebrows towards something we should do differently, we need to ask things like:</p>
<ul>
<li>If the system isn’t deployed, what are the likely alternatives to it?</li>
<li>What are the benefits of deploying the system, relative to the alternatives?</li>
<li>What are the harms of deploying the system, relative to the alternatives?</li>
<li>What are the different options for deploying the system?</li>
<li>What resources would it take to get rid of the pro tanto harms of the system?</li>
<li>What would these resources otherwise be used for?</li>
<li>Would deploying the system prevent better alternatives from arising?</li>
<li>Can we gain useful information about future systems from this one?</li>
</ul>
<p>If we want to figure out what we have all things considered reasons to do, it’s not good enough to point out the bad consequences of an AI system, even if we also point out how to address these consequences. We need to weigh up all of the relevant moral considerations by answering questions like the ones above.</p>
<p>To give a more concrete example, suppose a decision system makes biased decisions about how to set bail. Should we change it? All else being equal, we should. But suppose it’s very difficult to fix the things that give rise to this bias. Does this mean that we shouldn’t deploy the system until we can fix it? Well, surely that depends on other things like what the existing bail system is like. If the existing system involves humans making extremely biased and harmful decisions, deploying a less biased (but far from perfect) system might be a matter of moral urgency. This is especially true if the deployed system can be improved over time.</p>
<p>Different moral theories will say different things about what we have all things considered reasons to do. If you’re a deontologist, finding out that a system violates someone’s rights might imply that you shouldn’t deploy that system, even if the alternative is a system with much worse rights violations. If you’re a consequentialist about rights, you might prefer to replace the current system with one that violates fewer rights.</p>
<p>Regardless of your views about moral theories, arguments of the form “this system does something harmful” are very different from arguments of the form “we ought to develop this system differently” or “we ought not to deploy this system”. The former only requires arguing that, in isolation, the system does something harmful. The latter requires arguing that an action ought to be performed given all of the morally-relevant facts.</p>
<p>Since we can’t be certain about any one moral theory and since we have to try to represent a plurality of views, coming to all things considered judgments in AI ethics will often require a fairly complex evaluation of many relevant factors.
Given this, it’s important that we don’t try to derive conclusions about what we have all things considered reasons to do about AI systems solely from pro tanto harm arguments.</p>
<p>It would be a mistake to read an article about painful stitches and to conclude that we should no longer carry out surgeries. And it would be a mistake to read an article about a harm caused by an AI system and conclude that we shouldn’t be using that AI system. Similarly, it would be a mistake to read an article about a benefit caused by an AI system and conclude that it’s fine for us to use that system.</p>
<p>Drawing conclusions about what we have all things considered reasons to do from pro tanto arguments discourages us from carrying out work that is essential to AI ethics. It discourages us from exploring alternative ways of deploying systems, evaluating the benefits of those systems, or assessing the harms of the existing institutions and systems that they could replace.</p>
<p>This is why we have to bear in mind that in AI ethics, “bad” often isn’t good enough.</p>

<h1>Price gouging: are we shooting the messenger of inequality?</h1>

<p>Early in the pandemic, some people bought important supplies when the cost was low and sold them for marked up prices, i.e. they engaged in price gouging. There’s usually a pretty strong backlash against this and sometimes laws are even passed to prevent it from happening.</p>
<p>People with an economics background often get annoyed by this backlash. Suppose hand sanitizer was \$1 before the pandemic and a small number of people bought it. After the pandemic hits, there are many more people who want to buy hand sanitizer at that \$1 price: far more people than the available number of hand sanitizers. By buying at the original price and charging more, price gougers ensure that the people that want the hand sanitizer most—as reflected in their willingness to pay more for it—actually get it. Increased demand also incentivizes companies to produce more of the relevant goods.</p>
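<p>A minimal sketch of that argument, with invented buyers and numbers, might look like this. It only illustrates the economist’s claim that allocating by willingness to pay can route a scarce good to the person who wants it most, rather than to whoever happens to reach the shelf first:</p>

<pre><code class="language-python"># Toy model of the allocation argument. All names, numbers, and prices here are
# invented for illustration; nothing hangs on the specific values.
buyers = [
    {"name": "casual shopper", "need": 2, "willing_to_pay": 1},
    {"name": "wealthy shopper", "need": 3, "willing_to_pay": 20},
    {"name": "parent of a sick child", "need": 10, "willing_to_pay": 50},
]

# First come, first served at the original $1 price: whoever happens to arrive
# first gets the last bottle, regardless of how much they need it.
first_come_winner = buyers[0]["name"]

# Price-based allocation: the bottle goes to the buyer with the highest
# willingness to pay, which (imperfectly) tracks who wants it most.
highest_bidder = max(buyers, key=lambda b: b["willing_to_pay"])["name"]

print("First come, first served:", first_come_winner)   # casual shopper
print("Highest willingness to pay:", highest_bidder)    # parent of a sick child
</code></pre>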
<p>Why does price gouging feel wrong to us? Here’s a preliminary explanation: when we witness price gouging, we see a situation in which only two kinds of people can buy a scarce good: those who desperately need it and have to shell out a huge amount of money for it, and those with a lot of money who are simply happy to pay the increased cost. This feels like it exploits those who are desperate, and unfairly advantages those that are wealthy.</p>
<p>Are these intuitions about exploitation or unfairness justified?</p>
<p>Suppose a low-income parent with a sick child has to pay \$50 for a \$1 bottle of hand sanitizer from a price gouger. It looks a lot like the price gouger is just profiting from their desperation and creating no value. But if price gouging weren’t allowed, it’s not true that the parent would have the \$1 hand sanitizer. Instead, they’d probably have no hand sanitizer at all: it would be gone from the shelves before they arrive. Since the parent would rather have the hand sanitizer and be down \$50 than have no hand sanitizer at all, a world where price gouging is allowed is better for her than the one in which it isn’t. Price gouging makes it more likely that the hand sanitizer goes to the low-income parent and not to someone who doesn’t really need it.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
<p>Can price gouging ever be exploitative if the exchange involves no deception and leads to an outcome that the price gouger and the parent both prefer? In ethics, this gets called “<a href="https://plato.stanford.edu/entries/exploitation/#MoraWeigForcExpl">mutually beneficial exploitation</a>” and there’s a lot of debate about whether it’s possible.</p>
<p>We might think that this kind of exchange is wrong because there’s a more welfare-maximizing option available to the price gouger: namely, to sell the hand sanitizer to those that need it most. This is different from what actually happens when the price gouger sells their goods because welfare isn’t tracked all that well by willingness to pay. Richer people are willing to pay more for goods that bring them less welfare, since the marginal cost of losing a dollar is lower for them.</p>
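<p>To see how willingness to pay can come apart from welfare, here is a toy calculation. The logarithmic utility function and the wealth levels are assumptions made purely for illustration: both buyers get exactly the same welfare boost from the bottle, but the richer buyer is willing to pay far more for it because a marginal dollar costs them much less welfare:</p>

<pre><code class="language-python">import math

WELFARE_GAIN = 0.5  # assumed welfare from the bottle, identical for both buyers

def max_willingness_to_pay(wealth, welfare_gain=WELFARE_GAIN):
    """Largest price p such that log(wealth - p) + welfare_gain equals log(wealth),
    assuming utility of wealth is logarithmic (a standard toy assumption)."""
    return wealth * (1 - math.exp(-welfare_gain))

for wealth in (100, 10_000):
    wtp = max_willingness_to_pay(wealth)
    print(f"wealth ${wealth}: willing to pay up to ${wtp:.2f} for the same welfare gain")
# wealth $100: willing to pay up to $39.35 for the same welfare gain
# wealth $10000: willing to pay up to $3934.69 for the same welfare gain
</code></pre>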
<p>But if “there’s a more welfare-maximizing option available” is our standard for exploitation, then almost all transactions are exploitative, including the store selling the hand sanitizer for \$1. There are almost certainly people who will not pay \$1 for hand sanitizer, but who would derive more welfare from the hand sanitizer than some of the people who are willing to pay \$1 for it.</p>
<p>Perhaps the most plausible response is that price gouging is just a particularly extreme example of this disparity between the market exchange and the one that maximizes welfare.</p>
<p>There are practical problems here, however. In order to determine the welfare-maximizing price, the price gouger would have to assess the needs and resources of each potential buyer and adjust their price accordingly. But identifying what resources they have and what they genuinely need is extremely hard. A higher willingness to pay might be one of the most efficient ways for the price gouger to identify those with a higher need, given what they know.</p>
<p>Perhaps those with more information could try to distribute key goods in a way that maximizes welfare. For example, governments could come together and distribute hand sanitizer to those that need it most at a subsidized price. But the fact that they don’t do this is hardly something that can be blamed on the price gouger. So why do we direct our ire at them?</p>
<p>Ultimately, I suspect that at least some of our intuition that price gouging is wrong comes from the fact that when there are large wealth disparities, willingness to pay is a worse proxy for welfare. If someone with only \$10 is dying in the desert and comes across a water seller, the water will go to any wealthy person who is willing to pay \$11 for it just to take a shower.</p>
<p>When we see these kinds of outcomes, I think we’re inclined to shoot the messenger of inequality: i.e. to blame whoever happens to be the person carrying out the final transaction. But this person is hardly to blame for the fact that such wealth inequality exists. They are also not in a good position to correct for it and are likely to be outcompeted if they try. (To say nothing of how this correction would affect the supply of important goods.)</p>
<p>If this is correct then we might want to redirect our ire at those with the ability to nudge things in a more welfare-maximizing direction. Governments can do so by redistributing some of the economic wealth we generate to the worst off, for example. But when governments <a href="https://consumer.findlaw.com/consumer-transactions/price-gouging-laws-by-state.html">outlaw price gouging</a>, they’re probably just shooting the messenger of inequality. They haven’t improved the underlying situation—if anything, they seem to have made it worse—they’ve just shot the guy that was drawing attention to it.</p>
<p>Of course, it’s unlikely that the optimal distribution of wealth is a totally equal one. Wealth equality won’t be sustained if people are rewarded in accordance with the value they create, and a smaller portion of a bigger pie is often better than an equal share of a smaller pie. So the world in which welfare is maximized in the long term might inevitably involve individual transactions along the way that are bad from a welfare-maximizing point of view. But it’s also unlikely that the optimal distribution of wealth involves the kind of disparities between the rich and the poor that results in some people taking showers next to others that are dying of thirst.</p>
<p>I won’t take a stance on the best balance to strike between growth and equality, since I’ve already skirted some heavy economic territory here. I just want to point out that we’re often inclined to shoot the messengers of inequality even if they’re doing something that makes the world better, like making it more likely that important goods go to those that need them.</p>
<p>Shooting the messenger of inequality happens elsewhere too. People often think it’s exploitative to open low-wage factories or run drug trials in developing countries, for example. This is true even if people choose whether to work in those factories or to be enrolled in those drug trials, and even if their choice to do so seems reasonable given their other options.</p>
<p>If shooting the messenger of inequality is a real part of this phenomenon, it seems like one we should try to avoid. After all, yelling at doctors for telling us we’re sick won’t make us any healthier.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>There might be some kind of luck-based view about fairness at play here: i.e. it’s better for everyone to have a similar chance at getting a single hand sanitizer for \$1 than for some people to have a higher chance at getting hand sanitizer by paying more for it. The system of restricting supply per person approximates this, but there are many issues with it. For example, it requires that people be prevented from re-selling their hand sanitizer if it’s to avoid devolving into distributed price gouging. This means it can result in an outcome that everyone would prefer to change: each winner prefers to sell to a loser that prefers to buy. I’d be interested in hearing a defense of restricting supply per person, however, since it seems to be a common practice. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<h1>Fairness, evidence, and predictive equality</h1>

<p>In-person exams in the UK were canceled this year because of the pandemic, so results were given using a modeling system that looked at “<a href="https://www.bbc.com/news/education-53776938">the ranking order of pupils and the previous exam results of schools and colleges</a>”. I don’t know how the modeling system took into account previous results of schools and colleges, but I’m going to assume that students from schools with a worse track record on exams were predicted to have lower grades. This has, understandably, caused a lot of controversy.</p>
<p>I think this might be a good example of a case where using information feels unfair even if it makes our decision more accurate. It’s very likely that a school’s previous exam performance helps us make better predictions about how its current students will perform. Yet it feels quite unfair to give people from lower performing schools worse grades than those from higher performing schools if everything else about them is the same.</p>
<p>To take a similar kind of case, suppose a judge’s goal is to get people who have been charged with a crime to show up to court in a way that minimizes costs to defendants and the public. How should she take into account statistical evidence about defendants?</p>
<p>First, let’s consider spurious correlations in the data that are not predictive. Suppose we divide defendants into small groups, such as “red-headed Elvis fans born in April”. If we do this, we’ll find that lots of these groups have higher than average rates of not showing up for court. But if these are mostly statistical artifacts that aren’t caused by any underlying confounders, the judge would do better by her own lights if she mostly just ignored them.</p>
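<p>A quick simulation (with made-up numbers) shows why this happens. If we slice a population into enough small, arbitrary groups, some of those groups are guaranteed to have above-average no-show rates by chance alone, even when every defendant has exactly the same underlying no-show probability:</p>

<pre><code class="language-python">import random

BASE_RATE = 0.2     # assumed true no-show probability, identical for everyone
NUM_GROUPS = 1000   # arbitrary slices like "red-headed Elvis fans born in April"
GROUP_SIZE = 20

groups_that_look_bad = 0
for _ in range(NUM_GROUPS):
    # Every member of every group has the same underlying probability...
    no_shows = sum(random.random() > 1 - BASE_RATE for _ in range(GROUP_SIZE))
    # ...but small groups will often show much higher rates purely by luck.
    if no_shows / GROUP_SIZE > 1.5 * BASE_RATE:
        groups_that_look_bad += 1

# Typically prints something on the order of 80-90: lots of "alarming" groups,
# none of which reflect any real underlying difference.
print(f"{groups_that_look_bad} of {NUM_GROUPS} groups look unusually bad")
</code></pre>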
<p>Things get trickier when the correlations are predictive. For example, suppose that night shift workers are less likely to show up to court on average. Their court date is always set for a time when they aren’t working, so being a night shift worker doesn’t seem to be a direct cause of not showing up to court. But the correlation is predictive. Given this, the judge would do better by the standards above if she increases someone’s bail amount when she finds out they’re a night shift worker. This is true even if most night shift workers would show up to court.</p>
<p>As in the UK grades case, this feels intuitively unfair to night shift workers.</p>
<p>One principle that might be thought to ground our intuition for why this is unfair is the following:</p>
<blockquote>
<p><strong>Causal fairness principle (CFP)</strong>: <em>it’s fair to factor properties of people into our decision-making if and only if those properties directly cause an outcome that we have reason to care about<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></em></p>
</blockquote>
<p>This principle looks plausible and would explain why the grades case and the night shift workers case both feel unfair. Night shift work doesn’t seem to cause not showing up to court, and going to a low performing school doesn’t directly cause getting a lower grade. But I think this principle is inconsistent with our intuitions in other cases.</p>
<p>To see why, suppose that night shift workers are more likely to live along poor bus routes. This means that they often miss their court appointment because their bus was running late or didn’t show up. And this explains the entire disparity between night shift workers and others: if a night shift worker doesn’t live along a poor bus route, they will show up to court as often as the average person, and if a non-night shift worker lives along a poor bus route, they will show up to court at the same (lower) rate as night shift workers who live along poor bus routes.</p>
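<p>The causal structure of this example can be made explicit with a small simulation (all of the probabilities below are invented). Night-shift work is correlated with missing court only because it is correlated with living along a poor bus route; once we compare people on the same kind of route, the night-shift disparity disappears:</p>

<pre><code class="language-python">import random

def simulate_defendant():
    """Toy model in which only the bus route causally affects showing up to court."""
    night_shift = random.random() > 0.7                  # 30% work night shifts
    p_poor_route = 0.6 if night_shift else 0.2           # assumed correlation
    poor_bus_route = random.random() > 1 - p_poor_route
    p_no_show = 0.4 if poor_bus_route else 0.1           # the route is the only cause
    no_show = random.random() > 1 - p_no_show
    return night_shift, poor_bus_route, no_show

population = [simulate_defendant() for _ in range(100_000)]

def no_show_rate(rows):
    return sum(r[2] for r in rows) / len(rows)

night = [r for r in population if r[0]]
day = [r for r in population if not r[0]]
print("night shift:", round(no_show_rate(night), 3))     # around 0.28
print("day workers:", round(no_show_rate(day), 3))       # around 0.16

# Conditioning on the actual cause removes the apparent night-shift effect.
night_good = [r for r in night if not r[1]]
day_good = [r for r in day if not r[1]]
print("night shift, good route:", round(no_show_rate(night_good), 3))  # around 0.1
print("day workers, good route:", round(no_show_rate(day_good), 3))    # around 0.1
</code></pre>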
<p>The judge receives this new information and responds by increasing the bail of anyone who lives along poor bus routes. By CFP, her decision would be fair, since it only takes into account properties that are direct causes of the outcome she cares about. (And the outcomes will be better relative to her goals because this heuristic gets at the underlying causal facts more than the night shift workers heuristic does.) But I think her decision is intuitively unfair.</p>
<p>In response to this case, we might adjust CFP to say that a decision is fair only if the causal factors in question are currently within the control of the agent.</p>
<p>This addition makes some intuitive sense because factors outside of an agent’s control are often not going to be responsive to whatever incentives we are trying to create. In this case, however, the place that the agent lives is at least partially in their control, even if moving would be very financially difficult for them. The behavior of people who live along poor bus routes is also likely to be responsive to incentives. People who live along poor bus routes are more likely to leave earlier to get to court if failing to show up means foregoing a high bail amount.</p>
<p>We also often think that it’s fair to consider causally relevant factors that are outside someone’s control when making decisions. Suppose you’re deciding whether to hire someone as a lawyer or not and you see that one of the applicants is finishing a medical degree rather than a law degree. It seems fair to take this into account when making your decision about whether to hire them, even if we suppose that the candidate currently has no control over the fact that they will have a medical degree rather than a law degree, e.g. because they can’t switch in time to get a law degree before the position starts.</p>
<p>These are reasons to be skeptical of CFP in the “if” direction (if a property is causally relevant then it’s fair to consider it) but I believe we also have reasons to be skeptical of the principle in the “only if” direction (it’s only fair to consider a property if it’s causally relevant).</p>
<p>To see why, consider a case in which the judge asks a defendant “are you going to show up to your court date?” and the defendant replies “no, I have every intention of fleeing the country”. Should the judge take this utterance into account when deciding how to set bail? This utterance is evidence that the defendant has an intention to flee the country, and having this intention is the thing that’s likely to cause them to not show up to their court date. The utterance itself doesn’t cause the intention and it won’t cause them to flee the country: the utterance is just highly correlated with the defendant having an intention to flee (because this intention is likely the cause of the utterance). So CFP says that it’s unfair for the judge to take this utterance into account when making her decision. That doesn’t seem right.</p>
<p>To avoid this, we might try to weaken CFP and say that it’s fair to take properties of someone into account only if having those properties is evidence that a person has another property that’s causally relevant to the outcome. But this weakens the original principle excessively, since even the most spurious of correlations will be evidence that a person has a property that’s causally relevant to the outcomes we care about. This includes race, gender, etc. since in an unequal society many important properties will covary with these properties. In an ideal world, we would only get evidence that someone has a causally relevant property when the person actually has the causally relevant property. But we don’t live in an ideal world.</p>
<p>Perhaps we can get around some of these problems by moving to a more graded notion of fairness. This would allow us to amend the principle above as follows:</p>
<blockquote>
<p><strong>Graded causal fairness principle (GCFP)</strong>: <em>Factoring a piece of evidence about someone into our decision-making is fair to the degree that it is evidence that the person has properties that directly cause an outcome that we have reason to care about<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></em></p>
</blockquote>
<p>Since coincidental correlations will typically be weaker evidence of a causally-relevant property than correlations that are the result of a confounding variable, GCFP will typically say that it’s less fair to take into account properties that could just be coincidentally correlated with the outcome we care about.</p>
<p>Although this seems like an improvement, GCFP still doesn’t capture a lot of our intuitions about fairness. To see this, consider again the case of night shift workers. Suppose that we don’t yet know why night shift work is so predictive of not showing up to court. By GCFP, it would be fair for the judge to assign night shift workers higher bail as long as the correlation between night shift work and not showing up to court was sufficiently predictive, since a correlation being more predictive is evidence that there’s an underlying causally relevant factor. Once again, though, I think a lot of people would not consider this to be fair.</p>
<p>Let’s throw in a curve ball. Imagine that two candidates are being interviewed for the same position. Both seem equally likely to succeed, but each of them has one property that is consistently correlated with poor job performance. The first candidate is fluent in several languages, which has been found to be correlated with underperformance for reasons not yet known (getting bored in the role, perhaps). The second candidate got a needs-based scholarship in college, which has also been found to be correlated with underperformance for reasons not yet known (getting less time to study in college, perhaps).</p>
<p>Suppose the candidates both want the job equally and that these properties are equally correlated with poor performance. The company can hire both of the candidates, one of them, or neither. How unfair does it feel if the company hires the person fluent in many languages but not the person who received a needs-based scholarship to college? How unfair does it feel if the company hires the person who received a needs-based scholarship to college but not the person who is fluent in many languages?</p>
<p>I don’t know if others share my intuitions, but even if it feels unfair for the company to hire only one of the candidates instead of both or neither, the situation in which they reject the candidate who received a needs-based scholarship feels worse to me than the situation in which they reject the candidate who is fluent in several languages.</p>
<p>One possible explanation for this is that we implicitly believe in a kind of “predictive equality”.</p>
<p>We often need to make decisions based on facts about people that are predictive of the properties that are causally relevant to our decision but aren’t themselves causally relevant. We probably don’t feel so bad about this if the property in question is not generally disadvantageous, i.e. if, over the course of a person’s life, the property is just as likely to put them on the winning end of predictive decisions as on the losing end.</p>
<p>Let’s use the term “predictively disadvantageous properties” to refer to properties that need not be bad in themselves (they could be considered neutral or positive) but that are generally correlated with worse predicted outcomes. It often feels unfair to base our decisions on predictively disadvantageous properties because we can foresee that these properties will more often land someone on the losing end of predictive decisions.</p>
<p>Consider a young adult who was raised in poverty. They are likely to be predicted to have a higher chance of defaulting on a loan, more difficulty maintaining employment, and worse physical and mental health than someone who wasn’t raised in poverty. Using their childhood poverty as a predictor of outcomes is therefore likely to result in decisions fairly consistently being made in ways that assume worse outcomes for them. And it can be hard to do well—to get a loan to start a business, say—if people believe you’re less likely to flourish.</p>
<p>Cullen O’Keefe put this in a way that I think is useful (and I’m now paraphrasing): we want to make efficient decisions based on all relevant information, but we also want risks to be spread fairly across society. We could get both by just making the most efficient decisions and then redistributing the benefits of these decisions. But many people will have control only over one of these things: e.g. hirers have control over the decisions but not what to do with the surplus.</p>
<p>In order to balance efficiency and the fair distribution of risks, hirers can try to improve the accuracy of their predictions but also make decisions and structure outcomes in a way that mitigates negative compounding effects of predictively disadvantageous properties.</p>
<p>For example, imagine you’re an admissions officer considering whether to accept someone to a college and you know that students from disadvantaged areas tend to drop out more often. It would probably be bad to simply pretend that this isn’t the case when deciding which students to accept. Ignoring higher dropout rates could result in applicants from disadvantaged areas taking on large amounts of student debt that they will struggle to pay off if they don’t complete the course.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> But it might be good in the long-term if you err on the side of approving people from disadvantaged areas in more borderline cases, and if you try to find interventions that reduce the likelihood that these students will drop out.</p>
<p>Why should we think that doing this kind of thing is socially beneficial in the long-term? Because even if predictions based on features like childhood poverty are more accurate, failing to improve the prospects of people with predictively disadvantageous properties can compound their harms and create circumstances that it’s hard for people to break out of. Trying to improve the prospects of those with predictively disadvantageous properties gives them the opportunity to break out of a negative prediction spiral: one that they can find themselves in through no fault of their own.</p>
<p>But taking actions based on predictively negative properties doesn’t always seem unfair. Consider red flags of an abusive partner, like someone talking negatively about all or most of their ex-partners. Having a disposition to talk negatively about ex-partners is not a cause of being abusive, it’s predictive of being abusive. This makes it a predictively disadvantageous property, since it’s correlated with worse predictive outcomes. But being cautious about getting into a relationship with someone who has this property doesn’t seem unfair.</p>
<p>Maybe this is just explained by the fact that we want to make decisions that lead to better outcomes in the long-term. Long-term, encouraging colleges to admit fewer students from disadvantaged areas is likely to entrench social inequality, which is bad. Long-term, encouraging people to avoid relationships with those who show signs of being abusive is likely to reduce the number of abusive relationships, which is good.</p>
<p>How can we tell if our decisions will lead to better outcomes in the long-term? This generally requires asking things like whether our decision could help to detach factors that are correlated with harmful outcomes from those harmful outcomes (e.g. by creating the right incentives), whether they could help us isolate causal from non-causal factors over time, and whether the goals we have specified are the right ones. The short but unsatisfactory answer is: it’s complicated.</p>
<p><em>Thanks to <a href="http://robertlong.online/papersandtalks/">Rob Long</a> for a useful conversation on this topic and for recommending <a href="https://www.amazon.com/Discrimination-Disrespect-Oxford-Philosophical-Monographs-ebook/dp/B0171P7M6G/">Ben Eidelson’s book</a>, which I haven’t managed to read but will now recklessly recommend to others. Thanks also to Rob Long and <a href="https://cullenokeefe.com/">Cullen O’Keefe</a> for their helpful comments on this post.</em></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>I added the “have reason to care about” clause because if a judge cared about “being a woman and showing up to court” then gender would be causally relevant to the outcome we care about and therefore admissible, but it seems ad hoc and unreasonable to care about this outcome. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>An ideal but more complicated version of this principle would likely talk about the weight that we give to a piece of evidence rather than just whether it is a factor in our decision. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Thanks to Rob Long for pointing out this kind of case. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<h1>AI bias and the problems of ethical locality</h1>

<h2 id="summary">Summary</h2>
<p>In this post I argue that attempts to reduce bias in AI decision-making face two ‘ethical locality’ problems. The first ethical locality problem is the problem of <em>practical locality</em>: we are limited in what we can do because the actions available to us depend on the society we find ourselves in. The second ethical locality problem is the problem of <em>epistemic locality</em>: we are limited in what we can do because ethical views evolve over time and vary across regions.</p>
<p>The practical locality problem implies that we can have relatively fair procedures whose outputs nonetheless reflect the biases of the society they are embedded in. The epistemic locality problem gives us reason to understand the problems of AI bias to be instances of the broader problem of AI alignment: or the problem of getting AI to act in accordance with our values. Given this, I echo <a href="https://onezero.medium.com/the-seductive-diversion-of-solving-bias-in-artificial-intelligence-890df5e5ef53">others</a> in saying that our goal should not be to ‘solve’ AI bias. Instead, our goal should be to build AI systems that mostly reflect current values on questions of bias and that facilitate and are responsive to the progress we make on these questions over time.</p>
<h2 id="jenny-and-the-clock-factory">Jenny and the clock factory</h2>
<p>You are a progressive factory owner in the 1860s. Your factory makes clocks and hires scientists to help develop the clocks, managers to oversee people, and workers to build the clocks. The scientists and managers are in low supply and the roles are paid well, while the workers are in higher supply and receive less compensation. You’ve already increased wages as much as you can, but you want to make sure your hiring practices are fair. So you hire a person called Jenny to find and recruit candidates to each role.</p>
<p>Jenny notes that in order to be a scientist or a manager, a person has to have many years of schooling and training. Women cannot currently receive this training and the factory cannot provide this training because it lacks the resources and expertise needed to do so. Many female candidates show at least as much promise as male candidates, but their lack of this crucial prior training makes them unsuited to any role except worker. Despite her best efforts, Jenny ends up hiring only men to the roles of scientist and manager, and hires both men and women as workers.</p>
<p>Jenny’s awareness of all the ways in which the factory’s hiring practices are unfair is limited, however, because there are sources of unfairness that have yet to be adequately recognized in the 1860s. For example, it is not considered unfair to reject candidates with physical disabilities for worker roles rather than trying to make adequate accommodations for these disabilities. Given this, Jenny rejects many candidates with physical disabilities rather than considering ways in which their disabilities could be accommodated.</p>
<h2 id="the-practical-locality-problem">The practical locality problem</h2>
<p>How fair is Jenny being with respect to gender? To try to answer this, we need to think about the relations between three important variables: gender (G), training (T) and hiring (H).</p>
<p>Deciding to hire a candidate only if they have relevant training (T→H) seems fair since the training is necessary for the job. Deciding to hire a candidate based on their gender alone (G→H) seems unfair, since gender is irrelevant to the job. The fact that women cannot receive the training (G→T) also seems unfair. But, unlike the relationship between T and H and the relationship between G and H, the relationship between G and T is exogenous to Jenny’s decision: it is one that Jenny cannot affect.</p>
<p>To model the situation, we can use dashed arrows to represent exogenous causal relationships—in this case, the relationship between G and T—and solid arrows to represent endogenous causal relationships. We can use red arrows to indicate causal relationships that actually exist between G, T, and H and we can use grey arrows to highlight the fairness of <em>possible</em> causal relationships. Jenny’s situation is as follows:</p>
<p align="center">
<img src="/images/ai_bias_1.png" />
</p>
<p>In this case, there is an important sense in which Jenny’s decision not to hire each woman who applies is fair, because Jenny would have made the same decision had a man with the same level of training applied. If women were given the necessary training, Jenny would hire them. If men were denied the necessary training, Jenny would not hire them. (Her decision therefore satisfies the counterfactual definition of fairness given by <a href="https://papers.nips.cc/paper/6995-counterfactual-fairness.pdf">Kusner et al</a>, though see <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3050650">Kohler-Hausmann</a> for a critique of counterfactual causal models of discrimination.)</p>
<p>But there is also an important sense in which the fact that Jenny hires only men into scientist and manager roles is unfair. The unfairness is upstream of Jenny. The outcome is unfair because her options are limited by unfair societal practices, i.e. by the fact that women are denied the schooling and training necessary to become scientists and managers.</p>
<p>I’m going to use the term ‘procedurally unfair’ to refer to decisions that are unfair because of unfairness in the decision-making procedure being used. <a href="https://arxiv.org/abs/1802.08139">Chiappa and Gillum</a> say that ‘a decision is fair toward an individual if it coincides with the one that would have been taken in a counterfactual world in which the sensitive attribute <em>along the unfair pathways</em> were different’. Building on this, I will say that a decision is procedurally unfair if it diverges from the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair <em>endogenous</em> pathways were different.</p>
<p>I’m going to use the term ‘reflectively unfair’ to refer to decisions that may or may not be procedurally unfair, but whose inputs are the result of unfair processes, and where the outcomes ‘reflect’ the unfairness of those processes. This is closely related to <a href="https://arxiv.org/abs/1907.06430">Chiappa and Isaac’s</a> account of the unfairness of a dataset as ‘the presence of an unfair causal path in the data-generation mechanism’. I will say that a decision is reflectively unfair if it diverges from the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair <em>exogenous</em> pathways were different.</p>
<p>Since decision-makers cannot always control or influence the process that generates the inputs to their decisions, the most procedurally fair options available to decision-makers can still be quite reflectively unfair. This is the situation Jenny finds herself in when it comes to hiring women as scientists and managers.</p>
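<p>Here is a minimal structural sketch of Jenny’s situation that makes these two definitions concrete. The variable names and the hiring rule are assumptions made for illustration only. Flipping the sensitive attribute along Jenny’s own (endogenous) pathway changes nothing, because her rule never looks at gender; flipping it along society’s (exogenous) G→T pathway changes the decision, which is the sense in which the outcome is reflectively unfair:</p>

<pre><code class="language-python"># Structural sketch of the G (gender) -> T (training) -> H (hired) example.
# society_training is the exogenous mechanism; jenny_hires is the endogenous one.

def society_training(gender):
    """Exogenous: in the 1860s, only men can obtain the schooling and training."""
    return gender == "man"

def jenny_hires(trained):
    """Endogenous: Jenny hires for scientist/manager roles based on training alone."""
    return trained

def actual_decision(gender):
    return jenny_hires(society_training(gender))

# Actual world: a woman applies for a scientist role and is not hired.
print(actual_decision("woman"))            # False

# Counterfactual along the endogenous pathway only: hold training at its actual
# value and flip the sensitive attribute. Jenny's rule ignores gender, so the
# decision is unchanged -- not procedurally unfair.
actual_training = society_training("woman")
print(jenny_hires(actual_training))        # False (same decision)

# Counterfactual along the exogenous pathway: let society's G -> T mechanism
# respond to the flipped attribute. The decision changes -- reflectively unfair,
# because the unfairness sits upstream of Jenny.
print(actual_decision("man"))              # True (decision flips)
</code></pre>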
<p>When it comes to hiring and gender, Jenny has encountered what I will call the <em>practical locality</em> problem. The options available to Jenny depend on the practices of the society she is embedded in. This means that even the most procedurally fair choice can reflect the unfair practices of this society. (What’s worse is that all of the options available to Jenny may not only reflect but to some degree reinforce those practices. Hiring women who cannot perform well in a given role and failing to hire any women into those roles could <em>both</em> be used to reinforce people’s belief that women are not capable of performing these roles.)</p>
<h2 id="the-epistemic-locality-problem">The epistemic locality problem</h2>
<p>How fair is Jenny with respect to disability status? I think that Jenny is being unfair to candidates with physical disabilities. But the primary cause of her unfairness isn’t malice or negligence: it’s the fact that Jenny lives in a society that hasn’t yet recognized that her treatment of those with physical disabilities is unfair. Although we may wish Jenny would realize this, we can hardly call it negligent of Jenny to not have advanced beyond the moral understanding of almost all of her contemporaries.</p>
<p>If we use D to indicate disability status and a subscript to indicate the values and beliefs that a decision is considered fair or unfair with respect to (i.e. FAIR<sub>X</sub> means ‘this was generally considered fair in year X’), the model of Jenny’s situation is fairly simple:</p>
<p align="center">
<img src="/images/ai_bias_2.png" />
</p>
<p>When it comes to hiring and disability, Jenny is facing what I will call the <em>epistemic locality problem</em>. As we learn more about the world and reflect more on our values, our ethical views become more well-informed and coherent. (For moral realists, they can get better simpliciter. For subjectivists, they can get better by our own lights.) The limits of our collective empirical knowledge and our collective ethical understanding can place limits on how ethical it is possible for us to be at a given time, even by our own lights. This is the epistemic locality problem.</p>
<p>I call these problems ‘ethical locality’ problems because they’re a bit like ethical analogs of the <a href="https://en.wikipedia.org/wiki/Principle_of_locality">principle of locality</a> in physics. The practical locality problem points to the fact that the set of actions available to us is directly impacted by the practices of those close to us in space and time. The epistemic locality problem points to the fact that our ethical knowledge is directly impacted by the ethical knowledge of those that are close to us in space and time. (But, as in physics, the causal chain that generated the local circumstances may go back a long way.)</p>
<h2 id="current-ai-systems-and-the-problems-of-ethical-locality">Current AI systems and the problems of ethical locality</h2>
<p>Are AI systems in the 2020s in a qualitatively different position than the one Jenny finds herself in? Do they have a way of avoiding these two ethical locality problems? It seems clear to me that they do not.</p>
<p>AI systems today face the practical locality problem because we continue to live in a society with a deeply unfair past that is reflected in current social institutions and practices. For example, there are still large differences in education across countries and social groups. This doesn’t mean that there isn’t a lot of work that we need to do to reduce procedural bias in existing AI systems. But AI systems with little or no procedural bias as defined above will still make decisions or perform in ways that are reflectively biased, just as Jenny does.</p>
<p>AI systems today also face the epistemic locality problem. Even if we think we have made a lot of progress on questions of bias since the 1860s, we are still making progress on what constitutes bias, who it is directed at, and how to mitigate it. And there are almost certainly attributes that we are biased against that aren’t currently legally or ethically recognized. In the future, the US may recognize social class and other attributes as targets of bias. The <a href="https://en.wikipedia.org/wiki/Suspect_classification">standards used</a> to identify such attributes are also likely to change over time.</p>
<p>Future accounts of bias may also rely less on the concept of a sensitive attribute. Sensitive attributes like gender, race, etc. are features of people that are often used to discriminate against them. Although it makes sense to use these broad categories for legal purposes, it seems likely that more characteristics are discriminated against than the law currently recognizes (or can feasibly recognize). In the future, our concept of bias could be sensitive to bias against individuals for idiosyncratic reasons, such as bias against a job candidate because their parents didn’t donate to the right political party.</p>
<p>I hope it’s not controversial to say that we probably haven’t reached the end of moral progress on questions of bias. This means we can be confident that current AI systems, like Jenny, face the problem of epistemic locality.</p>
<h2 id="consequences-of-the-practical-locality-problem-for-ai-ethics">Consequences of the practical locality problem for AI ethics</h2>
<p>The practical locality problem shows that we can have procedurally fair systems whose outputs nonetheless reflect the biases of the society they are embedded in. Given this, I think that we should try to avoid implying that AI systems that are procedurally fair by our current standards are fair simpliciter. Suppose the factory owner were to point to Jenny and say ‘I know that I’ve only hired men as scientists and managers, but it’s Jenny that made the hiring decisions and she is clearly a fair decision-maker.’ By focusing on the procedural fairness of the decisions only, the owner’s statement downplays their reflective unfairness.</p>
<p>We therefore need to be aware of the ways in which AI systems can contribute to and reinforce existing unfair processes even if those systems are procedurally fair by our current standards.</p>
<p>The practical locality problem also indicates that employing more procedurally fair AI systems is not likely to be sufficient if our goal is to build a fair society. Getting rid of the unfairness that we have inherited from the past—such as different levels of investment in education and health across nations and social groups—may require proactive interventions. We may even want to make decisions that are <em>less</em> procedurally fair in the short term if doing so will reduce societal unfairness in the long term. For example, we could think that positive discrimination is procedurally unfair and yet all-things-considered justified.</p>
<p>Whether proactive interventions like positive discrimination are effective at reducing societal unfairness (as we currently understand it) is an empirical question. Regardless of how it lands, we should recognize that increasing procedural fairness may compete with other things we value, such as reducing societal unfairness. Building beneficial AI systems means building systems that make appropriate trade-offs between these competing values.</p>
<h2 id="consequences-of-the-epistemic-locality-problem-for-ai-ethics">Consequences of the epistemic locality problem for AI ethics</h2>
<p>If we think we have not reached the end of moral progress on ethical topics like bias, the language of ‘solving’ problems of bias in AI seems too ambitious. We can build AI systems that are less procedurally biased, but saying that we can ‘solve’ a problem implies that the problem is a fixed target. The ethical problems of bias are best thought of as moving targets, since our understanding of them updates over time. Rather than treating them like well-formed problems just waiting for solutions, I suspect we should aim to improve our performance with respect to the current best target. (This is consistent with the view that there are particular subproblems relating to AI bias that are fixed targets and can be solved.)</p>
<p>In general, I think a good rule of thumb is ‘if a problem hasn’t been solved despite hundreds of years of human attention, we probably shouldn’t build our AI systems in a way that presupposes finding the solution to that problem.’</p>
<p>If our values change over time—i.e. if they update as we get more information and engage in more ethical deliberation—then what is the ultimate goal of work on AI bias and AI ethics more generally? I think it should be to build AI systems that are aligned with our values, and that promote and are responsive to ongoing moral progress (or ‘moral change’ for those that don’t think the concept of progress is appropriate here). This includes changes in our collective views about bias.</p>
<p>What does it mean to say that we should build AI systems that ‘align with our values’? Am I saying that systems should align to actual preferences, ideal preferences, or partially ideal preferences? Am I saying that they should align to individual or group preferences and, if the latter, how do we aggregate those preferences and how do we account for differences in preferences? Moreover, how do we take into account problems like the tyranny of the majority or unjust preferences? These are topics that I will probably return to in other posts (see <a href="https://arxiv.org/abs/2001.09768">Gabriel (2020)</a> for a discussion of them). For the purposes of this post, it is enough to say that building AI systems that align with our values means building AI systems that reflect current best practices on issues like bias.</p>
<p>Progress on AI alignment is imperative if we want to build systems that reflect our current and future values about bias.</p>
<p>Problems in AI bias also constitute concrete misalignment problems. Building systems that don’t conflict with our values on bias means giving the right weight to any values that conflict, figuring out how to respond to differences in values across different regions, and building systems that are consistent with local laws. These present us with very real, practical problems when it comes to aligning current AI systems with our values. More powerful AI systems will likely present novel alignment problems, but the work we do on problems today could help build out the knowledge and infrastructure needed to respond to the alignment problems that could arise as AI systems get more powerful.</p>
<p>If this picture is correct then the relationship between AI alignment and AI bias is bidirectional. Progress in AI alignment can help us to improve our work on AI bias, and progress in AI bias can help us to improve our work on the problem of AI alignment.</p>
<p><em>Thanks to Miles Brundage, Gretchen Krueger, Arram Sabeti and others that provided useful comments on the drafts of this post.</em></p>

<h1>When robustly tolerable beats precariously optimal</h1>

<p>Something is “robustly tolerable” if it performs adequately under a wide range of circumstances. Robustly tolerable things have decent insulation against negative shocks. A car with excellent safety features but a low top speed is robustly tolerable. A fast but dangerous sports car is not.</p>
<p>We often have to pay a price to make something more robustly tolerable. Sometimes we need to trade off performance. If I can only perform an amazing gymnastics routine 10% of the time, it might be better for me to opt for a less amazing routine that I can get right 90% of the time. Sometimes we need to trade off agility. If a large company develops checks on their decision-making processes over time, this may make their decisions more robustly tolerable but reduce the speed at which they can make those decisions.</p>
<p>Being robustly tolerable is not a particularly valuable trait when the expected costs of failure are low, but it’s an extremely valuable trait when the expected costs of failure are high. The more high impact something is—the more widely a technology is used or the more important a piece of infrastructure is, for example—the more we want it to be robustly tolerable. When a lot is on the line, we’re more likely to opt for a product that is worse most of the time but has fewer critical failures.</p>
<p>What are examples where the expected costs of failure are high? It’s clearly very bad if an entire country is suddenly governed poorly. The costs of total failures of governance have historically been very high. This is why being robustly tolerable is a very desirable feature of large-scale governance structures. If your country is already functioning adequately with democratic elections, term limits, a careful legislative process, and checks on power—several branches of government, an independent judiciary, a free press, laws against corruption, and so on—then it seems less likely to suddenly be plunged into an authoritarian dictatorship or to experience political catastrophes like hyperinflation or famine.</p>
<p>I think we can undervalue the property of being robustly tolerable. When we see something that is robustly tolerable, sometimes all we see is a thing that could clearly perform better. (The car could go faster, the decision-making process could be less burdensome, etc.) We don’t take into account the fact that—even if the thing never behaves optimally—it’s also less likely to do something terrible. How well something functions is often in plain sight. But the downside risk isn’t visible most of the time, so it’s easy to forget to look at how robust its performance is. Overlooking robustness could be especially harmful if the only way to improve something’s performance involves making it less robust.</p>
<p>For example, if a candidate we dislike gets elected, it can be tempting to blame the democratic process that allowed it to happen. People can even claim that it would be better to have less democracy than to have people elect such bad representatives. But the very same democratic process often limits the power of that individual and lets people vote them out. A benevolent dictatorship may seem surprisingly alluring in bad times, but any political system that enables a benevolent dictatorship also puts you at much greater risk of a malevolent one. (As an aside, I find it a bit odd when people’s reaction to “bad decisions by the electorate” is to give up on democracy rather than, say, trying to build a more educated and informed electorate.)</p>
<p>Actions and plans can also vary in how robustly tolerable they are. Risk-taking behaviors like starting a company are generally less robustly tolerable, while lower-variance plans, e.g. getting a medical degree, are more robustly tolerable. In line with what I noted in <a href="https://askell.io/posts/2020/06/optimal-failure">a previous post</a>, we should generally be in favor of robustly tolerable actions and plans when the expected cost of failure is high, and in favor of more fragile but high-yield behaviors and plans when the expected cost of failure is low.</p>
<p>Being robustly tolerable is not always a virtue worth having more of. We can tip the balance too far in favor of robustness, and we can sacrifice too much performance or agility in order to achieve it. If we do, we can find ourselves in a robust mediocrity that is difficult to get out of. (You may believe that some of the examples I give above are robustly mediocre rather than robustly tolerable.)</p>
<p>But if something is robustly tolerable then the worst case scenarios are less likely and less harmful. This is a valuable trait to have in domains where the cost of failure is high. It’s also a trait that’s easy to overlook if we focus exclusively on how well something is performing in the here and now, and forget to consider how well it performs in the worst case scenario.</p>Amanda Askellamanda@askell.ioSomething is "robustly tolerable" if it performs adequately under a wide range of circumstances, including unexpectedly bad circumstances. In this post, I argue that when the costs of failure are high, it's better for something to be robustly tolerable even if this means taking a hit on performance or agility.The virtues and vices of shark curiosity2020-06-22T00:00:00-07:002020-06-22T00:00:00-07:00https://askell.io/posts/2020/06/shark-curiosity<p>In philosophy, you spend years learning how to attack arguments. If you keep doing philosophy, you’ll attack others and they’ll attack you in what feels like a kind of constant epistemic trial by fire. It’s not always fun, but it does seem to make people better at arguing.</p>
<p>Sometimes people ask how they can hone these skills. The least useful answer to this is some variant of “sorry, you just have to be good at it”. The degree to which argumentative skill is an innate talent is unclear. Even if most of those who end up in fields like philosophy are often innately good at it, this could just be an example of an unfortunate selection spiral in which only those who are innately good at the thing pursue it, and therefore only those who never really needed to be taught the thing end up teaching it.</p>
<p>A slightly more useful answer involves recommending texts on critical thinking, classes in formal logic, and so on. But this isn’t how most people become good at arguing. I haven’t ever taken a critical thinking class, and I didn’t learn formal logic until after I had already developed a lot of the skills that I’m talking about here. So what’s going on?</p>
<p>I once heard that sharks generally don’t bite people because they want to eat them. They bite people because they reflexively bite at anything that looks kind of like a fish (which can include humans) and because biting us is their way of trying to figure out what we are.</p>
<p>Like the intellectual equivalent of sharks, people who are very good at arguing seem to have a habit of reflexively attacking most claims and arguments that come their way. For example, they might see “up to 40% off” and get annoyed by the fact that the claim tells you nothing except that the store definitely won’t give you more than 40% off, which can be claimed by a store offering 0% off. Attacking a claim is their default response, even if the claim is fairly trivial.</p>
<p>For me, this reflex is often at its strongest when I’m confused by something. If someone puts forward a claim that doesn’t make sense to me, I do the intellectual equivalent of biting it to figure out what it is (i.e. I try to tear it apart). This strategy can be pretty effective, since people will often put effort into clarifying what they mean when their views are challenged.</p>
<p>So an effective way to improve your argumentative skills and become a clearer thinker may be to become more curious about the world and, at the same time, more aggressive towards it. You investigate more things, but your reflexive method of investigation is somewhat bitey. We can call this the “curious shark” approach. This strikes me as similar to what a lot of philosophy programs actually do in practice. They throw argument after argument at you and force you to come up with counterargument after counterargument. In order to get better at both defending and attacking, you’ll probably try to learn some logic or probability theory, but it’s the unrelenting practice that forces you to find better strategies over time. (Alan Hájek has <a href="https://aeon.co/essays/with-the-use-of-heuristics-anybody-can-think-like-a-philosopher">helpfully distilled</a> some of the strategies that many philosophers converge on.)</p>
<p>I think this partly explains why philosophers often end up defending pretty weird views. The discipline of philosophy is obsessed with argumentative prowess. Since it’s not all that hard to argue for something that most people find plausible, those arguments are not very impressive. But if you manage to argue that all possible worlds are real and meet the inevitable argumentative onslaught that follows, that’s pretty damn impressive. Arguing for an implausible conclusion is like tying your hands behind your back before entering a tank full of sharks. You’re definitely going to get attacked, but everyone will be all the more impressed if you come out successful.</p>
<p>One problem with the curious shark approach is that, from the point of view of anything they bite, sharks are assholes. That’s not bad for the shark because they don’t particularly want to make friends with the things they’re biting. But people <em>do</em> want to make friends with those around them (or at least not lose friends they have), and constantly tearing down their arguments isn’t exactly the best way to do that.</p>
<p>A related problem with the approach is that most ideas have to start out life as vulnerable little fish before they can grow into something more robust (see <a href="https://blog.samaltman.com/idea-generation">this post</a>). If you create an environment where people have to defend their ideas from hungry sharks from day one, people will learn to either hide their ideas or stop coming up with ideas in domains they’re not already extremely well-versed in.</p>
<p>This was true of my philosophy grad program. It was a competitive environment, which was good for honing your ideas once you’d been working on them for a while. But it felt like no one really wanted to express nascent ideas. You knew that if you put forward an idea it would be attacked ruthlessly. So it made more sense to hole away and do the work yourself, and to only show your ideas when they had grown robust enough to withstand the attack. This is unfortunate because early discussions of ideas can be extremely helpful, and is presumably how you get the most value from having other grad students around.</p>
<p>I’ve also experienced the other extreme. I once went to a conference that was trying to move away from the traditional aggression seen in philosophy conferences and embrace a more supportive atmosphere. I thought I saw a problem in a paper and stated it honestly in the Q&A. I felt like my problem was never really fully addressed but most of the remaining comments were things like “here’s an interesting domain where your analysis might apply” or “have you read so-and-so’s related work? I think you’d like it.” At the time I felt like I’d breached a social norm by pointing out a problem with the paper so bluntly, but I also felt like I was doing a bigger favor to the author than any of the more supportive commenters were because the paper would be strengthened the most by fixing problems like the one I was pointing to.</p>
<p>So what are we supposed to do here? If we’re too aggressive with ideas we can kill promising but unrefined ideas when they’re most vulnerable, but if we coddle ideas we can fail to strengthen them early on and set them on the right path (or, worse, let someone work on an idea for a long time that really should have been abandoned much sooner).</p>
<p>Some people try to get around this dilemma by distinguishing between aggressive content and an aggressive tone. The thought is that if we deliver our biting criticism with a kinder tone, we can avoid the chilling effect that comes with biting criticism. It’s true that an aggressive tone can make an intellectual attack feel even more stressful, and perhaps an aggressive tone should never be necessary. But I don’t think a friendlier tone would fully eliminate the chilling effect or the “you’re an asshole” effect. It’s a little bit like moving from a barroom-brawl to a well-regulated boxing match: rules might help, but getting punched in the face is still going to hurt.</p>
<p>Here’s the only thing I’ve found that helps: I point out problems with ideas at every stage of development, but I try my hardest to solve any problem that I identify. Even if I don’t succeed in getting over my own objection, I make an effort. If you show that the goal of your attack isn’t to merely destroy the other person’s idea and declare a personal victory, but to jointly get at the truth and build on whatever part of the idea seems promising, the attacks you level are more likely to have the effect of strengthening rather than killing a promising but unrefined idea. And if the idea does die (as some ideas will), it’s more likely to do so because you’ve both tried to make it work and jointly concluded that it won’t, which ideally doesn’t discourage the other person from voicing similar promising but unrefined ideas in the future.</p>
<p>So if you want to become a sharper thinker, the adversarial training you get from habitually attacking ideas and welcoming attacks from others seems pretty effective. But I think you can do this while minimizing the chilling effect and the “you’re an asshole” effect by treating it as your job to try to counter your own attacks to the best of your ability. I’m not sure if this is the best solution to this problem, but it’s the best one I’ve come up with so far.</p>Amanda Askellamanda@askell.ioEmbracing the kind of aggressive curiosity of sharks seems to be a good way of getting better at arguing. But it can have a chilling effect on discourse and friendships. In this post, I explain what I mean by shark curiosity, and how we can strike the right balance between nurturing and testing new ideas.The optimal rate of failure2020-06-15T00:00:00-07:002020-06-15T00:00:00-07:00https://askell.io/posts/2020/06/optimal-rate-of-failure<p>It was apparently George Stigler that said “If you never miss a plane, you’re spending too much time at the airport.” The broader lesson is that if you find you’re never failing, there’s a good chance you’re being too risk averse, and being too risk averse is costly. Although people have discussed this principle in other contexts (e.g. in <a href="https://www.nature.com/articles/s41467-019-12552-4">learning</a> and <a href="http://www.paulgraham.com/swan.html">startup investing</a>), I still think that this lesson is generally underappreciated. For anything we try to do, the optimal rate of failure often isn’t zero: in fact, sometimes it’s very, very far from zero.</p>
<p>To give a different example, I was having an argument with a friend about whether some new social policy should be implemented. They presented some evidence that the policy wouldn’t be successful and argued that it therefore shouldn’t be implemented. I pointed out that we didn’t need to show that the policy would be successful; we just needed to show that the expected cost of implementing it was lower than the expected value we’d get back both in social value and—more importantly—information value. Since the policy in question hadn’t been tried before, wasn’t expensive to implement, and was unlikely to be actively harmful, the fact that it would likely be a failure wasn’t, by itself, a convincing argument against implementing it. (It looks like a similar argument is given on p. 236-7 of <a href="https://www.amazon.com/gp/product/1594205221/">this book</a>.)</p>
<p>This is why I often find myself saying things like “I think this has about a 90% chance of failure—we should totally do it!” (Also, there’s a reason why I’m not a startup founder or motivational speaker.)</p>
<p>The expected value of trying anything is just the sum of (i) the expected gains if it’s successful, (ii) the expected losses if it fails, and (iii) the expected cost of trying. This includes direct value (some benefit or loss to you or the world), option value (being in a better or worse position in the future) and information value (having more or less knowledge for future decisions).</p>
<p>The optimal rate of failure indicates how often you should expect to fail if you’re taking the right number of chances. So we can use our estimates of (i), (ii), and (iii) to work out what the optimal rate of failure for a course of action is, given the options available to us. The optimal rate of failure will be lower whenever trying is costly (e.g. trying it takes years and cheaper options aren’t available), failure is really bad (e.g. it carries a high risk of death), and the gains from succeeding are low. And the optimal rate of failure will be higher whenever trying is cheap (e.g. you enjoy doing it), the cost of failure is low, and the gains from succeeding are high.</p>
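<p>To make this a bit more concrete, here is a minimal sketch in Python of the decomposition above. The function name and the specific numbers (the probability of success, the size of the gains and losses, and the cost of trying) are made up purely for illustration.</p>
<pre><code>
# A rough sketch of the expected-value-of-trying decomposition described
# above: expected gains if it succeeds, expected losses if it fails, and
# the cost of trying. All numbers are illustrative assumptions.

def expected_value_of_trying(p_success, gain_if_success, loss_if_failure, cost_of_trying):
    """EV of trying = p * gain + (1 - p) * loss - cost of trying."""
    return (p_success * gain_if_success
            + (1 - p_success) * loss_if_failure
            - cost_of_trying)

# "This has about a 90% chance of failure -- we should totally do it!":
# a cheap attempt with a small downside and a large upside is worth making
# even though it will probably fail.
print(expected_value_of_trying(p_success=0.1,
                               gain_if_success=100,
                               loss_if_failure=-1,
                               cost_of_trying=2))
# 0.1 * 100 + 0.9 * (-1) - 2 = 7.1, so the attempt has positive expected value.
</code></pre>
<p>On this framing, the optimal rate of failure is just the failure rate of whichever available option has the highest expected value, and nothing forces that rate towards zero.</p>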
<p>If the optimal rate of failure of the best course of action is high, it may be a good thing to see a lot of failure (even though the course of action is best in spite of, rather than because of, its high rate of failure). I think we’re often able to internalize this: we recognize that someone has to play a lot of terrible music before they become a great musician, for example. But we’re not always good at internalizing the other side of this coin: if you never see someone fail, there’s a good chance that they’re doing something very wrong. If someone wants to be a good musician, it’s better to see them failing than to never hear them play.</p>
<p>So far, this probably reads like a life or business advice article (“don’t just promote people who succeed, or you’ll promote people who never take chances!”). But I actually think that failing to reflect on the optimal rate of failure can have some pretty significant ethical consequences too.</p>
<p>Politics is a domain in which things can go awry if we don’t stop to think about optimal rates of failure. Politicians have a strong personal incentive to not have the responsibility of failure pinned directly on them. We can see why if we consider the way that George H.W. Bush used <a href="https://en.wikipedia.org/wiki/Willie_Horton#Horton_in_the_1988_presidential_campaign">the case of Willie Horton</a> against Michael Dukakis in the 1988 presidential campaign. If a Massachusetts furlough program had not existed, Bush couldn’t have pointed to this case in his campaign. Not having any furlough program may be quite costly to many prisoners and their families, but “Dukakis didn’t support a more liberal furlough program” is unlikely to show up on many campaign ads. Now I don’t know if the Massachusetts furlough program was a good idea or not, but if politicians are held responsible for the costs of trying and failing but not for the costs of not trying, we should expect the public to pay the price of their risk aversion. (More generally, if we never see someone fail, we should probably pay more attention to whether it is them or someone else that bears the costs of their risk aversion.)</p>
<p>I think this entails some things that are pretty counterintuitive. For example, if you see crimes being committed in a society, you might think this is necessarily a bad sign. But if you were to find yourself in a society with no crime, it’s not very likely that you’ve stumbled into a peaceful utopia: it’s more likely that you’ve stumbled into an authoritarian police state. Given the costs that are involved in getting crime down to zero—e.g. locking away every person for every minor infraction—the optimal amount of crime we should expect to see in a well-functioning society is greater than zero. To put it another way: just as seeing too much crime is a bad sign for your society, so is seeing too little.</p>
<p>We can accept that seeing too little crime can be a bad sign even if we believe that every instance of crime is undesirable and that, all else being equal, it would be better for us to have no crime than for us to have any crime at all. We can accept both things because “all else being equal” really means “if we hold fixed the costs in both scenarios”. But if you hold fixed the costs of eliminating a bad thing then it is, of course, better to have less of it than more.</p>
<p>One objection that’s worth addressing here is this: can’t we point to the optimal rate of failure to claim that we were warranted in taking almost any action that later fails? I think that this is a real worry. To mitigate it somewhat, we should try to make concrete predictions about optimal rates of failure of our plans in advance, to argue why a plan is justified even if it has a high optimal rate of failure, and to later assess whether the actual rate of failure was in line with the predicted one. This doesn’t totally eliminate the concern, but it helps.</p>
<p>I first started thinking about optimal rates of failure in relation to issues in effective altruism. The first question I had is: what is the optimal rate of failure for effective interventions? It seems like it might actually be quite high because, among other things, people are more likely to under-invest in domains with a high risk of failure, because of risk aversion or loss aversion or whatever else. I still think this is true, but I also think that in recent years there has been a general shift towards greater exploration over exploitation when it comes to effective interventions.</p>
<p>The second question I had is: what is the optimal rate of failure for individuals who want to have a positive impact and the plans they are pursuing? Again, I think the optimal rate of failure might be relatively high here, and for similar reasons. But this raises the following problem: taking risks is something a lot of people cannot afford to do. The optimal rate of failure for someone’s plans depends a lot on the cost of failure. If failure is less costly for someone, they are more free to pursue things that have a greater expected payoff but a higher likelihood of failure. Since people without a safety net can’t afford to weather large failures, they’re less free to embark on risky courses of action. And if these less risky courses of action produce less value for themselves and for others, this is a pretty big loss to the world.</p>
<p>To put it another way: if you’re able to behave in a way that’s less sensitive to risks, you’re probably either pretty irrational or pretty privileged. Since many of the people who could do the most good are not that irrational and not that privileged, enabling them to choose a more risk neutral course of action might itself be a valuable cause area. Investing in individuals or providing insurance against failure for those pursuing ethical careers would enable more people to take the kinds of risks that are necessary to do the most good.</p>Amanda Askellamanda@askell.ioWe sometimes assume that seeing someone fail implies that they are doing something wrong, but I argue that the ideal rate at which our plans should fail is often quite high. I note that this has consequences in politics and ethics that are often underappreciated.Does deliberation limit prediction?2019-07-09T00:00:00-07:002019-07-09T00:00:00-07:00https://askell.io/posts/2019/07/deliberation<p>There is a longstanding debate about the claim that “deliberation crowds out prediction” (DCP). The question at the center of this debate is whether I can treat an action as a live option for me and at the same time assign a probability to whether I will do it. Spohn and Levi argue that we cannot assign such probabilities, for example, while Joyce and Hájek argue that we can.</p>
<p>A claim related to DCP that I’ve been thinking about is as follows:</p>
<blockquote>
<p><strong>Deliberation limits prediction (DLP)</strong>: If an agent is free to choose between her options, it will not always be possible to predict what action an agent will perform in a given state even if (i) we have full information about the state and the agent, and (ii) the agent does not use a stochastic decision procedure.</p>
</blockquote>
<p>DLP is weaker than DCP in at least one respect: it doesn’t say that agents can never make accurate predictions about things they are deliberating about, just that they can’t always do so. DLP is also stronger than DCP in at least one respect: it extends to the predictions that others make about the actions of agents and not just to the predictions that agents make about themselves.</p>
<p>Here is a case that I think we can use to support a claim like DLP:</p>
<blockquote>
<h3 id="the-prediction-machine">The Prediction Machine</h3>
<p>Researchers have created a machine that can predict what someone will do next with 99% accuracy. One of the new test subjects, Bob, is a bit of a rebel. If someone predicts he’ll do something with probability ≥50%, he’ll choose not to do it. And if someone predicts he’ll do something with probability <50%, he’ll choose to do it. The prediction machine is 99% accurate at predicting what Bob will do when Bob hasn’t seen its prediction. The researchers decide to ask the machine what Bob will do next if Bob is shown its prediction.</p>
</blockquote>
<p>We know that no matter what the machine predicts, Bob will try to act in a way that makes its prediction inaccurate. So it seems that either the prediction machine won’t accurately predict what Bob will do, or Bob won’t rebel against the machine’s prediction. The first possibility is in tension with the claim that we can always accurately predict what an agent will do if we have access to enough information, while the second possibility is in tension with the claim that Bob is free to choose what to do.</p>
<p>(Note that we could turn this into a problem involving self-prediction by supposing that Bob is both the prediction machine and the rebellious agent: i.e. that Bob is very good at predicting his own actions and is also inclined to do the opposite of what he ultimately predicts. But since self-prediction is more complex and DLP isn’t limited to self-prediction, it’s helpful to illustrate it with a case in which Bob and the prediction machine are distinct.)</p>
<p>The structure of the prediction machine problem is similar to that of many problems of self-reference (e.g. the grandfather paradox, the barber paradox, and the halting problem). It’s built on the following general assumptions:</p>
<blockquote>
<p><strong>Prediction:</strong> there is a process f that, for any process, always produces an accurate prediction about the outcome of that process</p>
<p><strong>Rebellion:</strong> there is a process g that, when fed a prediction about its behavior, always outputs a behavior different than the predicted behavior</p>
<p><strong>Co-implementation:</strong> the process g(f) is successfully implemented</p>
</blockquote>
<p>In this case, f is whatever process the prediction machine uses to predict Bob’s actions, g is the (deterministic) process that Bob uses when deciding between actions, and g(f) is implemented whenever Bob uses f as a subroutine of g.
We can see that if process g(f) is implemented then either f does not produce an accurate prediction (contrary to Prediction) or g does not output a behavior different than the predicted one (contrary to Rebellion). Therefore it cannot be the case that there exists a process f and there exists a process g and the process g(f) is implemented, contrary to Co-implementation. So if agents are free to act and to use a deterministic decision procedure like Bob’s to pick their actions, it will not be possible to predict what they will do in all states (e.g. those described in the prediction machine example) even if we have full information about the state and the agent, as DLP states.</p>
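<p>For readers who like to see the structure laid out, here is a small Python sketch of the argument, assuming for simplicity that the candidate predictor just outputs a fixed guess and that Bob only chooses between two actions. The function names and the two-action setup are my own illustrative assumptions rather than part of the formal argument above.</p>
<pre><code>
# A toy version of Prediction, Rebellion, and Co-implementation.
# "make_predictor" builds a candidate predictor f; "bob" is the rebellious
# deterministic process g, which consults f and then does the opposite.

def make_predictor(guess):
    """A candidate predictor f that always outputs a fixed guess."""
    def predict(agent):
        return guess
    return predict

def bob(predict):
    """Bob's decision procedure g: ask the predictor what he will do,
    then do the other thing. Calling predict here is g using f as a
    subroutine, i.e. the co-implementation g(f)."""
    prediction = predict(bob)
    return "stay" if prediction == "go" else "go"

for guess in ("go", "stay"):
    predict = make_predictor(guess)
    actual = bob(predict)
    print(guess, actual, guess == actual)

# Whichever prediction f outputs, g(f) outputs something else, so no
# predictor of this form is accurate here: Prediction, Rebellion, and
# Co-implementation cannot all hold at once.
</code></pre>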
<p>Joyce (p. 79–80) responds to a similar style of argument against our ability to assign subjective probabilities to actions. The argument is that “Allowing act probabilities might make it permissible for agents to use the fact that they are likely (or unlikely) to perform an act as a reason for performing it.” Joyce’s response to this argument is as follows:</p>
<blockquote>
<p>I entirely agree that it is absurd for an agent’s views about the advisability of performing any act to depend on how likely she takes that act to be. Reasoning of the form “I am likely (unlikely) to A, so I should A” is always fallacious. While one might be tempted to forestall it by banishing act probabilities altogether, this is unnecessary. We run no risk of sanctioning fallacious reasoning as long as A’s probability does not figure into the calculation of its own expected utility, or that of any other act. No decision theory based on the General Equation will allow this. While GE requires that each act A be associated with a probability P(• || A), the values of this function do not depend on A’s unconditional probability (or those of other acts). Since act probabilities “wash out” in the calculation of expected utilities in both CDT and EDT, neither allows agents to use their beliefs about what they are likely to do as reasons for action.</p>
</blockquote>
<p>Joyce’s General Equation states that the expected value of an action is the probability of a state given that the action is performed (e.g. the state of getting measles given that you received a measles vaccine), multiplied by the utility of the outcome of performing that act in that state (e.g. the utility of the outcome “received vaccine and got measles”), where we sum over all possible states. This is expressed as <em>Exp(A) = Σ P(S || A) u(o[A, S])</em>.</p>
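<p>Here is a small numerical sketch of the General Equation for the vaccine example, with probabilities and utilities that I have invented purely for illustration. The point to notice is that the unconditional probability of performing either act never enters the calculation, which is what Joyce means by act probabilities “washing out”.</p>
<pre><code>
# Exp(A) = sum over states S of P(S || A) * u(o[A, S]).
# The probabilities and utilities below are invented for illustration.

def expected_utility(p_state_given_act, utility_of_outcome):
    """Joyce's General Equation for a single act A."""
    return sum(p_state_given_act[s] * utility_of_outcome[s]
               for s in p_state_given_act)

acts = {
    "vaccinate": {
        "p": {"measles": 0.01, "no measles": 0.99},
        "u": {"measles": -100, "no measles": -1},
    },
    "don't vaccinate": {
        "p": {"measles": 0.30, "no measles": 0.70},
        "u": {"measles": -100, "no measles": 0},
    },
}

for act, spec in acts.items():
    print(act, expected_utility(spec["p"], spec["u"]))

# Only P(S || A) and u(o[A, S]) appear; the agent's credence that she will
# in fact perform A plays no role in Exp(A).
</code></pre>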
<p>But suppose that Bob derives some pleasure from acting in a way that is contrary to his or others’ predictions about how he will act. If this is the case, it certainly does not seem fallacious for his beliefs about others’ predictions of his actions to play a role in his deliberations (Joyce’s comments don’t bear on this question). Moreover, it does not seem fallacious for his own prior beliefs about how he will act to play a role in his decision about how to act, even if such reasoning would result in a situation in which he either fails to accurately predict his own actions or fails to act in accordance with his own preferences. (Similar issues are also discussed in Liu & Price, p. 19–20)</p>
<p>When confronted with self-reference problems like this, we generally deny either Prediction or Rebellion. The halting problem is an argument against its variant of Prediction, for example. It shows that there is no program that can detect whether any program will halt on any input. (If the halting program is computable then the program that uses it as a subroutine is also computable, meaning that we can’t drop Rebellion and retain Prediction in response to it). The grandfather paradox, on the other hand, is generally taken to be an argument against its variant of Rebellion: there’s an important sense in which you can’t travel back in time and kill your own grandfather.</p>
<p>Denying Co-implementation is less common. This is because there is often no independent reason for thinking that g and f can never be co-implemented. And the argument shows that there is no instance in which f and g could ever be co-implemented, which remains true even if no one ever actually attempts to do so. Most of us would conclude from this that the processes cannot be co-implemented. (One could, in the spirit of compatibilism, argue that all we have shown is that f and g are never co-implemented and not that they cannot be co-implemented, but I assume most would reject this view.)</p>
<p>In the case of the prediction machine, we can deny that it’s possible for Bob to act in a way that’s contrary to the predictions that are made about him. This might be defensible in the case of self-prediction: if Bob cannot prevent himself from forming an accurate prediction about what he will do between the time that he forms the intention to act and the time that he acts, then he will never be able to rebel against his own predictions. But it is much less plausible in cases where Bob is responding to the predictions of others.</p>
<p>Alternatively, we could try to argue that Bob and the prediction machine will simply never communicate: perhaps every time the researchers try to run this experiment the machine will break down or spit out nonsense, for example. But this response is unsatisfactory for the reasons outlined above.</p>
<p>Finally, we could simply embrace DLP and concede that we cannot always produce accurate predictions about what agents like Bob will do, even if we have access to all of the relevant information about Bob and the state he is in. Embracing DLP might seem like a bad option, but the states we’ve identified in which we can’t make accurate predictions about agents are states in which our predictions causally affect the very thing that we are attempting to predict. It might not be surprising if it’s often impossible to make accurate predictions in cases where our predictions play this kind of causal role.</p>
<p>Conclusion: It seems like DLP could be true but, if it is, it might not be something that should concern us too much.</p>Amanda Askellamanda@askell.ioThere is a longstanding debate about whether deliberation prevents us from making any predictions about actions. In this post I will argue for a weaker thesis, namely that deliberation limits our ability to predict actions.Disagreeing with content and disagreeing with connotations2018-02-13T00:00:00-08:002018-02-13T00:00:00-08:00https://askell.io/posts/2018/02/content-and-connotations<p>Suppose someone writes an article entitled “rates of false sexual assault accusation on the rise”. Now, suppose you care about sexual assault victims and you’re worried about unreported sexual assaults. When you see a title like this you think “this person just wants to smear sexual assault victims” and you promptly conclude that the article is wrong or that the person writing it has malicious intentions. (This article title and content are made up: the idea is just that it’s a controversial claim that might nonetheless be well supported.)</p>
<p>We often have a reflexive reaction to an article like this that we don’t even notice. It starts with a reasonable-looking inference: “This article is wrong, therefore something in the article must be wrong.” You then either dismiss the article outright (“false accusation rates are not increasing”) or you try to find some claim the article makes that is false and that blocks the conclusion (“one of the key studies you appealed to here isn’t very good”) or you just point out that the authors must have immoral views (“you’re claiming we shouldn’t believe the victims of sexual assault.”)</p>
<p>It’s possible that the article does in fact contain an error and is incorrect, in which case it’s good that you pointed out the error. But it’s also possible that if you sat down and read the article closely, you wouldn’t actually be able to find any key claim, argument, or conclusion in the article that you truly disagree with. For example, the article on false accusation rates may contain no errors and be fairly humble in its conclusions. It may be a completely accurate and fairly boring report on recent studies into, say, prosecution rates for malicious false accusations that doesn’t say anything about how we should respond to this increase. You might still feel like you disagree with the article, but you can’t actually point the author to precisely what you think they got wrong.</p>
<p>This leads to a really bad dynamic between authors and their critics in which the author feels unfairly maligned: they were trying to say something true and reasonable and now all these people on the internet keep misconstruing what they are saying or offering objections that seem beside the point or are claiming that the author is a bad person. The critic doesn’t change their mind and is angry at the author for saying such false things and annoyed that they don’t see how wrong they are.</p>
<p>What we can miss here is that the reasonable-looking inference “This article is wrong, therefore something in the article must be wrong” is not quite correct. It’s possible to agree with every claim in an article (to think that the article is technically correct in most respects) but to think that the conclusions that many readers might draw from the article are wrong. You have a reasonable belief that an article on increased false accusation rates will be used to justify disbelieving victims, even if this was never something that the author actually endorsed or even if it’s something they went out of their way to reject. What you actually disagree with is the article’s connotations: what you think others will believe the article justifies.</p>
<p>I think it’s good for us to notice when we primarily disagree with the connotations of an article and not its content. We can then point out that we disagree with the conclusions people might draw from the article without misrepresenting it or its author. E.g. “This is an interesting [fictional] article that does seem to show an increase in false accusation prosecutions. Of course, it’s worth bearing in mind that the base rate of false accusations is relatively low and that this wouldn’t justify a sudden change in how much credence we place in the testimony of victims.”</p>
<p>An important worry we might have is that some authors will write their article because they want people to draw the conclusion that it doesn’t state (“sexual assault victims shouldn’t be believed”) but they also want to avoid being criticized for supporting that conclusion. So they only state things that are technically true and let the reader draw the conclusion. That is a problem, and I think that this is why authors should try to be explicit about what they think does and doesn’t follow from the claims they are making. But this criticism can also be stated directly. We can say: “In your article you say x and many people are going to feel it’s reasonable to conclude y from this. I think that y is wrong and that it doesn’t follow from x, and that you never really did enough to rule out that inference.” This strikes me as a valid criticism but one that I don’t often see articulated.</p>Amanda Askellamanda@askell.ioIt’s possible to agree with the content of a piece of writing but to think that the conclusions that many readers might draw from it are wrong. I think it's useful to distinguish between these before criticizing the writing of others.