Self-serving utilitarian arguments


Summary: Utilitarianism can give rise to self-serving arguments. These are arguments in which people justify prioritizing themselves over others based on the amount of good they will do. I argue that although self-serving arguments can be misused, they are sometimes morally justified. We need ways of distinguishing good faith from bad faith self-serving arguments and I suggest a few ways we might do this.

Tim is a utilitarian and has dedicated his life to doing good. He works on existential risk by day and runs a hedge fund by night that funnels money to the global poor. Tim calculates that he’s on track to save 3 million lives during his lifetime.

One day, Tim gets sick and is rushed to hospital. The doctors inform Tim that there’s a 10% chance he’s going to die if he doesn’t drink exactly 10ml of medicine X (which makes Tim start to suspect he’s in some kind of philosophical thought experiment). The hospital only has 10ml of medicine X. To make matters worse, there are ten other people in the lobby that all have a 100% chance of dying if they don’t each receive 1ml of medicine X. Should Tim take the medicine himself or let the ten other people have it?

Tim knows that the average person saves around 3 lives over the course of their lifetime, whereas he has 1 million times this impact. And if he dies, there isn’t some Tim-like figure waiting in the wings who will do the same amount of good. So a 10% chance that he dies is a loss of 300,000 lives in expectation. If the ten others die that’s only a loss of 40 lives in expectation (the original 10 plus the three they will each save). So surely, according to utilitarianism, Tim ought to take the medicine and let the other ten people die.1
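The arithmetic behind Tim’s comparison can be written out as a short sketch. The figures are the ones given in the scenario; the constant and function names are illustrative and not part of the original argument.

```python
from fractions import Fraction

# Figures from the thought experiment; the names are illustrative only.
TIM_LIVES_SAVED = 3_000_000        # lives Tim expects to save over his lifetime
AVERAGE_LIVES_SAVED = 3            # lives an average person saves
TIM_DEATH_RISK = Fraction(1, 10)   # Tim's chance of dying without the medicine
PEOPLE_IN_LOBBY = 10               # each certain to die without the medicine


def expected_loss_if_tim_gives_medicine_away() -> Fraction:
    """Expected lives lost through Tim's 10% risk of death."""
    return TIM_DEATH_RISK * TIM_LIVES_SAVED


def expected_loss_if_tim_takes_medicine() -> int:
    """Lives lost if the ten die: their own lives plus those they would have saved."""
    return PEOPLE_IN_LOBBY * (1 + AVERAGE_LIVES_SAVED)


print(expected_loss_if_tim_gives_medicine_away())
print(expected_loss_if_tim_takes_medicine())
```

Like the text, this sketch counts only the lives Tim would go on to save and leaves his own life out of the total.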

Utilitarianism values someone’s continued existence and quality of life by how much value it produces. Most people’s lives are of similar intrinsic value since humans have short lives and a pretty limited range of wellbeing. But the indirect value produced by each person can vary a lot because the range of impacts people can have on other lives is pretty huge. Imagine each person has a number above their head representing how much good they are expected to produce if they continue to live. For Tim this might be some large positive number, while for Stalin it might be some large negative number.

In general, this means that utilitarianism can put a radically different value on extending or improving people’s lives. Not because it thinks some people are vastly more intrinsically valuable than others, but because it thinks the people Tim will save are no less intrinsically valuable than the people sitting in the lobby.

This discrepancy can be used to justify self-serving utilitarian arguments.2 A self-serving utilitarian argument is an argument of the form “I expect to do a lot of good over the course of my life. So a small extension of my life or improvement in my productivity is extremely valuable: much more valuable than it is for the average person. Therefore I should be willing to prioritize myself above others or to violate commonsense morality if I expect it to extend my life or improve my productivity.”

There’s something inherently icky and objectionable about self-serving utilitarian arguments. If Tim’s numbers are correct, each person in the lobby accounts for four lives in expectation, so there would have to be over 75,000 people dying in the lobby before Tim should save them rather than accept a 10% chance of death. Sounds pretty objectionable.

Discrepancies in the amount of indirect good people do can also mandate self-sacrifice. If you have to choose between extending your own life and extending the life of someone who will do much more good than you, the same utilitarian reasoning demands that you sacrifice yourself. But self-serving arguments are more concerning than self-sacrificing arguments. A purely self-interested person could make a utilitarian self-serving argument in order to do immoral things or prioritize their own wellbeing above that of others - potentially by a significant amount. As movies like to remind us, utilitarianism is a dangerous tool in the hands of well-intended people who reason poorly and ill-intended people who reason well.

Utilitarians might object that this problem only arises because we’ve engaged in some pretty naive utilitarian reasoning and that, for non-naive utilitarians, self-serving arguments will rarely be justified.

There are many reasons for non-naive utilitarians to frown on self-serving arguments. Using self-serving arguments will harm your reputation and the reputation of utilitarianism as a whole. They may contribute to a bad character, which we have utilitarian reasons to avoid. They flout longstanding social norms that we have utilitarian reasons to respect and uphold. They involve using utilitarianism as a decision procedure rather than a criterion of rightness, which is often a bad idea. And they don’t show sufficient deference to moral theories that object to this behavior and that we ought to give some weight to.

While I agree that non-naive utilitarians will encounter good self-serving arguments much more rarely than naive utilitarians, I’m not convinced the non-naive utilitarian can caveat them out of existence. Sometimes acting in accordance with self-serving arguments seems morally justified. In fact, failing to act in accordance with a good self-serving utilitarian argument can be a sign of bad character.

To see why, imagine you’ve been tasked with delivering \$10M in a sealed suitcase to the bank. The money is going to be used to buy medicine for people in developing countries. You can either cycle a dangerous and tiring route to the bank or take a comfortable, secure car for \$50, which is the only other money you have on you. If you cycle there’s a 10% chance the money will be lost (let’s suppose you have to cycle next to a convenient pit of fire that would completely destroy it) so you decide to take the car. At that moment a homeless person approaches asking if you can spare some money for food. What do you do?

In this scenario you have a self-interested reason to prefer to take the car since it’s more comfortable than cycling. And the comfort you get from taking the car is clearly less than the benefit the homeless person will get from buying food. But if you give the homeless person \$50 you’re risking a 10% chance that people in developing countries will miss out on \$10M. Wouldn’t this be a horribly reckless thing to do? Shouldn’t you consider yourself a steward of the good this money can do? Wouldn’t it be a pure pretense of niceness to give the person \$50 in these circumstances? Wouldn’t you feel an aching sense of moral shame if you gave the person \$50 only to have the \$10M tumble into the fire pit? I certainly would.

In this case, it seems like you would be justified in using your \$50 to take the car to the bank. This happens to be the thing that’s in your self-interest, but that’s not the primary reason you’re doing it: you’re doing it because you know that you have a responsibility to protect the good that will come from the money in the suitcase. So couldn’t someone who is going to give away \$10M in future earnings be justified in spending money to eliminate a 10% chance of death for themselves even if that money could eliminate a higher chance of death for someone else?

You might want to argue that there’s a big difference between sacrificing the happiness of the homeless person to ensure you can do more good in the future and sacrificing ten people’s lives to ensure you can do more good in the future. But the difference seems to be one of degree rather than kind. If we accept that you should use the money to take the comfortable car, we’ve accepted that it’s sometimes okay for a person to take an action that’s in their self-interest and that sacrifices the self-interest of others in order to protect some amount of good they will do in the future. We’ve accepted a self-serving utilitarian argument.

It’s worth noting that the ability to make credible self-serving arguments doesn’t come cheap. If someone justifies a \$50 car ride because they plan to give away \$10M of their money, the argument only works if they’re actually going to give away that money. But it’s hard to distinguish the people making good faith arguments for self-serving actions (buying a safer car, renting an expensive office space, etc.) from the people making bad faith arguments for the same actions. This is bad news for utilitarianism. You don’t want to be the moral theory that people appeal to just so they can cut in line at the ER.

Of course, we all sometimes act in ways that are purely in our self-interest. And some of those actions are going to be unethical. When I buy a nice pair of speakers instead of giving the money to the poor, I could try to give a long-winded utilitarian justification of how the speakers make me more productive at my job, but the truth is that I wanted the speakers and so I bought them. I was mostly just being selfish.

When it comes to self-interested actions, a self-serving utilitarian argument made in bad faith is clearly worse than a self-serving utilitarian argument made in good faith. But a self-serving utilitarian argument made in bad faith is also worse than no argument at all. If we do something wrong for self-interested reasons but are honest about it, we’ve done something bad. If we also try to give a self-serving utilitarian argument to avoid taking responsibility for it, we’ve done something worse.

So it’s important to be able to distinguish between good faith and bad faith self-serving arguments. If we assume that the self-serving arguments being made are equally plausible, there are still several ways that utilitarians can distinguish themselves from those making self-serving arguments in bad faith:

The first is simply through prior behavior. Has a person directed their life towards doing good? If utilitarianism demands self-sacrifice, have they done it? If so, it’s more likely that they’re arguing in good faith. If someone only pulls out utilitarian reasoning when it suits them, we should be skeptical.

The second is through pre-commitments. If someone has pre-committed to do good — if they’ve taken a public pledge, put money into a donor-advised fund, or committed to having a high-impact career — we can be more confident they are arguing in good faith. It’s easy to claim that you’re going to do a lot of good somewhere down the line. But the person who pre-commits is showing they’re willing to pay the piper.

The third is through the use of independent adjudication. We can’t really trust ourselves to evaluate self-serving arguments, since we’re clearly not impartial about the outcome. But we can ask a morally upstanding friend or colleague to decide for us. Appealing to independent adjudication is a good sign that someone is arguing in good faith. (This kind of independent adjudication also reduces the likelihood that utilitarians will underinvest in their own wellbeing, e.g. because they’re afraid of being seen as selfish.)

Self-serving utilitarian arguments are easy to abuse and we clearly have reason to be skeptical of them. But if they can be morally justified, even if rarely, then we might not want to reject them out of hand. We do want some way of distinguishing good faith from bad faith self-serving arguments. If a well-reasoned self-serving argument is coming from a person that has a good track record of altruistic behavior, has pre-committed to doing good in the future, and has deferred to independent adjudicators about what they ought to do, I think we can be pretty confident it’s being made in good faith.

  1. Of course, those people save three lives, and the people they save also save three lives, and so on. But the same is true of the 3 million people Tim saves. I’m just going to assume that saving 3 million lives over a short period of time is better than saving far fewer lives over that period.

  2. I wanted to call them “gross utilitarian arguments” but if there’s anything the repugnant conclusion has taught me, it’s that there can be downstream costs to putting a negative judgment in a name. It’s kind of like naming your kid “Bad Pete”. 

In AI ethics, “bad” isn’t good enough


Summary: Lately I've been thinking about AI ethics and the norms we should want the field to adopt. It's fairly common for AI ethicists to focus on harmful consequences of AI systems. While this is useful, we shouldn't conflate arguments that AI systems have harmful consequences with arguments about what we should do. Arguments about what we should do have to consider far more factors than arguments focused solely on harmful consequences. The title is a shameless riff on the title of this article.

In ethics we use the term “pro tanto”, meaning “to that extent”, to refer to things that have some bearing on what we ought to do but that can be outweighed. The fact that your dog is afraid of the vet is a pro tanto reason not to take him. But perhaps you ought to take him despite this pro tanto reason not to, because keeping him in good health is worth the cost of a single unpleasant experience. If that’s true then we say you have “all things considered” reasons to take your dog to the vet.

In AI ethics, we often point to things that systems do that are harmful. A system might make biased decisions, use a lot of energy in training, produce toxic outputs, and so on. These are all pro tanto harms. Noting that a system does these things doesn’t tell us about the overall benefits or harms of the system or what we have all things considered reasons to do. It just tells us about one particular harm the system causes.

It’s useful to identify pro tanto harms. Pro tanto harms give us pro tanto reasons to do things. When we identify a pro tanto harm we have a pro tanto reason to fix the problem, to analyze it more, to delay deploying the system, to train systems differently in the future, and so on.

But most things that have any significance in the world create some pro tanto harms. And identifying pro tanto harms often doesn’t give us all that much information about what we should do all things considered, including whether we should do anything to reduce the pro tanto harm.

To see why, suppose an article points out that some surgical procedure results in painful stitches. The article draws no conclusions from this: it merely points out one bad thing about the surgery is that it results in these painful stitches, and describes the harm these stitches do in some detail.

There are three ways the harm of these stitches could be mitigated: by not performing the surgery, by giving patients stronger painkillers, or by reducing the length of the incision. But the surgery is essential for long-term health, the stronger painkillers are addictive, and a smaller incision is associated with worse outcomes. In fact, patients with larger incisions who are given fewer painkillers do much better than those from any other group.

In this case, although the pain of the surgery is a pro tanto harm, we actually have all things considered reasons to take actions that will increase that harm, since we ought to increase incision length and give fewer painkillers.

So it’s a mistake to assume that if we identify a pro tanto harm from an AI system, it must be the case that someone has done something wrong, something needs to be done to correct it, or the system shouldn’t be deployed. Maybe none of those things are true. Maybe all of them are. We can’t tell from a discussion of the pro tanto harm alone.

While pro tanto harms don’t entail that we have all-things-considered reasons to do things differently, they do waggle their eyebrows suggestively while mouthing ‘look over there’. In order to know whether a pro tanto harm is waggling its eyebrows towards something we should do differently, we need to ask things like:

If we want to figure out what we have all things considered reasons to do, it’s not good enough to point out the bad consequences of an AI system, even if we also point out how to address these consequences. We need to weigh up all of the relevant moral considerations by answering questions like the ones above.

To give a more concrete example, suppose a decision system makes biased decisions about how to set bail. Should we change it? All else being equal, we should. But suppose it’s very difficult to fix the things that give rise to this bias. Does this mean that we shouldn’t deploy the system until we can fix it? Well, surely that depends on other things like what the existing bail system is like. If the existing system involves humans making extremely biased and harmful decisions, deploying a less biased (but far from perfect) system might be a matter of moral urgency. This is especially true if the deployed system can be improved over time.

Different moral theories will say different things about what we have all things considered reasons to do. If you’re a deontologist, finding out that a system violates someone’s rights might imply that you shouldn’t deploy that system, even if the alternative is a system with much worse rights violations. If you’re a consequentialist about rights, you might prefer to replace the current system with one that violates fewer rights.

Regardless of your views about moral theories, arguments of the form “this system does something harmful” are very different from arguments of the form “we ought to develop this system differently” or “we ought not to deploy this system”. The former only requires arguing that, in isolation, the system does something harmful. The latter requires arguing that an action ought to be performed given all of the morally-relevant facts.

Since we can’t be certain about any one moral theory and since we have to try to represent a plurality of views, coming to all things considered judgments in AI ethics will often require a fairly complex evaluation of many relevant factors. Given this, it’s important that we don’t try to derive conclusions about what we have all things considered reasons to do about AI systems solely from pro tanto harm arguments.

It would be a mistake to read an article about painful stitches and to conclude that we should no longer carry out surgeries. And it would be a mistake to read an article about a harm caused by an AI system and conclude that we shouldn’t be using that AI system. Similarly, it would be a mistake to read an article about a benefit caused by an AI system and conclude that it’s fine for us to use that system.

Drawing conclusions about what we have all things considered reasons to do from pro tanto arguments discourages us from carrying out work that is essential to AI ethics. It discourages us from exploring alternative ways of deploying systems, evaluating the benefits of those systems, or assessing the harms of the existing institutions and systems that they could replace.

This is why we have to bear in mind that in AI ethics, “bad” often isn’t good enough.

Price gouging: are we shooting the messenger of inequality?


Summary: People price gouge when they buy goods during an emergency in order to re-sell them for a higher price. Why does price gouging feel wrong to us? In this post I consider a couple of possible reasons and argue that price gouging feels wrong because when people price gouge others, only two kinds of people can buy a scarce good: the rich and the desperate. So it makes prior inequalities between people more salient. I call this "shooting the messenger of inequality" and argue that doing this is often counterproductive.

Early in the pandemic, some people bought important supplies when the cost was low and sold them for marked up prices, i.e. they engaged in price gouging. There’s usually a pretty strong backlash against this and sometimes laws are even passed to prevent it from happening.

People with an economics background often get annoyed by this backlash. Suppose hand sanitizer was \$1 before the pandemic and a small number of people bought it. After the pandemic hits, there are many more people who want to buy hand sanitizer at that \$1 price: far more people than the available number of hand sanitizers. By buying at the original price and charging more, price gougers ensure that the people that want the hand sanitizer most (as reflected in their willingness to pay more for it) actually get it. Increased demand also incentivizes companies to produce more of the relevant goods.

Why does price gouging feel wrong to us? Here’s a preliminary explanation: when we witness price gouging, we see a situation in which only two kinds of people can buy a scarce good: those who desperately need it and have to shell out a huge amount of money for it, and those with a lot of money who are simply happy to pay the increased cost. This feels like it exploits those who are desperate, and unfairly advantages those that are wealthy.

Are these intuitions about exploitation or unfairness justified?

Suppose a low-income parent with a sick child has to pay \$50 for a \$1 bottle of hand sanitizer from a price gouger. It looks a lot like the price gouger is just profiting from their desperation and creating no value. But if price gouging weren’t allowed, it’s not true that the parent would have the \$1 hand sanitizer. Instead, they’d probably have no hand sanitizer at all: it would be gone from the shelves before they arrive. Since the parent would rather have the hand sanitizer and be down \$50 than have no hand sanitizer at all, a world where price gouging is allowed is better for them than the one in which it isn’t. Price gouging makes it more likely that the hand sanitizer goes to the low-income parent and not to someone who doesn’t really need it.1

Can price gouging ever be exploitative if the exchange involves no deception and leads to an outcome that the price gouger and the parent both prefer? In ethics, this gets called “mutually beneficial exploitation” and there’s a lot of debate about whether it’s possible.

We might think that this kind of exchange is wrong because there’s a more welfare-maximizing option available to the price gouger: namely, to sell the hand sanitizer to those that need it most. This is different from what actually happens when the price gouger sells their goods because welfare isn’t tracked all that well by willingness to pay. Richer people are willing to pay more for goods that bring them less welfare, since the marginal cost of losing a dollar is lower for them.
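One way to see how willingness to pay can come apart from welfare is a toy model. This is a minimal sketch under an assumption the post itself does not make: that welfare from wealth is logarithmic, so each extra dollar matters less the richer you are. The function name, the wealth levels, and the welfare gains are all hypothetical.

```python
import math


def max_willingness_to_pay(wealth: float, welfare_gain: float) -> float:
    """Largest price p with log(wealth - p) + welfare_gain >= log(wealth).

    Assumes log utility of wealth (an illustrative assumption only).
    Solving for p gives p = wealth * (1 - exp(-welfare_gain)).
    """
    return wealth * (1 - math.exp(-welfare_gain))


# Hypothetical buyers: the poorer one gains 500 times more welfare from the good.
poor_wtp = max_willingness_to_pay(wealth=100, welfare_gain=0.5)
rich_wtp = max_willingness_to_pay(wealth=1_000_000, welfare_gain=0.001)

print(f"poorer buyer, large welfare gain: pays up to {poor_wtp:.2f} dollars")
print(f"richer buyer, small welfare gain: pays up to {rich_wtp:.2f} dollars")
```

Under these made-up numbers the richer buyer outbids the poorer one despite gaining far less welfare, which is the gap between market allocation and welfare-maximizing allocation that the paragraph describes.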

But if “there’s a more welfare-maximizing option available” is our standard for exploitation, almost all transactions are exploitative, including the store selling the hand sanitizer for \$1. There are almost certainly people who will not pay \$1 for hand sanitizer, but who would derive more welfare from the hand sanitizer than some of the people who are willing to pay \$1 for it.

Perhaps the most plausible response is that price gouging is just a particularly extreme example of this disparity between the market exchange and the one that maximizes welfare.

There are practical problems here, however. In order to determine the welfare-maximizing price, the price gouger would have to assess the needs and resources of each potential buyer and adjust their price accordingly. But identifying each buyer’s resources and genuine need is extremely hard. A higher willingness to pay might be one of the most efficient ways for the price gouger to identify those with a higher need, given what they know.

Perhaps those with more information could try to distribute key goods in a way that maximizes welfare. For example, governments could come together and distribute hand sanitizer to those that need it most at a subsidized price. But the fact that they don’t do this is hardly something that can be blamed on the price gouger. So why do we direct our ire at them?

Ultimately, I suspect that at least some of our intuition that price gouging is wrong comes from the fact that when there are large wealth disparities, willingness to pay is a worse proxy for welfare. If someone with only \$10 is dying in the desert and comes across a water seller, the water will go to any wealthy person who is willing to pay \$11 to take a shower.

When we see these kinds of outcomes, I think we’re inclined to shoot the messenger of inequality: i.e. to blame whoever happens to be the person carrying out the final transaction. But this person is hardly to blame for the fact that such wealth inequality exists. They are also not in a good position to correct for it and are likely to be outcompeted if they try. (To say nothing of how this correction would affect the supply of important goods.)

If this is correct then we might want to redirect our ire at those with the ability to nudge things in a more welfare-maximizing direction. Governments can do so by redistributing some of the economic wealth we generate to the worst off, for example. But when governments outlaw price gouging, they’re probably just shooting the messenger of inequality. They haven’t improved the underlying situation—if anything, they seem to have made it worse—they’ve just shot the guy that was drawing attention to it.

Of course, it’s unlikely that the optimal distribution of wealth is a totally equal one. Wealth equality won’t be sustained if people are rewarded in accordance with the value they create and a smaller portion of a bigger pie is often better than a more equal portion of a smaller pie. So the world in which welfare is maximized in the long term might inevitably involve individual transactions along the way that are bad from a welfare-maximizing point of view. But it’s also unlikely that the optimal distribution of wealth involves the kind of disparities between the rich and the poor that result in some people taking showers next to others that are dying of thirst.

I won’t take a stance on the best balance to strike between growth and equality - I’ve already skirted some heavy economic territory here. I just want to point out that we’re often inclined to shoot the messengers of inequality even if they’re doing something that makes the world better, like making it more likely that important goods go to those that need them.

Shooting the messenger of inequality happens elsewhere too. People often think it’s exploitative to open low-wage factories or run drug trials in developing countries, for example. This is true even if people choose whether to work in those factories or to be enrolled in those drug trials, and even if their choice to do so seems reasonable given their other options.

If shooting the messenger of inequality is a real part of this phenomenon, it seems like one we should try to avoid. After all, yelling at doctors for telling us we’re sick won’t make us any healthier.

  1. There might be some kind of luck-based view about fairness at play here: i.e. it’s better for everyone to have a similar chance at getting a single hand sanitizer for \$1 than for some people to have a higher chance at getting hand sanitizer by paying more for it. The system of restricting supply per person approximates this, but there are many issues with it. For example, it requires that people be prevented from re-selling their hand sanitizer if it’s to avoid devolving into distributed price gouging. This means it can result in an outcome that everyone would prefer to change: each winner prefers to sell to a loser that prefers to buy. I’d be interested in hearing a defense of restricting supply per person, however, since it seems to be a common practice. 

Fairness, evidence, and predictive equality


Summary: Sometimes information that makes a prediction more accurate can make that prediction feel less fair. In this post, I explore some possible causal principles that could be beneath this kind of intuition, but argue that these principles are inconsistent with our intuitions in other cases. I then argue that our intuitions may reflect a desire to move towards more "predictive equality" in order to mitigate some of the negative social effects that come from making predictions based on properties generally correlated with worse outcomes.

This is mostly an exploration of the intuitions about fairness that arose as I wrote my previous post. I'm pretty sure others will have said most of what I say here. The thing I have most confidence in after writing it is that our intuitions about fairness are quite messy. There's definitely a more precise and action-guiding version of the utilitarian-ish view I sketch at the end.

In-person exams in the UK were canceled this year because of the pandemic, so results were given using a modeling system that looked at “the ranking order of pupils and the previous exam results of schools and colleges”. I don’t know how the modeling system took into account previous results of schools and colleges, but I’m going to assume that students from schools with a worse track record on exams were predicted to have lower grades. This has, understandably, caused a lot of controversy.

I think this might be a good example of a case where using information feels unfair even if it makes our decision more accurate. It’s very likely that previous school performance helps us make better predictions about current school performance. Yet it feels quite unfair to give people from lower performing schools worse grades than those from higher performing schools if everything else about them is the same.

To take a similar kind of case, suppose a judge’s goal is to get people who have committed a crime to show up to court in a way that minimizes costs to defendants and the public. How should she take into account statistical evidence about defendants?

First, let’s consider spurious correlations in the data that are not predictive. Suppose we divide defendants into small groups, such as “red-headed Elvis fans born in April”. If we do this, we’ll find that lots of these groups have higher than average rates of not showing up for court. But if these are mostly statistical artifacts that aren’t caused by any underlying confounders, the judge would do better by her own lights if she mostly just ignored them.
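A small simulation can illustrate why such groups show up in the data even when nothing distinguishes them. This is a sketch with made-up parameters: every defendant is given the same hypothetical no-show rate, and the group size and number of groups are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

TRUE_NO_SHOW_RATE = 0.2  # hypothetical, and identical for every defendant
GROUP_SIZE = 5           # small groups, like "red-headed Elvis fans born in April"
NUM_GROUPS = 200

groups_above_true_rate = 0
for _ in range(NUM_GROUPS):
    no_shows = sum(rng.random() < TRUE_NO_SHOW_RATE for _ in range(GROUP_SIZE))
    # In a group of 5, two or more no-shows is an observed rate above 20%.
    if no_shows >= 2:
        groups_above_true_rate += 1

print(f"{groups_above_true_rate} of {NUM_GROUPS} groups look riskier than average")
```

Since every simulated defendant has the same underlying rate, any group that looks riskier here does so purely by chance, which is why the judge gains nothing by acting on it.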

Things get trickier when the correlations are predictive. For example, suppose that night shift workers are less likely to show up to court on average. Their court date is always set for a time when they aren’t working, so being a night shift worker doesn’t seem to be a direct cause of not showing up to court. But the correlation is predictive. Given this, the judge would do better by the standards above if she increases someone’s bail amount when she finds out they’re a night shift worker. This is true even if most night shift workers would show up to court.

As in the UK grades case, this feels intuitively unfair to night shift workers.

One principle that might be thought to ground our intuition for why this is unfair is the following:

Causal fairness principle (CFP): it’s fair to factor properties of people into our decision-making if and only if those properties directly cause an outcome that we have reason to care about1

This principle looks plausible and would explain why the grades case and the night shift workers case both feel unfair. Night shift work doesn’t seem to cause not showing up to court, and going to a low performing school doesn’t directly cause getting a lower grade. But I think this principle is inconsistent with our intuitions in other cases.

To see why, suppose that night shift workers are more likely to live along poor bus routes. This means that they often miss their court appointment because their bus was running late or didn’t show up. And this explains the entire disparity between night shift workers and others: if a night shift worker doesn’t live along a poor bus route then they will show up to court as much as the average person, and if a non-night shift worker lives along a poor bus route, they will show up at court at the same (lower) rate as night shift workers that live along poor bus routes.
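The structure of this case can be sketched with some hypothetical numbers. In the sketch, show-up rates depend only on the bus route, and night shift workers differ from everyone else only in how likely they are to live along a poor route. All of the rates and names are assumptions chosen for illustration.

```python
# Hypothetical numbers chosen only to illustrate the structure of the case.
SHOW_UP_RATE = {"poor_route": 0.70, "good_route": 0.90}  # depends on route only
POOR_ROUTE_SHARE = {"night_shift": 0.60, "other": 0.20}  # share living on poor routes


def overall_show_up_rate(group: str) -> float:
    """Average show-up rate for a group, weighted by where its members live."""
    p_poor = POOR_ROUTE_SHARE[group]
    return (p_poor * SHOW_UP_RATE["poor_route"]
            + (1 - p_poor) * SHOW_UP_RATE["good_route"])


for group in POOR_ROUTE_SHARE:
    print(group, round(overall_show_up_rate(group), 2))
```

With these numbers night shift workers show up less often overall, yet within each bus route the two groups behave identically, so knowing someone’s route leaves their shift pattern with no further predictive value.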

The judge receives this new information and responds by increasing the bail of anyone who lives along poor bus routes. By CFP her decision would be fair, since it only takes into account properties that are direct causes of the outcome she cares about. (And the outcomes will be better relative to her goals because this heuristic gets at the underlying causal facts more than the night shift workers heuristic does.) But I think her decision is intuitively unfair.

In response to this case, we might adjust CFP to say that a decision is fair only if the causal factors in question are currently within the control of the agent.

This addition makes some intuitive sense because factors outside of an agent’s control are often not going to be responsive to whatever incentives we are trying to create. In this case, however, the place that the agent lives is at least partially in their control, even if moving would be very financially difficult for them. The behavior of people who live along poor bus routes is also likely to be responsive to incentives. People who live along poor bus routes are more likely to leave earlier to get to court if failing to show up means foregoing a high bail amount.

We also often think that it’s fair to consider causally relevant factors that are outside someone’s control when making decisions. Suppose you’re deciding whether to hire someone as a lawyer or not and you see that one of the applicants is finishing a medical degree rather than a law degree. It seems fair to take this into account when making your decision about whether to hire them, even if we suppose that the candidate currently has no control over the fact that they will have a medical degree rather than a law degree, e.g. because they can’t switch in time to get a law degree before the position starts.

These are reasons to be skeptical of CFP in the “if” direction (if a property is causally relevant then it’s fair to consider it) but I believe we also have reasons to be skeptical of the principle in the “only if” direction (it’s only fair to consider a property if it’s causally relevant).

To see why, consider a case in which the judge asks a defendant “are you going to show up to your court date?” and the defendant replies “no, I have every intention of fleeing the country”. Should the judge take this utterance into account when deciding how to set bail? This utterance is evidence that the defendant has an intention to flee the country, and having this intention is the thing that’s likely to cause them to not show up to their court date. The utterance itself doesn’t cause the intention and it won’t cause them to flee the country: the utterance is just highly correlated with the defendant having an intention to flee (because this intention is likely the cause of the utterance). So CFP says that it’s unfair for the judge to take this utterance into account when making her decision. That doesn’t seem right.

To avoid this, we might try to weaken CFP and say that it’s fair to take properties of someone into account only if having those properties is evidence that a person has another property that’s causally relevant to the outcome. But this weakens the original principle excessively, since even the most spurious of correlations will be evidence that a person has a property that’s causally relevant to the outcomes we care about. This includes race, gender, etc. since in an unequal society many important properties will covary with these properties. In an ideal world, we would only get evidence that someone has a causally relevant property when the person actually has the causally relevant property. But we don’t live in an ideal world.

Perhaps we can get around some of these problems by moving to a more graded notion of fairness. This would allow us to amend the principle above as follows:

Graded causal fairness principle (GCFP): Factoring a piece of evidence about someone into our decision-making is fair to the degree that it is evidence that the person has properties that directly cause an outcome that we have reason to care about2

Since coincidental correlations will typically be weaker evidence of a causally-relevant property than correlations that are the result of a confounding variable, GCFP will typically say that it’s less fair to take into account properties that could just be coincidentally correlated with the outcome we care about.

Although this seems like an improvement, GCFP still doesn’t capture a lot of our intuitions about fairness. To see this, consider again the case of night shift workers. Suppose that we don’t yet know why night shift work is so predictive of not showing up to court. By GCFP, it would be fair for the judge to assign night shift workers higher bail as long as the correlation between night shift work and not showing up to court were sufficiently predictive, since a correlation being more predictive is evidence that there’s an underlying causally relevant factor. Once again, though, I think a lot of people would not consider this to be fair.

Let’s throw in a curve ball. Imagine that two candidates are being interviewed for the same position. Both seem equally likely to succeed, but each of them has one property that is consistently correlated with poor job performance. The first candidate is fluent in several languages, which has been found to be correlated with underperformance for reasons not yet known (getting bored in the role, perhaps). The second candidate got a needs-based scholarship in college, which has also been found to be correlated with underperformance for reasons not yet known (getting less time to study in college, perhaps).

Suppose the candidates both want the job equally and that these properties are equally correlated with poor performance. The company can hire both of the candidates, one of them, or neither. How unfair does it feel if the company hires the person fluent in many languages but not the person who received a needs-based scholarship to college? How unfair does it feel if the company hires the person who received a needs-based scholarship to college but not the person who is fluent in many languages?

I don’t know if others share my intuitions, but even if it feels unfair for the company to hire only one of the candidates instead of both or neither, the situation in which they reject the candidate who received a needs-based scholarship feels worse to me than the situation in which they reject the candidate who is fluent in several languages.

One possible explanation for this is that we implicitly believe in a kind of “predictive equality”.

We often need to make decisions based on facts about people that are predictive of the properties that are causally-relevant to our decision but aren’t themselves causally-relevant. We probably don’t feel so bad about this if the property in question is not generally disadvantageous, i.e. over the course of a person’s life the property is just as likely to land them on the winning end of predictive decisions as on the losing end.

Let’s use the term “predictively disadvantageous properties” to refer to properties that need not be bad in themselves (they could be considered neutral or positive) but that are generally correlated with worse predicted outcomes. It often feels unfair to base our decisions on predictively disadvantageous properties because we can foresee that these properties will more often land someone on the losing end of predictive decisions.

Consider a young adult who was raised in poverty. They are likely predicted to have a higher likelihood of defaulting on a loan, more difficulty maintaining employment, and worse physical and mental health than someone who wasn’t raised in poverty. Using their childhood poverty as a predictor of outcomes is therefore likely to result in decisions about them fairly consistently being made in ways that assume worse outcomes for them. And it can be hard to do well (to get a loan to start a business, say) if people believe you’re less likely to flourish.

Cullen O’Keefe put this in a way that I think is useful (and I’m now paraphrasing): we want to make efficient decisions based on all relevant information, but we also want risks to be spread fairly across society. We could get both by just making the most efficient decisions and then redistributing the benefits of these decisions. But many people will have control only over one of these things: e.g. hirers have control over the decisions but not what to do with the surplus.

In order to balance efficiency and the fair distribution of risks, hirers can try to improve the accuracy of their predictions but also make decisions and structure outcomes in a way that mitigates negative compounding effects of predictively disadvantageous properties.

For example, imagine you’re an admissions officer considering whether to accept someone to a college and you know that students from disadvantaged areas tend to drop out more. It would probably be bad to simply pretend that this isn’t the case when deciding which students to accept. Ignoring higher dropout rates could result in applicants from disadvantaged areas taking on large amounts of student debt that they will struggle to pay off if they don’t complete the course.3 But it might be good in the long-term if you err on the side of approving people from disadvantaged areas in more borderline cases, and if you try to find interventions that reduce the likelihood that these students will drop out.

Why should we think that doing this kind of thing is socially beneficial in the long-term? Because even if predictions based on features like childhood poverty are more accurate, failing to improve the prospects of people with predictively disadvantageous properties can compound their harms and create circumstances that it’s hard for people to break out of. Trying to improve the prospects of those with predictively disadvantageous properties gives them the opportunity to break out of a negative prediction spiral: one that they can find themselves in through no fault of their own.

But taking actions based on predictively negative properties doesn’t always seem unfair. Consider red flags of an abusive partner, like someone talking negatively about all or most of their ex-partners. Having a disposition to talk negatively about ex-partners is not a cause of being abusive, it’s predictive of being abusive. This makes it a predictively disadvantageous property, since it’s correlated with worse predictive outcomes. But being cautious about getting into a relationship with someone who has this property doesn’t seem unfair.

Maybe this is just explained by the fact that we want to make decisions that lead to better outcomes in the long-term. Long-term, encouraging colleges to admit fewer students from disadvantaged areas is likely to entrench social inequality, which is bad. Long-term, encouraging people to avoid relationships with those who show signs of being abusive is likely to reduce the number of abusive relationships, which is good.

How can we tell if our decisions will lead to better outcomes in the long-term? This generally requires asking things like whether our decision could help to detach factors that are correlated with harmful outcomes from those harmful outcomes (e.g. by creating the right incentives), whether they could help us isolate causal from non-causal factors over time, and whether the goals we have specified are the right ones. The short but unsatisfactory answer is: it’s complicated.

Thanks to Rob Long for a useful conversation on this topic and for recommending Ben Eidelson’s book, which I haven’t managed to read but will now recklessly recommend to others. Thanks also to Rob Long and Cullen O’Keefe for their helpful comments on this post.

  1. I added the “have reason to care about” clause because if a judge cared about “being a woman and showing up to court” then gender would be causally relevant to the outcome they care about and therefore admissible, but it seems ad hoc and unreasonable to care about this outcome.

  2. An ideal but more complicated version of this principle would likely talk about the weight that we give to a piece of evidence rather than just whether it is a factor in our decision. 

  3. Thanks to Rob Long for pointing out this kind of case. 

AI bias and the problems of ethical locality


Summary: In this post I argue that attempts to reduce bias in AI decision-making face the problem of practical locality—we are limited in what we can do because the actions available to us depend on the society we find ourselves in—and the problem of epistemic locality—we are limited in what we can do because ethical views evolve over time and vary across regions. Both problems have consequences for work on AI bias, and the epistemic locality problem highlights the important links between AI bias and the alignment problem.

This post is based on personal reflections. It's not a scholarly post; I mostly just cite things that I had already read or that people suggested to me. This means that a lot of what I say here may have been said much better somewhere else, and there's probably a lot of relevant literature that I don't mention. I'm posting it because I want my blog to be a place where I feel comfortable posting casual musings, but I think it's important to flag that these are casual musings. Suggestions of relevant and related literature are very welcome in the comments.


In this post I argue that attempts to reduce bias in AI decision-making face two ‘ethical locality’ problems. The first ethical locality problem is the problem of practical locality: we are limited in what we can do because the actions available to us depend on the society we find ourselves in. The second ethical locality problem is the problem of epistemic locality: we are limited in what we can do because ethical views evolve over time and vary across regions.

The practical locality problem implies that we can have relatively fair procedures whose outputs nonetheless reflect the biases of the society they are embedded in. The epistemic locality problem gives us reason to understand the problems of AI bias to be instances of the broader problem of AI alignment: or the problem of getting AI to act in accordance with our values. Given this, I echo others in saying that our goal should not be to ‘solve’ AI bias. Instead, our goal should be to build AI systems that mostly reflect current values on questions of bias and that facilitate and are responsive to the progress we make on these questions over time.

Jenny and the clock factory

You are a progressive factory owner in the 1860s. Your factory makes clocks and hires scientists to help develop the clocks, managers to oversee people, and workers to build the clocks. The scientists and managers are in low supply and the roles are paid well, while the workers are in higher supply and receive less compensation. You’ve already increased wages as much as you can, but you want to make sure your hiring practices are fair. So you hire a person called Jenny to find and recruit candidates to each role.

Jenny notes that in order to be a scientist or a manager, a person has to have many years of schooling and training. Women cannot currently receive this training and the factory cannot provide this training because it lacks the resources and expertise needed to do so. Many female candidates show at least as much promise as male candidates, but their lack of this crucial prior training makes them unsuited to any role except worker. Despite her best efforts, Jenny ends up hiring only men to the roles of scientist and manager, and hires both men and women as workers.

Jenny’s awareness of all the ways in which the factory’s hiring practices are unfair is limited, however, because there are sources of unfairness that have yet to be adequately recognized in the 1860s. For example, it is not considered unfair to reject candidates with physical disabilities for worker roles rather than trying to make adequate accommodations for these disabilities. Given this, Jenny rejects many candidates with physical disabilities rather than considering ways in which their disabilities could be accommodated.

The practical locality problem

How fair is Jenny being with respect to gender? To try to answer this, we need to think about the relations between three important variables: gender (G), training (T) and hiring (H).

Deciding to hire a candidate only if they have relevant training (T→H) seems fair since the training is necessary for the job. Deciding to hire a candidate based on their gender alone (G→H) seems unfair, since gender is irrelevant to the job. The fact that women cannot receive the training (G→T) also seems unfair. But, unlike the relationship between T and H and the relationship between G and H, the relationship between G and T is exogenous to Jenny’s decision: it is one that Jenny cannot affect.

To model the situation, we can use dashed arrows to represent exogenous causal relationships—in this case, the relationship between G and T—and solid arrows to represent endogenous causal relationships. We can use red arrows to indicate causal relationships that actually exist between G, T, and H and we can use grey arrows to highlight the fairness of possible causal relationships. Jenny’s situation is as follows:

In this case, there is an important sense in which Jenny’s decision not to hire is fair to each woman who applies because Jenny would have made the same decision had a man with the same level of training applied. If women were given the necessary training, Jenny would hire them. If men were denied the necessary training, Jenny would not hire them. (Her decision therefore satisfies the counterfactual definition of fairness given by Kusner et al, though see Kohler-Hausmann for a critique of counterfactual causal models of discrimination.)

But there is also an important sense in which the fact that Jenny hires only men into scientist and manager roles is unfair. The unfairness is upstream of Jenny. The outcome is unfair because her options are limited by unfair societal practices, i.e. by the fact that women are denied the schooling and training necessary to become scientists and managers.

I’m going to use the term ‘procedurally unfair’ to refer to decisions that are unfair because of unfairness in the decision-making procedure being used. Chiappa and Gillam say that ‘a decision is fair toward an individual if it coincides with the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair pathways were different’. Building on this, I will say that a decision is procedurally unfair if it diverges from the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair endogenous pathways were different.

I’m going to use the term ‘reflectively unfair’ to refer to decisions that may or may not be procedurally unfair, but whose inputs are the result of unfair processes, and where the outcomes ‘reflect’ the unfairness of those processes. This is closely related to Chiappa and Isaac’s account of the fairness of a dataset as ‘the presence of an unfair causal path in the data-generation mechanism’. I will say that a decision is reflectively unfair if it diverges from the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair exogenous pathways were different.

Since decision-makers cannot always control or influence the process that generates the inputs to their decisions, the most procedurally fair options available to decision-makers can still be quite reflectively unfair. This is the situation Jenny finds herself in when it comes to hiring women as scientists and managers.
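The difference between the two notions can be sketched as a toy model of Jenny's situation. This is a deliberate simplification of the definitions above, and the function names are hypothetical. `receives_training` stands in for the exogenous pathway from G to T, and `jenny_hires` stands in for Jenny's own rule, which looks only at training.

```python
def receives_training(gender):
    """Exogenous pathway G -> T: assumed 1860s society trains only men."""
    return gender == "man"

def jenny_hires(gender, trained):
    """Jenny's rule for scientist and manager roles: only training matters,
    so there is no direct G -> H pathway (gender is deliberately unused)."""
    return trained

def is_procedurally_unfair(gender, other_gender):
    """Flip gender along the endogenous pathway only, holding training fixed."""
    trained = receives_training(gender)
    return jenny_hires(gender, trained) != jenny_hires(other_gender, trained)

def is_reflectively_unfair(gender, other_gender):
    """Flip gender along the exogenous pathway: what training would they have had?"""
    actual = jenny_hires(gender, receives_training(gender))
    counterfactual = jenny_hires(gender, receives_training(other_gender))
    return actual != counterfactual

print(is_procedurally_unfair("woman", "man"))  # Jenny's own procedure
print(is_reflectively_unfair("woman", "man"))  # the inputs Jenny is handed
```

In this toy model Jenny's decision about a woman applying for a scientist role is not procedurally unfair but is reflectively unfair, which matches the verdicts in the text.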

When it comes to hiring and gender, Jenny has encountered what I will call the practical locality problem. The options available to Jenny depend on the practices of the society she is embedded in. This means that even the most procedurally fair choice can reflect the unfair practices of this society. (What’s worse is that all of the options available to Jenny may not only reflect but to some degree reinforce those practices. Hiring women who cannot perform well in a given role and failing to hire any women into those roles could both be used to reinforce people’s belief that women are not capable of performing these roles.)

The epistemic locality problem

How fair is Jenny with respect to disability status? I think that Jenny is being unfair to candidates with physical disabilities. But the primary cause of her unfairness isn’t malice or negligence: it’s the fact that Jenny lives in a society that hasn’t yet recognized that her treatment of those with physical disabilities is unfair. Although we may wish Jenny would realize this, we can hardly call it negligent of Jenny to not have advanced beyond the moral understanding of almost all of her contemporaries.

If we use D to indicate disability status and a subscript to indicate the values and beliefs that a decision is considered fair or unfair with respect to (i.e. FAIR_X means ‘this was generally considered fair in year X’), the model of Jenny’s situation is fairly simple:

When it comes to hiring and disability, Jenny is facing what I will call the epistemic locality problem. As we learn more about the world and reflect more on our values, our ethical views become more well-informed and coherent. (For moral realists, they can get better simpliciter. For subjectivists, they can get better by our own lights.) The limits of our collective empirical knowledge and our collective ethical understanding can place limits on how ethical it is possible for us to be at a given time, even by our own lights. This is the epistemic locality problem.

I call these problems ‘ethical locality’ problems because they’re a bit like ethical analogs of the principle of locality in physics. The practical locality problem points to the fact that the set of actions available to us is directly impacted by the practices of those close to us in space and time. The epistemic locality problem points to the fact that our ethical knowledge is directly impacted by the ethical knowledge of those that are close to us in space and time. (But, as in physics, the causal chain that generated the local circumstances may go back a long way.)

Current AI systems and the problems of ethical locality

Are AI systems in the 2020s in a qualitatively different position than the one Jenny finds herself in? Do they have a way of avoiding these two ethical locality problems? It seems clear to me that they do not.

AI systems today face the practical locality problem because we continue to live in a society with a deeply unfair past that is reflected in current social institutions and practices. For example, there are still large differences in education across countries and social groups. This doesn’t mean that there isn’t a lot of work that we need to do to reduce procedural bias in existing AI systems. But AI systems with little or no procedural bias as defined above will still make decisions or perform in ways that are reflectively biased, just as Jenny does.

AI systems today also face the epistemic locality problem. Even if we think we have made a lot of progress on questions of bias since the 1860s, we are still making progress on what constitutes bias, who it is directed at, and how to mitigate it. And there are almost certainly attributes that we are biased against that aren’t currently legally or ethically recognized. In the future, the US may recognize social class and other attributes as targets of bias. The standards used to identify such attributes are also likely to change over time.

Future accounts of bias may also rely less on the concept of a sensitive attribute. Sensitive attributes like gender, race, etc. are features of people that are often used to discriminate against them. Although it makes sense to use these broad categories for legal purposes, it seems likely that more characteristics are discriminated against than the law currently recognizes (or can feasibly recognize). In the future, our concept of bias could be sensitive to bias against individuals for idiosyncratic reasons, such as bias against a job candidate because their parents didn’t donate to the right political party.

I hope it’s not controversial to say that we probably haven’t reached the end of moral progress on questions of bias. This means we can be confident that current AI systems, like Jenny, face the problem of epistemic locality.

Consequences of the practical locality problem for AI ethics

The practical locality problem shows that we can have procedurally fair systems whose outputs nonetheless reflect the biases of the society they are embedded in. Given this, I think that we should try to avoid implying that AI systems that are procedurally fair by our current standards are fair simpliciter. Suppose the factory owner were to point to Jenny and say ‘I know that I’ve only hired men as scientists and managers, but it’s Jenny that made the hiring decisions and she is clearly a fair decision-maker.’ By focusing on the procedural fairness of the decisions only, the owner’s statement downplays their reflective unfairness.

We therefore need to be aware of the ways in which AI systems can contribute to and reinforce existing unfair processes even if those systems are procedurally fair by our current standards.

The practical locality problem also indicates that employing more procedurally fair AI systems is not likely to be sufficient if our goal is to build a fair society. Getting rid of the unfairness that we have inherited from the past, such as different levels of investment in education and health across nations and social groups, may require proactive interventions. We may even want to make decisions that are less procedurally fair in the short-term if doing so will reduce societal unfairness in the long-term. For example, we could think that positive discrimination is procedurally unfair and yet all-things-considered justified.

Whether proactive interventions like positive discrimination are effective at reducing societal unfairness (as we currently understand it) is an empirical question. Regardless of how it lands, we should recognize that increasing procedural fairness may compete with other things we value, such as reducing societal unfairness. Building beneficial AI systems means building systems that make appropriate trade-offs between these competing values.

Consequences of the epistemic locality problem for AI ethics

If we think we have not reached the end of moral progress on ethical topics like bias, the language of ‘solving’ problems of bias in AI seems too ambitious. We can build AI systems that are less procedurally biased, but saying that we can ‘solve’ a problem implies that the problem is a fixed target. The ethical problems of bias are best thought of as moving targets, since our understanding of them updates over time. Rather than treating them like well-formed problems just waiting for solutions, I suspect we should aim to improve our performance with respect to the current best target. (This is consistent with the view that there are particular subproblems relating to AI bias that are fixed targets and can be solved.)

In general, I think a good rule of thumb is ‘if a problem hasn’t been solved despite hundreds of years of human attention, we probably shouldn’t build our AI systems in a way that presupposes finding the solution to that problem.’

If our values change over time (i.e. if they update as we get more information and engage in more ethical deliberation) then what is the ultimate goal of work on AI bias and AI ethics more generally? I think it should be to build AI systems that are aligned with our values, and that promote and are responsive to ongoing moral progress (or ‘moral change’ for those that don’t think the concept of progress is appropriate here). This includes changes in our collective views about bias.

What does it mean to say that we should build AI systems that ‘align with our values’? Am I saying that systems should align to actual preferences, ideal preferences, or partially ideal preferences? Am I saying that they should align to individual or group preferences and, if the latter, how do we aggregate those preferences and how do we account for differences in preferences? Moreover, how do we take into account problems like the tyranny of the majority or unjust preferences? These are topics that I will probably return to in other posts (see Gabriel (2020) for a discussion of them). For the purposes of this post, it is enough to say that building AI systems that are aligned with our values means building AI systems that reflect current best practices on issues like bias.

Progress on AI alignment is imperative if we want to build systems that reflect our current and future values about bias.

Problems in AI bias also constitute concrete misalignment problems. Building systems that don’t conflict with our values on bias means giving the right weight to any values that conflict, figuring out how to respond to differences in values across different regions, and building systems that are consistent with local laws. These present us with very real, practical problems when it comes to aligning current AI systems with our values. More powerful AI systems will likely present novel alignment problems, but the work we do on problems today could help build out the knowledge and infrastructure needed to respond to the alignment problems that could arise as AI systems get more powerful.

If this picture is correct then the relationship between AI alignment and AI bias is bidirectional. Progress in AI alignment can help us to improve our work on AI bias, and progress in AI bias can help us to improve our work on the problem of AI alignment.

Thanks to Miles Brundage, Gretchen Krueger, Arram Sabeti and others that provided useful comments on the drafts of this post.

When robustly tolerable beats precariously optimal


Summary: Something is "robustly tolerable" if it performs adequately under a wide range of circumstances, including unexpectedly bad circumstances. In this post, I argue that when the costs of failure are high, it's better for something to be robustly tolerable even if this means taking a hit on performance or agility.

Something is “robustly tolerable” if it performs adequately under a wide range of circumstances. Robustly tolerable things have decent insulation against negative shocks. A car with excellent safety features but a low top speed is robustly tolerable. A fast but dangerous sports car is not.

We often have to pay a price to make something more robustly tolerable. Sometimes we need to trade off performance. If I can only perform an amazing gymnastics routine 10% of the time, it might be better for me to opt for a less amazing routine that I can get right 90% of the time. Sometimes we need to trade off agility. If a large company develops checks on their decision-making processes over time, this may make their decisions more robustly tolerable but reduce the speed at which they can make those decisions.
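The gymnastics trade-off can be made concrete with a quick expected-score calculation. The 10% and 90% success rates come from the example above, but the scores themselves are hypothetical numbers I've picked purely for illustration.

```python
def expected_score(p_success, score_if_landed, score_if_failed):
    """Average score over many attempts at a routine."""
    return p_success * score_if_landed + (1 - p_success) * score_if_failed

# Hypothetical scores: a landed amazing routine earns 10, a landed safer
# routine earns 8, and a botched routine of either kind earns 2.
amazing = expected_score(0.10, 10.0, 2.0)
safer = expected_score(0.90, 8.0, 2.0)

print(f"amazing routine: {amazing:.1f}, safer routine: {safer:.1f}")
```

With these assumed scores the safer routine does far better on average, even though its best case is worse. Different scores could of course reverse the verdict, which is why the cost of failure matters so much.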

Being robustly tolerable is not a particularly valuable trait when the expected costs of failure are low, but it’s an extremely valuable trait when the expected costs of failure are high. The more high impact something is—the more widely a technology is used or the more important a piece of infrastructure is, for example—the more we want it to be robustly tolerable. When a lot is on the line, we’re more likely to opt for a product that is worse most of the time but has fewer critical failures.

What are examples where the expected costs of failure are high? It’s clearly very bad if an entire country is suddenly governed poorly. The costs of total failures of governance have historically been very high. This is why being robustly tolerable is a very desirable feature of large-scale governance structures. If your country is already functioning adequately with democratic elections, term limits, a careful legislative process, and checks on power (several branches of government, an independent judiciary, a free press, laws against corruption, and so on) then it seems less likely to suddenly be plunged into an authoritarian dictatorship or to experience political catastrophes like hyperinflation or famine.

I think we can undervalue the property of being robustly tolerable. When we see something that is robustly tolerable, sometimes all we see is a thing that could clearly perform better. (The car could go faster, the decision-making process could be less burdensome, etc.) We don’t take into account the fact that—even if the thing never behaves optimally—it’s also less likely to do something terrible. How well something functions often is in plain sight. But the downside risk isn’t visible most of the time, so it’s easy to forget to look at how robust its performance is. Overlooking robustness could be especially harmful if the only way to improve something’s performance involves making it less robust.

For example, if a candidate we dislike gets elected, it can be tempting to blame the democratic process that allowed it to happen. People can even claim that it would be better to have less democracy than to have people elect such bad representatives. But the very same democratic process often limits the power of that individual and lets people vote them out. A benevolent dictatorship may seem surprisingly alluring in bad times, but any political system that enables a benevolent dictatorship also puts you at much greater risk of a malevolent one. (As an aside, I find it a bit odd when people’s reaction to “bad decisions by the electorate” is to give up on democracy rather than, say, trying to build a more educated and informed electorate.)

Actions and plans can also vary in how robustly tolerable they are. Risk-taking behaviors like starting a company are generally less robustly tolerable, while lower-variance plans, e.g. getting a medical degree, are more robustly tolerable. In line with what I noted in a previous post, we should generally be in favor of robustly tolerable actions and plans when the expected cost of failure is high, and in favor of more fragile but high-yield behaviors and plans when the expected cost of failure is low.

Being robustly tolerable is not always a virtue worth having more of. We can tip the balance too far in favor of robustness and sacrifice too much performance or agility in order to achieve it. If we do, we can find ourselves in a robust mediocrity that it’s difficult to get out of. (You may believe that some of the examples I give above are robustly mediocre rather than robustly tolerable.)

But if something is robustly tolerable then the worst case scenarios are less likely and less harmful. This is a valuable trait to have in domains where the cost of failure is high. It’s also a trait that’s easy to overlook if we focus exclusively on how well something is performing in the here and now, and forget to consider how well it performs in the worst case scenario.

The virtues and vices of shark curiosity


Summary: Embracing the kind of aggressive curiosity of sharks seems to be a good way of getting better at arguing. But it can have a chilling effect on discourse and friendships. In this post, I explain what I mean by shark curiosity, and how we can strike the right balance between nurturing and testing new ideas.

In philosophy, you spend years learning how to attack arguments. If you keep doing philosophy, you’ll attack others and they’ll attack you in what feels like a kind of constant epistemic trial by fire. It’s not always fun, but it does seem to make people better at arguing.

Sometimes people ask how they can hone these skills. The least useful answer to this is some variant of “sorry, you just have to be good at it”. The degree to which argumentative skill is an innate talent is unclear. Even if most of those who end up in fields like philosophy are often innately good at it, this could just be an example of an unfortunate selection spiral in which only those who are innately good at the thing pursue it, and therefore only those who never really needed to be taught the thing end up teaching it.

A slightly more useful answer involves recommending texts on critical thinking, classes in formal logic, and so on. But this isn’t how most people become good at arguing. I haven’t ever taken a critical thinking class, and I didn’t learn formal logic until after I had already developed a lot of the skills that I’m talking about here. So what’s going on?

I once heard that sharks generally don’t bite people because they want to eat them. They bite people because they reflexively bite at anything that looks kind of like a fish (which can include humans) and because biting us is their way of trying to figure out what we are.

Like the intellectual equivalent of sharks, people who are very good at arguing seem to have a habit of reflexively attacking most claims and arguments that come their way. For example, they might see “up to 40% off” and get annoyed by the fact that the claim tells you nothing except that the store definitely won’t give you more than 40% off, which can be claimed by a store offering 0% off. Attacking a claim is their default response, even if the claim is fairly trivial.

For me, this reflex is often at its strongest when I’m confused by something. If someone puts forward a claim that doesn’t make sense to me, I do the intellectual equivalent of biting it to figure out what it is (i.e. I try to tear it apart). This strategy can be pretty effective, since people will often put effort into clarifying what they mean when their views are challenged.

So an effective way to improve your argumentative skills and become a clearer thinker may be to become more curious about the world and, at the same time, more aggressive towards it. You investigate more things, but your reflexive method of investigation is somewhat bitey. We can call this the “curious shark” approach. This strikes me as similar to what a lot of philosophy programs actually do in practice. They throw argument after argument at you and force you to come up with counterargument after counterargument. In order to get better at both defending and attacking, you’ll probably try to learn some logic or probability theory, but it’s the unrelenting practice that forces you to find better strategies over time. (Alan Hájek has helpfully distilled some of the strategies that many philosophers converge on.)

I think this partly explains why philosophers often end up defending pretty weird views. The discipline of philosophy is obsessed with argumentative prowess. Since it’s not all that hard to argue for something that most people find plausible, those arguments are not very impressive. But if you manage to argue that all possible worlds are real and meet the inevitable argumentative onslaught that follows, that’s pretty damn impressive. Arguing for an implausible conclusion is like tying your hands behind your back before entering a tank full of sharks. You’re definitely going to get attacked, but everyone will be all the more impressed if you come out successful.

One problem with the curious shark approach is that, from the point of view of anything they bite, sharks are assholes. That’s not bad for the shark because they don’t particularly want to make friends with the things they’re biting. But people do want to make friends with those around them (or at least not lose friends they have), and constantly tearing down their arguments isn’t exactly the best way to do that.

A related problem with the approach is that most ideas have to start out life as vulnerable little fish before they can grow into something more robust (see this post). If you create an environment where people have to defend their ideas from hungry sharks from day one, people will learn to either hide their ideas or stop coming up with ideas in domains they’re not already extremely well-versed in.

This was true of my philosophy grad program. It was a competitive environment, which was good for honing your ideas once you’d been working on them for a while. But it felt like no one really wanted to express nascent ideas. You knew that if you put forward an idea it would be attacked ruthlessly. So it made more sense to hole away and do the work yourself, and to only show your ideas when they had grown robust enough to withstand the attack. This is unfortunate because early discussions of ideas can be extremely helpful, and are presumably how you get the most value from having other grad students around.

I’ve also experienced the other extreme. I once went to a conference that was trying to move away from the traditional aggression seen in philosophy conferences and embrace a more supportive atmosphere. I thought I saw a problem in a paper and stated it honestly in the Q&A. I felt like my problem was never really fully addressed but most of the remaining comments were things like “here’s an interesting domain where your analysis might apply” or “have you read so-and-so’s related work? I think you’d like it.” At the time I felt like I’d breached a social norm by pointing out a problem with the paper so bluntly, but I also felt like I was doing a bigger favor to the author than any of the more supportive commenters were because the paper would be strengthened the most by fixing problems like the one I was pointing to.

So what are we supposed to do here? If we’re too aggressive with ideas we can kill promising but unrefined ideas when they’re most vulnerable, but if we coddle ideas we can fail to strengthen them early on and set them on the right path (or, worse, let someone work on an idea for a long time that really should have been abandoned much sooner).

Some people try to get around this dilemma by distinguishing between aggressive content and an aggressive tone. The thought is that if we deliver our biting criticism with a kinder tone, we can avoid the chilling effect that comes with biting criticism. It’s true that an aggressive tone can make an intellectual attack feel even more stressful, and perhaps an aggressive tone should never be necessary. But I don’t think a friendlier tone would fully eliminate the chilling effect or the “you’re an asshole” effect. It’s a little bit like moving from a barroom-brawl to a well-regulated boxing match: rules might help, but getting punched in the face is still going to hurt.

Here’s the only thing I’ve found that helps: I point out problems with ideas at every stage of development, but I try my hardest to solve any problem that I identify. Even if I don’t succeed in getting over my own objection, I make an effort. If you show that the goal of your attack isn’t to merely destroy the other person’s idea and declare a personal victory, but to jointly get at the truth and build on whatever part of the idea seems promising, the attacks you level are more likely to have the effect of strengthening rather than killing a promising but unrefined idea. And if the idea does die (as some ideas will), it’s more likely to do so because you’ve both tried to make it work and jointly concluded that it won’t, which ideally doesn’t discourage the other person from voicing similar promising but unrefined ideas in the future.

So if you want to become a sharper thinker, the adversarial training you get from habitually attacking ideas and welcoming attacks from others seems pretty effective. But I think you can do this while minimizing the chilling effect and the “you’re an asshole” effect by treating it as your job to try to counter your own attacks to the best of your ability. I’m not sure if this is the best solution to this problem, but it’s the best one I’ve come up with so far.

The optimal rate of failure


Summary: We sometimes assume that seeing someone fail implies that they are doing something wrong, but I argue that the ideal rate at which our plans should fail is often quite high. I note that this has consequences in politics and ethics that are often underappreciated.

It was apparently George Stigler who said “If you never miss a plane, you’re spending too much time at the airport.” The broader lesson is that if you find you’re never failing, there’s a good chance you’re being too risk averse, and being too risk averse is costly. Although people have discussed this principle in other contexts (e.g. in learning and startup investing), I still think that this lesson is generally underappreciated. For anything we try to do, the optimal rate of failure often isn’t zero: in fact, sometimes it’s very, very far from zero.

To give a different example, I was having an argument with a friend about whether some new social policy should be implemented. They presented some evidence that the policy wouldn’t be successful and argued that it therefore shouldn’t be implemented. I pointed out we didn’t need to show that the policy would be successful, we just needed to show that the expected cost of implementing it was lower than the expected value we’d get back, both in social value and, more importantly, in information value. Since the policy in question hadn’t been tried before, wasn’t expensive to implement, and was unlikely to be actively harmful, the fact that it would likely be a failure wasn’t, by itself, a convincing argument against implementing it. (It looks like a similar argument is given on p. 236-7 of this book.)

This is why I often find myself saying things like “I think this has about a 90% chance of failure—we should totally do it!” (Also, there’s a reason why I’m not a startup founder or motivational speaker.)

The expected value of trying anything is just the sum of (i) the expected gains if it’s successful, (ii) the expected losses if it fails, and (iii) the expected cost of trying. This includes direct value (some benefit or loss to you or the world), option value (being in a better or worse position in the future) and information value (having more or less knowledge for future decisions).

The optimal rate of failure indicates how often you should expect to fail if you’re taking the right number of chances. So we can use our estimates of (i), (ii), and (iii) to work out what the optimal rate of failure for a course of action is, given the options available to us. The optimal rate of failure will be lower whenever trying is costly (e.g. trying it takes years and cheaper options aren’t available), failure is really bad (e.g. it carries a high risk of death), and the gains from succeeding are low. And the optimal rate of failure will be higher whenever trying is cheap (e.g. you enjoy doing it), the cost of failure is low, and the gains from succeeding are high.
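To make the bookkeeping above concrete, here is a minimal sketch in Python. The function name and every number in it are illustrative assumptions, not figures from any real case. The only point is that an attempt with a 90% chance of failure can still have positive expected value when trying is cheap and failure is mild, and that the same odds stop being attractive once failure becomes costly.

```python
def expected_value_of_trying(p_success: float, gain: float, loss: float, cost: float) -> float:
    """Expected value of an attempt.

    gain: value if the attempt succeeds (direct + option + information value)
    loss: value lost if the attempt fails, entered as a positive number
    cost: what it costs to try, paid whether or not the attempt succeeds
    """
    return p_success * gain - (1 - p_success) * loss - cost


# Hypothetical long shot: big payoff, mild downside, cheap to try.
long_shot = expected_value_of_trying(p_success=0.1, gain=100.0, loss=2.0, cost=1.0)

# Same odds and payoff, but failure is now very costly.
dangerous = expected_value_of_trying(p_success=0.1, gain=100.0, loss=50.0, cost=1.0)

print(long_shot > 0)   # True: worth trying despite a 90% chance of failure
print(dangerous > 0)   # False: not worth trying at the same odds
```

If the best available option looks like the first case, we should expect to see it fail about nine times in ten, and that is not evidence that anything has gone wrong.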

If the optimal rate of failure of the best course of action is high, it may be a good thing to see a lot of failure (even though the course of action is best in spite of, rather than because of, its high rate of failure). I think we’re often able to internalize this: we recognize that someone has to play a lot of terrible music before they become a great musician, for example. But we’re not always good at internalizing the other side of this coin: if you never see someone fail, there’s a good chance that they’re doing something very wrong. If someone wants to be a good musician, it’s better to see them failing than to never hear them play.

So far, this probably reads like a life or business advice article (“don’t just promote people who succeed, or you’ll promote people who never take chances!“). But I actually think that failing to reflect on the optimal rate of failure can have some pretty significant ethical consequences too.

Politics is a domain in which things can go awry if we don’t stop to think about optimal rates of failure. Politicians have a strong personal incentive to not have the responsibility of failure pinned directly on them. We can see why if we consider the way that George H.W. Bush used the case of Willie Horton against Michael Dukakis in the 1988 presidential campaign. If a Massachusetts furlough program had not existed, Bush couldn’t have pointed to this case in his campaign. Not having any furlough program may be quite costly to many prisoners and their families, but “Dukakis didn’t support a more liberal furlough program” is unlikely to show up on many campaign ads. Now I don’t know if the Massachusetts furlough program was a good idea or not, but if politicians are held responsible for the costs of trying and failing but not for the costs of not trying, we should expect the public to pay the price of their risk aversion. (More generally, if we never see someone fail, we should probably pay more attention to whether it is them or someone else that bears the costs of their risk aversion.)

I think this entails some things that are pretty counterintuitive. For example, if you see crimes being committed in a society, you might think this is necessarily a bad sign. But if you were to find yourself in a society with no crime, it’s not very likely that you’ve stumbled into a peaceful utopia: it’s more likely that you’ve stumbled into an authoritarian police state. Given the costs that are involved in getting crime down to zero—e.g. locking away every person for every minor infraction—the optimal amount of crime we should expect to see in a well-functioning society is greater than zero. To put it another way: just as seeing too much crime is a bad sign for your society, so is seeing too little.

We can accept that seeing too little crime can be a bad sign even if we believe that every instance of crime is undesirable and that, all else being equal, it would be better for us to have no crime than for us to have any crime at all. We can accept both things because “all else being equal” really means “if we hold fixed the costs in both scenarios”. But if you hold fixed the costs of eliminating a bad thing then it is, of course, better to have less of it than more.

One objection that’s worth addressing here is this: can’t we point to the optimal rate of failure to claim that we were warranted in taking almost any action that later fails? I think that this is a real worry. To mitigate it somewhat, we should try to make concrete predictions about optimal rates of failure of our plans in advance, to argue why a plan is justified even if it has a high optimal rate of failure, and to later assess whether the actual rate of failure was in line with the predicted one. This doesn’t totally eliminate the concern, but it helps.

I first started thinking about optimal rates of failure in relation to issues in effective altruism. The first question I had is: what is the optimal rate of failure for effective interventions? It seems like it might actually be quite high because, among other things, people are more likely to under-invest in domains with a high risk of failure, because of risk aversion or loss aversion or whatever else. I still think this is true, but I also think that in recent years there has been a general shift towards greater exploration over exploitation when it comes to effective interventions.

The second question I had is: what is the optimal rate of failure for individuals who want to have a positive impact and the plans they are pursuing? Again, I think the optimal rate of failure might be relatively high here, and for similar reasons. But this raises the following problem: taking risks is something a lot of people cannot afford to do. The optimal rate of failure for someone’s plans depends a lot on the cost of failure. If failure is less costly for someone, they are more free to pursue things that have a greater expected payoff but a higher likelihood of failure. Since people without a safety net can’t afford to weather large failures, they’re less free to embark on risky courses of action. And if these less risky courses of action produce less value for themselves and for others, this is a pretty big loss to the world.

To put it another way: if you’re able to behave in a way that’s less sensitive to risks, you’re probably either pretty irrational or pretty privileged. Since many of the people who could do the most good are not that irrational and not that privileged, enabling them to choose a more risk neutral course of action might itself be a valuable cause area. Investing in individuals or providing insurance against failure for those pursuing ethical careers would enable more people to take the kinds of risks that are necessary to do the most good.

Does deliberation limit prediction?


Summary: There is a longstanding debate about whether deliberation prevents us from making any predictions about actions. In this post I will argue for a weaker thesis, namely that deliberation limits our ability to predict actions.

There is a longstanding debate about the claim that “deliberation crowds out prediction” (DCP). The question at the center of this debate is whether I can treat an action as a live option for me and at the same time assign a probability to whether I will do it. Spohn and Levi argue that we cannot assign such probabilities, for example, while Joyce and Hájek argue that we can.

A claim related to DCP that I’ve been thinking about is as follows:

Deliberation limits prediction (DLP): If an agent is free to choose between her options, it will not always be possible to predict what action an agent will perform in a given state even if (i) we have full information about the state and the agent, and (ii) the agent does not use a stochastic decision procedure.

DLP is weaker than DCP in at least one respect: it doesn’t say that agents can never make accurate predictions about things they are deliberating about, just that they can’t always do so. DLP is also stronger than DCP in at least one respect: it extends to the predictions that others make about the actions of agents and not just to the predictions that agents make about themselves.

Here is a case that I think we can use to support a claim like DLP:

The Prediction Machine

Researchers have created a machine that can predict what someone will do next with 99% accuracy. One of the new test subjects, Bob, is a bit of a rebel. If someone predicts he’ll do something with probability ≥50%, he’ll choose not to do it. And if someone predicts he’ll do something with probability <50%, he’ll choose to do it. The prediction machine is 99% accurate at predicting what Bob will do when Bob hasn’t seen its prediction. The researchers decide to ask the machine what Bob will do next if Bob is shown its prediction.

We know that no matter what the machine predicts, Bob will try to act in a way that makes its prediction inaccurate. So it seems that either the prediction machine won’t accurately predict what Bob will do, or Bob won’t rebel against the machine’s prediction. The first possibility is in tension with the claim that we can always accurately predict what an agent will do if we have access to enough information, while the second possibility is in tension with the claim that Bob is free to choose what to do.

(Note that we could turn this into a problem involving self-prediction by supposing that Bob is both the prediction machine and the rebellious agent: i.e. that Bob is very good at predicting his own actions and is also inclined to do the opposite of what he ultimately predicts. But since self-prediction is more complex and DLP isn’t limited to self-prediction, it’s helpful to illustrate it with a case in which Bob and the prediction machine are distinct.)

The structure of the prediction machine problem is similar to that of many problems of self-reference (e.g. the grandfather paradox, the barber paradox, the halting problem). It’s built on the following general assumptions:

Prediction: there is a process f that, for any process, always produces an accurate prediction about the outcome of that process

Rebellion: there is a process g that, when fed a prediction about its behavior, always outputs a behavior different than the predicted behavior

Co-implementation: the process g(f) is successfully implemented

In this case, f is whatever process the prediction machine uses to predict Bob’s actions, g is the (deterministic) process that Bob uses when deciding between actions, and g(f) is implemented whenever Bob uses f as a subroutine of g. We can see that if process g(f) is implemented then either f does not produce an accurate prediction (contrary to Prediction) or g does not output a behavior different than the predicted one (contrary to Rebellion). Therefore it cannot be the case that there exists a process f and there exists a process g and the process g(f) is implemented, contrary to Co-implementation. So if agents are free to act and to use a deterministic decision procedure like Bob’s to pick their actions, it will not be possible to predict what they will do in all states (e.g. those described in the prediction machine example) even if we have full information about the state and the agent, as DLP states.
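A toy version of this argument can be sketched in Python. The names `bob` and `machine_guess` are illustrative labels for g and for the reading of f’s announced output. Counting a prediction as “accurate” when its more-likely-than-not outcome matches what Bob does is a simplifying assumption of the sketch.

```python
def bob(shown_probability: float) -> bool:
    """Bob's deterministic rule g: act only if told he probably won't."""
    return shown_probability < 0.5


def machine_guess(shown_probability: float) -> bool:
    """The outcome the announced probability treats as more likely."""
    return shown_probability >= 0.5


# Try every possible announcement from 0% to 100% in 1% steps.
announcements = [i / 100 for i in range(101)]

# Keep the announcements whose favored outcome matches what Bob then does.
accurate = [p for p in announcements if machine_guess(p) == bob(p)]

print(accurate)  # prints []: no announcement survives being shown to Bob
```

Whatever f announces, Bob’s rule returns the opposite of the favored outcome, so the list of accurate announcements is empty. Making the grid finer would not help, because the two functions disagree at every input by construction.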

Joyce (p. 79–80) responds to a similar style of argument against our ability to assign subjective probabilities to actions. The argument is that “Allowing act probabilities might make it permissible for agents to use the fact that they are likely (or unlikely) to perform an act as a reason for performing it.” Joyce’s response to this argument is as follows:

I entirely agree that it is absurd for an agent’s views about the advisability of performing any act to depend on how likely she takes that act to be. Reasoning of the form “I am likely (unlikely) to A, so I should A” is always fallacious. While one might be tempted to forestall it by banishing act probabilities altogether, this is unnecessary. We run no risk of sanctioning fallacious reasoning as long as A’s probability does not figure into the calculation of its own expected utility, or that of any other act. No decision theory based on the General Equation will allow this. While GE requires that each act A be associated with a probability P(• || A), the values of this function do not depend on A’s unconditional probability (or those of other acts). Since act probabilities “wash out” in the calculation of expected utilities in both CDT and EDT, neither allows agents to use their beliefs about what they are likely to do as reasons for action.

The General Equation states that the expected value of an action is the probability of a state given that the action is performed (e.g. the state of getting measles given that you received a measles vaccine), multiplied by the utility of the outcome of performing that act in that state (e.g. the utility of the outcome “received vaccine and got measles”), where we sum over all possible states. This is expressed as Exp(A) = Σ P(S || A) u(o[A, S]).
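As a rough illustration, the General Equation can be computed directly. This is only a sketch in Python: the probabilities and utilities below are made-up numbers for the measles example, and the names `expected_utility`, `prob` and `utility` are assumptions of the sketch rather than anything in Joyce.

```python
def expected_utility(act, states, prob, utility):
    """Exp(A) = sum over states S of P(S || A) * u(o[A, S])."""
    return sum(prob[(s, act)] * utility[(act, s)] for s in states)


states = ["measles", "no measles"]

# Hypothetical P(S || A): probability of each state given the act.
prob = {
    ("measles", "vaccinate"): 0.01,
    ("no measles", "vaccinate"): 0.99,
    ("measles", "skip"): 0.20,
    ("no measles", "skip"): 0.80,
}

# Hypothetical utilities of each outcome o[A, S].
utility = {
    ("vaccinate", "measles"): -101.0,
    ("vaccinate", "no measles"): -1.0,
    ("skip", "measles"): -100.0,
    ("skip", "no measles"): 0.0,
}

# Print the expected utility of each act for comparison.
for act in ("vaccinate", "skip"):
    print(act, expected_utility(act, states, prob, utility))
```

Note that nothing in the calculation takes the probability of the act itself as an input, which is the sense in which act probabilities “wash out”.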

But suppose that Bob derives some pleasure from acting in a way that is contrary to his or others’ predictions about how he will act. If this is the case, it certainly does not seem fallacious for his beliefs about others’ predictions of his actions to play a role in his deliberations (Joyce’s comments don’t bear on this question). Moreover, it does not seem fallacious for his own prior beliefs about how he will act to play a role in his decision about how to act, even if such reasoning would result in a situation in which he either fails to accurately predict his own actions or fails to act in accordance with his own preferences. (Similar issues are also discussed in Liu & Price, p. 19–20.)

When confronted with self-reference problems like this, we generally deny either Prediction or Rebellion. The halting problem is an argument against its variant of Prediction, for example. It shows that there is no program that can detect whether any program will halt on any input. (If the halting program is computable then the program that uses it as a subroutine is also computable, meaning that we can’t drop Rebellion and retain Prediction in response to it.) The grandfather paradox, on the other hand, is generally taken to be an argument against its variant of Rebellion: there’s an important sense in which you can’t travel back in time and kill your own grandfather.

Denying Co-implementation is less common. This is because there is often no independent reason for thinking that g and f can never be co-implemented. And the argument shows that there is no instance in which f and g could ever be co-implemented, which remains true even if no one ever actually attempts to do so. Most of us would conclude from this that the processes cannot be co-implemented. (One could, in the spirit of compatibilism, argue that all we have shown is that f and g are never co-implemented and not that they cannot be co-implemented, but I assume most would reject this view.)

In the case of the prediction machine, we can deny that it’s possible for Bob to act in a way that’s contrary to the predictions that are made about him. This might be defensible in the case of self-prediction: if Bob cannot prevent himself from forming an accurate prediction about what he will do between the time that he forms the intention to act and the time that he acts, then he will never be able to rebel against his own predictions. But it is much less plausible in cases where Bob is responding to the predictions of others.

Alternatively, we could try to argue that Bob and the prediction machine will simply never communicate: perhaps every time the researchers try to run this experiment the machine will break down or spit out nonsense, for example. But this response is unsatisfactory for the reasons outlined above.

Finally, we could simply embrace DLP and concede that we cannot always produce accurate predictions about what agents like Bob will do, even if we have access to all of the relevant information about Bob and the state he is in. Embracing DLP might seem like a bad option, but the states we’ve identified in which we can’t make accurate predictions about agents are states in which our predictions causally affect the very thing that we are attempting to predict. It might not be surprising if it’s often impossible to make accurate predictions in cases where our predictions play this kind of causal role.

Conclusion: It seems like DLP could be true but, if it is, it might not be something that should concern us too much.

Disagreeing with content and disagreeing with connotations


Summary: It’s possible to agree with the content of a piece of writing but to think that the conclusions that many readers might draw from it are wrong. I think it’s useful to distinguish between these before criticizing the writing of others.

Suppose someone writes an article entitled “rates of false sexual assault accusation on the rise”. Now, suppose you care about sexual assault victims and you’re worried about unreported sexual assaults. When you see a title like this you think “this person just wants to smear sexual assault victims” and you promptly conclude that the article is wrong or that the person writing it has malicious intentions. (This article title and content are made up: the idea is just that it’s a controversial claim that might nonetheless be well supported.)

We often have a reflexive reaction to an article like this that we don’t even notice. It starts with a reasonable-looking inference: “This article is wrong, therefore something in the article must be wrong.” You then either dismiss the article outright (“false accusation rates are not increasing”) or you try to find some claim the article makes that is false and that blocks the conclusion (“one of the key studies you appealed to here isn’t very good”) or you just point out that the authors must have immoral views (“you’re claiming we shouldn’t believe the victims of sexual assault.”)

It’s possible that the article does in fact contain an error and is incorrect, in which case it’s good that you pointed out the error. But it’s also possible that if you sat down and read the article closely, you wouldn’t actually be able to find any key claim, argument, or conclusion in the article that you truly disagree with. For example, the article on false accusation rates may contain no errors and be fairly humble in its conclusions. It may be a completely accurate and fairly boring report on recent studies into, say, prosecution rates for malicious false accusations that doesn’t say anything about how we should respond to this increase. You might still feel like you disagree with the article, but you can’t actually point the author to precisely what you think they got wrong.

This leads to a really bad dynamic between authors and their critics in which the author feels unfairly maligned: they were trying to say something true and reasonable and now all these people on the internet keep misconstruing what they are saying or offering objections that seem beside the point or are claiming that the author is a bad person. The critic doesn’t change their mind and is angry at the author for saying such false things and annoyed that they don’t see how wrong they are.

What we can miss here is that the reasonable-looking inference “This article is wrong, therefore something in the article must be wrong” is not quite correct. It’s possible to agree with every claim in an article (to think that the article is technically correct in most respects) but to think that the conclusions that many readers might draw from the article are wrong. You have a reasonable belief that an article on increased false accusation rates will be used to justify disbelieving victims, even if this was never something that the author actually endorsed or even if it’s something they went out of their way to reject. What you actually disagree with is the article’s connotations: what you think others will believe the article justifies.

I think it’s good for us to notice when we primarily disagree with the connotations of an article and not its content. We can then point out that we disagree with the conclusions people might draw from the article without misrepresenting it or its author. E.g. “This is an interesting [fictional] article that does seem to show an increase in false accusation prosecutions. Of course, it’s worth bearing in mind that the base rate of false accusations is relatively low and that this wouldn’t justify a sudden change in how much credence we place in the testimony of victims.”

An important worry we might have is that some authors will write their article because they want people to draw the conclusion that it doesn’t state (“sexual assault victims shouldn’t be believed”) but they also want to avoid being criticized for supporting that conclusion. So they only state things that are technically true and let the reader draw the conclusion. That is a problem, and I think that this is why authors should try to be explicit about what they think does and doesn’t follow from the claims they are making. But this criticism can also be stated directly. We can say: “In your article you say x and many people are going to feel it’s reasonable to conclude y from this. I think that y is wrong and that it doesn’t follow from x, and that you never really did enough to rule out that inference.” This strikes me as a valid criticism but one that I don’t often see articulated.

Impossibility reasoning


Summary: It's typical to teach and use sequential reasoning, but all sequential arguments can be reformulated as impossibility results. Thinking about and presenting arguments as impossibility results can be more fruitful than sequential reasoning.

This is a niche topic but I want to write about it because it’s something that I’ve found useful. ‘Syllogistic reasoning’ is sequential reasoning from premises to a conclusion, and it’s the type of reasoning most people use. For example:

  1. If the tax plan will increase jobs then it’s worth passing
  2. The tax plan will increase jobs
  3. Therefore it’s worth passing the tax plan

I prefer to use a slight variant of syllogistic reasoning, which I’ll call ‘impossibility reasoning’. To use impossibility reasoning, you just convert every argument like the one above into a set of things that you can’t have: you can’t have all of the premises and also have the negation of the conclusion. In other words, the argument above is saying that the following is an impossibility set (we’re not reasoning sequentially so the order doesn’t matter):

  1. If the tax plan will increase jobs then it’s worth passing
  2. The tax plan will increase jobs
  3. It’s not worth passing the tax plan

If you can look at this set and see some way to make all of the things in it true then you immediately know that the original argument is invalid. And if you notice that the member of the set that you find least plausible is one of the original premises then you can pick that out and make a counterargument. You’ve also identified your point of disagreement with the original author. (‘Actually, I don’t agree that if the tax plan will increase jobs then it’s worth passing - that’s only true if the cost per job created isn’t too high.’)
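For anyone who wants to see this mechanically, here is a minimal sketch in Python. It treats the tax plan argument as a set of three claims (the two premises plus the negation of the conclusion) and checks by brute force whether any assignment of truth values makes all of them true. The variable names `J` and `W`, the dictionary keys, and the helper function are my own illustrative choices, and this only covers simple propositional arguments like the one above.

```python
from itertools import product

# Hypothetical encoding (an assumption for illustration):
# J = "the tax plan will increase jobs"
# W = "the tax plan is worth passing"
members = {
    "if jobs then worth passing": lambda J, W: (not J) or W,
    "jobs": lambda J, W: J,
    "not worth passing": lambda J, W: not W,
}

def satisfying_assignments(claims):
    """Return every (J, W) truth assignment that makes all claims true."""
    return [
        (J, W)
        for J, W in product([True, False], repeat=2)
        if all(claim(J, W) for claim in claims)
    ]

# The full set: if no assignment satisfies it, the original argument is valid.
full_set = satisfying_assignments(members.values())

# Drop any one member and see which assignments the remaining two allow.
# This is the trade-off: you have to give up at least one member.
after_dropping = {
    name: satisfying_assignments(
        [claim for other, claim in members.items() if other != name]
    )
    for name in members
}

print(full_set)
print(after_dropping)
```

The full set has no satisfying assignment, which is just another way of saying that the original argument is valid. Dropping any single member leaves a consistent pair, so the only question left is which member you find least plausible.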

This might seem like a very minor adjustment, especially in an example this simple, but I find it much easier to work with impossibility sets than with sequential arguments. I also like presenting arguments as impossibility sets because it lets people explicitly see the trade-offs they have to make. Underneath it all, arguments are just statements of the form “you can’t have all of these things”. I think it’s better to present them as such and make your case, but let your reader decide what they want to give up.

Keep others’ identities small


Summary: I really like Paul Graham’s advice to “keep your identity small” - to avoid making groups or positions part of your identity if you want to remain unbiased. But I often want to add to it “and keep other people’s identity small too”.

I really like Paul Graham’s advice to “keep your identity small” - to avoid making groups or positions part of your identity if you want to remain unbiased. But I often want to add to it “and keep other people’s identity small too”.

I find it irritating when the first thing someone does when they hear something they disagree with is to attribute an identity to the person who expressed the view that roughly correlates with the view in question (feminist, liberal, conservative, religious, libertarian, etc.). When people do this they almost invariably fail to engage with the actual claims that the other person is making. Instead, they engage with claims they think someone from that group would typically make or they dismiss the person’s claims because they come from “a member of group x”.

I’ve seen this happen on all sides. Think implicit bias might hinder women’s careers? You must just be a dyed-in-the-wool feminist. Think IQ might be heritable? You must just be racist and/or sexist. Think abortion might be wrong? You must just be religious and anti-women.

This makes it almost impossible to sincerely engage with the claims in question. Maybe the person you’re talking with does subscribe to some underlying views you disagree with, but I think it’s better to assume, at least at first, that all they’re committed to are the claims they’ve explicitly made.

There are exceptions here, but if we want to expose ourselves to a variety of views and to change people’s minds on divisive topics, it seems better to engage with other people’s statements directly and attribute as small an identity to them as possible.

Infinity and the problem of evil


Summary: Some fictional dialogues in which I explore whether God should create all good worlds and how this relates to the problem of evil.

God pops into existence and – with his newly found omniscience – realizes he’s the only thing that exists. Since he’s omnipotent, he can create absolutely anything. Since he’s benevolent, he thinks “I’ll create the most perfect of universes!” At this point, the god will either realize that it’s not possible to create a perfect universe because universes can always be improved, or the god will realize that it is possible to create a perfect universe. Now we (the human readers) don’t know if a perfect universe is possible, so let’s split god into the PIP god (perfect is possible god) and the PIN god (perfect is not possible god). Let’s imagine that PIP and PIN can talk to one another.

PIP: I’ll create the perfect universe!

PIN: Argh, I have all of these good universes to choose between.

PIP: You’re omnipotent: why don’t you just create all of the universes that are good?

PIN: Great idea! But why are you only creating one perfect universe, why don’t you duplicate that universe a bunch of times?

PIP: You’re right, I could create infinitely many perfect universes! You should also duplicate all of the good universes a bunch of times.

PIN: Yeah, I’ll duplicate them infinitely many times. But now that I think about it, why aren’t you also creating the good universes? You could create those in addition to all of the perfect universes, right?

PIP: You know what, that’s not a bad idea, let me go ahead and do that.

So PIP and PIN both decide to create infinitely many duplicates of all the net good possible universes. Out of all of the options available to them, creating infinitely many duplicates of every net good universe seems like the best thing they can do (there might not be an upper bound on the number of universes they can create, but let’s assume they create as many as they can).

Those observing this conversation realize that the problem of evil has been massively undermined. If PIP and PIN exist and are doing the best thing that they can do, we might expect that we should find ourselves in a good universe but not necessarily in a perfect one. And it seems way more plausible that the universe is net good than that it contains no evil at all: we could be in a suboptimal pocket of the most optimal multiverse.

A sleepy philosopher then comes along and raises a (not very good) objection to PIP.

Sleepy philosopher: PIP, why did you create the net good universes? Wouldn’t it have been better to just create all of the net perfect universes, since this set will dominate the set of both good and perfect universes?

To make the philosopher’s objection clearer, let’s line up these universes on the natural number line: the good universes are all above zero and the perfect universes are depicted with a P:

PIP’s multiverse: {1, 2, 3, 4, 5, 6, …. P, P, P, P, P,…}

PIN’s multiverse: {1, 2, 3, 4, 5, 6, ….}

The complaint against PIP is that he could have just created the perfect universes: {P, P, P, P, P, P, …} and if we line this up with all good universes, it looks like the set of perfect universes is better. But PIP has a pretty good response to this.

PIP: Sure, the set of perfect universes would dominate the set of good universes in a one-to-one contest, but you can’t just map the members of the infinite set of perfect universes and the infinite set of good universes to each other here: I’m also creating as many perfect universes as I would have if I hadn’t created the good universes (all of them, if this is possible, and however many I can if this is not) and the good universes are just a bonus. It seems better to add the good universes to the set of perfect universes, and so that’s what I did. Your complaints are unreasonable.

But then a less sleepy ethicist and a metaphysician hear about PIN and PIP’s decision and decide to raise some objections of their own.

Ethicist: Look, I get that you both created all of the net good universes that you could, but surely you could have improved things within each of the good universes you created. For one thing, you could have avoided making people have lives that are not worth living.

(PIN and PIP occasionally respond in a unified voice, which we will call PIPPIN)

PIPPIN: I didn’t create lives not worth living unless they were necessary for universe X to exist. Take universe 987c: this universe has one agent called Bob with a crappy life in it. But the universe without Bob in it is identical to universe 999a and I already created that one. If I take Bob out of existence then universe 987c would simply cease to exist.

Metaphysician: Wait, hang on, are you saying that the identity of indiscernibles is true? Like, why can’t you just create 987c but with a better life for Bob? Even if it’s qualitatively identical to 999a it would still be a distinct universe.

PIPPIN: Yes, the identity of indiscernibles is a metaphysical truth that I can’t change.

Metaphysician: Are you sure? We really thought that one was false.

PIPPIN: Okay you’re right, that wasn’t a very convincing response. My real response is that I didn’t create lives not worth living. Even if people have crappy lives for some period, their lives are actually always net good because I send them all to heaven.

Ethicist: But then why start off their lives in a crappy way? Why don’t you at least make all of the existing lives not involve suffering before the afterlife?

PIPPIN: Well, in order for Bob to be Bob, he has to suffer a bit. Bob without suffering is just a different guy Jeff, and we’ve already created him.

Metaphysician: We’ve been over this.

PIPPIN: Oh yeah. Okay, assume that the identity of indiscernibles is false. Then we need to split back into PIP and PIN.

PIP: So here’s my response. Since perfect universes are possible, perfect lives are also possible (if we can always improve lives then we can also always improve universes by improving the lives in them). And what kind of callous god would fail to improve net good lives to perfect ones? Not me!

Ethicist: So you’re saying everyone has a perfect life – a life that cannot be improved on, even though the identity of indiscernibles is false?

PIP: I sure am!

Ethicist: How on earth can you claim that? Look at this child suffering from disease: are you saying you cannot improve their life?

PIP: Look, I told you I send everyone to heaven, so that child has a life that is infinitely good.

Ethicist: Hang on, you can’t fool me that easily. Let’s look at the sequence of happiness in this child’s life. If I believe you then it’s something like {-3, -3, -3, +1, +1, +1, +1,…} i.e. a finite period of suffering followed by an infinite period of happiness. Even I, a lowly ethicist, can make that life better. Just make the first three locations +1, +1, +1 instead of -3, -3, -3!

PIP: Yeah, but both sequences are infinitely good…

Ethicist: That doesn’t mean you can’t improve them! There’s no reason to look at the sum of the sequence rather than the differences between the sequences. I’ve just made the child’s life better at the first three locations, so I’ve improved it. Why couldn’t you have done that?

PIP: Um, I really thought that the only way to make things better was to make the total larger. That’s about as far as I got.

Ethicist: Aren’t you supposed to be omniscient?

PIP: vanishes in a puff of logic

At this point, all attention is turned to PIN.

PIN: Look, I made all of the net good lives. And yes I can improve them without making them not exist (metaphysician nods) but because there aren’t perfect lives I can always improve them. There’s just no end to the life improvements I can make, so I had to stop at some point.

Ethicist: Okay, PIN. You say you had to stop at some point. Here is my question: why did you make people suffer at all? Even if you can’t make perfect lives, you can at least make lives that only contain good experiences. We might find your existence more plausible if everyone only had positive experiences, but we don’t – we often have very bad experiences. So why didn’t you just create all net good experiences? That seems like an obvious lower bound on what it was right for you to do.

PIN: I guess I could have done that.

Ethicist: Aren’t you supposed to be omnipotent?

PIN: vanishes in a puff of logic

… no time whatsoever goes by …

In a blinding light PIN and PIP are resurrected. Each has more to say in their own defense.

PIPPIN [1]: I know that I said that the identity of indiscernibles was false, but I’ve just realized (though I’m omniscient, so I actually always knew) that isn’t what I needed in order to justify my decision to create lives with suffering in them. Here’s my defense: suppose that Bob is just the sum of his qualitative parts. So Bob plus some extra happiness is a different guy: call him Bob+. Surely I should create as many distinct net good lives as I can. If this is the case and if Bob is not the same person as Bob+ then it would be better for me to create Bob than to fail to do so. (I’ll also create Bob+, but the point is that I will create Bob).

Ethicist: Okay, I’ll grant you that. But why not only create all lives with positive life experiences?

PIPPIN [2]: I’m glad you asked. Let me ask you something in turn: do you think you’d have a complaint if Bob had the happiness stream {+1, +1, +1, +1,…}?

Ethicist: No, I suppose not. In that case Bob would have a life with no suffering. It might be better to create Bob with happiness stream {+2, +2, +2, +2,…} but I understand that when it comes to the net good lives you can always create better ones.

PIPPIN: So you think that {+2, +2, +2, +2,…} is permissible. What about {-1, +3, +3, +3,…}?

Ethicist: No, that life has some negative experiences.

PIPPIN: But isn’t a life at {-1, +3, +3, +3,…} better than a life at {+2, +2, +2, +2,…}?

Ethicist: I suppose…

PIPPIN: So how can it be permissible for me to create the second but not the first, if the first is better?

Ethicist: I guess I can’t say that the second is permissible but the first is not, unless I commit to the idea that it’s life-segments and not whole lives that are the things we should care about. If it’s life segments that we care about then you would have done something wrong by bringing into existence a life segment that was bad.

PIPPIN: Perhaps, although only if that negative life segment wasn’t essential to some positive life segments, correct?

Ethicist: I’ll grant that for now since my head is starting to hurt.

PIPPIN: Well let’s set that thought aside. The important point is that if it’s permissible for me to create lives at {+2, +2, +2, +2,…} then it must be permissible for me to create lives at {-1, +3, +3, +3,…}.

Ethicist: I suppose….

PIPPIN: And if the best thing for me to do is create all possible net good lives and I can’t change a life from {-1, +3, +3, +3,…} to {+3, +3, +3, +3,…} without changing who is experiencing the life, then I should create both lives if I can. In other words, it’s better for me to also create the first agent.

Ethicist: I suppose…

PIPPIN: Well then what do you have to complain about? You could perhaps claim that I’ve created some lives that are net negative on the whole, but I’m obviously going to tell you that these lives were necessary for the existence of some positive lives, or that all lives are net good because I send people to heaven.

Ethicist: Yes, I had foreseen that.

PIPPIN: So have I solved your little problem of evil?

Ethicist: vanishes in a puff of frustration

[1] Thanks to David Mathers for pointing out that we only need anti-haecceitism and not the identity of indiscernibles here.

[2] Thanks to Dustin Crummett for pointing this out. For any life stream of continuous positive value, we could construct a life stream with some negative value that seems better.

Transmitting credences and transmitting evidence


Summary: There is a longstanding debate about whether deliberation prevents us from making any predictions about actions. In this post I will argue for a weaker thesis, namely that deliberation limits our ability to predict actions.

Your credences are how likely you think something is given your evidence and your priors, and reporting them can be much more useful than reporting beliefs. Telling you that I believe it’s not going to rain is good if I want you to know that an umbrella is not necessary, but it’s bad if you need to know specifically how likely it is that I think it will rain. If, unbeknownst to me, you have left some electronics outside that will be destroyed if it rains, then it’s important for you to know whether there’s a 1% chance of rain or a 20% chance of rain, but my belief report doesn’t tell you this.

In some circumstances it can also be useful to report on more than just your credences. For example, suppose that as I’m walking down the street I meet six people in a row who all tell me that a building four blocks away is on fire. I reasonably assume that some of these six people have seen the fire themselves or that they’ve heard that there’s a fire from different people who have seen it. I conclude that I’ve got good testimonial evidence that there’s a fire four blocks away. But suppose that none of them have seen the fire: they’ve all just left a meeting in which a charismatic person Bob told them that there is a fire four blocks away. If I knew that there wasn’t actually any more evidence for the fire claim than Bob’s testimony, I would not have been so confident that there’s a fire four blocks away.

In this case, the credence that I ended up with was based on the testimony of those six people, which I reasonably assumed represented a diverse body of evidence. This means that anyone asking me what makes me confident that there’s a fire will also receive misleading evidence that there’s a diverse body of evidence for the fire claim. This is a problem of evidential overlap: when several people independently tell me that they have some credence in P, I have a reasonable prior about how much overlap there is in their evidence. But in cases like the one above, that prior is incorrect. (The same issue arises when I have just one person telling me that they have some credence in P. If it turns out that we both have a high credence in P on the basis of completely different evidence, then I should update more towards P than I would if we had identical evidence for P.)
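To make the effect of evidential overlap concrete, here is a rough sketch of the arithmetic in Python. It updates in odds form, treating each genuinely independent report as multiplying my odds by the same likelihood ratio. The prior of 1% and the likelihood ratio of 3 are made-up numbers chosen purely for illustration, and the assumption that reports are either fully independent or fully redundant is a simplification.

```python
def posterior(prior, likelihood_ratio, independent_reports):
    """Posterior probability after updating on n independent reports.

    Works in odds form: posterior odds = prior odds * LR ** n.
    """
    odds = prior / (1 - prior)
    odds *= likelihood_ratio ** independent_reports
    return odds / (1 + odds)

# Assumed numbers, for illustration only.
PRIOR = 0.01  # my prior that a building four blocks away is on fire
LR = 3.0      # a report is three times likelier if there really is a fire

# What I implicitly assumed: six independent sources of evidence.
six_independent = posterior(PRIOR, LR, 6)

# What was actually the case: all six reports trace back to Bob,
# so they carry roughly the weight of a single report.
one_source = posterior(PRIOR, LR, 1)

print(f"six independent reports: {six_independent:.3f}")
print(f"one underlying source:   {one_source:.3f}")
```

With these made-up numbers, six independent reports would take me to roughly 88% confidence, while a single underlying source only justifies roughly 3%. The same six sentences of testimony support very different credences depending on how much the evidence behind them overlaps.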

So it’s sometimes useful to transmit not only your credence, but the evidence on which that credence is based. When we update on the credences that other people assert, we are updating both on their reading as a thermometer of the evidence and on what we estimate to be the nature of the evidence that they are a thermometer of. One way to avoid mistaken beliefs about the nature of that evidence is to transmit it directly: i.e. for each person to tell me that they are confident that there’s a fire four blocks away because Bob said so.

This isn’t just useful in cases of evidential overlap. To take a different kind of example, suppose that you want to know how likely it is that Alice has asthma. I might think that it’s quite unlikely that Alice has asthma because the base rate is quite low, and tell you that I have a low credence that Alice has asthma. A nurse might also think that it’s unlikely that Alice has asthma because they have tested her for asthma and established that she doesn’t have it, and so they too tell you that they have a low credence that Alice has asthma. Even though our credences might be pretty similar, my credence is not very resilient (resilience is, roughly, how likely you think it is to remain the same upon getting more evidence) while the nurse’s credence is very resilient. And in order to know how resilient your own credence should be, you need to know how resilient the credences of those whose testimony you are relying on are. In other words, you need to know whether their credence is based on a lot of evidence (like the nurse’s) or whether their credence is based on very little evidence (like mine).
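One simple way to picture resilience is with pseudo-counts. In the sketch below, a credence is the mean of a Beta distribution, and the total count stands in for how much evidence sits behind it. This is only a toy model: the specific counts are assumptions I have picked for illustration, and a real test result is not literally a pile of observations.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the credence it encodes."""
    return a / (a + b)

def observe(a, b, positive):
    """Update the pseudo-counts on one new observation."""
    return (a + 1, b) if positive else (a, b + 1)

# Assumed numbers: both credences start at 0.1, but they rest on
# very different amounts of evidence.
mine = (1, 9)       # base rate only: very little evidence
nurse = (100, 900)  # hypothetical stand-in for strong test evidence

# One new piece of evidence pointing towards asthma arrives.
mine_shift = beta_mean(*observe(*mine, True)) - beta_mean(*mine)
nurse_shift = beta_mean(*observe(*nurse, True)) - beta_mean(*nurse)

print(f"my credence moves by:        {mine_shift:.4f}")
print(f"nurse's credence moves by:   {nurse_shift:.4f}")
```

Both credences start out identical, but under these assumptions mine moves by about eight percentage points on a single new observation while the nurse’s moves by less than a tenth of a point. Hearing only “my credence is 10%” from each of us hides exactly this difference.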

A final reason to transmit your evidence in addition to your credence is that it lets you calibrate people if you think that they have updated incorrectly, and be calibrated yourself in turn if you have made a mistake. This also lets people see how you update on evidence and use this information when they weigh your testimony in the future. For example, if you discover that I have a high prior in cultural relativism or that I update too strongly in response to new evidence, you can use this to calibrate how much weight to give to my testimony going forward.

But transmitting your evidence can be costly. For one, we don’t always have a good sense of what our evidence is, and so we may end up just transmitting things like “I think I read this in a journal article once, but it could have been a newspaper column. Really I just have a general hunch that I read it somewhere.” Or even “I have no idea how I know that there’s danger over there, I just sense it.” I actually think it’s useful to identify cases where we don’t know what our evidence is, as long as people don’t mistake this for “I have no evidence”. A larger downside is that transmitting your evidence takes a lot longer than transmitting your credence does. It requires more reflection and simply takes longer to state, especially if it requires additional hedging (“I’m not completely confident about what my evidence is, but I think it’s x to degree n, y to degree m…”).

So when is it a good idea to transmit your evidence rather than just your credence? The value of transmitting your evidence scales with how important it is for your interlocutor to have an accurate credence about the proposition in question, and also with how atypical your evidence is: how much the evidence that you have for the proposition differs from what someone would reasonably expect if they were to hear your credence. If this is right, then when we ask people to do things like forecast important events or estimate the probability of important claims, we should probably ask not only for their credence but also how resilient their credence is and what evidence it is based on.

Thanks to Max Dalton for making substantive contributions to this post.

Against jargon


Summary: It’s sometimes useful to introduce new terms into discourse: new terms can increase communication efficiency, but at the cost of accessibility and sometimes precision. In this post I outline the pros and cons of introducing new, domain-specific terms.

In his 1946 essay “Politics and the English Language”, Orwell suggests the following rule:

Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.

I think this is sensible advice. Of course, it’s sometimes useful to introduce new terms into discourse. This is true whenever we encounter a new object or a new concept that is both useful and relatively well defined, but takes too long to express. For example, we couldn’t go around calling gold “that sparkly yellow stuff” or calling limits “the value that this sequence approaches” forever. How useful a new term is will depend on how important the thing it refers to is for the community that is discussing it. Academic fields like economics end up developing certain insider terms like “Pareto efficient” because they are useful for economists, even though they are less useful for the general public. Having succinct terms for these concepts can facilitate work building on them, since they allow a richer class of ideas to be expressed without the complexity of constructions becoming baffling. So the primary benefit of new terms is that they make communication within a group more efficient and allow groups to develop new ideas based on those terms (and when that group is sufficiently large, the term falls into common use).

What, then, is the difference between a new but useful term and a piece of jargon? For the purposes of this piece, I’m going to use “jargon” to refer to a new term whose creation or use is not cost-justified in a given context. This seems consistent with how people use the term jargon: a phrase like “equitable relief” might not be jargon in a conversation between lawyers, but might be jargon if it is used by a politician addressing her constituents.

What are the costs of introducing new terms? I can think of at least six. The first three relate to the fact that new terms are often “insider terms” that are used within an isolated community, and the remaining three are more general:

  1. New terms create an additional barrier to entering a community. If you need to learn the language of a given community in order to meaningfully interact with it, then it’s often going to be more difficult to join that community. Sometimes this cost is justified – e.g. it’s difficult to become a mathematician without learning the language of mathematics – but sometimes it is not, especially if you want your community to grow. (In some cases new terms might make it easier to join a community. If there are certain ideas which are central to operating in the community, having terms for them makes it easier to recognise that there is something to learn. But this only seems to apply to central concepts.)

  2. New terms are often alienating to people outside of that community. New terms can be used as a way to indicate insider status, and they can seem alienating (and, in some cases, just plain weird) to people outside of the community that uses them.

  3. New terms hinder discussion between community members and outsiders. If people from community x and community y use their own vocabularies, this can make people from each community unduly doubt the competence of those in the other community (they didn’t even know what actinic keratosis was!) and can also simply make it more difficult for them to talk with one another (what the hell do half of these terms mean?).

  4. New terms can mask imprecise concepts. The flip side of terms facilitating work building on concepts is that having a term to refer to something gives us a feeling that the underlying concept is concrete and commonly understood. This applies even if the underlying concept is actually imprecise and subject to interpretation. When this happens, it can lead to people talking past one another.

  5. New terms can act like undue linguistic patents. Consider an existing action like “debating issues when one participant is feeling emotional or defensive” and suppose we gave it a label like “glopping”. The idea that we should avoid “glopping” will probably strike us as novel, and so we will avoid saying “don’t debate issues when one participant is feeling emotional or defensive” without referring to the “avoid glopping” idea. But this gives undue ownership of existing concepts to those who construct terms for them, and can impede discussion of those concepts.

  6. New terms can lend undue credibility to ideas. When we introduce a new term, we are indicating that the underlying concept is so important that it will be useful to have a shorthand for it going forward. Some concepts that are given new terms are simply not important enough to warrant this kind of attention, but it can be easy to forget this if the new term gains traction.

We can see that there are perverse incentives for creating new terms (besides the non-perverse incentive of aiding efficient communication within a group). New terms are intellectually satisfying and can be used to indicate insider status or to create a group identity. They also let us patent ideas, increase the credibility of ideas, and can be used to mask imprecision. We should probably bear these perverse incentives in mind before considering whether to create or use a new term.

So how can we tell when the creation or use of a new term is cost-justified? I think the following questions might be helpful guides:

Questions to ask before introducing a new term:

  1. Is this concept sufficiently useful and difficult to convey in a short amount of time that it is worth constructing a new term for it?
  2. Is it likely to be useful to build on this concept?
  3. Have I defined the term precisely in plain English and acknowledged any lingering imprecision in the term?
  4. Is the term that I have introduced as close as possible to a common description of the concept? (e.g. “glopping” is worse by this standard than “emotionally charged debating”: it can be worth sacrificing some succinctness for greater accuracy and comprehensibility.) This is important because having a term which gives essentially the right impression to people who don’t know the precise meaning can capture many of the benefits of a new term while avoiding many of the costs.
  5. Even if my immediate audience consists only of an isolated community, could this new term be costly when the community tries to communicate with a wider audience?

Questions to ask before using such a term:

  1. Am I communicating to a group of people who are all familiar with this term? (If there’s any uncertainty about this, it’s worth checking or saying what you mean by the term.)
  2. Is the underlying concept sufficiently precise that this term is not likely to lead to people talking past one another?
  3. Do I need to use the term or is there some more accessible way to describe the underlying concept?

I think we should aspire to communicate in a way that is as precise, accessible, and efficient as possible. New terms can increase communication efficiency but at the cost of accessibility and sometimes precision. It seems important to bear this in mind before we create new terms, before we decide what new term to use for a given concept or idea, and before we decide whether to use those terms when communicating with others both within and outside of a given community.

Thanks to Owen Cotton-Barratt for making substantive contributions to this post.

Utilitarians and disability activists: what are the genuine disagreements?


Summary: I consider five key objections to utilitarianism from disability activists, and highlight where I think there are genuine tensions between the positions, and where I think there are not.

Utilitarianism is the view (roughly) that we ought to act so that we maximize happiness or welfare, and minimize suffering. I think of myself as quite utilitarian in my moral outlook. I have pretty utilitarian intuitions when it comes to policy-level decisions, and I think that utilitarianism has a lot of the features that a good moral theory should have. I also think of myself as a supporter of those with disabilities. Part of what attracted me to the effective altruism movement was the idea that I might be able to help people who are suffering from illnesses abroad, including what we’d typically conceive of as disabilities (rather than non-disability illnesses), such as blindness caused by cataracts, untreated depression, and cognitive and physical disabilities caused by maternal and childhood malnutrition.

Given this, the emerging gulf between utilitarians and disability activists saddens me. I think of my utilitarian intuitions and my desire to help people with disabilities as two sides of the same coin. I think that we could shorten that gulf somewhat by clearing up some of the confusions that exist between utilitarians and disability activists. I’m going to consider five key objections to utilitarianism from disability activists, and highlight where I think there are genuine tensions between the positions, and where I think there are not.

(1) Utilitarians think that all disabilities are bad

Utilitarians think that something is bad insofar as it causes suffering or results in a loss of actual or potential welfare (e.g. we make three people happier instead of ten). But the theory doesn’t take a stance on what a ‘disability’ is, as distinct from an illness or anything else. What it cares about is whether the following is true of a given condition:

Harmful conditions: condition c is harmful if it causes a welfare reduction in expectation, as a result of either its intrinsic qualities or because people with condition c are treated badly

Utilitarianism is not committed to all or even any disabilities being harmful conditions in this sense. It might be that some disabilities are intrinsically harmful, but that this is compensated for by extrinsic benefits (e.g. the deaf community is a particularly strong one, and perhaps this greatly improves the lives of deaf people such that the condition is not net harmful). It might be that some disabilities are not intrinsically harmful (e.g. conditions that make one different from others, but don’t cause one suffering or social stigma).

Of course, it seems plausible that a lot of disabilities will be harmful in the sense above. Some disabilities involve chronic pain, which many report as being detrimental to their wellbeing. Some disabilities are highly stigmatized, which is also detrimental to people’s wellbeing. Utilitarians believe that we should aim to eliminate these kinds of welfare reductions: for example, by finding the sorts of pain killers that can safely eliminate chronic pain in those whose lives are made worse by it, and by eliminating the kinds of social stigma that cause needless suffering to those with disabilities.

One valid criticism is worth mentioning here: I think that, in the past, utilitarian scholars have been too careless in the way that they have talked about particular disabilities. We should not treat all disabilities as though they were harmful, and we should try to gather as much evidence as we can about how harmful a disability is (e.g. from studies that ask disabled people how their disability impacts on their life) before we discuss it. My impression is that contemporary scholars are much more aware of this, and I think this is important and good.

(2) Utilitarians want to get rid of all disabilities

So far I have only mentioned interventions that involve ‘treating the symptoms’ of disability. But what about getting rid of the disability itself? As we have noted, the utilitarian wants to get rid of the welfare reduction that a condition causes. One way of doing that is by eliminating the harms that a condition causes without eliminating the condition itself. What is left over is a ‘mere difference’ akin to having red hair or liking jazz, which utilitarians have no interest in getting rid of. But what about cases where the best way to eliminate those harms is to ‘cure’ the condition itself, rather than to treat its symptoms?

Suppose that someone is blind from birth and that this negatively impacts on their welfare (by their own report). Now suppose there are two treatment options available: we can try to surgically reverse the blindness entirely (call this ‘cure’), or we can try to make this person’s life as a blind person roughly as easy as that of a seeing person without ‘curing’ their disability (call this ‘assist’). We have to consider the harms and benefits of curing and assisting, which will roughly consist in the following:

(a) the cost of ‘cure’ vs. ‘assist’

(b) the expected welfare benefits to the person of ‘cure’ vs. ‘assist’

(c) the person’s preference of ‘cure’ vs. ‘assist’

(d) the benefits of ‘cure’ vs ‘assist’ if we value human diversity

Together, (a) and (b) give us a rough cost-benefit analysis of ‘cure’ and ‘assist’ for the welfare of the person in question. For example, if ‘cure’ is half as good as ‘assist’ for the person’s welfare, but the costs are such that we can either ‘cure’ 1000 people or ‘assist’ 3 people, then this favors ‘cure’ over ‘assist’ when we are working with limited resources. But it’s important that we take into account all of the harms and benefits of our actions. For example, it’s very important to uphold norms where we respect people’s choices about their own treatment, as (c) states. Such a norm is easy to justify on utilitarian grounds.
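The arithmetic behind this example can be made explicit. What follows is only a minimal sketch in Python. It assumes, purely for illustration, that welfare gains can be placed on a common scale where ‘assist’ is worth 1 unit per person and ‘cure’ is worth 0.5. The function name and the units are hypothetical, and the sketch covers only (a) and (b), not the preference and diversity considerations in (c) and (d).

```python
def total_welfare_gain(people_helped: int, gain_per_person: float) -> float:
    """Total welfare gain from spending a fixed budget on one intervention."""
    return people_helped * gain_per_person


# Illustrative numbers from the example: the same budget can 'cure' 1000
# people or 'assist' 3 people, and 'cure' is half as good per person.
ASSIST_GAIN = 1.0  # assumed unit of welfare per person assisted
CURE_GAIN = 0.5    # 'cure' is half as good as 'assist' per person

cure_total = total_welfare_gain(1000, CURE_GAIN)
assist_total = total_welfare_gain(3, ASSIST_GAIN)
better_option = "cure" if cure_total > assist_total else "assist"

print(cure_total, assist_total, better_option)
```

On these assumed numbers, ‘cure’ yields 500 units against 3 for ‘assist’, which is why the comparison favors ‘cure’ so heavily when resources are limited.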

I think a key point of disagreement here is about how much we should value (d). Utilitarians will value human diversity for a few reasons: for example, because people like having qualities that make them distinct, and because it’s good for society to consist of people with different qualities and views. But they won’t count this as an intrinsic value. If we encounter an alien planet that has a well-functioning and happy society that happens to be completely homogeneous, utilitarianism doesn’t say that this alien planet is worse than a similarly well-functioning and happy society that happens to be heterogeneous.

We can imagine toy cases where the utilitarian and the person who gives greater value to (d) might diverge. For example, imagine that Tom has been blind from birth and this causes him some loss of welfare. Suppose it would cost $100 to surgically treat his blindness or $200 to assist him to the point that he’d be just as happy as he would be as a seeing person. And Tom is completely indifferent between these two options. If the utilitarian values the diversity benefit of having a blind person (where blindness is not harming their welfare) at less than $100, then they will say that it’s better to surgically reverse Tom’s blindness than to assist him. But if disability activists think that the diversity benefits are greater than $100, then they’re going to prefer assisting Tom to surgically reversing his blindness. I think that how much to value diversity is going to be a point of disagreement both between utilitarians and disability activists and among utilitarians themselves. Human diversity is the kind of thing that it’s hard to place a value on. For example, would we ever be justified in refusing to let someone ‘cure’ their own disability because we value the diversity of having both able-bodied people and disabled people in our society? These are difficult questions, but my guess would be that disability activists will give more moral weight to human diversity than utilitarians, on average.
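The Tom case turns on a simple threshold, which can be sketched as follows. This is a hedged illustration only: it assumes that the diversity benefit can be expressed in dollars and credited against the cost of ‘assist’, and that welfare outcomes and Tom’s preferences are equal across the two options, as the toy case stipulates. The function name is hypothetical.

```python
def preferred_option(cure_cost: float, assist_cost: float,
                     diversity_value: float) -> str:
    """Pick the option with the lower net cost.

    The diversity benefit is credited to 'assist', since only 'assist'
    preserves the difference. Welfare outcomes are assumed to be equal.
    """
    net_assist_cost = assist_cost - diversity_value
    if net_assist_cost < cure_cost:
        return "assist"
    if net_assist_cost > cure_cost:
        return "cure"
    return "indifferent"


# Tom's case: 'cure' costs $100 and 'assist' costs $200.
print(preferred_option(100, 200, diversity_value=50))   # diversity valued below $100
print(preferred_option(100, 200, diversity_value=150))  # diversity valued above $100
```

With a diversity value below the $100 cost difference the sketch returns ‘cure’, and with a value above it the sketch returns ‘assist’. The substantive disagreement is about what number, if any, belongs in `diversity_value`.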

To conclude, utilitarians don’t want to get rid of all disabilities, but it’s likely that they will favor ‘cure’ over ‘assist’ in more cases than disability activists will, because they place less value on people being differently abled than disability activists, on average.

(3) Utilitarians think it would be better not to bring disabled people into existence

As we’ve seen above, utilitarianism is not committed to the claim that all disabilities are harmful. In the case of harmless disabilities, the utilitarian will be roughly indifferent about whether you bring people who have them into existence, all else being equal (they might even be in favor of it, if there’s a value to having a diverse population of people).

But let’s take into account the subset of disabilities that are harmful in expectation. Does a utilitarian think that it would be better to bring someone into existence with no harmful condition than to bring someone into existence with a harmful condition, all else being equal? I’m not interested in being deceptive here: I think that the answer is yes. But the reasons that we have for bringing into existence people who don’t have harmful conditions hold only when all else is equal (which it often isn’t), and they are often fairly weak. For example, if the condition is not very harmful or we can easily make a person with the condition just as well off as someone without the harmful condition, then there’s very little reason to prefer to bring into existence the latter over the former. Trivial reasons like this are easily outweighed by other considerations.

It’s also worth noting that utilitarianism doesn’t distinguish between harmful disabilities and more mundane properties when it comes to questions like this. Suppose you ask: all else being equal, should we bring into existence an agent who has a 10% chance of having a headache on their 32nd birthday or an agent who has a 9% chance of having a headache on their 32nd birthday? The utilitarian will say: all else being equal, it’s better to bring into existence the person with the 9% chance of having the headache. But they will also point out that it doesn’t make that much of a difference, and is a consideration that can be easily outweighed in real world scenarios.

To clarify: utilitarianism does not say that from behind the veil of ignorance (i.e. if we imagine who to bring into being) it’s always better to bring into existence a non-disabled person Jane instead of a disabled person Bob. But it will usually say that it is better, all else being equal, to bring into existence Bob without some harmful condition than it is to bring into existence Bob with some harmful condition.

Things are a bit more complicated when it comes to very harmful conditions that are essential to someone’s being (i.e. we cannot bring into existence this person without bringing into existence someone with a very harmful condition). To take an extreme example, imagine that we know that someone with gene X will be born with a condition that causes them to have a short life that is full of suffering before they die at a young age: they are guaranteed to have a life that is not worth living. And since it is part of their genetics, we cannot remove this condition without turning them into a different person. If we are selecting between embryos for implantation, should we remove those with gene X if we can? Utilitarians will almost certainly say yes. This is analogous to thinking that we should ‘cure’ rather than ‘assist’ in cases where ‘assist’ is simply not viable. Some disability activists may disagree with this.

On one extreme, you might think that we should be completely indifferent about the possible people that we bring into existence in the future: that we should not care if we bring into existence people who are in extreme agony rather than people who experience normal levels of pain. On the other extreme, you might think that we should have a strong preference for bringing into existence only people who don’t have any harmful conditions. I think that many would agree that neither position is all that plausible, but where to draw that line is the point of disagreement between disability activists and utilitarians.

(4) Utilitarians support euthanizing disabled people

Utilitarians often defend voluntary euthanasia in cases where someone doesn’t have a life worth living. A lot of people don’t actually find this position so implausible in extreme cases involving an adult with a condition that cannot be made better, and who has gone through various medical assessments to establish this. Voluntary euthanasia is still controversial in these cases, since the person could still get better, or we might find a way of alleviating their suffering. But this doesn’t seem to be the point of contention with disability activists, since utilitarians are not singling out disabled people here: the view is just that suicide can be rational in extreme cases (i.e. cases that involve extreme suffering), and that in such cases we should not prevent people from ending their own life.

The cases that are usually discussed in the debate between disability activists and utilitarians are ones that involve people who cannot consent. For example, we can imagine that a child is born with a condition that causes extreme pain and who will die within a certain period of time, but who will, between now and then, suffer terribly. What should we do in such a case? I think that if we had the option of putting the child into a medically induced coma between the time that we realize this and the time of their death, we would not be doing something wrong. In fact, to leave them in agony seems very cruel to me. But I’m also aware that putting a child into this coma is tantamount to death, since it robs them of conscious experience from that time on. Utilitarians are probably divided on this issue. On the one hand, extending suffering needlessly seems cruel. On the other hand, euthanasia is a dangerous tool if abused, and we might want to avoid sanctioning it even in these extreme cases. These are the very cases and considerations that utilitarians discuss.

But this is all very different from the discussion you might think was going on if you just saw the claim ‘utilitarians support euthanizing disabled people’. This claim is simply not true, and if I thought it were true then it would be a strong reason to reject utilitarianism. Utilitarianism values human life and happiness. If someone has a life ‘worth living’ (roughly: the person would prefer to keep living than to die), then utilitarians want to preserve that person’s life. And if someone has a life that is not worth living (roughly: the person would rather die than continue living), then utilitarians would want to bring that life up to the point that it is worth living, if doing so is possible. The truly difficult cases arise when a person’s life cannot be brought up to this point and they wish to die (voluntary euthanasia) or when a person’s life cannot be brought up to this point and they cannot consent to euthanasia (the child in a coma case). These are awful cases to have to consider, but the view that euthanasia may be morally justifiable in either or both cases is not morally repugnant, and certainly cannot be equated with the false claim that utilitarianism would support euthanizing people with disabilities. I can hardly believe I need to say this, but there it is.

(5) Utilitarians think that the lives of disabled people are less valuable

Most utilitarians are act utilitarians, in that they think that we should morally evaluate the actions available to us by how much welfare or suffering they cause. But we can look at how the moral view would evaluate things other than actions: e.g. how it would assess the value of a chair, or a nation, or a person. A person’s direct value would be how much welfare they have: that’s what got us the view that it was better to bring about the person with the 9% chance of a headache than the person with a 10% chance of a headache. A person’s indirect value would be how much welfare they create in others: e.g., a great author has positive indirect value, while a mass murderer has negative indirect value.

I do think that it’s a bit weird to apply this moral evaluation to people. If we do, then we end up saying that even though people are equally valuable as vessels and creators of welfare (i.e., we don’t care about their qualities other than this), the total ‘value’ of a person can vary with their personal levels of welfare and their indirect impact on welfare. Whether, by these lights, a disabled person is less ‘valuable’ than a non-disabled person is going to vary from case to case. Having any harmful condition is going to reduce one’s direct value. For example, I stubbed my toe last night and thereby reduced my direct utilitarian value. Any illness I have also slightly decreases that value. A harmful condition, including those caused by or related to disabilities, will decrease one’s direct value. That’s because the condition is harmful and not because the person has a disability. The class of people who have harmful conditions will have less valuable lives according to the utilitarian, but so will the class of people who stubbed their toe yesterday, or the class of people who’ve had a bad week at work.

The indirect value that people have is also going to vary a lot, depending on what one does with one’s life (e.g. becoming a great author or becoming a mass murderer). Utilitarianism doesn’t say that the lives of disabled people are less valuable, but it does say that if we hold fixed the indirect value that people have, the lives of people with any harmful condition (e.g. a stubbed toe) are producing less direct value than the same life without that harmful condition. But we should be careful not to conclude too much from this. There are strong moral reasons to adopt a norm of never discriminating between people on the basis of their direct or indirect value in all but extreme cases. It can be easy to forget this if we only discuss the kind of idealized fictional cases that find their way into philosophy papers.

I think that a large problem here is that there’s ambiguity in what we mean by the ‘value’ of a life. On the one hand, we can mean ‘how much value this life contains and produces’, as discussed above. But when we discuss the value of a life, we often care more about things like people’s welfare being treated as equally valuable (which utilitarianism advocates), or people being given fair treatment in accordance with their value as people. And utilitarianism does not imply that if a life contains more welfare then it is deserving of greater resources or better treatment. If anything, utilitarians seem to consistently advocate giving resources to those with lower levels of welfare in order to improve their lives, and not to those whose lives contain ‘more value’, i.e. those who already have sufficiently high levels of welfare.


I do think there are some genuine disagreements between utilitarians and disability activists. I think that disability activists probably value diversity as an intrinsic good while utilitarians do not, and that this has some knock-on effects for the views that each of them have when it comes to ‘edge cases’ like when euthanasia or embryo selection are justified. I also think that utilitarians have been needlessly careless in their discussion of disability in the past, but that this is improving. You might still think that utilitarianism gets things wrong on some of the issues I’ve discussed above. My main goal hasn’t been to defend these utilitarian positions, but to show that the points on which utilitarians and disability activists disagree are points that well-intended and reasonable people can disagree about.

Some noise on signaling


Summary: I ask what signaling is and argue that it’s a bad idea to simply accuse people of “signaling” because signaling can mean a lot of things. I also argue that not all signaling is bad.

Sometimes when people publicly give to charity or adopt a vegan diet or support a cause like Black Lives Matter, they get accused of ‘virtue signaling’. This is a criticism that’s always bothered me for reasons that I couldn’t quite articulate. I think I’ve now identified why it bothers me and why I think that we should avoid blanket claims that someone is ‘signaling’ or ‘virtue signaling’. In order to make things clear, I’m going to give a broad definition of signaling and note the various ways that one could adjust this definition. I’m then going to explain what I think the conditions are for signaling in a way that is morally blameworthy and the difficulties involved in distinguishing blameworthy signaling from blameless behavior that is superficially similar.

In order to discuss signaling with any clarity, we need to try to give some account of what a signal is. The term was originally introduced by Spence (1973) who relied on an implicit definition. I’m not a fan of implicit definitions, so I’m going to attempt to give an explicit definition that is as broad and clear as I can muster, given that ‘signal’ and ‘signaling’ are now terms in ordinary parlance as well as in different academic fields.

A signal is, at base, a piece of evidence. But we don’t want to call any piece of evidence a signal. For one thing, a signal is usually sent by one party (the sender) and received by another (the recipient), so it’s communicated evidence. Moreover, the evidence is usually about a property of the sender: I can signal that I’m hungry or that I’m a good piano player or that I like you, but not that the sky is blue or that it will rain tomorrow (we can imagine an even broader definition of a signal that includes these, but let’s grant this restriction). And we communicate this evidence in various ways: by, for example, undertaking certain actions, saying certain things, or having certain properties. Putting all this together, let’s give the following definition of a signal:

A signal is an action, statement, or property of the sender that communicates to the receiver some evidence that the sender has some property p

Note that, under this broad definition, ‘trivial signals’ and ‘costless signals’ are possible: we can signal that we have a property by simply having it. We can also signal things at no cost to ourselves. I don’t think this is a problem: most of the signals we’re interested in just happen to be non-trivial signals or costly signals (e.g. incurring a cost to convey private information).

Of course, one way we give information about ourselves is by simply telling people. If I’m hungry, I can turn to you and say “I’m hungry”. In doing so, I give you testimonial evidence that I’m hungry. Because you’re a good Bayesian, how much you will increase your credence that I’m hungry given this evidence depends on (a) how likely you think it is that I’m hungry (you’re less likely to believe me if I just ate a large meal) and (b) how likely you think it is that I’d say I’m hungry if I wasn’t (you’re less likely to believe me if I have a habit of lying about how hungry I am, or if I have an incentive to lie to you in this case). And sometimes my testimony that I have a given property just won’t be sufficient to convince you to a high enough degree that I have the property in question. For example, if I’m interviewing for a job, it’s probably not sufficient for me to say to you “trust me, I know python inside and out” because it’s not that common to know python inside and out, and I have a strong incentive to deceive you into thinking I know more about python than I actually do. (As a side note: this gives us a strong incentive to adopt fairly strong honesty norms: if you’re known to be honest and accurate about your abilities and properties even when you have an incentive to lie, you’ll have to rely less on non-testimonial signals of those abilities and properties.)
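The role of (a) and (b) can be seen in a small Bayes’ rule calculation. The following is a minimal sketch in Python. The function name and every probability below are made up for illustration and are not taken from any source.

```python
def posterior(prior: float, p_say_if_true: float, p_say_if_false: float) -> float:
    """Credence that the sender has property p after they assert that they do.

    prior          -- (a) how likely the receiver thought p was beforehand
    p_say_if_true  -- chance the sender would assert p if p were true
    p_say_if_false -- (b) chance the sender would assert p if p were false
    """
    numerator = p_say_if_true * prior
    evidence = numerator + p_say_if_false * (1 - prior)
    if evidence == 0:
        raise ValueError("the assertion has probability zero under these inputs")
    return numerator / evidence


# "I'm hungry": moderate prior, little incentive to lie (assumed numbers).
hungry = posterior(prior=0.3, p_say_if_true=0.9, p_say_if_false=0.1)

# "I know python inside and out" in an interview: low prior and a strong
# incentive to say so even if it is false (assumed numbers).
interview = posterior(prior=0.05, p_say_if_true=0.95, p_say_if_false=0.6)

print(round(hungry, 2), round(interview, 2))
```

Under these assumed inputs the hunger claim moves the receiver to a credence of roughly 0.79, while the interview claim leaves them at roughly 0.08. This is the sense in which testimony alone is sometimes insufficient.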

I know that others (like Robin Hanson, in this post) want to exclude direct testimony of the form “I have property p” as a signal. We could exclude this by adding to the definition the condition that “I have property p” isn’t the content of the agent’s assertion, but I think this is unnecessarily messy: it’s just that we’re less interested in signals that are given via direct testimony. Also, some cases of signaling do seem to involve assertions of this sort. If I find it very difficult to tell people I love them, then the act of saying “I love you” may be a very credible signal that I love you. It also happens to be the primary content of my assertion.

In cases where we can’t sufficiently raise someone’s credence that we have some property p with our testimony alone, they require additional evidence that we have the property to be sufficiently confident that we have it (where the property itself may be a gradational one: e.g., that I’m competent in python to degree n, and not just that I am competent in python simpliciter). In such cases, we need to provide additional evidence that we have the property. For example, I can give you evidence of my competence in python by showing you my university transcripts, or simply by demonstrating my abilities. When I do so, I raise your credence that I am competent in python to the degree that you would require to give me a job, which I wasn’t able to do with testimony alone.

In scenarios like this, there’s an optimal credence for you to have in “Amanda has property p” from my perspective, and there’s an optimal credence for you to have in “Amanda has property p” from your perspective. You — the receiver — probably just want to have the most accurate credence that I have property p. Sometimes it’s going to be in my interest to communicate evidence that will give you a more accurate credence (e.g., if I genuinely know python well, I want to communicate evidence that will move you up from your low prior to one that is more accurate), but sometimes I want to make your credence less accurate (e.g., if I don’t know python that well, but I want to convince you to give me the job). Let’s say that the sender value of a signal is how valuable the resultant credence change is to the sender, and the accuracy of a signal is how much closer the signal moves the receiver towards having an accurate credence.

Hanson argues that we cannot signal that we have properties that are ‘easy to verify’ because if a property is easy to verify, then it is cheap for the receiver to check whether my signal is accurate. I think that it will often be less rational to send costly signals of properties that are easy to verify, but I don’t think we should make this part of the definition of a signal. Suppose that I am in a seminar, and I ask a naive question that any graduate student would be afraid to ask because it might make them look foolish. As a side effect of this, I might signal (or, rather, countersignal) that I am a tenured professor. Such a thing is easy enough to verify: someone could simply look up my name on a faculty list. So if my primary goal was to signal that I was a tenured professor, there are easier methods available to me than to ask naive questions in seminars. But we can signal something even if doing so is not our primary goal. And this seems like a genuine instance of signaling that I am a tenured professor, despite the fact that this information is easily verifiable.

Finally, signals sometimes involve costs to the sender. Hanson argues that costly signals are required in cases where a property is more difficult to verify or cannot be verified soon. I think the details here are actually rather tricky, but one thing we can say is that the costlier it is for any receiver to verify that I have a given property, the higher the minimum absolute cost of sending a true signal is going to be. It doesn’t follow that sending the signal will be net costly to me, just that the absolute cost will be higher. For example, suppose that to be deemed competent as a pilot you need to do hundreds of hours of supervised flying (i.e., you can’t just take a one-time test to demonstrate that you’re a competent pilot). The property ‘is a competent pilot’ is then quite hard to verify, and so the cost of sending a true signal involves hundreds of hours of supervised flying. But if I love flying and am more than happy to pay the time and money cost to engage in supervised flying, then the net cost to me to send the signal might be negligible or even zero, even though the absolute costs are quite high.

So far I have argued that a signal can simply be understood as an action, statement, or property of the sender that communicates to the receiver some evidence that the sender has some property p. Such signaling will be rational if the cost to the sender is less than the benefit that they will acquire by sending the signal. But one remaining question is whether signaling must be consciously or unconsciously motivating. By ‘motivating’ I just mean that the benefits of sending the signal are part of the agent’s reasons for undertaking a given action (e.g., doing something, speaking, acquiring a property). We might be unconsciously motivated by the signal value of something: for example, I might think that I’m playing the flute because I love it, even though I am unconsciously motivated by a desire to appear interesting or cultured. We can also be motivated to greater or lesser degrees by something: for example, it might turn out that if I could never actually demonstrate my flute-playing abilities to others, then I’d only reduce my flute-playing by 5%, in which case only 5% of my flute-playing was motivated by the signal value it generated.

I’m going to assume that signaling doesn’t require being motivated by signal value. This means that my signaling something can be a side-effect of something I would do for its own sake. Some people might think that in order for me to be ‘signaling’, sending the signal must be a sufficient part of my conscious or unconscious motivation. For example: it must be the case that I would not undertake the action were it not for the signaling value it afforded. If this is the case, then 5% of my flute playing would be signaling in the case above, while 95% of my playing would not be signaling. I can foresee difficulties for views that have either a counterfactual or threshold motivational requirement for signaling, and so I’m going to assume that I can signal without being motivated by signal value. The reader can decide whether they would want to classify unmotivated signaling as signaling (and economists seem to reserve the term for signals that are both motivated and costly).

I think we can now divide signaling into four important categories that track how accurate the signal is (i.e., whether the sender actually has the property to the relevant degree) and how motivated the agent is by the signal value. I’ll label these as follows:

Innate signaling involves sending an accurate signal without being consciously or unconsciously motivated by sending the signal. If a child is hungry and eats some bread from the floor for this reason alone, then she is innately signaling hunger to anyone who sees her.

Honest signaling involves sending an accurate signal that one is consciously or unconsciously motivated by. If a child is hungry and eats some bread from the floor to show her parents that she is hungry, then she is honestly signaling hunger.

Deceptive signaling involves sending an inaccurate signal that one is consciously or unconsciously motivated by. If a child is not hungry and eats some bread from the floor to get her parents to believe that she is hungry and give her sweets, then she is deceptively signaling hunger.

Mistaken signaling involves sending an inaccurate signal that one is not consciously or unconsciously motivated by. If a child is not hungry and eats some bread from the floor because she is curious about the taste of bread that has fallen on the floor, then she is mistakenly signaling hunger to anyone who sees her.

Since motivation and accuracy come in degrees, signaling behavior comes on a spectrum from more honest to more innate, and more deceptive to more mistaken, and so on. (If you think that agents must be consciously or unconsciously motivated to send a signal in order for them to be signaling, then innate signaling and mistaken signaling will not be signaling at all. I have shaded these darker in the diagram above to reflect this.)
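The four categories can be read as a two-by-two table over accuracy and motivation. Here is a minimal sketch in Python, assuming (as a simplification) that both are yes/no properties rather than matters of degree; the function name `classify_signal` is made up for illustration.

```python
def classify_signal(accurate: bool, motivated: bool) -> str:
    """Return the category label for a signal.

    accurate:  the sender really has the signaled property.
    motivated: the sender is consciously or unconsciously moved by the
               value of sending the signal (a yes/no simplification).
    """
    if accurate:
        return "honest" if motivated else "innate"
    return "deceptive" if motivated else "mistaken"


# The four bread-eating children described in the text.
examples = {
    "hungry, eats only because hungry": classify_signal(True, False),
    "hungry, eats to show her parents": classify_signal(True, True),
    "not hungry, eats to get sweets": classify_signal(False, True),
    "not hungry, eats out of curiosity": classify_signal(False, False),
}

for case, label in examples.items():
    print(f"{case}: {label} signaling")
```

Replacing the booleans with numbers between 0 and 1 would be one way to capture the spectrum, but how to score those degrees is left open here.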

So when is it unethical or blameworthy for agents to engage in signaling? It seems pretty clear that innate signaling will rarely be unethical or blameworthy. If an agent innately signals that she is selfish, then we might think that she is unethical or blameworthy for being selfish but not that she is unethical or blameworthy for signaling that she is selfish. The same is true of mistaken signaling. If an agent is not negligent, but mistakenly signals something that is not true — for example, she appears more altruistic than she is because someone mistakes a minor act of kindness on her part for a great sacrifice — then we presumably don’t think that she is responsible for accidentally sending inaccurate signals to others. We might think that she can be blamed if she is negligent (e.g., if she had the ability to correct the beliefs). But if her actions were not consciously or unconsciously motivated by their signal value, then we’re unlikely to think that she can be accused of signaling in a way that is unethical.

If this is correct, then most occasions on which we think that agents can be aptly blamed for signaling are when these agents are motivated in whole or in part by the signal value of their actions (in other words, even if we do think that innate signaling and mistaken signaling are possible, we don’t think that they’re particularly blameworthy). But things are tricky even if we focus on motivated signaling, because we have already said that an agent can be consciously or unconsciously motivated by the value of sending a signal. Let’s focus only on motivated signaling and adjust our y-axis to reflect this distinction:

The more that a behavior involves conscious deceptive signaling, the less ethical it is, all else being equal. This is because conscious deceptive signaling involves intentionally trying to get others to believe things that are false, which we generally consider harmful. If I become a vegetarian in order to deceive my boss into thinking that I share her values when I don’t, then the motives behind my action are blameworthy, even if the action itself is morally good.

Unconscious deceptive signaling seems less blameworthy. Suppose that I’m a deeply selfish person but help my elderly aunt once a week. Without realizing it, I’m actually doing this in order to mitigate the evidence others have that I’m selfish. This isn’t as blameworthy as conscious deception, but we might want to encourage people to avoid sending deceptive signals to others. And so here we might be inclined to point out to someone that they are in fact deceiving people, even if they are not doing so consciously.

As I mentioned above, signals can be deceptive to greater or lesser degrees. For example, suppose that I give 10% of my income to charity, but that if I were to suddenly not gain at all personally from being able to signal my charitable giving I would only give 8% of my income to charity. Suppose that giving 10% signals “I am altruistic to degree n” and giving 8% signals “I am altruistic to degree m”, where n>m. Let’s call a trait ‘robust’ insofar as one has the trait even if they were to lose the personal gain from signaling that they have it (this is distinct from the counterfactual of not being able to signal at all, since signaling can have moral value). The deceptive signal that people receive is “Amanda is robustly altruistic to degree n” when the truth is that I am only robustly altruistic to degree m. If this is the case, then my signal is much less deceptive than the signal of someone who would give nothing to charity if it were not for the self-interested signaling value of their donations.
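One rough way to put a number on this is to ask what fraction of a behavior would disappear if the personal gain from signaling were removed. The sketch below assumes that this counterfactual drop is a fair proxy for how much of the behavior is signal-motivated, which is the same reading used in the flute example; the function name and the flute hours are hypothetical.

```python
def signal_motivated_share(actual: float, without_personal_gain: float) -> float:
    """Fraction of a behavior that would vanish if the personal gain
    from signaling were removed (a counterfactual proxy, not a measurement)."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return (actual - without_personal_gain) / actual


# Flute: 100 hypothetical hours, falling to 95 with no audience.
flute = signal_motivated_share(100, 95)

# Giving: 10% of income, falling to 8% without personal signaling gain.
giving = signal_motivated_share(10, 8)

# Someone who would give nothing without the signaling benefit.
pure_signaler = signal_motivated_share(10, 0)

print(f"flute: {flute:.0%}, giving: {giving:.0%}, pure signaler: {pure_signaler:.0%}")
```

On these assumptions the 10% giver's signal is about one fifth signal-motivated, while the person who would otherwise give nothing is entirely so.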

Finally, what about honest signaling? Honest signaling cannot be criticized on the grounds that it is deceptive, but we might still think that honest signaling can sometimes be morally blameworthy. For example, suppose that I were to give 10% of my income to charity and, when asked about it, was explicit that I thought that if I wouldn’t personally benefit from telling people about my giving, I’d only give 8% of my income to charity. I haven’t attempted to deceive you in this case. Nonetheless, we might think that being motivated by self-interested signaling value is morally worse than being motivated by the good that my charitable giving can do because the latter is more robust than the former (the former is sensitive to things like the existence of Twitter or an ability to discuss giving among friends, while the latter is not). I suspect that this is why honest conscious signaling causes us to think that the agent in question has “one thought too many”, while unconscious honest signaling still makes us feel like the person’s motivations could be better, insofar as we don’t think that being motivated by signaling value is particularly laudable.

Note that this criticism only seems apt in domains where we think that self-interest should not be an undue part of one’s motivations: i.e., in the moral domain. We are not likely to chide the trainee pilot if she pays $100 to get a certificate showing that she has completed her training because this is a domain in which self-interest seems permissible. Similarly, the criticism only seems apt if the agent is motivated by the value of the signal for her. If someone advertises their charitable donation to normalize donating and encourage others to donate, then they are motivated by the moral value of their signal and not by its personal value. This motivation does not seem morally blameworthy.

If I am correct here, then critical accusations of signaling can be divided into two distinct accusations: first, that the person is being consciously or unconsciously deceptive, and second, that the person is being motivated by how much sending a signal benefits them personally, when this is worse than an alternative set of motivations: i.e., moral motivations. Since this can be consciously or unconsciously done, the underlying criticisms are as follows:

(1) Conscious deceptive signaling: you are consciously generating evidence that you have property p to degree n, when you actually have property p to degree m, where m ≠ n

(2) Unconscious deceptive signaling: you are unconsciously generating evidence that you have property p to degree n, when you actually have property p to degree m, where m ≠ n

(3) Conscious self-interested motivations: you are being consciously motivated by the personal signal value of your actions rather than by the moral value of your actions

(4) Unconscious self-interested motivations: you are being unconsciously motivated by the personal signal value of your actions rather than by the moral value of your actions

Note that if an agent is signaling honestly then she can only be accused of (3) or (4), but if she is signaling dishonestly then she can be accused of (1), (2), (1 & 3) or (2 & 4).

Claims that one is doing (3) or (4) only arise in the moral domain, and only if the agent is non-morally motivated to send a signal. Even when these conditions are satisfied, the harm of (3) or (4) can be fairly minor and forgivable, especially if the action that the person undertakes is a good one. It’s presumably better to do more good even if we are, to some small degree, motivated by the personal signaling value that doing more good affords. But let’s accept that each of (1) – (4) is, at the very least, morally suboptimal to some degree and that we can be justified in pointing this out when we see it. The question then is: how do we identify instances of (1) to (4), and how bad they are?

In order to claim that an agent is engaging in unconscious deceptive signaling, we need to have some evidence that she doesn’t actually have the property to the degree indicated. In order to claim that she is engaging in conscious deceptive signaling, we need to have some evidence that she also knows that this is the case. And in order to claim that an agent has self-interested motives, we have to have some evidence that she is being consciously or unconsciously motivated by the personal signaling value of her actions, and not by their moral consequences (with signal value being mostly a side-effect).

I think that it’s important to note that criticisms of people for signaling must have one of these components. It’s too easy to claim that someone is “just signaling” where the implication is that they are doing so wrongly and for the person in question to feel that they have to defend the claim “I am not signaling” rather than having to defend the claim “I am not being deceptive nor being unduly motivated by personal signaling value”.

The key problem we face is that whether or not an agent is signaling inaccurately, and whether or not she is being unduly motivated by self-interest, will often be underdetermined by the evidence. Suppose that you see someone tweet “I hope things get better in Syria.” If you claim that this person is merely ‘virtue signaling’, then you presumably mean that (i) they are consciously or unconsciously trying to make themselves appear more caring than they actually are, or (ii) they consciously or unconsciously sent this message because of the personal value it had for them rather than out of genuine care (or both). But we can’t really infer this from their tweet alone. The person might actually be as caring as this message indicates (i.e., the signal they send is accurate), and they might be primarily motivated by the signal value only insofar as it is impersonally valuable (i.e., because it normalizes caring about Syria and informs people about the situation). Someone might think that if the agent actually cared about people then they would focus on some different situation where more people are in peril, but the person tweeting about Syria might also be focusing on other causes, or they might simply not know about how much suffering different situations are causing, or they might not believe in that sort of ethical prioritization.

So what counts as evidence that someone is engaged in a morally egregious form of signaling? In support of (1) or (2), we can have independent evidence that the person lacks the property that they profess to have. For example, if someone claims that systemic social change is the most important intervention for the poor and yet does nothing to bring about systemic social change, we can infer that they are not very motivated to help the poor. Insofar as engaging in discussion about what is the best way to help the poor seems to send the signal that one helps the poor, we can infer that this signal is deceptive. In support of (3) or (4), we can have evidence that the person is unduly motivated by the personal signal value of their action. For example, if someone does the minimum that would be required to make them look good but less than what would be required if they were genuinely motivated to do good, then it seems more likely that they are being motivated by personal signaling value. An example might be a company that makes a token donation to charity in response to a PR disaster. In this kind of case, it seems we have some evidence that the company is trying to appear good, rather than trying to genuinely correct the harm that led to the PR disaster in the first place.

I think we can take a few useful lessons from all this. The first is that it’s a bad idea to simply accuse people of “signaling” because signaling can mean a lot of things, and not all signaling is bad. The second is that if we are going to make such an accusation, then we must be more precise about whether we are objecting because we think they are sending deceptive signals, or because we think they are being unduly motivated by personal signaling value. The third is that we should be able to say why we think they are consciously or unconsciously being deceptive or unduly motivated by personal signaling value, since a lot of behavior that is consistent with blameworthy signaling is not in fact an instance of blameworthy signaling. The fourth is that we should identify how bad a given instance of signaling is and not overstate our case: if someone is only a little motivated by signaling value, whether consciously or unconsciously, then they have hardly committed a grave moral wrong that undermines the goodness of their actions. None of this nuance is captured if the name of the game is simply to see some apparently virtuous behavior and dismiss it as a mere instance of ‘virtue signaling’.

Vegetarianism, abortion, and moral empathy


Summary: When people disagree about moral issues, they often fail to treat the moral beliefs of those that they disagree with as genuine moral beliefs. Instead, they treat them like mere whims or mild preferences. This shows a lack of what I call moral empathy. I argue that lacking moral empathy can be harmful and can prevent fruitful discussion on divisive topics.

When people disagree about moral issues, they often don’t treat the moral beliefs of those that they disagree with as genuine moral beliefs: instead they treat them like mere whims or mild preferences. They lack what I am going to call moral empathy. Having moral empathy for someone doesn’t mean that you agree with their moral views: it just means you recognize that someone genuinely believes that something is morally right or wrong, even if you happen to think that they are incorrect, and that you treat their beliefs like genuine moral beliefs rather than mild preferences. I think that a failure to cultivate moral empathy is bad for two reasons: it causes us to harm people unnecessarily, and it prevents meaningful dialogue from happening between people who morally disagree.

Let’s start with an example. I think it’s a good idea to brush your teeth every night, but I don’t think it’s a moral obligation. But someone might, for religious reasons perhaps, believe that they are morally required to brush their teeth every night. They don’t merely prefer it: they think that failing to brush their teeth is wrong, in the same way that I think that attacking a person for no reason is wrong. Having moral empathy means that I correctly model their attitude towards teeth-brushing – one that gives teeth-brushing moral significance – even if I disagree with their reasons for having that belief. Treating this belief like a genuine moral belief doesn’t mean that I need to always accommodate it if doing so is too difficult or harmful. But it does mean that I should treat it like a genuine moral belief when I discuss it with them, and that I should accommodate their preference in the same way that I would any preference that, if violated, would cause the person significant harm.

The teeth-brushing example is, I admit, not very realistic. But I’ve seen some clear failures of moral empathy occur with real world moral beliefs. A salient example of this is ethical vegetarianism. I have had many conversations with people who complain about vegetarians and vegans coming to parties or restaurants, and expecting their weird tastes to be accommodated. But ethical vegetarians and vegans are not merely acting on a whim: they think that it’s morally wrong to eat meat. If you were to be told that ritual cannibalism was practiced by your friends, you would presumably say “either don’t serve me human flesh for dinner, or I’m not coming to your house” (you might even say a little more than this: e.g. “please stop eating people” or “I’m calling the police”). If it’s reasonable to want your anti-cannibalism moral beliefs to be accommodated, then why is it not reasonable for the vegetarian to want their anti-meat eating beliefs to be accommodated?

People have even thought that it’s acceptable or funny to trick vegetarians into eating meat. It’s cruel enough to trick someone into eating something they don’t like the taste of (surely we should try to accommodate mild preferences too). It seems even more cruel to trick someone into doing something that they believe is wrong simply because we don’t agree that it’s wrong. After all, we’d be rightly horrified and upset if we went to our friend’s house and were tricked into eating human flesh disguised as beef or pork.

Another case in which we often see a lack of moral empathy is the abortion debate, where those who are pro-choice often show a lack of moral empathy towards those who are opposed to abortion. Many people who believe that abortion is wrong think that fetuses have the moral status of persons, and that abortion is morally equivalent to murder. But a lot of the things that I’ve heard pro-choice people say don’t make any sense unless you presuppose that those who are anti-abortion don’t actually hold these moral beliefs, but rather have something like a personal dislike of abortion.

For example, consider claims like “women have a right to do what they like with their bodies”, or “men have no place discussing the issue of abortion” or “if you don’t like abortion, then just don’t have one”. Now imagine a world in which it is legal for men to kill young children, and that they do so regularly. Presumably, in this world, you would campaign for this to be made illegal (I know I would!). But suppose someone who defends the view that this practice remain legal were to insist that “men have a right to do what they like with their bodies”. You’d respond: “no they don’t – they don’t have the right to murder other people with their bodies.” (Someone directed me to a related quote from Nozick: “My property rights in my knife allow me to leave it where I will, but not in your chest.”) Similarly, if they insisted that “women have no place discussing this issue” you’d respond: “yes they do: this is a moral issue that involves the harming of children, and it doesn’t make sense to only let the group allowed to partake in the practice to discuss it”. Finally, if they were to respond “well, if you don’t like the killing of children, then just don’t do it” you’d presumably respond: “um, no, I’m also going to try to stop you from doing it too”.

Given this, it seems odd that people who are pro-choice often respond in completely analogous ways in response to those who are anti-abortion. Perhaps the goal is to paint those who hold anti-abortion beliefs as more unreasonable than they actually are (i.e. as having a mere preference against abortion that they are devilishly trying to impose on others). But painting people in an unfair light is hardly a morally admirable practice. Perhaps those who are pro-choice believe that those who claim to hold moral anti-abortion beliefs are in fact being disingenuous, and that they don’t actually believe that abortion is immoral. But it’s clear that lots of people hold moral views that we find quite alien, so why should we assume that this group of people are being disingenuous when they claim to believe that abortion is immoral?

Statements that betray a lack of moral empathy are not very likely to be effective when it comes to convincing those that we disagree with. Saying things like “if you don’t like abortion, then just don’t have one” already presupposes that abortion is morally unproblematic, which is exactly what the person with anti-abortion beliefs wants to deny. If we have moral empathy for our interlocutor, then we are better able to identify the point of disagreement between us. For example, we might realize that we disagree about when fetuses become persons, or the degree to which personhood is morally relevant, or the importance of bodily autonomy over and above the interests of beings dependent on us. These all seem like reasonable points of disagreement that we can make progress on, and focussing on the actual points of disagreement will at least prevent us from infuriating each other needlessly.

Cultivating moral empathy is important. As we have seen, a lack of moral empathy can cause us to harm people unnecessarily, because we end up treating strong moral preferences like mild preferences, or even ignore them altogether. It can also lead to predictable dialectical failures, because we don’t actually engage with the beliefs that could change someone’s mind on an issue. This doesn’t mean that we always need to agree with or accommodate moral beliefs that we think are incorrect. Suppose that, for some bizarre reason, you think that you’re morally obligated to sacrifice kittens. I can tell you that you are wrong to sacrifice kittens, while still acknowledging that you believe that you are morally obligated to do so. I can also try to pass laws that prevent you from sacrificing kittens, because I think that your moral beliefs are incorrect, regardless of how sincerely they are held. But none of this requires treating you as though you had a mild preference for sacrificing kittens or are doing it on a whim, and treating you in this way makes it even less likely that I will be able to convince you that your sincerely held moral beliefs are incorrect.

Can we offset immorality?


Summary: People offset bad actions in various ways. The most salient example of this is probably carbon offsetting, where we pay a company to reduce the carbon in the atmosphere by roughly the same amount that we put in. But there are arguably more mundane examples of acts that are intended to offset immoral behavior. In this post I ask what moral offsetting is and whether it is something we should be in favor of.

People ‘offset’ bad actions in various ways. The most salient example of offsetting is probably carbon offsetting, where we pay a company to reduce the carbon in the atmosphere by roughly the same amount that we put in. But there are arguably more mundane examples of acts that look a lot like offsetting (“I know I promised I’d make it to your game tonight, but I have to work late. I’ll take you out to dinner to make up for it!”). Let’s call an action intended to offset immoral behavior ‘moral offsetting’. In this post I want to ask a couple of questions: first, what is moral offsetting? Second, is it something we should be in favor of?

What is moral offsetting? Here’s one natural account: moral offsetting is making up for a harm by performing a compensatory action of equal or greater moral value. It presumably has to be an action that you wouldn’t have taken otherwise: it’s not moral offsetting if I don’t increase my carbon donations, or if I was already going to take you to dinner, because the relevant thing is what would have happened. So the idea is that your offsetting action genuinely makes a difference, because even if there’s a possible world where you do the right thing and do the offsetting action, the offsetting action isn’t something you actually will do unless you behave immorally, just like you normally won’t give to carbon offsetting organizations if you’re not going to be using any carbon. So we have three worlds that might be brought about:

GOOD: I don’t work late and make it to the game tonight, fulfilling my promise.

OFFSET: I work late and miss the game, but take you out to dinner.

BAD: I work late and miss the game, and don’t take you out to dinner.

Sometimes when we offset we are trying to prevent a harm from happening at all. I think some people think this is what is happening when we carbon offset, but I actually suspect that view of carbon offsetting is wrong. So to take a different example, suppose I take one of your yoghurts from the fridge knowing that I can replace it with the same type of yoghurt before you get home. Taking the yoghurt would have harmed you if I hadn’t ‘offset’ my action by replacing it with one from the store. I offset to prevent a harm from happening.

In other cases of offsetting we are letting a harm happen, but are trying to compensate for it by giving something of equal or greater value to the person or people harmed. If you would much rather I take you out to dinner and break my promise to see your game, then you might be quite happy with my offer. I’m better off because I’d rather work late and buy you dinner than satisfy my promise, and you are better off because you’d also prefer this, even though you’d be pretty annoyed if I broke my promise without offering to take you to dinner.

But can we morally offset harms in cases where a harm has or will occur, and where we cannot compensate those who are harmed by it? Suppose, for example, I am deciding whether to eat a steak or not. I believe that eating a steak is wrong because it incentivizes people to bring into existence future cows that will have bad lives. My eating the steak doesn’t harm the cow that the steak comes from – they’re already dead – but it does, in expectation, harm some future cow (of course, given elasticity of demand, my particular steak may have no impact). But even if some cow is brought into existence as a result of my eating the steak, it will be virtually impossible for me to help that cow. How can I pick that cow out among all of the other cows brought into existence? But perhaps I can still morally offset my eating of the steak. Imagine I can choose between the following three worlds:

GOOD: I don’t eat the steak.

OFFSET: I eat the steak and donate $50 (that I would have otherwise spent on new sneakers) to an effective animal charity.

BAD: I eat the steak and don’t give anything to charity.

Let’s suppose that the expected percentage of a cow that’s created when I eat a steak causes 10 units of expected harm in the world, and that my $50 creates 50 units of expected wellbeing in the world: more than enough to compensate the harm. Since the overall wellbeing of the world is at least as good in OFFSET as it is in GOOD (the cosmic moral balance has been restored!) should we not conclude that if bringing about GOOD is permissible then bringing about OFFSET is also permissible? This at least seems plausible on harm-based accounts of moral permissibility.
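The comparison between the three worlds is simple arithmetic, and it can help to see it written out. This is a minimal sketch that takes the 10 and 50 unit figures from the example and assumes, purely for illustration, that GOOD is the zero baseline and that the wellbeing I would have got from the sneakers is negligible; `net_wellbeing` is a made-up helper.

```python
HARM_FROM_STEAK = 10        # expected units of harm, from the example
BENEFIT_FROM_DONATION = 50  # expected units of wellbeing from the $50


def net_wellbeing(eat_steak: bool, donate: bool) -> int:
    """Expected change in world wellbeing relative to doing neither."""
    total = 0
    if eat_steak:
        total -= HARM_FROM_STEAK
    if donate:
        total += BENEFIT_FROM_DONATION
    return total


worlds = {
    "GOOD": net_wellbeing(eat_steak=False, donate=False),
    "OFFSET": net_wellbeing(eat_steak=True, donate=True),
    "BAD": net_wellbeing(eat_steak=True, donate=False),
}

for name, value in worlds.items():
    print(f"{name}: {value:+d}")
```

On these numbers OFFSET comes out 40 units ahead of GOOD and 50 ahead of BAD, which is all the harm-based argument needs.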

Cases like this one involve us forcing a trade of harms between distinct agents: in the case above we are forcing a harm on an expected cow in order to give increased wellbeing or reduced harms to some (presumably different) set of actual/expected animals. In doing so, we make the world a better place overall. This might not be acceptable on most justice-based accounts of ethics, but it at least seems plausible that such forced trades are permissible on harm-based accounts.

But if this is why we think that it’s acceptable to morally offset, then it’s not clear that we should care about the similarity of the two agents we’re forcing to trade harms. Suppose that I could create 50 units of expected wellbeing by donating just $30 to some charity that helps humans rather than animals. If we think that the reason morally offsetting was acceptable in the case above is that OFFSET is a better world, wellbeing-wise, than GOOD, then surely this would mean that we are permitted to donate $30 to the human charity rather than $50 to the animal charity. After all, why does it matter if I force a trade between an expected cow and expected/actual animals, or between an expected cow and expected/actual humans? Superficial resemblances between those who are harmed and those who are benefited seem morally irrelevant in cases like this. As long as we make very sure that we create at least as much good in the world as we do harm, the act of eating the steak and offsetting is morally equal to or better than the act of not eating the steak on the harm-based account.

There are, of course, a few objections that one can level against the offsetting view, even if we accept a harm-based account of moral permissibility. The main objection I foresee people raising is that in cases where we can morally offset, we also have an additional choice of world available to us – one where we perform the good action and the offsetting action. For example, in the promise case I could have brought about the following world:

BEST: I make it to the game and take you out for dinner.

And in the steak case I could have brought about the following world:

BEST: I don’t eat the steak and I donate the money to charity.

Even if we think that OFFSET is at least as good as GOOD, it’s obvious that BEST will always be better than OFFSET and so, at least according to maximizing views, I should always bring about BEST rather than OFFSET. And since bringing about BEST means not acting immorally, I’m never permitted to act immorally and then offset.

There are a couple of things that we can say in response to this. It’s worth pointing out that – on this view – we’re also never permitted to bring about GOOD. That is, we’re never permitted to just keep promises and be vegetarians, because we are obligated to do as much good as we can in addition to this. This level of demandingness is consistent with maximizing views of course, but it still means that GOOD is not more morally permissible on these views than OFFSET is (and in many cases GOOD may be much worse than OFFSET).

What’s more, it’s not clear that BEST is really what we should be comparing either GOOD or OFFSET to. As I said at the beginning, moral offsetting involves undertaking an action that you wouldn’t undertake if it weren’t for the fact that you wanted to offset your immoral action. So the fact that there’s an even better world where you offset despite having done nothing wrong seems like an irrelevant counterfactual. This essentially boils down to the debate between actualism and possibilism in ethics. Imagine you are trying to decide whether to go to the movies with your friends or not. You know that you ought to finish grading papers tonight, and that if you go to the movies then you are sure you will get back late and fall asleep without grading the papers. But your friends will be mildly disappointed if you don’t go to the movies. The actualist says: don’t go to the movies, because if you do that then you won’t grade the papers, and a ‘grading, no movies’ world is morally preferable to a ‘movies, no grading’ world. The possibilist says: but it’s at least possible for you to do both, and since a ‘movies, grading’ world is better than either of these worlds, you ought to go to the movies! If, like me, you find the possibilist’s position implausible, then you also have reason to doubt their appeal to BEST as an argument against moral offsetting.
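The difference between the two positions can be made concrete as two decision rules over the same options. In this rough sketch the numeric scores are hypothetical and only their ordering is taken from the movies example; the score for the unmentioned ‘no movies, no grading’ world is an added assumption, as are all the names.

```python
# Hypothetical moral scores for (choice now, what happens later).
VALUE = {
    ("movies", "grading"): 3,
    ("no movies", "grading"): 2,
    ("movies", "no grading"): 1,
    ("no movies", "no grading"): 0,  # not discussed in the text; assumed worst
}

# What you would in fact do later, given each choice now.
WOULD_DO = {"movies": "no grading", "no movies": "grading"}

# What it is at least possible for you to do later.
COULD_DO = {
    "movies": ["grading", "no grading"],
    "no movies": ["grading", "no grading"],
}


def actualist_choice() -> str:
    """Judge each option by what would actually follow from it."""
    return max(WOULD_DO, key=lambda now: VALUE[(now, WOULD_DO[now])])


def possibilist_choice() -> str:
    """Judge each option by the best outcome it leaves possible."""
    return max(
        COULD_DO,
        key=lambda now: max(VALUE[(now, later)] for later in COULD_DO[now]),
    )


print("actualist:", actualist_choice())
print("possibilist:", possibilist_choice())
```

The actualist rule picks staying home and the possibilist rule picks the movies, mirroring the way an appeal to BEST relies on an outcome you will not in fact bring about.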

Finally, a large worry with the moral offsetting view is that it could be used to justify any degree of immoral action. Couldn’t we use this argument to justify stealing, torture, or any other wicked act, as long as we were willing to pay a very high price in moral compensation? At first glance, there’s no obvious reason why the moral offsetting argument shouldn’t extend to highly immoral actions, and I think that those who defend harm-based views in ethics should find that troubling.

There are a few different things that the harm-based ethicist could say in response to this, however. First, they could point out that as the immorality of the action increases, it becomes far less likely that performing this action and morally offsetting is the best option available, even out of those options that actualists would deem morally relevant. Second, it is very harmful to undermine social norms where people don’t behave immorally and compensate for it (imagine how terrible it would be to live in a world where this was acceptable). Third, it is – in expectation – bad to become the kind of person who offsets their moral harms. Such a person will usually have a much worse expected impact on the world than someone who strives to be as moral as they can be.

I think that these are compelling reasons to think that, in the actual world, we are – at best – morally permitted to offset trivial immoral actions, but that more serious immoral actions are almost always not the sorts of things we can morally offset. But I also think that the fact that these arguments all depend on contingent features of the world should be concerning to those who defend harm-based views in ethics. Such views at least seem to allow that it could, at least in principle, be better to commit a gravely immoral action and then offset than to fail to commit the gravely immoral action in the first place. I imagine that many of us, if presented with a case of this sort, would be inclined to reject any moral theory that entailed such a conclusion.

Prison is no more humane than flogging


Summary: Many people believe that corporal punishment has no place in a modern criminal justice system. Imprisonment is seen as a more humane form of punishment, and it is one that is employed in most modern criminal justice systems. In this post I ask why we think that imprisonment is humane while corporal punishment is not. I think this should cause us to question the ethics of imprisoning people.

Many people believe that corporal punishment – the infliction of pain for the purposes of punishment through caning, beating, whipping, amputation, electrocution, branding, and so on – has no place in a modern criminal justice system. Instead imprisonment is seen as a more ‘humane’ form of punishment, and it is one that is employed in most modern criminal justice systems. But why do we think that imprisonment is humane while corporal punishment is not? Here are a few reasons you might think this, and why I don’t think they work.

(1) Imprisonment is more humane because it causes less suffering than corporal punishment

We might argue that imprisonment is more humane than corporal punishment because imprisoning people causes them less suffering than corporal punishment would, and that it’s wrong to cause people the level of suffering brought about by corporal punishment.

Imagine that you have been convicted of a crime and are presented with a choice: you can either spend 10 years in a US prison, or you can experience a single lash of the cane. Which would you choose? I am going to guess that you, like me, would choose the cane. (Although the choice offered here is hypothetical, some people have suggested that we should actually offer this kind of choice to convicts.) So it’s clearly not the case that any amount of imprisonment is better than any amount of corporal punishment.

What kind of corporal punishment would you need to be offered before you think it would be reasonable for you to be indifferent between that punishment and 10 years in prison? Bear in mind that 10 years in a US prison is 10 years in which you’ll have limited access to your friends and family, 10 years of your career that you’ll lose, 10 years in which you’ll lose autonomy over when you eat, sleep, and exercise, and – perhaps most disturbing to us – 10 years in which you may face the threat of violence and sexual assault.

I think that the corporal punishments we think are equivalent to 10 years in a US prison are worse than many of us would want to admit. For example, I am pretty sure I would prefer to have two of my fingers amputated than go to a US prison for 10 years. It’s not that I prefer this because I am simply failing to think rationally about what I would do if I faced such a terrible choice. It seems to me that I prefer the corporal punishment of having my fingers amputated because I can be fairly certain that I would suffer less if I were to choose this in order to avoid spending 10 years in a US prison. But if amputating two of my fingers for a given crime would not be considered humane, then it is difficult to see how a 10-year prison sentence for the same crime could be when it’s plausible that this causes more suffering.

Imprisonment therefore cannot be more humane than corporal punishment because it causes less suffering. If anything, the suffering imposed by imprisonment seems comparable to the suffering imposed by fairly severe forms of corporal punishment.

(2) Imprisonment is more humane because it spreads out the suffering across time

It might be objected that corporal punishment is more inhumane than imprisonment because it causes a large amount of suffering in a shorter period of time. And perhaps it’s more humane to spread out someone’s suffering over a long period of time, rather than to force them to experience it all at once. (This would also mean that longer, less severe corporal punishments would be more humane.)

But it seems unlikely that a long, less severe punishment is more humane than a short, severe one. Suppose you can either suffer through a painful migraine for an hour or a dull toothache for six months, and you would much prefer to just get it over with and have the migraine. Would forcing you to endure the toothache for six months really be the more humane choice? Surely not.

(There may be a practical argument lurking here: after all, if we reduce the amount of suffering per second that a government can inflict on someone, then we lower the upper bound of suffering that the government can inflict on any one person.)

(3) Imprisonment is more humane because of its qualities and not its severity

It might be argued that the humaneness of a punishment is related more to its qualitative features than its severity or the amount of suffering it causes. In other words, whipping someone is less humane than imprisoning them because of the qualitative features of whipping, even if imprisonment causes more suffering. But it is difficult to see how it could be more humane to force someone to endure a punishment that causes them more suffering, regardless of what qualitative features the alternative punishment has. In other words, it is difficult to see how x can be more humane than y if almost everyone would prefer to experience y in order to avoid experiencing x.

(4) Imprisonment is more humane because it’s more effective

Finally, you might be thinking that corporal punishment is simply not as effective as imprisonment. In particular, corporal punishment does not prevent people from committing crimes by removing them from society or rehabilitating them. But even if we assume that corporal punishment is less effective than imprisonment when it comes to preventing people from committing crimes, this doesn’t give us any reason to think that imprisonment is more humane than corporal punishment. It is merely an argument for employing imprisonment as a punishment, regardless of how inhumane it is. Suppose we discovered that pulling out someone’s fingernails for the rest of their life turned out to be the most effective way of preventing people from committing crimes, and that it’s many times more effective than imprisonment. Would we conclude that a lifetime of pulling out someone’s fingernails is a humane form of punishment?

This is not to say that the effectiveness of a given form of punishment isn’t important. The main goals of punishment are to achieve retribution, to restore the losses of the victims, and to prevent further crime from occurring (by incapacitating and rehabilitating the criminal, and by disincentivizing others from committing crimes). In order to achieve these things, we may need to inflict some suffering on criminals. And the most ethical punishment is presumably the punishment that can achieve these goals with as little excess suffering as possible. If imprisonment is the most ethical punishment by these standards, which seems doubtful, then this does not make it more humane: it just makes it a more effective form of caning.

Is the born this way message homophobic?


Summary: The message of “born this way” is that your sexual orientation is something you’re born with rather than something you choose. This is considered an important point in the justification of gay rights. I’m a strong supporter of gay rights, but I realised just over a year ago that something about this slogan didn’t sit right with me. I’m now pretty confident that basing gay rights on the “born this way” message can be pretty harmful to LGBT people and other oppressed groups.

The message of “born this way” is that your sexual orientation is something you’re born with rather than something you choose. And it’s considered an important point in the justification of gay rights. I’m a strong supporter of gay rights, but I realised just over a year ago that something about this slogan didn’t sit right with me. I’m now pretty confident that basing gay rights on the “born this way” message can be pretty harmful to LGBT people and other oppressed groups. Here’s why.

1. It implies that being gay is immoral

Suppose that John is a violent criminal because he was born with an inoperable tumor pressing against the parts of his brain regulating aggression. Suppose that Emma is a violent criminal because she enjoys being a violent criminal. We probably think that Emma deserves more blame for her behavior than John does, since John couldn’t easily avoid behaving in a violent way and Emma could. So we seem to think that being “born this way” can mitigate blame for actions that are bad. (We might also think that being “born this way” can mitigate praise for actions that are good.)

If, however, some action is morally neutral – like watching baseball regularly – then we don’t really care about whether a person is born with a disposition to behave in that way. We don’t care about whether people are born with a disposition to watch baseball regularly, and we don’t care if they couldn’t easily avoid watching baseball regularly, because watching baseball regularly simply isn’t blameworthy behavior.

Using the argument that gay people were “born this way” already implies we think they’re doing something wrong, and that their behaviour has to be justified on the basis that they couldn’t help behaving in the way that they do. If there’s nothing wrong with being gay, then we don’t need to worry about whether people are “born this way” in order to justify their rights to things like marriage and equal treatment under the law any more than we need to worry about whether baseball fans were born that way in order to justify extending rights to things like marriage and equal treatment under the law to baseball fans.

2. It’s not going to convince anyone who does think that being gay is immoral

Suppose that although John’s tumor makes him disposed to be a violent criminal, it doesn’t force him to actually commit acts of violence: it just makes it much harder for him to avoid behaving violently. We might think that this is unfortunate for John, and hope that a treatment will one day be available. But insofar as he has some control over his behaviour, we would still say that it is immoral for John to commit acts of violence. We’d also want to do everything we could to prevent John from harming people. But we certainly wouldn’t grant John the right to be violent just because he was born with a strong disposition to do so.

Similarly, if someone thinks it’s immoral to have same-sex partners, then the “born this way” argument is at most going to make them see gay people as less blameworthy than they thought before – but not see their behavior as, ultimately, less immoral. They might think it’s unfortunate that some people are born with a strong disposition to have same-sex partners, but they’re still going to say that gay people shouldn’t form same-sex partnerships insofar as they have some control over their behaviour. They’re also going to want to take steps to prevent gay people from forming such partnerships, and they’re certainly not going to want to grant gay people the right to have such partnerships. Their attitudes to gay people will mirror our own attitudes towards John.

So arguing that people are born gay isn’t going to convince anyone who thinks it’s immoral to be gay. When we say “they can’t help it”, we’re not actually arguing that someone’s behaviour isn’t immoral – just that they’re not as blameworthy as we once thought. Instead of arguing that people can’t be blamed for being gay because they are born gay, we need to argue that there’s nothing wrong with being gay in the first place.

3. It grounds gay rights on something that could turn out to be false

Edward Stein raises the good point that it’s dangerous to ground a defense of gay rights on an empirical hypothesis that could turn out to be false. We don’t yet have enough evidence to know for sure that sexual orientation is something you’re born with, and something you can’t change (or can change only with a lot of effort). Suppose we were to find out that people actually do have significant control over the gender they are attracted to. Would this mean a significant reason to support gay rights would have been undermined? Would we think it was ok to then revoke those rights? Surely not.

It’s also important to note that there’s a distinction between whether someone was born with a certain trait or disposition, and whether they have control over it. I was born with brown hair, but that doesn’t mean I can’t change my hair colour if I want to. The “born this way” defense of gay rights is really the “they can’t change it” defense. But even if sexual orientation is an innate trait, we might develop drugs or therapy in the future that would let us change our sexual orientation in the same way that we currently change our hair color. This at least seems possible, and it would mean that every gay person would essentially be gay by choice. We presumably don’t want to say that this would severely undermine the case for gay rights. And yet it seems like this is what is implied if we think we should extend rights to gay people primarily because they currently have no say over their sexual orientation.

4. It’s offensive to many LGBT people and other minorities where choice is a factor

Some people do seem to feel they have a choice about whether to enter into gay relationships or not. Three years ago, Cynthia Nixon faced a lot of outcry after claiming that, for her, being gay was a choice. She later clarified that she felt her bisexuality was not a choice, but that her decision to be in a gay relationship was a choice. Some bisexual people do seem to feel they have a choice over the gender of the people that they enter into relationships with. But if gay rights are founded on the fact those people have no choice, then it seems we shouldn’t extend those rights to people who can choose whether they enter into same-sex relationships or not, like bisexual people. This seems pretty absurd.

Here’s another example: the ruling in favor of gay marriage has led some, like William Baude, to ask whether group marriage should be made legal. At the time of writing, the top comment on this piece, with over 700 recommendations, says: “Gay people are born gay and have no choice about marriage — they can either marry someone of their own sex, or they can’t (honestly) marry anyone at all. That is vastly different from people saying that they would prefer to marry a dog, or three women, or whatever.” And this sentiment is reflected in other top comments.

Setting aside the horrible comparison between polyamory and bestiality, if we accept the idea that group marriage shouldn’t be legal because polyamorous people are making a choice, then shouldn’t we also deny same sex marriages to bisexual people who could have pursued heterosexual relationships instead? This seems like a pretty abhorrent (and bizarre) situation – where whether or not we allow people to enter into a relationship or not, or to get married or not, depends on whether they really had no other option. The implication is that the ‘other option’ (a straight relationship or a monogamous relationship) would be much better, and gay marriage or group marriage are really only acceptable as a kind of ‘last resort.’

It may be that there’s more going on with the “born this way” slogan than I have appreciated. If so, then I hope people will tell me. But my current view is that defending the better treatment of LGBT people by appealing to “born this way” reasoning is both ineffective and harmful. It doesn’t actually defend the claim that homosexuality is not immoral – it just says that being gay isn’t blameworthy – which implies that there’s something to be blamed for. But we don’t need to make excuses for consenting adults to engage in non-harmful behavior with one another, whether that involves same-sex relationships or polyamorous relationships or anything else. And we don’t need to defend our right to engage in such behavior with an apologetic slogan which says that we just couldn’t help ourselves.

Common objections to Pascal’s wager


Summary: In this post I respond to some of the common objections to Pascal’s wager, keeping each response to under 100 words!

I am interested in Pascal’s wager, fanaticism problems, and infinite decision theory. In fact, sometimes I’m even foolish enough to mention these topics over dinner. And when I do there are a series of common objections that I get to Pascal’s wager in particular. I think that Pascal’s wager is in fact a very interesting and difficult problem to which there is currently no completely satisfactory solution. In fact, I think that many of the methods used to get around the wager are worse than simply accepting that the argument is, perhaps surprisingly, valid and sound. But people are nonetheless often very confident that the argument is not a good one. So in this post I’m going to quickly run through each of the most common objections to the wager that I’ve been presented with thus far, and explain why (in under 100 words!) I think that none of them are successful.

Okay, so let’s start things off by giving a simple formulation of Pascal’s wager. There are two possible states of the world: (G) God exists, and (~G) God doesn’t exist. Now there are two actions available to you: (B) Believe in God, and (~B) Don’t Believe in God. What should you do? Well, there are four possible outcomes that will occur when you die, so let’s list them and note down the utility of each:

B&G: heaven (infinite utility)

B&~G: annihilation (0 utility)

~B&G: either hell (infinite suffering) or annihilation (0 utility)

~B&~G: annihilation (0 utility)

Treating ~B&G as if it produces 0 utility lets us avoid some nasty features of infinities for now, so I’ll assume it and then mention those below. It should be obvious that the only way to ‘win’ in such a scenario is to believe in God even if you think it’s very unlikely (but not impossible) that God exists. So to lay out the argument behind Pascal’s wager explicitly:

(1) You shouldn’t perform actions with lower expected utility over those with greater expected utility.

(2) The expected utility of wagering for God is greater than the expected utility of wagering against God.

(3) Conclusion: you shouldn’t wager against God.
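To make the four outcomes above concrete, here is a minimal sketch of the expected utility calculation in Python. It is only an illustration: the function name expected_utility, the dictionary layout, and the one-in-a-million credence are my own assumptions, and ~B&G is given 0 utility as stipulated above.

```python
import math

# Payoffs for each (action, state) pair; ~B&G is treated as 0 utility, as stipulated.
PAYOFFS = {
    ("B", "G"): math.inf,   # heaven
    ("B", "~G"): 0.0,       # annihilation
    ("~B", "G"): 0.0,       # treated as annihilation for now
    ("~B", "~G"): 0.0,      # annihilation
}


def expected_utility(action, p_god):
    """Expected utility of an action given credence p_god that God exists.

    Requires 0 < p_god < 1, because 0 * inf is nan in floating point.
    """
    if not 0.0 < p_god < 1.0:
        raise ValueError("credence must be strictly between 0 and 1")
    return (p_god * PAYOFFS[(action, "G")]
            + (1.0 - p_god) * PAYOFFS[(action, "~G")])


p = 1e-6  # hypothetical credence, chosen only for illustration
eu_believe = expected_utility("B", p)
eu_not_believe = expected_utility("~B", p)
```

However small the credence is, as long as it is above zero the expected utility of believing comes out infinite while the expected utility of not believing stays at 0, which is all that premise (2) needs.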

That’s the basic argument. And boy does it annoy people. Here I’m going to respond to the common objections to the wager (some more sophisticated, some less). BUT so that this post doesn’t take me forever, I’m restricting myself to 100 words (that’s right, 100 words!) per response. I’m happy to go into further details or discuss objections that I haven’t included here in the comments if anyone wants me to. Sometimes it’s easier to understand the reply to one objection if you already know the reply to another, so I’ve tried to put them in an order that takes that into account. I’ve also put [IC] next to the objections currently mentioned on the Iron Chariots blog entry on Pascal’s wager, since it’s a source a few people have mentioned to me when discussing the problem.

1. There are many gods you could wager for, not just one! [IC]

The basic idea: there are n-many gods that reward belief. If there’s a non-zero chance of getting infinite utility if you wager for a different God, then wagering for any of these gods has infinite expected utility (EU). So the wager doesn’t give you any more reason to believe in God X over any of the alternative gods.

Answer: Suppose you find yourself standing at the gates of heaven. St Peter offers you one of two options: you can walk through door A and go into heaven, or you can walk through door B and have a 1 in 1,000,000,000 chance of getting into heaven and a 999,999,999 in 1,000,000,000 chance of being annihilated. Now you want to get into heaven – it’s not some crummy heaven that you won’t enjoy. Do you really think it’s rational to be indifferent between these two options? If you think you should have even the slightest preference for door A, then this objection doesn’t work.
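One way to capture the preference for door A is to rank options first by their probability of the infinite outcome and only then by their finite expected utility. The sketch below is a hypothetical illustration of that simple lexicographic rule (the Option type and the prefer function are names I have made up); it is not meant to be any particular published account of infinite decision theory.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Option:
    name: str
    p_infinite: Fraction                 # probability of the infinite-utility outcome
    finite_eu: Fraction = Fraction(0)    # expected utility of the finite outcomes


def prefer(a, b):
    """Return the preferred option, or None if the rule is indifferent."""
    key_a = (a.p_infinite, a.finite_eu)
    key_b = (b.p_infinite, b.finite_eu)
    if key_a == key_b:
        return None
    return a if key_a > key_b else b


door_a = Option("door A", Fraction(1))
door_b = Option("door B", Fraction(1, 1_000_000_000))
best = prefer(door_a, door_b)
```

Naive expected utility assigns both doors infinite value and so cannot tell them apart; the rule above prefers door A because it compares the probabilities directly.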

2. Almost all actions have infinite expected utility if wagering for God has infinite expected utility. So if Pascal’s wager is true then I can do almost anything I want to.

The basic idea: If the expected value of believing in god is infinite then any action that has a non-zero chance of ending with me wagering for god is also infinitely valuable. So the wager doesn’t give you any more reason to believe in God than it does to roll a die and believe in God if 4 comes up, or pick up a beer knowing that it might end with you getting drunk and believing in God.

Answer: The same probability dominance argument applies to the mixed strategies objection. Insofar as you think that rolling a die or drinking a beer has a lower probability of producing the infinite utility outcome (heaven) than some other action does – such as simply wagering for God now – you ought, all things considered, to perform the action with the higher probability of producing the infinite utility outcome. In this case, that means wagering for god rather than employing a mixed strategy.
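The arithmetic behind this reply is short enough to write out. The numbers below are hypothetical, and the sketch assumes (for simplicity) that you have no chance of heaven if you never wager.

```python
from fractions import Fraction

# Hypothetical chance that wagering for God leads to heaven.
p_reward = Fraction(1, 1000)

# Pure strategy: wager now, with certainty.
p_heaven_direct = 1 * p_reward

# Mixed strategy: roll a fair six-sided die and wager only if a 4 comes up.
p_heaven_die = Fraction(1, 6) * p_reward
```

Both strategies have infinite expected utility, but the mixed strategy is six times less likely to produce the infinite outcome, whatever value p_reward takes.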

3. Doesn’t the wager beg the question? [IC]

The basic idea: Pascal’s wager assumes key features of the god it seeks to prove the existence of. For example, that god rewards belief and not non-belief.

Answer: Firstly, the aim of the wager isn’t to prove the existence of god: it’s to establish that belief in god is prudentially/morally rational. Now consider the following argument: ‘If you think there’s a >10% chance that there’s a dish in the dishwasher that’s made of china, then you ought to check that the dishwasher is off. You think there’s a >10% chance that there’s a dish in the dishwasher that’s made of china. So you ought to check that the dishwasher is off.’ Pascal’s wager doesn’t fallaciously assume characteristics of god any more than this argument fallaciously assumes characteristics of dishes.

4. What about the atheist-loving god? [IC]

The basic idea: Suppose there’s a god that sends all non-believers to heaven and all believers to hell. Given the logic of Pascal’s wager, I ought not to believe in God.

Answer: If it’s rational for you to think that disbelief in God (or cars, or hands) will maximize your chance of getting into heaven, then that’s what you ought to do under PW. What’s the evidence for the belief-shunning God? Possibly: ‘Divine hiddenness’ plus God making us capable of evidentialism. The evidence against? God making us capable of performing expected utility calculations, all the historical testimonial evidence for belief-loving Gods. I suspect the latter will outweigh the former. But if you’re making this objection you’re already on my side really: we’re now just quibbling about what God wants us to do.

5. What about infinite utility producing scientific hypotheses?

The basic idea: Okay, so Pascal’s wager doesn’t tell us which God to believe in, just to maximize the probability of gaining infinite utility. But what about the possibility of more naturalistic infinite utility hypotheses (singularity, lab universes, etc.)?

Answer: Given the response to 1, you ought to perform whatever set of actions has the highest probability of getting you into heaven. Given this, insofar as a belief in a supernatural being or God is consistent with actions that maximize the chance of a scientific means of gaining infinite utility, you ought to do both regardless of which is more plausible. Also, higher cardinalities of infinite utility will dominate lower cardinalities of infinite utility in EU calculations. And supernatural hypotheses may be more likely to produce higher cardinalities of utility than their empirically-grounded cousins.

6. You can’t quantify the utility of heaven.

Answer: The wager doesn’t start by looking at a religious text and trying to work out how good their heaven is. The argument is premised on some infinite-utility outcome being possible, such that you ought to have a non-zero credence in infinite-utility outcomes. It doesn’t matter how inconsistent that outcome is with common conceptions of heaven, as long as it’s in principle possible the argument will go through. You might want to declare that such heavens are absolutely (and not just nomologically) impossible, but it’s hard enough to defend logical omniscience, let alone no-such-thing-as-heaven omniscience.

7. God wouldn’t reward prudentially-grounded belief. [IC]

Answer: You have to take your credence that a given God would reward belief into account when calculating what to do. Suppose you are certain that only two gods are possible: A and B. Each of their heavens produces infinite utility, and they’re equally likely to exist. The only way to get into heaven is through belief, but god A rewards prudentially grounded belief while god B doesn’t (all with certainty). Clearly you ought to wager for A. Suppose god B becomes sufficiently more probable. Then perhaps you ought to try to inculcate non-prudentially-grounded belief in yourself and others!
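Here is a rough sketch of the two-god comparison. The parameter p_genuine (your chance of successfully cultivating belief that isn’t prudentially grounded) is my own addition for illustration, as are the function name and the example numbers.

```python
from fractions import Fraction


def best_strategy(p_a, p_b, p_genuine):
    """Compare wagering for god A with trying to cultivate genuine belief in god B.

    p_a, p_b:   credences that god A / god B exists
    p_genuine:  hypothetical chance that the attempt yields non-prudential belief
    """
    p_heaven_wager_a = p_a                   # A rewards prudentially grounded belief
    p_heaven_cultivate_b = p_b * p_genuine   # B rewards only non-prudential belief
    if p_heaven_wager_a == p_heaven_cultivate_b:
        return "tie"
    if p_heaven_wager_a > p_heaven_cultivate_b:
        return "wager for A"
    return "cultivate belief in B"


equal_case = best_strategy(Fraction(1, 2), Fraction(1, 2), Fraction(1, 2))
b_likely_case = best_strategy(Fraction(1, 10), Fraction(9, 10), Fraction(1, 2))
```

With equal credences, wagering for A wins. Once B is probable enough that p_b * p_genuine exceeds p_a, the recommendation flips to cultivating belief in B.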

8. I think God’s just as likely to reward belief as to reward non-belief.

Answer: Suppose that, for action A that has the potential to produce infinite utility (given all of the possible states of the world), A and ~A are just as likely to produce infinite utility. Then you would need to find a tie-breaker between the two, or flip a coin. This doesn’t undermine the argument of the wager. But it seems highly unlikely that belief and non-belief would have exactly the same rational subjective likelihood of getting you into heaven. What could be the evidential basis for this perfect symmetry?

9. What about the problem of evil, etc?

Answer: Evidential considerations for or against a certain god are obviously relevant to what you ought to do or believe, since they are relevant to the likelihood that given actions will produce infinite utility. Pascal’s wager doesn’t solve (or aim to solve) theological problems like the problem of evil. But its conclusion still holds as long as those problems don’t warrant adopting credence 0 in there being any infinite utility outcome that’s consistent with any action we can perform. It seems unlikely that the standard objections to God’s existence are as devastating as this requires!

10. I have credence 0 (or near enough) in God’s existence.

Answer: The near enough strategy isn’t going to work, unless you add the premise that you ought to treat even extreme-utility outcomes that you have a sufficiently low credence in as though you had credence 0 in them. That seems like a bad principle. If you genuinely have credence 0 in all potentially infinite-utility producing states of the world, credence 1 that you have these credences etc. then you are indeed immune to Pascal’s wager. Would it be reasonable to have such credences? This seems implausible under a standard account of credences, since these states of the world appear to be far from impossible.

11. But we don’t have voluntary control over our beliefs!

Answer: Are you certain that doxastic voluntarism is false? If not, the chance that your voluntary belief could occur and would result in your getting into heaven ought to be taken into account when you’re trying to determine what you ought to do and believe (constructing the full decision procedure for maximizing your chance of gaining infinite utility is an interesting task!). But suppose you’re certain that doxastic voluntarism is false: you still ought to try to convince others of God’s existence, give money to organizations that try to do this, etc. The argument would simply support a different set of actions.

12. The wager ignores the disutility of believing in God and the utility of not believing in God. [IC]

Answer: The wager doesn’t ignore either of these: they simply don’t affect the act or belief that it is rational for you to perform or adopt. Suppose that the annoyance of wagering for god is like continuous torture for you. And suppose the utility of not believing in god is extremely pleasurable for you. You still ought to wager for god, since infinite expected utility swamps any finite (dis)utility. Even if the utility of both is infinite (see 2), it’s still probability and not finite utility considerations that determine whether or not you ought to wager.

13. Dammit Jim, it’s just not scientific!

Answer: The wager doesn’t give evidence for god: it’s a moral/prudential argument for belief. The view that your beliefs always ought to be in accordance with your evidence is powerful and useful, but should we be certain that it’s true, and that there are never prudential reasons to hold a belief? If not, then the full force of Pascal’s wager returns, since any non-zero credence that there are prudential reasons for belief is enough to let infinite utility back in. Even if you could be rationally certain in this norm, however, it just changes the actions Pascal’s wager warrants (see 11).

14. That’s not how the maths works.

Answer: Pascal’s wager appeals to the claim that a finite, nonzero chance of getting an infinitely good outcome is better than any probability of a finitely good outcome. We can appeal to something like Bartha’s relative utility theory to get both this result and the result that a greater chance of an infinite outcome is better than a lower chance of the same outcome. It would be somewhat surprising if our accounts of infinity (e.g. hyperreals, surreals) were in conflict with either of these claims. In theories where infinities can be multiplied by finite, nonzero numbers, they tend to produce infinities.

15. The only reason you’d believe this is because you want to believe in God anyway.

Answer: There’s a class of responses to the wager that bring into question your motives for defending the argument in the first place. I don’t really think that motives like wanting to believe in god have much bearing on the efficacy of the argument, but they should probably give you reason to doubt my weighing of the arguments and evidence, etc. But I don’t have such motives. I came to this through intellectual curiosity, though I don’t think that means that I’ll end up finding the conclusions unmotivating.

16. Doesn’t the wager promote an unethical life of belief over an ethical life of non-belief?

Answer: In principle the wager could promote this. But I don’t see any reason to think that this is overwhelmingly likely. It doesn’t necessarily favor adopting a given religion unflinchingly. And if we are more confident than not that god is non-malevolent and that we haven’t been grossly misled about the nature of moral truth, then we have strong reasons to act morally. The wager applies to actions as well as beliefs, so even if you think you’ll be ‘forgiven’ for a certain action it’s unlikely that under PW it’ll be worth performing an action you are confident is wrong.

17. The wager is only valid because there are problematic features of infinities.

Answer: The infinite version of Pascal’s wager relies on features of infinities: e.g. that the expected value of an infinitely good outcome will not be finite. But the uncertainty argument will apply even if you’re pretty certain these features won’t be present in the correct account of infinities. I don’t think that our worries about the wager give us sufficient reason to reject these principles of infinities. In principle we could reformulate much of the wager by simply appealing to sufficiently large finite amounts of utility, but Pascal’s wager seems to be consistent with features of infinities that we are happy with in other domains.

18. What if we have bounded utility functions? Aren’t unbounded utility functions problematic?

Answer: Utility functions that are bounded above and below can prevent both positive and negative infinite forms of Pascal’s wager. But there are some obvious drawbacks to this response: 1. It’s ad hoc. What other reason do we have to think we don’t have an unbounded concave utility function over happiness (that isn’t just a result of our inability to adequately handle large numbers)? 2. It gives counterintuitive results at the point where a unit of happiness has no extra value for us. 3. It might not work for non-preference forms of utilitarianism (the moral PW argument). 4. We shouldn’t be certain that our utility function is bounded.
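To see why bounding blocks the wager in the first place, here is a minimal sketch. The tanh-shaped utility function, the cap of 1, and the specific probabilities are all hypothetical choices made for illustration; nothing in the argument depends on them.

```python
import math


def bounded_utility(payoff: float, cap: float = 1.0) -> float:
    """Hypothetical utility function bounded between -cap and +cap."""
    return cap * math.tanh(payoff)


def expected_utility(prob: float, payoff: float) -> float:
    """Expected utility of getting `payoff` with probability `prob`, else nothing."""
    return prob * bounded_utility(payoff)


# A one-in-a-billion chance of an infinitely good outcome (assumed numbers).
tiny_chance_of_infinite_good = expected_utility(1e-9, math.inf)

# A certain, modest, finite good (assumed numbers).
certain_modest_good = expected_utility(1.0, 0.5)

# With a bounded function, the infinite payoff can contribute at most prob * cap,
# so the modest certainty wins.
print(tiny_chance_of_infinite_good < certain_modest_good)
```

Because the infinite payoff is worth no more than the cap, a small enough probability can no longer swamp ordinary finite goods. The drawbacks listed above are about whether we are entitled to assume such a cap, not about whether it would work.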

19. If we allow ourselves to be skeptical about mathematical and normative principles, we’ll end up skeptical about everything!

Answer: I don’t think this is the case, for a couple of reasons. Firstly, the question is a bit misleading. The uncertainty I’ve appealed to here isn’t mathematical uncertainty (though I think we can appeal to that as well in some cases) it’s normative uncertainty. And it’s not really skepticism, it’s just taking into account that we shouldn’t be certain that a given normative principle (mentioned in 1) is true. If we end up uncertain about everything like this, I don’t think that would be a bad thing. However, I’ll try to discuss objections to this view in another post.

20. Isn’t this just a reductio of expected utility theory?

Answer: I think that the existence of fanaticism problems presents a huge worry for expected utility theorists who allow unbounded utility functions. In fact, I’m surprised people haven’t written this up as an impossibility theorem with an anti-fanaticism axiom, since it seems you have to either accept the wager, accept other problematic conclusions, or give up on some plausible aspect of unbounded expected utility theory. I don’t think the reductio worry helps people who don’t want to buy Pascal’s wager though, since it doesn’t warrant acting as if some fanaticism-avoiding decision theory were true.

So there you have it: the reasons why, in under 100 words, I’m not satisfied by any of the common responses to Pascal’s wager. If you’re reading this and have any comments/objections or spot any errors (I was quite tired when I wrote this!) please do let me know.