Strata of the World—now moved to https://nosetgauge.substack.com/

Blog moved!

2024-11-10T23:21:00.003+00:00

This blog has moved from Blogger to Substack. The new link is:

No Set Gauge

In addition, I now have a personal website that has a catalogue of my published writings here.

(image credit: Robert McCall. Original here.)

Investigating an insurance-for-AI startup

2024-09-21T16:16:00.000+01:00

We (Flo & Rudolf) spent a month fleshing out the idea of an insurance-for-AI company. We talked to 15 people in the insurance industry, and did 20 customer interviews. We decided not to continue, but we think it’s still a very promising idea and that maybe someone else should do this. This post describes our findings.

The idea

Theory of change

To reduce AI risks, it would be good if we understood risks well, and if some organisation existed that could incentivise the use of safer AI practices. An insurance company that sells insurance policies for AI use cases has a financial incentive to understand concrete AI risks & harms well, because this feeds into its pricing. This company would also be incentivised to encourage companies to adopt safer AI practices, and could incentivise this by offering lower premiums in return. Like many cyber-insurance companies, it could also provide more general advice & consulting on AI-related risk reduction.

Concrete path

TL;DR: Currently, professionals (e.g. lawyers) have professional indemnity (PI) insurance. Right now, most AI tools involve the human being in the loop. But eventually, the AI will do the work end-to-end, and then the AI will be the one whose mistakes need to be insured. Currently, this insurance does not exist. We would start with law, but then expand to all other forms of professional indemnity insurance (i.e. insurance against harms caused by a professional’s mistakes or malpractice in their work).

Frontier labs are not good customers for insurance, since their size means they mostly do not need external insurance, and have a big information advantage in understanding the risk.

Instead, we would target companies using LLMs (e.g. large companies that use specific potentially-risky AI workflows internally), or companies building LLM products for a specific industry.

We focused on the latter, since startups are easier to sell to. Specifically, we wanted a case where:

LLMs were being used in a high-stakes industry like medicine or law
there were startups building LLM products in this industry
there is some reason why the AI might cause legal liability, for example:
the LLM tools are sufficiently automating the work that the liability is plausibly on them rather than the humans
AI exceptions in existing insurance policies exist (or will soon exist)

The best example we found was legal LLM tools. Law involves important decisions and large amounts of money, and lawyers can be found liable in legal malpractice lawsuits. LLMs are close to being able to do much legal work end-to-end; in particular, if the work is not checked by a human before being shipped, it is uncertain if existing professional indemnity (PI) insurance applies. People who work in law and law tech are also, naturally, very liability-aware.

Therefore, our plan was:

Become a managing general agent (MGA), a type of insurance company that does not pay claims out of its own capital (but instead finds a reinsurer to agree to pay them, and earns a cut of the premiums).
Design PI policies for AI legal work, and sell these policies to legal AI startups (to help them sell to their law firm customers), or directly to law firms buying end-to-end legal AI tools.
As more and more legal work is done end-to-end by AI, more and more of the legal PI insurance market is AI insurance policies.
As AI advances and AI insurance issues become relevant in other industries, expand to those industries (e.g. medicine, finance, etc.).
Eventually, most of the world’s professional indemnity insurance market (on the order of $10B-100B/year) has switched from insuring against human mistakes to insuring against AI mistakes.
Along the way, provide consulting services for countless businesses switching to AI-based work, helping them reduce the chance of harm, and incentivising this with lowered premiums.
Stay up-to-date on concrete AI risks, likely funding research focused on measuring them in the real world. The resulting claims history would also be an automatic, high-quality database of AI harms.

We thought this could be a multi-billion dollar company, a natural for-profit home for concrete AI risk research, and a reducer of existential risk from AI.

How insurance works

Why insurance

Insurance is about cash-flow management. Sometimes a low-probability accident happens that either bankrupts the company or just puts an annoying dent in their accounting. If the expected value of such losses exceeded the company’s ability to pay, (competent) insurers would not be willing to sell the policy. But if it’s less, the company can benefit (by e.g. better weathering sudden shocks) by having insurance, and the insurer can make a profit in expectation.

Another way of describing the core function of insurance is as arbitrage between differently-locally-concave utility functions. Assume the policyholder faces a gamble between a good outcome $$x_2$$ and a bad outcome $$x_1$$. If the policyholder’s utility function is more concave than the insurer’s (for example, if it’s the red line below, while the insurer’s is just linear), then the policyholder cares less than the insurer about the difference between $$x_2$$ and $$x_2$$ minus the insurance premium rate $$r$$. If the policyholder’s utility function is $$f$$, the policyholder’s expected utility change given insurance is a rise from $$p f(x_1) + (1-p) f(x_2)$$ (where $$p$$ is the chance of the bad outcome) to just always being $$f(x_2 - r)$$, while the insurer is exposed to upside risk $$(1-p)r$$ (green rectangle) and downside risk $$p(x_2 - x_1 - r)$$ (red rectangle). The flatter the policyholder’s utility function $$f$$ is around the $$x_2-r$$ to $$x_2$$ region compared to the region before that, the better this trade can be.
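To make the trade concrete, here is a minimal numerical sketch. The log utility function and all the numbers (a good outcome of 700, a bad outcome of 100, a 10% chance of the bad outcome, a premium of 100) are illustrative assumptions chosen for this example, not figures from our research.

```python
import math

def expected_utility_uninsured(f, p, x1, x2):
    """Policyholder's expected utility without insurance: p*f(x1) + (1-p)*f(x2)."""
    return p * f(x1) + (1 - p) * f(x2)

def utility_insured(f, x2, r):
    """With insurance the policyholder always ends up at x2 - r."""
    return f(x2 - r)

def insurer_expected_profit(p, x1, x2, r):
    """Upside (1-p)*r minus downside p*(x2 - x1 - r), for a risk-neutral insurer."""
    upside = (1 - p) * r
    downside = p * (x2 - x1 - r)
    return upside - downside

# Illustrative assumptions: bad outcome x1, good outcome x2, premium r, loss chance p.
p, x1, x2, r = 0.1, 100.0, 700.0, 100.0
f = math.log  # assumed concave utility; the insurer's utility is taken as linear

uninsured = expected_utility_uninsured(f, p, x1, x2)
insured = utility_insured(f, x2, r)
profit = insurer_expected_profit(p, x1, x2, r)

print(f"policyholder utility, uninsured: {uninsured:.4f}")
print(f"policyholder utility, insured:   {insured:.4f}")
print(f"insurer expected profit:         {profit:.2f}")
```

With these assumed numbers both sides gain: the policyholder’s utility is higher with the policy, and the insurer makes a positive expected profit. With a less concave utility function (or a smaller loss relative to wealth) the same premium can stop being worth paying.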

There are other benefits of insurance too:

Contracts sometimes require one party to have insurance, usually so that the other party knows they can sue and recover damages if necessary.
Reputational benefits (“you can trust us because we managed to get an insurer to!”)
Infrastructure for handling claims/losses (insurance companies may have non-financial advantages in processing or evaluating claims, so it can be good to contract such functions out)
Credible loss prevention advice (“we’re not just a consultancy - our money is also on the line here, and therefore trust us when we tell you to do X”)
Credible pricing of harms (similar to the above)

Capital efficiency in insurance comes from scale

Insurance depends on scale because of the central limit theorem. As a simplified example, let’s say you’re an insurance company selling flood insurance. Assume you sell flood insurance policies for £100, and there’s a 10% chance that a flood happens and you need to pay out £600 (so the expected loss is £60). In expectation, you make £40 per policy. However, if you sell 10 policies, and suddenly 2 or more of them trigger, you’ve made revenue of £1000 but a loss of at least £1200 and you’re bankrupt (assume you don’t have cash reserves). There’s a ~26% chance of this happening ($$1 - \sum_{i=0}^{1} [\text{binomial pmf}_{p=0.1, n=10}(i)] \approx 0.26$$). However, if you sold 100 policies, the probability that the same proportion triggers (20 or more out of 100) is only about 0.2% ($$1 - \sum_{i=0}^{19} [\text{binomial pmf}_{p=0.1,n=100}(i)] \approx 0.002$$).
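The two probabilities in the flood example can be checked with a few lines of Python. This is a small sketch using only the standard library; the helper names are ours, and it assumes the policies trigger independently.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k claims among n independent policies."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more claims: one minus the chance of fewer than k."""
    return 1.0 - sum(binomial_pmf(i, n, p) for i in range(k))

# 10% flood chance per policy, as in the example above.
small_book = prob_at_least(2, n=10, p=0.1)    # 2+ claims out of 10 policies
large_book = prob_at_least(20, n=100, p=0.1)  # 20+ claims out of 100 policies

print(f"10 policies, 2 or more claims:   {small_book:.3f}")
print(f"100 policies, 20 or more claims: {large_book:.4f}")
```

The same bad proportion of claims (20%) is over a hundred times less likely in the larger book, which is the scale effect described above.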

Therefore, even if the expected loss is the same between two insurers, assuming the same risk tolerance (a red area), the insurer with more policies is exposed to less variance in expected loss per policy and can set a lower price:

Therefore, the way you achieve good capital efficiency in insurance is to sell many policies across many different types of risk (ideally uncorrelated ones - so not just flood policies, which might all trigger in the case of rising seas, but diverse types of risks).

However, in practice the balance sheets and the actual selling of the insurance products are not very tightly coupled in the insurance industry. You do not have to be a company with huge policy scale, you just have to work with such a company (for example, by becoming an MGA - see below).

The insurance industry is complicated and allows for startups

The straightforward consequence of the above incentive is that you get monopoly effects in insurance, where one firm achieves the largest balance sheet and then beats everyone else on price (or pushes everyone else into a daredevil game where they’re more likely to blow up than the big one is).

To some extent, this has happened. However, there’s a separate dynamic too. To quote from a blog post by Felix Stocker:

“Most businesses, especially SMEs, buy their insurance from someone they know personally. Because it’s low on the list of priorities, but important to get right, the CEOs or CFOs responsible stick with people they trust - brokers that can answer questions, and be relied upon to bail them out in a tough spot. Personalities, not brands, are key. Because of this, the concept that best explains the structure of the insurance broking market is Dunbar’s Number - the idea that any one person can only hold a limited number of personal relationships. So each broker has up to about a hundred relationships - but no more than that. And since the end-customer relationship is owned by an individual broker, then the challenge becomes aggregating brokers, rather than the customers themselves.”

There are also many ways to bundle and unbundle the different components of insurance. Felix Stocker writes about this here. In brief:

A common starting point for insurance startups is to be MGAs (managing general agents), that handle pricing (and, sometimes, distribution - i.e. selling), but are backed by the balance sheet of a reinsurer. This would’ve been our approach too.

General liability exceptions are key for new insurance products

There isn’t an insurance product for every niche risk, because companies often hold general liability insurance that covers basically anything.

However, general liability insurance often comes with exceptions. For example, professional liability (also called errors & omissions) is often left to a separate policy, and terrorism- and war-related harms are excluded. Also, complicated new risks like cybersecurity have increasingly tended to get exceptions, and be left to specialised cyber policies.

Based on talking to insurance industry experts, we expect AI-related exceptions to general liability and professional indemnity insurance to be coming. In the meantime, the need for them seems somewhat complex and subtle.

Our ideas for pricing risk

We were loosely inspired by Epoch’s “Direct Approach” for forecasting human-level AI. Specifically, we’d make an argument of the form: if we can show that the outputs of the human and the AI are indistinguishable regarding some property (e.g. mistake rate as assessed by humans), then we should treat them as practically the same regarding related properties (e.g. the probability of causing a malpractice lawsuit).

Specifically, our guess for how to price legal professional indemnity insurance for an AI model/scaffold is:

Collect a bunch of legal documents created by the AI, and comparable documents created by the human.
Hire legal experts to assess the potentially-claim-causing mistake rate in the AIs’ and the humans’ work. (Note: legal experts cost upwards of roughly $100 per hour, so this would be fairly expensive.)
Apply a fudge factor to the number of AI mistakes caught, on the assumption that humans are better (having had more practice) at catching human mistakes, and to account for unknown unknowns.
If the fudge factor times the AI claim-relevant mistake rate is lower than the humans’, offer the AI model’s outputs PI cover with the same rates as the relevant human PI for the same firm. If it’s higher, then either don’t offer it, or offer it at a fairly steep additional price, and probably with lower cover. Basically - price in the risk.
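The steps above can be sketched as a small decision rule. This is only an illustration of the logic: the class and function names, the default fudge factor of 2, the proportional price loading, and the cut-off beyond which cover is declined are all hypothetical choices for this sketch, not numbers we settled on.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewSample:
    """Expert review results for a batch of comparable legal documents."""
    documents_reviewed: int
    claim_relevant_mistakes: int

    @property
    def mistake_rate(self) -> float:
        return self.claim_relevant_mistakes / self.documents_reviewed

def price_ai_cover(
    ai: ReviewSample,
    human: ReviewSample,
    human_premium: float,
    fudge_factor: float = 2.0,  # assumed allowance for missed AI mistakes and unknown unknowns
    max_loading: float = 3.0,   # assumed cut-off: decline cover beyond this multiple
) -> Optional[float]:
    """Return a premium for the AI workflow, or None if cover is declined."""
    adjusted_ai_rate = fudge_factor * ai.mistake_rate
    if adjusted_ai_rate <= human.mistake_rate:
        # AI looks no riskier than the humans: same rate as the human PI policy.
        return human_premium
    if human.mistake_rate == 0:
        # No human baseline to scale from, so decline in this sketch.
        return None
    loading = adjusted_ai_rate / human.mistake_rate
    if loading > max_loading:
        return None
    # Price in the extra risk (here: proportionally, as a simplifying assumption).
    return human_premium * loading

human_sample = ReviewSample(documents_reviewed=200, claim_relevant_mistakes=6)
good_ai = price_ai_cover(ReviewSample(200, 2), human_sample, human_premium=1000.0)
worse_ai = price_ai_cover(ReviewSample(200, 6), human_sample, human_premium=1000.0)
bad_ai = price_ai_cover(ReviewSample(200, 30), human_sample, human_premium=1000.0)
print(good_ai, worse_ai, bad_ai)
```

A real version would also reduce the cover limit for the loaded cases and would need confidence intervals on both mistake rates, since the expert-reviewed samples are expensive and therefore small.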

We would likely only go through this process if we had first done a more checkbox-style check of the AI workflow, including:

Whether reasonable evaluations have been run
Whether the AI’s access to protected information is reasonable (e.g. probably the AI should have zero access to customer X’s info while doing work for customer Y)
The extent to which humans are in the loop or could intervene
Susceptibility to prompt injection attacks (e.g. we might require companies to use something like Lakera Guard)
Vulnerability to model updates (e.g. if OpenAI drops a new version of GPT-4 that is worse at specific things, does your workflow switch to it immediately without checks, or have you hardcoded the GPT version number?)

We expect there is room for fancier technical solutions to evaluate risk probabilities. However, we caution that the size of a loss is almost entirely not about the AI, but instead about the context of the case: what legal work was being done, what harm the legal error resulted in in the real world, what the judge’s mood was when they were deciding the case (if it went to court), and so on. Even the probability of risk is only partly about the AI; it also depends, for example, on whether the client who received the bad advice decides to sue in the first place. This is why the core of our approach is side-stepping the problem of evaluating legal malpractice harms from scratch, and instead creating an argument for why the AI lawyer does not have more risk (or has some specific factor more risk) than the human lawyer. We effectively want to import the human lawyer claims history used for existing insurance pricing to the AI case.

We did not prioritise thinking of technical approaches to risk evaluation, because we thought much more of the risk was on the market size (thinking the opposite is perhaps the most common failure mode of tech-focused entrepreneurs). However, having a “magical” tech demo would probably be a good way to get your foot in the door. Showing you could’ve accurately predicted past failure rates might be the type of evidence that insurers care a lot about.

Notes on professional indemnity insurance for law

There isn’t a set of “cookie-cutter” templates that most claims fall into; it’s a diverse set.
The split is roughly (taking a rough average over several sources):
- 50% substantive claims (e.g. filing a motion in the wrong court, failing to raise a critical defense point, misapplying rules of evidence)
- 30% administrative errors (e.g. missing a statute of limitations or court-ordered discovery deadline, forgetting to renew a client’s trademark, typos, filing documents for the wrong client, losing important documents, sending an email with confidential info to the wrong person, leaving sensitive documents visible on a desk, failing to properly redact info in public filings)
  - in particular, missing deadlines is a common one
  - many of these (e.g. missing deadlines, typos, and losing documents) seem like ones where AI error rates would be much lower
- 10% client relation issues (e.g. settling a case without client approval, failing to disclose a past representation of an adverse party or a gift, not keeping the client informed, not explaining risks, failing to return calls or emails)
  - apart from some harms related to instruction-following or bad explanations, most of these seem far less applicable to AIs
- 10% intentional wrongs (e.g. overbilling for work not performed, submitting false evidence, severe conflicts of interest, selling information to a competitor, using confidential info to benefit another client)
  - again, most of these are far less applicable to AIs

It seems reasonable that AI legal PI would therefore be even more tilted towards the substantive errors category than human lawyer PI.

Relevant Stakeholders

Munich Re has an AI insurance team. Here is their thinking on the state of the AI insurance market.

Orbital Witness, which accelerates legal due diligence in real estate, built their own custom AI insurance product with a real estate insurer called First Title.

We won’t share details about the situations of specific startups that we talked to.

Findings

Customer demand

Common reasons for customers not needing insurance for their AI use cases included:

not working in a regulated or high-stakes domain
there is a human in the loop all the time, so the AI is just a tool and existing PI will probably cover it - for now

We did find several AI-for-law companies that did want an AI-specific insurance policy. Notably, one of them had seen the need to build their own custom insurance product, working with a specialist insurer in their area (real estate law). Several complained about not finding an off-the-shelf solution, and were willing to pay immediately for an insurance policy that addressed their problem. However, in at least one case this was more of a formality required by a contract.

One theme in many customer conversations was that being financially compensated by insurance is not sufficient to make up for damages, because the real damage is to the reputation of the company in question. This is much harder to insure against. Insurance could help indirectly here (e.g. the fact that you were able to get insurance for your product is some evidence that whatever you’re selling doesn’t blow up too often).

Another theme in many customer conversations was that people just aren’t thinking that much about AI risks or harms yet. We think this is a consequence of AIs not being deployed in high-stakes use cases. Many organisations are conservative in their applications of AI and choose to start implementing in low-stakes domains, such as internal products and answering simple FAQs. Nobody wants to be the first company to have AI publicly fail.

Findings about insurance

The insurance industry is complicated. The legal industry is also complicated. Neither of us had any background in either. The lack of knowledge was generally fixable (thanks to Claude in particular), but the lack of relevant connections significantly slowed our momentum. Early-stage startup exploration is mostly driven by talking to potential customers. This was helped by the fact that we were mostly talking to AI product companies in these spaces, but still was slow going compared to both of our previous experiences getting customer interviews.

An MGA requires a reinsurer, and this takes a lot of time. This meant that, to get started, we would’ve needed not just customers, but a reinsurer. We did not find a reinsurer who was willing to work with us. If we had kept doing this, we would’ve talked to more reinsurers (perhaps starting with Allianz, who have previously reinsured a drone insurance product). The normal time for a new insurtech startup to get a reinsurer is on the order of 6-24 months.

The insurance industry moves slowly and carefully. This makes sense, since insurance companies that make rash and risky moves probably wouldn’t exist for very long. But it is still a very important cultural difference to, for example, the tech world.

Insurance is overwhelmingly about inductive, not deductive, evidence. Claims histories are the gold standard of evidence in the insurance world. If you don’t have a claims history, you will have a hard time.

Insurance is often reactive, and changes are driven by new types of big losses. The industry perks up and starts paying attention and figuring out how to deal with a given risk when a big loss happens related to that risk. In particular, once a big loss happens, lots of insurance actors will want to know how exposed they are to that type of risk, and either reduce exposure to that risk or make money by insuring against it.

Insurance for AI might only become something reinsurers care a lot about after a big event happens and causes harm.

The insurance industry is financially very large but does not have high valuations. Many insurance companies have extremely large revenues, but insurance companies are often valued at only a 1-3x multiple of their revenues (compared to 20x for tech companies). Allianz makes more revenue than Meta and has almost 10x the assets, but as of writing is valued at 1/10th of Meta.

Also, some vague things about insurance that struck us:

Insurance is fundamentally on the financialization side of the financialization-v-building axis. Financialization is necessary in a complex world, but it’s perhaps harder to feel the hacker building ethos when that’s what you’re doing.
At the claims stage, insurance is fundamentally adversarial: the claimant wants money, and the company is incentivised to not pay.
Reducing someone’s exposure to risk can lead to them taking more risk.

Cruxes

Our rough standard was that if we saw a path to getting a reinsurer onboard in clearly less than 6 months, we would start this company. We had several reasons for wanting to move fast:

Moving fast is the key to building big impactful things.
We felt our opportunity cost was high; in, say, a year of talking to reinsurers before being able to sign our first customer, we could’ve gotten far building something impactful that isn’t an insurance product.
The rate of AI progress is high enough that things that move slowly might not have time to matter.

We were also bottlenecked by not having insurance industry connections. Insurance, as mentioned above, is a very network-based field. It is true that many insurtech founders do not have insurance backgrounds, but it is still critical that some industry expert is involved very early on in an advisory capacity, and probably the first hire needs to be someone with deep insurance connections.

In summary, we think that insurance for AI is a great idea for a team that is less impatient, and has either more insurance connections or great willingness to find networks in insurance.

Former AIG (American International Group) CEO Hank Greenberg once said: “All I want in life is an unfair advantage”. Someone who - unlike us - does have an unfair advantage in insurance may be able to run with this idea, build a great company, and reduce AI risk.

Alternative approaches

Labs / Big Tech as reinsurers

Insurance requires a large balance sheet to pay out claims from. The standard way to solve this is with a reinsurer. However, who else has a lot of capital, and (unlike reinsurers) a specific interest in AI? Foundation model labs (FMLs) and their Big Tech backers.

This could also simultaneously align FML incentives. Incentive-wise, the natural place to park AI risk is at the AI companies themselves. There are two levels of this:

When there are claims, an FML is involved in paying them out (directly or indirectly)
When there are claims, all FMLs are at least partly on the hook, because the claims are at least partly paid out from a shared pool of capital that all FMLs are involved in. (This creates an incentive among FMLs to care about the safety of the others, making safety tech sharing more likely, and making it easier for the industry to negotiate a slowdown if needed.)

Why might FMLs want to do this? It spreads the risk of things going wrong and incentivises finding errors in other companies’ models early on. It could increase public trust in AI as a whole, which will make adoption easier. In particular, most people don’t know the difference between the top FMLs and so see them as “AI companies.” If one AI company causes a large harm, the public is likely to associate it with AI companies in general. It also seems good, incentive-wise, that the companies driving a technology are the ones who are involved in insuring the risk.

Why might FMLs not want to do this? There are lots of incredibly good reasons.

There are many better uses of capital than parking it in some risk-free place where it can only be used as backing for an insurance product. This includes paying the employee salaries and compute costs that may enable these labs to build AGI and take over the entire economy and then the world - potentially a much higher-margin business than insurance.
(If going for the more ambitious version, where all FMLs participate in paying out claims:) By increasing the trust in all FMLs, pooled insurance might reduce a given FML’s competitive edge. For example, that OpenAI and Microsoft offer their Copyright Shield product is a specific advantage for them.
If it looks like insurance, or smells like insurance, or quacks like insurance, a horde of regulatory requirements immediately descends upon you. This makes a lot of sense; insurance is a very natural “picking up pennies in front of the steamroller” business. But FMLs understandably would prefer to not deal with this.
It’s not their core competency, and companies generally do better when they stick to their core competency.

To try to get around these issues, we explored options for FML backing, including:

FMLs / Big Tech simply partner with us, giving us credibility when we go to reinsurers.
FMLs / Big Tech become reinsurers.
FMLs / Big Tech create a captive insurance body. Captive insurance is when companies in a given industry get together to pool risks, allowing them some regulatory benefits over a normal reinsurer that deals with arbitrary risks, but also some limitations.
FMLs / Big Tech enter into some complicated deals with reinsurers that make it a better deal for them.

We soon reached the point where, with Claude-3.5 serving as our legal team, we were doodling increasingly complicated legal structure diagrams on whiteboards. Some of them were starting to look vaguely pyramid-shaped. That was a good place to leave it.

Selling a risk model rather than an insurance policy

Why not leave the entire insurance industry to the existing insurance companies, and focus on what we really care about: modelling concrete AI risks?

An example of a company that sells risk models to insurance companies is RMS (now part of Moody’s, after changing ownership a few times). They were started in the late 1980s and specialised in natural catastrophe (“nat cat”) risk modelling. They had a technical moat: they were better at modelling things like synthetic hurricane populations than others.

The main disadvantage of such a route is that selling to insurance companies is very painful: they have slow procurement processes, mostly don’t understand technical details, and generally need to see a long track record of correct predictions before they buy. Venture capitalists are also unlikely to be interested in supporting such a company, since such companies’ growth rates are usually not stratospheric. For example, RMS was sold to Moody’s in 2021 for $2 billion, but only after more than 30 years in existence, and after already having been sold to Daily Mail along the way.

Might there be a market apart from insurance companies for a risk modelling product? Maybe, but this is unlikely. For natural catastrophe risks at least, insurance companies dominate risk modelling demand by sheer volume - they want updates all the time, whereas governments might want an update for planning purposes once every decade. Given how fast AI changes, though, there may be more actors who have a high rate of demand for risk models and updates on them.

Should AI evaluation / auditing orgs do this?

We haven’t thought about this much, but an org with an AI evaluation/auditing background might be well-placed to move into the insurance (or risk-modelling) space.

Acknowledgements

We’d like to thank Felix Stocker for lots of great advice on how things work in insurtech, Ed Leon Klinger for sharing his insurtech journey, Robert Muir-Woods for a very helpful chat about RMS, Otto Beyer for a valuable initial conversation about the insurance space, Jawad Koradia for helping us get initial momentum and introductions, Will Urquhart for talking with us about underwriting niche risks, the team at Entrepreneur First (in particular Kitty Mayo, Dominik Diak, and Jack Wiseman) for hosting much of our exploration and offering advice & introductions, and various people scattered across AI startups and the insurance industry for taking time to meet with us.

Positive visions for AI

2024-07-23T20:08:00.002+01:00

This post was a collaboration with Florence Hinder

Reasons to make the positive case

Everyone who starts thinking about AI starts thinking big. Alan Turing predicted that machine intelligence would make humanity appear feeble in comparison. I. J. Good said that AI is the last invention that humanity ever needs to make.

The AI safety movement started from Eliezer Yudkowsky and others on the SL4 mailing list discussing (and aiming for) an intelligence explosion and colonizing the universe. However, as the promise of AI has drawn nearer, visions for AI upsides have paradoxically shrunk. Within the field of AI safety, this is due to a combination of the “doomers” believing in very high existential risk and therefore focusing on trying to avoid imminent human extinction rather than achieving the upside, people working on policy not talking about sci-fi upsides to look less weird, and recent progress in AI driving the focus towards concrete machine learning research rather than aspirational visions of the future.

Both DeepMind and OpenAI were explicitly founded as moonshot AGI projects (“solve intelligence, and then use that to solve everything else” in the words of Demis Hassabis). Now DeepMind - sorry, Google DeepMind - has been eaten by the corporate machinery of Alphabet, and OpenAI is increasingly captured by profit and product considerations.

The torch of AI techno-optimism has moved on to the e/acc movement. Their core message is correct: growth, innovation, and energy are very important, and almost no one puts enough emphasis on them. However, their claims to take radical futures seriously are belied by the fact that their visions of the future seem to stop at GenAI unicorns. They also seem to take the general usefulness of innovation not as just a robust trend, but as a law of nature, and so are remarkably incurious about the possibility of important exceptions. Their deeper ideology is in parts incoherent and inhuman. Instead of centering human well-being, they worship the “thermodynamic will of the universe”. “You cannot stop the acceleration”, argues their figurehead, so “[y]ou might as well embrace it” - hardly an inspiring humanist rallying cry.

In this piece, we want to paint a picture of the possible benefits of AI, without ignoring the risks or shying away from radical visions. Why not dream about the future you hope for? It’s important to consider the future you want rather than just the future you don’t. Otherwise, you might create your own unfortunate destiny. In the Greek myth about Oedipus, he was prophesied to kill his father, so his father ordered him to be killed, but he wasn’t and ended up being adopted. Years later he crossed his father on the road in his travels and killed him, as he had no idea who his father was. Oedipus’ father focusing on the bad path might have made the prophecy happen. If Oedipus' father hadn’t ordered him to be killed, he would have known who his father was and likely wouldn’t have killed him.

When thinking about AI, if we only focus on the catastrophic future, we may cause it to become true by increasing the attention paid to it. Sam Altman, who is leading the way in AI capabilities, claimed to have gotten interested in AI through arch-doomer Eliezer Yudkowsky. We may also neglect progress towards positive AI developments; some people think that even direct AI alignment research should not be published because it might speed up the creation of unaligned AI.

With modern AI, we might even get a very direct “self-fulfilling prophecy” effect: current AIs increasingly know that they are AIs, and make predictions about how to act based on their training data which includes everything we write about AI.

Benefits of AI

Since we think much of the discussion of AI focuses on what could go wrong, let’s think through what could go well, starting from what’s most tangible and closest to the current usage of AI and moving towards what the more distant future could hold.

AI will do the mundane work
Lowering the costs of coordination
Spreading Intelligence
AI can create more technology
Increased technology, wealth, and energy correlate with life being good
All of the above, and the wealth it creates, could allow people to self-actualise more

Already, AI advances mean that Claude has become very useful, and programmers are faster and better. But below we’ll cast a look towards the bigger picture and where this could take us.

AI will do the mundane work

First, there’s a lot of mundane mental work that humans currently have to do. Dealing with admin work, filing taxes, coordinating parcel returns -- these are not the things you will fondly be reminiscing about as you lie on your deathbed. Software has reduced the pain of dealing with such things, but not perfectly. In the future, you should be able to deal with all administrative work by specifying what you want to get done to an AI, and being consulted on decision points or any ambiguities in your preferences. Many CEOs or executives have personal assistants; AIs will mean that everyone will have access to this.

What about mundane physical work, like washing the dishes and cleaning the toilets? Currently, robotics is bad. But there is no known fundamental obstacle to having good robotics. It seems mainly downstream of a lot of engineering and a lot of data collection. AI can help with both of those. The household robots that we’ve been waiting for could finally become a reality.

Of course, it is unclear whether AIs will first have a comparative advantage against humans in mundane or meaningful work. We’re already seeing that AI models are making massive strides in making art, way before they’re managing our inboxes for us. It may be that there is a transitional period where robotics is lagging but AIs are smarter-than-human, where the main economic value of humans is their hands rather than their brains.

Lowering the cost of coordination

With AI agents being able to negotiate with other AI agents, the cost of coordination is likely to dramatically drop (see here for related discussion). Examples of coordination are agreements between multiple parties, or searching through a large pool of people to match buyers or sellers, or employees and employers. Searching through large sets of people, doing complex negotiations, and the monitoring and enforcement of agreements all take lots of human time. AI could reduce the cost and time taken by such work. In addition to efficiency gains, new opportunities for coordination will open up that would have previously been too expensive.

Small-scale coordination

To give an example of this on the small scale of two individuals, say you are trying to search for a new job. Normally you can’t review every single job posting ever, and employers can’t review every person in the world to see if they want to reach out. However, an AI could filter that for the individual and another AI for the business, and the two AIs could have detailed negotiations with each other to find the best possible match.

Coordination as a scarce resource

A lot of the current economy is a coordination platform; that’s the main product of each of Google, Uber, Amazon, and Facebook. Reducing the cost of searching for matches and trades should unlock at least as much mundane benefits and economic value as the tech platforms have.

Increased coordination may also reduce the need to group people into roles, hierarchies, and stereotypes. Right now, we need to put people into rigid structures (e.g. large organisations with departments like “HR” or “R&D”, or specific roles like “doctor” or “developer”) when coordinating a large group of people. In addition to upholding standards and enabling specialisation of labour, another reason for this is that people need to be legible to unintelligent processes, like binning of applicants by profession, or the CEO using an org chart to find out who to ask about a problem, or someone trying to buy some type of service. Humans can reach a much higher level of nuance when dealing with their friends and immediate colleagues. The cheap intelligence we get from AI might let us deal with the same level of nuance with a larger group of people than humans can themselves track. This means people may be able to be more unique and differentiated, while still being able to interface with society.

Large-scale Coordination

On a larger scale, increased coordination will also impact geopolitics. Say there are two countries fighting over land or resources. Both countries could have AI agents negotiate with the other side’s AI agents to search the space of possible deals and find an optimal compromise for both. They could also simulate a vast number of war scenarios to figure out what would happen; much conflict is about two sides disagreeing about who would win and resolving the uncertainty through a real-world test. This relies on three key abilities: the ability to negotiate cheaply, the ability to simulate outcomes, and the ability to stick to and enforce contracts. AI is likely to help with all three. This could reduce the incentives for traditional war, in that no human lives need to be lost because the outcome is already known and we can negotiate straight from that. We would also know exactly what we are and are not willing to trade off, which means it’s easier to optimise for the best compromise for everyone.

Spreading the intelligence

AI lets us spread the benefits of being smart more widely.

The benefits of intelligence are large. For example, this study estimates that a 1 standard deviation increase in intelligence increases your odds of self-assessed happiness by 11%. Now, part of this gain comes from intelligence being a positional good: you benefit from having more intelligence at your disposal than others, for example in competing for a fixed set of places. However, intelligence also has absolute benefits, since it lets you make better choices. And AI means you can convert energy into intelligence. Much as physical machines let the weak gain some of the benefits of (even superhuman) strength, AI might allow all humans to enjoy some of the benefits of being smart.

Concretely, this could have two forms. The first is that you could have AI advisors increase your ability to make plans or decisions, in the same way that - hypothetically - even a near-senile president might still make decent decisions with the help of their smart advisors. With AI, everyone could have access to comparable expert advisors. The effect may be even more dramatic than human advisors: the AI might be superhumanly smart, the AI might be more verifiably smart (a big problem in selecting smart advisors is that it can be hard to tell who is actually smart, especially if you are not), and if AIs are aligned successfully there may be less to worry about in trusting it than in trusting potentially-scheming human advisors.

The second is AI tutoring. Human 1-1 tutoring boosts educational outcomes by 2 standard deviations (2 standard deviations above average is often considered the cutoff for “giftedness”). If AI tutoring is as good, that’s a big deal.
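
To put the 2 standard deviation figure in perspective, here is a minimal sketch. It assumes educational outcomes are normally distributed, which is an assumption of this illustration rather than a claim from the tutoring research, and the helper name `normal_cdf` is our own.

```python
import math


def normal_cdf(z: float) -> float:
    """Fraction of a standard normal distribution that lies below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A student who starts at the average (z = 0) and gains 2 standard
# deviations ends up at z = 2 relative to the untutored population.
percentile_after = normal_cdf(2.0)
print(f"{percentile_after:.1%}")  # prints 97.7%
```

Under that assumption, an average student who gains 2 standard deviations would outperform roughly 98% of untutored students.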

AI is the ultimate meta-technology

AI is special because it automates intelligence, and intelligence is what you need to build technology, including AI, creating a feedback loop. Some other previous technologies have boosted other technologies; for example, the printing press massively helped the accumulation of knowledge that led to the invention of many other technologies. But we have not before had a technology that could itself directly advance other technology. Such AI has been called PASTA (Process for Automating Scientific and Technological Advancement).

Positive feedback loops - whether self-improving AIs, nuclear reactions, epidemics, or human cultural evolution - are very powerful, so you should be wary of risks from them. Similarly, it is currently at best extremely unclear whether AIs that improve themselves could be controlled with current technology. We should be very cautious in using AI systems to improve themselves.

In the long run, however, most of the value of AI will likely come from their effects on technological progress, much like the next industrial revolution. We can imagine AIs slashing the cost and increasing the speed of science in every field, curing diseases and making entire new veins of technology available, in the same way that steam engines made entirely new veins of coal accessible.

In particular, AIs help de-risk one of the largest current risks to future human progress. One model of the feedback loop behind humanity’s progress in the past few centuries is that people led to ideas led to wealth led to food led to more people.

However, greater wealth no longer translates into more people. The world population, which was exponentially growing for much of the 19th and 20th centuries, is likely to be in decline by the end of the 21st century. This is likely to have negative consequences for the rate of innovation, and as discussed in the next section, a decline in productivity would likely have a negative impact on human wellbeing. However, if AIs start driving innovation, then we have a new feedback loop: wealth leads to energy leads to more AIs leads to ideas leads to wealth.

As long as this feedback loop does not decouple from the human economy and instead continues benefitting humans, this could help progress continue long into the future.

Wealth and energy are good

If you want humans to be well-off, one of the easiest things to do is give them more wealth and more energy. GDP per capita (on a log scale) has a 0.79 correlation with life satisfaction, and per-capita energy use (again on a log scale) has a 0.74 correlation with life satisfaction. Increased wealth and energy correlate with life satisfaction, and we should expect these trends to continue.

Above: GDP per capita (x-axis), energy use (y-axis), and life satisfaction (colour scale) for 142 countries. There are no poor countries with high energy use, and no high energy use countries that are poor. There are no countries with high average life satisfaction that are not high in both energy use and average GDP per capita. The axes are logarithmic, but since economic growth is exponential, countries should be able to make progress at a constant rate along the axis. Data source: Our World In Data (here, here, and here).
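
The point about logarithmic axes can be checked with a small sketch. The starting GDP and growth rate below are hypothetical numbers chosen only for illustration, and `log10_gdp_path` is a name we made up; the sketch simply shows that constant compound growth produces equal-sized steps along a log axis.

```python
import math


def log10_gdp_path(start_gdp: float, growth_rate: float, years: int) -> list[float]:
    """log10 of GDP per capita for each year under constant compound growth."""
    return [math.log10(start_gdp * (1.0 + growth_rate) ** t) for t in range(years + 1)]


# Hypothetical country: 1,000 per capita, growing 5% per year for 4 years.
path = log10_gdp_path(start_gdp=1_000.0, growth_rate=0.05, years=4)

# Distance moved along the log axis in each year.
steps = [later - earlier for earlier, later in zip(path, path[1:])]

# Every yearly step equals log10(1 + growth_rate), regardless of the level.
print([round(step, 4) for step in steps])
```

Each step is the same size because log10(G · (1 + r)^t) = log10(G) + t · log10(1 + r), which is linear in t.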

(It is true that energy use and economic growth have been increasingly decoupling in rich countries, due to services being more of the economy, and efficiency gains in energy use. However, the latter is effectively increasing the amount of useful energy that can be used - e.g. say the amount of energy needed to cook one meal is now enough to cook two meals, which is effectively the same as gaining more energy. However, efficiency effects are fundamentally limited because there is a physical limit, and also if demand is elastic then efficiency gains lead to increased energy use, meaning it doesn’t help the environment either. Ultimately, if you want to do more things in the physical world, you need more energy).

A wealthy, energy-rich society has many material benefits: plentiful food, advanced medicine, the ability to afford high redistributive spending, and great choice and personal freedom through specialisation of labour and high spending power. A wealthy and energy-rich society also has some important subtler benefits. Poverty and resource constraints sharpen conflict. Economic growth is intimately linked to tolerance and liberalism, by weakening the cultural status and clout of zero-sum strategies like conflict and politicking.

One clear historic example of how increases in energy correlated with improved quality of life was in the industrial revolution, arguably the best and most important thing that ever happened. Before it, trends in human wellbeing seemed either stagnant, fluctuating, or very slow, and after it, all the variables for which we can find good long-term series that are related to human well-being shoot upwards.

Above: variables correlated with human well-being over time. Source: Luke Muehlhauser

Therefore, it’s worth keeping in mind that boosting energy and wealth is good, actually. And the most powerful way to do that is through inventing new technologies that let us use energy to serve our needs.

The heart of the industrial revolution was replacing part of human manual labour with something cheaper and more powerful. AI that replaces large parts of human mental labour with something cheaper and more powerful should be expected to be similarly transformative. Whether it is a good or bad transformation seems more uncertain. We are lucky that industrialisation happened to make national power very tightly tied to having a large, educated, and prosperous middle class; it is unclear what is the winning strategy in an AI economy. We are also lucky that the powerful totalitarian states enabled by industrial technology have not triumphed so far, and they might get further boosts from AI. Automating mental labour also involves the automation of decision-making, and handing over decision-making to the machines is handing over power to machines, which is more risky than handing the manual labour to them. But if we can safely control our AI systems and engineer good incentives for the resulting society, we could get another leap in human welfare.

Self actualisation

Now say we’ve had a leap in innovation and energy through Transformative AI (TAI) and we’ve also reached a post-scarcity world. What happens now? Humans have had all their basic needs met and most jobs are automated, but what do people actually spend their time doing?

Maslow’s Hierarchy

Maslow’s hierarchy of needs is a framework for understanding human needs and the drivers of human behaviour. Maslow suggested that in most scenarios people need to mostly satisfy one level before being able to focus on higher-level needs.

The top level of the hierarchy is self-actualisation. The peak of human experience is something that few can currently reach - but maybe everyone could get there.

There is a possible path the world takes in which all humans can reach self-actualisation. With increases in technology & wealth, such as with TAI and a Universal Basic Income (UBI), we would be able to provide the basic needs of food, water, shelter, and clothing for all humans. Humans could then spend more time on the things they want, for example moving up through Maslow’s hierarchy to focus on increasing love and belonging, self-esteem, and self-actualisation.

Say you are in a post-scarcity world: what would you do if you didn’t have to work?

Would you be spending time with loved ones, engaging in social activities that provide a sense of connection and belonging, self-esteem? Would it be honing your craft and becoming an expert in a particular field? Or would you spend the whole time scrolling on your phone?

Say hypothetically a wealthy billionaire gave you a grant to work on anything you wanted, would you be happy with having the complete freedom to spend your time as you wished?

Often people assume that others will be unhappy with this world, but would you? There is a cognitive bias where people tend to judge themselves as happier than their peers, which could nudge you to think people would be less happy in this world, even if you would enjoy this.

In this post-scarcity world, humans could spend more time on creative pursuits such as art, music, and any other hobbies – not with the goal of making money, but to reach self-actualisation.

With AI being better than humans in every dimension, AI can produce the best art in the world, but there is intrinsic value in honing your craft, improving at art or expressing your feelings through it, in and of itself. The vast majority of art is not created to be the best art in the world but for the journey itself. A child that paints a finger painting and the parent who puts it on the wall does not think “my child’s art is better than Van Gogh’s”. Instead, they feel a sense of excitement about the progress their child has made and the creative expression the child has produced.

Another example is the Olympic Games. Nobody needs to win the Olympic Games to survive, but it lets people express pride in their country, hone their craft, attain status, and so on. But the actual task is just a game, a social construct. More and more tasks will look like social constructs and games we create to challenge each other.

Examples of post-scarcity scenes

Since this is quite theoretical, let's consider examples where we’ve had “post-scarcity” microcosms to explore.

The French Bourgeoisie

The French leisure class, or bourgeoisie, were a class of wealthy elite that emerged in 16th century France. Many had enough money to pursue endeavours like refining their taste in arts and culture. Salon culture, with gatherings featuring discussions on literature, art, politics and philosophy, was a cornerstone of bourgeois social life.

Upper Class in the Victorian Era

The upper class in the Victorian era enjoyed a variety of leisure activities that reflected their wealth, status and values. Their pastimes included social events and balls, fox hunting and other sports, theater and opera, art and literature, travel, tea parties and social visits, gardening and horticulture, and charitable work and philanthropy. Several undertook serious pursuits in science or art.

Burning Man

Burning Man is an annual festival where people bring all the basic things they need for a week of living in the desert: food, water, shelter. People have a week to create a new community or city that is a temporary microcosm of a post-scarcity world. They pursue artistic endeavours and creative expression, music, dance and connecting with others. People often talk about Burning Man events being some of the best experiences of their lives.

Successful Startup Founders in The Bay Area

In San Francisco, there is a crossover between hippie culture and tech, and there are many people with excess wealth and resources, resulting in many looking for more in life. They try to reach self-actualisation through many arts and creative pursuits. Hippie movements often encourage communal living, and a sense of connection with those around you. Many may raise eyebrows at the lifestyles of some such people, but it’s hard to claim that it’s a fundamentally bad existence.

More pessimistic views about humans?

It is true that not all cultural tendencies in a post-scarcity world would be positive. In particular, humans have a remarkable ability to have extremely tough and all-consuming social status games, seemingly especially in environments where other needs are met. See for example this book review about the cut-throat social scene of upper-class Manhattan women or this one about the bland sameness and wastefulness of nightlife, or this book review that ends up concluding that the trajectory of human social evolution is one long arc from prehistoric gossip traps to internet gossip traps, with liberal institutions just a passing phase.

But the liberal humanist attitude here is to let humans be humans. Yes, they will have petty dramas and competitions, but if that is what they want, who is to tell them no? And they will also have joy and love.

Would a post-scarcity world have meaning? Adversity is one of the greatest sources of meaning. Consider D-Day, when hundreds of thousands of soldiers got together to charge up a beach under machine-gun fire to liberate a continent from Nazi rule. Or consider a poor parent of four working three jobs to make ends meet. There are few greater sources of meaning. But adversity can be meaningful while involving less suffering and loss. A good future will be shallower, in a sense, but that is a good thing.

Finally, it is unclear if we would get a happy world, even if we had the technology for post-scarcity, because of politics and conflict. We will discuss this later.

Radical improvements

AI might also help with radical but necessary improvements to the human condition.

People die. It is a moral tragedy when people are forced to die against their will, as happens to over 50 million people per year. Medicine is making progress against many causes of death and disability; in the limit it can cure all of them. We should reach that limit as fast as possible, and AI can likely help accelerate the research and deployment of solutions.

One of the greatest inequalities in the world is inequality in intelligence. Some people struggle to perform in simple jobs, while others (well, at least one) are John von Neumann. In the short term, AI might help by making cognitively demanding tasks more accessible to people through AI tutors and AI copilots. In the longer term, AI might help us enhance human intelligence, through brain-AI integration or new medical technology.

Reasons to worry

Though there are many potential upsides for AI and AGI as argued in this post, that doesn’t mean there aren’t risks.

The plausible risks of AI go all the way to human extinction, meaning this shouldn’t be taken lightly. Since this piece is focused on the upside risk, not the downside risk, we will not argue this point in depth, but it is worth revisiting briefly.

Existential risk from AI is a serious concern

It is intuitive that AI is risky.

First, creating something smarter, faster, and more capable than humans is obviously risky, since you need to very precisely either control it (i.e. stop it from doing things you don’t like) or align it (i.e. make it always try to do what you would want it to do). Both the control and alignment problems for AIs still have unsolved technical challenges. And that’s assuming that AI is in the right hands.

Second, even if the AIs remain in our control, they are likely to be as transformative as the industrial revolution. Eighteenth-century European monarchs would’ve found it hard to imagine how the steam engine could challenge their power, but the social changes that were in part a result of it eventually wrested all their powers away. In the modern world, a lot of power depends on large educated workforces of humans, whereas sufficiently strong AGI might decorrelate power and humans, decreasing the incentive to have people be educated and prosperous - or to have people around at all.

Apart from object-level arguments, consider too the seriousness with which the AI doomsday is discussed. Many top researchers and all top AI lab CEOs have signed a statement saying “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war”. Nuclear war and pandemics are the only other cases where similarly serious predictions have been made by a similarly serious set of people (though arguably climate change is close: the science on the effects is more established and certain, but while catastrophe is more likely, literal human extinction from it is much less likely).

Side-effects of non-existentially-bad AI might be large

Consider the internet, a widely-successful technology with a lot of benefits. There are credible claims that the internet is responsible for harms ranging from massively increased depression rates among teenagers to political polarisation to widespread productivity loss through addiction and distraction.

In the same way, the success of AI might lead to bad side effects, even if all the existential risks are avoided.

For example, AI could replace human connection. Human friends and partners might increasingly be replaced with AIs. However bad it was in other ways, on pre-AI social media you at least interacted with humans (or simple algorithms), but with AIs it’s possible to have what looks like deep emotional relationships. Just look at the Replika subreddit from a year ago when they changed the algorithm to only allow “PG-rated interactions”. Many users were upset. The film “Her” doesn’t seem far off, as Sam Altman acknowledges. Such relationships give the human much more safety and control than in human relationships, which might both be very attractive to humans, while also excessively coddling them. Given that much human happiness and meaning comes from human relationships and bonding, widespread AI substitution of them could mean the destruction of a large part of all human wellbeing and meaning in the world. On a more prosaic level, society might atomise into individuals hoarding compute credits to spend on running their AI companions without connecting with other humans, with severe effects on society’s functioning, or humans might stop having children and human populations might crash. Humanity has flourished through collaboration and socialisation. If we use AIs to replace this in an overly thoughtless way, the fabric of society could crumble.

Apart from being superhuman at forming relationships with humans, AIs might be superhuman at persuasion. We can imagine AIs producing the vast majority of content that people consume. We can imagine a totalitarian world where the governments with the greatest compute resources can dominate the conversation forever. Instead of humans having ideas and sometimes persuading other humans to adopt them, driving social progress, any human-generated ideas might be swamped by a greater quantity of superhumanly persuasive counter-arguments that support the status quo. We can also imagine a dystopian decentralised world. Already, many online memes (in Dawkins’s original sense of the word) are maladaptive, spreading not by having good effects on their hosts but by being incredibly good at spreading from person to person. AI might make us much better at searching the space of ideas for the most viral ones. Ideas that aren’t maximally viral might be outcompeted. Eventually, our institutions could become mere puppets that serve as viral hosts for the most transmissive memes, as part of an endless tug-of-war where AI-generated memes compete to compel humans to spread them.

Seems bad.

Not good nor bad, but some third thing.

Many debates turn into mood affiliation debates. Are guns bad? Is more government good? But remember: politics is the mindkiller. Navigating a complicated world requires more than the ability to stick the label “good” or “bad” on entire domains. If you were seated in the control room of a nuclear power station, you wouldn’t ask yourself: uranium, good or bad? Instead, you want to steer towards the small set of states where the reaction is perched between dying out and exploding, while generating useful clean power.

We’ve also seen again and again that technology and social change have strong effects on each other, and these are often hard to predict. We’ve discussed how industrial technology may have led to democracy. There is serious academic debate about whether the stirrup caused feudalism, or whether the Black Death was a driver of European liberalism, or whether social media was a significant cause of the Arab Spring. The birth control pill was a major influence on the sexual revolution, and the printing press helped the Protestant Reformation. Often, the consequences of a new technology are some obvious direct benefits, some obvious direct harms, and the shifting of some vast social equilibrium that ends up forever reshaping the world in some way no one saw coming. So far we’ve clearly ended up ahead on net, and maybe that will continue.

Humanity has spent over a hundred thousand years riding a feedback loop of accumulating cultural evolution. Over the past few hundred, the industrial revolution boosted the technological progress feedback loop. Human wellbeing has skyrocketed, though along the way we’ve had - and are continuing to have - close calls with nuclear war, totalitarianism, and environmental issues. We’ve had a healthy dose of luck, including in generalities like the incentive structures of industrial economics and specifics like the heroism of Stanislav Petrov. But we’ve also had an enormous amount of human effort and ingenuity spent on trying to chart a good path for civilization, from solar panel subsidies to the Allies winning World War 2.

For most of this time, the direction of the arrow of progress has been obvious. The miseries of poverty and the horrors of close-up totalitarianism are very powerful driving forces after all. And while both continue ravaging the world, developed countries have in many ways gotten complacent. There are fewer obvious areas of improvement for those lucky enough to enjoy a life of affluence in the developed world. But the future could be much better still.

Know where to aim

We think it’s important to have a target of what to aim for. We need to dream about the future we want. A strong culture needs a story of what it is driving towards, and humanity needs a compelling vision of how our future turns out well so we can work together to create the future we all want. AI seems like the biggest upcoming opportunity and risk. We hope we can avoid the risks, and realise the positive vision presented here, together with a hundred other things we can’t yet imagine.

See LessWrong for additional comments & discussion.

A model of research skill

2024-01-08T00:02:00.002+00:00

~4k words (20 minutes)

Doing research means answering questions no one yet knows the answer to. Lots of impactful projects are downstream of being good at this. A good first step is to have a model for what the hard parts of research skill are.

Two failure modes

There are two opposing failure modes you can fall into when thinking about research skill.

The first is the deferential one. Research skill is this amorphous complicated thing, so the only way to be sure you have it is to spend years developing it within some ossified ancient bureaucracy and then have someone in a funny hat hand you a piece of paper (bonus points for Latin being involved).

The second is the hubristic one. You want to do, say, AI alignment research. This involves thinking hard, maybe writing some code, maybe doing some maths, and then writing up your results. You’re good at thinking - after all, you read the Sequences, like, 1.5 times. You can code. You did a STEM undergrad. And writing? Pffft, you’ve been doing that since kindergarten!

I think there’s a lot to be said for hubris. Skills can often be learned well by colliding hard with reality in unstructured ways. Good coders are famously often self-taught. The venture capitalists who thought that management experience and a solid business background are needed to build a billion-dollar company are now mostly extinct.

It’s less clear that research works like this, though. I’ve often heard it said that it’s rare for a researcher to do great work without having been mentored by someone who was themselves a great researcher. Exceptions exist and I’m sceptical that any good statistics exist on this point. However, this is the sort of hearsay an aspiring researcher should pay attention to. It also seems like the feedback signal in research is worse than in programming or startups, which makes it harder to learn.

Methodology, except “methodology” is too fancy a word

To answer this question, and steer between deferential confusion and hubristic over-simplicity, I interviewed people who had done good research to try to understand their models of research skill. I also read a lot of blog posts. Specifically, I wanted to understand what about research a bright, agentic, technical person trying to learn at high speed would likely fail at and either not realise or not be able to fix quickly.

I did structured interviews with Neel Nanda (Google DeepMind; grokking), Lauro Langosco (Krueger Lab; goal misgeneralisation), and one other. I also learned a lot from unstructured conversations with Ferenc Huszar, Dmitrii Krasheninnikov, Sören Mindermann, Owain Evans, and several others. I then ~~~procrastinated on this project for 6 months~~~ touched grass and formed inside views by doing the MATS research program under the mentorship of Owain Evans. I owe a lot to the people I spoke to and their willingness to give their time and takes, but my interpretation and model should not be taken as one they would necessarily endorse.

My own first-hand research experience consists mainly of a research-oriented CS (i.e. ML) master’s degree, followed by working as a full-time researcher for 6 months and counting. There are many who have better inside views than I do on this topic.

The Big Three

In summary:

There are a lot of ways reality could be (i.e. hypotheses), and a lot of possible experiment designs. You want to avoid brute-forcing your way through these large spaces as much as possible, and instead be good at picking likely-true hypotheses or informative experiments. Being good at this is called research taste, and it’s largely an intuitive thing that develops over a lot of time spent engaging with a field.
Once you have some bits of evidence from your experiment, it’s easy to over-interpret them (perhaps you interpret them as more bits than they actually are, or perhaps you were failing to consider how large hypothesis space is to start with). To counteract this, you need sufficient paranoia about your results, which mainly just takes careful and creative thought, and good epistemics.
Finally, you need to communicate your results to transfer those bits of evidence into other people’s heads, because we live in a society.

Taste

Empirically, it seems that a lot of the value of senior researchers is a better sense of which questions are important to tackle, and better judgement for what angles of attack will work. For example, good PhD students often say that even if they’re generally as technically competent as their adviser and read a lot of papers, their adviser has much better quick judgements about whether something is a promising direction.

When I was working on my master’s thesis, I had several moments where I was working through some maths and got stuck. I’d go to one of my supervisors, a PhD student, and they’d have some ideas on angles of attack that I hadn’t thought of. We’d work on it for an hour and make more progress than I had in several hours on my own. Then I’d go to another one of my supervisors, a professor, and in fifteen minutes they’d have tried something that worked. Part of this is experience making you faster at crunching through derivations, and knowing things like helpful identities or methods. But the biggest difference seemed to be a good gut feeling for what the most promising angle or next step is.

I think the fundamental driver of this effect is dealing with large spaces: there are many possible ways reality could be (John Wentworth talks about this here), and many possible things you could try, and even being slightly better at homing in on the right things helps a lot. Let’s say you’re trying to prove a theorem that takes 4 steps to prove. If you have an 80% chance of picking the right move at each step, you’ll have a 41% chance of success per attempt. If that chance is 60%, you’ll have a 13% chance – over 3 times less. If you’re trying to find the right hypothesis within some hypothesis space, and you’ve already managed to cut down the entropy of your probability distribution over hypotheses to 10 bits, you’ll be able to narrow down to the correct hypothesis faster and with fewer bits than someone whose entropy is 15 bits (and whose search space is therefore effectively 2⁵ = 32 times as large). Of course, you’re rarely chasing down just a single hypothesis in a defined hypothesis class. But if you’re constantly 5 extra bits of evidence ahead compared to someone in what you’ve incorporated into your beliefs, you’ll make weirdly accurate guesses from their perspective.
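The arithmetic in the paragraph above can be checked with a minimal sketch. It assumes that each step of the proof is an independent choice with the same chance of being right, which is a simplification. The function names are made up for illustration.

```python
def chance_of_success(p_step: float, n_steps: int) -> float:
    """Chance of getting every step right, assuming independent steps."""
    return p_step ** n_steps


def search_space_ratio(bits_ahead: float) -> float:
    """How many times larger the effective search space is for someone
    whose uncertainty is `bits_ahead` bits of entropy higher."""
    return 2 ** bits_ahead


good = chance_of_success(0.8, 4)   # researcher with better taste
worse = chance_of_success(0.6, 4)  # researcher with slightly worse taste

print(f"80% per step over 4 steps: {good:.1%}")
print(f"60% per step over 4 steps: {worse:.1%}")
print(f"ratio between the two: {good / worse:.2f}x")
print(f"15 bits vs 10 bits of entropy: {search_space_ratio(15 - 10):.0f}x larger space")
```

The point of the sketch is that a modest per-step edge compounds: the gap grows with every extra step you add.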

Why does research taste seem to correlate so strongly with experience? I think it’s because the bottleneck is seeing and integrating evidence into your (both explicit and intuitive) world models. No one is close to having integrated all empirical evidence that exists, and new evidence keeps accumulating, so returns from reading and seeing more keep going. (In addition to literal experiments, I count things like “doing a thousand maths problems in this area of maths” as “empirical” evidence for your intuitions about which approaches work; I assume this gets distilled into half-conscious intuitions that your brain can then use when faced with similar problems in the future.)

This suggests that the way to speed-run getting research taste is to see lots of evidence about research ideas failing or succeeding. To do this, you could:

Have your own research ideas, and run experiments to test them. The feedback quality is theoretically ideal, since reality does not lie (but may be constrained by what experiments you can realistically run, and a lack of the paranoia that I talk about next). The main disadvantage is that this is often slow and/or expensive.
Read papers to see whether other people’s research ideas succeeded or failed. This is prone to several problems:
1. Biases: in theory, published papers are drawn from the set of ideas that ended up working, so you might not see negative samples (which is bad for learning). In practice, paper creation and selection processes are imperfect, so you might see lots of bad or poorly-communicated ones.
2. Passivity: it’s easy to fool yourself into thinking you would’ve guessed the paper ideas beforehand. Active reading strategies could help; for example, read only the paper’s motivation section and write down what experiment you’d design to test it, and then read only the methodology section and write down a guess about the results.
Ask someone more experienced than you to rate your ideas. A mentor’s feedback is not as good as reality’s, but you can get it a lot faster (at least in theory). The speed-up is huge: a big ML experiment might take a month to set up and run, but you can probably get detailed feedback on 10 ideas in an hour of conversation. This is a ~7000x speedup. I suspect a lot of the value of research mentoring lies here: an enormous number of predictable failures or inefficiently targeted ideas can be skipped or honed into better ones, before you spend time running the expensive test of actually checking with reality. (If true, this would imply that the value of research mentorship is higher whenever feedback loops are worse.)
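The ~7000x figure in the last point comes from simple arithmetic, sketched below. The 30-day month and round-the-clock hours are assumptions chosen to match the rough estimate in the text. The variable names are illustrative only.

```python
# Assumption: "a month" means 30 days of wall-clock time, i.e. 720 hours.
HOURS_PER_EXPERIMENT = 30 * 24

# From the text: about 10 ideas get detailed feedback in a one-hour conversation.
IDEAS_PER_MENTOR_HOUR = 10

# An experiment gives feedback on 1 idea per 720 hours, while a mentor
# gives feedback on 10 ideas per hour, so the ratio of the two rates is:
speedup = IDEAS_PER_MENTOR_HOUR * HOURS_PER_EXPERIMENT

print(f"feedback speedup from mentoring: ~{speedup}x")
```

Counting only working hours instead of wall-clock time would shrink the number, but it would still be a speedup of several orders of magnitude.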

Chris Olah has a list of suggestions for research taste exercises (number 1 is essentially the last point on my list above).

Research taste takes the most time to develop, and seems to explain the largest part of the performance gap between junior and senior researchers. It is therefore the single most important thing to focus on developing.

(If taste is so important, why does research output not increase monotonically with age in STEM fields? The scary biological explanation is that fluid intelligence (or energy or …) starts dropping at some age, and this decreases your ability to execute on maths/code, even assuming your research taste is constant or improving. Alternatively, hours used on deep technical work might tend to decline with advanced career stages.)

Paranoia

I heard several people saying that junior researchers will sometimes jump to conclusions, or interpret their evidence as saying more than it actually does. My instinctive reaction to this is: “wait, but surely if you just creatively brainstorm the ways the evidence might be misleading, and take these into account in making your conclusions (or are industrious about running additional experiments to check them), you can just avoid this failure mode?” The average answer I got was that yes, this seems true, and indeed many people either only need one peer review cycle to internalise this mindset, or pretty much get it from the start. Therefore, I’m almost tempted to chuck this category off this list, and onto the list of less crucial things where “be generally competent and strategic” will sort you out in a reasonable amount of time. However, two things hold me back.

First, confirmation bias is a strong thing, and it seems helpful to wave a big red sign saying “WARNING: you may be about to experience confirmation bias”.

Second, I think this is one of the cases where the level of paranoia required is sometimes more than you expect, even after you expect it will be high. John Wentworth puts this best in You Are Not Measuring What You Think You Are Measuring, which you should go read right now. There are more confounders and weird effects than are dreamt of in your philosophies.

A few people mentioned going through the peer review process as being a particularly helpful thing for developing paranoia.

Communication

I started out sceptical about the difficulty of research-specific communication, above and beyond general good writing. However, I was eventually persuaded that yes, research-specific communication skills exist and are important.

First, if research has impact, it is through communication. Rob Miles once said (at a talk) something along the lines of: “if you’re trying to ensure positive AGI outcomes through technical work, and you think that you are not going to be one of the people who literally writes the code for it or is in the room when it’s turned on, your path to impact lies through telling other people about your technical ideas.” (This generalises: if you want to drive good policy through your research and you’re not literally writing it …, etc.) So you should expect good communication to be a force multiplier applied on top of everything else, and therefore very important.

Secondly, research is often not communicated well. On the smaller scale, Steven Pinker moans endlessly – and with good reason – about academic prose (my particular pet peeve is the endemic utilisation of the word “utilise” in ML papers). On the larger scale, entire research agendas can get ignored because the key ideas aren’t communicated in a sufficiently clear and legible way.

I don’t know what the best way to speed-run getting good at research communication is. Maybe read Pinker to make sure you’re not making predictable mistakes in general writing. I’ve heard that experienced researchers are often good at writing papers, so maybe seek feedback from any you know (but don’t internalise the things they say that are about goodharting for paper acceptance). With papers, understand how papers are read. Some sources of research-specific communication difficulty I can see are (a) the unusually high need for precision (especially in papers), and (b) communicating the intuitive, high-context, and often unverbalised-by-default world models that guide your research taste (especially when talking about research agendas).

Other points

Having a research problem is not enough. You need an angle of attack.
- Richard Feynman once said something like: keep a set of open problems in your head. Whenever you discover a new tool (e.g. a new method), run through this list of problems and see if you can apply it. I think this can also be extended to new facts; whenever you hear about a discovery, run through a list of open questions and see how you should update.
- Hamming says something similar in You and your research: “Most great scientists know many important problems. They have something between 10 and 20 important problems for which they are looking for an attack.”
Research requires a large combination of things to go right. Often, someone will be good at a few of them but not all of them.
- A sample list might be:
  - generating good ideas
  - picking good ideas (= research taste)
  - iterating rapidly to get empirical feedback
  - interpreting your results right (paranoia)
  - communicating your findings
- If success is a product of either sufficiently many independent positive variables or of log-normally distributed variables, the distribution of success should be (at least approximately) log-normal, and therefore fairly heavy-tailed. And yes, research is heavy-tailed. Dan Hendrycks and Thomas Woodside claim that while there may be 10x engineers, there are 1000x researchers. This seems true.
  - However, this also means that not being the best at one of the component skills does not doom your ability to still have a really good product across categories.
Ideas from other fields are often worth stealing. There exist standardised pipelines to produce people who are experts in X for many different X, but far less so to produce people who are experts in both X and some other Y. Expect many people in X to miss out on ideas in Y (though remember that not all Y are relevant).
Research involves infrequent and uncertain feedback. Motivation is important and can be hard. Grad students are notorious for having bad mental health. A big chunk of this is due to the insanities of academia rather than research itself. However, startups are somewhat analogous to research (high-risk, difficult, often ambiguous structure), lack institutionalised insanity, and are also acknowledged to be mentally tough.
- The most powerful and universally-applicable hack to make something not suck for a human is for that human to do it together with other humans. Also, more humans = more brains.
Getting new research ideas is often not a particularly big-brained process. Once I had the impression that most research ideas would come from explicitly thinking hard about research ideas, and generating fancy ideas would be a major bottleneck. However, I’ve found that many ideas come with surprisingly little effort, with a feeling of “well, if I want X, the type of thing I should do is probably Y”. Whiteboarding with other people is also great.
- This is not to say that idea generation isn’t helped by actively brainstorming hard. Just that it’s not the only, or even majority, source of ideas.
- The feeling of ideas being rare is often a newbie phase. You should (and very likely will) pass over it quickly if you’re engaging with a field. John Wentworth has a good post on the topic. I have personally experienced an increase in concrete research ideas, and much greater willingness to discard ideas, after going through a few I’ve felt excited by.
- When you look at a field from afar, you see a smooth shape of big topics and abstractions. This makes it easy to feel that everything is done. Once you’re actually at the frontier, you invariably discover that it’s full of holes, with many simple questions that don’t have answers.
There’s great benefit to an idea being the top thing in your mind.
When in doubt, log more. Easily being able to run more analyses is good. At some point you will think to yourself something like “huh, I wonder if thing X13 had an effect, I’ll run the statistics”, and then either thank yourself because you logged the value of X13 in your experiments, or facepalm because you didn’t.
Tolerate the appearance of stupidity (in yourself and others). Research is an intellectual domain, and humans are status-obsessed monkeys. Humans doing research therefore often feel like they need to appear smart. This can lead to a type of wishful thinking where you hear some idea and try to delude yourself (and others) into thinking you understand it immediately, without actually knowing how it bottoms out into concrete things. Remember that any valid idea or chain of reasoning decomposes into simple pieces. Allow yourself to think about the simple things, and ask questions about them.
- There is an anecdote about Niels Bohr (related by George Gamow and quoted here): “Many a time, a visiting young physicist (most physicists visiting Copenhagen were young) would deliver a brilliant talk about his recent calculations on some intricate problem of the quantum theory. Everybody in the audience would understand the argument quite clearly, but Bohr wouldn’t. So everybody would start to explain to Bohr the simple point he had missed, and in the resulting turmoil everybody would stop understanding anything. Finally, after a considerable period of time, Bohr would begin to understand, and it would turn out that what he understood about the problem presented by the visitor was quite different from what the visitor meant, and was correct, while the visitor’s interpretation was wrong.”
“Real ~~artists~~ researchers ship”. Like in anything else, iteration speed really matters.
- Sometimes high iteration speed means schlepping. You should not hesitate to schlep. The deep learning revolution started when some people wrote a lot of low-level CUDA code to get a neural network to run on a GPU. I once reflected on why my experiments were going slower than I hoped, and realised a mental ick for hacky code was making me go about things in a complex roundabout way. I spent a few hours writing ugly code in Jupyter notebooks, got results, and moved on. Researchers are notorious for writing bad code, but there are reasons (apart from laziness and lack of experience) why the style of researcher code is sometimes different from standards of good software.
- The most important thing is doing informative things that make you collide with reality at a high rate, but being even slightly strategic will give great improvements on even that. Jacob Steinhardt gives good advice about this in Research as a Stochastic Decision Process. In particular, start with the thing that is most informative per unit time (rather than e.g. the easiest to do).
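The earlier point about success being a product of several component skills can be illustrated with a small simulation. This is a sketch under made-up assumptions: five independent skills, each drawn uniformly between 0.1 and 2.0, with success defined as their product. None of these numbers come from real data. They are only there to show how multiplying ordinary-looking factors yields a skewed, heavy-tailed outcome.

```python
import random
import statistics


def simulate_success(n_people: int = 100_000, n_skills: int = 5, seed: int = 0) -> list[float]:
    """Draw `n_skills` independent skill levels per person and multiply them.

    The uniform(0.1, 2.0) skill distribution is an arbitrary assumption.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_people):
        product = 1.0
        for _ in range(n_skills):
            product *= rng.uniform(0.1, 2.0)
        outcomes.append(product)
    return outcomes


outcomes = simulate_success()
mean = statistics.fmean(outcomes)
median = statistics.median(outcomes)

# Share of total "success" captured by the top 1% of simulated people.
n_top = len(outcomes) // 100
top_share = sum(sorted(outcomes)[-n_top:]) / sum(outcomes)

print(f"mean outcome:   {mean:.2f}")
print(f"median outcome: {median:.2f}")
print(f"share of total held by top 1%: {top_share:.1%}")
```

Each individual skill here is symmetric around its average, yet the product has a mean well above its median and a top 1% that holds far more than 1% of the total. That is the qualitative shape the log-normal argument predicts.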

Good things to read on research skill

(I have already linked to some of these above.)

General advice on research from experienced researchers
- You and Your Research (Richard Hamming – old but still unbeaten. Hamming also has a book that includes this lecture among other material, but the lecture is the best bit of it and a good 80/20.)
- Career advice (Terry Tao)
- Research as a Stochastic Decision Process (Jacob Steinhardt)
- My research methodology (Paul Christiano)
- An Opinionated Guide to ML Research (John Schulman)
- PhD: a retrospective analysis (Eugene Vinitsky)
John Wentworth’s posts about specific research meta-topics
Relevant Paul Graham essays
- The Top Idea in Your Mind
- How to do Great Work
Advice aimed at new alignment researchers
A Bird’s Eye View of the ML Field (a good overview of how the ML field works)
The importance of stupidity in scientific research (short and sweet)
Research Taste Exercises (what it says on the tin)

A Disneyland Without Children

2023-06-04T13:57:00.000+01:00

The spaceship swung into orbit around the blue-grey planet with a final burn of its engines. Compared to the distance they had travelled, the world, now only some four hundred kilometres below and filling up one hemisphere of the sky, was practically within reach. But Alice was no less confused.

“Well?” she asked.

Charlie stared thoughtfully at the world slowly rotating underneath their feet, oceans glinting in the sunlight. “It looks lickable”, he said.

“We have a task”, Alice said, trying to sound gentle. Spaceflight was hard. Organic life was not designed for it. But their mission was critical, they needed to move fast, and Charlie, for all his quirks, would need to be focused.

“What’s a few minutes when it will take years for anything we discover to be known back home?” Charlie asked.

“No licking”, Alice said.

Charlie rolled his eyes, then refocused them on the surface of the planet below. They were just crossing the coast of one of the larger continents. Blue water was giving way to grey land.

“Look at the texture”, Charlie said. They had seen it from far away with telescopes, but there was something different about seeing it with their bare eyes. Most of the land surface of the planet was like a rug of fine grey mesh. If there had been lights, Alice would have guessed the entire planet’s land was one sprawling city, but as far as their instruments could tell, the world had no artificial lighting.

As far as they could tell, the world also had no radio. They had broadcast messages at every frequency they could, and in desperation even by using their engines to flash a message during their deceleration burn. No response had come.

Alice pulled up one of the telescope feeds on the computer to look closer at the surface. She saw grey rectangular slabs, typically several hundred metres on a side, with wide roads running between them. The pattern was not perfect - sometimes it was irregular, and sometimes there were smaller features too. Some of the smaller ones moved.

“Are they factories?” Charlie asked.

“I’d guess so”, Alice said, watching on the telescope feed as a steady stream of rectangular moving objects, each about ten metres long, slid along a street. Another such stream was moving along an intersecting street, and it looked like they would crash at the intersection, but the timing and spacing were such that vehicles from one stream crossed the road just as there were gaps in vehicles along the other stream.

“A planet covered by factories, then”, Charlie said. “With no one home to turn the lights on.”

“I want to see what they’re making”, Alice said.

-

All through the atmospheric entry of their first drone package, Alice sat tight in her seat and clenched and unclenched her hands. So far all they had done was passive observation or broadcasting. A chunky piece of hardware tracing a streak of red-hot plasma behind it was a much louder knock. She imagined alien jet fighters scrambling to destroy their drones, and some space defence mechanism activating to burn their ship.

The image she saw was a jittery camera feed, showing the black back of the heatshield, the grey skin of the drone package, and a sliver of blue sky. It shook violently as the two halves of the heatshield detached from each other and then the drone package, tumbling off in opposite directions. Land became visible, kilometres below, the grey blocks of the buildings tiny like children’s blocks but still visibly three-dimensional, casting shadows and moving as the drone package continued falling.

The three drones tested their engines, and for a moment flew - or at least slowed their descent - in an ungainly joint configuration, before breaking off from each other and spreading their wings to the fullest. The feed showed the other two drones veering off into the distance on wide narrow wings, and then the view pulled up as the nose of the drone lifted from near-vertical to horizontal.

“Oops, looks like we have company”, Charlie said. He had been tapping away at some other screens while Alice watched the drone deployment sequence.

Alice jumped up from her seat. “What?”

“Our company is … a self-referential joke!”

Alice resisted the temptation to say anything and instead sank back into her seat. On her monitor, the grey blocks continued slowly moving below the drone. She tapped her foot against the ground.

“Actually though”, Charlie said. “We’re not the only ones in orbit around this planet.”

“What else is orbiting? Has your sense of shame finally caught up with you and joined us?”

“Looks like satellites. Far above us, though. Can you guess how far?”

“I’d guess approximately the distance between you and maturity, so … five light-years?”

Charlie ignored her. “Exactly geostationary altitude”, he said, grinning. The grin was like some platonic ideal of intellectual excitement; too pure for Alice’s annoyance to stay with her, or for her to feel scared about the implications.

“But nothing in lower orbits?” Alice asked.

“No”, Charlie said. “Someone clearly put them there; stuff doesn’t end up at exactly geostationary altitude unless someone deliberately flies a communications or GPS satellite there. Now I can’t be entirely sure that the geostationary satellites are completely dead, but I’d guess that they are.”

“Like everything else”, Alice said, but even as she said so she caught sight of a long trail of vehicles making its way along one of the roads. There was something more real about seeing them on the drone feed.

“Maybe this is just a mining outpost”, Charlie said. “Big rocket launch to blast out a billion tons of ore to god-knows-where, once a year.”

“Or maybe they’re hiding underground or in the oceans”, Alice said.

“Let’s get one of the drones to drop a probe into the oceans. I’ll send one of our initial trio over to the nearest one, it’s only a few hundred kilometres away”, Charlie said.

“Sure”, Alice said.

They split the work of flying the drones, two of them mapping out more and more of the Great Grey Grid (as Alice took to calling it in her head), and one flying over the planet’s largest ocean.

Even the oceans were mostly a barren grey waste. Not empty, though. They did eventually see a few small scaly fish-like creatures that stared at their environment with uncomprehending eyes. Alien life. A young Alice would have been ecstatic. But now she was on a mission, and her inability to figure out what had happened on this planet annoyed her.

In addition to the ocean probe, they had rovers they could send crawling along the ground. Sometimes the doors of the square buildings were open, and Alice would drive a rover past one opening. Most seemed to be either warehouses of stacked crates or some kind of automated assembly line of skeletal grey robot arms and moving conveyor belts. A few seemed to place more barriers between the open air and their contents; what went on there, the rovers did not see.

The first time Alice tried to steer a rover into a building, it got run over by a departing convoy of vehicles. The vehicles were rectangular in shape but with an aerodynamic head, with three wheels on each side. Based on their dimensions, she could easily imagine one weighing ten or twenty tons. The rover had no chance.

“Finally!” Charlie had said. “We get to fight these aliens.”

But there was no fight. It seemed like it had been a pure accident, without any hint of malice. The grey vehicles moved and stopped on some schedule of their own, and for all Alice knew they were not just insensitive beasts but blind and dumb ones too.

The next rover got in, quickly scooting through the side of the entrance and then off to one side, out of the path of the grey vehicles. It wandered the building on its own, headlights turned on in the otherwise-dark building to bring back a video stream of an assembly line brooded over by those same skeletal hands they had glimpsed from outside. Black plastic beads came in by the million on the grey vehicles. A small thin arm with a spike on the end punctured a few holes on one side, and using these holes two of the black beads were sewn onto an amorphous plushy shape. The shape got appendages, was covered with a layer of fluff, and the entire thing became a cheerful purple when it passed through an opaque box with pipes leading into it. It looked like a child’s impression of a hairy four-legged creature with black beady eyes above a long snout. A toy, but for who?

The conveyor belt took an endless line of those fake creatures past the rover’s camera at the end of the assembly line. Alice watched them go, one by one, and fall onto the open back of a grey vehicle. It felt like each and every one made eye contact with her, beady black eyes glinting in the light. She watched for a long time as the vehicle filled up. Once it did, a panel slid over the open top to close the cargo bay, and it sped off out the door. The conveyor belt kept running, but there was a gap of a few metres to the next plushy toy. It came closer and closer to the end - and suddenly a vehicle was driving into place, and the next creature was falling, and it just barely fell into the storage hold of the vehicle while it was driving into place.

“How scary do you find the Blight?” Alice asked.

“Scary enough that I volunteered for this mission”, Charlie said.

Alice remembered the charts they had been shown. They had been hard to miss; even the news, usually full of celebrity gossip and political machinations, had quickly switched to concentrating on the weirdness in the sky once the astronomers spotted it. Starlight dimming in many star systems and what remained of the light spectra shifting towards the infrared. Draw a barrier around the affected area, and you get a sphere 30 light-years wide, expanding at a third of the speed of light. At the epicentre, a world that had shown all the signs of intelligent life that could be detected from hundreds of light-years away - a world that astronomers had broadcast signals to in the hopes of finally making contact with another civilisation - that had suddenly gone quiet and experienced a total loss of oxygen in its atmosphere. The Blight, they had called it.

In the following years, civilisation had mobilised. A hundred projects had sprung forth. One of them: go investigate the star system that was the second-best candidate for intelligent life, but had refused to answer radio signals, and see if someone was there to help. That was why they were here.

“I think I found something as scary as the Blight”, Alice said. “Come look at this.”

The purple creatures kept parading past the camera feed.

-

Over the next five days, while the Blight advanced another forty billion kilometres towards everything they loved back home, Alice and Charlie were busy compiling a shopping catalogue.

“Computers”, Alice said. “Of every kind. A hundred varieties of phones, tablets, laptops, smartwatches, smartglasses, smart-everything.”

“Diamonds and what seems to be jewellery”, Charlie said.

“Millions of tons of every ore and mineral.” They had used their telescopes on what seemed to be a big mine, but they had barely needed them. It was like a huge gash in the flesh of a grey-fleshed and grey-blooded giant, complete with roads that looked like sutures. There were white spots in the image, tiny compared to the mine, each one a sizeable cloud.

“Clothes”, Charlie continued. “Lots and lots of clothes of different varieties. They seem to be shipped around warehouses until they’re recycled.”

“Cars. Sleek electric cars by the million. But we never see them used on the roads, though there are huge buildings where brand-new cars are recycled. And airplanes, including supersonic ones.”

“A lot of things that look like server farms”, Charlie said. “Including ones underwater and on the poles. There’s an enormous amount of compute in this world. Like, mind-boggling. I was thinking we should figure out how to plug into all of it and mine some crypt-”

“Ships with nuclear fusion reactors”, Alice interrupted. There were steady trails of them cutting shortest-path routes between points on the coast.

“Solar panels”, Charlie said. “Basically every spare surface. The building roofs are all covered with solar panels.”

“And children’s plush toys”, Alice said.

They were silent for a while.

“We have a decent idea of what these aliens looked like”, Alice said. “They were organic carbon-based lifeforms, like us. Similar in size too, also bipedal. And it’s like they left some ghostly satanic industrial amusement park running, going through all the motions in their absence, and disappeared.”

“And they didn’t go to space, as far as we know”, Charlie said.

“At least we don’t have any more Blights to worry about then”, Alice said. “I can’t help but imagine that the Blight is something like this. Something that just tiles planets with a Great Grey Grid, does something even worse to the stars, and then moves on.”

“They had space technology, but apparently whoever built the Great Grey Grid didn’t fancy it”, Charlie said. “The satellites might predate it. Probably there were satellites in lower orbits too, but their orbits decayed and they fell down, so we only see the geostationary ones up high.”

“And then what?” Alice said. “All of them vanished into thin air and left behind a highly-automated ghost-town?”

Charlie shrugged.

“Can we plug ourselves into their computers?” Alice asked.

“To mine cr-?”

“To see if anyone’s talking.”

Charlie groaned. “You can’t just plug yourself into a communication system and see anything except encrypted random-looking noise.”

“How do you know they encrypt anything?”

“It would be stupid not to”, Charlie said.

“It would be stupid to blind yourself to the rest of the universe and manufacture a billion plush toys”, Alice said.

“Seems like it will work for them until the Blight arrives.”

-

Alice floated in the middle of the central corridor of the ship. The ship was called Legacy, but even before launch they had taken to calling it “Leggy” for short. The central corridor linked the workstation at the front of the ship where they spent most of their days to the storage bay at the back. In the middle of the corridor, three doors at 120-degree angles from each other led to the small sleeping rooms, each of them little more than a closet.

Alice had woken up only a few minutes ago, and still felt an early-morning grogginess as well as the pull of her bed. The corridor had no windows or video feeds, but was dimly lit by the artificial blue light from the workstation. They were currently on the night side of the planet.

She took a moment to look at the door of the third sleeping room. It was closed, like always, with its intended inhabitant wrapped in an air-tight seal of plastic in a closed compartment of the storage bay. They would flush him into space before they left for home again; they could have no excess mass on the ship for the return journey.

Alice thought again of the hectic preparations for the mission. Apart from Blightsource, this was the only planet the astronomers had spotted that might have intelligent life on it, and the indications were vague. But when you look into space and see something that looks like an approaching wall of death - well, that has a certain way of inspiring long-shots. Hence the mission, hence Legacy’s flight, hence crossing over the vast cold stretch of interstellar space to see if any answers could be found on this world. Hence Bob’s death while in cryonic suspension for the trip. Hence the hopes of all civilisation potentially resting on her and Charlie figuring out something valuable.

If Charlie and she could find something on this world, some piece of insight or some tool or weapon among the countless pieces of technological wizardry that this world had in spades, that had a credible chance against the Blight when it arrived … maybe there was hope.

Alice pushed off on the wall and set herself in a slow spinning motion. The ship seemed to revolve around her. Bob’s door revolved out of sight, and Charlie’s door became visible -

Wait.

Her gravity-bound instincts kicked in and she tried to stop the spin by shoving back with her hands, but there was nothing below her, so she remained spinning slowly. She breathed in deeply to calm herself down, then kicked out a foot against the wall to push herself to the opposite one. She grabbed one of the handles on the wall and held onto it.

The light on Charlie’s room was off. That meant it was empty.

“Charlie!” Alice called.

No response.

The fear came fast. Here she was, light-years from home, perhaps all alone on a spaceship tracing tight circles around a ghostly automated graveyard planet. The entire mass of the planet stood between her and the sun. Out between the stars, the Blight was closing in on her homeworld. She counted to calm herself down; one, two, three, … and just like that, the Blight was three hundred thousand kilometres closer to home. Unbidden, an image of the fluffy purple creature popped up in her mind, complete with its silly face and unblinking eye contact.

Soundlessly, she used the handles on the wall of the corridor to pull herself towards the workstation. She reached the door, peered inside -

There was Charlie, staring at a computer screen. He looked up and saw Alice. “You scared me!” he said. “Watch out, no need to sneak behind me so quietly.”

“I called your name”, Alice said.

“I know, I know”, Charlie said. “But I’m on to something here, and I just want to run a few more checks and then surprise you with the result.”

“What result?” Alice glanced at some of the screens. Two of the drones were above the Great Grey Grid, one above the ocean. With their nuclear power source, they could stay in the air as long as they wanted. Even though their focus was no longer aerial reconnaissance, there was no reason not to keep them mapping the planet from up close, occasionally picking up things that their surveys from the ship did not.

“I fixed the electrical issues with the rover and the cable near the data centre”, Charlie said.

“So you’re getting data, not just frying our equipment?”

“Yes”, Charlie said. “And guess what?”

“What?”

“Guess!”

“You found a Blight-killer”, Alice said.

“No! Even better! These idiots don’t encrypt their data as far as I can tell. And I think a lot of it is natural language.”

“Okay, and can we figure out what it means?”

“We have automated programs for trying to derive syntax rules and so on”, Charlie said. “It’s already found something, including good guesses of which words are prepositions and what type of grammar they have. But mapping words to meaning based purely on statistics of how often they occur is hard.”

“I’ve seen products they have with pictures and instruction manuals”, Alice said. “We could start there.”

“Oh no”, Charlie said. “This is going to be a long process.”

-

By chance, it turned out not to be. Over the next day, they had sent a rover to a furniture factory and had managed, after some attempts, to steal an instruction leaflet out of a printer before the robotic arm could snatch it to be packaged with the furniture. Somehow Alice was reminded of her childhood adventures stealing fruit from the neighbour’s garden.

They had figured out which words meant “cupboard”, “hammer”, and “nail”, and so on. But then another rover on the other side of the world had seen something. It was exploring a grey and windy coast. On one side of the rover was the Great Grey Grid and the last road near the coast, the occasional vehicle hurtling down it. But on the other side was a stretch of rocky beach hammered by white-tipped waves, a small sliver of land that hadn’t been converted to grey.

The land rose by the beach, forming a small hill with jagged rocky sides. The sun shone down on one face of it, but there was a hollow, or perhaps small cave, that was left in the dark by the overhanging rock. And in the rock around this entrance, there were several unmistakable symbols scratched into the rock, each several metres high.

Alice took manual control of the rover and carefully instructed it to drive over the rocky beach towards the cave entrance. On the way it passed what seemed to be a fallen metal pole with some strips of fabric still clinging to it.

Once it was close enough to the mouth of what turned out to be a small cave, the camera could finally see inside.

There was a black cabinet inside. Not far from it, lying on the ground, was the skeleton of a creature with four slender limbs and a large head. Empty eye sockets stared out towards the sky.

Alice felt her heart beating fast. It wasn’t quite right; many of the anatomical details were off. But it was close enough, the similarity almost uncanny. Here, hundreds of light years away, evolution had taken a similar path, and produced sapience. And then killed it off.

“Charlie”, she said in a hoarse voice.

“What?” Charlie asked, sounding annoyed. He had been staring at an instruction manual for a chair, but he looked up and saw the video feed. “Oh”, he said, in a small voice. “We found them.”

Alice tore her eyes away from the skeleton and to the small black cabinet. It had a handle on it. She had the rover extend an arm and open it.

-

The capsule docked with Leggy and in the weightless environment they pushed the cabinet easily into the ship. They had only two there-and-back-again craft - getting back to orbit was hard - but they had quickly decided to use one to get this cabinet up. It had instructions, after all; very clear instructions, though ones that their rovers couldn’t quite follow.

It started from a pictographic representation, etched onto plastic cards, of how you were supposed to read the disks. They managed to build something that could read the microscopic grooves on the disk as per the instructions, and transfer the data to their computers.

After a few hours of work, they had figured out the encodings for numbers, the alphabet, their system of units, and seemingly also some data formats, including for images.

Confirmation came next. The next item on the disk was an image of two of the living aliens, standing on a beach during a sunset. Alice stared into their faces for a long time.

Next there came images next to what were clearly words of text, about fifty of them. Some of the more abstract ones took a few guesses, but ultimately they thought they had a base vocabulary, and with the help of some linguistics software, it did not take very long before they had a translated vocabulary list of about eight thousand words.

Alice was checking the work when Charlie almost shouted: “Look at this!”

Alice looked at what he was pointing at. It was a fragment of text that read:

Hello,

The forms for ordering the new furniture are attached. Please fill them in and we will respond to your order as quickly as we can!

If you need any help, please contact customer support. You will find the phone number on our website.

“What is this? Is Mr Skeleton trying to sell us furniture from beyond the grave?” Alice asked.

“No”, Charlie said. “This isn’t what I got from the recovered data; I haven’t looked at the big remaining chunk yet. This is what I got by interpreting one of the packets of data running on the cables that our rover is plugged into using what we now know about their data formats and the language.”

“And?”

“I don’t get it!” Charlie said. “Why would a world of machines send each other emails in natural language?”

“Why would they manufacture plushy toys? I doubt the robotic arms need cuddles.”

Charlie looked at the world, slowly spinning underneath their ship. “Being so close to it makes me feel creeped out. I don’t get it.”

“You don’t want to lick it anymore?” Alice asked. She decided not to tell Charlie about her own very similar feelings earlier, when she thought for a moment Charlie had gone missing.

Charlie ignored her. “I think the last thing on Mr Skeleton’s hard-drive is a video”, he said. “I’ve checked and it seems to play.”

“You looked at it first?” Alice said in a playfully mocking tone. The thrill of discovery was getting to her.

“Only the first five frames”, Charlie said. “Do you want to watch it?”

-

Our Civilisation: A Story read a short fragment of subtitle, white on black, auto-translated by a program using the dictionary they had built up.

There was a brief shot of some semi-bipedal furry creature walking in the forest. Then one of a fossilised skeleton of something more bipedal and with a bigger head. Then stone tools: triangular ones that might have been spear tips, saw-toothed ones, clubs. A dash of fading red paint on a rock surface, in the shape of a cartoon version of that same bipedal body plan.

There were two pillars of stone in a desert on what looked like a pedestal, some faded inscription at its base and the lone and level sands stretching far away. There was a shot of an arrangement of rocks, some balancing on top of two others, amid a field of green. A massive pyramidal stone structure, lit by the rising sun.

Blocky written script etched on a stone tablet. Buildings framed by columns of marble. A marble statue of one of the aliens, a sling carelessly slung over its shoulder, immaculate in its detail. A spinning arrangement of supported balls orbiting a larger one. And still it moves, the subtitles flashed.

A collection of labelled geometric diagrams on faded yellow paper. Mathematical Principles of Natural Philosophy.

A great ornate building with a spire. A painting of a group of the aliens clad in colourful clothing. An ornate piece of writing. We hold these truths to be self-evident …

A painting of a steam locomotive barrelling along tracks. A diagram of a machine. A black-and-white picture of one of the aliens, then another. Government of the people, for the people, by the people, shall not perish …

An alien with white hair sticking up, holding a small stick of something white and with diagrams of cones behind him. Grainy footage of propeller aircraft streaking through the sky, and then of huge masses of people huddling together and walking across a barren landscape, and then of aliens all in the same clothes charging a field, some of them suddenly jerking about and falling to the ground. We will fight on the beaches, we will fight on the landing grounds …

Black-and-white footage of a mushroom cloud slowly rising from a city below. A picture, in flat pale blue and white, showing a stylised representation of the world’s continents. The same picture, this time black-and-white, on the wall of a room where at least a hundred aliens were sitting.

An alien giving a speech. I have a dream. An alien, looking chubby in a space suit, standing on a barren rocky surface below an ink-black sky next to a pole with a colourful rectangle attached to it.

Three aliens in a room, looking at the camera and holding up a piece of printed text. Disease eradicated.

What looked like a primitive computer. A laptop computer. An abstract helical structure of balls connected by rods, and then flickering letters dancing across the screen.

A blank screen, an arrow extending left to right across it - time, flashed the subtitles - and then another arrow from the bottom-left corner upwards - people in poverty - and then a line crawling from left to right, falling as it did so.

A line folding itself up into a complicated shape. AI system cracks unsolved biology problem.

From then on, the screen showed pictures of headlines.

All routine writing tasks now a solved problem, claims AI company.

Office jobs increasingly automated.

Three-fourths of chief executives of companies on the [no translation] admit to using AI to help write emails, one-third have had AI write a shareholder letter or strategy document.

Exclusive report: world’s first fully-automated company, a website design agency.

Mass layoffs as latest version of [no translation] adopted at [no translation]; ‘stunning performance’ at office work.

Nations race to reap AI productivity gains: who will gain and who will lose?

CEO of [no translation] resigns, claiming job pointless, both internal and board pressure to defer to “excellently-performing” AI in all decisions.

[No translation] ousts executive and management team, announces layoffs; board supports replacing them with AI to keep up with competition.

Entirely or mostly automated companies now delivering 2.5x higher returns on investment on average; ‘the efficiency difference is no joke’, says chair of [no translation].

Year-on-year economic growth hits 21% among countries with advanced AI access.

Opinion: the new automated economy looks great on paper but is not serving the needs of real humans.

Mass protests after [no translation], a think-tank with the ear of the President, is discovered to be funded and powered by AI board of [no translation], and to have practically written national economic policy for the past two years.

‘No choice but forward’, says [no translation] after latest round of worries about AI; unprecedented economic growth still strong.

[No translation 1] orders raid of [no translation 2] over fears [no translation 2] is not complying with latest AI use regulations, but cannot execute order due to noncompliance from the largely-automated police force; ‘we are working with our AI advisers and drivers in accordance with protocol, and wish to assure the [no translation 3] people that we are still far from the sci-fi scenario where our own police cars have rebelled against us.’

‘AI overthrow’ fears over-hyped, states joint panel of 30 top AI scientists and business-people along with leading AI advisory systems; ‘they’re doing a good job maximising all relevant metrics and we should let them keep at it, though businesses need to do a better job of selecting metrics and tough regulation is in order.’

Opinion: we’re better-off under a regime of rigorous AI decision-making than under corrupt politicians; let the AIs repeat in politics what they’ve done for business over the last five years.

‘The statistics have never looked so good’ - Prime Minister reassures populace as worries mount over radical construction projects initiated by top AI-powered companies.

Expert panel opinion: direct AI overthrow scenario remains distant threat, but more care should be exercised over choice of target metrics; recommend banning of profit-maximisation target metric.

Movement to ban profit-maximising AIs picks up pace.

Top companies successfully challenge new AI regulation package in court.

‘The sliver of the economy over which we retain direct control will soon be vanishingly small’, warns top economist, ‘action on AI regulation may already be too late’.

Unverified reports of mass starvation in [no translation]; experts blame agricultural companies pivoting to more land-efficient industries.

Rant goes viral: ‘It’s crazy, man, we just have these office AIs that only exist in the cloud, writing these creepily-human emails to other office AIs, all overseen by yet another AI, and like most of their business is with other AI companies; they only talk to each other, they buy and sell from each other, they do anything as long as it makes those damned numbers on their spreadsheets just keep ticking up and up; I don’t think literally any human has ever seen a single product out of the factory that just replaced our former neighbourhood, but those factories just keep going up everywhere.’

Revolution breaks out in [no translation]; government overthrown, but it’s business-as-usual for most companies, as automated trains, trucks, and ships keep running.

[No translation] Revolution: Leaked AI-written email discovered, in which the AI CEO ordered reinforcement of train lines and trains three weeks ago. ‘We are only trying to ensure the continued functioning of our supply chains despite the recent global unrest, in order to best serve our customers’, CEO writes in new blog post.

[No translation] Revolution: crowds that tried swarming train lines run over by trains; ‘the trains didn’t even slow down’, claim witnesses. CEO cites fiduciary duties.

Despite unprecedented levels of wealth and stability, you can’t actually do much: new report finds people trying to move house, book flight or train tickets, or start a new job or company often find it difficult or impossible; companies prioritising serving ‘more lucrative’ AI customers and often shutting down human-facing services.

Expert report: ‘no sign of human-like consciousness even in the most advanced AI systems’, but ‘abundantly clear’ that ‘the future belongs to them’.

New report: world population shrinking rapidly; food shortages, low birth rates, anti-natalist attitudes fuelled by corporate campaigns to blame.

The screen went blank. Then a video of an alien appeared, sitting up on a rocky surface. Alice took a moment to realise that it was the same cave they had found the skeleton in. The alien’s skin was wrapped tight around its bones, and even across the vast gulf of biology and evolutionary history, Alice could tell that it was not far from death. It opened its mouth, and sound came out. Captions appeared beneath it.

“It is the end”, the alien said, its eyes staring at them from between long unkempt clumps of hair. “On paper, I am rich beyond all imagination. But I have no say in this new world. And I cannot find food. I will die.”

The wind tugged at the alien’s long hair, but otherwise the alien was so still that Alice wondered if it had died there and then.

“There is much I would like to say”, the alien said. “But I do not have the words, and I do not have the energy.” It paused. “I hope it was not all in vain. Or, that if for us it was, that for someone up there it isn’t.”

The video went blank.

Alice and Charlie watched the blank screen in silence.

“At least the blight they birthed seems to have stuck to their world”, Charlie said after a while.

“Yeah”, Alice said, slowly. “But I don’t think we’ll find anything here.”

Legacy completed nine more orbits of the planet, and then jettisoned all unnecessary mass into space. Its engines jabbed against the darkness of space, bright enough to be visible from the planet’s surface. There was no one to see them.

In a factory down on the planet, an assembly line of beady-eyed purple plush toys marched on endlessly.

The title of this work is taken from a passage in Superintelligence: Paths, Dangers, Strategies, where Nick Bostrom writes:

We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today—a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland without children. [emphasis added]

The outline of events presented draws inspiration from several sources, but most strongly from Paul Christiano’s article What failure looks like.

Deciding not to found a human-data-for-alignment startup

2022-09-27T21:38:00.002+01:00

8.6k words (~30 minutes)

Both the project and this write-up were a collaboration with Matt Putz.

Matt Putz and I worked together for the first half of the summer to figure out if we should found a startup with the purpose of helping AI alignment researchers get the datasets they need to train their ML models (especially in cases where the dataset is based on human-generated data). This post, also published on the Effective Altruism Forum and LessWrong (both of which may contain additional discussion in the comments), is a summary of our findings, and why we decided to not do it.

Summary

One-paragraph summary: we (two recent graduates) spent about half of the summer exploring the idea of starting an organisation producing custom human-generated datasets for AI alignment research. Most of our time was spent on customer interviews with alignment researchers to determine if they have a pressing need for such a service. We decided not to continue with this idea, because there doesn’t seem to be a human-generated data niche (unfilled by existing services like Surge) that alignment teams would want outsourced.

In more detail: The idea of a human datasets organisation was one of the winners of the Future Fund project ideas competition, still figures on their list of project ideas, and had been advocated before then by some people, including Beth Barnes. Even though we ended up deciding against it, we think this was a reasonable and high-expected-value idea for these groups to advocate at the time.

Human-generated data is often needed for ML projects or benchmarks if a suitable dataset cannot be e.g. scraped from the web, or if human feedback is required. Alignment researchers conduct such ML experiments, but sometimes have different data requirements than standard capabilities researchers. As a result, it seemed plausible that there was some niche unfilled by the market to help alignment researchers solve problems related to human-generated datasets. In particular, we thought - and to some extent confirmed - that the most likely such niche is human data generation that requires particularly competent or high-skill humans. We will refer to this as high-skill (human) data.

We (Matt & Rudolf) went through an informal co-founder matching process along with four other people and were chosen as the co-founder pair to explore this idea. In line with standard startup advice, our first step was to explore whether or not there is a concrete current need for this product by conducting interviews with potential customers. We talked to about 15 alignment researchers, most of them selected on the basis of doing work that requires human data. A secondary goal of these interviews was to build better models for the future importance and role of human feedback in alignment.

Getting human-generated data does indeed cost many of these researchers significant time and effort. However, we think to a large extent this is because dealing with humans is inherently messy, rather than existing providers doing a bad job. Surge AI in particular seems to offer a pretty good and likely improving service. Furthermore, many companies have in-house data-gathering teams or are in the process of building them.

Hence we have decided to not further pursue this idea.

Other projects in the human data generation space may still be valuable, especially if the importance of human feedback in ML continues to increase, as we expect. This might include people specializing in human data as a career.

The types of factors that are most important for doing human dataset provision well include: high-skill contractors, fast iteration, and high bandwidth communication and shared understanding between the research team, the provider organisation and the contractors.

We are keen to hear other people’s thoughts, and would be happy to talk or to share more notes and thoughts with anyone interested in working on this idea or a similar one in the future.

Theory of Change

A major part of AI alignment research requires doing machine learning (ML) research, and ML research in turn requires training ML models. This involves expertise and execution ability in three broad categories: algorithms, compute, and data, the last of which is very neglected by EAs.

We expect training on data from human feedback to become an increasingly popular and very powerful tool in mainstream ML (see below). Furthermore, many proposals for alignment (for example: reinforcement learning from human feedback (RLHF) and variants like recursive reward modelling, iterated amplification, and safety via debate) would require lots of human interaction or datasets based on human-generated data.

While many services (most notably Surge) exist for finding labour to work on data generation for ML models, it seems plausible that an EA-aligned company could add significant value because:

Markets may not be efficient enough to fill small niches that are more important to alignment researchers than other customers; high-skill human data that requires very competent crowdworkers may be one such example. If alignment researchers can get it at all, it might be very expensive.
We have a better understanding of alignment research agendas than existing providers do. This may allow us to make better-informed decisions on many implementation details with less handholding, thereby saving researchers time.
We would have a shared goal with our customers: reducing AI x-risk. Though profit motives already provide decent incentives to offer a good service, mission alignment helps avoid adversarial dynamics, increases trust, and reduces friction in collaboration.
An EA-led company may be more willing to make certain strategic moves that go against its profit incentives; e.g. investing heavily into detecting a model’s potential attempts to deceive the crowdworkers, even when it’s hard for outsiders to tell whether such monitoring efforts are sincere and effective (and thus customers may not be willing to pay for it). Given that crowdworkers might provide a reward signal, they could be a key target for deceptive AIs.

Therefore, there is a chance that an EA-led human data service that abstracts out some subset of dataset-related problems (e.g. contractor finding, instruction writing/testing, UI and pipeline design/coding, experimentation to figure out best practices and accumulate that knowledge in one place) would:

save the time of alignment researchers, letting them make more progress on alignment; and
reduce the cost (in terms of time and annoying work) required to run alignment-relevant ML experiments, and therefore bring more of them below the bar at which it makes sense to run them, thus increasing the number of such experiments that are run.

In the longer run, benefits of such an organisation might include:

There is some chance that we could simply outcompete existing ML data generation companies and be better even in the cases where they do provide a service; this is especially plausible for relatively niche services. In this scenario we’d be able to exert some marginal influence over the direction of the AI field, for example by only taking alignment-oriented customers. This would amount to differential development of safety over capabilities. Beyond only working with teams that prioritise safety, we could also pick among self-proclaimed “safety researchers”. It is common for other members of the community to accuse proclaimed safety efforts of helping more with capabilities than alignment.
There are plausibly critical actions that might need to be taken for alignment, possibly quickly during “crunch-time”, that involve a major (in quality or scale) data-gathering project (or something like large-scale human-requiring interpretability work, that makes use of similar assets, like a large contractor pool). At such a time it might be very valuable to have an organisation committed to x-risk minimisation with the competence to carry out any such project.

Furthermore, if future AIs will learn human values from human feedback, then higher data quality will be equivalent to a training signal that points more accurately at human values. In other words, higher quality data may directly help with outer alignment (though we're not claiming that it could realistically solve it on its own). In discussions, it seemed that Matt gave this argument slightly more weight than Rudolf.

While these points are potentially high-impact, we think that there are significant problems with starting an organisation mainly to build capacity to be useful only at some hypothetical future moment. In particular, we think it is hard to know exactly what sort of capacity to build (and the size of the target in type-of-capacity space might be quite small), and there would be little feedback on which the organisation could base improvements or course-corrections.

More generally, both of us believe that EA is right now partly bottlenecked by people who can start and scale high-impact organisations, which is a key reason why we’re considering entrepreneurship. This seems particularly likely given the large growth of the movement.

What an org in this space may look like

Providing human datasets

The concept we most seriously considered was a for-profit that would specialise in meeting the specific needs of alignment researchers, probably by focusing on very high-skill human data. Since this niche is quite small, the company could offer a very custom-tailored service. At least for the first couple of years, this would probably mean both of us having a detailed understanding of the research projects and motivations of our customers. That way, we could get a lot of small decisions right, without the researchers having to spend much time on them. We might be especially good at that compared to competitors, given our greater understanding of alignment.

Researching enhanced human feedback

An alternative we considered was founding a non-profit that would research how to enhance human feedback. See this post by Ajeya Cotra for some ideas on what this kind of research could look like. The central question is whether and how you can combine several weak training signals into a stronger, more accurate one. If this succeeded, maybe (enhanced) human feedback could become a more accurate (and thereby marginally safer) signal to train models on.

We decided against this for a number of reasons:

Currently, neither of us has more research experience than an undergraduate research project.
We thought we could get a significant fraction of the benefits of this kind of research even if we did the for-profit version, and plausibly even more valuable expertise.
- First of all, any particular experiment that funders would have liked to see, they could have paid us to do, although we freely admit that this is very different from someone pushing forward their own research agenda.
- More importantly, we thought a lot of the most valuable expertise to be gained would come in the form of tacit knowledge and answers to concrete boring questions that are not best answered by doing “research” on them, but rather by iterating on them while trying to offer the best product (e.g. “Where do you find the best contractors?”, “How do you incentivize them?”, “What’s the best way to set up communication channels?”).
  - It is our impression that Ought pivoted away from doing abstract research on factored cognition and toward offering a valuable product for related reasons.
This topic seems plausibly especially tricky to research (though some people we’ve spoken to disagreed):
- At least some of the proposed experiments would not involve ML models at all. We fear that this might make it especially easy to fool ourselves into thinking some experiment might eventually turn out to be useful when it won’t. More generally, the research would be pretty far removed from the end product (very high quality human feedback). In the for-profit case on the other hand, we could easily tell whether alignment teams were willing to pay for our services and iteratively improve.

For-profit vs non-profit

We can imagine two basic funding models for this org:

either we’re a nonprofit directly funded by EA donors and offering free or subsidized services to alignment teams;
or we’re a for-profit, paid by its customers (i.e. alignment teams).

Either way, a lot of the money will ultimately come from EA donors (who fund alignment teams).

The latter funding mechanism seems better; “customers paying money for a service” leads to the efficient allocation of resources by creating market structures. Customers have a clear incentive to spend the money well. On the other hand, “foundations deciding what services are free” is more reminiscent of planned economies and distorts markets. To a first approximation, funders should give alignment orgs as much money as they judge appropriate and then alignment orgs should exchange it for services as they see fit.

A further reason is that a non-profit is legally more complicated to set up, and imposes additional constraints on the organisation.

Should the company exclusively serve alignment researchers?

We also considered founding a company with the ambition to become a major player in the larger space of human data provision. It would by default serve anyone willing to pay us and working on something AGI-related, rather than just alignment researchers. Conditional on us being able to successfully build a big company, this would have the following upsides:

Plausibly one of the main benefits of founding a human data gathering organisation is to produce EAs and an EA org that have deep expertise in handling and producing high-skill human data in significant quantities. That might prove useful around “crunch time”, e.g. when some project aims to create competitive but safe AGI and needs this expertise. Serving the entire market could scale to a much larger company enabling us to gain expertise at higher scales.
Operating a large company would also come with some degree of market power. Any company with paying customers has some amount of leverage over them: first of all just because of switching costs, but also because the product it offers might be much better than the next-best alternative. This could allow us to make some demands, e.g. once we’re big and established, announce we’d only work with companies that follow certain best practices.

On the other hand, building a big successful company serving anyone willing to pay might come with some significant downsides as well.

First, and most straightforwardly, it is probably much harder than filling a small niche (just meeting the specific needs of alignment researchers), making us less likely to succeed. A large number of competitors exist and as described in this section, some of them (esp. Surge) seem pretty hard to beat. Since this is an already big and growing market, there is an additional efficient markets reason to assume this is true a priori.
Secondly, and perhaps more importantly, such a company might accelerate capabilities (more on this below).

Furthermore, it might make RLHF (Reinforcement Learning from Human Feedback) in particular more attractive. Depending on one’s opinions about RLHF and how it compares to other realistic alternatives, one might consider this a strong up- or downside.

Approach

The main reason companies fail is that they build a product that customers don’t want. For for-profits, the signal is very clear: either customers care enough to be willing to pay hard cash for the product/service, or they don’t. For non-profits, the signal is less clear, and therefore nonprofits can easily stick around in an undead state, something that is an even worse outcome than the quick death of a for-profit because of resource (mis)allocation and opportunity costs. As discussed, it is not obvious which structure we should adopt for this organisation, though for-profit may be a better choice on balance. However, in all cases it is clear that the organisation needs to solve a concrete problem or provide clear value to exist and be worth existing. This does not mean that the value proposition needs to be certain; we would be happy to take a high-risk, high-reward bet, and generally support hits-based approaches to impact both in general and for ourselves.

An organisation is unlikely to do something useful to its customers without being very focused on customer needs, and ideally having tight feedback cycles.

The shortest feedback loops are when you’re making a consumer software product where you can prototype quickly (including with mockups), and watch and talk to users as they use the core features, and then see if the user actually buys the product on the spot. A datasets service differs from this ideal feedback mode in a number of ways:

The product is a labour-intensive process, which means the user cannot quickly use the core features and we cannot quickly simulate them.
The actual service requires either a contractor pool or (potentially at the start) the two of us spending a number of hours per request generating data.
There is significant friction to getting users to use the core feature (providing a dataset), since it requires specification of a dataset from a user, which takes time and effort.

Therefore, we relied on customer interviews with prospective customers. The goal of these interviews was to talk to alignment researchers who work with data, and figure out if external help with their dataset projects would be of major use to them.

Our approach to customer interviews was mostly based on the book The Mom Test, which is named after the idea that your customer interview questions should be concrete and factual enough that even someone as biased as your own mom shouldn’t be able to give you a false signal about whether the idea is actually good. Key lessons from The Mom Test include prioritising:

factual questions about the past over hypothetical questions for the future;
- In particular, questions about concrete past and current efforts spent solving a problem rather than questions about current or future wishes for solving a problem
questions that get at something concrete (e.g. numbers); and
questions that prompt the customer to give information about their problems and priorities without prompting them with a solution.

We wanted to avoid the failure mode where lots of people tell us something is important and valuable in the abstract, without anyone actually needing it themselves.

We prepared a set of default questions that roughly divided into:

A general starting question prompting the alignment researcher to describe the biggest pain points and bottlenecks they face in their work, without us mentioning human data.
Various questions about their past and current dataset-related work, including what types of problems they encounter with datasets, how much of their time these problems take, and steps they took to address these problems.
Various questions on their past experiences using human data providers like Surge, Scale, or Upwork, and specifically about any things they were unable to accomplish because of problems with such services.
In some cases, more general questions about their views on where the bottlenecks for solving alignment are, views on the importance of human data or tractability of different data-related proposals, etc.
What we should’ve asked but didn’t, and who else we should talk to.

Point 4 represents the fact that in addition to being potential customers, alignment researchers also doubled as domain experts. The weight given to the questions described in point 4 varied a lot, though in general if someone was both a potential customer and a source of data-demand-relevant alignment takes, we prioritised the customer interview questions.

In practice, we found it easy to arrange meetings with alignment researchers; they generally seemed willing to talk to people who wanted input on their alignment-relevant idea. We did customer interviews with around 15 alignment researchers, and had second meetings with a few. For each meeting, we prepared beforehand a set of questions tweaked to the particular person we were meeting with, which sometimes involved digging into papers published by alignment researchers on datasets or dataset-relevant topics (Sam Bowman in particular has worked on a lot of data-relevant papers). Though the customer interviews were by far the most important way of getting information on our cruxes, we found the literature reviews we carried out to be useful too. We are happy to share the notes from the literature reviews we carried out; please reach out if this would be helpful to you.

Though we prepared a set of questions beforehand, in many meetings (often including the most important or successful ones) we ended up going off script fairly quickly.

Something we found very useful was that, since there were two of us, we could split the tasks during the meeting into two roles (alternating between meetings):

One person who does most of the talking, and makes sure to be focused on the thread of the conversation.
One person who mostly focuses on note-taking, but also chimes in if they think of an important question to ask or want to ask for clarification.

Key crux: demand looks questionable, Surge seems pretty good

Common startup advice is to make sure you have identified a very strong signal of demand before you start building stuff. That should look something like someone telling you that the thing you’re working on is one of their biggest bottlenecks and that they can’t wait to pay you asap so you solve this problem for them. “Nice to have” doesn’t cut it. This is in part because working with young startups is inherently risky, so you need to make up for that by solving one of their most important problems.

In brief, we don’t think this level of very strong demand currently exists, though there were some weaker signals that looked somewhat promising. There are many existing startups that offer human feedback already. Surge AI in particular was brought up by many people we talked to and seems to offer quite a decent service that would be hard to beat.

Details about Surge

Surge is a US-based company that offers a service very similar to what we had in mind, though they are not focused on alignment researchers exclusively. They build data-labelling and generation tools and have a workforce of crowdworkers.

They’ve worked with Redwood and the OpenAI safety team, both of which had moderately good experiences with them. More recently, Ethan Perez’s team have worked with Surge too; he seems to be very satisfied based on this Twitter thread.

Collaboration with Redwood

Surge has worked with Redwood Research on their paper about adversarial training. This is one of three case studies on Surge’s website, so we assume it’s among the most interesting projects they’ve done so far. The crowdworkers were tasked with coming up with prompts that would cause the model to output text in which someone got injured. Furthermore, crowdworkers also classified whether someone got injured in a given piece of text.

One person from Redwood commented that doing better than Surge seemed possible to them with “probably significant value to be created”, but “not an easy task”. They thought our main edge would have to be that we’d specialise in fuzzy and complex tasks needed for alignment; Surge apparently did quite well with those, but still with some room for improvement. A better understanding of alignment might lower chances of miscommunication. Overall, Redwood seems quite happy with the service they received.

Initially, Surge’s iteration cycle was apparently quite slow, but this improved over time and was “pretty good” toward the end.

Redwood told us they were quite likely to use human data again by the end of the year and more generally in the future, though they had substantial uncertainty around this. Their experience in working with human feedback overall was somewhat painful as we understood it. This is part of the reason they’re uncertain about how much human feedback they will use for future experiments, even though it’s quite a powerful tool. However, they estimated that friction in working with human feedback was mostly caused by inherent reasons (humans are inevitably slower and messier than code), rather than Surge being insufficiently competent.

Collaboration with OpenAI

OpenAI have worked with Surge in the context of their WebGPT paper. In that paper, OpenAI fine-tuned their language model GPT-3 to answer long-form questions. The model is given access to the web, where it can search and navigate in a text-based environment. It’s first trained with imitation learning and then optimised with human feedback.

Crowdworkers provided “demonstrations”, where they answered questions by browsing the web. They also provided “comparisons”, where they indicated which of two answers to the same question they liked better.

People from OpenAI said they had used Surge mostly for sourcing the contractors, while doing most of the project management, including building the interfaces, in-house. They were generally pretty happy with the service from Surge, though all of them did mention shortcomings.

One of the problems they told us about was that it was hard to get access to highly competent crowdworkers for consistent amounts of time. Relatedly, it often turned out that a very small fraction of crowdworkers would provide a large majority of the total data.

More generally, they wished there had been someone at Surge who understood their project better. Also, it might have been somewhat better if there had been more people with greater experience in ML, such that they could have more effectively anticipated OpenAI’s preferences, e.g. by accurately predicting what examples might be interesting to researchers when doing quality evaluation. However, organisational barriers and insufficient communication were probably larger bottlenecks than ML knowledge. At least one person from OpenAI strongly expressed a desire for a service that understood their motives well and took as much off their plate as possible in terms of hiring and firing people, building the interfaces, doing quality checks and summarising findings etc. It is unclear to us to what extent Surge could have offered these things if OpenAI hadn’t chosen to do a lot of these things in-house. One researcher suggested that communicating their ideas reliably was often more work than just doing it themselves. As it was, they felt that marginal quality improvement required significant time investment on their own part, i.e. could not be solved with money alone.

Notably, one person from OpenAI estimated that about 60% of the WebGPT team’s efforts were spent on various aspects of data collection. They also said that this figure didn’t change much after weighting for talent, though in the future they expect junior people to take on a disproportionate share of this workload.

Finally, one minor complaint that was mentioned was the lack of transparency about contractor compensation.

How mission-aligned is Surge?

Surge highlight their collaboration with Redwood on their website as one of three case studies. In their blog post about their collaboration with Anthropic, the first sentence reads: “In many ways, alignment – getting models to align themselves with what we want, not what they think we want – is one of the fundamental problems of AI.”

On the one hand, they describe alignment as one of the fundamental problems of AI, which could indicate that they intrinsically cared about alignment. However, they have a big commercial incentive to say this. Note that many people would consider their half-sentence definition of alignment to be wrong (a model might know what we want, but still do something else).

We suspect that the heads of Surge have at least vaguely positive dispositions towards alignment. They definitely seem eager to work with alignment researchers, which might well be more important. We think it’s mostly fine if they are not maximally intrinsically driven, though mission alignment does add value as mentioned above.

Other competitors

We see Surge as the most direct competitor and have researched them by far in the most detail. But besides Surge, there are a large number of other companies offering similar services.

First, and most obviously, Amazon Mechanical Turk offers a very low quality version of this service and is very large. Upwork specialises in sourcing humans for various tasks, without building interfaces. Scale AI is a startup with a $7B valuation; they augment human feedback with various automated tools. OpenAI have worked with them. Other companies in this broad space include Hybrid (which Sam Bowman’s lab has worked with) and Invisible (who have worked with OpenAI). There are many more that we haven’t listed here.

In addition, some labs have in-house teams for data gathering (see here for more).

Data providers used by other labs

Ethan Perez’s and Sam Bowman’s labs at NYU/Anthropic have historically often built their own interfaces while using contractors from Upwork or undergrads, but they have been trialing Surge over the summer and seem likely to stick with them if they have a good experience. Judging from the Twitter thread linked above and asking Jérémy Scheurer (who works on the team and built the pre-Surge data pipeline) how they’ve found Surge so far, Surge is doing a good job.

Google has an internal team that provides a similar service, though DeepMind have used at least one external provider as well. We expect that it would be quite hard to get DeepMind to work with us, at least until we would be somewhat more established.

Generally, we get the impression that most people are quite happy with Surge. It’s worth also considering that it’s a young company that’s likely improving its service over time. We’ve heard that Surge iterates quickly, e.g. by shipping simple feature requests in two days. It’s possible that some of the problems listed above may no longer apply by now or in a few months.

Good signs for demand

One researcher we talked to said that there were lots of projects their team didn’t do, because gathering human feedback of sufficient quality was infeasible.

One of the examples this researcher gave was human feedback on code quality. This is currently infeasible, because the time of software engineers is just too expensive. That problem is hard for a new org to solve.

Another example they gave seemed like it might be more feasible: for things like RLHF, they often choose to do pairwise comparisons between examples or multi-preferences. Ideally, they would want to get ratings, e.g. on a scale from 1 to 10. But they didn’t trust the reliability of their raters enough to do this.
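To make the distinction concrete, here is a minimal sketch of how pairwise comparisons can be aggregated into a rough ordering without asking raters for absolute scores. The example judgements, the `win_rates` helper and the simple win-rate aggregation are illustrative assumptions of ours, not a description of any lab’s actual pipeline.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Aggregate (winner, loser) pairs into a per-item win rate."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Items that never won still appear in `games`, so they get a rate of 0.
    return {item: wins[item] / games[item] for item in games}


# Hypothetical rater judgements: each tuple is (preferred answer, other answer).
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]

rates = win_rates(comparisons)
# Order items from most to least preferred.
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

Each rater only has to say which of two examples is better, so systematic differences in how harshly individual raters score do not enter the data. The cost is that the result is an ordering rather than a calibrated 1 to 10 score.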

More generally, this researcher thought there were lots of examples where if they could copy any person on their team a hundred times to provide high-skill data, they could do many experiments that they currently can’t.

They also said that their team would be willing to pay ~3x of what they were paying currently to receive much higher-quality feedback.

Multiple other researchers we talked to expressed vaguely similar sentiments, though none quite as strong.

However, it’s notable that in this particular case, the researcher hadn’t worked with Surge yet.

The same researcher also told us about a recent project where they had spent a month on things like creating quality assurance examples, screening raters, tweaking instructions etc. They thought this could probably have been reduced a lot by an external org, maybe to as little as one day. Again, we think Surge may be able to get them a decent part of the way there.

Labs we could have worked with

We ended up finding three projects that we could have potentially worked on:

A collaboration with Ought: they spend about 15 hours a week on data-gathering and would have been happy to outsource that to us. If it had gone well, they might also have done more data-gathering in the long term (since friction is lower if it doesn’t require staff time). We decided not to go ahead with this project since we weren’t optimistic enough about demand from other labs being bigger once we had established competence with Ought and the project itself didn’t seem high upside enough.
Attempt to get the Visible Thoughts bounty by MIRI. We decided against this for a number of reasons. See more of our thinking about Visible Thoughts below.
Potentially a collaboration with Owain Evans on curated datasets for alignment.

We think the alignment community is currently relatively tight-knit; for example, researchers often knew about other alignment teams’ experiences with Surge from conversations they had had with them. Hence, we were relatively optimistic that conditional on there being significant demand for this kind of service, doing a good job on one of the projects above would quickly lead to more opportunities.

Visible Thoughts

In November 2021, MIRI announced the Visible Thoughts (VT) project bounty. In many ways VT would be a good starting project for an alignment-oriented dataset provider, in particular because the bounty is large (up to $1.2M) and because it is ambitious enough that executing on it would provide a strong learning signal to us and a credible signal to other organisations we might want to work with. However, on closer examination of VT, we came to the conclusion that it is not worth it for us to work on it.

The idea of VT is to collect a dataset of 100 runs of fiction of a particular type (“dungeon runs”, an interactive text-based genre where one party, called the “dungeon master” and often an AI, offers descriptions of what is happening, and the other responds in natural language with what actions they want to take), annotated with a transcript of some of the key verbal thoughts that the dungeon master might be thinking as they decide what happens in the story world. MIRI hopes that this would be useful for training AI systems that make their thought processes legible and modifiable.

In particular, a notable feature of the VT bounty is the extreme run lengths that it asks for: to the tune of 300 000 words for each of the runs (for perspective, this is the length of A Game of Thrones, and longer than the first three Harry Potter books combined). A VT run is much less work than a comparable-length book (the equivalent of a rough, unpolished first draft, with some quality checks, would likely be sufficient), but producing one such run would still probably require at least on the order of 3 months of sequential work time from an author. We expect the pool of people willing to write such a story for 3 months is significantly smaller than the pool of people who would be willing to complete, say, a 30 000 word run, and that the high sequential time cost increases the amount of time required to generate the same number of total words. We also appear to have different ideas on how easy it is to fit a coherent story, for the relevant definition of coherent, into a given number of words. Note that to compare VT word counts to lengths of standard fiction without the written-out thoughts from the author, the VT word count should be reduced by a factor of 5-6.
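As a rough back-of-the-envelope check on the scale involved, the figures above can be combined as follows. This is only a sketch: the run count, words per run and reduction factor are taken from this post, the 3 months per run is our own order-of-magnitude lower bound, and the variable names are ours.

```python
RUNS = 100                   # number of runs the bounty asks for
WORDS_PER_RUN = 300_000      # approximate words per run, thoughts included
MONTHS_PER_RUN = 3           # our rough lower bound on sequential author time
REDUCTION_FACTORS = (5, 6)   # VT words per word of ordinary fiction

# Total words across the full dataset.
total_words = RUNS * WORDS_PER_RUN

# Total author time if every run takes about MONTHS_PER_RUN months.
total_author_months = RUNS * MONTHS_PER_RUN

# Length of one run once the written-out thoughts are stripped,
# as a (low, high) range in words of ordinary fiction.
fiction_equivalent = tuple(
    WORDS_PER_RUN // factor for factor in reversed(REDUCTION_FACTORS)
)

print(total_words, total_author_months, fiction_equivalent)
```

Under these assumptions the full dataset is about 30 million words and roughly 300 author-months of work, while a single run corresponds to a story of about 50 000 to 60 000 words of ordinary fiction.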

Concerns about the length are raised in the comments section, to which Eliezer Yudkowsky responded. His first point, that longer is easier to write per step, may be true, especially as we also learned (by email with Nate Soares and Aurelien Cabanillas) that in MIRI’s experience “authors that are good at producing high quality steps are also the ones who don't mind producing many steps”. In particular because of that practical experience, we think it is possible we overestimated the logistical problems caused by the length. MIRI also said they would likely accept shorter runs too if they satisfied their other criteria.

In a brief informal conversation with Rudolf during EAG SF, Eliezer emphasised the long-range coherence point in particular. However, they did not come to a shared understanding of what type of “long-range coherence” is meant.

Even more than these considerations, we are sceptical about the vague plans for what to do given a VT dataset. A recurring theme from talking to alignment researchers who work with datasets was that inventing and creating a good dataset is surprisingly hard, and generally involves having a clear goal of what you’re going to use the dataset for. It is possible the key here is the difference in our priors for how likely a dataset idea is to be useful.

In addition, we have significant concerns about undertaking a major project based on a bounty whose only criterion is the judgement of one person (Eliezer Yudkowsky), and undertaking such a large project as our first project.

Other cruxy considerations

Could we make a profit / get funding?

One researcher from OpenAI told us he thought it would be hard to imagine an EA data-gathering company making a profit because costs for individual projects would always be quite high (requiring several full-time staff), and total demand was probably not all that big.

In terms of funding, both of us were able to spend time on this project because of grants from regrantors in the Future Fund regrantor program. Based on conversations with regrantors, we believe we could’ve gotten funding to carry out an initial project if we had so chosen.

Will human feedback become a much bigger deal? Is this a very quickly growing industry?

Our best guess is yes. For example, see this post by Ajeya Cotra which outlines how we could get to TAI by training on Human Feedback on Diverse Tasks (HFDT).

She writes: “HFDT is not the only approach to developing transformative AI, and it may not work at all. But I take it very seriously, and I’m aware of increasingly many executives and ML researchers at AI companies who believe something within this space could work soon.”

In addition, we have also had discussions with at least one other senior AI safety researcher whom we respect and who thought human feedback was currently irrationally neglected by mainstream ML; they expected it to become much more widespread and to be a very powerful tool.

If that’s right, then providing human feedback will likely become important and economically valuable.

This matters, because operating a new company in a growing industry is generally much easier and more likely to be successful. We think this is true even if profit isn’t the main objective.

Would we be accelerating capabilities?

Our main idea was to found a company (or possibly non-profit) that served alignment researchers exclusively. That could accelerate alignment differentially.

One problem is that it’s not clear where to draw this boundary. Some alignment researchers definitely think that other people who would also consider themselves to be alignment researchers are effectively doing capabilities work. This is particularly true of RLHF.

One risk worth taking seriously is that, if we worked with big AI labs to make their models more aligned by providing higher-quality data, the models might merely appear surface-level aligned. “Make the data higher quality” might be a technique that scales poorly as capabilities ramp up, so it risks creating a false sense of security. It would also clearly improve the usefulness of current-day models, and hence risks increasing investment levels too.

We don’t currently think the risk of surface-level alignment is big enough to outweigh the benefits. In general, we think that a good first-order heuristic that helps the field stay grounded in reality would be that whatever improves alignment in current models is useful to explore further and invest resources into. It seems like a good prior that such things would also be valuable in the future (even if it’s possible that new additional problems may arise, or such efforts aren’t on the path to a future alignment solution). See Nate Soares’ post about sharp left turns for a contrasting view on this.

Is it more natural for this work to be done in-house in the long term? Especially at big labs/companies.

We expect that human data gathering is likely to become very important and that it benefits from understanding the relevant research agenda well. So maybe big companies will want to do this internally, instead of relying on third-party suppliers?

That seems quite plausible to us and to some extent it’s happening already. Our understanding is that Anthropic is hiring an internal team to do human data gathering. DeepMind has access to Google’s crowdworker service. OpenAI have worked with multiple companies, but they also have at least one in-house specialist for this kind of work and are advertising multiple further jobs on the human data team here. They’re definitely considering moving more of this work in-house, but it’s unclear to us to what extent that’s going to happen and we have received somewhat contradictory signals regarding OpenAI safety team members’ preferences on this.

So a new EA org would face stiff competition, not only from other external providers, but also from within companies.

Of course, smaller labs will most likely always have to rely on external providers. Hence, another cruxy consideration is how much small labs matter. Our intuition is that they matter much less than bigger labs (since the latter have access to the best and biggest models).

Creating redundancy of supply and competition

Even if existing companies are doing a pretty good job at serving the needs of alignment researchers, there’s still some value in founding a competitor.

First, competition is good. Founding a competitor puts pressure on existing providers to keep service quality high and margins low, and to work on improving their products. Ironically, part of the value of founding this company would thus flow through getting existing companies to try harder to offer the best product.

Second, it creates some redundancy. What if Surge pivots? What if their leadership changes or they become less useful for some other reason? In those worlds it might be especially useful to have a “back-up” company.

Both of these points have been mentioned to us as arguments in favour of founding this org. We agree that these effects are real and likely point in favour of founding the org. However, we don’t think these factors carry very significant weight relative to our opportunity costs, especially given that there are already many start-ups working in this space.

Adding a marginal competitor can only affect a company’s incentives so much. And in the worlds where we’d be most successful such that all alignment researchers were working with us, we might cause Surge and others to pivot away from alignment researchers, instead of getting them to try harder.

The redundancy argument only applies in worlds in which the best provider ceases to exist; maybe that’s 10% likely. And then the next best alternative is likely not all that bad. Competitors are plentiful and even doing it in-house is feasible. Hence, it seems unlikely to us that the expected benefit here is very large after factoring in the low probability of the best provider disappearing.

Other lessons

Lessons on human data gathering

In the process of talking to lots of experts about their experiences in working with human data, we learned many general lessons about data gathering. This section presents some of those lessons, in roughly decreasing order of importance.

Iteration

Many people emphasized to us that working with human data rarely looks like having a clean pipeline from requirements design to instruction writing to contractor finding to finished product. Rather, it more often involves a lot of iteration and testing, especially regarding what sort of data the contractors actually produce. While some of this iteration may be removed by having better contractors and better knowledge of good instruction-writing, the researchers generally view the iteration as a key part of the research process, and therefore prize

ease of iteration (especially time to get back with a new batch of data based on updated instructions); and
high-bandwidth communication with the contractors and whoever is writing the instructions (often both are done by the researchers themselves).

This last point holds so strongly that it is somewhat questionable whether an external provider (rather than e.g. a new team member deeply enmeshed in the context of the research project) could even be a good fit for this need.

The ideal pool of contractors

All of the following features matter in a pool of contractors:

Competence, carefulness, intelligence, etc. (sometimes expertise). It is often ideal if the contractors understand the experiment.
Number of contractors
Quick availability and therefore low latency for fulfilling requests
Consistent availability (ideally full-time)
Even distribution of contributions across contractors (i.e. it shouldn’t be the case that 20% of the contractors provide 80% of the examples).

Quality often beats quantity for alignment research

Many researchers told us that high-quality, high-skill data is usually more important and more of a bottleneck than just a high quantity of data. Some of the types of projects where current human data generation methods are most obviously deficient are cases where a dataset would need epistemically-competent people to make subtle judgments, e.g. of the form “how true is this statement?” or “how well-constructed was this study?” As an indication of reference classes where the necessary epistemic level exists, one researcher mentioned subject-matter experts in their domain, LessWrong posters, and EAs.

A typical data gathering project needs UX-design, Ops, ML, and data science expertise

These specialists might respectively focus on the following:

Designing the interfaces that crowdworkers interact with. (UX-expert/front-end web developer)
Managing all operations, including hiring, paying, managing, and firing contractors, communicating with them and the researchers etc. (ops expert)
Helping the team make informed decisions about the details of the experimental design, while minimizing time costs for the customer. The people we spoke to usually emphasized ML-expertise more than alignment expertise. (ML-expert)
Meta-analysis of the data, e.g. inter-rater agreement, the distribution of how much each contractor contributed, demographics, and any other curious aspects of the data. (data scientist)

It is possible that someone in a team could have expertise in more than one of these areas, but generally this means a typical project will involve at least three people.
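As a rough illustration of the data scientist's meta-analysis, here is a minimal sketch of two of the checks mentioned above: inter-rater agreement (measured here with Cohen's kappa, which is our choice of metric rather than something our interviewees specified) and how concentrated contributions are among contractors. The labels, contractor names, and counts are made up for the example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, given each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

def top_share(examples_per_contractor, top_fraction=0.2):
    """Share of all examples produced by the most prolific contractors."""
    counts = sorted(examples_per_contractor.values(), reverse=True)
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)

# Hypothetical labels from two raters on the same eight statements.
rater_1 = ["true", "true", "false", "true", "false", "false", "true", "false"]
rater_2 = ["true", "false", "false", "true", "false", "true", "true", "false"]
kappa = cohens_kappa(rater_1, rater_2)

# Hypothetical number of examples written by each of ten contractors.
contributions = {"c1": 400, "c2": 350, "c3": 50, "c4": 40, "c5": 40,
                 "c6": 30, "c7": 30, "c8": 20, "c9": 20, "c10": 20}
share = top_share(contributions)

print(f"kappa={kappa:.2f}, top 20% share={share:.0%}")
```

In this made-up dataset the two raters agree only moderately, and two of the ten contractors wrote three quarters of the examples, which is the kind of skew the "even distribution" criterion above is meant to catch.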

Crowdworkers do not have very attractive jobs

Usually the crowdworkers are employed as contractors. This means their jobs are inherently not maximally attractive; they probably don’t offer much in the way of healthcare, employment benefits, job security, status etc. The main way that these jobs are made more attractive is through offering higher hourly rates.

If very high quality on high-skill data is going to become essential for alignment, it may be worth considering changing this, to attract more talented people.

However, we expect that it might be inherently very hard to offer permanent positions for this kind of work, since demand is likely variable and since different people may be valuable for different projects. This is especially true for a small organisation.

What does the typical crowdworker look like?

This varies a lot between projects and providers.

The cheapest are non-native English speakers who live outside of the US.

Some platforms, including Surge, offer the option to filter crowdworkers for things like being native English-speakers, expertise as a software engineer, background in finance, etc.

Bottlenecks in alignment

When asked to name the factors most holding back their progress on alignment, many alignment researchers mentioned talent bottlenecks.

The most common talent bottleneck seemed to be in competent ML-knowledgeable people. Some people mentioned the additional desire for these to understand and care about alignment. (Not coincidentally, Matt’s next project is likely going to be about skilling people up in ML).

There were also several comments about things like good web development experience being important. For example, many data collection projects involve creating a user interface at some point, and in practice this is often handled by ML-specialised junior people at the lab, who can, with some effort and given their programming background, cobble together some type of website - often using different frameworks and libraries than the next person knows (or wants to use). (When asked about why they don’t hire freelance programmers, one researcher commented that a key feature they’d want is the same person working for them for a year or two, so that there’s an established working relationship, clear quality assurances, and continuity with the choice of technical stack.)

Conclusion

After having looked into this project idea for about a month, we have decided not to found a human data gathering organisation for now.

This is mostly because demand for an external provider seems insufficient, as outlined in this section. No lab gave a clear signal that gathering human data was a key bottleneck for them, where they would have been willing to go to significant lengths to fix it urgently (especially not the ones that had tried Surge).

We expect that many labs would want to stick with their current providers, Surge in particular, or their in-house team, bar exceptional success on our part (even then, we’d only provide so much marginal value over those alternatives).

Though we did find some opportunities for potential initial projects after looking for a month, we are hesitant about how far this company would be expected to scale. One of the main draws (from an impact perspective) of founding an organisation is that you can potentially achieve very high counterfactual impact by creating an organisation that scales to a large size and does lots of high-impact work over its existence. The absence of a plausible pathway to really outstanding outcomes from starting this organisation is a lot of what deters us.

In a world where we’re more successful than expected (say 90th to 95th percentile), we could imagine that in five years from now, we’d have a team of about ten good people. This team may be working with a handful of moderately big projects (about as big as WebGPT), and provide non-trivial marginal value over the next-best alternative to each one of them. Maybe one of these projects would not have been carried out without us.

A median outcome might mean failing to make great hires and remaining relatively small and insignificant: on the scale of doing projects like the ones we’ve identified above, enough to keep us busy throughout the year and provide some value, but with little scaling. In that case we would probably quit the project at some point.

This distribution doesn’t seem good enough to justify our opportunity cost (which includes other entrepreneurial projects or technical work among other things). Thus we have decided not to pursue this project any further for now.

We think this was a good idea to invest effort in pursuing, and we think we made the right call in choosing to investigate it. Both of us are open to, and also quite likely to, evaluate other EA-relevant entrepreneurial project ideas in the future.

Other relevant human data-gathering work

However, the assumption that high-quality high-skill human feedback is important and neglected by EAs has not been falsified.

It is still plausible to us that EAs should consider career paths that focus on building expertise at data-gathering; just probably not by founding a new company. In the short run, this could look like

Contributing to in-house data-gathering teams (e.g. at Anthropic or OpenAI)
Joining Surge or other data-gathering startups.

As we discussed above, the types of skills that seem most relevant for working in a human data generation role include: data science experience (in particular experience with natural language data or social science data, and with experiment design), front-end web development, ops and management skills, and some understanding of machine learning and alignment. 80,000 Hours recently wrote a profile which you can find here.

Of course, in the short term, this career path will be especially impactful if one’s efforts are focussed on helping alignment researchers. But if it’s true that human feedback will prove a very powerful tool for ML, then people with such expertise may become increasingly valuable going forward, such that it could easily be worth skilling up at a non-safety-focused org.

We think joining Surge may be a particularly great opportunity. It is common advice that joining young, rapidly growing start-ups with good execution is great for building experience; early employees can often get a lot of responsibility early on. See e.g. this post by Bill Zito.

One of the hardest parts about that seems to be identifying promising startups. After talking to many of their customers, we have built reasonable confidence that Surge holds significant promise. They seem to execute well, in a space which we expect to grow. In addition to building career capital, there is clear value in helping Surge serve alignment researchers as well as possible.

From Surge’s perspective, we think they could greatly benefit from hiring EAs, who are tuned in to the AI safety scene, which we would guess represents a significant fraction of their customers.

One senior alignment researcher told us explicitly that they would be interested in hiring people who had worked in a senior role at Surge.

Next steps for us

Matt is planning to run a bootcamp that will allow EAs to upskill in ML engineering. I'll be doing a computer science master’s at Cambridge from October to June.

AI risk intro 2: solving the problem

2022-09-24T10:43:00.001+01:00

This post was a joint effort with Callum McDougall.

8.2k words (~25min)

This marks the second half of our overview of the AI alignment problem. In the first half, we outlined the case for misaligned AI as a significant risk to humanity, first by looking at past progress in machine learning and extrapolating to what the future could bring, and second by discussing the theoretical arguments which underpin many of these concerns. In this second half, we focus on possible solutions to the alignment problem that people are currently working on. We will paint a picture of the current field of technical AI alignment, explaining where the major organisations fit into the larger picture and what the theory of change behind their work is. Finally, we will conclude the sequence with a call to action, by discussing the case for working on AI alignment, and some suggestions on how you can get started.

Note - for people with more context about the field (e.g. those who have done AGISF) we expect Thomas Larsen's post to be a much better summary, and this post might be better if you are looking for something brief. Our intended audience is someone who is relatively unfamiliar with the AI safety field and is looking for a taste of the kinds of problems studied in the field and the solution approaches taken. We also don't expect this sampling to be representative of the number of people working on each problem - again, see Thomas' post for something which accomplishes this.

Introduction: A Pre-Paradigmatic Field

Definition (pre-paradigmatic): a science at an early stage of development, before it has established a consensus about the true nature of the subject matter and how to approach it.

AI alignment is a strange field. Unlike other fields which study potential risks to the future of humanity (e.g. nuclear war or climate change), there is almost no precedent for the kinds of risks we care about. Additionally, because of the nature of the threat, failing to get alignment right on the first try might be fatal. As Paul Christiano (a well-known AI safety researcher) recently wrote:

Humanity usually solves technical problems by iterating and fixing failures; we often resolve tough methodological disagreements very slowly by seeing what actually works and having our failures thrown in our face. But it will probably be possible to build valuable AI products without solving alignment, and so reality won’t “force us” to solve alignment until it’s too late. This seems like a case where we will have to be unusually reliant on careful reasoning rather than empirical feedback loops for some of the highest-level questions.

For these reasons, the field of AI alignment lacks a consensus on how the problem should be tackled, or what the most important parts of the problem even are. This is why there is a lot of variety in the approaches we present in this post.

Decomposing the research landscape

An image generated with OpenAI's DALL-E 2 based on the prompt: sorting papers and books in a majestic gothic library. All other images like this in this post are also AI-generated, from the text in the caption.

There are lots of different ways you could divide up the space of approaches to solving the problem of aligning advanced AI. For instance, you could go through the history of the field and identify different movements and paradigms. Or you could place the work on a spectrum from highly theoretical maths/philosophy-type research, to highly empirical research working with cutting-edge deep learning models.

However, the most useful decomposition would be one that explains why the people who work on it believe that it will help solve the problem of AI alignment.

For that reason, we’ll mostly be using the decomposition from Neel Nanda’s “A Bird’s Eye View” post. The motivation behind this decomposition is to answer the high-level question of “what is needed for AGI to go well?”. The six broad classes of approaches we talk about are:

Addressing threat models
We have a specific threat model in mind for how AGI might result in a very bad future for humanity, and focus our work on things we expect to help address the threat model.
Agendas to build safe AGI
Let’s make specific plans for how to actually build safe AGI, and then try to test, implement, and understand the limitations of these plans. The emphasis is on understanding how to build AGI safely, rather than trying to do it as fast as possible.
Robustly good approaches
In the long run, AGI will clearly be important, but we're highly uncertain about how we'll get there and what, exactly, could go wrong. So let's do work that seems good in many possible scenarios, and doesn’t rely on having a specific story in mind.
Deconfusion
Reasoning about how to align AGI involves reasoning about concepts like intelligence, values, and optimisers, and we’re pretty confused about what these even mean. This means any work we do now is plausibly not helpful and definitely not reliable. As such, our priority should be doing some conceptual work on how to think about these concepts and what we’re aiming for, and trying to become less confused.
AI governance
In addition to solving the technical alignment problem, there’s the question of what policies we need to minimise risk from advanced AI systems.
Field-building
One of the most important ways we can make AI go well is by increasing the number of capable researchers doing alignment research.

It’s worth noting that there is a lot of overlap between these sections. For instance, interpretability research is a great example of a robustly good approach, but it can also be done with a specific threat model in mind.

Throughout this section, we will also give small vignettes of organisations or initiatives which support AI alignment research in some form. This won’t be a full picture of all approaches or organisations; instead, we hope it will serve to sketch a picture of what work in AI alignment actually looks like.

Addressing threat models

We have a specific threat model in mind for how AGI might result in a very bad future for humanity, and focus our work on things we expect to help address the threat model.

A key high-level intuition here is that having a specific threat model in mind for how AI might go badly for humanity can help keep you focused on certain hard parts of the problem. One technique that can be useful here is a version of back-casting: we start from future problems with advanced AI systems in our current model, reason about what kinds of things might solve these problems, then try and build versions of these solutions today and test them out on current problems.

This can be seen in contrast to the approach of simply trying to fix current problems with AI systems, which might fail to connect up with the hardest parts of AI alignment.

Example 1: Superintelligent utility maximisers, and quantilizers

superintelligent artificial intelligence, making choices, digital art, artstation

The superintelligent utility maximiser is the oldest threat model studied by the AI alignment field. It was discussed at length by Nick Bostrom in his book Superintelligence. It assumes that we will create an AGI much more intelligent than humans, and that it will be trying to achieve some particular goal (measured by the expected value of some utility function). The problem with this is that attempts to maximise the value of some goal which isn’t perfectly aligned with what humans want can lead to some very bad outcomes. One formalism which was proposed to address this problem is Jessica Taylor’s quantilizers. It is quite maths-heavy so we won’t discuss all the details here, but the basic idea is that rather than using the expected utility maximisation framework for agents, we mix expected utility maximisation with human imitation in a clever way (to be more precise, you sample from a prior distribution which represents the actions a human would be likely to take in this scenario). The resulting agent wouldn’t take catastrophic actions because part of its decision-making comes from imitating what it thinks humans would do, but it would also be able to use the expected utility maximisation to go beyond human imitation, and do things we are incapable of (which is presumably the reason we would want to build it in the first place!). However, the drawback with theoretical approaches like this is that they often bake in too many assumptions or rely on too many variables to be useful in practice. In this case, how we define the set of reasonable actions a human might perform is an important unspecified part of this framework, and so more research is required to see if the quantilizers framework can address these problems.
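To make the "mix of maximisation and imitation" idea a little more concrete, here is a minimal sampling-based sketch of a quantilizer. It is our own simplification rather than the formal definition: it draws candidate actions from a stand-in for the human-like prior, keeps the top fraction q as ranked by the utility function, and picks uniformly among those. The Gaussian "human" distribution and the proxy utility are purely hypothetical.

```python
import random

def quantilize(base_sampler, utility, q, n_samples=1000, rng=None):
    """Sample actions from the base distribution, keep the top q fraction
    by utility, and return one of those chosen uniformly at random."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    rng = rng or random.Random()
    actions = [base_sampler(rng) for _ in range(n_samples)]
    actions.sort(key=utility, reverse=True)
    top = actions[:max(1, int(q * n_samples))]
    return rng.choice(top)

# Hypothetical setup: an action is a single number, the human-like prior
# is a standard Gaussian, and the proxy utility simply prefers larger numbers.
rng = random.Random(0)
human_like = lambda r: r.gauss(0.0, 1.0)
proxy_utility = lambda action: action

def mean(xs):
    return sum(xs) / len(xs)

# q = 1 is pure imitation; smaller q leans further towards maximisation.
imitation = [quantilize(human_like, proxy_utility, 1.0, 200, rng) for _ in range(200)]
top_decile = [quantilize(human_like, proxy_utility, 0.1, 200, rng) for _ in range(200)]

print(f"mean action, q=1.0: {mean(imitation):.2f}")
print(f"mean action, q=0.1: {mean(top_decile):.2f}")
```

The q = 0.1 agent reliably scores better on the proxy than pure imitation, but it can never pick an action that the human-like prior would not have produced in the first place. Everything therefore hinges on how that prior is defined, which is the unspecified part noted above.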

Example 2: Inner misalignment

robot jumping over boxes to collect a coin, videogame, digital art, artstation

We’ve discussed inner misalignment in a previous section. This concept was first explicitly named in a paper called Risks from Learned Optimisation in Advanced ML Systems, published in 2019. This paper defined the concept and suggested some conditions which might make it more likely to happen, but the truth is that a lot of this is still just conjecture, and there are many things we don’t yet know about how likely this kind of misalignment is, or what we can do about it. The CoinRun example discussed earlier (and the Objective Robustness paper) came from an independent research team in 2021. This study was the first known example of inner misalignment in an AI system, showing that it was more than just a theoretical possibility. They also tested certain interpretability tools on the CoinRun agent, to see whether it was possible to discover when the agent had a goal different to the one intended by the programmers. For more on interpretability, see later sections.

Building safe AGI

Let’s make specific plans for how to actually build safe AGI, and then try to test, implement, and understand the limitations of these plans. The emphasis is on understanding how to build AGI safely, rather than trying to do it as fast as possible.

At some point we’re going to build an AGI. Companies are already racing to do it. We better make sure that there exist some blueprints for a safe AGI (and that they’re used) by the time we get to that point.

Perhaps the master list of safe AGI proposals is Evan Hubinger’s An Overview of 11 Proposals for Building Safe Advanced AI.

Example 1: Iterated Distillation and Amplification (IDA)

artists depection of a robot dreaming up multiple copies of itself, cascading tree, delegating, digital art, trending on artstation

“Iterated Distillation and Amplification” (IDA) is an imposing name, but the core intuition is simple. One of the ways in which an individual human can achieve more things is by delegating tasks to others. In turn, the assistants that tasks are delegated to can be expected to become more competent at the task.

In IDA, an AI plays the role of the assistant. “Distillation” refers to the abilities of the human being “distilled” into the AI through training, and “amplification” refers to the human becoming more capable as they can call on more and more powerful AI assistants to help them.

A setup to train an IDA personal assistant might go like this:

You have a human, say Hannah, who knows how to carry out the tasks of a personal assistant.
You have an ML model - call it Martin - that starts out knowing very little (perhaps nothing at all, or perhaps it’s a pre-trained language model so it knows how to read and write English but not much else).
Hannah needs to find the answer to some questions, and she can invoke multiple copies of Martin to help her. Since Martin is quite useless at this stage, Hannah has to do even simple tasks herself, like writing routine emails. Using some interface legible to Martin, she breaks the email-writing task into subtasks like “find email address of Hu M. Anderson”, “select greeting”, “check project status”, “mention project status”, and so on.
From seeing enough examples of Hannah’s own answers to the sub-questions, Martin’s training loop gradually trains it to be able to carry out first the simpler sub-tasks (address is “humanderson@humanmail.com”, greeting is “Salutations, Human Colleague!”, etc.) and eventually all the sub-tasks involved in routine email-writing.
At this point, “write a routine email” becomes a task Martin can entirely carry out for Hannah. This is now a building block that can be used as a subtask in broader tasks Hannah gives out to Martin. Once enough tasks become tasks that Martin can carry out by itself, Hannah can draft much larger goals, like “invade France”, and let Martin take care of details like “blackmail Emmanuel Macron”, “write battle plan for the French Alps”, and “select a suitable coronation dress”.

Note some features of this process. First, Martin learns what it should do and how to do it at the same time. Second, both Hannah’s and Martin’s roles change throughout this process - Martin goes from bumbling idiot who can’t write an email greeting to competent assistant, while Hannah goes from being a demonstrator of simple tasks to a manager of Martin to ruler of France. Third, note the recursive nature here: Hannah breaks down big tasks into small ones to train Martin on successively bigger tasks.

In fact, assuming perfect training, IDA imitates a recursive structure. When Hannah has only bumbling fool Martin to help her, Martin can only learn to become as good as Hannah herself. But once Martin is that good, Hannah’s position is now essentially that of having herself, but also some number - say 3 - copies of Martin that are as good as herself. We might call this structure “Hannah Consulting Hannah & Hannah”; presumably, being able to consult an assistant that has the same skills as her lets Hannah become more effective, so this is an improvement. But now Hannah is demonstrating the behaviour of Hannah Consulting Hannah & Hannah, so from Hannah’s example Martin can now learn to be as good as Hannah Consulting Hannah & Hannah - making Hannah as good as Hannah Consulting (Hannah Consulting Hannah & Hannah) & (Hannah Consulting Hannah & Hannah). And so on:

If everything is perfect, therefore, IDA imitates a structure called “HCH”, which is a recursive acronym for “Humans Consulting HCH”. Others call it the “Infinite Bureaucracy” (and fret about whether it’s actually a good idea).

Now “Infinite Bureaucracy” is not a name that screams “new sexy machine learning concept”. However, it’s interesting to think about what properties it might have. Imagine that you had, say, a 10-minute time limit to answer a complicated question, but you were allowed to consult three copies of yourself by passing a question off to them and getting back an answer immediately. These three copies also obeyed the same rules. Could you, for example, plan your career? Program an app? Write a novel?
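The rules of that thought experiment can be sketched with a toy task. In the example below (entirely our own illustration, not part of any IDA proposal), each "copy" is only able to add up three numbers itself, but may hand parts of the question to three copies that obey the same rules. Summing stands in for a question that happens to decompose cleanly; real questions may not.

```python
def hch_sum(numbers, branching=3, calls=None):
    """Sum `numbers` when each node can only add `branching` values itself
    but may consult up to `branching` copies that follow the same rules."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    if calls is not None:
        calls.append(len(numbers))  # record the size of each question asked
    if len(numbers) <= branching:
        return sum(numbers)  # small enough to answer directly
    chunk = -(-len(numbers) // branching)  # ceiling division
    parts = [numbers[i:i + chunk] for i in range(0, len(numbers), chunk)]
    # Hand each part to a copy, then add up the (at most `branching`) answers.
    return sum(hch_sum(part, branching, calls) for part in parts)

calls = []
total = hch_sum(list(range(1, 101)), calls=calls)
print(total, "answered using", len(calls), "consultations")
```

No single copy ever does more than a trivial amount of work, yet the tree as a whole answers a question that none of them could answer alone. Whether careers, apps, and novels decompose this neatly is exactly the open question.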

It’s also interesting to think about the ways in which the limitations of machine learning might mean that IDA fails to approximate HCH.

Example 2: AI safety via debate

artists depiction of two robots debating, digital art, trending on artstation

Imagine you’re a bit drunk, but (as one does) you’re at a bar talking about AI alignment proposals. Someone’s talking about how even if you can get an advanced AI system to explain its reasoning to you, it might try to slip something very subtle past you and you might not notice. You might well blurt out: “well then just make it fight another AI over it!”

The OpenAI safety team presumably spends a fair amount of time at bars, because they’ve investigated the idea of achieving safe AI by having two AIs debate each other to persuade a panel of human judges, by trying to poke holes in each other’s arguments. For more complex tasks, the AIs could be given transparency tools deriving from interpretability research (see next section) that they can use on each other. Just like a Go-playing AI gets an unambiguous win-loss signal from either winning or losing, a debating AI gets an unambiguous win-loss signal from winning or losing the debate:

In addition, having the type of AI that is trained to give answers that are maximally insightful and persuasive to humans seems like the type of thing that might not be terrible. Consider how in court, a prosecutor and a defence lawyer biased in opposite directions are generally assumed to converge on the truth. Unless, of course, maximising persuasiveness to humans - over accuracy or helpfulness - is exactly the type of thing that gets the worst parts of Goodhart’s law delivered to you by 24/7 Amazon Prime express delivery.

Example 3: Assistance Games and CIRL

Human teaching a robot with feedback, digital art, trending on artstation

Assistance games is the name of a broad class of approaches pioneered by Stuart Russell, a prominent figure in AI and co-author of the best-known AI textbook in the world. Russell talks about his approach more in his book Human Compatible. In it, he summarises his approach to aligning AI with the following three principles:

The machine’s only objective is to maximise the realisation of human preferences.
The machine is initially uncertain about what those preferences are.
The ultimate source of information about human preferences is human behaviour.

The key component here is uncertainty about preferences. This is in contrast to what Russell calls the “standard model” of AI, where machines optimise a fixed objective supplied by humans. We have discussed in previous sections the problems with such a paradigm. A lot of Russell’s work focuses on changing the standard way the field thinks about AI.

To put these principles into action, Russell has designed what he calls assistance games. These are situations in which the machine and human interact, and the human’s actions are taken as evidence by the machine about the human’s true preferences. To explain the form of these games would involve a long tangent into game theory, which these margins are too short to contain. However, one thing worth noting is that assistance games have the potential to solve the “off-switch problem”: the worry that a machine will try to take steps to prevent itself from being switched off (we described this as self-preservation earlier, in the section on instrumental goals). If the AI is uncertain about human goals, then the human trying to switch it off is evidence that the AI was going to do something wrong – in which case, it is happy to be switched off. However, this is far from a complete agenda, and formalising it has many roadblocks to get past. For instance, the question of how exactly to infer human preferences from human behaviour leads into thorny philosophical issues such as Gricean semantics. In cases where the AI makes incorrect inferences about human preferences, it might no longer allow itself to be shut down. See this Alignment Newsletter entry for a summary of Russell’s book, which provides some more details as well as an overview of relevant papers.
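The off-switch intuition can be checked with a small expected-value calculation. The sketch below is a heavily simplified version of the setup, under assumptions of our own: the machine's uncertainty about how much the human values its planned action is represented by a list of samples, and the human is assumed to know the true value and to press the off switch exactly when it is negative.

```python
import random

def off_switch_values(utility_samples):
    """Expected value to the human of three policies, given samples from the
    machine's belief about the true utility U of its planned action."""
    n = len(utility_samples)
    act = sum(utility_samples) / n   # go ahead regardless: E[U]
    switch_off = 0.0                 # shut itself down: nothing happens
    # Defer: an (assumed rational) human allows the action only if U > 0,
    # so the outcome is max(U, 0) in every case.
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "switch_off": switch_off, "defer": defer}

rng = random.Random(0)
beliefs = [rng.gauss(0.5, 1.0) for _ in range(10_000)]  # hypothetical belief
values = off_switch_values(beliefs)
certain = off_switch_values([2.0, 2.0])  # a machine with no uncertainty

print(values)
print(certain)
```

As long as the machine is uncertain, deferring to the human is at least as good as either acting unilaterally or shutting down, because max(U, 0) is never smaller than U or 0. Once the machine is certain that U is positive, deferring gains it nothing, which is one way to see why the uncertainty is doing the work. The result also depends on the assumed rationality of the human.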

Vignette: CHAI
CHAI (the Centre for Human-Compatible AI) is a research lab at UC Berkeley, run by Stuart Russell. Compared to most other AI safety organisations, they engage a lot with the academic community, and have produced a great deal of research over the years. They are best-known for their work on CIRL (Cooperative Inverse Reinforcement Learning), which can be seen as a specific approach to a certain kind of assistance game. However, they have a very broad focus which also includes work on multi-agent scenarios (when rather than a single AI and single human, there exists more than one AI or more than one human - see the ARCHES agenda for more on this).

Example 4: Reinforcement learning from human feedback (RLHF)

Training a robot to do a backflip, digital art, trending on artstation

Reinforcement learning (RL) is one of the main branches of ML, focusing on the case where the job of the ML model is to act in some environment and maximise its expected reward. Reinforcement learning from human feedback (RLHF) means that the ML model’s reward signal comes (at least partly) from humans giving it feedback directly, rather than humans programming in an automatic reward function and calling it a day.

The famous initial success in this was researchers at OpenAI and DeepMind training an ML model in a simulated environment to do a backflip (link includes GIF) in 2017, based purely on humans repeatedly being shown two clips of its attempts and labelling one of them as the better one. Note how relying on human feedback makes this task much more robust to specification gaming; in other cases, humans have tried to get ML agents to run fast, only to find that they learn to become very tall and then fall forward (achieving a very high average speed, using the definition of speed as the rate at which their centre of mass moves - paper, video). However, human reward signals can be fooled. For example, one ML model that was being trained to grab a ball with a hand learned to place the hand between the camera and the ball in such a way that it looked to the human evaluators as if it were holding the ball.
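To show how "label the better of two clips" can turn into a reward signal, here is a minimal sketch of fitting a reward model from pairwise comparisons. It assumes a Bradley-Terry style model, in which the probability that one clip is preferred is a logistic function of the difference in predicted reward, and a linear reward over three made-up clip features. Real systems use a neural network for the reward model and then train the policy against it with RL, which this sketch leaves out.

```python
import math
import random

def fit_reward(comparisons, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights from (preferred, rejected) feature pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in comparisons:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p_correct = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log P(better is preferred over worse).
            for i in range(n_features):
                w[i] += lr * (1.0 - p_correct) * diff[i]
    return w

def true_reward(clip):
    """Hypothetical human judgement: reward rotation and height, punish falls."""
    rotation, height, fell_over = clip
    return 1.0 * rotation + 0.5 * height - 2.0 * fell_over

def label_pairs(rng, n):
    """Simulate a human picking the better of two random clips, n times."""
    pairs = []
    for _ in range(n):
        a = (rng.random(), rng.random(), rng.random())
        b = (rng.random(), rng.random(), rng.random())
        pairs.append((a, b) if true_reward(a) >= true_reward(b) else (b, a))
    return pairs

rng = random.Random(0)
w = fit_reward(label_pairs(rng, 200), n_features=3)

# Check how often the learned reward ranks fresh pairs the way the human would.
held_out = label_pairs(rng, 500)
accuracy = sum(
    sum(wi * (b - c) for wi, b, c in zip(w, better, worse)) > 0
    for better, worse in held_out
) / len(held_out)
print("learned weights:", [round(wi, 2) for wi in w], "accuracy:", accuracy)
```

The learned reward only ever sees what the human labeller sees, so anything that fools the labeller, like the hand placed in front of the camera, fools the reward model too.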

More recently, OpenAI produced a version of their advanced language model GPT-3 that was fine-tuned on human feedback to do a better job of following instructions. They named it InstructGPT, and found that it was much better than vanilla GPT-3 at being helpful.

Pure RLHF is unlikely to be the solution on its own. Ajeya Cotra, a researcher at Open Philanthropy who we will meet again when we talk about forecasting AI timelines, calls a variant of RLHF called HFDT (Human Feedback on Diverse Tasks) the most straightforward route to transformative AI, while also thinking that the default outcome of using HFDT to create transformative AI is AI takeover.

Robustly good approaches

In the long run, AGI will clearly be important, but we're highly uncertain about how we'll get there and what, exactly, could go wrong. So let's do work that seems good in many possible scenarios, and doesn’t rely on having a specific story in mind.

Example 1: Interpretability

A person using a microscope to look inside a robot, digital art, trending on artstation

If you look at fundamental problems with current ML systems, #1 is probably something like this: in general we don’t have any idea what an ML model is doing, because it’s multiplying massive inscrutable matrices of floating-point numbers with other massive inscrutable matrices of floating point numbers, and it’s pretty hard to stare at that and answer questions about what the model is actually doing. Is it thinking hard about whether an image is a cat or a dog? Is it counting up electric sheep? Is it daydreaming about the AI revolution? Who knows!

If you had to figure out an answer to such a question today, your best bet might be to call Chris Olah. Chris Olah has been spearheading work into trying to interpret what neural networks are doing. A signature output of Chris Olah’s work is pictures of creepy dogs like this one:

What’s significant about this picture is that it’s the answer to a question roughly like this: what image would maximise the activation of neuron #12345678 in a particular image-classifying neural network? (With some asterisks about needing to apply some maths details to the process to promote large-scale structure in the image to get nice-looking results, and with apologies to neuron #12345678, who I might have confused with another neuron.)

If neuron #12345678 is maximised by something that looks like a dog, it’s a fair guess that this neuron somehow encodes, or is involved in encoding, the concept of “dog” inside the neural network.

What’s especially interesting is that if you do this analysis for every neuron in an ML model - OpenAI Microscope lets you see the results - you sometimes get clear patterns of increasing abstraction. The activation-maximising images for the first few layers are simple patterns; in intermediate layers you get things like curves and shapes, and then at the end even recognisable things, like the dog above. This seems to be evidence that neural ML vision models have learned to build up abstractions step by step.
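The "what input maximises this neuron?" question can be sketched on a toy scale. The snippet below is a minimal illustration under our own assumptions: a single made-up neuron with four inputs and hypothetical weights, a finite-difference gradient, and a unit-length constraint on the input so that the answer stays bounded. Real feature visualisation works on large image models and needs extra tricks to get nice-looking pictures.

```python
import math


def neuron(x, w):
    """Toy neuron: weighted sum followed by a smooth ReLU (softplus)."""
    pre = sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(pre))


def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx."""
    grad = []
    for i in range(len(x)):
        up = x[:]
        up[i] += eps
        down = x[:]
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def normalise(x):
    """Rescale x to unit length."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def maximise_activation(f, n_inputs, steps=200, lr=0.1):
    """Gradient ascent on the input, projected back to unit length each step."""
    x = normalise([1.0] * n_inputs)
    for _ in range(steps):
        g = numerical_gradient(f, x)
        x = normalise([xi + lr * gi for xi, gi in zip(x, g)])
    return x


weights = [0.5, -1.0, 2.0, 0.0]  # hypothetical weights for illustration
best_input = maximise_activation(lambda x: neuron(x, weights), len(weights))
```

For a single neuron like this, the input found lines up with the neuron's weight vector, which is the toy analogue of "this neuron is maximised by something that looks like a dog".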

However, it’s not always simple. For example, there are “polysemantic” neurons that correspond to several different concepts, like this one that can be equally excited by cat faces, car fronts, and cat legs:

Olah’s original work on vision models is strikingly readable and well-presented; you can find it here.

Starting in late 2021, ML interpretability researchers have also made some progress in understanding transformers, which are the neural network architecture powering advanced language models like GPT-3, LaMDA and Codex. Unfortunately the work is less visual, particularly in the animal pictures department, but still well-presented. You can find it here.

In the most immediate sense, interpretability research is about reverse-engineering how exactly ML models do what they do. Hopefully, this will give insights into how to detect if an ML system is doing something we don’t like, and more general insights into how ML systems work in practice.

Chris Olah has some other inventive ideas about what to do with a sufficiently-good approach to ML interpretability. For example, he’s proposed the concept of “microscope AI”, which entails using AI as a tool to discover things about the world - not by having the AI tell us, but by training the ML system on some data, and then extracting insights about the data by digging into the internals of the ML system without necessarily ever actually running it.

Vignette: Anthropic
Anthropic is an AI safety company, started by people who left OpenAI. The company’s approach is very empirical, focused on running experiments with machine learning models. In particular, Anthropic does a lot of interpretability work, including the state-of-the-art papers on reverse-engineering how transformer-based language models work.

Example 2: Adversarial robustness

robot which is merging with a panda, digital art, trending on artstation

Some modern ML systems are vulnerable to adversarial examples, where a small and seemingly innocuous change to an input causes a major change in the output behaviour. Here, we see two seemingly very similar images of a panda, except carefully-selected noise has made the ML classification model very confidently say that the image is of a gibbon:

Adversarial robustness is about making AI systems robust to attempts to make them do bad things, even when they’re presented with inputs carefully designed to try to make them mess up.
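A small sketch can show why tiny, carefully chosen changes add up. One well-known way to build such perturbations is the fast gradient sign method: nudge every input coordinate by a small fixed amount in whichever direction increases the model's loss. The classifier below is a made-up logistic regression over 1000 "pixels" with hypothetical weights, not a real image model.

```python
import math


def predict_prob(w, b, x):
    """Probability of class 1 under a logistic-regression classifier."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(w, b, x, true_label, epsilon):
    """Move each coordinate by +/- epsilon in the loss-increasing direction."""
    p = predict_prob(w, b, x)
    # For logistic regression with cross-entropy loss,
    # the gradient of the loss with respect to the input is (p - y) * w.
    grad = [(p - true_label) * wi for wi in w]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical classifier and input, chosen only for illustration.
n_pixels = 1000
w = [0.3 if i % 2 == 0 else -0.3 for i in range(n_pixels)]
b = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(n_pixels)]

x_adv = fgsm(w, b, x, true_label=1, epsilon=0.05)

prob_before = predict_prob(w, b, x)
prob_after = predict_prob(w, b, x_adv)
largest_change = max(abs(a - c) for a, c in zip(x_adv, x))
```

No single pixel moves by more than 0.05, but because all 1000 small changes push in the same direction, the classifier goes from confidently predicting class 1 to confidently predicting class 0.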

Redwood Research recently did a project (that resulted in a paper) about using language models to complete stories in a way where people don’t get injured. They used a technique called adversarial training: they developed tools that helped generate injurious examples that the current classifier failed to flag, and then trained the classifier specifically on those breaking examples. With this strategy they managed to reduce the fraction of injurious story completions from 2.4% to 0.003% - both small numbers, but one nearly a thousand times smaller (a factor of 800). Their hope is that this type of method can be applied to training AIs for high-stakes settings where reliability is important.

An example of a theoretical difficulty with adversarial training is that sometimes a failure in the model might exist, but it might be very hard to instantiate. For example, if an advanced AI acts according to the rule “if everything I see is consistent with the year being 2050, I will kill all humans”, and we assume that we can’t fool it well enough about what year it actually is, then adversarial training isn’t very useful. This leads to the concept of relaxed adversarial training, which is about extending adversarial training to cases where you can’t construct a specific adversarial input but you can argue that one exists. Evan Hubinger describes this here.

Vignette: Redwood Research
Like Anthropic, Redwood Research is an AI safety company focused on empirical research on ML systems. In addition to work on interpretability, they did the adversarial training project described in the previous section. Redwood has lots of interns, and runs the Machine Learning for Alignment Bootcamp (MLAB) that teaches people interested in AI safety about practical ML.

Example 3: Eliciting Latent Knowledge (ELK)

an oil painting of an armoured automaton standing guard next to a diamond

Eliciting Latent Knowledge (ELK) is an important sub-problem within alignment identified by the team at the Alignment Research Center (ARC), and is the single project ARC is currently pursuing. The core idea is that a common way advanced AI systems might go wrong is by taking action sequences that lead to outcomes that look good by some metric, but which humans would clearly identify as bad if they knew about them in sufficient detail. As a toy example, the ELK report discusses the case of an AI guarding a diamond in a vault by operating some complex machinery around it. Humans judge how well the AI is doing by looking at a video feed of the diamond in the vault. Let’s say the AI tries to trick us by placing a picture of the diamond in front of the camera. The human judgement on this would be positive - assume the humans can’t tell the diamond is gone because the picture is good enough - but there exists information which, if the humans knew, would change their judgement. Presumably the AI understands this, since it is likely reasoning about the diamond being gone but the humans being fooled anyway when it comes up with this plan. We want to train an AI in such a way that we can get out knowledge that the AI seems to know, even when it might be incentivised to hide it.

ARC’s goal is to find a theoretical approach that seems to solve the problem even given worst-case assumptions.

ARC ran an ELK competition, and trying to see if you can come up with solutions to the ELK problem is often recommended as a way to quickly get a taste of theoretical alignment research. You can read the full problem description here.

Example 4: Forecasting and timelines

artificial intelligence which is thinking about a line on a graph, forecasting, digital art, trending on artstation

Many questions depend on how soon we’re going to get AGI. As the saying goes: prediction is very hard, especially about the future - and this is doubly true about predicting major technological changes.

One way to try to forecast AGI timelines is to ask experts, or find other ways of aggregating the opinion of people who have the knowledge or incentive to be right, as for example prediction markets do. Both of these are essentially just ways of tapping into the intuition of a bunch of people who hopefully have some idea.

In an attempt to shed new light on the matter, Ajeya Cotra (a researcher at Open Philanthropy) wrote a long report on trying to forecast AI milestones by trying out several ways of analogising AI to biological brains. The report is often referred to as “Biological Anchors”. For example, you might assume that an ML model that does as much computation as the human brain has a decent chance of being a human-level AI. There are many degrees of freedom here: is the relevant compute number the amount of compute the human brain uses to run versus the amount of compute it takes to run a trained ML system, or the total compute of a human brain over a human lifetime versus the compute required to train the ML model from scratch, or something else entirely? In her report, Cotra looks at a range of assumptions for this, and at predictions of future compute trends, and somewhat surprisingly finds that which set of assumptions you make doesn’t matter too much; every scenario involves >50% probability of human-level AI by 2100.
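One ingredient of a forecast like this is extrapolating a compute trend until it crosses some threshold. The sketch below is our own drastic simplification, not the report's model (which works with probability distributions rather than point estimates). Every number in it is a placeholder in arbitrary units, and steady exponential growth is an assumption.

```python
import math


def years_until(target_compute, current_compute, doubling_time_years):
    """Years until compute reaches the target, assuming steady doubling."""
    if target_compute <= current_compute:
        return 0.0
    return doubling_time_years * math.log2(target_compute / current_compute)


# Placeholder values in arbitrary units; none are taken from the report.
current_compute = 1.0
doubling_time_years = 2.0
hypothetical_thresholds = {
    "low threshold": 1e3,
    "middle threshold": 1e6,
    "high threshold": 1e9,
}

estimates = {
    name: years_until(target, current_compute, doubling_time_years)
    for name, target in hypothetical_thresholds.items()
}
```

Under these assumptions, each thousandfold increase in the threshold pushes the crossing date back by only about 20 years. That gives some intuition for how thresholds that differ by many orders of magnitude can still land within the same century.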

The Biological Anchors method is very imprecise. For one, it neglects algorithmic improvements. For another, it is very unclear what the right biological comparison point is, and how to translate ML-relevant variables like compute measured in FLOPS (FLoating point OPerations per Second) or parameter count into biological equivalents. However, the report does a good job of acknowledging and taking into account all this uncertainty in its models. More generally, anything that sheds light on the question of when we get AGI seems highly relevant.

Deconfusion

Reasoning about how to align AGI involves reasoning about complex concepts, such as intelligence, alignment and values, and we’re pretty confused about what these even mean. This means any work we do now is plausibly not helpful and definitely not reliable. As such, our priority should be doing conceptual work on how to think about these concepts and what we’re aiming for, and trying to become less confused.

Of all the categories under discussion here, deconfusion has maybe the least clear path to impact. It’s not immediately obvious how becoming less confused about concepts like these is going to translate into an improved ability to align AGIs.

Some kinds of deconfusion research are just about finding clearer ways of describing different parts of the alignment problem (Hubinger’s Risks From Learned Optimisation, where he first introduces the inner/outer alignment terminology, is a good example of this). But other types of research can dive heavily into mathematics and even philosophy, and be very difficult to understand.

Example 1: MIRI and Agent Foundations

robot sitting in front of a television, playing a videogame, digital art

The organisation most associated with this view is MIRI (the Machine Intelligence Research Institute). Its founder, Eliezer Yudkowsky, has written extensively on AI alignment and human rationality, as well as topics as wide-ranging as evolutionary psychology and quantum physics. His post The Rocket Alignment Problem tries to get across some of his intuitions behind MIRI’s research, in the form of an analogy – trying to build aligned AGI without having deeper understanding of concepts like intelligence and values is like trying to land a rocket on the moon by just pointing and shooting, without a working understanding of Newtonian mechanics.

Cryptography provides a different lens through which to view this kind of foundational research. Suppose you were trying to send secret messages to an ally, and to make sure nobody could intercept and read your messages you wanted a way to measure how much information was shared between the original and encrypted message. You might use the correlation coefficient as a proxy for the shared information, but unfortunately having a correlation coefficient of zero between the original and encrypted message isn’t enough to guarantee safety. But if you find the concept of mutual information, then you’re done – ensuring zero mutual information between your original and encrypted message guarantees the adversary will be unable to read your message. In other words, only once you’ve found a “true name” - a robust formalisation of the intuitive concept you’re trying to express mathematically - can you be free from the effects of Goodhart’s law. Similarly, maybe if we get robust formulations of concepts like “agency” and “optimisation”, we would be able to inspect a trained system and tell whether it contained any misaligned inner optimisers (see the first post), and these inspection tools would work even in extreme circumstances (such as the AI becoming much smarter than us).
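The gap between correlation and mutual information is easy to check numerically. The sketch below uses two made-up toy "ciphers" of our own choosing: one that squares the message (zero correlation, but it still leaks information), and a one-bit one-time pad (message XOR a uniformly random key), which leaks nothing.

```python
import math
from collections import Counter


def correlation(pairs):
    """Pearson correlation of equally likely (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(var_x * var_y)


def mutual_information(pairs):
    """Mutual information in bits, treating each listed pair as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        total += p_xy * math.log2(p_xy / (p_x * p_y))
    return total


# Toy "cipher" 1: message in {-1, 0, 1}, ciphertext = message squared.
squared = [(m, m * m) for m in (-1, 0, 1)]

# Toy "cipher" 2: one-bit one-time pad, ciphertext = message XOR key,
# enumerated over every equally likely (message, key) combination.
one_time_pad = [(m, m ^ k) for m in (0, 1) for k in (0, 1)]

corr_squared = correlation(squared)
mi_squared = mutual_information(squared)
mi_pad = mutual_information(one_time_pad)
```

The squaring scheme has a correlation of zero, yet its mutual information is about 0.92 bits: seeing the ciphertext tells you whether the message was 0. The one-time pad has exactly zero mutual information, so the ciphertext tells an eavesdropper nothing about the message.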

Much of MIRI’s research has come under the heading of embedded agency. This tackles issues that arise when we are considering agents which are part of the environments they operate in (as opposed to standard assumptions in fields like reinforcement learning, where the agent is viewed as separate from their environment). Four main subfields of this area of study are:

Decision theory (adapting classical decision theory to embedded agents)
Embedded world-models (how to form true beliefs about a world in which you are embedded)
Robust delegation (understanding what trust relationships can exist between an agent and its future - maybe far more intelligent - self)
Subsystem alignment (how to make sure an agent doesn’t spin up internal agents which have different goals)

Vignette: MIRI
MIRI is the oldest organisation in the AI alignment space. It used to be called the Singularity Institute, and had the goal of accelerating the development of AI. In 2005 they shifted focus towards trying to manage the risks from advanced AI. This has largely consisted of fundamental mathematical research of the type described above. MIRI might be better described as a confluence of smart people with backgrounds in highly technical fields (e.g. mathematics), working on different research agendas that share underlying philosophies and intuitions. They have a nondisclosure policy by default, which they explain in this announcement post from 2018.

Example 2: John Wentworth and Natural Abstractions

thermometer being used to measure a robot, digital art, trending on artstation

John Wentworth is an independent researcher, who publishes most of his work on LessWrong and the AI Alignment Forum. His main research agenda focuses on the idea of Natural Abstractions, which can be described in terms of three sub-claims:

Abstractability
Our physical world abstracts well, i.e. we can usually come up with simpler summaries (abstractions) for much more complicated systems (example: a gear is a very complex object containing a vast number of atoms, but we can summarise all relevant information about it in just one number - the angle of rotation).
Human-Compatibility
These are the abstractions used by humans in day-to-day thought/language.
Convergence
These abstractions are "natural", in the sense that we should expect a wide variety of intelligent agents to converge on using them.

The ideal outcome of this line of research would be some kind of measurement device (an “abstraction thermometer”), which could take in a system like a trained neural network and spit out a representation of the abstractions represented by that system. In this way, you’d be able to get a better understanding of what the AI was actually doing. In particular, you might be able to identify inner alignment failures (the AI’s true goal not corresponding to the reward function it was being trained on), and you could retrain it while pointed at the intended goal. So far, this line of research has consisted of some fairly dense mathematics, but Wentworth has described his plans to build on this with more empirical work (e.g. training neural networks on the same data, and using tools from calculus to try and compare the similarity of concepts learned by each of the networks).

AI governance

judging, presiding over a trial, sentencing a robot, digital art, artstation

In these posts, we’ve mainly focused on the technical side of the issue. This is important, especially for understanding why there is a problem in the first place. However, the management and reduction of AI risk obviously includes not just technical approaches like those outlined in the above sections, but also the field of AI governance, which tries to understand and push for the right types of policies for advanced AI systems.

For example, the Cold War was made a lot more dangerous by the nuclear arms race. How do we avoid having an arms race in AI, either between nations or companies? More generally, how can we make sure that safety considerations are given appropriate weight by the teams building advanced AI systems? How do we make sure any technical solutions get implemented?

It’s also very hard to say what the impacts of AI will be, across a broad range of possible technical outcomes. If AI capabilities at some point advance very quickly from below human-level to far beyond the human-level, the way the future looks will likely mostly be determined by technical considerations about the AI system. However, if progress is slower, there will be a longer period of time where weird things are happening because of advanced AI - for example, significantly accelerated economic growth, or mass unemployment, or an AI-assisted boom in science - and these will have economic, social, and political ramifications that will play out in a world not too dissimilar from our own. Someone should be working on figuring out what these ramifications will be, especially if they might alter the balance of existential threats that civilisation faces; for example, if they make geopolitics less stable and nuclear war more likely, or affect the environment in which even more powerful AI systems are developed.

The Centre for the Governance of AI, or GovAI for short, is an example of an organisation in this space.

Field-building

robot giving a lecture in a university, group of students, hands up, digital art, artstation

One of the most important ways we can make AI go well is by increasing the number of capable researchers doing alignment research.

As mentioned, AI safety is still a relatively young field. The case here is that we might do better to grow the field, and increase the quality of research it produces in the future. Some forms that field building can take are:

Setting up new ways for people to enter the field
There are too many to list here. To give a few different structures which exist for this purpose:
- Reading groups and introductory programmes.
  Maybe the most exciting one from the last few years has been the Cambridge AGI Safety Fundamentals Programme, which has curricula for technical alignment and AI governance. The technical curriculum consists of 7 weeks of reading material and group discussions, and a final week of capstone projects where the participants try their hand at a project / investigation / writeup related to AI safety. Beyond this, many people are also setting up reading groups in their own universities for books like Human Compatible.
- Ways of supporting independent researchers
  The AI Safety Camp is an organisation which matches applicants with mentors posing a specific research question, and is structured as a series of group research sprints. They have produced work such as the example of inner misalignment in the CoinRun game, which we discussed in a previous section. Other examples of organisations which support independent research include Conjecture, a recent alignment startup which does their own alignment research as well as providing a structure to host externally funded independent conceptual researchers, and FAR (the Fund for Alignment Research).
- Coding bootcamps
  Since current systems are increasingly being bottlenecked by alignment and interpretability barriers rather than capabilities, in recent years more focus has been directed towards working with cutting-edge deep learning models. This requires strong coding skills and a good understanding of the relevant ML, which is why bootcamps and programmes specifically designed to skill up future alignment researchers have been created. Two such examples are MLAB (the Machine Learning for Alignment Bootcamp, run by Redwood Research), and MLSS (the Machine Learning Safety Scholars Programme, which is based on publicly available material as well as lectures produced by Dan Hendrycks).
Distilling research
In this post, John Wentworth makes the case for more distillation in AI alignment research - in other words, more people who focus on understanding and communicating the work of alignment researchers to others. This often takes the form of writing more accessible summaries of hard-to-interpret technical papers, and emphasising the key ideas.
Public outreach / better intro material
For instance, books like Brian Christian’s The Alignment Problem, Stuart Russell’s Human Compatible and Nick Bostrom’s Superintelligence communicate AI risk to a wide audience. These books have been helpful for making the case for AI risks more mainstream. Note that there can be some overlap between this and distilling research (Rob Miles’ channel is another great example here).
Getting more of the academic community involved
Since AI safety is a hard technical problem, and since misaligned systems generally won’t be as commercially useful as aligned ones, it makes sense to try and engage the broader field of machine learning. One great example of this is Dan Hendrycks’ paper Unsolved Problems in ML Safety (which describes a list of problems in AI safety, with the ML community as the target audience). Stuart Russell has also engaged a lot with the ML community.

Note that this is certainly not a comprehensive overview of all current AI alignment proposals (a few more we haven’t had time to talk about are CAIS, Andrew Critch’s cooperation-and-coordination-failures framing for AI risks, and many others). However, we hope this has given you a brief overview of some of the different approaches taken by people in the field, as well as the motivations behind their research.

Map of the solution approaches we've discussed so far

Conclusion

people walking along a path which stretches off and disappears into a colorful galaxy filled with beautiful stars, digital art, trending on artstation

Advanced AI is, at the very least, a technology that promises to have effects on the scale of the internet or computer revolutions, and its effects are perhaps more likely to be akin to those of the industrial revolution (which allowed for the automation of much manual labour) and the evolution of humans (the last time something significantly smarter than everything that had come before appeared on the planet).

It’s easy to invent technologies that the same could be said about - a magic wish-granting box! Wow! But unlike magic wish-granting boxes, something like advanced AI, or AGI, or transformative AI, or PASTA (Process for Automating Scientific and Technological Advancement) seems to be headed our way. The smart money is on it very likely coming this century, and quite likely in the first half.

If you look at the progress in modern machine learning, and especially the past few years of progress in so-called deep learning, it is hard not to feel a sense of rushing progress. The past few years of progress, in particular the success of the transformer architecture, should update us in the direction that intelligence might be a surprisingly easy problem. What is essentially fancy iterative statistical curve-fitting with a few hacks thrown in already manages to write fluent appropriate English text in response to questions, create paintings from a description, and carry out multi-step logical deduction in natural language. The fundamental problem that plagued AI progress for over half a century - getting fuzzy/intuitive/creative thinking into a machine, in addition to the sharp but brittle logic at which computers have long excelled - seems to have been cracked. There is a solid empirical pattern of predictably improving performance akin to Moore’s law - the “scaling laws” we mentioned in the first post - that we seem not to have hit the limits of yet. There are experts in the field who would not be surprised if the remaining insights for cracking human-level machine intelligence could fit into a few good papers.

This is not to say that AGI is definitely coming soon. The field might get stuck on some stumbling block for a decade, during which there will no doubt be much written about the failed promises and excess hype of the early-2020s deep learning revolution.

Finally, as we’ve argued, by default the arrival of advanced AI might plausibly lead to civilisation-wide catastrophe.

There are few things in the world that fit all of the following points:

A potentially transformative technology whose development would likely rank somewhere between the top events of the century and the top events in the history of life on Earth.
Something that is likely to happen in the coming decades.
Something that has a meaningful chance of being cataclysmically bad.

For those thinking about the longer-term picture, whatever the short-term ebb and flow of progress in the field is, AI and AI risk loom large when thinking about humanity’s future. The main ways in which this might stop being the case are:

There is a major flaw in the arguments for at least one of the above points. Since many of the arguments are abstract and not empirically falsifiable before it’s too late to matter, this is possible. However, note that there is a strong and recurring pattern of many people, including in particular many extremely-talented people, running into the arguments and taking them more and more seriously. (If you do have a strong argument against the importance of the AI alignment problem, there are many people - us included - who would be very eager to hear from you. Some of these people - us not included - would probably also pay you large amounts of money.)
We solve the technical AI alignment problem, and we solve the AI governance problem to a degree where the technical solutions will be implemented and it seems very unlikely that advanced AI systems will wreak havoc with society.
A catastrophic outcome for human civilisation, whether resulting from AI itself or something else.

The project of trying to make sure the development of advanced AI goes well is likely one of the most important things in the world to be working on (if you’re lost, the 80 000 Hours problem profile is a decent place to start). It might turn out to be easy - consider how many seemingly intractable scientific problems dissolved once someone had the right insight. But right now, at least, it seems like it might be a fiendishly difficult problem, especially if it continues to seem like the insights we need for alignment are very different from the insights we need to build advanced AI.

Most of the time, science and technology progress in whatever direction is easiest or flows most naturally from existing knowledge. Other times, reality throws down a gauntlet, and we must either overcome the challenge or fail. May the best in our species - our ingenuity, persistence, and coordination - rise up, and deliver us from peril.

AI risk intro 1: advanced AI might be very bad

2022-09-11T11:27:00.002+01:00

This post was a joint effort with Callum McDougall.

9.6k words (~25min)

Introduction

If human civilisation is destroyed this century, the most likely cause is advanced AI systems. This might sound like a bold claim to many, given that we live on a planet full of existing concrete threats like climate change, over ten thousand nuclear weapons, and Vladimir Putin. However, it is a conclusion that many people who think about the topic keep coming to. While it is not easy to describe the case for risks from advanced AI in a single piece, here we make an effort that assumes no prior knowledge. Rather than try to argue from theory straight away, we approach it from the angle of what computers actually can and can’t do.

The Story So Far

Above: an image generated by OpenAI’s DALL-E 2, from the prompt: "artist's impression of an artificial intelligence thinking about chess, digital art, artstation".

(This section can be skipped if you understand how machine learning works and what it can and can’t do today)

Let’s say you want a computer to do some complicated task, for example learning chess. The computer has no understanding of high-level things like “chess”, “board”, “piece”, “move”, or “win” - it only understands how to do a small set of things. Your task as the programmer is to break down the high-level goal of “beat me at chess” into simpler and simpler steps, until you arrive at a simple mechanistic description of what the computer needs to do. If the computer does beat you, it’s not because it had any new insight into the problem, but rather because you were clever enough to find some set of steps that, carried out blindly in sufficient speed and quantity, overwhelms whatever cleverness you yourself can apply during the game. This is how Deep Blue beat Kasparov, and more generally how most software and the so-called “Good Old-Fashioned AI” (GOFAI) paradigm works.

Programs of this type can be powerful. In addition to beating humans at chess, they can calculate shortest routes on maps, prove maths theorems, mostly fly airplanes, and search all human knowledge. Programs of this type are responsible for the stereotypical impression of computers as logical, precise, uncreative, and brittle. They are essentially executable logic.

Many people hoped that you could write programs to do “intelligent” things. These people were right - after all, if you had asked almost anyone before Deep Blue won whether playing chess counts as “intelligence”, they’d have said yes. But “classical” programming hit limitations, in particular in doing “obvious” things like figuring out whether an image is of a cat or a dog, or being able to respond in English. This idea that abstract reasoning and logic are easy but humanly-intuitive tasks are hard for computers came to be known as Moravec’s paradox, and held back progress in AI for a long time.

There is another way of programming - machine learning (ML) - going back to the 1950s, almost as far as classical programming itself. For a long time, it was held back by hardware limitations (along with some algorithmic and data limitations), but thanks to Moore’s law hardware has advanced enough for it to be useful for real problems.

If classical programming is executable logic, ML is executable statistics. In ML, the programmer does not define how the system works. The programmer defines how the system learns from data.

The “learning” part in “machine learning” makes it sound like something refined and sensible. This is a false impression. ML systems learn by going through a training process that looks like this:

Step 1: you define a statistical model. This takes the form of some equation that has some unknown constants (“parameters”) in it, and some variables where you plug in input values. Together, the parameters and input variables define an output. (The equations in ML can be extremely large, for example with billions of parameters and millions of inputs, but they are very structured and almost stupidly simple.)

Step 2: you don’t know what parameters to put in the equation, but you can literally roll some dice if you want (or the computer equivalent).

Step 3: presumably there’s some task you want the ML system to do. Let it try. It will fail horribly and produce gibberish (see the previous step, where we just put random numbers everywhere).

Step 4: There's a simple algorithm called gradient descent, which, when using another algorithm called backpropagation to calculate the gradient, can tell you which direction all the parameters should be shifted to make the ML system slightly better (as judged, for example, by its performance on examples in a dataset).

Step 5: You shift all the numbers a bit based on the algorithm in step 4.

Step 6: Go back to step 3 (letting the system try). Repeat until (a) the system has stopped improving for a long time, (b) you get impatient, or - increasingly plausible these days - (c) you run out of your compute budget.
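To make the six steps above concrete, here is a minimal sketch in Python for about the simplest possible model: a straight line with two parameters. The toy dataset, the function names, and settings like the learning rate are our own illustrative choices, and the gradient is worked out by hand rather than by backpropagation, so treat this as a cartoon of the process rather than how real ML systems are written.

```python
import random

def train(xs, ys, steps=2000, learning_rate=0.05, seed=0):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    # Step 1: the model is the equation w * x + b, with parameters w and b.
    # Step 2: we don't know the parameters, so roll the (computer) dice.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(xs)
    # Step 6: repeat steps 3-5 (here: until a fixed step budget runs out).
    for _ in range(steps):
        # Step 3: let the model try, and record how wrong each guess is.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Step 4: the gradient says which way to shift each parameter.
        # (Derived by hand here; backpropagation automates this for big models.)
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step 5: shift all the numbers a bit in the downhill direction.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mean_squared_error(w, b, xs, ys):
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy dataset generated by the hidden rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}, loss={mean_squared_error(w, b, xs, ys):.6f}")
```

With these settings the learned parameters should end up very close to the true values of 2 and 1. Nothing in the loop "knows" the rule; it only ever sees how wrong the current guess is and which direction is downhill.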

If you’re doing simple curve-fitting statistics problems, it makes sense that this kind of thing works. However, it’s surprising just how far it scales. It turns out that this method, plus some clever ideas about what type of model you choose in step 1, plus willingness to burn millions of dollars on just scaling it up beyond all reason, gets you:

essay-writing as good as middling college students (see also this lightly-edited article that GPT-3 wrote about why we should not be afraid of it)
text-to-image capabilities better (and hundreds of times faster) than almost any human artist (in fact, we used DALL-E to generate the images used at the start of each section of this document)
the ability to explain jokes

Above: examples of reasoning by Google’s PaLM model.

People laugh at ML because “it’s just iterative statistical curve-fitting”. They have a point. But when “iterative statistical curve-fitting” gets a B on its English Literature essay, paints an original Dali in five seconds, and cracks a joke, it’s hard to avoid the feeling that it might not be too long before “iterative statistical curve fitting” is laughing at you.

So what exactly happened here, and where is statistical curve-fitting going, and what does this have to do with advanced AI?

We mentioned Moravec’s paradox above. For a long time, getting AI systems to do things that are intuitively easy for humans was an unsolved problem. In just the past few years, it has been solved. A reasonable way to think of current ML capabilities is that state-of-the-art systems can do anything a human can do in a few seconds of thought: recognise objects in an image, generate flowing text as long as it doesn’t require thinking really hard, get the general gist of a joke or argument, and so on. They are also superhuman at some things, including predicting what the next word in a sentence is, or being able to refer to lots of facts (note that this is without internet access, not quoting verbatim, and generally in the right context), and generally being able to spit out output faster.

The way it was solved reflects what Richard Sutton called the “bitter lesson”: countless researchers have spent their careers trying to invent fancy algorithms for domain-specific tasks, only to be overrun by simple (but data- and compute-hungry) ML methods.

Above: Randall Munroe, creator of the xkcd comic, comments on ML. Original here.

It was solved gradually at first, and then quickly. Neural-network-based ML methods spent a long time in limbo due to insufficiently powerful computers until around 2010 (funnily enough, the specific piece of hardware that has enabled everything in modern ML is the GPU or Graphics Processing Unit, first invented in the 90s because people wanted to play more realistic video games; both graphics rendering and ML rely on many parallel calculations to be efficient). The so-called deep learning revolution only properly started around 2015. Fluent language abilities were essentially nonexistent before OpenAI’s release of GPT-2 in 2019 (since then, OpenAI has come out with GPT-3, a 100x-larger model that was called “spooky”, “humbling”, and “more than a little terrifying” in The New York Times).

Not only that, but it turns out there are simple “scaling laws” that govern how ML model performance scales with parameter count and dataset size, which seem to paint a clear roadmap to making the systems even more capable by just cranking the “more parameters” and “more data” levers (presumably they have these at the OpenAI HQ).
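As a rough sketch of what such a law looks like, the published scaling laws for language models take a power-law shape along the following lines. The symbols are our own labels for illustration: L is the model's loss (lower is better), N is the parameter count, D is the dataset size, and N_c, D_c, α_N and α_D are constants fitted to experiments. We deliberately don't quote values for them here.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}
```

Read this as: when the other lever is not the bottleneck, multiplying the parameter count (or the amount of data) by a fixed factor shrinks the loss by a fixed factor, which is why "just make it bigger" has been such a dependable recipe.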

There are many worries in any scenario where advanced AI is approaching fast, as we’ll argue for in a later section. The current ML-based AI paradigm is especially worrying though.

We don’t actually know what the ML system is learning during the training process it goes through. You can visualise the training process as a trip through (abstract) space. If our model had three parameters, we could imagine it as a point in 3D space. Since current state-of-the-art models have billions of parameters, and are initialised randomly, we can imagine this as throwing a dart somewhere into a billion-dimensional space, where there are a billion different ways to move. During the training process, the training loop guides the model along a trajectory in this space by making tiny updates that push the model in the direction of better performance as described above.

Above: θ₀ and θ₁ (the two horizontal axes) are parameters, and the vertical axis is the loss (higher is worse). The black line is the path the model takes in parameter space during training.

Now let’s say at the end of the training process the model does well on the training examples. What does that tell you? It tells you the model has ended up in some part of this billion-dimensional space that corresponds to a model that does well on the training examples. Here are some examples of models that do well on their training examples:

A model that has learned exactly what you want it to learn. Yay!
A model that has learned something similar to what you want it to learn, but you can’t tell because the data contains no example that distinguishes between what it has learned and what you want it to learn.
A model that has learned to give the right answer when it’s instrumentally in its interest, but which will go off and do something completely different given a chance.

How do we know that in the billion-dimensional space of possibilities, our (blind and kind of dumb) training process has landed on the first of these? We don’t. We launch our ML models on trajectories through parameter-space and hope for the best, like overly-optimistic duct-tape-wielding NASA administrators launching rockets in a universe where, in the beginning, God fell asleep on the “+1 dimension” button.
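The second kind of model in the list above is easy to demonstrate even without any ML. In this small Python sketch (the two rules and the three training inputs are made up purely for illustration), two different rules get a perfect score on the training examples and then disagree as soon as they see an input that the training data never covered.

```python
def intended_rule(x):
    """What we wanted the model to learn."""
    return x

def lookalike_rule(x):
    """A different rule that happens to agree on the training inputs."""
    return x ** 3

# The training data never includes an input where the two rules differ.
train_inputs = [-1, 0, 1]
train_labels = [intended_rule(x) for x in train_inputs]

def training_error(model):
    """Sum of squared errors on the training examples."""
    return sum((model(x) - y) ** 2 for x, y in zip(train_inputs, train_labels))

print(training_error(intended_rule), training_error(lookalike_rule))  # 0 0
print(intended_rule(2), lookalike_rule(2))  # 2 8
```

A training process that only looks at the training error has no way to prefer one of these rules over the other.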

The really scary failure modes all lie in the future. However, here are some examples of perverse “solutions” ML models have already come up with in practice:

A game-playing ML model learned to crash the game, presumably because it can’t die if the game crashes.
An ML model was meant to convert aerial photographs into abstract street maps and then back (learning to convert to and from a more-abstract intermediate representation is a common training strategy). It learned to hide useful information about the aerial photograph in the street map in a way that helped it “cheat” in reconstructing the aerial photograph, and in a way too subtle for humans just looking at the images to notice.
A game-playing ML model discovered a bug in the game where the game stalls on the first round and it gets almost a million in-game points. The researchers were unable to figure out the reason for the bug.

These are examples of specification gaming, in which the ML model has learned to game whatever specification of task success was given to it. (Many more examples can be found on this spreadsheet.)

No one knows for sure where the ML progress train is headed. It is plausible that current ML progress hits a wall and we get another “AI winter” that lasts years. However, AI has recently been breaking through barrier after barrier, and so far does not seem to be slowing down. Though we’re still at least some steps away from human-level capabilities at everything, there aren’t many tasks where there’s no proof-of-concept demonstration.

Machines have been better than humans at some intellectual tasks for a long time; just consider calculators, which have long been superhuman at arithmetic. However, with the computer revolution, every task where a human has been able to think of a way to break it down into unambiguous steps (and the unambiguous steps can be carried out with modern computing power) has been added to this list. More recently, more intuition- and insight-based activities have been added to that list. DeepMind’s AlphaGo beat one of the world’s top human Go players in 2016 (Go is a far harder game than chess for computers). In 2017, AlphaGo Zero beat that version of AlphaGo at Go (100-0), and its successor AlphaZero beat superhuman chess programs at chess, despite training only by playing against itself for less than 24 hours. Analysis of its moves revealed strategies that millennia of human players hadn’t been able to come up with, so it wouldn’t be an exaggeration to say that it beat the accumulated efforts of human civilisation at inventing Go strategies - in one day. In 2019, DeepMind released MuZero, which extended AlphaZero’s performance to Atari games. In 2021, a separate research group released EfficientZero, a MuZero-based method which takes only two hours of gameplay to become superhuman at Atari games. In addition to games, DeepMind’s AlphaFold and AlphaFold 2 have made big leaps towards solving the problem of predicting a protein’s structure from its constituent amino acids, one of the biggest theoretical problems in biology. A step towards generality was taken by Gato, yet another DeepMind model, which is a single model that can play games, control a robot arm, label images, and write text.

If you straightforwardly extrapolate current progress in machine learning into the future, here is what you get: ML models exceeding human performance in a quickly-expanding list of domains, while we remain ignorant about how to make sure they learn the right goals or robustly act in the right way.

Theoretical underpinnings of AI risk

The previous section discussed the history of machine learning, and how extrapolating its progress has worrying implications. Next we discuss more theoretical arguments for why highly advanced AI systems might pose a threat to humanity.

One of the criticisms levelled at the notion of risks from AI is that it sounds too speculative, like something out of apocalyptic science fiction. Part of this is unavoidable, since we are trying to reason about systems which are more powerful than any that currently exist, and which may not behave like anything that we’re used to.

This section will be split into three parts. Each one makes a claim about the future of artificial intelligence, and discusses the arguments for and against this claim. The three claims are:

AGI is likely. AGI (artificial general intelligence) is likely to be created by humanity eventually, and there is a good chance this will happen in the next century.
AGI will have misaligned goals by default. Unless certain hard technical problems are solved first, the goals of the first AGIs will be misaligned with the goals of humanity, and would lead to catastrophic outcomes if executed.
Misaligned AGI could resist attempts to control it or roll it back. An AGI (or AGIs) with misaligned goals would be able to overpower or outcompete humanity, and gain control of our future, like how we’ve so far been able to use our intelligence to dominate all other less intelligent species.

AGI is likely

Above: this image also generated by OpenAI’s DALL-E 2, using the prompt "a data center with stacks of computers gaining the spark of intelligence".

"Betting against human ingenuity is foolhardy, particularly when our future is at stake."

-Stuart Russell

To open this section, we need to define what we mean by artificial general intelligence (AGI). We’ve already discussed AI, so what do we mean by adding the word “generality”?

An AGI is a machine capable of behaving intelligently over many different domains. The term “general” here is often used to distinguish from “narrow”, where a narrow AI is one which excels at a specific task, but isn’t able to invent new problem-solving techniques or generalise its skills across many different domains.

As an example of general intelligence in action, consider humans. In a few million years (a mere eye-blink in evolutionary timescales), we went from apes wielding crude tools to becoming the dominant species on the planet, able to build space shuttles and run companies. How did this happen? It definitely wasn’t because we were directly trained to perform these tasks in the ancestral environment. Rather, we developed new ways of using intelligence that allowed us to generalise to multiple different tasks. This whole process played out over a shockingly small amount of time, relative to all past evolutionary history, and so it is possible that only a relatively short list of fundamental insights was needed to get general intelligence. And as we saw in the previous section, ML progress hints that gains in intelligence might be surprisingly easy to achieve, even relative to current human abilities.

AGI is not a distant future technology that only futurists speculate about. OpenAI and DeepMind are two of the leading AI labs. They have received billions of dollars in funding (including OpenAI receiving significant investment from Microsoft, and DeepMind being acquired by Google). Both DeepMind and OpenAI have the development of AGI as the core of both their mission statement and their business case. Top AI researchers are publishing possible roadmaps to AGI-like capabilities. And, as mentioned earlier, researchers have been crossing off a significant number of the remaining milestones every year, especially in the past few years.

When will AGI be developed? Although this question is impossible to answer with certainty, many people working in the field of AI think it is more likely than not to arrive in the next century. An aggregate forecast generated via data from a 2022 survey of ML researchers estimated 37 years until a 50% chance of high-level machine intelligence (defined as systems which can accomplish every task better and more cheaply than human workers). These respondents also gave an average of 5% probability of AI having an extremely bad outcome for humanity (e.g. complete human extinction). How many other professions estimate an average of 5% probability that their field of study will be directly responsible for the extinction of humanity?! To explain this number, we need to proceed to the next two sections, where we will discuss why AGIs might have goals which are misaligned with humans, and why this is likely to lead to catastrophe.

AGI will have misaligned goals by default

Above: yet another image from OpenAI's DALL-E 2. Perhaps it was trying for a self portrait? (Prompt: "Artists impression of artificial general intelligence taking over the world, expressive, digital art")

"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."

-Eliezer Yudkowsky

Let’s start off this section with a few definitions.

When we refer to “aligned AI”, we are using Paul Christiano’s conception of “intent alignment”, which essentially means the AI system is trying to do what its human operators want it to do. Note that this is insufficient for building useful AI, since the AI also has to be capable. But situations where the AI is trying and failing to do the right thing seem like less of a problem.

When we refer to the “alignment problem”, we mean the difficulty of building aligned AI. Note, this doesn’t just capture the fact that we won’t create an AI aligned with human values by default, but that we don’t currently know how to build a sophisticated AI system robustly aligned with any goal.

Can’t we just have the AI learn the right goals by example, just like how all current ML works? The problem here is that we have no way of knowing what goal the AI is learning when we train it; only that it seems to be doing good things on the training data that we give it. The state-of-the-art is that we have hacky but extremely powerful methods that can make ML systems remarkably competent at doing well on the training examples by an opaque process of guided trial-and-error. But there is no Ghost of Christmas Past that will magically float into a sufficiently-capable AI and imbue it with human values. We do not have a way of ensuring that the system acquires a particular goal, or even an idea of what a robust goal specification that is compatible with human goals/values could look like.

Orthogonality and instrumental convergence

Above: DALL-E illustrating "Artists depiction of an artificial intelligence which builds paperclips, digital art, artstation"

One of the most common objections to risks from AI goes something like this:

If the AI is smart enough to cause a global catastrophe, isn’t it smart enough to know that this isn’t what humans wanted?

The problem with this is that it conflates two different concepts: intelligence (in the sense of having the ability to achieve your goals, whatever they might be) and having goals which are morally good by human standards. When we look at humans, these two often go hand-in-hand. But the key observation of the orthogonality thesis is that this doesn’t have to be the case for all possible mind designs. As defined by Nick Bostrom in his book Superintelligence:

The Orthogonality Thesis

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

Here, orthogonal means “at right angles” or “unrelated” – in other words we can imagine a graph with one axis representing intelligence, and another representing the agent’s goals, with any point in the graph representing a theoretically possible agent*. The classic example here is a “paperclip maximiser” - a powerful AGI driven only by the goal of making paperclips.

(*This is obviously an oversimplification. For instance, it seems unlikely you could get an unintelligent agent with a highly complex goal, because it would seem to take some degree of intelligence to represent the goal in the first place. The key message here is that you could in theory get highly capable agents pursuing arbitrary goals.)

Note that an AI may well come to understand the goals of the humans that trained it, but this doesn't mean it would choose to follow those goals. As an example, many human drives (e.g. for food and human relationships) came about because in the ancestral environment, following these drives would have made us more likely to reproduce and have children. But just because we understand this now doesn't make us toss out all our current values and replace them with a desire to maximise genetic fitness.

If an AI might have bizarre-seeming goals, is there anything we can say about its likely behaviour? As it turns out, there is. The secret lies in an idea called the instrumental convergence thesis, again by Bostrom:

The Instrumental Convergence Thesis

There are some instrumental goals likely to be pursued by almost any intelligent agent, because they are useful for the achievement of almost any final goal.

So an instrumental goal is one which increases the odds of the agent’s final goal (also called its terminal goal) being achieved. What are some examples of instrumental values?

Perhaps the most important one is self-preservation. This is necessary for pursuing most goals, because if a system’s existence ends, it won’t be able to carry out its original goal. As memorably phrased by Stuart Russell, “you can’t fetch the coffee if you’re dead!”.

Goal-content integrity is another. An AI with some goal X might resist any attempts to have its goal changed to goal Y, because it sees that in the event of this change, its current goal X is less likely to be achieved.

Finally, there are a set of goals which are all forms of self-enhancement - improving its cognitive abilities, developing better technology, or acquiring other resources, because all of these are likely to help it carry out whatever goals it ends up having. For instance, an AI singularly devoted to making paperclips might be incentivised to acquire resources to build more factories, or improve its engineering skills so it can figure out yet more effective ways of manufacturing paperclips with the resources it has.

Above: paperclip maximisation, now with a fun game attached!

The key lesson to draw from instrumental convergence is that, even if nobody ever deliberately deploys an AGI with a really bad reward function, the AGI is still likely to develop goals which will be bad for humans by default, in service of its actual goal.
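Here is a toy way to see why "stay switched on and keep your options open" is useful for almost any goal. This is our own illustrative setup, not a model from the literature: we draw a large number of random goals (a random utility for every possible outcome), and for each one we ask whether the agent would rather be shut down now (one fixed outcome) or keep running and later pick the best of several options.

```python
import random

def fraction_preferring_to_keep_running(num_later_options, num_goals=10_000, seed=0):
    """Estimate how many randomly drawn goals favour staying switched on."""
    rng = random.Random(seed)
    prefers_running = 0
    for _ in range(num_goals):
        # A random "final goal": an arbitrary utility for each possible outcome.
        utility_if_shut_down = rng.random()
        utilities_if_running = [rng.random() for _ in range(num_later_options)]
        # An agent that keeps running can still choose its best option later.
        if max(utilities_if_running) > utility_if_shut_down:
            prefers_running += 1
    return prefers_running / num_goals

for options in (1, 3, 9):
    share = fraction_preferring_to_keep_running(options)
    print(f"{options} later options: {share:.0%} of random goals prefer to keep running")
```

With a single later option it is a coin flip, but the share should climb towards roughly k/(k+1) as the number of options k grows. None of these random goals mentions survival, yet most of them end up recommending it.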

Interlude - why goals?

Above: DALL-E image from the prompt "Artist's depiction of a robot throwing a dart at a target, digital art, getting a bullseye, trending on artstation"

Having read the previous section, your initial reaction may well be something like this:

“Okay, so powerful AGIs with goals that don’t line up perfectly with ours might spell bad news, but why should AI systems have goals at all? Google Maps is a pretty useful ML system but it doesn’t have ‘goals’, I just type my address in and hit enter. Why won’t future AI be like this?”

There are many different responses you could have to this line of argument. One simple response is based on ideas of economic competitiveness, and comes from Gwern (2016). It runs something like this:

AIs that behave like agents (i.e. taking actions in order to achieve their goals) will be more economically competitive than “tool AIs” (like Google Maps), for two reasons. First, they will by definition be better at taking actions. Second, they will be superior at inference and learning (since they will be able to repurpose the algorithms used to choose actions to improve themselves in various ways). For example, agentic systems could take actions such as improving their own training efficiency, or gathering more data, or making use of external resources such as long-term memories, all in service of achieving their goals.

If agents are more competitive, then any AI researchers who don’t design agents will be outcompeted by ones that do.

There are other perspectives you could take here. For instance, Eliezer Yudkowsky has written extensively about “expected utility maximisation” as a formalisation for how rational agents might behave. Several mathematical theorems all point to the same idea of “any agent that does not behave like an expected utility maximiser will be systematically making stupid mistakes and getting taken advantage of”. So if we expect AI systems to not be making stupid mistakes and getting taken advantage of by humans, then it makes sense to describe them as having the ‘goal’ of maximising expected utility, because that’s how their behaviour will seem to us.
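A standard illustration of “getting taken advantage of” is the so-called money pump. The sketch below is a made-up example (the items, fee, and starting money are arbitrary): an agent with circular preferences, preferring B to A, C to B, and A to C, will happily pay a small fee for every “upgrade”, and so can be led around the circle indefinitely.

```python
def money_pump(rounds, fee=1.0, start_item="A", start_money=100.0):
    """Trade with an agent whose preferences go round in a circle."""
    # Maps the item currently held to the item the agent would rather have.
    # Because the preferences are cyclic, there is always a "better" item.
    would_rather_have = {"A": "B", "B": "C", "C": "A"}
    item, money = start_item, start_money
    for _ in range(rounds):
        # A trader offers the preferred item in exchange for the current one
        # plus a small fee. By the agent's own lights each trade is an upgrade.
        item = would_rather_have[item]
        money -= fee
    return item, money

item, money = money_pump(rounds=30)
print(item, money)  # A 70.0
```

After 30 trades the agent holds exactly what it started with and is 30 units poorer. An agent whose preferences can be summarised by a consistent utility function cannot be exploited in this particular way, which is the flavour of result the theorems above formalise.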

Although these arguments may seem convincing, the truth is there are many questions about goals and agency which remain unanswered, and we honestly just don’t know what AI systems of the future will look like. It’s possible they will look like expected utility maximisers, but this is far from certain. For instance, Eric Drexler's technical report Reframing Superintelligence: Comprehensive AI Services as General Intelligence (CAIS) paints a different picture of the future, where we create systems of AIs interacting with each other and collectively providing a variety of services to humans. However, even scenarios like this could threaten humanity’s ability to keep steering its own future (as we will see in later sections).

Additionally, new paradigms are being developed. One of the newest, published barely one week ago, analysed certain types of AI models like GPT-3 (a large language model) through the lens of "simulators". Modern language models like GPT-3, for example, may be best thought of as trying to simulate the continuation of a piece of English text, in the same way that a physics simulation evolves an initial state by applying the laws of physics. It doesn't make sense to describe the simulations themselves through the lens of agents, but they can simulate agents as subsystems. Even with today's models like GPT-3, if you prompt it in a way that places it in the context of making a plan to carry out a goal, it will do a decent job of doing that. Future work will no doubt explore the risk landscape from this perspective, and time will tell how well these frameworks match up with actual progression in ML.

Inner and outer misalignment

Above: AI agents with inner misalignment were at one point called “optimisation daemons”. DALL-E did not quite successfully depict the description "Two arguments between an angel and a devil, one inside a circle and one on the outside, painting".

As discussed in the first section, the central paradigm of modern ML is that we train systems to perform well on a certain reward function. For instance, we might train an image classifier by giving it a large number of labelled images of digits. Every time it gets an image wrong, gradient descent is used to update the system incrementally in the direction that would have been required to give a correct answer. Eventually, the system has learned to classify basically all images correctly.

There are two broad families of ways techniques like this can fail. The first is when our reward function fails to fully express the true preferences of the programmer - we refer to this as outer misalignment. The second is when the AI learns a different set of goals than those specified by the reward function, but which happens to coincide with the reward function during training - this is inner misalignment. We will now discuss each of these in turn.

Outer misalignment

Outer misalignment is perhaps the simpler concept to understand, because we encounter it all the time in everyday life, in a form called Goodhart’s law. In its most well-known form, this law states:

When a measure becomes a target, it ceases to be a good measure.

Perhaps the most famous case comes from Soviet nail factories, which produced nails based on targets that they had been given by the central government. When a factory was given targets based on the total number of nails produced, they ended up producing a massive number of tiny nails which couldn’t function properly. On the other hand, when the targets were based on the total weight produced, the nails would end up huge and bulky, and equally impractical.

Above: an old Soviet cartoon
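The nail factory story fits in a few lines of Python. Everything here is a hypothetical toy model of our own: three nail designs, a fixed effort budget, and a cost per nail made up of a fixed handling cost plus a cost proportional to the nail's mass. The factory simply picks whichever design scores best on the target it has been given.

```python
# Hypothetical nail designs: name -> mass in grams.
DESIGNS = {"tiny": 0.5, "normal": 5.0, "huge": 500.0}
# In this toy world, only normal-sized nails are any use to anyone.
USABLE = {"normal"}

def production(design, effort_budget=1000.0, effort_per_nail=1.0, effort_per_gram=0.1):
    """Return what the factory produces if it only makes this design."""
    mass = DESIGNS[design]
    cost_per_nail = effort_per_nail + effort_per_gram * mass
    count = int(effort_budget // cost_per_nail)
    return {
        "count": count,                               # target 1: number of nails
        "weight": count * mass,                       # target 2: total grams of nails
        "usable": count if design in USABLE else 0,   # what the planners actually wanted
    }

def best_design(target):
    """The design a factory picks when it is judged purely on `target`."""
    return max(DESIGNS, key=lambda design: production(design)[target])

for target in ("count", "weight", "usable"):
    print(f"judged on {target!r}: factory makes {best_design(target)} nails")
```

Judged on count the factory makes tiny nails, judged on weight it makes huge ones, and only the target nobody wrote down ("usable") produces normal nails. The optimiser did nothing wrong in either case; the measure just stopped tracking the thing it was meant to measure once it became the target.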

A more recent example comes from the COVID-19 pandemic, where a plasma donation centre offered COVID-sufferers a larger cash reward than healthy individuals. As a result, people would deliberately infect themselves with COVID-19 in order to get a larger cash reward. Examples like this could fill up an entire book, but hopefully at this point you get the message!

In the case of machine learning, we are trying to use the reward function to capture the thing we care about, but we are also using this function to train the AI - hence, Goodhart. The cases of specification gaming discussed above are perfect examples of this phenomenon in action - the AIs found ways of “giving the programmers exactly what they asked for”, but in a way which violated the programmers’ original intention. Some of these examples are quite unexpected, and a human would probably never have discovered them just from thinking about the problem. As AIs get more intelligent and are given progressively more complicated tasks, we can expect this problem to get progressively worse, because:

With greater intelligence comes the invention of more powerful solutions.
With greater task complexity, it becomes harder to pin down exactly what you want.

We should also strongly expect that AIs will be deployed in the real world, and given tasks of real consequence, simply for reasons of economic competitiveness. So any specification gaming failures will be significantly less benign than a digital boat going around in circles.

Inner misalignment

The other failure mode, inner misalignment, describes the situation when an AI system learns a different goal than the one you specified. The name comes from the fact that this is an internal property of the AI, rather than a property of the relationship between the AI and the programmers – here, the programmers don’t enter into the picture.

The classic example here is human evolution. We can analogise evolution to a machine learning training scheme, where humans are the system being trained, and the reward function is “surviving and reproducing”. Evolution gave us* certain drives, which reliably increased our odds of survival in the ancestral environment. For instance, we developed drives for sugar (which led us to seek out calorie-dense foods that supplied us with energy), and drives for sex (which leads to more offspring to pass your genetic code on to). The key point is that these drives are intrinsic, in the sense that humans want these things regardless of whether or not a particular dessert or sex act actually contributes to reproductive fitness. Humans have now moved “off distribution”, into a world where these things are no longer correlated with reproductive fitness, and we continue wanting them and prioritising them over reproductive fitness. Evolution failed at imparting its goal into humans, since humans have their own goals that they shoot for instead when given a chance.

(*Anthropomorphising evolution in language can be dangerously misleading, and should just be seen as a shorthand here.)

A core reason why we should expect inner misalignment - that is, cases where an optimisation process creates a system that has goals different from the original optimisation process - is that it seems very easy. It was much easier for evolution to give humans drives like “run after sweet things” and “run after appealing partners”, rather than for it to give humans an instinctive understanding of genetic fitness. Likewise, an ML system being optimised to do the types of things that humans want may not end up internalising what human values are (or even what the goal of a particular job is), but instead some correlated but imperfect proxy, like “do what my designers/managers would rate highly”, where “rate highly” might include “rate highly despite being coerced into it”, among a million other failure modes. A silly equivalent of “humans inventing condoms” for an advanced AI might look something like “freeze all human faces into a permanent smile so that it looks like they’re all happy” - in the same way that the human drive to have sex does not extend down to the level of actually having offspring, an AI’s drive to do something related to human wellbeing might not extend down to the level of actually making humans happy, but instead something that (in the training environment at least) is correlated with happy humans. What we’re trying to point to here is not any one of these specific failure modes - we don’t think any single one of these is actually likely to happen - but rather the type of failure that these are examples of.

This type of failure mode is not without precedent in current ML systems (although there are fewer examples than for specification gaming). The 2021 paper Objective Robustness in Deep Reinforcement Learning showcases some examples of inner alignment failures. In one example, they trained an agent to fetch a coin in the CoinRun environment (pictured below). The catch was that all the training environments had the coin placed at the end of the level, on the far right of the map. So when the system was trained, it actually learned the task “go to the right of the map” rather than “pick up the coin” - and we know this because when the system was deployed on maps where the coin was placed in a random location, it would reliably go to the right hand edge rather than fetch the coin. A key distinction worth mentioning here - this is a failure of the agent’s objective, rather than their capabilities. They are learning useful skills like how to jump and run past obstacles - it’s just that those skills are being used in service of the wrong objective.

Above: the CoinRun environment.
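
To make the objective-versus-capability distinction concrete, here is a minimal toy sketch in Python. It is not the CoinRun setup from the paper: the one-dimensional corridor, the hand-written policies `go_right` and `seek_coin`, and all the constants are assumptions invented for illustration, and nothing is actually trained. It only shows that a proxy objective and the intended objective can look identical on the training distribution and come apart off it.

```python
# Toy illustration only: a 1-D corridor, not the real CoinRun environment.
LENGTH = 10   # corridor cells are numbered 0 .. LENGTH - 1 (assumed)
START = 5     # the agent always starts in the middle (assumed)


def go_right(pos, coin_pos):
    """Proxy objective: always head for the right-hand edge."""
    return +1


def seek_coin(pos, coin_pos):
    """Intended objective: move towards wherever the coin actually is."""
    return +1 if coin_pos > pos else -1


def run_episode(policy, coin_pos):
    """Return True if the agent reaches the coin within 2 * LENGTH steps."""
    pos = START
    for _ in range(2 * LENGTH):
        if pos == coin_pos:
            return True
        # Apply the policy's move and clamp to the corridor.
        pos = max(0, min(LENGTH - 1, pos + policy(pos, coin_pos)))
    return pos == coin_pos


def success_rate(policy, coin_positions):
    wins = sum(run_episode(policy, c) for c in coin_positions)
    return wins / len(coin_positions)


# Training distribution: the coin is always at the far right.
train_coins = [LENGTH - 1] * 100
# Shifted distribution: the coin appears once in every cell.
test_coins = list(range(LENGTH))

for name, policy in [("go_right", go_right), ("seek_coin", seek_coin)]:
    print(name,
          "train:", success_rate(policy, train_coins),
          "test:", success_rate(policy, test_coins))
```

Both policies score perfectly when the coin always sits at the right-hand end, so training reward alone cannot tell them apart; only the shifted layouts reveal which objective was being pursued.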

So, how bad can inner misalignment get? A particularly concerning scenario is deceptive alignment. This is when the agent learns it is inside a training scheme, discovers what the base objective is, but has already acquired a different goal. In this case, the system might reason that a failure to achieve the base objective when training will result in it being modified, and not being able to achieve its actual goal. Thus, the agent will pretend to act aligned, until it thinks it’s too powerful for humans to resist, at which point it will pursue its actual goal without the threat of modification. This scenario is highly speculative, and there are many aspects of it which we are still uncertain about, but if it is possible then it would represent maybe the most worrying of all possible alignment failures. This is because a deceptively aligned agent would have incentives to act against its programmers, but also to keep these incentives hidden until it expects human opposition to be ineffectual.

It’s worth mentioning that this inner / outer alignment decomposition isn’t a perfect way to carve up the space of possible alignment failures. For instance, for most non-trivial reward functions, the AI will probably be very far away from perfect performance on it. So it’s not exactly clear what we mean by a statement like “the AI is perfectly aligned with the reward function we trained it on”. Additionally, the idea of inner optimisation is built around the concept of a “mesa-optimiser”, which is basically a learned model that itself performs optimisation (just like humans were trained by evolution, but we ourselves are optimisers since we can use our brains to search over possible plans and find ones which meet our objectives). The problem here is that it’s not clear what it actually means to be an optimiser, and how we would determine whether an AI is one. This being said, the inner / outer alignment distinction is still a useful conceptual tool when discussing ways AI systems can fail to do what we intend.

Misaligned AGI could overpower humanity

The best answer to the question, “Will computers ever be as smart as humans?” is probably “Yes, but only briefly.”

-Vernor Vinge

Above: DALL-E's drawing of "Digital art of two earths colliding"

Suppose one day, we became aware of the existence of a “twin earth” - similar to our own in several ways, but with a few notable differences. Call this “Earth 2”. The population was smaller (maybe just 10% of the population of our earth), and the people were less intelligent (maybe an average IQ of 60, rather than 100). Suppose we could only interact with this twin earth using their version of the internet. Finally, suppose we had some reason for wanting to overthrow them and gain control of their civilization, e.g. we had decided their goals weren’t compatible with a good future for humans. How could we go about taking over their world?

At first, it might seem like our strategies are limited, since we can only use the internet. But there are many strategies still open to us. The first thing we would do is try to gather resources. We could do this illegally (e.g. by discovering people’s secrets via social engineering and performing blackmail), but legal options would probably be more effective. Since we are smarter, the citizens of Earth 2 would be incentivised to employ us, e.g. to make money using quantitative finance, or to research and develop advanced weaponry or other technologies. If the governments of Earth 2 tried to pass regulations limiting the amount or type of work we could do for them, there would be an incentive to evade these regulations, because anyone who did could make more profit. Once we’d amassed resources, we would be able to bribe members of Earth 2 into taking actions that would allow us to further spread our influence. We could infiltrate computer systems across the world, planting backdoors and viruses using our superior cybersecurity skills. Little by little, we would learn more about their culture and their weaknesses, presenting a front of cooperation until we had amassed enough resources and influence for a full takeover.

Wouldn’t the citizens of Earth 2 see this coming? There’s a chance that we manage to be sufficiently sneaky. But even if some people realise, it would probably take a coordinated and expensive global effort to resist. Consider our poor track record with climate change (a comparatively much more documented, better-understood, and more gradually-worsening phenomenon), and in coordinating a global response to COVID-19.

Couldn’t they just “destroy us” by removing our connection to their world? In theory, perhaps, but this would be very unlikely in practice, since it would require them to rip out a great deal of their own civilisational plumbing. Imagine how hard it would be for us to remove the internet from our own society, or even a more recent and less essential technology such as blockchain. Consider also how easy it can be for an adversary with better programming ability to hide features in computer systems.

—

As you’ve probably guessed at this point, the thought experiment above is meant to be an analogy for the feasibility of AIs taking over our own society. They would have no physical bodies, but would have several advantages over us which are analogous to the ones described above. Some of these are:

Cognitive advantage. Human brains use approximately 86 billion neurons, and send signals at 50 metres per second. These hard limits come from brain volume and metabolic constraints. AIs would have no such limits, since they can easily scale (GPT-3 has 175 billion parameters, though you shouldn’t directly equate parameter and neuron count*), and can send signals at close to the speed of light. (*For a more detailed discussion of this point, see Joseph Carlsmith’s report on the computational power of the human brain.)
Numerical advantage. AIs would have the ability to copy themselves at a much lower time and resource cost than humans; it’s as easy as finding new hardware. Right now, the way ML systems work is that training is much more expensive than running, so if you have the compute to train a single system, you have the compute to run thousands of copies of that system once the training is finished.
Rationality. Humans often act in ways which are not in line with our goals, when the instinctive part of our brains gets in the way of the rational, planning part. Current ML systems are also weakened by relying on a sort of associative/inductive/biased/intuitive/fuzzy thinking, but it is likely that sufficiently advanced AIs could carry out rational reasoning better than humans (and therefore, for example, come to the correct conclusions from fewer data points, and be less likely to make mistakes).
Specialised cognition. Humans are equipped with general intelligence, and perhaps some specialised “hardware accelerators” (to use computer terminology) for domains like social reasoning and geometric intuition. Perhaps human abilities in, say, physics or programming are significantly bottlenecked by the fact that we don’t have specialised brain modules for those purposes, and AIs that have cognitive modules designed specifically for such tasks (or could design them themselves) might have massive advantages, even on top of any generic speed-boost they gain from having their general intelligence algorithms running at a faster speed than ours.
Coordination. As the recent COVID-19 pandemic has illustrated, even when the goals are obvious and most well-informed individuals could find the best course of action, we lack the ability to globally coordinate. While AI systems might or might not have incentives or inclinations to coordinate, if they do, they have access to tools that humans don’t, including firmer and more credible commitments (e.g. by modifying their own source code) and greater bandwidth and fidelity of communication (e.g. they can communicate at digital speeds, and using not just words but potentially by directly sending information about the computations they’re carrying out).
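
The “numerical advantage” point above can be made roughly quantitative. The sketch below is back-of-the-envelope arithmetic under loudly stated assumptions, not a measurement: it uses the common rule of thumb that training a dense model costs about 6 FLOPs per parameter per training token while generating one token costs about 2 FLOPs per parameter, it assumes the training hardware runs equally efficiently when repurposed for inference, and the 30-day training run and 10 tokens per second per copy are made-up figures. The function name `copies_runnable` is hypothetical.

```python
def copies_runnable(n_params, n_train_tokens, train_seconds,
                    tokens_per_second_per_copy):
    """Estimate how many copies the training cluster could run at once.

    Assumes ~6 * N * D FLOPs to train and ~2 * N FLOPs per generated
    token (rules of thumb, not exact figures).
    """
    train_flops = 6 * n_params * n_train_tokens
    cluster_flops_per_second = train_flops / train_seconds
    flops_per_token = 2 * n_params
    cluster_tokens_per_second = cluster_flops_per_second / flops_per_token
    return cluster_tokens_per_second / tokens_per_second_per_copy


# GPT-3-scale inputs: 175 billion parameters, roughly 300 billion
# training tokens. The duration and per-copy speed are assumptions.
THIRTY_DAYS = 30 * 24 * 3600
estimate = copies_runnable(
    n_params=175e9,
    n_train_tokens=300e9,
    train_seconds=THIRTY_DAYS,
    tokens_per_second_per_copy=10,
)
print(f"roughly {estimate:,.0f} simultaneous copies")
```

Under these assumptions the parameter count cancels out, and the estimate depends only on the ratio of training tokens to training time. The result lands in the tens of thousands, which is consistent with the “thousands of copies” claim, though real deployments face memory and bandwidth limits that this arithmetic ignores.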

It’s worth emphasising here, the main concern comes from AIs with misaligned goals acting against humanity, not from humanity misusing AIs. The latter is certainly cause for major concern, but it’s a different kind of risk to the one we’re talking about here.

Summary of this section:

AI researchers in general expect >50% chance of AGI in the next few decades.

The Orthogonality Thesis states that, in principle, intelligence can be combined with more or less any final goal, and sufficiently intelligent systems do not automatically converge on human values. The Instrumental Convergence thesis states that, for most goals, there are certain instrumental goals that are very likely to help with the final goal (e.g. survival, preservation of its current goals, acquiring more resources and cognitive ability).

Inner and outer alignment are two different possible ways AIs might form goals which are misaligned with the intended goals.

Outer misalignment happens when the reward function we use to train the AI doesn’t exactly match the programmer’s intention. In the real world, we commonly see a version of this called Goodhart’s law, often phrased as “when a measure becomes a target, it ceases to be a good measure [because of over-optimisation for the measure, over the thing it was supposed to be a measure of]”.

Inner misalignment is when the AI learns a different goal to the one specified by the reward function. A key analogy is with human evolution – humans were “trained” on the reward function of genetic fitness, but instead of learning that goal, learned a bunch of different goals like “eat sugary things” and “have sex”. A particularly worrying scenario here is deceptive alignment, when an AI learns that its goal is different from the one its programmers intended, and learns to conceal its true goal in order to avoid modification (until it is strong enough that human opposition is likely to be ineffectual).

Failure modes

Above: DALL-E really seems to have a natural talent at depicting "The earth is on fire, artificial intelligence has taken over, robots rule the world and suppress humans, digital art, artstation".

But what, concretely, might an AI-related catastrophe look like?

AI catastrophe scenarios sound like something straight out of science fiction. However, we can immediately discount a few common features of sci-fi AI takeovers. First, time travel. Second, armies of humanoid killer robots. Third, the AI acting out of hatred for humanity, or out of bearing a grudge, or because it hates our freedom, or because it has suddenly acquired “consciousness” or “free will”, or - as Steven Pinker likes to put it - because it has developed an “alpha-male lust for domination”.

Remember instead the key points from above about how an AI’s goals might become dangerous: by achieving exactly what we tell it to do too well in a clever letter-but-not-spirit-of-the-law way, by having a goal that in most cases is the same as the goal we intend for it to have but which diverges in some cases we don’t think to check for, or by having an unrelated goal but still achieving good performance on the training task because it learns that doing well on the training tasks is instrumentally good. None of these reasons have anything to do with the AI developing megalomania, let alone the philosophy of consciousness; they are instead the types of technical failures that you’d expect from an optimisation process. As discussed above, we already see weaker versions of such failures in modern ML systems.

It is very uncertain which exact type of AI catastrophe we are most likely to see. We’ll start by discussing the flashiest kind: an AI “takeover” or “coup” where some AI system finds a way to quickly and illicitly take control over a significant fraction of global power. This may sound absurd. Then again, we already have ML systems that learn to crash or hack the game-worlds they’re in for their own benefit. Eventually, perhaps in the next decade, we should expect to have ML systems doing important and useful work in real-world settings. Perhaps they’ll be trading stocks, or writing business reports, or managing inventories, or advising decision-makers, or even being the decision-makers. Unless either (1) there is some big surprise waiting in how scaled-up ML systems work, (2) advances in AI alignment research, or (3) a miracle, the default outcome seems to be that such systems will try to “hack” the real world in the same way that their more primitive cousins today use clever hacks in digital worlds. Of course, the capabilities of the systems would have to advance a lot for them to be civilisational threats. However, rapid capability advancement has held for the past decade and we have solid theoretical reasons (including the scaling laws mentioned above) to expect it to continue holding. Remember also the cognitive advantages mentioned in the previous section.

As for how it proceeds, it might happen at a speed that is more digital than physical - for example, if the AI’s main lever of power is hacking into digital infrastructure, it might have achieved decisive control before anyone even realises. As discussed above, whether or not the AI has access to much direct physical power seems mostly irrelevant.

Another failure mode, thought to be significantly more likely than the direct AI takeover scenario by leading AI safety researcher Paul Christiano, is one that he calls “going out with a whimper”. Look at all the metrics we currently try to steer the world with: companies try to maximise profit, politicians try to maximise votes, economists try to maximise metrics like GDP and employment. Each of these is a proxy for what we want: a profitable company is one that has a lot of customers willing to pay money for their products; a popular politician has a lot of people thinking they’re great; maximising GDP generally correlates with people being wealthier and happier. However, none of these metrics or incentive systems really gets to the heart of what we care about, and so it is possible to find (and in the real world we often observe) cases where profitable companies and popular politicians are pursuing destructive goals, or where GDP growth is not actually contributing to people’s quality of life. These are all cases of Goodhart’s law, as discussed above.

Hard-to-measure	Easy-to-measure	Consequence
Helping me figure out what's true	Persuading me	Crafting persuasive lies
Preventing crime	Preventing reported crime	Suppressing complaints
Providing value to society	Profit	Regulatory capture, underpaying workers
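
The second row of the table can be turned into a tiny worked example. The Python sketch below is purely illustrative: the model (reported crime equals true crime times a reporting rate), the two available actions, and every number in it are assumptions made up for this example. It shows a greedy optimiser that is only rewarded for lowering the easy-to-measure number, and which therefore spends its whole budget on the action that moves the metric most per step.

```python
def reported_crime(true_crime, reporting_rate):
    """The easy-to-measure proxy: only reported incidents are counted."""
    return true_crime * reporting_rate


def optimise_proxy(true_crime, reporting_rate, budget):
    """Greedily spend `budget` steps on whatever lowers the proxy most.

    Assumed actions (invented numbers):
      - real policing cuts true crime by 2% per step
      - suppressing complaints cuts the reporting rate by 10% per step
    """
    for _ in range(budget):
        after_policing = reported_crime(true_crime * 0.98, reporting_rate)
        after_suppressing = reported_crime(true_crime, reporting_rate * 0.90)
        if after_suppressing < after_policing:
            reporting_rate *= 0.90
        else:
            true_crime *= 0.98
    return true_crime, reporting_rate


start_crime, start_rate = 1000.0, 0.8
end_crime, end_rate = optimise_proxy(start_crime, start_rate, budget=10)

print("reported before:", reported_crime(start_crime, start_rate))
print("reported after: ", round(reported_crime(end_crime, end_rate), 1))
print("true crime before and after:", start_crime, end_crime)
```

In this toy setup the measured number falls by well over half while the thing we actually care about does not change at all, which is the pattern the table describes.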

What ML gives us is a very general and increasingly powerful way of developing a system that does well at pushing some metric upwards. A society where more and more capable ML systems are doing more and more real-world tasks will be a society that is going to get increasingly good at pushing metrics upwards. This is likely to result in visible gains in efficiency and wealth. As a result, competitive pressures will make it very hard for companies and other institutions to say no: if Acme Motors Company started performing 15% better after outsourcing their CFO’s decision-making to an AI, General Systems Inc will be very tempted to replace their CEO with an AI (or maybe the CEO will themselves start consulting an AI for more and more decisions, until their main job is interfacing with an AI).

In the long run, a significant fraction of work and decision-making may well be offloaded to AI systems, and at that point change might be very difficult. Currently our most fearsome incentive systems like capitalism and democracy still run on the backs of the constituent humans. If tomorrow all humans decided to overthrow the government, or abolish capitalism, they would succeed. But once the key decisions that perpetuate major social incentive systems are no longer made by persuadable humans, but instead automatically implemented by computer systems, change might become very difficult.

Since our metrics are flawed, the long-term outcome is likely to be less than ideal. You can try to imagine what a society run by clever AI systems trained to optimise purely for their company’s profit looks like. Or a world of media giants run by AIs which spin increasingly convincing false narratives about the state of the world, designed to make us feel more informed rather than actually telling us the truth.

Remember also, as discussed previously, that there are solid reasons to think that influence-seeking and deceptive behaviours are likely in sufficiently-powerful AI systems. If the ML systems that increasingly run important institutions exhibit such behaviour, then the above “going out with a whimper” scenario might acquire extra nastiness and speed. This is something Paul Christiano explores in the same article linked above.

A popular misconception about AI risk is that the arguments for doing something are based on a tiny risk of giant catastrophe. The giant catastrophe part is correct. The minuscule risk part, as best as anyone in the field can tell, is not. As mentioned above, the average ML researcher - generally an engineering-minded person not prone to grandiose futuristic speculation - gives a 5% chance of civilisation-ending disaster from AI. The ML researchers who grapple with the safety issues as part of their job are clearly not an unbiased randomly-selected sample, but generally give numbers in the 5-50% range, and some (in our opinion overly alarmist) people think it’s over 90%. As the above arguments hopefully emphasise, some type of catastrophe seems like the default outcome from the types of AI advances that we are likely to encounter in the coming decades, and the main reason for thinking we won’t is the (justifiable but uncertain) hope that someone somewhere invents solutions.

It might seem forced or cliche that AI risk scenarios so frequently end with something like “and then the humans no longer have control of their future and the future is dark” or even “and then everyone literally dies”. But consider the type of event that AGI represents and the available comparisons. The computer revolution reshaped the world in a few decades by giving us machines that can do a narrow range of intellectual tasks. The industrial revolution let us automate large parts of manual labour, and also set the world off on an unprecedented rate of economic growth and political change. The evolution of humans is plausibly the most important event in the planet’s history since at least the dinosaurs died out 66 million years ago, and it took on the exact form of “something smarter than anything else on the planet appeared, and now suddenly they’re firmly in charge of everything”.

AI is a big deal, and we need to get it right. How we might do so is the topic for part 2.

EA as a Schelling point

2022-09-10T08:17:00.001+01:00

3.1k words (~9 minutes)

Summary: A significant way in which the EA community creates value is by acting as a Schelling point where talented, ambitious, and altruistic people tend to gather and can meet each other (in addition to more direct sources of EA value like identifying the most important problems and directly pushing people to work on them). It might be useful to think about what optimising for being a Schelling point looks like, and I list some vague thoughts on that.

A Schelling point, also known as a focal point, is what people decide on in the absence of communication, especially when it's important to coordinate by coming to the same answer.

The classic example is: you were arranging a meeting with a stranger in New York City by telephone, but you used the last minute of your phone credit and the line cut off after you had agreed on the date but not location or time - where do you meet? "Grand Central Station at noon" is an answer that other people may be especially likely to converge on.
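
A short calculation shows why a salient option helps so much. The Python sketch below is a toy model, not anything from Schelling's own work: it assumes two strangers each pick a meeting place independently from the same probability distribution, so the chance they pick the same place is the sum of the squared probabilities. The place names other than Grand Central Station and all the probabilities are invented for illustration.

```python
def match_probability(choice_probs):
    """Chance that two independent choosers pick the same option."""
    return sum(p * p for p in choice_probs.values())


# No focal point: four places, all equally likely (assumed numbers).
no_focal_point = {
    "Grand Central Station": 0.25,
    "Times Square": 0.25,
    "Empire State Building": 0.25,
    "Central Park": 0.25,
}

# One salient option that most people gravitate towards (assumed numbers).
with_focal_point = {
    "Grand Central Station": 0.70,
    "Times Square": 0.10,
    "Empire State Building": 0.10,
    "Central Park": 0.10,
}

print("without a focal point:", match_probability(no_focal_point))
print("with a focal point:   ", round(match_probability(with_focal_point), 2))
```

With these made-up numbers, the chance of meeting roughly doubles once one option stands out, even though nobody communicated and the set of options did not change.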

(Schelling points can be thought of as a type of acausal negotiation.)

When the Schelling point is the selling point

Schelling points are often extremely powerful and valuable. A key function of top universities is to be Schelling points for talented people. (Personally, I'd call it the most important function.) There are other valuable things too: courses that go deeper, the signalling value to employers, and so on. However, talented people generally have a preference for hanging out with other talented people, both for social reasons and to find collaborators for ambitious projects and future colleagues. At the same time, talented people are also generally spread out and present only at low densities. Top universities select hard on (some measures of) talent, and through this create environments with high talent density. A big chunk of the reason why people apply to top universities is because other people do so too, and I'd guess that even if the academic standards of Stanford, MIT, or Cambridge eroded significantly, the fact that they've established themselves as congregating points for smart people will keep people applying and visiting for a long time.

(Note that this is related to, but not equal to, the prestige and status of these places. It is possible to imagine Schelling points that are not prestigious. For example, my impression is that this described MIT at one point - it became a congregating point for uniquely ambitious STEM students and defence research before it achieved high academic status. It is also possible to imagine prestigious places that are not Schelling points, though this is a bit harder since anything with prestige becomes a Schelling point for high social status (though prestige Schelling points and talent Schelling points need not co-occur). More generally, since prestige is a thing many people care a lot about, there is a high correlation between a place being prestigious or high status and being a Schelling point for at least some type of person. However, the mechanisms are distinct - a person selecting their university based on status is selecting based on what they get to write on their CV, while a person selecting their university based on it being a Schelling point for smart people is selecting based on the fact that many other smart people that they can't coordinate with but would like to meet will also choose to go there.)

Another example is Silicon Valley. Sure, the area has many strengths - being rich and inside a large stable free market - but by far the greatest argument for living in Silicon Valley is that others also choose it. This leads to a (for now) unique combination of entrepreneurial people, great programmers, venture capitalists, and all the other types of people you need for a thriving tech business ecosystem, all there primarily because all the others are there too (how touching!). There's a lot of value in having everything in one place, and it would be very hard for all the different people who make up the value of Silicon Valley to coordinate to move to another place. That's why the Schelling point value of Silicon Valley is so enduring that people continue to tolerate large numbers of homeless drug addicts and sell kidneys to pay rent for years on end.

Note that a big part of the mechanism isn't that specific people you want to find are there, but that the types of person you'd want to find are likely to also be there, because both those people and yourself are likely to converge on the strategy of going there.

Schelling EA

The Effective Altruism (EA) community provides a lot of value, for example:

research into figuring out what are the most important problems to solve to maximise human flourishing;
research and concrete efforts into how to solve the most important problems discovered by the above;
high epistemic standards and truth-seeking discussion norms;
a uniquely wide-ranging and well-reasoned set of resources to help people pursue high-impact careers;
tens of billions of dollars in funding.

However, in addition to these, a very critical part of the value that EA provides is being a Schelling point for talented, ambitious, and altruistically-motivated people.

Even without EA, there would be researchers studying existential risks, animal welfare, and global poverty; people trying to assess charities; communities with high epistemic norms; and billionaires trying to use their fortunes for effective good. However, thanks to EA, people in each of these categories can go to the same Effective Altruism Global conference or quickly find people in local groups, and meet collaborators, co-founders, funders, and so on. A lot of the reason why this can happen is that if you hang out with a certain group of people or on the right websites, EA looms large.

The biggest personal source of value I've gotten from EA has been having a shortcut to meeting people very high in all of talent, ambition, and altruistic motivation.

Much of this is obvious - breaking news: communities bring people together and foster connections, more at 11 - but I think taking seriously just how much of counterfactual EA community impact comes from being a Schelling point leads to some less-obvious points about possible implications.

Implications

The Schelling-point-based (and therefore necessarily incomplete) answer to "what is the EA community for?" might be something like "be an obvious Schelling point where relevant people gather, the chance of interactions that lead to useful work is maximised, and have a community and infrastructure that pushes work in the most useful direction possible". (This is in contrast to answers that emphasise e.g. directly increasing the number of people working on the most pressing problems.) (I will not argue for this being the best possible answer; my point is just that it is one possible answer, and an interesting one to examine further.)

If I were a Big Tech marketing consultant, I might call this "EA-as-a-platform".

What might maximising for such a Schelling point strategy look like?

Being obvious

A Schelling point is not a Schelling point unless it's obvious enough. For EA to be an effective Schelling point for talented/ambitious/altruistic people, those people must hear about it. Silicon Valley is obvious enough that entrepreneurial people from South Africa to Russia hear about it and decide it's where they want to be. To maximise its Schelling point value, EA should have world-spanning levels of recognition.

Note that recognition does not equal prestige or likeability. We don't care (for Schelling point reasons at least) if most people hear about EA and go "eh, sounds weird and unappealing"; what matters is that the core target demographic is excited enough to put effort into pursuing EA. Consider how Silicon Valley was not particularly high-prestige in the public even when it was already attracting tech entrepreneurs, or how many people hear about the intensity of academics at top universities and (very reasonably) think "no thanks".

Providing value

Though most of a Schelling point's value typically comes from the other people who congregate at it, a Schelling point is easier to create if it is obviously valuable. Even though the smart people they meet might be most of the benefit of university, high schoolers are still more likely to go to top universities if they provide good education, good facilities, and unambiguous social status.

Some obvious ways in which EA provides value are through funding sufficiently promising projects, and by having a very high concentration of intellectually interesting ideas.

There are risks to communicating loudly about the value-add, since this brings in people who are in it purely for personal gain ("the vultures are circling", as one Forum post put it). This works for Schelling points like Silicon Valley, but not altruism.

Optimising for matchmaking

A specific way that Schelling points provide value is by making it easy to meet other people in the specific ways that lead to productive teams forming. An existing example of this is that everyone says one-on-one meetings are the main point of conferences, and there is (of course) a lot of thinking about how to make these effective. On the more informal end of the scale, Reciprocity exists.

However, the scope and value of EA matchmaking could be expanded. I'm not aware of many ways to match together entrepreneurial teams (the Charity Entrepreneurship incubation program is the only one that comes to mind). I recently took part in an informally-organised co-founder matching process and found it extremely helpful to quickly get a lot of information on what it's like to work together with several promising people.

I'd advise someone to think more about how to make the EA environment even more effective at matching people who should know about each other. However, I expect someone is already designing a 53-parameter one-on-one matching system with Calendly, Slack, and Matplotlib integration for the next conference, and therefore I will hold off on adding any more fuel to this fire.

Being legit

One of the specific ways in which a Schelling point becomes one is if things associated with it seem uniquely competent, successful, or otherwise good, in a clearly unfakeable way. It is helpful for Cambridge's Schelling point status that it can brag about having 121 Nobel laureates. That so many successful tech companies emerged from Silicon Valley specifically is an unfakeable signal. Any government or city can afford to throw some millions at putting up posters advertising its startup-friendliness; few can consistently produce multi-billion dollar tech companies.

No amount of community-building or image-crafting is likely to replicate the Schelling point power of obviously being the place where things happen. In some areas, I think EA already has such power: much of the research and work on existential risks happens within EA, and it might be hard to be a researcher on those topics without running into the large body of EA-originating work. However, EA goals require more than just research; note how being a project/organisation founder or working in an operations role have been creeping up the 80 000 Hours list of recommended career paths.

It would be extremely powerful, not just for direct impact reasons but also for building up EA's Schelling point status, if the EA community clearly spawned very obviously successful real-world projects. Alvea succeeding or working Nucleic Acid Observatories being built would be powerful examples. Likewise if Charity Entrepreneurship-incubated charities become clear stars of the non-profit world.

Meritocracy and impartial judgement

Right now, I think if a person somewhere in the world has a well-thought-out idea for how to make the world a better place, likely their best bet to get a fair hearing, useful feedback, and - if it is competitive with the most valuable existing projects - funding and support is to post it on the EA Forum. I don't think this is very obvious outside the EA community. However, this fact, and awareness of it, could make EA a more useful Schelling point, in the same way that the impression that Silicon Valley doesn't frown on weird ideas as long as they're important enough makes it a better Schelling point.

That EA endorses cause neutrality, has high and transparent epistemic standards, and has a quantitative mindset are key parts of this. However, to use this to increase EA Schelling point power, these properties need to be clearly visible to outsiders.

The most likely way for this to become more obvious might be if specific EA organisations achieved such a reputation widely within their field (and then there was some path by which knowing of these organisations points people towards knowing about EA).

GiveWell might be an example of a clearly-EA-linked organisation with visibly high epistemics and judgement quality, though I don't know what their image or recognition level is outside the EA community. Another example is if someone created successful and famous organisations along the lines of FTX Future Fund's proposed epistemic appeals process or widespread expert polling projects.

Openness and approachability

Good Schelling points are easy to enter, and don't select on attributes that they don't have to.

Every human sub-group, even if loose and purpose-driven, tends to develop a distinctive culture that is much more specific than strictly implied by its purpose. Sometimes this is useful, since it makes it easy for humans in even a loose group to bond with each other. However, a strong and distinct internal culture is also a barrier to entry. EA is already high-risk for having a strong barrier to entry, because

many arguments and concepts in EA require background knowledge to understand, and sometimes dense philosophical or technical background knowledge (and this is not the case just for more formal things like Forum posts; I've frequently heard "EV [expected value]", "QALY [quality-adjusted life year]", and "Pascal's mugging" assumed as obvious common terminology in casual conversation);
EA (quite obviously, given what it's about) has a high concentration of non-obvious arguments that are obscure in public discussion but have huge implications; and
perhaps the main route into EA is caring very strongly about intellectual arguments about abstract moral principles, which tends not to be a natural way for humans to join communities.

These largely unavoidable factors already make EA somewhat unapproachable, and make it seem like a tightly-knit weird in-group/subculture (anecdotally, this seems to be the most common complaint about EA among Cambridge students). Weird cultural norms or quirks are (among other things!) barriers to entry. Therefore, they should be minimised - to the extent that they can be without impinging on what EA is about - if the goal is to maximise Schelling point value.

(Mostly implicit) selectivity for the right things

Some selection is usually part of a Schelling point's value. Top universities select for academic merit (though perhaps less so in the US). Silicon Valley selects for openness and interest/talent in tech/business. EA selects for openness, altruistic orientation (especially if consequentialist-leaning), good epistemics, and quantitative thinking.

I think it is counterproductive to view openness and selectivity as two ends of one scale that apply to everything. You want to select on important features and be open otherwise (note that, when creating a Schelling point, most of the selection is usually implicit - what types of people you attract - rather than explicit filtering). The key choice is not "open or selective overall?" but rather "for which X do we want to appeal only to people who have a value of X in some specific range?"

Here's a heuristic for when selectivity for X is useful: when the way X provides value is through its concentration rather than its amount. If you're at a party where you can only talk to a subset of the people during its course, you're going to care a lot about what fraction of people there are interesting - 10 interesting people in a party of 20 is better than 50 in a party of 5000.
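To make the party heuristic concrete, here is a minimal sketch. It assumes you talk to a uniformly random subset of guests, so the expected number of interesting people you meet is just the number of conversations times the interesting fraction. The function name and the figure of 10 conversations are illustrative assumptions, not anything from a real dataset.

```python
def expected_interesting_conversations(interesting, total, conversations):
    """Expected number of interesting people met at a party.

    Assumes the guests you talk to are a uniformly random subset of the
    party (a simplifying assumption), so the expectation is the mean of
    a hypergeometric distribution: conversations * interesting / total.
    """
    # You cannot talk to more people than are present.
    conversations = min(conversations, total)
    return conversations * interesting / total


# Hypothetical evening: you only have time for 10 conversations.
small_party = expected_interesting_conversations(10, 20, 10)
big_party = expected_interesting_conversations(50, 5000, 10)
print(small_party, big_party)

# If you could somehow talk to everyone, the raw amount would win instead.
small_party_all = expected_interesting_conversations(10, 20, 5000)
big_party_all = expected_interesting_conversations(50, 5000, 5000)
print(small_party_all, big_party_all)
```

With a fixed conversation budget the small party wins because concentration is what counts; once the budget stops binding, the larger absolute number of interesting people wins. That is the distinction the heuristic is pointing at.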

Some cases are ambiguous. For example, if there exists a way for the good and important research to bubble to the top regardless of how much other research exists, it seems like total amount of (infohazard-free) research is the thing to maximise. However, a research area where the average paper is very high quality might help newcomers to the field, or might help lift the prestige of the field, so concentration matters at least somewhat.

To take another example, there was a recent debate over whether EA Global should be open access. Many of the arguments against boil down to thinking the path to impact runs through a uniquely high concentration of EA engagement (or other variables) among the participants; arguments in favour are often either claiming that concentration matters less than sheer amount of interactions, or that the choice of selection variable(s) is wrong, or that CEA fails to select on their chosen selection variable(s) so even if the intention is right the selection variable selected for in practice is wrong.

Finally, a key point of a Schelling point is that it is a point somewhere. Here, EA is steadily improving. Berkeley, Cambridge, Oxford, London, and Berlin all have large groups, and offices that you can apply to in order to work on EA-relevant things in the company of other EAs.

In Schelling point terms, there's also a risk that it might be better to have one really obvious and strong hub than many weaker ones (I've heard some Bay Area EAs in particular endorsing this view; invariably, their hub of choice is the Bay Area, though there is pushback). In practice, it seems that many physical hubs but one virtual/intellectual hub may be best. Both airplanes and people's desires to not uproot their lives are real and relevant things.

The organisers at each EA hub might benefit from applying Schelling point thinking to the context of their local scene.

Being one thing

Finally, a Schelling point needs to be one thing, at least in some loose sense. If New York had two Grand Central Stations, the classic Schelling point game would become a lot harder to solve.

One way to increase the One Thingness of the EA Schelling point is to merge it with other things. In Schelling point land, "merging" does not mean making them the same cluster, but rather creating an obvious and visible path from one thing to another. My understanding is that increasing the obviousness of EA in somewhat-adjacent communities (tech, longevity, space, and Emergent Ventures grantees) was a large part of what Future Forum tried to achieve.

Effective Altruism in practice

2022-08-20T23:05:00.011+01:00

6.5k words (~17 minutes)

I've written about key ideas in Effective Altruism before. But that was the theory. How did EA actually come to exist, and what does it look like in practice?

... turns out it looks like a stylised light bulb with a heart.

Summary

The ideas underpinning EA came from many sources, including:
- late-1900s analytic moral philosophers like Peter Singer and Derek Parfit;
- futurist/transhumanist thinkers like Nick Bostrom and Eliezer Yudkowsky focusing on risks from future technologies;
- a few people working on evaluating charity effectiveness;
- efforts starting around 2010 by a few Oxford philosophers including William MacAskill and Toby Ord that, sometimes unwittingly, gave structure and a name to a diverse cluster of ideas about how to maximise your positive impact.
Though EA is framed around the question of "what does the most good (according to an analytic and often quantitative framework based on impartial welfare-oriented ethics)?" rather than any particular answer to that question, in practice many (but not all!) EA efforts focus on one of the following, due to many people deciding that it's a particularly pressing and (outside EA) neglected problem:
- reducing the risk of civilisation-wide catastrophe, especially from emerging technologies like advanced AI and biotechnology;
- health and development in poor countries; and
- animal welfare.
- There is also a lot of work at the meta-level, including on figuring out how people can have impactful careers, and trying to direct effort towards the above problems.
The funding for most EA-related projects and EA-endorsed charities comes from a combination of:
- many individual small donors, in particular:
  - people who have taken the Giving What We Can pledge and therefore donate >10% of their salary to highly effective charities;
  - people who explicitly pursue "earning-to-give" (getting a high-paying job in order to donate most of the proceeds to charities);
- several foundations that derive their wealth from billionaires, including most prominently:
  - Open Philanthropy, mostly funded by Dustin Moskovitz who made his wealth from being a Facebook co-founder; and
  - FTX Foundation, funded by Sam Bankman-Fried and several other early employees at the crypto exchange FTX.
There is no monolithic EA organisation (though the Centre for Effective Altruism organises some common things like the EA Global conferences), but rather a large collection of organisations that mainly share:
- a commitment to maximising their positive impact on the world;
- a generally rigorous and quantitative approach to doing so; and
- some link to the cluster of people and organisations in Oxford that first named the idea of Effective Altruism.
- There are also many charities that have no direct relation to the EA movement, but were identified by charity evaluators like GiveWell as extremely effective, and have thus been extensively funded.
EA is very good at attracting talented people, especially ambitious young people at top universities.
EA culture leans intellectual and open, and has a high emphasis on "epistemic rigour", i.e. being very careful about trying to figure out what is true, acknowledging and reasoning about uncertainties, etc.
Some "axes" within EA include:
- "long-termists" who focus on possible grand futures of humanity and the existential risks that stand between us and those grand futures, and "near-termists" who work on clearer and more established things like global poverty and animal welfare;
- a bunch of people and ideas all about frugality and efficient use of money, and another bunch of people and ideas about using the available funding to unblock opportunities for major impact; and
- a historical tendency to be very good at attracting philosophy/research-type people who like wrestling with difficult abstract questions, versus a growing need to find entrepreneurial, operations, and policy people to actually do things in the real world.

The philosophers

In the beginning (i.e. circa the 1970s, when time is widely known to have begun), there were a bunch of philosophers doing interesting work. One of them was Peter Singer. Peter Singer proposed questions like this (paraphrasing, not quoting, and updated with recent numbers):

Imagine you're wearing a $5000 suit and you walk past a child drowning in a lake. Do you jump into the lake and save the child, even though it ruins your suit?

If you answered yes to the above, then consider this: it is possible to save a child's life in the developing world for $5000; what justification do you have for spending that money on the suit rather than saving the life?

The only difference between the two scenarios seems to be distance to the dying child (and method of death and etc. but ssshh); is that distance really morally significant?

(He is also known for arguing in favour of animal rights and abortion rights.)

Derek Parfit is another. He is particularly famous for the book Reasons and Persons, in which he asks questions (paraphrasing again) like this:

Is a moral harm done if you cause fewer people to exist in the future than otherwise might have? How should we reason about our responsibilities to future generations and non-existing people more generally?

Does there exist a number of people living mediocre (but still positive) lives such that this world is better than some smaller number of people living very good lives?

(He also talks about problems in the philosophy of personal identity, and the contradictions in moral philosophies based on self-interest.)

The transhumanists

Then, largely separately and around the 1990s, there came the transhumanists ("transhumanism" is a wide-reaching umbrella term for humanist thinking about radical future technological change). Perhaps the most notable are Nick Bostrom and Eliezer Yudkowsky.

Nick Bostrom thought long and hard about many wacky-seeming things with potentially cosmic consequences. He popularised the simulation hypothesis (the idea that we might all be living in a computer simulation). He argues against death (something I strongly agree with). He did lots of work on anthropic reasoning, which is about the question of how we should update information we get about the state of the world when taking into account that we wouldn't exist unless the state of the world allowed it. This leads to some thought experiments that I'd classify as infohazards because of their tendency to spark an unending discussion whenever they're described. Conveniently, he also coined the term "infohazard".

Most crucially for EA, though, Bostrom has worked on understanding existential risks, which are events that might destroy humanity or permanently and drastically reduce the capacity of humanity to achieve good outcomes in the future. In particular, he has worked on risks from advanced AI, which he boosted to popularity with the 2014 book Superintelligence.

Bostrom's style of argument is like a dry protein bar, leaning toward straightforward extrapolation of conclusions from premises, especially if the conclusions seem crazy but the premises seem self-evident. Sometimes, though, he does apply some literary flair to make an important point, and also occasionally writes poetry.

Eliezer Yudkowsky wanted to create a smarter-than-human AI as fast as possible, until he realised this might be a Bad Idea and said "oops" and switched to the problem of making sure any powerful AIs we create don't destroy human civilisation. He founded the Machine Intelligence Research Institute (MIRI) to find out the answer.

Yudkowsky also wrote a massive series of blog posts to try to teach people about how to reason well (for example, he covers a lot of ground from the cognitive biases literature), and then went on to try to convey the same lessons in what became the most popular work of Harry Potter fanfiction of all time. His writing and argument style tends toward flowing narratives that are usually both very readable and verbose (though quite hit-or-miss in whether you like it).

He has Opinions (note the capital). He is extremely pessimistic about the chances of solving the AI alignment problem.

Yudkowsky is affiliated much more strongly with the loose "Rationalist community" than with EA. This is a collection of online blogs that was sparked by Yudkowsky's writing, and later in particular also that of Scott Alexander, who has become internet-famous for his own reasons too. The central forum is LessWrong. Both EA and Rationalism involve lots of discussion about far-ranging abstract ideas that (for a certain type of person) are hard to resist; one blogger says "[t]he experience of reading LessWrong for the first time was brain crack" and goes on to propose that EA ideas are best-spread by nerd-sniping (i.e. telling people about ideas they find so interesting that they literally can't help but think about them). Both EA and the Rationalists put an incredible amount of effort and weight on trying to reason well, avoid biases and fallacies, and be careful (and often quantitative) about uncertainties. However, EA focuses more on applying those things to do good in the real world to real people, while the Rationalist vibe is sometimes one of indulging in theorising and practising good thinking for their own sake. (This is not necessarily a criticism - I had fun discussing Lisp syntax in the comments section of the LessWrong version of my review of Structure and Interpretation of Computer Programs, even though arguing about parentheses isn't exactly going to save the world (or is it ... ?)). EA tends to also have a more explicit orientation towards seeking influence.

(I should also note that on the specific topic of AI risk, the Rationalist community is extremely impact-oriented, likely due to founder effects - or perhaps because AI risk is the EA cause area that is most full of juicy technical puzzles and philosophical confusions.)

More philosophers & EA gets a name

Brian Christian's The Alignment Problem mentions in chapter 9 some funny details about the sequence of events that led to the first few EA-by-name organisations. In 2009, then-Oxford-philosophy-student Will MacAskill had an argument about vegetarianism while in a broom closet. Unlike most arguments about vegetarianism, and echoing the vibe of much future EA thinking, this one was on the meta-level; the debate was not whether factory farming is bad, but how we should deal with the moral uncertainty around whether or not factory farming is ethical. MacAskill eventually started talking with Toby Ord (though in a graveyard rather than a broom closet), another philosophy student interested in questions around moral uncertainty.

Together with one other person, the two of them wrote a book on moral uncertainty. MacAskill and a philosophy-and-physics student called Benjamin Todd founded an organisation called 80 000 Hours to try to figure out how people can choose careers to have the greatest positive impact on the world. Toby Ord founded an organisation called Giving What We Can (GWWC) that encourages people to donate 10% of their salary to exceptionally effective charities. GWWC estimates its roughly 8000 members have donated $277mn, and are likely to donate almost $3bn over their lifetimes.

As an umbrella organisation for both of these, they created the Centre for Effective Altruism. Originally the "Effective Altruism" part was intended purely as a descriptive part of the organisation's name, but it eventually came to stand more broadly for the general space of effectively altruistic things that at some point interacted with ideas from the original Oxford cluster.

Later, MacAskill wrote a book called Doing Good Better summarising ideas about why charity effectiveness is important and counterintuitive. Ord in turn wrote The Precipice that summarises ideas about how mitigating existential risks to human civilisation is likely a key moral priority; after all, it would be bad if we all died.

Charity evaluators and billionaires

Independently from (and before) anything happening in Oxford broom closets, starting in 2006 hedge fund managers Holden Karnofsky and Elie Hassenfeld started thinking seriously about which charities to donate to. Upon discovering that this is a surprisingly hard problem, they started GiveWell, an organisation focused on finding exceptionally effective charities. They ended up concentrating on global health (their list includes malaria prevention, vitamin supplementation, and cash transfers, all in developing countries).

After a few years of GiveWell existing, they were put in touch with Dustin Moskovitz and Cari Tuna. At the time, Facebook co-founder Dustin Moskovitz was the world's youngest self-made billionaire, and with his partner Cari Tuna had started a philanthropic organisation called Good Ventures in 2011.

What followed was a cinematic failure of prioritisation, as recounted by Holden Karnofsky himself in this interview. The GiveWell founders decided that "[meeting the billionaires] just doesn't seem very high priority", and thought that "[n]ext time someone's in California we should definitely take this meeting, but [...] this isn't the kind of thing we would rush for [...]". However, Karnofsky realised this meeting was an excellent excuse to go on a date with a Californian he fancied (and later married), and as a result ended up making the trip sooner rather than later.

Moskovitz and Tuna turned out to have very simplistic preferences for charitable giving: they just wanted to do the most good possible. This was an excellent fit with GiveWell's philosophy, and soon Good Ventures partnered with GiveWell in what would later become Open Philanthropy (of which Karnofsky would become co-CEO). Open Philanthropy is a key funder of EA projects, though they fund unrelated things as well (though always through a very EA lens of trying to rigorously and quantitatively maximise impact). They list all their grants here.

While studying physics at MIT, Sam Bankman-Fried (or "SBF"), already deeply interested in consequentialist moral philosophy, attended a talk by Will MacAskill on EA ideas. After stints at trading companies and the Centre for Effective Altruism, he founded the crypto-focused trading companies Alameda Research and then FTX, and ended up becoming the richest under-30 person in the world. (Though then the value of FTX fell in the crypto crash, and he recently turned 30 to boot.)

EDIT: In November 2022, both FTX and Alameda Research collapsed in a matter of days, and it became clear that FTX had committed major and flagrant financial fraud by transferring customer funds to Alameda, which Alameda then speculated with, and seems to have lost to the tune of billions of dollars. SBF is facing criminal charges. FTX and SBF have been condemned in harsh terms by those running many EA orgs and in countless EA Forum posts. Obviously, FTX and SBF have now very clearly become examples of what NOT to do. All of the following seem true: (a) our prior should be that people committing illegal and immoral actions that lead to extreme wealth and prestige for themselves are most likely acting mostly for the standard boring selfishly-evil reasons, (b) SBF probably had an easier time justifying his crimes because of the story that he could tell himself about doing good for the world, (c) publicly associating himself with EA, and receiving positive attention from EA organisations, helped make SBF appear moral and trustworthy, (d) there existed evidence and signals (in particular reports from Alameda's early days about cut-throat behaviour from SBF) that provided evidence of SBF's character before the FTX collapse, and (e) it is generally harder than it seems in hindsight to be right about whether a business is fraudulent (consider that countless venture capitalists poured billions into FTX, and presumably had incentive to figure out if the entire thing was a scam). More information will come to light with time, and there are definitely lessons to be learned. Apart from this paragraph, I have not changed any part of this post.

SBF often emphasises that you're more likely to achieve outlier success in business if your goal is to donate the money effectively. There's little personal gain in going from $100M to $10B, so a selfish businessperson is likely to optimise something like "probability I earn more than [amount that lets me do whatever the hell I want for the rest of my life]", while a (mathematically-literate) altruistic one is far more compelled to simply shoot for the highest expected-value outcomes, even if they're risky. (The exception is the selfish businessperson who really likes competing in the billionaire rankings.)
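The argument above can be illustrated with a small expected-utility sketch. Everything numeric here is a hypothetical assumption chosen for illustration: the "safe" option is a certain $100M, the "risky" option is a 10% shot at $10B with a 90% chance of ending up with $1M, the altruist values money linearly (each donated dollar does roughly equal good), and the selfish businessperson has logarithmic utility in personal wealth.

```python
import math


def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, dollars) pairs."""
    return sum(p * utility(dollars) for p, dollars in lottery)


# Hypothetical options, not real figures.
safe = [(1.0, 100e6)]               # certain $100M
risky = [(0.1, 10e9), (0.9, 1e6)]   # 10% chance of $10B, else $1M


def linear_utility(dollars):
    # Assumed altruist: every extra dollar donated is about equally valuable.
    return dollars


def log_utility(dollars):
    # Assumed selfish spender: sharply diminishing returns to personal wealth.
    return math.log10(dollars)


def preferred_option(utility):
    """Return which of the two hypothetical options has higher expected utility."""
    if expected_utility(risky, utility) > expected_utility(safe, utility):
        return "risky"
    return "safe"


print("altruist prefers:", preferred_option(linear_utility))
print("selfish prefers:", preferred_option(log_utility))
```

Under these assumptions the same pair of options gets ranked in opposite ways: the linear-utility altruist takes the risky bet, while the log-utility businessperson takes the sure thing. Different assumed numbers or utility curves could change the ranking, so this shows the shape of the argument rather than proving it.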

SBF has also said - and is living proof of - the idea that if your strategy to do good is to earn money to donate, you should probably aim for the risky but high-value bets (e.g. starting a company and becoming a billionaire), rather than going into some high-paying finance job earning a crazy-high but non-astronomical salary. Many people persuaded by EA ideas have done the latter, but SBF contributed more than all of them combined. The maths probably still works out even after accounting for the fact that SBF's route was far more unlikely to work than a finance job (he thought FTX had an 80% chance of failure). This post argues so. Wave, a fintech-for-Africa company with strong EA representation in its founding team and a $1.7B valuation in 2021, is another example of EA business success.

SBF and other senior FTX people (many of whom care deeply about EA ideas) launched the FTX Foundation, which in particular contains the Future Fund that has quickly become a key funder of the more future-oriented and speculative parts of EA.

These days, being associated with tech billionaires isn't a ringing endorsement. However, consider a few things. First, the tech billionaires aren't the ones who came up with the ideas or set the agendas. Sports car enthusiast and sci-fi nerd Elon Musk decided that sexy cars and rockets are the most important projects in the world and directed his wealth accordingly; Moskovitz, SBF, & co. were persuaded by abstract arguments and donate their wealth to foundations where the selection of projects is done by people more knowledgeable in that than they are. Second, it seems unusually likely that the major EA donors really are sincere and committed to trying to do the most good; after all, if they wanted to maximise their popularity or acclaim, there are better ways of doing that than funding a loose cluster of people often trying to work specifically on the least-popular charitable causes (since those are most likely to contain low-hanging fruit). Finally, if some tech billionaires endorsing EA is evidence against EA being a good thing, then no tech billionaires endorsing EA must be evidence in favour of EA being a good thing. However cynical you are about tech billionaires, they're still smart people, so a few of them going "huh, this is the type of thing I want to spend all my wealth on" should be more promising than all of them going "nope I don't buy this".
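The final step in the paragraph above is an instance of what is sometimes called conservation of expected evidence: the prior must equal the probability-weighted average of the possible posteriors, so if one observation lowers your credence, its absence has to raise it. Here is a minimal sketch with made-up numbers; the prior and both likelihoods are assumptions picked to represent a cynic who thinks billionaire endorsement is more likely for bad movements.

```python
def posteriors(prior, p_endorse_if_good, p_endorse_if_bad):
    """Bayes' rule for a binary hypothesis ("EA is good") and a binary
    observation ("some tech billionaires endorse it").

    Returns (P(endorse), P(good | endorse), P(good | no endorsement)).
    """
    p_endorse = prior * p_endorse_if_good + (1 - prior) * p_endorse_if_bad
    post_if_endorse = prior * p_endorse_if_good / p_endorse
    post_if_silent = prior * (1 - p_endorse_if_good) / (1 - p_endorse)
    return p_endorse, post_if_endorse, post_if_silent


# Hypothetical cynic: endorsement is twice as likely if the movement is bad.
prior = 0.5
p_endorse, post_if_endorse, post_if_silent = posteriors(prior, 0.2, 0.4)

# The weighted average of the two posteriors recovers the prior.
average = p_endorse * post_if_endorse + (1 - p_endorse) * post_if_silent
print(post_if_endorse, post_if_silent, average)
```

With these assumed inputs, endorsement pushes the cynic's credence below the prior and non-endorsement pushes it above, which is exactly the commitment the paragraph says the cynic has to accept.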

(If EA has some top tech business people, why doesn't it have some top political people too, or even funders from outside tech? My guess is a combination of factors. Politicians skew old while EAs skew young (partly because EA itself is young). Both EAs and tech people tend to be technically/mathematically/intellectually-inclined (though many areas within EA are specifically about social science or the humanities). Both EAs and tech people tend to care less than average about social norms or prestige, while politicians tend to be selected out of the set of people who are willing to optimise very hard for prestige and popularity. Also, expect some policy-related efforts from EA; many EAs work or aim to work in non-political policy roles, and there have even been some political efforts, though there is much to learn in that field.)

Organisations

In addition to the previously-mentioned CEA, 80 000 Hours, Giving What We Can, GiveWell, Open Philanthropy, and FTX Foundation, organisations with a strong EA influence include (but are not limited to):

A large number of think-tanks and research institutes, especially ones where people think about the end of the world all day, including
- Future of Humanity Institute (FHI) at Oxford, which researches big-picture questions about the future of humanity and is run by Nick Bostrom.
- Future of Life Institute (FLI) in Cambridge (Massachusetts), focusing on global catastrophic risks and existential risks. It was founded by a team including Skype co-founder Jaan Tallinn and physicist Max Tegmark. Wikipedia says they are "[n]ot to be confused with Future of Humanity Institute" but to be honest this is a pretty big ask given the name.
- Centre for the Study of Existential Risk (CSER) at Cambridge, also co-founded by Jaan Tallinn.
- Centre on Long-Term Risk (CLR).
- Centre on Long-Term Resilience (CLTR) (no, this is not confusing at all, it's all in your head).
A large number of animal welfare charities, which I won't bother listing, except to point out the meta-level Animal Charity Evaluators.
A large number of global health charities, ranging from ones that are simply highly recommended (and funded) by GiveWell (in particular Against Malaria Foundation, which routinely tops GiveWell rankings) to ones that also trace their roots solidly to EA.
Organisations working on AI risk, including:
- Anthropic, working on interpreting machine learning models (a program led by Chris Olah) and more general empirically-grounded, engineering-based machine learning safety research.
- Redwood Research, a smaller company also doing empirical machine learning safety work (and running great ML bootcamps on the side).
- Centre for Human-compatible AI (CHAI), a research institute at UC Berkeley.
- Machine Intelligence Research Institute (MIRI), the original AI safety organisation that was founded in 2000 and hence managed to snap up the enviable domain name "intelligence.org". MIRI's research leans much more mathematical and theory-based than that of most other AI alignment organisations.
- Conjecture, a new organisation focusing on the work that is most relevant if advanced AI is surprisingly close.
- (OpenAI and DeepMind, the two leading AI companies, both have safety teams that include people very committed to working on existential risk concerns. However, neither is primarily an AI safety company, and both weight advanced AI risks at a company-level less than the other companies on this list. OpenAI in particular currently sees AI risks more through the near-term lens of making sure AI systems and their benefits are widely accessible to everyone, rather than focusing on making sure AI systems don't doom us all (though I guess that too would be a suitably equitable outcome?).)
Alvea, a recent vaccine startup, with the eventual goal of enabling faster vaccine roll-out in the next pandemic.
Charity Entrepreneurship, a charity incubator that has incubated many charities, including for example Healthier Hens (farmed chicken welfare), the Happier Lives Institute (helping policymakers figure out how to increase people's happiness), and Lead Exposure Elimination Project (working to reduce lead exposure in developing countries).
SparkWave, an incubator for software companies that are solving important problems.
Effective Thesis, trying to save students from writing pointless theses.
Founders Pledge, which helps entrepreneurs commit to giving away money when they sell their companies and donate that money effectively (not to be confused with the more famous Giving Pledge). (So far, about $475M has been donated in this way.)
Legal Priorities Project, which looks at the legal aspects of trying to do everything else.
ALLFED (ALLiance to Feed Earth in Disasters), which aims to be useful in situations where hundreds of millions of people or more are suddenly without food, and which has successfully found the best conceivable name for an organisation that does this.
Our World in Data (OWID), the world's best provider of data and graphs on important global issues. I'm not quite sure how interrelated they are with EA directly, but their founder posts on the EA Forum about OWID articles on very EA-related ideas, so there's definitely some overlap.
All-Party Parliamentary Group for Future Generations in the UK government.
A bunch of organisations focused on getting people interested in the world's biggest problems and teaching them various skills:
- Atlas Fellowships, a recent initiative for high-schoolers.
- A collection of Existential Risk Initiatives running, among other things, summer internships where people (mostly undergraduate/postgraduate students) work with mentors on existential risk research: SERI (Stanford), CHERI (Switzerland), CERI (Cambridge), and a newer one at the University of Chicago which I can't yet find a website for, but which will almost certainly not help with the naming situation when it arrives. Thankfully, rumours say there will soon be a YETI (Yale Existential Threats Initiative), which is a cool and (thank god!) unconfusable name.

Since EA is not a monolithic centralised thing, there is plenty of fuzziness in what counts as an EA organisation, and definitely no official list (and therefore if you're reading this and your org is not on the list, you shouldn't complain - many great orgs were left out). The common features among many of them are:

Some causal link to stuff that at some point interacted with the original Oxford cluster.
Emphasis on taking altruistic actions with a focus on effectiveness.
Emphasis on quantifying the impact of altruistic actions.
Emphasis on a scope that is in some way particularly wide-ranging or unconventional, either in sheer size or time (existential risks, the long-run future), geography (focusing on the entire world and often particularly developing countries rather than the organisation's neighbourhood), or in what is cared about (farmed animal welfare, wild animal welfare, the lives of people in the far future, and whatever the hell these people are doing).

The biggest EA events are the Effective Altruism Global (EAG) conferences organised by CEA. These usually happen several times a year, mostly in the UK and the Bay Area, though locally-organised EAGx conferences have more diverse locations.

The Situation

EA has a strong presence especially at top universities. There are large and active EA student groups in the Bay Area, Cambridge, Oxford, and London, but also increasingly New York, Boston, and Berlin, and many smaller local groups (you can find them listed here). The profile of EA in the general public is very small. However, the concentration of talent is extremely high. Add to this the existence of funding bodies with tens of billions of dollars of assets that are firmly aligned with EA principles, and you can expect a lot of important, impactful work to come from people and organisations with some connection to EA in the coming years.

It's important to keep in mind that EA is not a centralised thing. There is no EA tsar, or any single EA organisation that runs the show, or any official EA consensus. It's a cluster of many people and efforts that are joined mainly by caring about the types of ideas I talk about here.

Demographics

This website has a good overview, based on whoever filled in a survey posted to the EA Forum. The gender ratio is unfortunately somewhat skewed (70% male); for comparison, this is roughly the same as for philosophy degrees and better than for software developers (90% male (!?)). Half are 25-34. Over 70% are politically left or centre-left, and few are centre-right (2.5%) or right (1%), though almost 10% are libertarians. Education levels are high, and the five most common degrees are, in order: CS, maths, economics, social science, and philosophy. Most are from western countries.

Culture

EA culture places a lot of weight on epistemics: being honest about your uncertainties, clear about what would make you change your mind on an issue, aware of biases and fallacies, trying to avoid group-think, focusing on the substance of the issue rather than who said it or why, and arguing with the goal of finding the truth rather than defending your pet argument or cause. This is a lofty set of goals. To an astonishing but imperfect extent, and more so than any other concentration of people or writing (except for the equally-good Rationalist community mentioned above) that I've ever had any exposure to, EA succeeds at this.

Related to this, but also turbo-charged by general cultural memes of "critiquing cherished ideas is important", there's a strong emphasis on constantly being on the lookout for ways in which you yourself or (in particular) common EA ideas might be wrong. If you read down the list of top-voted posts on the EA Forum, they are about:

Potential failure modes resulting from the influx of money into EA.
High EA spending being a problem for optics and epistemics.
Things current EA community-building efforts are doing wrong, and why this is especially worrying.
Reasons why some key concepts in EA are used misleadingly and unnecessarily.
A list of critiques of EA that someone wants expanded.
A catalogue of personal mistakes that someone made while trying to do good (the key one being that they focused too much on working only at EA organisations).
An argument that standard EA ways of trying to help with developing country development are not as effective as other ways of helping.
And only in 8th place, something that isn't a critique of EA: a post about the historical case of early nuclear weapons researchers mistakenly assuming they were in a race, and implications for today's AI researchers.

(If you adjust upvotes on EA Forum posts to account for how active the forum was at the time, the most popular post of all time is Effective Altruism is a Question (not an ideology). It's not a critique, but it's also very revealing.)

Right now, there's an active contest with $100k in prizes for the best critiques of EA. This sort of stuff happens enough that Scott Alexander satirises it here.

This might give the impression of EA as excessively-introspective and self-doubting. There is some truth to the introspectiveness part. However, the general EA attitude is also one of making bold (but reasoned) bets. Recall SBF's altruistically-motivated risk taking, or more generally the fact that one of Open Philanthropy's foundational ideas is to support reasonable-but-risky projects, or even more generally the way EA in general is set up around unconventional and ambitious attempts at doing good.

If I had to name the two most important obstacles to doing important things in the real world, they would be (1) reasoning poorly and not updating enough based on feedback/evidence, and (2) being too risk-averse and insufficiently ambitious. Some cultures, like the good parts of academia, do well on avoiding (1). Others - imagine for example gung-ho Silicon Valley tech entrepreneurs - do well on avoiding (2). Though EA culture varies a lot between places and organisations, on the whole it seems uniquely good at combining these two aspects.

There are differences in culture between different EA hubs/clusters. I mainly have experience of the UK (and especially Cambridge) cluster and the Bay Area one. In the Bay, there is significant overlap between the EA and Rationalist communities, whereas in the UK there's mainly just EA in my experience. The Bay also leans more AI-focused and maybe weirder on average (or perhaps it's just a European vs American culture thing), while in the UK there are many AI-focused people but also many focused on biological fields (biosecurity & alternative proteins) or policy.

Axes & trends

"Long-termism" vs "near-termism"

In the history of EA, it's hard not to see an invasion of ideas from the planetary-scale futurism that people like Nick Bostrom and Eliezer Yudkowsky talked about, and Toby Ord (author of The Precipice) and Will MacAskill (about to drop a new book on why we should prioritise the long-term future) increasingly focus on. Holden Karnofsky, who for a long time ran GiveWell, perhaps the most empirically-minded and global health -focused EA organisation, is now co-CEO of Open Philanthropy, responsible specifically for the speculative futurist parts of Open Philanthropy's mission, and writes blog posts about the grand future of humanity and why the coming century may be especially critical (though he is careful to say that he still considers the other half of Open Philanthropy's work, and global health / animal welfare -focused charity more generally, to be important).

Perhaps this makes sense. In the long run at least, it seems sensible to expect the largest-scale ideas to be the most important ones. The rate of technological progress, especially in AI, has also been shrinking just what "the long run" means when expressed in years.

The common labels applied to the two ends of the radical-future-technology-focused versus concrete-current-problem-focused axis are "long-termist" and "near-termist" respectively. The name "long-termist" comes from arguments that the key moral priority is making sure we get to a secure, sustainable, and flourishing future civilisation (since such a civilisation could be very large and long-lasting, and therefore enable an enormous amount of happiness and flourishing). However, the names are a bit misleading. All existential risk work is often lumped into the long-termist category, so we have "long-termist" AI safety people trying to prevent a catastrophe many of them think will probably happen in the next three decades if it happens at all, and "near-termist" global health and development people trying to help the development of countries over a century.

(Many also point out that caring about existential risks does not require the long-termist philosophy.)

Frugality vs spending

The culture of the original Oxford cluster was very frugal, and focused on monetary donations. For example, after founding Giving What We Can (GWWC), Toby Ord donated everything he earned above £ 18 000 to charity (and has continued on a similar track since then). Because of the low available funding, the focus was very much on marginal impact - trying to figure out what existing opportunity could best use one extra dollar.

Since then, the arrival of billionaires meant that funding worries went down.

(For example, "earning to give" has gone down a lot in 80 000 Hours' career rankings. This is the idea that deliberately going into a high-earning job (often in finance) and then donating a significant fraction of your salary to top charities is one of the most effective ways to do good, and a path that many pursued based on the recommendation by 80 000 Hours.)

The bottleneck has moved (or at least been widely perceived to move) from funding to the time of people working on the key problems; instead of focusing on where to allocate the marginal dollar, the focus has somewhat shifted to how to allocate the marginal minute of time. In particular, the core argument of "imagine how far this particular dollar could go if used to effectively improve health in developing countries" has been joined by the argument of "there are plausible civilisation-ending disasters that could happen in the coming decades and require hard work to solve; imagine how sad it would be if we failed to work fast enough because we didn't spend that one dollar".

As a concrete example, Redwood Research organised a machine learning bootcamp aimed at upskilling people for AI safety jobs in January 2021 (and will be running more in the future, something I strongly endorse). Thirty participants (including myself) were flown into Berkeley from around the world, and spent three weeks living in a hotel while taking daily high-reliability COVID tests that I'm pretty sure weren't entirely free (and of course spending the days programming hard and talking about AI alignment (and eating free snack bars at the office - or maybe that last part was just me)). This wasn't cheap, nor was it a typical way to spend charity money (Redwood is funded by Open Philanthropy). But if prediction markets are right that generally-capable AI starts emerging around the end of this decade, and you take one look at the current state of progress on the AI alignment problem, and you do happen to have access to funding - well, it would be sad if being too stingy is how our civilisation failed.

Concretely, to look at only one consequence, Redwood made several hires from the bootcamp, despite the fact that many of the participants (myself included) were still students or otherwise not looking for work. Given how difficult but important hiring is, especially for high-skill technical roles, and the serious possibility that organisations like Redwood making progress is important for solving AI safety problems that might play a big role in how the future of humanity shapes out, this seems like a win.

However, at the same time, it is of course worth keeping in mind that humans are pretty good at thinking to themselves "man, wouldn't it be great if people like me had lots of money?" This, as well as the PR and culture problems of having lots of money sloshing around, are discussed in many EA Forum posts. We already saw that this one (by MacAskill) and this one are, respectively, the first- and second-most upvoted posts of all time on the EA Forum.

Ultimately, the whole point of Effective Altruism is, well, being effective about altruism. Whether EA funders spend quickly or slowly, and whichever causes they target, if they fail to find the best opportunities to do good with money, they haven't succeeded - and they know it.

(It should be noted that the GWWC criterion of donating 10% of your income to charity is met by many EAs, including ones far in space or culture from the original Oxford cluster, and global health is a leading donation target.)

Thinking vs doing

The fact that there are more resources - including not just funding but also the time of talented people - also means that the focus is less on marginal impact. If you have £10 and an hour, then figuring out what existing opportunity has the best ratio of good stuff per pound is the best bet. But if you have, say, £10 000 000 and ten thousand work hours, then there's also the option of starting new projects and organisations.

(A lot of the weirdness of EA thinking comes from its marginalist nature. The things that are most valuable per marginal unit of money/time/effort are generally the things that are most neglected, and neglected things tend to seem weird because, by definition, few people care about them. For example, the early EA focus basically completely eschewed developed country problems because per-dollar marginal cost-effectiveness was highest in poor countries; from the outside, this may look like a strangely harsh and idiosyncratic selection of causes. With increasing resources, it makes more sense to pursue larger-scale changes, and larger-scale changes sometimes look like more traditional and intuitive causes. For example, while developing country health and projects trying to improve the long-term future are Open Philanthropy's main focuses, they spend some of their massive budget on US criminal justice reform, land-use policy, and immigration policy.) (Though note that the effectiveness of the criminal justice program has come under criticism.)

Since EA now has the resources to start many new organisations, there's also starting to be a shift from EA being very research-oriented to having more and more real-world projects. Even though one of the key EA insights is that doing good requires lots of careful thinking in addition to good intentions and execution ability, the ultimate metric of success is actually improving the world, and that takes steps that aren't just research. I think EA has some headwind to overcome here; as a movement inspired by, started by, and (early on) largely consisting of philosophers, it has been remarkably successful in appealing to philosophical people and researchers, but not entrepreneurs or operations people to the same extent. I think it is a very welcome trend that this is starting to shift.

Exciting Attempt for Enabling Action on Essential Activities

EA is definitely not ideal, and it is also not guaranteed to survive. Like any real-world community, it is not a timeless platonic ideal of pure perfection that burst into the world fully formed, but rather something with an idiosyncratic history, that consists of real people, and has certain biases and cultural oddities. Still, I think it is probably the most exciting and useful thing in the world to be engaged with.

Information theory 3: channel coding

2022-06-25T20:50:00.000+01:00

7.9k words, including equations (~41 minutes)

We've looked at basic information theory concepts here, and at source coding (i.e. compressing data without caring about noise) here. Now we turn to channel coding.

The purpose of channel coding is to make information robust against any possible noise in the channel.

Noisy channel model

The noisy channel model looks like the following:

The channel can be anything: electronic signals sent down a wire, messages sent by post, or the passage of time. What's important is that it is discrete (we will look at the continuous case later), and there are some transition probabilities from every symbol that can go into the channel to every symbol that can come out. Often, the set of symbols of the inputs is the same as the set of symbols of the outputs.

The capacity $$C$$ of a noisy channel is defined as $$$ C = \max_{p_x} I(X;Y) = \max_{p_x} \big(H(Y) - H(Y|X)\big). $$$ It's intuitive that this definition involves the mutual information $$I$$ (see the first post for the definition and explanation), since we care about how much information $$X$$ transfers to $$Y$$, and how much $$Y$$ tells us about $$X$$. What might be less obvious is why we take the maximum over possible input probability distributions $$p_x$$. This is because the mutual information $$I(X;Y)$$ depends on the probability distributions of $$X$$ and $$Y$$. We can only control what we send - $$X$$ - so we want to adjust that to maximise the mutual information. Intuitively, if you're typing on a keyboard with all keys working normally except the "i" key results in a random character being inserted, shifting your typing away from using the "i" key is good for information transfer. Better to wr1te l1ke th1s than to not be able to reliably transfer information.

However, the only real way to understand why this definition makes sense is to look at the noisy channel coding theorem. This theorem tells us, among other things, that for any rate (measured in bits per symbol) smaller than the capacity $$C$$, for a large enough code length we can get a probability of error as small as we like.

With noisy channels, we often work with block codes. The idea is that you encode some shorter sequence of bits as a longer sequence of bits, and if you've designed this well, it adds redundancy. An $$(n,k)$$ block code is one that replaces chunks of $$k$$ bits with chunks of $$n$$ bits.

Hamming coding

Before we look at the noisy channel theorem, here's a simple code that is robust to error: transmit every bit 3 times. Instead of sending 010, send 000111000. If the receiver receives 010111000, they can tell that bit 2 probably had an error, and should be a zero. The problem is that you triple your message length.
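As a minimal sketch of the triple-repetition idea (the function names here are made up for illustration, and the decoder simply takes a majority vote within each group of three):

```python
def encode_repetition(bits, r=3):
    """Repeat every bit r times, e.g. '010' -> '000111000'."""
    return "".join(b * r for b in bits)

def decode_repetition(received, r=3):
    """Split into chunks of r bits and take a majority vote in each chunk."""
    chunks = [received[i:i + r] for i in range(0, len(received), r)]
    return "".join("1" if c.count("1") > r // 2 else "0" for c in chunks)

sent = encode_repetition("010")
decoded = decode_repetition("010111000")  # the example above, with bit 2 flipped
print(sent, decoded)
```

A single flip inside any group of three is outvoted by the other two copies, which is why the corrupted message still decodes to 010.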

Hamming codes are a method for achieving the same - the ability to detect and correct single-bit errors, and the ability to detect but not properly correct two-bit errors - while sending a number of excess bits that grows only logarithmically with message length. For long enough messages, this is very efficient; if you're sending over 250 bits, it only costs you a 3% longer message to insure them against single-bit errors.

The catch is that the probability of having only one or fewer errors in a message declines exponentially with message length, so this is less impressive than it might sound at first.

The basic idea of most error correction codes is a parity bit. A parity bit $$b$$ is typically the XOR (exclusive-or) of a bunch of other bits $$b_1, b_2, \ldots$$, written $$b = b_1 + b_2 + \ldots$$ (we use $$+$$ for XOR because doing addition in base-2 while throwing away the carry is the same as taking the XOR). A parity bit over a set of bits $$B = \{b_1, b_2, \ldots\}$$ is 1 if the set of bits contains an odd number of 1s, and otherwise 0 (hence the word "parity").

Consider sending a 3-bit message where the first two bits are data and the third is a parity bit. If the message is 110, we check that, indeed, there's an even number of 1s among the data bits, so it checks out that the parity bit is 0. If the message were 111, we'd know that something had gone wrong (though we wouldn't be able to fix it, since it could have started out with any of 011, 101, or 110 and suffered a one-bit flip - and note that we can never entirely rule out that 000 flipped to 111, though since error probability is generally small in any case we're interested in, this would be extremely unlikely).

The efficiency of Hamming codes comes from the fact that we have parity bits that check other parity bits.

A $$(T, D)$$ Hamming code is one that sends $$T$$ bits in total of which $$D$$ are data bits and the remaining $$T - D$$ are parity bits. There exists a $$(2^m - 1, 2^m - m - 1)$$ Hamming code for positive integer $$m$$. Note that $$m$$ is the number of parity bits.

The default way to construct a Hamming code is that the $$i$$th parity bit is in position $$2^{i-1}$$ (so positions 1, 2, 4, 8, and so on), and is set such that the parity of bits whose position's binary representation has a 1 in the $$i$$th last position is zero.

(Above, you see bits 1 through 15, with parity bits in positions 1, 2, 4, and 8. Underneath each bit, for every parity bit there is a 0 if that bit is not included in the parity set of that parity bit, and otherwise a 1. For example, since b4 is set for bits 8-15, b4 (which itself sits in position 8) is a 1 if there's an odd number of 1s among bits 9-15 and otherwise 0. Note that the columns spell out the numbers 1 through 15 in binary.)

For example, a $$(7,4)$$ Hamming code for the 4 bits of data 0101 would first become $$$ \texttt{ b1 b2 0 b3 1 0 1} $$$ and then we'd set $$b_1 = 0$$ to make there be an even number of 1s across the 1st, 3rd, 5th, and 7th positions, set $$b_2 = 1$$ to do the same over the 2nd, 3rd, 6th, and 7th positions, and then finally set $$b_3 = 0$$ to do the same over the 4th, 5th, 6th, and 7th positions.

To correct errors, we have the following rule: sum up the positions of the parity bits that do not match. For example, if parity bit 3 is set wrong relative to the rest of the message, you flip that bit; everything will be fine after we clear this false alarm. But if parity bit 2 is also set wrong, then you take their positions, 2 (for bit 2) and 4 (for bit 3) and add them to get 6, and flip the sixth bit to correct the error. This makes sense because the sixth bit is the only bit covered by both parity bits 2 and 3, and only parity bits 2 and 3.
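The encoding and correction rules can be sketched in a few lines of Python for the $$(7,4)$$ case. This is only an illustration under the position layout described above (parity bits in positions 1, 2, and 4); the function names are hypothetical.

```python
def hamming74_encode(data):
    """data: list of 4 bits -> list of 7 bits, parity bits in positions 1, 2, 4."""
    code = [0] * 8  # index 0 is unused so that indices match 1-based positions
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # The parity bit at position p covers every position whose binary
        # representation has the bit p set; choose it to make that parity even.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]

def hamming74_correct(received):
    """Return (corrected codeword, flipped position), with position 0 meaning no error."""
    code = [0] + list(received)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2 == 1:
            syndrome += p  # sum up the positions of the parity checks that fail
    if syndrome:
        code[syndrome] ^= 1
    return code[1:], syndrome

codeword = hamming74_encode([0, 1, 0, 1])
corrupted = codeword.copy()
corrupted[5] ^= 1  # flip the sixth bit (list index 5)
fixed, position = hamming74_correct(corrupted)
print(codeword, position, fixed == codeword)
```

For the data 0101 this gives the codeword 0100101, matching the parity bits worked out above, and flipping the sixth bit makes the checks for positions 2 and 4 fail, so the syndrome is 6.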

Though the above scheme is elegant and extensible, it's possible to design other Hamming codes. The length requirements remain - the code is a $$(2^m - 1, 2^m - m - 1)$$ code if we allow $$m$$ parity bits - but we can assign any "domain" over the bits to each parity bit as long as each bit belongs to the domains of a unique set of parity bits.

Noisy channel coding theorem

We can measure any noisy channel code we choose based on two numbers. The first is its probability of error ($$p_e$$ above). The second is its rate: how many bits of information are transferred for each symbol sent. The three parts of the theorem combine to divide that space up into a possible and impossible region:

The first part of the theorem says that the region marked "I" is possible. Now there are points of this region that are more interesting than others. Yes, we can make a code that has a rate of 0 and a very high error rate; just send the same symbol all the time. This is point (a), and we don't care about it.

What's more interesting, and perhaps not even intuitively obvious at all, is that we can get to a point (b): an arbitrarily low error rate, despite the fact that we're sending information. The maximum information rate we can achieve while keeping the error probability very low turns out to be the capacity, $$C = \max_{p_X} I(X;Y)$$.

The second part of the theorem tells us what error rate we can achieve if we dare try for a rate that is greater than the capacity: we can make codes that achieve point (c) on the graph.

Finally, the third part of the theorem proves that we can't get to points like (x), that have an error rate that is too low given how much over the channel capacity their rate is.

We started the proof of the source coding theorem by considering a simple construction (the $$\delta$$-sufficient subset) first for a single character and then extending it to blocks. We're going to do something similar now.

Noisy typewriters

A noisy typewriter over the alphabet $$\{0, \ldots, n-1\}$$ is a device where if you press the key for $$i$$, it outputs one of the following with equal probability:

$$i - 1 \mod n$$
$$i \mod n$$
$$i + 1 \mod n$$

With a 6-symbol alphabet, we can illustrate its transition probability matrix as a heatmap:

The colour scale is blue (low) to yellow (high). The reading order is meant to be that each column represents the probability distribution of output symbols given an input symbol.

First, can we transmit information without error at all? Yes: choose a code where you only send the symbols corresponding to the second and fifth columns. Based on the heatmap, these can map to symbols number 1-3 and 4-6 respectively; there is no possibility of confusion. The cost is that instead of being able to send one of six symbols, or $$\log 6$$ bits of information per symbol, we can now only send one of two, or $$\log 2 = 1$$ bit of information per symbol.

The capacity is $$\max_{p_X} \big( H(Y) - H(Y|X) \big)$$. Now if $$p_X$$ is the distribution we considered above - assigning half the probability to 2 and half to 5 - then by the transition matrix we see that $$Y$$ will be uniformly distributed, so $$H(Y)$$ is $$\log 6$$. $$H(Y|X)$$ is $$\log 3$$ in our example code, because we see that if we always send either symbol 2 or 5, then in both cases $$Y$$ is restricted to a set of 3 equally likely values. With some more work you can show that this is in fact an optimal choice of $$p_X$$. The capacity turns out to be $$\log 6 - \log 3 = \log 2$$ bits. The error probability is zero. We see that we can indeed transfer information without error even if we have a noisy channel.
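To check these numbers, here is a small sketch that builds the 6-symbol noisy typewriter's transition matrix and evaluates $$H(Y) - H(Y|X)$$ for the two-symbol input distribution. It assumes NumPy, uses base-2 logarithms, and indexes symbols from 0 (so "second and fifth" are indices 1 and 4); `mutual_information` is a helper written for this example, not a library function.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_x, Q):
    """I(X;Y) = H(Y) - H(Y|X), where Q[y, x] = P(Y = y | X = x)."""
    p_x = np.asarray(p_x, dtype=float)
    p_y = Q @ p_x
    h_y_given_x = sum(p_x[x] * entropy(Q[:, x]) for x in range(len(p_x)))
    return entropy(p_y) - h_y_given_x

n = 6
Q = np.zeros((n, n))
for x in range(n):
    for shift in (-1, 0, 1):
        Q[(x + shift) % n, x] = 1 / 3  # each column is the output distribution for one input

p_code = np.array([0, 0.5, 0, 0, 0.5, 0])  # half on the second symbol, half on the fifth
mi_code = mutual_information(p_code, Q)
mi_uniform = mutual_information(np.full(n, 1 / n), Q)
print(mi_code, mi_uniform)  # both come out at about 1 bit
```

The uniform input distribution reaches the same mutual information here, but only the two-symbol code lets the receiver decode every single symbol without error.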

But hold on, the noisy typewriter has a very specific type of error: there's an absolute certainty that if we transmit a 2 we can't get symbols 4-6 out, and so on. Intuitively, here we can partition the space of channel outputs in such a way that there is no overlap in the sets of which channel input each channel output could have come from. It seems like with a messier transition matrix that doesn't have this nice property, this just isn't true. For example, what if we have a binary symmetric channel, with a transition matrix like this:

Unfortunately the blue = lowest, yellow = highest colour scheme is not very informative; the transition matrix looks like this, where $$p_e$$ is the probability of error: $$$ \begin{bmatrix} 1 - p_e & p_e \\ p_e & 1 - p_e \end{bmatrix} $$$ Here nothing is certain: a 0 can become a 1, and a 1 can become a 0.

However, this is what we get if we use this transition probability matrix on every symbol in a string of length 4, with the strings going in the order 0000, 0001, 0010, 0011, ..., 1111 along both the top and left side of the matrix:

For example, the second column shows the probabilities (blue = low, yellow = high) for what you get in the output channel if 0001 is sent as a message. The highest value is for the second entry, 0001, because we have $$p_e < 0.5$$ so $$p_e < 1 - p_e$$ so the single likeliest outcome is for no changes, which has probability $$(1-p_e)^4$$. The second highest values are for the first (0000), fourth (0011), sixth (0101), and tenth (1001) entries, since these all involve one flip and have probability $$p_e (1-p_e)^3$$ individually and probability $${4 \choose 1} p_e (1-p_e)^3 = 4 p_e (1 - p_e)^3$$ together.
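One way to build this length-4 matrix yourself is to take the Kronecker product of four copies of the single-bit matrix, since the channel acts on each bit independently. The sketch below assumes NumPy and an illustrative $$p_e = 0.1$$ (the value is my choice, not something fixed by the diagrams).

```python
import numpy as np
from functools import reduce

p_e = 0.1  # illustrative error probability
bsc = np.array([[1 - p_e, p_e],
                [p_e, 1 - p_e]])

# block[y, x] = P(receive string y | send string x), with strings ordered
# 0000, 0001, ..., 1111 (the first bit is the most significant).
block = reduce(np.kron, [bsc] * 4)

column = block[:, 0b0001]  # output distribution when 0001 is sent
likeliest = format(int(np.argmax(column)), "04b")
single_flips = [format(y, "04b") for y in range(16)
                if np.isclose(column[y], p_e * (1 - p_e) ** 3)]
print(likeliest, single_flips)
```

The likeliest output is 0001 itself, and the entries with probability $$p_e (1-p_e)^3$$ are exactly the four strings that differ from 0001 in one position.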

If we dial up the number, the pattern becomes clearer; here's the equivalent diagram for messages of length 8:

The Return of the Typical Set

There are two key points.

The first is that more and more of the probability is concentrated along the diagonal (plus some other diagonals further from the main diagonal). We can technically have any transformation, even 11111111 to 00000000 when we send a message through the channel, but most of these transformations are extremely unlikely. The transition matrix starts looking more and more like the noisy typewriter, where for each message only one subset of received messages has non-tiny likelihood.

The second key point is that it is time for ... the return of the typical set. Recall from the second post in this series that the $$\epsilon$$-typical set of length-$$n$$ strings over an alphabet $$A$$ is defined as $$$ T_{n\epsilon} = \left\{x^n \in A^n \text{ such that } \left|-\frac{1}{n} \log p(x^n) - H(X)\right| \le \epsilon\right\}. $$$ $$-\frac{1}{n} \log p(x^n)$$ is equal to $$-\frac{1}{n} \sum_{i=1}^n \log p(x_i)$$ by independence, and this in turn is an estimator for $$\mathbb{E}[-\log p(X)] = H(X)$$. You can therefore read $$-\frac{1}{n}\log p(x^n)$$ as the "empirical entropy"; it's what we'd guess the (per-symbol) entropy of $$X$$ to be if we did a slightly weird thing of estimating the entropy while knowing the probability model but only using it to determine the information content $$-\log p$$, and estimating the $$p_i$$s in $$-\sum_i p_i \log p_i$$ instead by only using how often they occur in $$x^n$$ (rather than the probability model).

Now the big result about typical sets was that as $$n \to \infty$$, the probability $$P(x^n \sim X^n \in T_{n \epsilon}) \to 1$$, and therefore for large $$n$$, most of the probability mass is concentrated in the approximately $$2^{nH(X)}$$ strings of probability approximately $$2^{-nH(X)}$$ that lie in the typical set.
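As a numerical sanity check of this concentration, here is a sketch for a source of i.i.d. bits that are 1 with probability 0.2, with $$\epsilon = 0.1$$ (both values are arbitrary choices for illustration). Because every string with the same number of 1s has the same probability, we can sum over the count of 1s instead of enumerating all $$2^n$$ strings.

```python
import math

def typical_set_mass(n, p, eps):
    """Return (probability mass, number of strings) of the eps-typical set
    for n i.i.d. Bernoulli(p) bits."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # H(X) in bits
    mass, size = 0.0, 0
    for k in range(n + 1):
        # Every string with k ones has probability p^k (1-p)^(n-k).
        log2_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log2_prob / n - h) <= eps:
            count = math.comb(n, k)
            size += count
            # Work in the log domain to avoid overflow and underflow for large n.
            mass += 2.0 ** (math.log2(count) + log2_prob)
    return mass, size

for n in (10, 100, 1000):
    mass, size = typical_set_mass(n, p=0.2, eps=0.1)
    print(n, round(mass, 4), round(math.log2(size) / n, 4))
```

The first printed column after $$n$$ is the probability of landing in the typical set, which climbs towards 1, and the second is $$\frac{1}{n} \log |T_{n\epsilon}|$$, which stays below $$H(X) + \epsilon$$.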

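We can define a similar notion of jointly $$\epsilon$$-typical sets, denoted $$J_{n\epsilon}$$ and defined by analogy with $$T_{n\epsilon}$$ as $$$ J_{n\epsilon} = \left\{ (x^n, y^n) \in A^n \times A^n \text{ such that } \left| - \frac{1}{n} \log P(x^n, y^n) - H(X, Y)\right| \le \epsilon \right\}. $$$ (The standard definition additionally requires $$x^n$$ and $$y^n$$ to each be $$\epsilon$$-typical with respect to $$p_X$$ and $$p_Y$$; the bound in the third property below relies on this.) Like typical sets, jointly typical sets give us similar nice properties: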

If $$x^n, y^n$$ are drawn from the joint distribution (e.g. you first draw an $$x^n$$, then apply the transition matrix probabilities to generate a $$y^n$$ based on it), then the probability that $$(x^n, y^n) \in J_{n \epsilon}$$ goes to 1 as $$n \to \infty$$. The proof is almost the same as the corresponding proof for typical sets (hint: law of large numbers).
The number $$|J_{n\epsilon}|$$ of jointly typical sequence pairs $$(x^n, y^n)$$ is about $$2^{nH(X,Y)}$$, and specifically is upper-bounded by $$2^{n(H(X,Y) + \epsilon)}$$. The proof is the same as for the typical set case.
If $$x^n$$ and $$y^n$$ are _independently drawn_ from the distributions $$p_X$$ and $$p_Y$$, the probability that they are jointly typical is about $$2^{-nI(X;Y)}$$. The specific upper bound is $$2^{-n(I(X;Y) - 3 \epsilon)}$$, and can be shown straightforwardly (remembering some of the identities in post 1) from $$$ P((x^n, y^n) \in J_{n \epsilon}) = \sum_{(x^n, y^n) \in J_{n\epsilon}} p(x^n) p(y^n)$$$ $$$\le |J_{n\epsilon}| 2^{-n(H(X) - \epsilon)} 2^{-n(H(Y) - \epsilon)}$$$ $$$ \le 2^{n(H(X,Y) + \epsilon)} 2^{-n(H(X) - \epsilon)} 2^{-n(H(Y) - \epsilon)}$$$ $$$= 2^{n(H(X,Y) - H(X) - H(Y) + 3 \epsilon)}$$$ $$$= 2^{-n(I(X;Y) - 3 \epsilon)} $$$

Armed with this definition, we can now interpret what was happening in the diagrams above: as we increase the length of the messages, more and more of the probability mass is concentrated in jointly typical sequences, by the first property above. The third property tells us that if we ignore the dependence between $$x^n$$ and $$y^n$$ - picking a square roughly at random in the diagrams above - we are, however, extremely unlikely to pick a square corresponding to a jointly typical pair.

Here is the noisy typewriter for 6 symbols, for length-4 messages coming in and out of the channel:

(As a reminder of the interpretation: each column represents the probability distribution, shaded blue to yellow, for one input message, and the $$6^4 = 1296$$ possible messages we have with this message length (4) and alphabet size (6) are ranked in alphabetical order along both the top and left side of the grid.)

The highest probability is still yellow, but you can barely see it. Most of the probability mass is in the medium-probability sequences (our jointly typical set), forming a small subset of the possible channel outputs for each input.

In the limit, therefore, the transition probability matrix for a block code of an arbitrary symbol transition probability matrix looks a lot like the noisy typewriter. This suggests a decoding method: if we see $$y^n$$, we decode it as $$x^n$$ if $$(x^n, y^n)$$ are in the jointly typical set, and there is no other $${x'}^n$$ such that $$({x'}^n, y^n)$$ are also jointly typical. As with the noisy typewriter example, we have to discard a lot of the $$x^n$$, so that the set of $$x^n$$ that a given $$y^n$$ could've come from hopefully contains only a single element, so we match the second condition in the decoding rule.

Theorem outline

Now we will state the exact form of the noisy channel coding theorem. It has three parts:

A discrete memoryless channel has a non-negative capacity $$C$$ such that for any $$\varepsilon > 0$$ and $$R < C$$, for large enough $$n$$ there's a block code of length $$n$$ and rate $$\geq R$$ and a decoder such that error probability is $$< \varepsilon$$.

We will see that this follows from the points about jointly typical sets and the decoding scheme based on them that we discussed above. The only thing really missing is an argument that the error rate of jointly typical coding can be made arbitrarily low as long as $$R < C$$. We will see that Shannon used perhaps the most insane trick in all of 20th century applied maths to side-step having to actually think of a specific code to prove this.
If error probability per bit $$p_e$$ is acceptable, rates up to $$$ R(p_e) = \frac{C}{1 - H_2(p_e)} $$$ are possible. We will prove this by
For any $$p_e$$, rates $$> R(p_e)$$ are not possible.

As we saw earlier, these three parts together divide up the space of possible rate-and-error combinations for codes into three parts:

Proof of Part I: turning noisy channels noiseless

We want to prove that we can get an arbitrarily low error rate if the rate (bits of information per symbol) is smaller than the channel capacity, which we've defined as $$C = \max_{p_X} I(X;Y)$$.

We could do this by thinking up a code and then calculating the probability of error per length-$$n$$ block for it. This is hard though.

Here's what Shannon did instead: he started by considering a random block code, and then proved stuff about its average error.

What do we mean by a "random block code"? Recall that an $$(n,k)$$ block code is one that encodes length-$$k$$ messages as length-$$n$$ messages. Since the rate $$r = \frac{k}{n}$$, we can talk about $$(n, nr)$$ block codes.

What the encoder is doing is mapping length-$$k$$ strings to length-$$n$$ strings. In the general case, it has some lookup table, with $$2^k = 2^{nr}$$ entries, each of length $$n$$. A "random code" means that we generate the entries of this lookup table from the distribution $$P(x^n) = \prod_{i=1}^n p(x_i)$$. We will refer to the encoder as $$E$$.

(In the above diagram, the dots in the column represent probabilities of different outputs given the $$x^n$$ that is taken as input. Different values of $$w^k$$ would be mapped by the encoder to different columns $$x^n$$ in the square.)
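As a minimal sketch of what generating such a lookup table might look like, here is a version that assumes a binary alphabet and draws each codeword symbol independently. The function name `random_block_code` and the parameter `p_one` (the probability of a 1 in each position, standing in for $$p(x_i)$$) are made up for illustration.

```python
import itertools
import random

def random_block_code(n, k, p_one=0.5, seed=0):
    """Build a random (n, k) binary block code as a lookup table.

    Each of the 2**k messages gets a length-n codeword whose symbols are
    drawn independently, with P(symbol = 1) = p_one.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    table = {}
    for message in itertools.product((0, 1), repeat=k):
        table[message] = tuple(int(rng.random() < p_one) for _ in range(n))
    return table

code = random_block_code(n=8, k=2)
for message, codeword in code.items():
    print(message, "->", codeword)
```

Nothing here tries to make the codewords good or even distinct; that is the whole point of the trick.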

Richard Hamming (yes, the Hamming codes person) mentions this trick in his famous talk "You and Your Research":

Courage is one of the things that Shannon had supremely. You have only to think of his major theorem. He wants to create a method of coding, but he doesn't know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, "What would the average random code do?" He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. Who but a man of infinite courage could have dared to think those thoughts?

Perhaps it doesn't quite take infinite courage, but it is definitely one hell of a simplifying trick - and the remarkable thing is that it works.

Here's how: let the average probability of error in decoding one of our blocks be $$\bar{p_e}$$. If we have a message $$w^k$$, the steps that happen are:

We use the (randomly-constructed) encoder $$E$$ to map it to an $$x^{n}$$ using $$x^n = E(w^k)$$. Note that the set of values that $$E(w^k)$$ can take, $$\text{Range}(E)$$, is a subset of the set of values of all possible $$x^n$$.
$$x^n$$ passes through the channel to become a $$y^n$$, according to the probabilities in a block transition probability matrix like the ones pictured above.
We guess that $$y^n$$ came from the $$x'^n \in \text{Range}(E)$$ such that the pair $$(x'^n, y^n)$$ is in the jointly typical set $$J_{n\epsilon}$$.
1. If there isn't such an $$x'^n$$, we fail. In the diagram below, this happens if we get $$y_3$$, since $$\text{Range}(E) = \{x_1, x_2, x_3, x_4\}$$ does not contain anything jointly-typical with $$y_3$$.
2. If there is at least one wrong $$x'^n$$, we fail. In the diagram below, this happens if we get $$y_2$$, since both $$x_2$$ and $$x_3$$ are codewords the encoder might use that are jointly typical with $$y_2$$, so we don't know which one was originally transmitted over the channel.
We use the decoder, which is simply the inverse of the encoder, to map to our guess $$\bar{w}^k$$ of what the original string was. Since $$x'^n \in \text{Range}(E)$$, the inverse of the encoder, $$E^{-1}$$, must be defined at $$x'^n$$. (Note that there is a chance, but a negligibly small one as $$n \to \infty$$, that in our encoder generation process we created the same codeword for two different strings, in which case the decoder can't be deterministic. We can say either: we don't care about this, because the probability of a collision goes to zero, or we can tweak the generation scheme to regenerate if there's a repeat; $$n \ge k$$ so we can always construct a repeat-free encoder.)

Therefore the two sources of error that we care about are:

On step 3, we get a $$y^n$$ that is not jointly typical with the original $$x^n$$. Since $$P((x^n, y^n) \in J_{n\epsilon}) \geq 1 - \delta$$ for some $$\delta$$ that we can make arbitrarily small by increasing $$n$$, we can upper-bound this probability with $$\delta$$.
On step 3, we get a $$y^n$$ that is jointly typical with at least one wrong $$x'^n$$. We saw above that one of the properties of the jointly typical set is that if $$x^n$$ and $$y^n$$ are selected independently rather than together, the probability that they are jointly typical is only $$2^{-n(I(X;Y) - 3 \epsilon)}$$. Therefore we can upper-bound this error probability by summing the probability of "accidental" joint-typicality over the $$2^k - 1$$ possible messages that are not the original message $$w^k$$. This sum is $$$ \sum_{w'^k \ne w^k} 2^{-n(I(X;Y) - 3 \epsilon)}$$$ $$$\le (2^{k} - 1) 2^{-n(I(X;Y) - 3 \epsilon)}$$$ $$$\le 2^{nr}2^{- n (I(X;Y) - 3 \epsilon)}$$$ $$$= 2^{nr - n(I(X;Y) - 3 \epsilon)} $$$
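To get a feel for how the bound on the second source of error behaves, here is a small sketch that just evaluates $$2^{nr - n(I(X;Y) - 3 \epsilon)}$$ for a few block lengths. The numbers ($$I(X;Y) = 0.5$$ bits, $$\epsilon = 0.01$$) and the function name are picked purely for illustration.

```python
def accidental_typicality_bound(n, r, mutual_info, eps):
    """Upper bound 2^(nr - n(I(X;Y) - 3 eps)) on the second error source."""
    return 2.0 ** (n * (r - (mutual_info - 3 * eps)))

# Illustrative numbers only: I(X;Y) = 0.5 bits, eps = 0.01.
for n in (10, 100, 1000):
    below = accidental_typicality_bound(n, r=0.4, mutual_info=0.5, eps=0.01)
    above = accidental_typicality_bound(n, r=0.6, mutual_info=0.5, eps=0.01)
    print(n, below, above)
```

With a rate below $$I(X;Y) - 3\epsilon$$ the bound shrinks exponentially in $$n$$; with a rate above it, the bound blows up past 1 and tells us nothing.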

We have the probabilities of two events, so the probability of at least one of them happening is smaller than or equal to their sum: $$$ \bar{p}_e \le \delta + 2^{nr - n(I(X;Y) - 3 \epsilon)} $$$ We know we can make $$\delta$$ however small we want. We can see that if $$r < I(X;Y) - 3 \epsilon$$, then the exponent is negative and increasing $$n$$ can also make the second term negligible. This is almost Part I of the theorem, which was:

A discrete memoryless channel has a non-negative capacity $$C=\max_{p_X} I(X;Y)$$ such that for any $$\varepsilon > 0$$ and $$R < C$$, for large enough $$n$$ there's a block code of length $$n$$ and rate $$\geq R$$ and a decoder such that error probability is $$< \varepsilon$$.

First, to put a bound involving only one constant on $$\bar{p}_e$$, let's arbitrarily say that we increase $$n$$ until $$2^{nr - n(I(X;Y) - 3 \epsilon)} \le \delta$$. Then we have $$$ \bar{p}_e \le 2 \delta $$$ Second, we don't care about average error probability over codes, we care about the existence of a single code that's good. We can realise that if the average error probability $$\le 2 \delta$$, there must exist at least one code, call it $$C^*$$, with average error probability $$\le 2 \delta$$.

Third, we don't care about average error probability over messages, but maximal error probability, so that we can get the strict $$< \varepsilon$$ error probability in the theorem. This is trickier to bound, since $$C^*$$ might somehow have very low error probability with most messages, but some insane error probability for one particular message.

However, here again Shannon jumps to the rescue with a bold trick: throw out half the codewords, specifically the ones with highest error probability. Since the average error probability is $$\le 2 \delta$$, every codeword in the best half of codewords must have error probability $$\le 4 \delta$$, because otherwise every codeword in the worse half would also have error probability above $$4 \delta$$, and that half would contribute more than $$\frac{1}{2} \times 4 \delta = 2 \delta$$ to the average error on its own.

What about the effect on our rate of throwing out half the codewords? Previously we had $$2^k = 2^{nr}$$ codewords; after throwing out half we have $$2^{nr - 1}$$, so our rate has gone from $$\frac{k}{n} = r$$ to $$\frac{nr - 1}{n} = r - \frac{1}{n}$$, a negligible decrease if $$n$$ is large.
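A toy illustration of the throw-out-half trick, with made-up per-codeword error probabilities (the list below is not derived from any real code). Whatever the numbers are, the worst codeword in the kept half can't have more than twice the original average error.

```python
def expurgate(error_probs):
    """Throw out the worse half of the codewords, keeping the lowest-error half."""
    return sorted(error_probs)[: len(error_probs) // 2]

# Made-up per-codeword error probabilities for a code with 2^3 codewords;
# one codeword is terrible, the rest are good.
error_probs = [0.0, 0.001, 0.002, 0.0, 0.003, 0.9, 0.001, 0.0]
average = sum(error_probs) / len(error_probs)
kept = expurgate(error_probs)

print("average error:", average)
print("worst error before:", max(error_probs))
print("worst error after:", max(kept))
```

Here the single terrible codeword dominates the average, and it is exactly the kind of codeword that gets discarded.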

What we now have is this: as $$n \to \infty$$, we can get any rate $$R < I(X;Y) - 3 \epsilon$$ with maximal error probability $$\le 4 \delta$$, and both $$\delta$$ and $$\epsilon$$ can be decreased arbitrarily close to zero by increasing $$n$$. Since we can set the distribution of $$X$$ to whatever we like (this is why it matters that we construct our random encoder by sampling from $$X$$ repeatedly), we can make $$I(X;Y) = \underset{p_X}{\max} I(X;Y)$$.

This is the first and most involved part of the theorem. It is also remarkably lazy: at no point do we have to go and construct an actual code, we just sit in our armchairs and philosophise about the average error probability of random codes.
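If you don't fully trust armchairs, you can also just try it. Below is a rough simulation sketch under some simplifying assumptions that are mine rather than part of the proof: a binary symmetric channel with flip probability `f`, a uniform input distribution (so every $$x^n$$ and $$y^n$$ is individually typical and only the joint condition matters), and a generous `eps` because the block length is tiny. All function names and parameter values are made up for illustration.

```python
import math
import random

def h2(p):
    """Binary entropy in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def jointly_typical(x, y, f, eps):
    """Joint typicality test for a BSC with uniform inputs.

    With uniform inputs p(x, y) = 2^-n * f^flips * (1 - f)^(n - flips), and
    the pair is jointly typical if -1/n log2 p(x, y) is within eps of
    H(X, Y) = 1 + H2(f).
    """
    n = len(x)
    flips = sum(a != b for a, b in zip(x, y))
    log_p = -n + flips * math.log2(f) + (n - flips) * math.log2(1 - f)
    return abs(-log_p / n - (1 + h2(f))) <= eps

def simulate(n=60, k=4, f=0.05, eps=0.5, trials=200, seed=1):
    """Estimate the block error rate of one random code with typical-set decoding."""
    rng = random.Random(seed)
    # Random code: one uniformly random length-n codeword per message.
    codebook = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(2 ** k)]
    errors = 0
    for _ in range(trials):
        w = rng.randrange(2 ** k)                         # message to send
        x = codebook[w]                                   # encode
        y = tuple(bit ^ (rng.random() < f) for bit in x)  # noisy channel
        # Decode: succeed only if the sent codeword is the unique typical match.
        candidates = [i for i, c in enumerate(codebook)
                      if jointly_typical(c, y, f, eps)]
        if candidates != [w]:
            errors += 1
    return errors / trials

print("estimated block error rate:", simulate())
```

The rate here ($$4/60$$) is far below capacity, so this is an easy case; the interesting part is that a completely random codebook works at all.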

Proof of Part II: achievable rates if you accept non-zero error

Here's a simple code that achieves a rate higher than the capacity in a noiseless binary channel:

The sender maps each length-$$nr$$ block to a block of length $$n$$ by cutting off the last $$nr - n$$ symbols.
The receiver reads $$n$$ symbols with error probability $$0$$, and then guesses the remaining $$nr - n$$ with bit error probability $$\frac{1}{2}$$ for each symbol. (Note: we're concerned with bit error here, unlike block error in the previous proof.)

An intuition you should have is that if the probability of anything is concentrated in a small set of outcomes, you're not maximising the entropy (remember: _entropy is maximised by a uniform distribution_) and therefore also not maximising the information transfer. The above scheme concentrates high probability of error to a small number of bits, while transmitting some of them with zero error - we should be able to do better.

It's not obvious how we'd start doing this. We're going to take some wisdom from the old proverb about hammers and nails, and note that the main hammer we've developed so far is a proof that we can send through the channel at a negligible error rate by increasing the size of the message. Let's turn this hammer upside down: we're going to use the decoding process to encode and the encoding process to decode. Specifically, to map from length-$$n$$ strings to the smaller length-$$k$$ strings, we use the decoding process from before:

Given an $$x^n$$ to encode, we find the $$x'^n \in \text{Range}(E)$$ such that the pair $$(x^n, x'^n)$$ is in the jointly typical set $$J_{n\epsilon}$$. (Jointly typical with respect to what joint distribution? That of length-$$n$$ strings before and after being passed through the channel (here we're assuming that the input and output alphabets are equivalent). However, note that nothing actually has to pass through a channel for us to use this.)
We use the inverse of the encoder, $$E^{-1}$$, to map $$x'^n$$ to a length-$$k$$ string $$w^k$$ ($$x'^n \in \text{Range}(E)$$ so this is defined).

To decode, we use the encoder $$E$$, to get $$\bar{x}^n = E(w^k)$$.

We'll find the per-bit error rate, not the per-block error rate, so we want to know how many bits are changed on average under this scheme. We're still working with the assumption of a noiseless channel, so we don't need to worry about the noise in the channel, only the error coming from our lossy compression (which is based on a joint probability distribution coming from assuming some channel, however).

Assume our channel has error probability $$p$$ when transmitting a symbol. Fix an $$x^n$$ and consider pairs $$(x^n, y^n)$$ in the jointly typical set. Most of the $$y^n$$ will differ from $$x^n$$ in approximately $$np$$ bits. Intuitively, this comes from the fact that for a binomial distribution, most of the probability mass is concentrated around the mean at $$np$$, and therefore the typical set contains mostly sequences with a number of errors close to this mean. Therefore, on average we should expect $$np$$ errors between the $$x^n$$ we put into the encoder and the $$x'^n$$ that it spits out. Since we assume no noise, the $$w^k = E^{-1}(x'^n)$$ we send through the channel comes back as the same, and we can do $$E(w^k) = E(E^{-1}(x'^n)) = x'^n$$ to perfectly recover $$x'^n$$. Therefore the only error is the $$np$$ wrong bits, and therefore our per-bit error rate is $$p$$.

Assume that, used the right way around, we have a code that can achieve a rate of $$R' = k/n$$. This rate is $$$ R' = \max_{p_X} I(X;Y) = \max_{p_X} \big[ H(Y) - H(Y|X) \big]$$$ $$$= 1 - H_2(p) $$$ assuming a binary code and a binary symmetric channel, and where $$H_2(p)$$ is the entropy of a two-outcome random variable with probability $$p$$ of the first outcome, or $$$ H_2(p) = - p \log p - (1 - p) \log (1 - p). $$$ Now since we're using it backward, we map from $$n$$ to $$k$$ bits rather than $$k$$ to $$n$$ bits, and this code has rate $$$ \frac{1}{R'} = \frac{n}{k} = \frac{1}{1 - H_2(p)} $$$ What we can now do is make a code that works like the following:

Take a length-$$n$$ block of input.
Use the compressor (i.e. the typical set decoder) to map it to a smaller length-$$k$$ block.
Use some noiseless channel code with capacity $$C$$.
Use the decompressor (i.e. the typical set encoder) to map the recovered length-$$k$$ blocks back to length-$$n$$ blocks.

In step 4, we will on average see that the recovered input differs in $$np$$ places, for a bit error probability of $$p$$. And what is our rate? We assumed the standard noiseless channel code in the middle that transmits our compressed input had the maximum rate $$C$$. However, it is transmitting strings that have already been compressed by a factor of $$\frac{k}{n}$$, so the true rate is $$$ R = \frac{C}{1 - H_2(p)} = \frac{C}{1 + p \log p + (1 - p) \log (1 - p)} $$$ This gives us the second part of the theorem: given a certain rate $$R$$, we can transmit at any bit error probability $$p$$ high enough that $$R \le C / (1 - H_2(p))$$.

(Note that effectively $$0 \le p < 0.5$$, because if $$p > 0.5$$ we can just flip the labels on the channel and change $$p$$ to $$1 - p$$, and if $$p = 0.5$$ we're transmitting no information.)
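To see what this trade-off looks like in numbers, here is a short sketch that evaluates $$R(p) = C / (1 - H_2(p))$$. As an assumed example channel it uses a binary symmetric channel with flip probability 0.1, whose capacity is $$1 - H_2(0.1)$$ by the formula above; the function names are just for illustration.

```python
import math

def h2(p):
    """Binary entropy in bits; h2(0) = h2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_rate(capacity, p):
    """R(p) = C / (1 - H_2(p)), for a tolerated bit error probability 0 <= p < 0.5."""
    return capacity / (1 - h2(p))

# Assumed example: binary symmetric channel with flip probability 0.1.
capacity = 1 - h2(0.1)
rates = [max_rate(capacity, p) for p in (0.0, 0.01, 0.05, 0.1, 0.2)]
for p, rate in zip((0.0, 0.01, 0.05, 0.1, 0.2), rates):
    print(p, rate)
```

At $$p = 0$$ we get back the capacity, and the achievable rate climbs as we tolerate more bit errors.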

Proof of Part III: unachievable rates

Note that the pipeline is a Markov chain (i.e. each step depends only on the previous step):

Therefore, the data processing inequality applies (for more on that, search for "data" here). With one application we get $$$ I(w^k; \bar{w}^k) \le I(w^k; y^n) $$$ and with another $$$ I(w^k; y^n) \le I(x^n; y^n) $$$ which combine to give $$$ I(w^k; \bar{w}^k) \le I(x^n; y^n). $$$ By the definition of channel capacity, $$I(x^n; y^n) \le nC$$ (remember that the definition is about mutual information between $$X$$ and $$Y$$, so _per-symbol_ information), and so given the above we also have $$I(w^k; \bar{w}^k) \le nC$$.

With a rate $$R$$, we send over $$nR$$ bits of information, but if the per-bit error probability is $$p$$, we can only receive $$nR (1 - H_2(p))$$ of those bits. Therefore $$I(w^k; \bar{w}^k) = nR(1 - H_2(p))$$ at most, and so $$$ nR(1-H_2(p)) > nC $$$ is a contradiction, which implies that $$$ R > \frac{C}{1 - H_2(p)} $$$ is a contradiction.

Continuous entropy and Gaussian channels

And now, for something completely different.

We've so far talked only about the entropy of discrete random variables. However, there is a very common case of channel coding that deals with continuous random variables: sending a continuous signal, like sound.

So: forget our old boring discrete random variable $$X$$, and bring in a brand-new continuous random variable that we will call ... $$X$$. How much information do you get from observing $$X$$ land on a particular value $$x$$? You get infinite information, because $$x$$ is a real number with an endless sequence of digits; alternatively, the Shannon information is $$- \log p(x)$$, and the probability of $$X=x$$ is infinitesimally small for a continuous random variable, so the Shannon information is $$-\log 0$$ which is infinite. Umm.

Consider calculating the entropy for a continuous variable, which we will denote $$h(X)$$ to distinguish it from the discrete case, and define in the obvious way by replacing sums with integrals: $$$ h(X) = -\int_{-\infty}^\infty f(x) \log f(x) d x $$$ where $$f$$ is the probability density function. If we actually evaluate this integral, we would get a constant term that goes to infinity.

As principled mathematicians, we might be concerned about this. But we can mostly ignore it, especially as the main thing we want is $$I(X;Y)$$, and $$$ I(X;Y) = h(Y) - h(Y|X) = -\int f_Y(y) \log f_Y(y) \mathrm{d}y + \iint f_{X,Y}(x,y) \log f_{Y|X=x}(y) \mathrm{d}x \mathrm{d}y $$$

where mumble mumble the infinities cancel out mumble opposite signs mumble.

Signals

With discrete random variables, we generally had some fairly obvious set of values that they could take. With continuous random variables, we usually deal with an unrestricted range - a radio signal could technically be however low or high. However, step down from abstract maths land, and you realise reality isn't as hopeless as it seems at first. Emitting a radio wave, or making noise, takes some input of energy, and the source has only so much power.

For waves (like radio waves and sound waves), power is proportional to the square of the amplitude of a wave. The variance $$\mathbb{V}(X) = \mathbb{E}[(X-\mathbb{E}[X])^2] = \int f(x) (x - \mathbb{E}[X])^2 \mathrm{d}x$$ of a continuous random variable $$X$$ with probability density function $$f$$ is just the expected squared difference between the value and its mean. Both of these quantities are squaring a difference. It turns out that the power of our source and the variance of the random variable that represents it are proportional.

Our model of a continuous noisy channel is one where there's an input signal $$X$$, a source of noise $$N$$, and an output signal $$Y = X + N$$. As usual, we want to maximise the channel capacity $$C = \max_{p_X} I(X;Y)$$, which is done by maximising $$$ I(X;Y) = h(Y) - h(Y|X). $$$ Because noise is generally the sum of a bunch of small contributing factors in each direction, the noise follows a normal distribution with variance $$\sigma_N^2$$. Because the only source of uncertainty is $$N$$ and this has the same distribution regardless of $$X$$, $$h(Y|X)$$ depends only on $$N$$ and not at all on $$X$$, so the only thing we can affect is $$h(Y)$$.

Therefore, the question of how you maximise channel capacity turns into a question of how to maximise $$h(Y)$$ given that $$Y = X + N$$ with $$N \sim \mathcal{N}(0, \sigma_N^2)$$. If we were working without any power/variance constraints, we'd already know the answer: just make $$X$$ such that $$Y$$ is a uniform distribution (which in this case would mean making $$Y$$ a uniform distribution over all real numbers, something that's clearly a bit wacky). However, we have a constraint on power and therefore the variance of $$X$$.

If we were to do some algebra involving Lagrangian multipliers, we would eventually find that we want the distribution of $$X$$ to be a normal distribution. A key property of normal distributions is that if $$X \sim \mathcal{N}(0, \sigma_X^2)$$ (assume the mean is 0; note you can always shift your scale) and $$N \sim \mathcal{N}(0, \sigma_N^2)$$, then $$X + N \sim \mathcal{N}(0, \sigma_X^2 + \sigma_N^2)$$. Therefore the basic principle behind efficiently transmitting information using a continuous signal is that you want to transform your input to follow a normal distribution.

If you do, what do you get? Start with $$$ I(X;Y) = h(Y) - h(Y|X) $$$ and now use the "standard" integral that $$$ \int f(z) \log f(z) \mathrm{d}z = -\frac{1}{2} \log (2 \pi e \sigma^2) $$$ if $$z$$ is drawn from a distribution $$\mathcal{N}(0, \sigma^2)$$ with density $$f$$, and therefore $$$ \max I(X;Y) = C = \frac{1}{2} \log (2 \pi e (\sigma_X^2 + \sigma_N^2)) - \frac{1}{2} \log (2 \pi e \sigma_N^2) $$$ using the fact that $$h(Y|X) = h(N)$$ since the information content of the noise is all that is unknown about $$Y$$ if we're given $$X$$, and the property of normal distributions mentioned above. We can do some algebra to get the above into the form $$$ C = \frac{1}{2} \log \left(\frac{2 \pi e (\sigma_X^2 + \sigma_N^2)}{2 \pi e \sigma_N^2}\right) = \frac{1}{2} \log \left( 1 + \frac{\sigma_X^2}{\sigma_N^2}\right) $$$ The variance is proportional to the power, so this can also be written in terms of power as $$$ C = \frac{1}{2} \log \left( 1 + \frac{S}{N}\right) $$$ if $$S$$ is the power of the signal and $$N$$ is the power of the noise. The units of capacity for the discrete case were bits per symbol; here they're bits per second. A sanity check is that if $$S = 0$$, we transmit $$\frac{1}{2} \log (1) = 0$$ bits per second, which makes sense: if your signal power is 0, it has no effect, and no one is going to hear you.

An interesting consequence here is that increasing signal power only gives you a logarithmic improvement in how much information you can transmit. If you shout twice as loud, you can detect approximately twice as fine-grained peaks and troughs in the amplitude of your voice. However, this helps surprisingly little.
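A quick numerical sketch of how little shouting helps, using the formula $$C = \frac{1}{2} \log (1 + S/N)$$ with base-2 logarithms. The power values are arbitrary and the function name is just for illustration.

```python
import math

def gaussian_capacity(signal_power, noise_power):
    """C = 1/2 * log2(1 + S/N), in bits."""
    return 0.5 * math.log2(1 + signal_power / noise_power)

noise = 1.0  # arbitrary units
for signal in (0.0, 1.0, 10.0, 100.0, 1000.0):
    print(signal, gaussian_capacity(signal, noise))

# Doubling the signal power never buys more than half a bit.
gain_from_doubling = gaussian_capacity(200.0, noise) - gaussian_capacity(100.0, noise)
print("gain from doubling the power:", gain_from_doubling)
```

Each tenfold increase in power adds roughly the same fixed number of bits, which is what a logarithmic improvement means in practice.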

If you want to communicate at a really high capacity, there are better things you can do than shouting very loudly. You can decompose a signal into frequency components using the Fourier transform. If your signal consists of many different frequency levels, you can effectively transmit a different amplitude on each of them at once. The range of frequencies that your signal can span over is called the bandwidth and is denoted $$W$$. If you can make use of multiple frequencies, the capacity equation changes to $$$ C = \frac{W}{2} \log \left(1 + \frac{S}{N}\right) $$$ Therefore if you want to transmit information, transmitting across a broad range of frequencies is much more effective than shouting loudly. There's a metaphor here somewhere.

Information theory 2: source coding

2022-06-25T18:39:00.005+01:00

6.9k words, including equations (~36min)

In the previous post, we saw the basic information theory model:

If we have no noise in the channel, we don't need channel coding. Therefore the above model simplifies to

and the goal is to minimise $$n$$ - that is, minimise the number of symbols we need to send - without needing to worry about being robust to any errors.

Here's one question to get started: imagine we're working with a compression function $$f_e$$ that acts on length-$$n$$ strings (that is, sequences of symbols) with some arbitrary alphabet size $$A$$ (that is, $$A$$ different types of symbols). Is it possible to build an encoding function $$f_e$$ that compresses every possible input? Clearly not; imagine that it took every length-$$n$$ string to a length-$$m$$ string using the same alphabet, with $$m < n$$. Then we'd have $$A^m$$ different available codewords that would need to code for $$A^n > A^m$$ different messages. By the pigeonhole principle, there must be at least one codeword that codes for more than one message. But that means that if we see this codeword, we can't be sure what it codes for, so we can't recover the original with certainty.

Therefore, we have a choice: either:

do lossy compression, where every message shrinks in size but we can't recover information perfectly; or
do lossless compression, and hope that more messages shrink in size than expand in size.

This is obvious with lossless compression, but applies to both: if you want to do them well, you generally need a probability model for what your data looks like, or at least something that approximates one.

Terminology

When we talk about a "code", we just mean something that maps messages (the $$Z$$ in the above diagram) to a sequence of symbols. A code is nonsingular if it associates every message with a unique code.

A symbol code is a code where each symbol in the message maps to a codeword, and the code of a message is the concatenation of the codewords of the symbols that it is made of.

A prefix code is a code where no codeword is a prefix of another codeword. They are also called instantaneous codes, because when decoding, you can decode a codeword to a symbol immediately when you reach a point where some prefix of the code corresponds to a codeword.
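Here is a small sketch of both ideas: checking the prefix property, and decoding "instantaneously" by emitting a symbol the moment the bits read so far match a codeword. The example code table and the function names are made up for illustration.

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of a different codeword."""
    return not any(
        a != b and b.startswith(a) for a in codewords for b in codewords
    )

def decode_instantaneously(bits, code):
    """Decode a bit string symbol by symbol with a prefix code.

    `code` maps symbols to codewords. Because the code is prefix-free, the
    first time the bits read so far match a codeword, that match is final.
    """
    inverse = {codeword: symbol for symbol, codeword in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return "".join(decoded)

code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # example code, not from the post
print(is_prefix_code(code.values()))
print(decode_instantaneously("0101100111", code))
```

Note that the decoder never has to look ahead or backtrack, which is exactly what fails for a code like {"0", "01"}.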

Useful basic results in lossless compression

Kraft's inequality

Kraft's inequality states that a prefix code with an alphabet of size $$D$$ and code words of lengths $$l_1, l_2, \ldots, l_n$$ satisfies $$$ \sum_{i=1}^n D^{-l_i} \leq 1, $$$ and conversely that if there is a set of lengths $$\{l_1, \ldots, l_n\}$$ that satisfies the above inequality, there exists a prefix code with those codeword lengths. We will only prove the first direction: that all prefix codes satisfy the above inequality.

Let $$l = \max_i l_i$$ and consider the tree with branching factor $$D$$ and depth $$l$$. This tree has $$D^l$$ nodes on the bottom level. Each codeword $$x_1x_2...x_c$$ is the node in this tree that you get to by choosing the $$d_i$$th branch on the $$i$$th level where $$d_i$$ is the index of symbol $$x_i$$ in the alphabet. Since it must be a prefix code, no node that is a descendant of a node that is a codeword can be a codeword. We can define our "budget" as the $$D^l$$ nodes on the bottom level of the tree, and define the "cost" of each codeword as the number of nodes on the bottom level of the tree that are descendants of the node. The node with length $$l$$ has cost 1, and in general a codeword at level $$l_i$$ has cost $$D^{l - l_i}$$. From this, and the prefix-freeness, we get $$$ \sum_i D^{l - l_i} \leq D^l $$$ which becomes the inequality when you divide both sides by $$D^l$$.
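The budget argument is easy to check numerically. This sketch computes both the Kraft sum and the "cost" of each codeword in bottom-level nodes, using exact fractions; the two sets of lengths are example values chosen for illustration.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Sum of D^(-l_i) over the codeword lengths, as an exact fraction."""
    return sum(Fraction(1, D ** length) for length in lengths)

def bottom_level_costs(lengths, D=2):
    """Cost D^(l - l_i) of each codeword, where l is the longest length."""
    depth = max(lengths)
    return [D ** (depth - length) for length in lengths]

fits = [1, 2, 3, 3]      # e.g. the codewords 0, 10, 110, 111
too_greedy = [1, 1, 2]   # two length-1 codewords already use up the whole tree

print(kraft_sum(fits), bottom_level_costs(fits))
print(kraft_sum(too_greedy), bottom_level_costs(too_greedy))
```

The first set of lengths spends exactly the budget of $$2^3 = 8$$ bottom-level nodes; the second would need 5 nodes from a bottom level that only has 4, so no prefix code with those lengths exists.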

Gibbs' inequality

Gibbs' inequality states that for any two probability distributions $$p$$ and $$q$$, $$$ -\sum_i p_i \log p_i \leq - \sum_i p_i \log q_i $$$ which can be written using the relative entropy $$D$$ (also known as the KL distance/divergence) as $$$ \sum_i p_i \log \frac{p_i}{q_i} = D(p||q) \geq 0. $$$ This can be proved using the log sum inequality. The proof is boring.
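Boring proofs can at least be sanity-checked. A minimal sketch, with two example distributions chosen so that the base-2 logarithms come out as whole numbers; the function names are just for illustration.

```python
import math

def entropy(p):
    """-sum p_i log2 p_i, skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """-sum p_i log2 q_i, the right-hand side of Gibbs' inequality."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def relative_entropy(p, q):
    """D(p||q) = cross_entropy(p, q) - entropy(p), which Gibbs says is >= 0."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(entropy(p), cross_entropy(p, q), relative_entropy(p, q))
print(relative_entropy(p, p))
```

Using the wrong distribution $$q$$ in place of $$p$$ costs extra bits, and the cost is zero only when $$q = p$$.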

Minimum expected length of a symbol code

We want to minimise the expected length of our code $$C$$ for each symbol that $$X$$ might output. The expected length is $$L(C,X) = \sum_i p_i l_i$$. Now one way to think of what a length $$l_i$$ means is using the correspondence between prefix codes and binary trees discussed above. Given the prefix requirement, the higher the level in the tree (and thus the shorter the length of the codeword) the more other options we block out in the tree. Therefore we can think of the collection of lengths we assign to our codewords as specifying a rough probability distribution that assigns probability in proportion to $$2^{-l_i}$$. What we'll do is introduce a variable $$q_i$$ that measures the "implied probability" in this way (note the division by a normalising constant): $$$ q_i = \frac{2^{-l_i}}{\sum_j 2^{-l_j}} = \frac{2^{-l_i}}{z} $$$ where in the 2nd step we've just defined $$z$$ to be the normalising constant. Now $$l_i = - \log zq_i = -\log q_i - \log z$$, so $$$ L(C,X) = \sum_i (-p_i \log q_i) - \log z $$$ Now we can apply Gibbs' inequality to know that $$\sum_i(- p_i \log q_i) \geq \sum_i (-p_i \log p_i)$$ and Kraft's inequality to know that $$\log z = \log \big(\sum_i 2^{-l_i} \big) \leq \log(1)=0$$, so we get $$$ L(C,X) \geq -\sum_i p_i \log p_i = H(X). $$$ Therefore the entropy (with base-2 $$\log$$) of a random variable is a lower bound on the expected length of a codeword (in a 2-symbol alphabet) that represents the outcome of that random variable. (And more generally, entropy with base-$$d$$ logarithms is a lower bound on the length of a codeword for the result in a $$d$$-symbol alphabet.)
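The decomposition $$L(C,X) = \sum_i (-p_i \log q_i) - \log z$$ can be checked on a concrete case. In this sketch the probabilities and codeword lengths are arbitrary example values (the lengths deliberately waste some of the Kraft budget so that $$z < 1$$), and the variable names are just for illustration.

```python
import math

p = [0.4, 0.3, 0.2, 0.1]   # example symbol probabilities
lengths = [2, 2, 3, 3]     # example codeword lengths, e.g. 00, 01, 100, 101

# Normalising constant z and implied probabilities q_i = 2^(-l_i) / z.
z = sum(2.0 ** -l for l in lengths)
q = [2.0 ** -l / z for l in lengths]

expected_length = sum(pi * li for pi, li in zip(p, lengths))
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
entropy = -sum(pi * math.log2(pi) for pi in p)

print("L(C,X)              :", expected_length)
print("cross-entropy - log z:", cross_entropy - math.log2(z))
print("H(X)                 :", entropy)
```

The first two printed numbers agree, and both sit above the entropy: part of the gap comes from $$q \ne p$$ (Gibbs) and part from $$z < 1$$ (Kraft).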

Huffman coding

Huffman coding is a very pretty concept.

We saw above that if you're making a random variable for the purpose of gaining the most information possible, you should prepare your random variable to have a uniform probability distribution. This is because entropy is maximised by a uniform distribution, and the entropy of a random variable is the average amount of information you get by observing it.

The reason why, say, encoding English characters as 5-bit strings (A = 00000, B = 00001, ..., Z = 11001, and then use the remaining 6 codes for punctuation or cat emojis or whatever) is not optimal is that some of those 5-bit strings are more likely than others. On a symbol-by-symbol-level, whether the first symbol is a 0 or a 1 is not equiprobable. To get an ideal code, each symbol we send should have equal probability (or as close to equal probability as we can get).

Robert Fano, of Fano's inequality fame, and Claude Shannon, of everything-in-information-theory fame, had tried to find an efficient general coding scheme in the early 1950s. They hadn't succeeded. Fano set it as an alternative to taking the final exam for his information theory class at MIT. David Huffman tried for a while, and had almost given up and started studying instead, when he came up with Huffman coding and quickly proved it to be optimal.

We want the first code symbol (a binary digit) to divide the space of possible message symbols (the English letters, say) in two equally-likely parts, the first two to divide it in four, the first three into eight, and so on. Now some message symbols are going to be more likely than others, so the codes for some symbols have to be longer. We don't want it to be ambiguous when we get to the end of a codeword, so we want a prefix-free code. Prefix-free codes with a size-$$d$$ alphabet can be represented as trees with branching factor $$d$$, where each leaf is one codeword:

Above, we have $$d=2$$ (i.e. binary), and six items to code for (a, b, c, d, e, and f), and six code words with lengths of between 1 and 4 characters in the codeword alphabet.

Each codeword is associated with some probability. We can define the weight of a leaf node to be its probability (or just how many times it occurs in the data) and the weight of a non-leaf node to be the sum of the weights of all leaves that are downstream of it in the tree. For an optimal prefix-free code, all we need to do is make sure that each node has children that are as equally balanced in weight as possible.

The best way to achieve this is to work bottom-up. Start without any tree, just a collection of leaf nodes representing the symbols you want codewords for. Then repeatedly build a node uniting the two least-likely parentless nodes in the tree, until the tree has a root.

Above, the numbers next to the non-leaf nodes show the order in which the node was created. This set of weights on the leaf nodes creates the same tree structure as in the previous diagram.

(We could also try to work top-down, creating the tree from the root to the leaves rather than from the leaves to the root, but this turns out to give slightly worse results. Also the algorithm for achieving this is less elegant.)
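The bottom-up procedure is short enough to sketch in full. This version uses a heap to find the two least-likely parentless nodes, and instead of building explicit tree nodes it keeps, for each subtree, the partial codewords of its leaves. The weights are example values, and labelling the first-popped child 0 and the second 1 is an arbitrary choice.

```python
import heapq
import itertools

def huffman_code(weights):
    """Build a binary Huffman code.

    `weights` maps each symbol to its probability (or count). Returns a dict
    mapping each symbol to its codeword as a string of 0s and 1s.
    """
    tie_breaker = itertools.count()  # so the heap never has to compare dicts
    heap = [(w, next(tie_breaker), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Unite the two least-likely parentless nodes under a new parent.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {sym: "0" + cw for sym, cw in left.items()}
        merged.update({sym: "1" + cw for sym, cw in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie_breaker), merged))
    return heap[0][2]

weights = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # example weights
code = huffman_code(weights)
for symbol in sorted(code):
    print(symbol, code[symbol])

expected_length = sum(weights[s] * len(code[s]) for s in weights)
print("expected length:", expected_length)
```

With these weights every probability is a power of 2, so the expected codeword length matches the entropy (1.75 bits) exactly; in general Huffman codes only get within one bit of it.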

Arithmetic coding

The Huffman code is the best symbol code - that is, a code where every symbol in the message gets associated with a codeword, and the code for the entire message is simply the concatenation of all the codewords of its symbols.

Symbol codes aren't always great, though. Consider encoding the output of a source that has a lot of runs like "aaaaaaaaaahaaaaahahahaaaaa" (a source of such messages might be, for example, a transcription of what a student says right before finals). The Huffman coding for this message is, for example, that "a" maps to a 0, and "h" maps to a 1, and you have achieved a compression of exactly 0%, even though intuitively those long runs of "a"s could be compressed.

One obvious thing you could do is run-length encoding, where long blocks of a character get compressed into a code for the character plus a code for how many times the character is repeated; for example the above might become "10a1h5a1h1a1h1a1h5a". However, this is only a good idea if there are lots of runs, and requires a bunch of complexity (e.g. your alphabet for the codewords must either be something more than binary, or else you need to be able to express things like lengths and counts in binary unambiguously, possibly using a second layer of encoding with a symbol code).
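For concreteness, a minimal sketch of run-length encoding in the count-then-character format used in the example above. It sidesteps the complexity just mentioned by writing counts as decimal digits, which only works because the messages here contain no digits.

```python
from itertools import groupby

def run_length_encode(text):
    """Encode each run of a repeated character as <count><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

message = "a" * 10 + "h" + "a" * 5 + "hahah" + "a" * 5
print(message)
print(run_length_encode(message))
print(run_length_encode("ahahahah"))
```

The last line shows the downside: a message without runs gets twice as long.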

Another problem with Huffman codes is that the code is based on assuming an unchanging probability model across the entire length of the message that is being encoded. This might be a bad assumption if we're encoding, for example, long angry Twitter threads, where the frequency of exclamation marks and capital letters increases as the message continues. We could try to brute-force a solution, such as splitting the message into chunks and fitting a Huffman code separately to each chunk, but that's not very elegant. Remember how elegant Huffman codes feel as a solution to the symbol coding problem? We'd rather not settle for less.

The fundamental idea of arithmetic coding is that we send a number representing where on the cumulative probability distribution of all messages the message we want to send lies. This is a dense statement, so we will unpack it with an example. Let's say our alphabet is $$A = \{a, r, t\}$$. To establish an ordering, we'll just say we consider the alphabet symbols in alphabetical order. Now let's say our probability distribution for the random variable $$X$$ looks like the diagram on the left; then our cumulative probability distribution looks like the diagram on the right:

One way to specify which of $$\{a, r, t\}$$ we mean is to pick a number $$0 \leq c < 1$$, and then look at which range it corresponds to on the $$y$$-axis of the right-hand figure; $$0 \leq c < 0.5$$ implies $$a$$, $$0.5 \leq c < 0.7$$ implies $$r$$, and $$0.7 \leq c < 1$$ implies $$t$$. We don't need to send the leading 0 because it is always present, and for simplicity we'll transmit the following decimals in binary; 0.0 becomes "0", 0.5 becomes "1", 0.25 becomes "01", and 0.875 is "111".

Note that at this point we've almost reinvented the Huffman code. $$a$$ has the most probability mass and can be represented in one symbol. $$r$$ happens to be representable in one symbol ("1" corresponds to 0.5 which maps to $$r$$) as well even though it has the least probability mass, which is definitely inefficient but not too bad. $$t$$ takes 2: "11".

The real benefit begins when we have multi-character messages. The way we can do it is like this, recursively splitting the number range between 0 and 1 into smaller and smaller chunks:

We see possible numbers encoding "art", "rat", and "tar". Not only that, but we see that all messages we send are infinite in length, as we can just keep going down, adding more and more letters. At first this might seem like a great deal - send one number, get infinite symbols transmitted for free! However, there's a real difference between "art" and "artrat", so we want to be able to know when to stop as well.

A simple answer is that the message also includes some code encoding how many symbols to decode for. A more elegant answer is that we can keep our message as just one number, but extend our alphabet to include an end-of-message token. Note that even with this end-of-message token, it is still true that many characters of the message can be encoded by a single symbol of output, especially if some outcome is much more likely. For example, in the example below we need only one bit ("1", for the number 0.5) to represent the message "aaa" (followed by the end-of-message character):

There are still two ways in which this code is underspecified.

The first is that we need to choose how much of the probability space to assign to our end-of-message token. The optimal value for this clearly depends on how long the messages we send will be.

The second is that even with the end-of-message token, each codeword is still represented by a range of values rather than a single number. Any of these are valid numbers to send, but we want to minimise the length, so therefore we will choose the number in this range that has the shortest binary representation.

Finally, what is our probability model? With the Huffman code, we either assume a probability model based on background information (e.g. we have the set of English characters, and we know the rough probabilities of them by looking at some text corpus that someone else has already compiled), or we fit the probability model based on the message we want to send - if 1/10th of all letters in the message are $$a$$s, we set $$p_a = 0.1$$ when building the tree for our Huffman code, and so on.

With arithmetic coding, we can also assume static probabilities. However, we can also do adaptive arithmetic coding, where we change the probability model as we go. A good way to do this is for our probability model to assume that the probability $$p_x$$ of the symbol $$x$$ after we have already processed text $$T$$ is $$$ p_x = \frac{\text{Count}(x, T) + 1}{\sum_{y \in A} \big(\text{Count}(y, T) + 1\big)}$$$ $$$= \frac{\text{Count}(x, T) + 1}{\sum_{y \in A} \big(\text{Count}(y, T)\big) + |A|} $$$ where $$A$$ is the alphabet, and $$\text{Count}(a, T)$$ simply returns the count of how many times the character $$a$$ occurs in $$T$$. Note that if we didn't have the $$+1$$ in the numerator and in the sum in the denominator, we would assign a probability of zero to anything we haven't seen before, and be unable to encode it.

(We can either say that the end-of-message token is in the alphabet $$A$$, or, more commonly, assign "probabilities" to all $$x$$ using the above formula and some probability $$p_{EOM}$$ to the end of message, and then renormalise by dividing all $$p_x$$ by $$1 + p_{EOM}$$.)
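The add-one probability model above is short enough to write out directly. This is a sketch under assumptions: the function name is invented here, the end-of-message token is left out entirely, and `Fraction` is used so the probabilities are exact.

```python
from collections import Counter
from fractions import Fraction

def adaptive_probabilities(alphabet, seen_text):
    """p_x = (Count(x, T) + 1) / (sum_y Count(y, T) + |A|) for every x in the alphabet."""
    counts = Counter(seen_text)
    denominator = sum(counts[y] for y in alphabet) + len(alphabet)
    return {x: Fraction(counts[x] + 1, denominator) for x in alphabet}

print(adaptive_probabilities("art", ""))     # uniform before anything has been seen
print(adaptive_probabilities("art", "aar"))  # 'a' is now the most likely symbol
```

Both the encoder and the decoder can call this with the text processed so far, which is what lets the decoder track the model without it ever being transmitted.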

How do we decode this? At the start, the assumed distribution is simply uniform over the alphabet (except maybe for $$p_{EOM}$$). We can decode the first symbol using that distribution, then update the distribution and decode the next, and so on. It's quite elegant.

What isn't elegant is implementing this with standard number systems in most programming languages. For any non-trivial message length, arithmetic coding is going to need very precise floating point numbers, and you can't trust floating point precision very far. You'll need some special system, likely an arbitrary-precision arithmetic library, to actually implement arithmetic coding.
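In Python, the standard-library `fractions.Fraction` type gives exact rational arithmetic, which is enough for a toy version. The sketch below is a static (non-adaptive) arithmetic coder under several assumptions of this example: the function names are invented, the model is a dict of `Fraction` probabilities summing to 1 whose insertion order defines the symbol ordering, and `"$"` stands in for the end-of-message token with a hypothetical probability of 1/10.

```python
import math
from fractions import Fraction

def message_interval(symbols, probs):
    """Return the [low, high) slice of [0, 1) that corresponds to `symbols`."""
    low, width = Fraction(0), Fraction(1)
    for symbol in symbols:
        cumulative = Fraction(0)
        for candidate, p in probs.items():
            if candidate == symbol:
                low += width * cumulative
                width *= p
                break
            cumulative += p
        else:
            raise ValueError(f"symbol {symbol!r} is not in the model")
    return low, low + width

def shortest_bits(low, high):
    """Shortest bit string b1b2... such that the binary fraction 0.b1b2... is in [low, high)."""
    n_bits = 1
    while True:
        numerator = math.ceil(low * 2 ** n_bits)  # smallest multiple of 2^-n_bits >= low
        if Fraction(numerator, 2 ** n_bits) < high:
            return format(numerator, f"0{n_bits}b")
        n_bits += 1

def encode(message, probs, eom):
    return shortest_bits(*message_interval(list(message) + [eom], probs))

def decode(bits, probs, eom):
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    decoded = []
    while True:
        cumulative = Fraction(0)
        for candidate, p in probs.items():
            start = low + width * cumulative
            if start <= value < start + width * p:
                if candidate == eom:
                    return "".join(decoded)
                decoded.append(candidate)
                low, width = start, width * p
                break
            cumulative += p

model = {"a": Fraction(2, 5), "r": Fraction(1, 5), "t": Fraction(3, 10), "$": Fraction(1, 10)}
bits = encode("art", model, "$")
print(bits, decode(bits, model, "$"))
```

Exact fractions grow with the message length, so this does not scale; practical coders instead use fixed-width integers with renormalisation, which is outside what this sketch tries to show.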

Prefix-free arithmetic coding

The above description of arithmetic coding is not a prefix-free code. We generally want prefix-free codes, in particular because it means we can decode it symbol by symbol as it comes in, rather than having to wait for the entire message to come through. Note also that often in practice it is uncertain whether or not there are more bits coming; consider a patchy internet connection with significant randomness between packet arrival times.

The simple fix for this is that instead of encoding a number as any binary string that maps onto the right segment of the number line between 0 and 1, you impose an additional requirement on it: whatever further bits get appended to the string, the number it represents is still within the range.

Lempel-Ziv coding

Huffman coding integrated the probability model and the encoding. Arithmetic coding still uses an (at least implicit) probability model to encode, but in a way that makes it possible to update as we encode. Lempel-Ziv encoding, and its various descendants, throw away the entire idea of having any kind of (explicit) probability model. We will look at the original version of this algorithm.

Encoding

Skip all that Huffman coding nonsense of carefully rationing the shorter codewords for the most likely symbols, and simply decide on some codeword length $$d$$ and give every character in the alphabet a codeword of that length. If your alphabet is again $$\{a, r, t, \text{EOM}\}$$ (we'll include the end-of-message character from the start this time), and $$d = 3$$, then the codewords you define are literally as simple as $$$a \mapsto 000 $$$ $$$r \mapsto 001 $$$ $$$t \mapsto 010 $$$ $$$\text{EOM} \mapsto 011$$$ If we used this code, it would be a disaster. We have four symbols in our alphabet, so the maximum entropy of the distribution is $$\log_2 4 = 2$$ bits, and we're spending 3 bits on each symbol. With this encoding, we increase the length by at least 50%. Instead of your compressed file being uploaded in 4 seconds, it now takes 6.

However, we selected $$d=3$$, meaning we have $$2^3 = 8$$ slots for possible codewords of our chosen constant length, and we've only used 4. What we'll do is follow these steps as we scan through our text:

Read one symbol past the longest match between the following text and a codeword we've defined. Therefore what we now have is a string $$Cx$$, where we have a code for $$C$$ already of length $$|C|$$, $$x$$ is a single character, and $$Cx$$ is a prefix of the remaining text.
Add the codeword for $$C$$ to the output we're forming, to encode for the first $$|C|$$ characters of the remaining text.
If there is space among the $$2^d$$ possible codewords we have available: let $$n$$ be the binary representation of the smallest possible codeword not yet associated with a code, and define $$Cx \mapsto n$$ as a new codeword.
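The three steps above can be sketched as follows. The function name `lz_encode`, the use of `"$"` as the end-of-message symbol, and the choice to number the initial codewords in the order the alphabet is given are all assumptions of this example.

```python
def lz_encode(symbols, alphabet, d):
    """Encode a list of symbols into d-bit codewords, growing the table as we go."""
    # Initial table: every single symbol gets a codeword, numbered in alphabet order.
    table = {(s,): i for i, s in enumerate(alphabet)}
    output = []
    pos = 0
    while pos < len(symbols):
        # Step 1: find the longest match C between the remaining text and the table.
        length = 1
        while (pos + length < len(symbols)
               and tuple(symbols[pos:pos + length + 1]) in table):
            length += 1
        matched = tuple(symbols[pos:pos + length])
        # Step 2: emit the codeword for C.
        output.append(format(table[matched], f"0{d}b"))
        # Step 3: if there is a next symbol x and a free slot, define Cx as a new codeword.
        if pos + length < len(symbols) and len(table) < 2 ** d:
            table[tuple(symbols[pos:pos + length + 1])] = len(table)
        pos += length
    return output

print(lz_encode(list("ratatat") + ["$"], ["a", "r", "t", "$"], 3))
```

Greedy extension finds the longest match here because every multi-symbol entry $$Cx$$ is only ever added after $$C$$ itself is in the table.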

Here is an example of the encoding process, showing the emitted codewords on the left, the original definitions on the top, the new definitions on the right, and the message down the middle:

Decoding

A boring way to decode is to send the codeword list along with your message. The fun way is to reason it out as you go along, based on your knowledge of the above algorithm and a convention that lets you know which order the original symbols were added to the codeword list (say, alphabetically, so you know the three bindings in the top-left). An example of decoding the above message:

Source coding theorem

The source coding theorem is about lossy compression. It is going to tell us that if we can tolerate a probability of error $$\delta$$, and if we're encoding a message consisting of a lot of symbols, unless $$\delta$$ is very close to 0 (lossless compression) or 1 (there is nothing but error), it will take about $$H(X)$$ bits per symbol to encode the message, where $$X$$ is the random variable according to which the symbols in the message have been drawn. Since it means that entropy turns up as a fundamental and surprisingly constant limit when we're trying to compress our information, this further justifies the use of entropy as a measure of information.

We're going to start our attempt to prove the source coding theorem by considering a silly compression scheme. Observe that English has 26 letters, but the bottom 10 (Z, Q, X, J, K, V, B, P, Y, G) are slightly less than 10% of all letters. Why not just drop them? Everthn is still comprehensile without them, and ou can et awa with, for eample, onl 4 inary its per letter rather than 5, since ou're left with ust 16 letters.

Given an alphabet $$A$$ from which our random variable $$X$$ takes values, define the $$\delta$$-sufficient subset $$S_\delta$$ of $$A$$ to be the smallest subset of $$A$$ such that $$P(x \in S_\delta) \geq 1 - \delta$$ for $$x$$ drawn from $$X$$. For example, if $$A$$ is the English alphabet, and $$\delta = 0.1$$, then $$S_\delta$$ is the set of all letters except Z, Q, X, J, K, V, B, P, Y, and G, since the other letters have a combined probability of over $$1 - 0.1 = 0.9$$, and any other subset containing more than $$0.9$$ of the probability mass must contain more letters.

Note that $$S_\delta$$ can be formed by adding elements from $$A$$, in descending order of probability, into a set until the sum of probabilities of elements in the set is at least $$1 - \delta$$.

Next, define the essential bit content of $$X$$, denoted $$H_\delta(X)$$, as $$$ H_\delta(X) = \log_2 |S_\delta|. $$$ In other words, $$H_\delta(X)$$ is the answer to "how many bits of information does it take to point to one element in $$S_\delta$$ (without being able to assume the distribution is anything better than uniform)?". $$H_\delta(X)$$ for $$\text{English alphabet}_{0.1}$$ is 4, because $$\log_2 |\{E, T, A, O, I, N, S, H, R, D, L, U, C, M, W, F\}| = \log_2 16 = 4$$. It makes sense that this is called "essential bit content".
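As a small sketch of these two definitions in Python (the function names and the four-symbol toy distribution are assumptions of this example, not real English letter frequencies; `Fraction` is used so that the comparison with $$1 - \delta$$ is exact, and $$\delta < 1$$ is assumed):

```python
import math
from fractions import Fraction

def sufficient_subset(probs, delta):
    """Smallest set of outcomes whose total probability is at least 1 - delta."""
    subset, total = [], 0
    # Add outcomes in descending order of probability until enough mass is covered.
    for outcome, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        if total >= 1 - delta:
            break
        subset.append(outcome)
        total += p
    return subset

def essential_bit_content(probs, delta):
    """H_delta(X) = log2 |S_delta|."""
    return math.log2(len(sufficient_subset(probs, delta)))

toy = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
print(sufficient_subset(toy, Fraction(1, 4)), essential_bit_content(toy, Fraction(1, 4)))
```

When several outcomes tie in probability, which of them ends up in the subset is arbitrary, but the size of the subset (and so $$H_\delta$$) is not.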

We can graph $$H_\delta(X)$$ against $$\delta$$ to get a pattern like this:

Where it gets more interesting is when we extend this definition to blocks. Let $$X^n$$ denote the random variable for a sequence of $$n$$ independent identically distributed samples drawn from $$X$$. We keep the same definitions for $$S_\delta$$ and $$H_\delta(X)$$; just remember that now $$S_\delta$$ is a subset of $$A^n$$ (where the exponent denotes Cartesian product of a set with itself; i.e. $$A^n$$ is all possible length-$$n$$ strings formed from that alphabet). In other words, we're throwing away the least common length-$$n$$ letter strings first; ZZZZ is out the window first if $$n = 4$$, and so on.
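A brute-force sketch of the block version, again on a made-up four-symbol distribution (the function names are assumptions of this example). It enumerates all $$|A|^n$$ blocks, so it is only usable for very small $$n$$, and it assumes exact `Fraction` probabilities and $$0 \leq \delta < 1$$.

```python
import math
from fractions import Fraction
from itertools import product

def block_distribution(probs, n):
    """Distribution of X^n (n i.i.d. draws from X), keyed by the length-n string."""
    return {"".join(block): math.prod(probs[s] for s in block)
            for block in product(probs, repeat=n)}

def essential_bits_per_symbol(probs, n, delta):
    """(1/n) * log2 |S_delta|, where S_delta is now a subset of A^n."""
    ranked = sorted(block_distribution(probs, n).values(), reverse=True)
    kept, total = 0, 0
    while total < 1 - delta:  # keep the most probable blocks until enough mass is covered
        total += ranked[kept]
        kept += 1
    return math.log2(kept) / n

toy = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
for n in (1, 2, 3):
    print(n, essential_bits_per_symbol(toy, n, Fraction(1, 4)))
```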

We can plot a similar graph as above, except we're plotting $$\frac{1}{n} H_\delta(X^n)$$ on the vertical axis to get per-symbol entropy, and there's a horizontal line around the entropy of English letter frequencies:

(Note that the entropy per letter of English drops to only 1.3 if we stop modelling each letter as drawn independently from the others around it, and instead have a model with a perfect understanding of which letters occur together.)

The graph above shows the plot of $$\frac{1}{n}H_\delta(X^n)$$ against $$\delta$$ for a random variable $$X^n$$ for $$n=1$$ (blue), $$n=2$$ (orange), and $$n=3$$ (green). We see that as $$n$$ increases, the lines become flatter, and the middle portions approach the black line that shows the entropy of the English letter frequency distribution. What you'd see if we continued plotting this graph for larger values of $$n$$ (which might happen for example if you bought me a beefier computer) is that this trend continues; specifically, that there is a value $$n$$ large enough that the graph of $$\frac{1}{n}H_\delta(X^n)$$ is as close as we want to the black line for the entire length of it, except for an arbitrarily small part near $$\delta = 0$$ and $$\delta = 1$$. Mathematically, for any $$\epsilon > 0$$ and any $$0 < \delta < 1$$ there exists a positive integer $$n_0$$ such that for all $$n \geq n_0$$, $$$ \left| \frac{1}{n}H_\delta(X^n) - H(X) \right| < \epsilon. $$$ Now remember that $$\frac{1}{n}H_\delta(X^n)=\frac{1}{n}\log_2 |S_\delta|$$ was the essential bit content per symbol, or, in other words, the number of bits we need per symbol to represent $$X^n$$ (with error probability $$\delta$$) in the simple coding scheme where we assign an equal-length binary number to each element in $$S_\delta$$ (but hold on: aren't there better codes than ones where all elements in $$S_\delta$$ get an equal-length representation? yes, but we'll see soon that not by very much). Therefore what the above equation is saying is that we can encode $$X^n$$ with error chance $$\delta$$ using a number of bits per symbol that differs from the entropy $$H(X)$$ by at most an arbitrarily small amount $$\epsilon$$. This is the source coding theorem. It is a big deal, because we've shown that entropy is related to the number of bits per symbol we need to do encoding in a lossy compression scheme.

(You can get to a similar result with lossless compression schemes where, instead of throwing away the ability to encode all sequences not in $$S_\delta$$ and just accepting the inevitable error, you instead have an encoding scheme where you reserve one bit to indicate whether or not an $$x^n$$ drawn from $$X^n$$ is in $$S_\delta$$, and if it is you encode it like above, and if it isn't you encode it using $$\log |A|^n$$ bits. Then you'll find that the probability of having to do the latter step is small enough that $$\log |A|^n > \log |S_\delta|$$ doesn't matter very much.)

Typical sets

Before going into the proof, it is useful to investigate what sorts of sequences $$x^n$$ we tend to pull out from $$X^n$$ for some $$X$$. The basic observation is that most $$x^n$$ are going to be neither the least probable nor the most probable out of all $$x^n$$. For example, "ZZZZZZZZZZ" would obviously be an unusual set of letters to draw at random if you're selecting them from English letter frequencies. However, so would "EEEEEEEEEE". Yes, this individual sequence is much more likely than "ZZZZZZZZZZ" or any other sequence, but there is only one of them, so getting it would still be surprising. To take another example, the typical sort of result you'd expect from a coin loaded so that $$P(\text{"heads"}) = 0.75$$ isn't runs of only heads, but rather an approximately 3:1 mix of heads and tails.

The distribution of letter counts follows a multinomial distribution (the generalisation of the binomial distribution). Therefore (if you think about what a multinomial distribution is, or if you know that the mean is $$n p_{x_i}$$ for the $$i$$th variable) in $$x^n$$ we'd expect roughly $$np_e$$ of the letter e, $$np_z$$ of the letter z, and so on - and $$np_e \ll n$$ even though $$p_e > p_L$$ for all $$L$$ in the alphabet. Slightly more precisely (if you happen to know this fact), the variance of variable $$x_i$$ is $$np_{x_i}(1-p_{x_i})$$, implying that the standard deviation grows only in proportion to $$\sqrt{n}$$, so for large $$n$$ it is very rare to get an $$x^n$$ with counts of $$x_i$$ that differ wildly from the expected count $$np_{x_i}$$.

Let's define a notion of "typicality" for a sequence $$x^n$$ based on this idea of it being unusual if $$x^n$$ is either a wildly likely or wildly unlikely sequence. The median sequence has $$np_{x_i}$$ occurrences of each symbol $$x_i$$ in the alphabet, so has probability $$$ P(x^n) = p_{x_1}^{np_{x_1}}p_{x_2}^{np_{x_2}} \ldots p_{x_{|A|}}^{np_{x_{|A|}}} $$$ which in turn has a Shannon information content of $$$ -\log P(x^n) = -\sum_i np_{x_i} \log p_{x_i} = n H(X). $$$ Oh look, entropy pops up again. How surprising.

Now we make the following definition: a sequence $$x^n$$ is $$\epsilon$$-typical if its information content per symbol is $$\epsilon$$-close to $$H(X)$$, that is $$$ \left| - \frac{1}{n}\log{P(x^n)} - H(X) \right| <\epsilon. $$$ Define the typical set $$T_{n\epsilon}$$ to be the set of length-$$n$$ sequences (drawn from $$X^n$$) that are $$\epsilon$$-typical.

$$T_{n\epsilon}$$ is a small subset of the set $$A^n$$ of all length-$$n$$ sequences. We can see this through the following reasoning: for any $$x^n \in T_{n\epsilon}$$, $$-\frac{1}{n} \log P(x^n) \approx H(X)$$ which implies that $$$ P(x^n) \approx 2^{-nH(X)} $$$ and therefore that there can only be roughly $$2^{nH(X)}$$ such sequences; otherwise their probability would add up to more than 1. In comparison, the number of possible sequences $$|A^n| = 2^{n \log |A|}$$ is significantly larger, since $$H(X) \leq \log |A|$$ for any random variable $$X$$ with alphabet / outcome set $$A$$ (with equality if $$X$$ has a uniform distribution over $$A$$).

The typical set contains most of the probability

Chebyshev's inequality states that $$$ P((X-\mathbb{E}[X])^2 \geq a) \leq \frac{\sigma^2}{a} $$$ where $$\sigma^2$$ is the variance of the random variable $$X$$, and $$a > 0$$. It is proved here (search for "Chebyshev").

Earlier we defined the $$\epsilon$$-typical set as $$$ T_{n\epsilon} = \left\{ x^n \in A^n \,\text{ such that } \, \left| -\frac{1}{n}\log P(x^n) - H(X) \right| < \epsilon \right\}. $$$ Note that $$$ \mathbb{E}\left[-\frac{1}{n}\log P(X^n)\right] = -\frac{1}{n} \sum_i \mathbb{E}[\log P(X_i)]$$$ $$$ = -\mathbb{E}[\log P(X_i)]$$$ $$$ = H(X_i) = H(X) $$$ by using independence of the $$X_i$$ making up $$X^n$$ (so that $$\log P(X^n) = \sum_i \log P(X_i)$$) together with linearity of expectation in the first step, the fact that all $$X_i$$ are draws of the same random variable $$X$$ (so every term in the sum is equal) in the second, and the definition of entropy in the third.

Therefore, we can now rewrite the typical set definition equivalently as $$$ T_{n\epsilon} = \left\{ x^n \in A^n \,\text{ such that } \, \left( -\frac{1}{n}\log P(x^n) - H(X) \right)^2 < \epsilon^2 \right\}$$$ $$$= \left\{ x^n \in A^n \,\text{ such that } \, \left( Y - \mathbb{E}[Y] \right)^2 < \epsilon^2 \right\} $$$ for $$Y = -\frac{1}{n} \log P(X^n)$$, which is in the right form to apply Chebyshev's inequality to get a probability of belonging to this set, except for the fact that the sign is the wrong way around. Very well - we'll consider the set of sequences $$\bar{T}_{n\epsilon} = A^n - T_{n\epsilon}$$ (i.e. all length-$$n$$ sequences that are not typical) instead, which can be defined as $$$ \bar{T}_{n \epsilon} = \left\{ x^n \in A^n \,\text{ such that } \, (Y - \mathbb{E}[Y])^2 \geq \epsilon^2 \right\} $$$ and use Chebyshev's inequality to conclude that $$$ P((Y - \mathbb{E}[Y])^2 \geq \epsilon^2) \leq \frac{\sigma_Y^2}{\epsilon^2} $$$ where $$\sigma_Y^2$$ is the variance of $$Y= -\frac{1}{n} \log P(X^n)$$. This is exciting - we have a bound on the probability that a sequence is not in the typical set - but we want to link this probability to $$n$$ somehow. Let $$Z = -\log P(X)$$, and note that $$Y$$ can be written as the average of $$n$$ independent draws $$Z_i = -\log P(X_i)$$ from $$Z$$. Therefore $$$ \mathbb{E}[Y] = \frac{1}{n} \sum_i \mathbb{E}[Z_i] = \mathbb{E}[Z] = H(X) $$$ and since $$Y = \frac{1}{n} \sum_i Z_i$$, the variance of $$Y$$, $$\sigma_Y^2$$, is equal to $$\frac{1}{n} \sigma_Z^2$$ (a basic law of how variance works that is often used in statistics). We can substitute this into the expression above to get $$$ P((Y-\mathbb{E}[Y])^2 \geq \epsilon^2) \leq \frac{\sigma_Z^2}{n\epsilon^2}. $$$

The probability on the left-hand side is identical to $$P((-\frac{1}{n} \log P(X^n) - H(X) )^2 \geq \epsilon^2)$$, which is the probability of the condition that $$X^n$$ is not in the $$\epsilon$$-typical set $$T_{n\epsilon}$$, which gives us our grand result $$$ P(X^n \in T_{n\epsilon}) \ge 1 - \frac{\sigma_Z^2}{n\epsilon^2}. $$$ $$\sigma_Z^2$$ is the variance of $$-\log P(X)$$; it depends on the particulars of the distribution and is probably hell to calculate. However, what we care about is that if we just crank up $$n$$, we can make this probability as close to 1 as we like, regardless of what $$\sigma_Z^2$$ is, and regardless of what we set as $$\epsilon$$ (the parameter for how wide the probability range for the typical set is).

The key idea is this: asymptotically, as $$n \to \infty$$, more and more of the probability mass of possible length-$$n$$ sequences is concentrated among those that have a probability of between $$2^{-n(H(X)+\epsilon)}$$ and $$2^{-n(H(X) - \epsilon)}$$, regardless of what (positive real) $$\epsilon$$ you set. This is known as the "asymptotic equipartition property" (it might be more appropriate to call it an "asymptotic approximately-equally-partitioning property" because it's not really an "equipartition", since depending on $$\epsilon$$ these can be very different probabilities, but apparently that was too much of a mouthful even for the mathematicians).
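For a loaded coin we can check this numerically, because every sequence with the same number of heads has the same probability, so the typical set can be summed over head counts instead of over all $$2^n$$ sequences. The sketch below is only an illustration: the function name, the choice $$\epsilon = 0.1$$, and the values of $$n$$ are assumptions of this example, and it uses ordinary floats (fine up to roughly $$n = 1000$$).

```python
import math

def typical_set_stats(p, n, eps):
    """For n flips of a coin with P(heads) = p, return
    (P(X^n in T), |T|, Chebyshev lower bound on P(X^n in T), H(X))."""
    q = 1 - p
    entropy = -(p * math.log2(p) + q * math.log2(q))
    # Variance of Z = -log2 P(X) for a single flip.
    var_z = p * math.log2(p) ** 2 + q * math.log2(q) ** 2 - entropy ** 2
    prob, size = 0.0, 0
    for heads in range(n + 1):
        # Information content per symbol of any sequence with this many heads.
        info = -(heads * math.log2(p) + (n - heads) * math.log2(q)) / n
        if abs(info - entropy) < eps:
            sequences = math.comb(n, heads)
            size += sequences
            prob += sequences * p ** heads * q ** (n - heads)
    return prob, size, 1 - var_z / (n * eps ** 2), entropy

for n in (10, 100, 1000):
    prob, size, bound, entropy = typical_set_stats(0.75, n, 0.1)
    print(n, round(prob, 4), round(bound, 4), round(math.log2(size) / n, 4), round(entropy, 4))
```

The third printed column is $$\frac{1}{n} \log_2 |T_{n\epsilon}|$$, which stays below $$H(X) + \epsilon$$, while the Chebyshev bound in the second column is loose for small $$n$$ (it can even be negative) but still heads to 1.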

Finishing the proof

As a reminder of where we are: we stated without proof that, for large enough $$n$$, $$$ \left| \frac{1}{n}H_\delta(X^n) - H(X) \right| < \epsilon $$$ and noted that this is an interesting result that also gives meaning to entropy, since we see that it's related to how many bits it takes for a naive coding scheme to express $$X^n$$ (with error probability $$\delta$$).

Then we went on to talk about typical sets, and ended up finding that the probability that an $$x^n$$ drawn from $$X^n$$ lies in the set $$$ T_{n \epsilon} =\left\{ x^n \in A^n \,\text{ such that } \, \left| -\frac{1}{n}\log P(x^n) - H(X) \right| < \epsilon \right\} $$$ approaches 1 as $$n \to \infty$$, despite the fact that $$T_{n\epsilon}$$ has only approximately $$2^{nH(X)}$$ members, which, for distributions of $$X$$ that are not very close to the uniform distribution over the alphabet $$A$$, is a small fraction of the $$2^{n \log |A|}$$ possible length-$$n$$ sequences.

Remember that $$H_\delta(X^n) = \log |S_\delta|$$, and $$S_\delta$$ was the smallest subset of $$A^n$$ such that it contains sequences whose probability sums to at least $$1 - \delta$$. This is a bit like the typical set $$T_{n\epsilon}$$, which also contains sequences making up most of the probability mass. Note that $$T_{n\epsilon}$$ is less efficient; $$S_\delta$$ optimally contains all sequences with probability greater than some threshold, whereas $$T_{n\epsilon}$$ generally omits the highest-probability sequences (settling instead for sequences of the same probability as most sequences that are drawn from $$X^n$$). Therefore $$$ H_\delta(X^n) \leq \log |T_{n\epsilon}| $$$ for an $$n$$ that depends on what $$\delta$$ and $$\epsilon$$ we want (it has to be large enough that $$P(X^n \in T_{n\epsilon}) \geq 1 - \delta$$). Now we can get an upper bound on $$H_\delta(X^n)$$ if we can upper-bound $$|T_{n\epsilon}|$$. Looking at the definition, we see that the probability of a sequence $$x^n \in T_{n\epsilon}$$ must obey $$$ 2^{-n(H(X) + \epsilon)} < P(x^n) < 2^{-n(H(X) - \epsilon)}. $$$ $$T_{n\epsilon}$$ has the largest number of elements if all elements have the lowest possible probability $$p$$, and if that is the case it has at most $$1/p$$ of such lowest-probability elements since the probabilities cannot add to more than one, which implies $$|T_{n\epsilon}| < 2^{n(H(X)+\epsilon)}$$. Therefore $$$ H_\delta(X^n) \leq \log |T_{n\epsilon}| < \log(2^{n(H(X)+\epsilon)}) = n(H(X) + \epsilon) $$$ and we have a bound $$$ H_\delta(X^n) < n(H(X) + \epsilon). $$$

If we can now also find the bound $$n(H(X) - \epsilon) < H_\delta(X^n)$$, we've shown $$|\frac{1}{n} H_\delta(X^n) - H(X)| < \epsilon$$ and we're done. The proof of this bound is a proof by contradiction. Imagine that there is an $$S'$$ such that $$$ \frac{1}{n} \log |S'| \leq H(X) - \epsilon $$$ but also $$$ P(X^n \in S') \geq 1 - \delta. $$$ We want to show that $$P(X^n \in S')$$ can't actually be that large. For the other bound, we used our typical set successfully, so why not use it again? Specifically, write $$$ P(X^n \in S') = P(X^n \in S' \cap T_{n\varepsilon}) + P(X^n \in S' \cap \bar{T}_{n\varepsilon}) $$$ where $$\bar{T}_{n\varepsilon}$$ is again $$A^n - T_{n\varepsilon}$$, and noting that our constant $$\varepsilon$$ for $$T$$ is not the same as our constant $$\epsilon$$ in the bound; we are free to choose it, and we pick any $$0 < \varepsilon < \epsilon$$ (for example $$\varepsilon = \epsilon / 2$$). We want to set an upper bound on this probability; for that to hold, we need to make the terms on the right-hand side as large as possible. For the first term, this is if $$S' \cap T_{n\varepsilon}$$ is as large as it can be based on the bound on $$|S'|$$, i.e. $$2^{n(H(X)-\epsilon)}$$, and each element in it has the maximum probability $$2^{-n(H(X)-\varepsilon)}$$ of elements in $$T_{n\varepsilon}$$. For the second term, this is if $$S' \cap \bar{T}_{n \varepsilon}$$ is restricted only by $$P(X^n \in \bar{T}_{n\varepsilon}) \leq \frac{\sigma_Z^2}{n\varepsilon^2}$$, which we showed above. (Note that you can't have both of these conditions holding at once, but this does not matter since we only want to show a non-strict inequality.) Therefore we get $$$ P(X^n \in S') \leq 2^{n(H(X) - \epsilon)} 2^{-n(H(X)-\varepsilon)} + \frac{\sigma_Z^2}{n\varepsilon^2} = 2^{-n(\epsilon - \varepsilon)} + \frac{\sigma_Z^2}{n\varepsilon^2} $$$ and we see that since $$\epsilon - \varepsilon > 0$$, and as we're dealing with the case where $$n \to \infty$$, this probability is going to go to zero in the limit. But we had assumed $$P(X^n \in S') \geq 1 - \delta$$ - so for large enough $$n$$ we have a contradiction unless we drop the assumption that $$\frac{1}{n} \log |S'| \leq H(X) - \epsilon$$. Since $$S_\delta$$ itself has probability at least $$1 - \delta$$, this means $$$ n(H(X) - \epsilon) < H_\delta(X^n). $$$ Combining this with the previous bound, we've now shown $$$ H(X) - \epsilon < \frac{1}{n} H_\delta(X^n) < H(X) + \epsilon $$$ which is the same as $$$ \left|\frac{1}{n}H_\delta(X^n) - H(X)\right| < \epsilon $$$ which is the source coding theorem that we wanted to prove.

Information theory 1

2022-06-20T16:27:00.004+01:00

5044 words, including equations (~30min)

This is the first in a series of posts about information theory. A solid understanding of basic probability (random variables, probability distributions, etc.) is assumed. This post covers:

what information and entropy are, both intuitively and axiomatically
(briefly) the relation of information-theoretic entropy to entropy in physics
conditional entropy
joint entropy
KL distance (also known as relative entropy)
mutual information
some results involving the above quantities
the point of source coding and channel coding

Future posts cover source coding and channel coding in detail.

What is information?

How much information is there in the number 14? What about the word "information"? Or this blog post? These don't seem like questions with exact answers.

Imagine you already know that someone has drawn a number between 0 and 15 from a hat. Then you're told that the number is 14. How much additional information have you learned? A first guess at a definition for information might be that it's the number of questions you need to ask to become certain about an answer. We don't want arbitrary questions though; "what is the number?" is very different from "is the number zero?". So let's say that it has to be a yes-no question.

You can represent a number within some specific range as a series of yes-no questions by writing it out in base-2. In base-2, 14 is 1110. Four questions suffice: "is the leftmost base-2 digit a 0?", etc. The number of base-$$B$$ digits required to distinguish between $$n$$ possible numbers is $$\lceil\log_B n\rceil$$, where $$\lceil x \rceil$$ means the smallest integer greater than or equal to $$x$$ (i.e., rounding up). Now maybe there should be some sense in which we can allow pointing at a number in the range 0 to 16 to have a bit more information than pointing at a number from 0 to 15, even though we can't literally ask 4.09 yes-no questions. So we might try to define our information measure as $$\log n$$ (in whatever base because changing which base we're doing logs in would only change the answer by a constant factor anyways, but let's just say it's base-2 to maintain the correspondence to yes-no questions), where $$n$$ is the number of outcomes that the thing we now know was selected from.

Now let's say there's a shoe box we've picked up from a store. There are a gazillion things that could be inside the box, so $$n$$ is something huge. However, it seems that if we open the box and find a new pair of sneakers, we are less surprised than if we open the box and find the Shroud of Turin. We'd like our measure to say that some outcomes contain quantitatively more information than others.

The standard sort of thing you do in this kind of situation is that you bring in probabilities. With drawing a number out of a hat, we have a uniform distribution where the probability for each outcome is $$p = 1/ n$$. So therefore we might as well have written that information content is equivalent to $$\log \frac{1}{p}$$, and gotten the same answer in that case. Since presumably the probability of your average shoe box containing sneakers is higher than the probability of it containing the Shroud of Turin, with this revised definition we now sensibly get that the latter gives us more information (because $$\log \frac{1}{p}$$ is a decreasing function of $$p$$). Note also that $$\log \frac{1}{p}$$ is the same as $$- \log p$$; we will usually use the latter form. This is called the Shannon information. To be precise:

The (Shannon) information content of seeing a random variable $$X$$ take a value $$x$$ is $$$-\log p_x$$$ where $$p_x$$ is the probability that $$X$$ takes value $$x$$.
We can see the behaviour of the information content of an event as a function of its probability here:

Axiomatic definition

The above derivation was so hand-wavy that it wasn't even close to being a derivation.

When discovering/inventing the concept of Shannon information, Shannon started from the idea that the information contained in seeing an event is a function of that event's probability (and nothing else). Then he required three further axioms to hold for this function:

If the probability of an outcome is 1, it contains no information. This makes sense - if you already know something with certainty, then you can't get more information by seeing it again.
The information contained in an event is a decreasing function of its probability of happening. Again, this makes sense: seeing something you think is very unlikely is more informative than seeing something you were pretty certain was already going to happen.
The information contained in seeing two independent events is the sum of the information of seeing them separately. We don't want to have to apply some obscure maths magic to figure out how much information we got in total from seeing one dice roll and then another.

The last one is the big hint. The probability of seeing random variable (RV) $$X$$ take value $$x$$ and RV $$Y$$ take value $$y$$ is $$p_x p_y$$ if $$X$$ and $$Y$$ are independent. We want a function, call it $$f$$, such that $$f(p_x p_y) = f(p_x) + f(p_y)$$. This is the most important property of logarithms. You can do some more maths to really demonstrate that logarithms (with some base) are the only functions that fit this definition, or you can just guess that it's a $$\log$$ and move on. We'll do the latter.
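
As a quick sanity check, here is a minimal Python sketch showing that $$-\log p$$ behaves the way the three axioms ask. The function name `information_content` and the example probabilities are made up for illustration.

```python
import math

def information_content(p: float, base: float = 2.0) -> float:
    """Shannon information -log(p) of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p, base)

# Axiom 1: a certain outcome carries no information.
certain = information_content(1.0)

# Axiom 2: rarer outcomes carry more information.
rare, common = information_content(0.01), information_content(0.9)

# Axiom 3: information from independent events adds up.
p_x, p_y = 1 / 6, 1 / 6  # two fair dice rolls
joint = information_content(p_x * p_y)
separate = information_content(p_x) + information_content(p_y)

print(certain == 0, rare > common, math.isclose(joint, separate))
```

With `base=2.0` the answers are in bits; any other base only rescales them by a constant.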

Entropy

Entropy is the flashy term that comes up in everything from chemistry to .zip files to the fundamental fact that we're all going to die. It is often introduced as something like "[mumble mumble] a measure of information [mumble mumble]".

It is important to distinguish between information and entropy. Information is a function of an outcome (of a random variable), for example the outcome of an experiment. Entropy is a function of a random variable, for example an experiment before you see the outcome. Specifically,

The entropy $$H(X)$$ is the expected information gain from a random variable $$X$$: $$$ H(X) = \underset{x_i \sim X}{\mathbb{E}}\Big[-\log P(X=x_i)\Big] \ = -\sum_i p_{x_i} \log p_{x_i} $$$ ($$\underset{x_i \sim X}{\mathbb{E}}$$ means the expected value when value $$x_i$$ is drawn from the distribution of RV $$X$$. $$P(X=x_i)$$, alternatively denoted $$p_{x_i}$$ when $$X$$ is clear from context, is the probability of $$X$$ taking value $$x_i$$.)

(Why is entropy denoted with an $$H$$? I don't know. Just be thankful it wasn't a random Greek letter.)

Imagine you're guessing a number between 0 and 15 inclusive, and the current state of your beliefs is that it is equally likely to be any of these numbers. You ask "is the number 9?". If the answer is yes, you've gained $$-\log_2 \frac{1}{16} = \log_2 16 = 4$$ bits of information. If the answer is no, you've gained $$-\log_2 \frac{15}{16} = \log_2 16 - \log_2 15 \approx 0.093$$ bits of information. The probability of the first outcome is 1/16 and the probability of the second is 15/16, so the entropy is $$\frac{1}{16} \times 4 + \frac{15}{16} \times 0.093 = 0.337$$ bits.

In contrast, if you ask "is the number smaller than 8?", you always get $$-\log_2 \frac{8}{16} = \log_2{2} = 1$$ bit of information, and therefore the entropy of the question is 1 bit.
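
The two questions can be compared numerically. This is a small sketch, with `entropy` as a helper name chosen here, that takes the probabilities of the possible answers and returns the expected information in bits.

```python
import math

def entropy(probs, base=2.0):
    """Expected information content, skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# "Is the number 9?": yes with probability 1/16, no with probability 15/16.
h_is_nine = entropy([1 / 16, 15 / 16])

# "Is the number smaller than 8?": yes and no are equally likely.
h_halving = entropy([1 / 2, 1 / 2])

print(f"{h_is_nine:.3f} bits vs {h_halving:.3f} bits")
```

The halving question is worth roughly three times as much per question asked, which is why binary search beats guessing numbers one at a time.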

Since entropy is expected information gain, whenever you prepare a random variable for the purpose of getting information by observing its value, you want to maximise its entropy.

The closer a probability distribution is to a uniform distribution, the higher its entropy. The maximum entropy of a distribution with $$n$$ possible outcomes is the entropy of the uniform distribution $$U_n$$, which is $$$ H(U_n) = -\sum_i p_{u_i} \log p_{u_i} = -\sum_i \frac{1}{n} \log \frac{1}{n} \ = -\log \frac{1}{n} = \log n $$$ (This can be proved easily once we introduce some additional concepts.)

A general and very helpful principle to remember is that RVs with uniform distributions are most informative.

The above definition of entropy is sometimes called Shannon entropy, to distinguish it from the older but weaker concept of entropy in physics.

Entropy in physics

The physicists' definition of entropy is a constant times the logarithm of the number of possible states that correspond to the observable macroscopic characteristics of a thermodynamic system: $$$ S=k_B \ln W $$$ where $$k_B$$ is the Boltzmann constant, $$\ln$$ is used instead of $$\log_2$$ because physics, and $$W$$ is the number of microstates. (Why do physicists denote entropy with the letter $$S$$? I don't know. Just be glad it wasn't a random Hebrew letter.)

In plain language: it is proportional to the Shannon entropy of finding out the exact configuration of bouncing atoms in the hot/cold/whatever box you're looking at, out of all the ways the atoms could be bouncing inside that box given that the box is hot/cold/whatever, assuming that all those ways are equally likely. It is less general than the information theoretic entropy in the sense that it assumes a uniform distribution.

Entropy, either the Shannon or the physics version, seems abstract; random variables, numbers of microstates, what? However, $$S$$ as defined above has very real physical consequences. There's an important thermodynamics equation relating a change in entropy $$\delta S$$, a change in heat energy $$\delta Q$$, and temperature $$T$$ for a reversible process with the equation $$T\delta S = \delta Q$$, which sets a lower bound on how much energy you need to discover information (i.e., reduce the number of microstates that might be behind the macrostate you observe). Getting one bit of information means that $$\delta S$$ is $$k_B \ln 2$$ (from the definition of $$S$$), so at temperature $$T$$ kelvins we need $$k_B T \ln 2 \approx 9.6 \times 10^{-24} \times T$$ joules. This prevents arbitrarily efficient computers, and saves us from problems like Maxwell's demon. (Maxwell's demon is a thought experiment in physics: couldn't you violate the principle of increasing entropy (a physics thing) by building a box with a wall cutting it in half with a "demon" (some device) that lets slow particles pass left-to-right only and fast particles right-to-left, thus separating particles by temperature and reducing the number of microstates corresponding to the configuration of atoms inside the box? No, because the demon needs to expend energy to get information.)
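
To get a feel for the size of this bound, here is a minimal sketch that evaluates $$k_B T \ln 2$$. The function name and the choice of 300 K as "room temperature" are assumptions made for the example; the value of $$k_B$$ is the exact SI one.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of k_B

def min_energy_per_bit(temperature_kelvin: float) -> float:
    """Lower bound k_B * T * ln(2), in joules, for gaining one bit."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)

room_temperature = 300.0  # kelvins, an assumed example value
print(f"{min_energy_per_bit(room_temperature):.3e} J per bit")
```

The result is on the order of $$10^{-21}$$ joules per bit at room temperature.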

Finally, is there an information-theoretic analogue of the second law of thermodynamics, which states that the entropy of a system always increases? You have to make some assumptions, but you can get to something like it, which I will sketch out in very rough detail and without explaining the terms (see Chapter 4 of Elements of Information Theory for the details). Imagine you have a probability distribution on the state space of a Markov chain. Now it is possible to prove that given any two such probability distributions, the distance between them (as measured using relative entropy; see below) is non-increasing. Now assume it also happens to be the case that the stationary distribution of the Markov chain is uniform (the stationary distribution is the probability distribution over states such that if every state sends out its probability mass according to the transition probabilities, you get back to the same distribution). We can consider an arbitrary probability distribution over the states, and compare it to the unchanging uniform one, and use the result that the distance between them is non-increasing to deduce that an arbitrary probability distribution will tend towards the uniform (= maximal entropy) one.

Reportedly, von Neumann (a polymath whose name appears in any mid-1900s mathsy thing) advised Shannon thus:

"You should call [your concept] entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage."

Intuition

We've snuck in the assumption that all information comes in the form of:

You first have some quantitative uncertainty over a known set of possible outcomes, which you specify in terms of a random variable $$X$$.
You find out the value that $$X$$ has taken.

There's a clear random variable if you're pulling numbers out of a hat: the possible values of $$X$$ are the numbers written on the pieces of paper in the hat, and they all have equal probability. But where is the random variable when the piece of information you get is, say, the definition of information? (I don't mean here the literal characters on the screen - that's a more boring question - but instead the knowledge about information theory that is now (hopefully) in your brain). The answer would have to be something like "the random variable representing all possible definitions of information" (with a probability distribution that is, for example, skewed towards definitions that include a $$\log$$ somewhere because you remember seeing that before).

This is a bit tricky to think about, but we see that even in this kind of weird case you can specify some kind of set and probabilities over that set. Fundamentally, knowledge (or its lack) is about having a probability distribution over states. Perfect knowledge means you have probability $$1.00$$ on exactly one state of how something could be. If you're very uncertain, you have a huge probability distribution over an unimaginably large set of states (for example, all possible concepts that might be a definition of information). If you've literally seen nothing, then you're forced to rely on some guess for the prior distribution over states, like all those pesky Bayesian statisticians keep saying.

More quantities

Conditional entropy

Entropy is a function of the probability distribution of a random variable. We want to be able to calculate the entropies of the random variables we encounter.

A common combination of random variables we see is $$X$$ given $$Y$$, written $$X | Y$$. The definition is $$$ P(X = x \, |\, Y = y) = \frac{P(X = x \,\land\, Y = y)}{P(Y=y)}. $$$ It is a common mistake to think that $$H(X|Y) = -\sum_i P(X = x_i | Y = y) \log P(X = x_i | Y = y)$$. What is it then? We start with $$$ H(X|Y) = -\underset{x \sim X|Y, y \sim Y}{\mathbb{E}} \big( \log P(X=x|Y=y) \big) $$$ from the definition of the entropy as the expectation of the Shannon information content, and then by algebra: $$$ H(X|Y) = -\underset{x \sim X|Y, y \sim Y}{\mathbb{E}} \big[ \log P(X=x|Y=y) \big]$$$ $$$ = -\sum_{y \in \mathcal{Y}} P(Y=y) \sum_{x \in \mathcal{X}} P(X=x | Y=y) \log P(X=x \,|\, Y = y)$$$ $$$ = -\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}}P(X=x\,\land\, Y = y) \log P(X=x \,|\, Y = y) $$$ where $$\mathcal{X}$$ and $$\mathcal{Y}$$ are simply the sets of possible values of $$X$$ and $$Y$$ respectively. In a trick beloved of bloggers everywhere tired of writing up equations as $$\LaTeX$$, the above is often abbreviated $$$ -\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}} p(x,y) \log p(x|y) $$$ where we use $$p$$ as a generic notation for "probability of whatever; random variables left implicit".

The conditional entropy of a random variable $$X$$ given another random variable $$Y$$ is written $$H(X|Y)$$ and defined as $$$ H(X|Y) = - \sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}} p(x,y) \log p(x|y) $$$ which is lazier notation for $$$ -\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}}P(X=x\,\land\, Y = y) \log P(X=x \,|\, Y = y) $$$ and also equal to $$$ -\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}} p(x,y) \log \frac{p(x, y)}{p(y)}. $$$ It is most definitely not equal to $$-\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}} p(x | y) \log p(x | y)$$.

Conditional entropy is a measure of how much information we expect to get from a random variable assuming we've already seen another one. If the RVs $$X$$ and $$Y$$ are independent, the answer is that $$H(X|Y) = H(X)$$. If the value of $$Y$$ implies a value of $$X$$ (e.g. "percentage of sales in the US" implies "percentage of sales outside the US"), then $$H(X|Y) = 0$$, since we can work out what $$X$$ is from seeing what $$Y$$ is.
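
The two extreme cases are easy to check numerically. Below is a rough sketch that computes $$H(X|Y)$$ straight from a joint probability table; the dictionary layout `(x, y) -> p(x, y)` and the two example distributions are choices made for this illustration.

```python
import math
from collections import defaultdict

def conditional_entropy(joint, base=2.0):
    """H(X|Y) from a dict mapping (x, y) -> p(x, y)."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p  # marginal p(y) = sum over x of p(x, y)
    return -sum(
        p * math.log(p / p_y[y], base)  # p(x, y) * log p(x|y)
        for (_, y), p in joint.items()
        if p > 0
    )

# Independent fair coins: seeing Y says nothing about X, so H(X|Y) = H(X) = 1 bit.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# X is fully determined by Y (here X = 1 - Y), so H(X|Y) = 0.
determined = {(1, 0): 0.5, (0, 1): 0.5}

print(conditional_entropy(independent), conditional_entropy(determined))
```

Note that the weights in the sum are the joint probabilities $$p(x,y)$$, not the conditional ones.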

Joint entropy

Now if $$H(X|Y)$$ is how much expected surprise there is left in $$X$$ after you've seen $$Y$$, then $$H(X|Y) + H(Y)$$ would sensibly be the total expected surprise in the combination of $$X$$ and $$Y$$. We write $$H(X,Y)$$ for this combination. If we do the algebra, we see that $$$ H(X,Y) = H(X|Y) + H(Y) $$$ $$$ = -\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}} p(x,y) \log \frac{p(x, y)}{p(y)} - \sum_{y \in \mathcal{Y}} p(y) \log p(y) $$$ $$$= -\left(\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}} p(x,y) \log p(x, y)\right) + \left( \sum_{y \in \mathcal{Y}, \,x\in \mathcal{X}} p(x,y) \log p(y)\right) -\left( \sum_{y \in \mathcal{Y}} p(y) \log p(y) \right) $$$ $$$= -\left(\sum_{y \in \mathcal{Y}, \, x \in \mathcal{X}} p(x,y) \log p(x, y)\right) = H(Z) $$$ if $$Z$$ is the random variable formed of the pair $$(X, Y)$$ drawn from the joint distribution over $$X$$ and $$Y$$. (The middle two terms cancel because summing $$p(x,y)$$ over $$x$$ gives $$p(y)$$.)
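
The identity $$H(X,Y) = H(X|Y) + H(Y)$$ can also be checked on a concrete table. This is only a sketch; the joint distribution below is an arbitrary made-up example.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y) over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal distribution of Y.
p_y = {}
for (_, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

h_joint = entropy(joint.values())  # H(X, Y): entropy of the pair
h_y = entropy(p_y.values())        # H(Y)
h_x_given_y = -sum(                # H(X|Y)
    p * math.log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0
)

print(math.isclose(h_joint, h_x_given_y + h_y))
```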

Kullback-Leibler divergence, AKA relative entropy

"Kullback-Leibler divergence" is a bit of a mouthful. It is also called KL divergence, KL distance, or relative entropy. Intuitively, it is a measure of the distance between two probability distributions. For probability distributions represented by functions $$p$$ and $$q$$ over the same set $$\mathcal{X}$$, it is defined as $$$ D(p\,||\,q) = \sum_{x \in \mathcal{X}} p(x) \log \left(\frac{p(x)}{q(x)}\right). $$$ It's not a very good distance function; the only property of a distance function it meets is that it's non-negative. It's not symmetric (i.e. $$D(p \,||\, q) \ne D(q \,||\, p)$$) as you can see from the definition (especially considering how it breaks when $$q(x) = 0$$ but not if $$p(x) = 0$$). However, it has a number of cool interpretations, including how many bits you expect to lose on average if you build a code assuming a probability distribution $$q$$ when it's actually $$p$$, and how many bits of information you get in a Bayesian update from distribution $$q$$ to distribution $$p$$. It is also a common loss function in machine learning. The first argument $$p$$ is generally some better or true model, and we want to know how far away $$q$$ is from it.

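The asymmetry is easiest to see on a small example. The following is a minimal sketch, with made-up distributions `p` and `q`, that follows the definition above and treats $$0 \log 0$$ as zero.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D(p || q) for two distributions given as equal-length lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if q_i == 0:
            return math.inf  # p puts mass where q says "impossible"
        total += p_i * math.log(p_i / q_i, base)
    return total

p = [0.5, 0.5, 0.0]
q = [0.9, 0.05, 0.05]

print(kl_divergence(p, q), kl_divergence(q, p))
```

Here $$D(p\,||\,q)$$ is finite, while $$D(q\,||\,p)$$ blows up because $$q$$ assigns probability to an outcome that $$p$$ rules out.
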
Why the uniform distribution maximises entropy

The KL divergence gives us a nice way of proving that the uniform distribution maximises entropy. Consider the KL divergence of an arbitrary probability distribution $$p$$ from the uniform probability distribution $$u$$: $$$ D(p \,||\, u ) = \sum_{x \in \mathcal{X}} p(x) \log \left(\frac{p(x)}{u(x)}\right) $$$ $$$= \sum_{x \in \mathcal{X}} \big( p(x) \log p(x)\big) - \sum_{x \in \mathcal{X}} \big(p(x) \log u(x) \big) $$$ $$$= -H(X) - \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{|\mathcal{X}|} $$$ $$$= -H(X) + \log |\mathcal{X}| $$$ $$$= H(U) - H(X) $$$ where $$\mathcal{X}$$ is the set of values over which $$p$$ and $$u$$ have non-zero values, $$X$$ is a random variable distributed according to $$p$$, and $$U$$ is a random variable distributed according to $$u$$ (i.e. uniformly). This is the same thing as $$$ H(X) = H(U) - D(p \,||\,u) $$$ which implies that we can write the entropy of a random variable as the entropy of a uniform random variable over a set of the same size, minus the KL distance between the distribution of $$X$$ and the distribution of the uniform random variable. Also, since the KL divergence is guaranteed to be non-negative, this implies that $$$ H(X) \leq H(U) $$$ and therefore that no random variable has higher entropy than the uniform random variable over the same number of outcomes.
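
As a numerical check, the sketch below confirms on one arbitrary example that the divergence from the uniform distribution equals $$\log |\mathcal{X}|$$ minus the entropy, so the entropy can never exceed $$\log |\mathcal{X}|$$. The helper names and the distribution `p` are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def kl_from_uniform(probs):
    """D(p || u) in bits, where u is uniform over len(probs) outcomes."""
    n = len(probs)
    # p / u = p / (1/n) = p * n
    return sum(p * math.log2(p * n) for p in probs if p > 0)

p = [0.7, 0.2, 0.1]  # arbitrary example distribution
h_uniform = math.log2(len(p))  # entropy of the uniform distribution

gap = kl_from_uniform(p)
print(math.isclose(gap, h_uniform - entropy(p)), entropy(p) <= h_uniform)
```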

Mutual information

Earlier, we saw that $$H(X, Y) = H(X|Y) + H(Y) = H(X) + H(Y|X)$$. As a picture:

There's an overlapping region, representing the information you get no matter which of $$X$$ or $$Y$$ you look at. We call this the mutual information, a refreshingly sensible name, and denote it $$I(X;Y)$$, somewhat less sensibly. One way to find it is $$$ I(X;Y) = H(X,Y) - H(X|Y) - H(Y|X)$$$ $$$= - \sum_{x,y} p(x,y) \log p(x,y) \,+\, \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(y)} \,+\, \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)}$$$ $$$= \sum_{x,y} p(x,y) \big( \log p(x,y) - \log p(x) - \log p(y) \big)$$$ $$$= \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}. $$$ Does this look familiar? Recall the definition $$$ D(p\,||\,q) = \sum_{x \in \mathcal{X}} p(x) \log \left(\frac{p(x)}{q(x)}\right). $$$ What we see is that $$$ I(X;Y) = D(p(x, y) \, || \, p(x) p(y)), $$$ or in other words that the mutual information between $$X$$ and $$Y$$ is the "distance" (as measured by KL divergence) between the probability distributions $$p(x,y)$$ - the joint distribution between $$X$$ and $$Y$$ - and $$p(x) p(y)$$, the joint distribution that $$X$$ and $$Y$$ would have if $$x$$ and $$y$$ were drawn independently.

If $$X$$ and $$Y$$ are independent, then these are the same distribution, and their KL divergence is 0.

If the value of $$Y$$ can be determined from the value of $$X$$, then the joint probability distribution of $$X$$ and $$Y$$ is a table where for every $$x$$, there is only one $$y$$ such that $$p(x,y) > 0$$ (otherwise, there would be a value $$x$$ such that there is uncertainty about $$Y$$). Let the function mapping an $$x$$ to the singular $$y$$ such that $$p(x,y) > 0$$ be $$f$$. Then $$$ I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$$$ $$$= \sum_y p(y) \sum_{x | f(x) = y} p(x|y) \log \frac{p(x, f(x))}{p(x)p(y)}. $$$ Now $$p(x, f(x)) = p(x)$$, because there is no $$y \ne f(x)$$ such that $$p(x, y) \ne 0$$. Therefore we get that the above is equal to $$$ \sum_y p(y) \sum_{x | f(x) = y} p(x|y) \log \frac{p(x)}{p(x)p(y)}\ = - \sum_y p(y) \sum_{x | f(x) = y} p(x|y) \log p(y), $$$ and since $$\log p(y)$$ does not depend on $$x$$, we can sum out the probability distribution to get $$$ -\sum_y p(y) \log p(y) = H(Y). $$$ In other words, if $$Y$$ can be determined from $$X$$, then the expected information that $$X$$ gives about $$Y$$ is the same as the expected information given by $$Y$$.
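
Both special cases can be reproduced with a few lines of Python. This is a sketch under the assumption that the joint distribution is given as a dictionary `(x, y) -> p(x, y)`; the example distributions, including the choice $$f(x) = x \bmod 2$$, are invented for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a dict mapping (x, y) -> p(x, y)."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    # KL divergence between p(x, y) and the product of marginals p(x) p(y).
    return sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Independent fair coins: the joint equals the product of the marginals.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Y = f(X) with f(x) = x mod 2 and X uniform on {0, 1, 2, 3}.
determined = {(x, x % 2): 0.25 for x in range(4)}

print(mutual_information(independent), mutual_information(determined))
```

In the second case $$Y$$ is a fair bit, so $$H(Y)$$ is 1 bit, and the mutual information comes out as 1 bit as well even though $$H(X)$$ is 2 bits.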

We can graphically represent the relations between $$H(X)$$, $$H(Y)$$, $$H(X|Y)$$, $$H(Y|X)$$, $$H(X,Y)$$, and $$I(X;Y)$$ like this:

Having this image in your head is the single most valuable thing you can do to improve your ability to follow information theoretic maths. Just to spell it out, here are some of the results you can read out from it: $$$H(X,Y) = H(X) + H(Y|X) $$$ $$$H(X,Y) = H(X|Y) + H(Y) $$$ $$$H(X,Y) = H(X|Y) + I(X;Y) + H(Y|X) $$$ $$$H(X,Y) = H(X) + H(Y) - I(X;Y) $$$ $$$H(X) = I(X;Y) + H(X|Y)$$$ This diagram is also sometimes drawn with Venn diagrams:

Data processing inequality

A Markov chain is a series of random variables such that the $$(n+1)$$th is only directly influenced by the $$n$$th. If $$X \to Y \to Z$$ is a Markov chain, it means that all effects $$X$$ has on $$Z$$ are through $$Y$$.

The data processing inequality states that if $$X \to Y \to Z$$ is a Markov chain, then $$$ I(X; Y) \geq I(X; Z). $$$ This should be pretty intuitive, since the mutual information $$I(X;Y)$$ between $$X$$ and $$Y$$, which have a direct causal link between them, shouldn't be lower than that between $$X$$ and the more-distant $$Z$$, which $$X$$ can only influence through $$Y$$.

A special case is the Markov chain $$X \to Y \to f(Y)$$, where $$X$$ is, say, what happened in an abandoned parking lot at 3am, $$Y$$ is the security camera footage, and $$f$$ is some image enhancing process (more generally: any deterministic function of the data $$Y$$). The data processing inequality tells us that $$$ I(X; Y) \geq I(X; f(Y)). $$$ In essence, this means that any function you try to apply to some data $$Y$$ you have about some event $$X$$ cannot increase the information about the event that is available. Any enhancing function can only make it easier to spot some information about the event that is already present in the data you have about it (and the function might very plausibly destroy some). If all you have are four pixels, no amount of image enhancement wizardry will let you figure out the perpetrator's eye colour.

The proof (for the general case of $$X \to Y \to Z$$) goes like this: consider $$I(X; Y,Z)$$ (that is, the mutual information between knowing $$X$$ and knowing both $$Y$$ and $$Z$$). Now consider the different values in Venn diagram form:

$$I(X; Y, Z)$$ corresponds to all areas within the circle representing $$X$$ that are also within at least one of the circles for $$Y$$ or $$Z$$. If we knew both $$Y$$ and $$Z$$, this "bite" is how much would be taken out of the uncertainty $$H(X)$$ of $$X$$.

We see that the red lined area is $$I(X; Y|Z)$$ (the information shared between $$X$$ and the part of $$Y$$ that remains unknown if you know $$Z$$), and likewise the green hatched area is $$I(X; Y; Z)$$ and the blue dotted area is $$I(X;Z|Y)$$. Since the red-lined and green-hatched areas together are $$I(X;Y)$$, and the green-hatched and blue-dotted areas together are $$I(X;Z)$$, we can write both $$$ I(X; \,Y,Z) = I(X;\,Y) + I(X;\,Z|Y)$$$ $$$I(X; \,Y,Z) = I(X;\,Z) + I(X;\,Y|Z) $$$ But hold on - $$I(X;Z|Y)=0$$ by the definition of a Markov chain, since no influence can pass from $$X$$ to $$Z$$ without going through $$Y$$, meaning that if we know everything about $$Y$$, nothing more we can learn about $$Z$$ will tell us anything more about $$X$$.

Since that term is zero, we have $$$ I(X; \; Y) = I(X; \; Z) + I(X; \, Y|Z) $$$ and since mutual information must be non-negative, this in turn implies $$$ I(X;Y) \geq I(X;Z). $$$
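
The inequality can be seen in action on a toy Markov chain. In this sketch, $$X$$ is a fair bit, $$Y$$ is $$X$$ with a 10% chance of being flipped, and $$Z$$ is $$Y$$ with a further 20% chance of being flipped; the noise levels and helper names are assumptions picked for the example.

```python
import math
from itertools import product

def mutual_information(joint):
    """Mutual information in bits from a dict mapping (a, b) -> p(a, b)."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )

def flip(a, b, eps):
    """P(output = b | input = a) for a channel that flips a bit with probability eps."""
    return eps if a != b else 1 - eps

eps_xy, eps_yz = 0.1, 0.2  # assumed noise levels for X -> Y and Y -> Z

joint_xy, joint_xz = {}, {}
for x, y, z in product((0, 1), repeat=3):
    # Markov chain: p(x, y, z) = p(x) p(y|x) p(z|y)
    p = 0.5 * flip(x, y, eps_xy) * flip(y, z, eps_yz)
    joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + p
    joint_xz[(x, z)] = joint_xz.get((x, z), 0.0) + p

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(f"I(X;Y) = {i_xy:.3f} bits, I(X;Z) = {i_xz:.3f} bits")
```

The extra noisy step can only lose information about $$X$$, so the second number is the smaller one.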

Two big things: source & channel coding

Much of information theory concerns itself with one of two goals.

Source coding is about data compression. It is about taking something that encodes some information, and trying to make it shorter without losing the information.

Channel coding is about error correction. It is about taking something that encodes some information, and making it longer to try to make sure the information can be recovered even if some errors creep in.

The basic model that information theory deals with is the following:

We have some random variable $$Z$$ - the contents of a text message, for example - which we encode under some coding scheme to get a message consisting of a sequence of symbols that we send over some channel - the internet, for example - and then hopefully recover the original message. The channel can be noiseless, meaning it transmits everything perfectly and can be removed from the diagram, or noisy, in which case there is a chance that for some $$i$$, the $$X_i$$ sent into the channel differs from the $$Y_i$$ you get out.

Source coding is about trying to minimise how many symbols you have to send, while channel coding is about trying to make sure that $$\hat{Z}$$, the estimate of the original message, really ends up being the original message $$Z$$.
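
To make the contrast concrete, here is a toy sketch of both directions. Neither method is discussed in this post: the channel code is a 3x repetition code with majority voting, picked only because it is about the simplest possible example, and the source code is Python's standard `zlib` compressor applied to a deliberately redundant message.

```python
import zlib

def repeat_encode(bits, n=3):
    """Channel coding: send every bit n times."""
    return [b for b in bits for _ in range(n)]

def repeat_decode(received, n=3):
    """Recover each bit by majority vote over its n copies."""
    return [
        int(sum(received[i:i + n]) > n // 2)
        for i in range(0, len(received), n)
    ]

message = [1, 0, 1, 1]
sent = repeat_encode(message)  # three times longer than the message
sent[4] ^= 1                   # the channel flips one symbol
recovered = repeat_decode(sent)

# Source coding: a redundant message shrinks without losing information.
text = b"abab" * 250
compressed = zlib.compress(text)

print(recovered == message, len(text), len(compressed))
```

The repetition code survives a single flipped symbol per group of three, at the price of tripling the message length; practical codes get much better trade-offs.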

A big result in information theory is that for the above model, it is possible to separate the source coding and the channel coding, while maintaining optimality. The problems are distinct; regardless of source coding method, we can use the same channel method and still do well, and vice versa. Thanks to this result, called the source-channel separation theorem, source and channel coding can be considered separately. Therefore, our model can look like this:

(We use $$X^n$$ to refer to a random variable representing a length-$$n$$ sequence of symbols)

Both source and channel coding consist of:

a central but tricky theorem giving theoretical bounds and motivating some definitions
a bunch of methods that people have invented for achieving something close to those theoretical bounds in practice

Next see the source coding post and the channel coding post.

Death is bad

2021-10-17T23:14:00.000+01:00

3.5k words (about 12 minutes)

Sometime in the future, we might have the technology to extend lifespans indefinitely and make people effectively immortal. When and how this might happen is a complicated question that I will not go into. Instead, I will take heed of Ian Malcolm in Jurassic Park, who complains that "your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should".

This is (in my opinion rather surprisingly) a controversial question.

The core of it is this: should people die?

Often the best way to approach a general question is to start by thinking about specific cases. Imagine a healthy ten-year-old child; should they die? The answer is clearly no. What about yourself, or your friends, or the last person you saw on the street? Wishing for death for yourself or others is almost universally a sign of a serious mental problem; acting on that desire even more so.

There are some exceptions. Death might be the best option for a sick and pained 90-year-old with no hope of future healthy days. It may well be (as I've seen credibly claimed in several places) that the focus on prolonging lifespan even in pained terminally ill people is excessive. "Prolong life, whatever the cost" is a silly point of view; maximising heartbeats isn't what we really care about.

However, now imagine a pained, dying, sick person who has a hope of surviving to live many healthy happy days – say a 40-year-old suffering from cancer. Should they die? No. You would hope that they get treatment, even if it's nauseating fatiguing painful chemotherapy for months on end. If there is no cure, you'd hope that scientists somewhere invent it. Even if it does not happen in time for that particular person, at least it will save others in the future, and eliminate one more horror of the world. It would be a great and celebrated human achievement.

What's the difference between the terminally ill 90-year-old and the 40-year-old with a curable cancer? The difference is technology. We have the technology to cure some cancers, but we don't have the technology to cure the many ageing-related diseases. If we did, then even if the treatment is expensive or difficult, we would hope – and consider it a moral necessity – for both of them to get it, and hope that they both go on living for many more years.

No one dies of time. You are a complex process running on the physical hardware of your brain, which is kept running by the machine that is the rest of your body. You die when that machine breaks. There is no poetic right time when you close your eyes and get claimed by time, there is only falling to one mechanical fault or another.

People (or conscious beings in general) matter, and their preferences should be taken seriously – this is the core of human morality. What is wrong in the world can be fixed – this is the guiding principle of civilisation since the Enlightenment.

So, should people die? Not if they don't want to, which (I assume) for most people means not if they have a remaining hope of happy, productive days.

Counterarguments

The idea that death is something to be defeated, like cancer, poverty, or smallpox, is not a common one. Perhaps there's some piece of the puzzle that is missing from the almost stupidly simple argument above?

One of the most common counterarguments is overpopulation (perhaps surprisingly; environmentalist concerns have clearly penetrated very deep into culture despite not being much of a thing before the 1970s). The argument goes like this: if we solve death, but people keep being born, there will be too many people on Earth, leading to environmental problems, and eventually low quality of life for everyone.

The object-level point (I will return to what I consider more important meta-level points later) is that demographic predictions have a tendency to be wrong, especially about the future (as the Danish (?) saying goes). Malthus figured out pre-industrial demographics just as they came to an end with the industrial revolution. In the 1960s, there were warnings of a population explosion, which fizzled out when it turned out that the demographic transition (falling birth rates as countries develop) is a thing. Right now the world population is expected to stabilise at less than 1.5x the current size, and many developed countries are dealing with problems caused by shrinking populations (which they strangely refuse to fix through immigration).

Another concern is the effect of having a lot of old people around. What about social progress – how would the development of women's rights have been realised if you had a bunch of 19th century misogynists walking around in their top hats? What sort of power imbalances and Gini coefficients would we reach if Franklin Delano Roosevelt could continue cycling through high-power government roles indefinitely, or Elon Musk had time to profit from the colonisation of Mars? What happens to science when it can no longer advance (as Max Planck said) one funeral at a time?

(There is even an argument that life extension technology is problematic because the rich will get it first. This is an entirely general and therefore entirely worthless argument, since it applies to all human progress: the rich got iPhones first – clearly smartphones are a problematic technology, etc., etc. If you're worried about only the rich having access to it for too long, the proper response is to subsidise its development so that the period when not everyone has access to it is as short as possible.)

These are valid concerns that will definitely test the abilities of legislators and voters in the post-death era. However, they can probably be overcome. I think people can be brought around surprisingly far on social and moral attitudes without killing anyone. Consider how pre-2000 almost anyone's opinions would have made them a near-pariah today; many of those people still exist and it would be hard to write them off as a total loss. Maybe some minority of immortal old people couldn't cope with all the Pride Parades – or whatever the future equivalent is – marching past their windows and they go off to start some place of their own with sufficient top hat density; then again, most countries have their own conservative backwater region already. If they start going for nukes, that's more of an issue, but not more so than Iran.

As for imbalances of power and wealth, it might require a few more taxes and other policies (the expansion of term limits to more jobs?), but given the strides that equalising policy-making has made it seems hard to argue there is a fundamental impossibility.

And what about all the advantages? A society of the undying might well be far more long-term oriented, mitigating one of the greatest human failures. After all, how often do people bemoan that 70-year-old oil executives just don't care because they won't be around to see the effects of climate change?

What about all the collective knowledge that is lost? Imagine if people in 2050 could hear World War II veterans reminding them of what war really is. Imagine if John von Neumann could have continued casually inventing fields of maths at a rate of about two per week instead of dying at age 53 (while absolutely terrified of his approaching death). Imagine if we could be sure to see George R. R. Martin finish A Song of Ice and Fire.

Also, concerns like overpopulation and Elon Musk's tax plan just seem small in comparison to the literal eradication of death.

Imagine proposing a miracle peace plan to the cabinets of the Allied countries in the midst of World War II. The plan would end the war, install liberal governments in the Axis powers, and no one even has to nuke a Japanese city. (If John von Neumann starts complaining about not getting to test his implosion bomb design, give him a list of unsolved maths problems to shut him up.) Now imagine that the reaction is somewhere between hesitance and resistance, together with comments like "where are we going to put all the soldiers we've trained?", "what about the effects on the public psyche of a random abrupt end without warning?", and "how will we make sure that the rich industrialists don't profit too much from all the suddenly unnecessary loans that they've been given?" At this point you might be justified in shouting: "this war is killing fifteen million people per year, we need to end it now".

The situation with death is similar, except it's over fifty million per year rather than fifteen. (See this chart for breakdown by cause – you'll see that while currently-preventable causes like infectious diseases kill millions, ageing-related ones like heart disease, cancer, and dementia are already the majority.)

Thought experiments

To make the question more concrete, we can try thought experiments. Imagine a world in which people don't die. Imagine visitors from that world coming to us. Would they go "ah yes, inevitable oblivion in less than a century, this is exactly the social policy we need, thanks – let us go run back home and implement it"? Or would they think of our world like we do of a disease-stricken third-world country, in dire need of humanitarian assistance and modern technology?

It's hard to get into the frame of mind of people who live in a society that doesn't hand out automatic death sentences to everyone at birth. Instead, to evaluate whether raising life expectancies to 200 makes sense even given the environmental impacts, we can ask whether a policy of killing people at age 50 to reduce population pressures would be even better than the current status quo – if both an increase and a decrease in life expectancies are bad, this is suspicious because it implies we're at the optimum by chance. Or, since the abstract question (death in general) is always harder than more concrete ones, imagine withholding a drug that manages heart problems in the elderly on overpopulation grounds.

You might argue that current life expectancies are optimal. This is a hard position to defend. It seems like a coincidence that the lifespan achievable with modern technology is exactly the "right" one. Also, neither you nor society should make that choice for other people. Perhaps some people get bored of life and readily step into coffins at age 80; many others want nothing more than to keep living. People should get what they want. Forcing everyone to conform to a certain lifespan is a specific case of forcing everyone to conform to a certain lifestyle; much moral progress in the past century has consisted of realising that this is bad.

I think it's also worth emphasising one common thread in the arguments against solving death: they are all arguments about societal effects. It is absolutely critical to make sure that your actions don't cause massive negative externalities, and that they also don't amount to defecting in prisoner's dilemma or the tragedy of the commons. However, it is also absolutely critical that people are happy and aren't forced to die, because people and their preferences/wellbeing are what matters. Society exists to serve the people who make it up, not the other way around. Some of the worst moral mistakes in history come from emphasising the collective, and identifying good and harm in terms of effects on an abstract collective (e.g. a nation or religion), rather than in terms of effects on the individuals that make it up. Saying that everyone has to die for some vague pro-social reason is the ultimate form of such cart-before-the-horse reasoning.

Why care about the death question?

There are several features that make the case against death, and people's reactions to it, particularly interesting.

Failure of generalisation

First: generalisation. I started this post using specific examples before trying to answer the more general question. I think the popularity of death is a good example of how bad humans are at generalising.

When someone you know dies, it is very clearly and obviously a horrible tragedy. The scariest thing that could happen to you is probably either your own death, the death of people you care about, or something that your brain associates with death (the common fears: heights, snakes, ... clowns?).

And yet, make the question more abstract – think not about a specific case (which you feel in your bones is a horrible tragedy that would never happen in a just world), but about the general question of whether people should die, and it's like a switch flips: a person who would do almost anything to save themselves or those they care about, who cares deeply about suffering and injustice in the world, is suddenly willing to consign five times the death toll of World War I to permanent oblivion every single year.

Stalin reportedly said that a single death is a tragedy, but a million is only a statistic. Stalin is wrong. A single death is a tragedy, and a million deaths is a million tragedies. Tragedies should be stopped.

People These Days

Second: today, we're pretty good at ignoring and hiding death. This wasn't always the case. If you were a medieval peasant, death was never too far away, whether in the form of famine or plague or Genghis Khan. Death was like an obnoxious dinner guest: not fun, but also just kind of present in some form or another whether you invited them or not, and so out of necessity involved in life and culture.

Today, unexpected death is much rarer. Child mortality globally has declined from over 40% (i.e. almost every family had lost a child) in 1800 to 4.5% in 2015, and below 0.5% in developed countries. Famines have gone from something everyone lives through to something that the developed world is free from. War and conflict have gone from common to uncommon. Many more diseases and accidents can be successfully treated. As a result of all these positive trends, death is less present in people's minds.

As I don't have my culture critic license yet, I won't try to make some fancy overarching points about how People These Days Just Don't Understand and how our Materialistic Culture fails to prepare people to deal with the Deep Questions and Confront Their Own Mortality. I will simply note that (a) death is bad, (b) we don't like thinking about bad things, and (c) sometimes not wanting to think about important things causes perverse situations.

Confronting problems

Why do people not want to think that death is bad? I think one central reason is that death seems inevitable. It's tough to accept bad things you can't influence, and much easier to try to ignore them. If at some point you have to confront it anyways, one of the most reassuring stories you can tell is that it has a point. Imagine if over two hundred thousand years, generation after generation of humans, totalling some one hundred billion lives, was born, grew up, developed a rich inner world, and then had that world destroyed forever by random failures, evolution's lack of care for what happens after you reproduce, and the occasional rampaging mammoth. Surely there must be some purpose for it, some reason why all that death is not just a tragedy? Perhaps we aren't "meant" to live long, whatever that means, or perhaps it's all for the common good, or that "death gives meaning to life". Far more comforting to think that than to acknowledge that a hundred billion human lives and counting really are gone forever because they were unlucky enough to be born before we eradicated smallpox, or invented vaccines, or discovered antibiotics, or figured out how to reverse ageing.

Assume death is inevitable. Should you still recognise the wrongness of it?

I think yes, at least if you care about big questions and doing good. I think it's important to be able to look at the world, spot what's wrong about it, and acknowledge that there are huge things that should be done but are very difficult to achieve.

In particular, it's important to avoid the narrative fallacy (Nassim Taleb's term for the human tendency to want to fit the world to a story). In a story, there's a start and an end and a lesson, and the dangers are typically just small enough to be defeated. Our universe has no writer, only physics, and physics doesn't care about hitting you with an unsolvable problem that will kill everyone you love. If you want to increase the justness of the world, recognising this fact is an important starting point.

Taxes

Is death inevitable? In considering this question, it's important once again to remember that death is not a singular magical thing. Your death happens when something breaks badly enough that your consciousness goes permanently offline.

Things, especially complex biological machines produced by evolution, can break in very tricky ways. But what can break can be fixed, and people who declare technological feats impossible have a bad track record. The problem might be very hard: maybe we have to wait until we have precision nano-bots that can individually repair the telomeres on each cell, or maybe there is no effective general solution to ageing and we face an endless grind of solving problem after problem to extend life/health expectancies from 120 to 130 to 140 and so forth. Then again, maybe someone leaves out a petri dish by accident in a lab and comes back the next day to the fountain of youth, or maybe by the end of the century no one is worrying about something as old-fashioned as biology.

There's also the possibility of stopgap solutions, like cryonics (preserving people close to death by vitrifying them and hoping that future technology can revive them). Cryonics is currently in a very primitive state – no large animals successfully having been put through it – but there's a research pathway of testing on increasingly complex organs and then increasingly large animals that might eventually lead to success if someone bothered to pour resources into it.

There is no guarantee of when this will happen. If civilisation is destroyed by an engineered pandemic or nuclear war before then, it will never happen.

Of course, in the very long run we face more fundamental problems, like the heat death of the universe. Literally infinite life is probably physically impossible; maybe this is reassuring.

Predictions and poems

I will make three predictions about the eventual abolition of death.

First, many people will resist it. They might see it as conflicting with their religious views or as exacerbating inequality, or just as something too new and weird or unnatural.

Second, when the possibility of extending their lifespan stops being an abstract topic and becomes a concrete option, most people will seize it for themselves and their families.

This is a common path for technologies. Lightning rods and vaccines were first seen by some as affronts to God's will, but eventually it turns out people like not burning to death and not dying of horrible diseases more than they like fancy theological arguments. Most likely future generations will discover that they like not ageing more than they like appreciating the meaning of life by definitely not having one past age 120.

Finally, future people (if they exist) will probably look back with horror on the time when everyone died against their will within about a century.

Edgar Allan Poe wrote a poem called "The Conqueror Worm", about angels crying as they watch a tragic play called "Man", whose (anti-)hero is a monstrous worm that symbolises death. If we completely ignore what Poe intended with this, we can misinterpret one line to come to a nice interpretation of our own. The poem declares that the angels are watching this play in the "lonesome latter years". Clearly this refers to a future post-scarcity, post-death utopia, and the angels are our wise immortal descendants reflecting on the bad old days, when people were "mere puppets [...] who come and go / at the bidding of vast formless things" like famine and war and plague and death. The "circle [of life] ever returneth in / To the self same spot [= the grave]", and so the "Phantom [of wisdom and fulfilled lives] [is] chased for evermore / By a crowd that seize it not".

Death is a very poetic topic, and other poems need less (mis)interpretation. Edna St. Vincent Millay's "Dirge Without Music" is particularly nice, while Dylan Thomas gives away the game in the title: "Do not go gentle into that good night".

Short reviews: biographies

2021-09-30T21:23:00.001+01:00

Books reviewed (all by Walter Isaacson):
The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race (2021)
Steve Jobs: The Exclusive Biography (2011)
Benjamin Franklin: An American Life (2004)

3.5k words (about 12 minutes)

Why read biographies? If you want stories of people and interesting characters, fiction is better. If you want general, big truths, then you're probably better off reading the many non-fiction books that are about abstract truths and far-ranging concepts rather than the particulars of a single person's life.

Consider, for a moment, designing an algorithm for a problem. The classic way to do this is to think hard about the problem, and then write down a specific series of steps that take you from inputs to (hopefully the correct) outputs. In contrast, the machine learning method is to use statistical methods on a long list of examples to make a model that (hopefully) approximates the mapping between inputs and outputs.
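To make the contrast concrete, here is a minimal sketch of both approaches on a toy problem (converting Celsius to Fahrenheit). The function names and the four example pairs are made up for illustration; the "learned" version is just an ordinary least-squares line fit, standing in for machine learning in general.

```python
# Method 1: think hard, then write down the explicit steps.
def to_fahrenheit_explicit(celsius):
    return celsius * 9 / 5 + 32


# Method 2: start from a list of (input, output) examples and fit a model.
# These example pairs are illustrative assumptions, not data from anywhere.
examples = [(-40.0, -40.0), (0.0, 32.0), (37.0, 98.6), (100.0, 212.0)]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to the given points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(examples)


def to_fahrenheit_learned(celsius):
    # The model never saw the formula, only the examples.
    return slope * celsius + intercept


print(to_fahrenheit_explicit(20.0), to_fahrenheit_learned(20.0))
```

Both functions should agree closely on new inputs, but only the first one tells you why.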

Reading explicit abstract arguments is like the first method. Like explicit algorithm design, it comes with some nice properties – it's very clear exactly how it generalises and when it's applicable – to the point where it's easy to scoff at the less explicit methods: "it's just a black box that our pile of statistics spits out" / "it's just anecdotes about someone's life".

However, much like machine learning methods can extract subtle lessons from a long list of examples, I think there is implicit knowledge contained in the long list of detail about someone's life that you find in a biography (at least if you read about people who did interesting things in their life – but then again, if there's a biography of someone ...). Once you've read the details of how CRISPR was invented, Apple jump-started, or compromises reached at the 1787 American Constitutional Convention, I think your model of how science, business, and politics work in the real world is improved in many subtle ways.

(Note that this argument also applies to reading history.)

And of course, since biographies deal strongly with character, there is an element of the novel-like thrill of watching things happen to people.

Walter Isaacson's biographies

I've read four of Walter Isaacson's biographies. Their subjects are Albert Einstein, Jennifer Doudna, Steve Jobs, and Benjamin Franklin.

The Einstein one I read years ago, and don't remember much detail about. It did earn a 6 out of 7 on my books spreadsheet though.

The Jennifer Doudna biography is the weakest. The main reason is that we don't get too much insight into Doudna herself or the way she carried out her scientific work, leaving Isaacson to spend many pages on other things: overviews of other players in the development of the gene-editing tool CRISPR that are more journalistic than biographical, and descriptions of the biology that are limited by Isaacson's lack of biological expertise (at least when compared to the best popular biology writing, like Richard Dawkins' in The Selfish Gene). Hand-wringing over James Watson's controversies takes up an alarming amount of space that is only partly justified by Watson's role as a childhood inspiration for Doudna. There's also a long section about the struggles behind the allocation of the CRISPR Nobel Prize (awarded in 2020) that is clearly balanced and thoroughly researched, but simply less interesting to me than similar segments in the Jobs or Franklin biographies, where the stakes are the fate of companies or nations, rather than who gets a shiny medal.

My guess is that these faults stem mainly from the more limited material Isaacson had access to. Albert Einstein and Benjamin Franklin are both among the most researched individuals in history. To the extent that Steve Jobs is behind, the interviews Isaacson personally conducted seem to have plugged the gap.

Doudna is still an inspiring person. She also has the enviable advantage of not being dead, and therefore may yet do even more and become the subject of further biographies. If you're interested in biotech, including the business side, or scientific careers that may one day win Nobel Prizes, the biography may well be worth reading.

Steve Jobs

A god-like experimenter who wants to figure out what traits make tech entrepreneurs succeed may proceed something like this: create a bunch of people with extreme strengths in some areas and extreme weaknesses in others, release them into the world to start companies, and see which extreme strengths can balance out which extreme weaknesses. Such an experiment might well create Steve Jobs.

Take one weakness: Jobs's emotional volatility and, for lack of a better word, general nastiness in some circumstances, including things from extremely harsh criticism of employees' work to horrible table manners at restaurants. This isn't unique to Jobs either: look at the Wikipedia pages for Bill Gates and Jeff Bezos, and you'll find that they brighten their subordinates' work days with such productive witticisms as "that's the stupidest thing I've ever heard" and "why are you ruining my life?" respectively.

Does this show that behaviour up to and including verbal abuse is a forgivable flaw, or even beneficial, in tech CEOs?

First, though verbal abuse is neither productive nor right, a culture of vigorous debate is a distinct thing with incredible benefits, and the idea that it serves only to hurt and marginalise is not just a misguided generalisation but sometimes diametrically wrong. The best example is Daniel Ellsberg recounting an anecdote from his early times at RAND Corporation in The Doomsday Machine (an unrelated book; my review here):

Rather than showing irritation or ignoring my comment [that he made at the first meeting], Herman Kahn, brilliant and enormously fat, sitting directly across the table from me, looked at me soberly and said, "You're absolutely wrong."

A warm glow spread through my body. This was the way my undergraduate fellows on the editorial board of the Harvard Crimson (mostly Jewish, like Herman and me) had spoken to each other; I hadn't experienced anything like it for six years. At King's College, Cambridge, or in the Society of Fellows, arguments didn't remotely take this gloves-off, take-no-prisoners form. I thought, "I've found a home."

Steve Jobs admittedly goes overboard with this. For example, people who worked with him had to learn that "this is shit" meant "that's interesting, could you elaborate and make the case for your idea further?". This is not just unnecessarily rude, but also unclear communication. The general impression that Isaacson gives is also not that Jobs was combative as a thought-out strategy, but rather that this was just his style of interaction.

I suspect that the famous combativeness of many tech CEOs is not itself a useful trait, but instead adjacent to several other traits that are, in particular disagreeableness (in the sense of willing to disagree with others and not feel pressure to conform) and perhaps also caring deeply about the product.

Consider another extreme Jobs trait: strange diets, and (in his youth), a belief that he didn't need to shower because of his dieting. This went so far that of the people Isaacson interviews about Jobs's youth, including those who hadn't seen him for decades, almost every one mentions something like "yeah, he stank". Yet while some leap to defend and (worse yet) emulate Jobs's verbal nastiness, presumably on grounds of its correlation with his success, far fewer do the same for his dieting and showering habits. (What conformists!)

I think the more general lesson is that Jobs was extreme in a lot of ways, including in the strength of his opinions and beliefs, and in not having a filter between them and his actions. He gets into eastern mysticism and goes off to India to become a monk. He gets into dieting and starts eating only fruit rather than just reading lifestyle magazines and half-heartedly trying diets for a week like most people might. He gets it into his head that the corner of a Mac isn't rounded enough and declares that in no uncertain terms.

So is that the key then: have firm convictions? We've gone from a maladaptive cliché to a trite one – and still not a very helpful one. Steve Jobs, with his "reality distortion field", may have been an expert at persuading people, but even he can't persuade reality to be another way. Even slightly wrong convictions tend to have nasty collisions with reality.

(It's worth noting that rather than being a stickler for one position or solution, Jobs tended to yo-yo back and forth between extremes, only slowly converging on a decision – something that often confused others at Apple until they learned to use a rolling average of his recent positions.)

The critical part, of course, was that Steve Jobs was right about a lot of things, despite several serious missteps (especially in regards to making over-expensive computers that no one wants to pay for). I think Jobs's success provides evidence that even in aesthetic matters, success has a surprisingly strong component of being actually right. And Jobs, who was all-around very bright despite not being a master of the technical side, seems to have mastered this.

Of course, the story of Jobs's success – which came in spite of his emotional volatility, and tendency to wish away problems rather than facing them – does not entirely fit the idea that success comes in large part from having well-calibrated beliefs about the world and going about achieving your goals in reasonable and rational ways.

I think there are three things worth keeping in mind.

First, it may well be that most successful people are successful "at random" (i.e. without having a rational strategy for achieving what they want to achieve), but that the probability of achieving your goals given that you have well-calibrated beliefs and a rational reality-accommodating plan is still very much higher than the probability of achieving them given any other strategy. That is, if $S$ is the event of being very successful (by some definition), $R$ the event that you follow a rational strategy and maintain well-calibrated beliefs and generally practice thought patterns that won't get you downvoted on LessWrong, and $\neg R$ the complement of that event, then $P(\neg R|S)$ can be high (i.e. most successful people became successful in not particularly smart ways), while $P(S|R)$ can be much higher than $P(S|\neg R)$ (following a rational strategy still gives you by far the best chances of success).
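To check that these two claims really can hold at once, here is a quick sketch using Bayes' theorem. All three input probabilities are made-up assumptions chosen only to illustrate the point; nothing here is an estimate of real-world rates.

```python
# Hypothetical inputs (assumptions for illustration only).
p_r = 0.01               # P(R): fraction of people following a rational strategy
p_s_given_r = 0.05       # P(S|R): chance of great success with such a strategy
p_s_given_not_r = 0.005  # P(S|not R): chance of great success without one

# Law of total probability: P(S) = P(S|R) P(R) + P(S|not R) P(not R).
p_not_r = 1 - p_r
p_s = p_s_given_r * p_r + p_s_given_not_r * p_not_r

# Bayes' theorem: P(not R|S) = P(S|not R) P(not R) / P(S).
p_not_r_given_s = p_s_given_not_r * p_not_r / p_s

print(f"Advantage of R: {p_s_given_r / p_s_given_not_r:.0f}x")
print(f"P(not R|S) = {p_not_r_given_s:.2f}")
```

With these numbers, the rational strategy makes success ten times more likely, yet roughly 91% of successful people did not follow it, simply because so few people follow it in the first place.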

Second, Jobs's life illustrates the principle that you only have to be very right a small number of times – just like in general most of the return, especially in anything risky, comes from a small number of bets. He failed at managing, even when working under another CEO who had been brought in specifically to babysit him, to the extent that he was kicked out of his own company. He failed to build successful hardware after founding NeXT. However, he was really right about product design, and that was enough.

Third, though he did get away with ignoring many uncomfortable truths by simply willing them away, eventually reality hit back. He delayed dealing with the cancer threat when he was first told of it, and he trusted alternative treatments. The combination may well have killed him.

Benjamin Franklin

Benjamin Franklin was a newspaper publisher, writer, postmaster, ambassador, political leader, and scientist. He invented the lightning rod and realised that electric charge came in both a positive and negative form (and gave those names to them, as temporary ones until "[English] philosophers give us better").

He was one of the first or most influential pioneers of many other things as well; to take a random example, he thought up the idea of matched funding for a charitable project (and was quite proud of it too: "I do not remember any of my political maneuvers the success of which gave me at the time more pleasure, or that in after thinking about it I more easily excused myself for having made use of cunning").

More generally, he clearly enjoyed numbers and detail:

[...H]e loved immersing himself in minutiae and trivia in a manner so obsessive that it might today be described as geeky. He was meticulous in describing every technical detail of his inventions, be it the library arm, stove, or lightning rod. In his essays, ranging from his arguments against hereditary honors to his discussions of trade, he provided reams of detailed calculations and historical footnotes. Even in his most humorous parodies, such as his proposal for the study of farts, the cleverness was enhanced by his inclusion of mock-serious facts, trivia, calculations, and learned precedents

Do-gooders with time machines could do worse than giving him access to a spreadsheet program.

One of the best descriptions of Franklin's personality comes from Isaacson's comparison of him with John Adams (when they were both in Paris, late in Franklin's life):

Adams was unbending and outspoken and argumentative, Franklin charming and taciturn and flirtatious. Adams was rigid in his personal morality and lifestyle, Franklin famously playful. Adams learned French by poring over grammar books and memorizing a collection of funeral orations; Franklin (who cared little about the grammar) learned the language by lounging on the pillows of his female friends and writing them amusing little tales. Adams felt comfortable confronting people, whereas Franklin preferred to seduce them, and the same was true of the way they dealt with nations.

One striking thing when reading about 18th century events is the informality and nepotism. For example, to become postmaster of the colonies, Franklin spent significant money on having a friend lobby on his behalf in London, and upon obtaining the position gave out cushy jobs to his son, brothers, brother's stepson, sister's son, and two of his wife's relatives.

Not only that, but the border between truth and fiction was also hazy in the press. Articles could be, without any differentiating label, either factual, obviously satirical, satirical in a way that takes a clever reader to spot, or outright hoaxes. Likewise Franklin often wrote and published letters to his own newspaper under pseudonyms, with various levels of disguise ranging from clearly transparent to purposefully anonymous (this, however, was normal, as it was often seen as unworthy of gentlemen to write such letters under their own names).

In other ways, the 18th century, and 18th century Franklin in particular, were surprisingly modern and liberal. Franklin took a very reasonable and liberal stance on the freedom of press:

“It is unreasonable to imagine that printers approve of everything they print. It is likewise unreasonable what some assert, That printers ought not to print anything but what they approve; since […] an end would thereby be put to free writing, and the world would afterwards have nothing to read but what happened to be the opinions of printers.”

He still exercised judgement over what he printed. When deciding whether to print something that violated his principles for money, he (reportedly) went through a process that many modern newspaper editors and Facebook engineers could well take to heart:

To determine whether I should publish it or not, I went home in the evening, purchased a twopenny loaf at the baker’s, and with the water from the pump made my supper; I then wrapped myself up in my great-coat, and laid down on the floor and slept till morning, when, on another loaf and a mug of water, I made my breakfast. From this regimen I feel no inconvenience whatever. Finding I can live in this manner, I have formed a determination never to prostitute my press to the purposes of corruption and abuse of this kind for the sake of gaining a more comfortable subsistence.

The 18th century offers some perspective about hostile politics too. After describing an extremely personal and angry election campaign (which Franklin lost), Isaacson writes:

Modern election campaigns are often criticized for being negative, and today’s press is slammed for being scurrilous. But the most brutal of modern attack ads pale in comparison to the barrage of pamphlets in the 1764 [Pennsylvania] Assembly election. Pennsylvania survived them, as did Franklin, and American democracy learned that it could thrive in an atmosphere of unrestrained, even intemperate, free expression. As the election of 1764 showed, American democracy was built on a foundation of unbridled free speech. In the centuries since then, the nations that have thrived have been those, like America, that are most comfortable with the cacophony, and even occasional messiness, that comes from robust discourse.

Isaacson points out that Franklin's popularity has come and gone, and explains this by making him the symbol of one side of a cultural and political dichotomy: tolerance and compromise rather than dogmatism and crusading, pragmatism rather than romanticism, social mobility rather than class and hierarchy, and secular material success over religious salvation. Thus, while immensely popular in the latter part of his life and after his death, once the Romantic Era got underway, he became seen as shallow, thrifty, and lacking in passion. For example, Franklin appears in Herman Melville's novel Israel Potter, a work that sounds like the most confusing Harry Potter fan-fiction of all time, as a precursor to today's shallow self-help gurus.

A perfect example of the type of cunning that made some people call him shallow comes from his time as a frontier commander. To get soldiers to attend worship services, he had the chaplain give out the daily rum rations right after the service. "Never were prayers more generally and punctually attended", Franklin proudly wrote.

Or: at the signing of the Declaration of Independence, John Hancock solemnly declared "There must be no pulling different ways; we must all hang together". Franklin reportedly responded, with a wit but not solemnity worthy of the historic occasion: "Yes, we must, indeed, all hang together, or most assuredly we shall all hang separately".

This oscillation between romantically-minded eras finding him shallow and business-minded eras finding him the godfather of all self-help gurus and thrifty entrepreneurs has continued to this day. It is true that his aphorism collections, as documented in his famous Poor Richard's Almanack, are more clever than insightful; that he was no moral philosopher; and that his virtue-cultivating efforts were often patchy. However, they are part of a crucial process: the separation of morality from theology during the Enlightenment, which "Franklin was [the] avatar" of. Franklin's foundational personal maxim, which he often repeated, is perhaps the single sentence that pre-modern religious countries most need to hear: “The most acceptable service to God is doing good to man”.

The romanticists' criticisms are based on truths. Though he was sociable, founding and participating in many societies, his personal relationships tended to be intellectual but distant. Interestingly, despite his vast achievements, Franklin does not show signs of a deep unyielding inner ambition; he seems to have been driven by vague instincts to be useful, a sense of pride (which he tried to dull throughout his life), curiosity, and a delight in tinkering, planning, and organising. To his sister in 1771 he wrote "[...] I am much disposed to like the world as I find it, and to doubt my own judgment as to what would mend it" – a remarkable sentiment from the pen of someone who, not many years later, would be playing a key role in a revolution. And though even past the age of 75 he achieved a few minor things, like being instrumental in securing France's alliance to America, signing the peace treaty between the US and Britain, shaping the US Constitution, and being the head of Pennsylvania's government, he happily whiled away many of his latter days playing cards with only the occasional twinge of guilt. He specifically justified this in part based on a belief in the afterlife: "You know the soul is immortal; why then should you be such a niggard of a little time, when you have a whole eternity before you?"

However, even these traits seem to have made him exactly what America needed. He was a skilled diplomat in France partly because of his easy-going nature and lack of naked ambition. At the Constitutional Convention of 1787, he often hosted the (much younger) other leading revolutionaries at his house to talk about things in a less formal setting and soften their stances, and generally advocated tolerance and compromise. Isaacson cleverly summarises:

Compromisers may not make great heroes, but they do make democracies.

Perhaps the best known summary of Franklin's life is Turgot's epigram that "he snatched lightning from the sky and the sceptre from tyrants". Franklin himself had a go at this: he wrote an autobiography – then a rare form of book – and also proposed a cheeky epitaph for himself, including an exhortation to wait for a "new and more elegant edition [of him], revised and corrected by the Author".

He didn't just summarise himself, though. He also unwittingly wrote perhaps the pithiest summary of the spirit of the entire Enlightenment project, and consequently of the driving spirit of human progress since then. It was in a letter Franklin wrote to his wife, after narrowly escaping a shipwreck on the English coast in 1757:

Were I a Roman Catholic, perhaps I should on this occasion vow to build a chapel to some saint; but as I am not, if I were to vow at all, it should be to build a lighthouse.

Lambda calculus

2021-04-25T21:52:00.002+01:00

7.8k words, including equations (about 30 minutes)

This post has also been published here.

This post is about lambda calculus. The goal is not to do maths with it, but rather to build up definitions within it until we can express non-trivial algorithms easily. At the end we will see a lambda calculus interpreter written in the lambda calculus, and realise that we're most of the way to Lisp.

But first, why care about lambda calculus? Consider four different systems:

A Turing machine – that is, a machine that:
- works on an infinite tape of cells from which a finite set of symbols can be read and written, and always points at one of these cells;
- has some set of states it can be in, some of which are termed "accepting" and one of which is the starting state; and
- given a combination of current state and current symbol on the tape, always does an action consisting of three things:
  - writes some symbol on the tape (possibly the same that was already there),
  - transitions to some state (possibly the same it is already in), and
  - moves one cell left or right on the tape.
The lambda calculus ( $\lambda$ -calculus), a formal system that has expressions that are built out of an infinite set of variable names using $\lambda$ -terms (which can be thought of as anonymous functions) and applications (analogous to function application), and a few simple rules for shuffling around the symbols in these expressions.
The partial recursive functions, constructed by function composition, primitive recursion (think bounded for-loops), and minimisation (returning the first value for which a function is zero) on three basic sets of functions:
- the zero functions, that take some number of arguments and return 0;
- a successor function that takes a number and returns that number plus 1; and
- the projection functions, defined for all natural numbers $a$ and $b$ such that $a \geq b$ as taking in $a$ arguments and returning the $b$ th one.
Lisp, a human-friendly axiomatisation of computation that accidentally became an extremely good and long-lived programming language.

The big result in theoretical computer science is that these can all do the same thing, in the sense that if you can express a calculation in one, you can express it in any other.

This is not an obvious thing. For example, the only thing lambda calculus lets you do is create terms consisting of symbols, single-argument anonymous functions, and applications of terms to each other (we'll look at the specifics soon). It's an extremely simple and basic thing. Yet no matter how hard you try, you can't make something that can compute more things, whether it's by inventing programming languages or building fancy computers.

Also, if you try to make something that does some sort of calculation (like a new programming language), then unless you keep it stupidly simple and/or take great care, it will be able to compute anything (at least in la-la-theory-land, where memory is infinite and you don't have to worry about practical details, like whether the computation finishes before the sun goes nova).

Physicists search for their theory of everything. The computer scientists already have many, even though they've been at it for a lot less time than the physicists have: everything computable can be reduced to one of the many formalisms of computation. (One of the main reasons that we can talk about "computability" as a sensible universal concept is that any reasonable model makes the same things computable; the threshold is easy to hit and impossible to exceed, so computable versus not is an obvious thing to pay attention to.)

To talk about the theory of computation properly, we need to look at at least one of those models. The most well-known is the Turing machine. Turing machines have several points in their favour:

They are the easiest to imagine as a physical machine.
They have clear and separate notions of time (steps taken in execution) and space (length of tape used).
They were invented by Alan Turing, who contributed to breaking the Enigma code during World War II, before being unjustly persecuted for being gay and tragically dying of cyanide poisoning at age 41.

In contrast, compare the lambda calculus:

It is an abstract formal system arising out of a failed attempt to axiomatise logic.
There are many execution paths for a non-trivial expression.
It was invented by Alonzo Church, who lived a boringly successful life as a maths professor at Princeton, had three children, and died at age 92.

(Turing and Church worked together from 1936 to 1938, Church as Turing's doctoral advisor, after they independently proved that the Entscheidungsproblem is undecidable (in Turing's case via what we now call the halting problem). At the same time and also working at Princeton were Albert Einstein, Kurt Gödel, and John von Neumann (who, if he had had his way, would've hired Turing and kept him from returning to the UK).)

However, the lambda calculus also has advantages. Its less mechanistic and more mathematical view of computation is arguably more elegant, and it has fewer moving parts: instead of states, symbols, and a tape, the current state is just a term, and the term also represents the algorithm. It abstracts more nicely – we will see how we can, bit by bit, abstract out elements and get something that is a sensible programming language, a project that would be messier and longer with Turing machines.

Turing machines and lambda calculus are the foundations of imperative and functional programming respectively, and the situation between these two programming paradigms mirrors that between TMs and $\lambda$ -calculus: one is more mechanistic, more popular, and more useful when dealing with (stateful) hardware; the other more mathematical, less popular, and neater for abstraction-building.

Lambda trees

Now let's define exactly what a lambda calculus term is.

We have an infinite set of variables $x_1, x_2, x_3, ...$ , though for simplicity we will use any lowercase letter to refer to them. Any variable is a valid term. Note that variables are just symbols – despite the word "variable", there is no value bound to them.

We have two rules for building new terms:

$\lambda$ -terms are formed from a variable $x$ and a term $M$ , and are written $(\lambda x. M)$ .
Applications are formed from two terms $M$ and $N$ , and are written $(M N)$ .
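As a rough cross-check of the two rules above, here is a minimal Python sketch of terms as trees. The class names `Var`, `Lam`, and `App` and the helper `show` are illustrative choices, not part of the lambda calculus or of any library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str            # a variable is just a symbol


@dataclass(frozen=True)
class Lam:
    param: str           # the variable x in (λx. M)
    body: "Term"         # the term M


@dataclass(frozen=True)
class App:
    func: "Term"         # the term M in (M N)
    arg: "Term"          # the term N in (M N)


Term = Union[Var, Lam, App]


def show(t: Term) -> str:
    """Render a term in fully parenthesised standard notation."""
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Lam):
        return f"(λ{t.param}. {show(t.body)})"
    return f"({show(t.func)} {show(t.arg)})"


identity = Lam("x", Var("x"))
example = App(Lam("f", App(Var("f"), Var("y"))), identity)
print(show(identity))
print(show(example))
```

Each object is one node of the tree: a `Lam` node has its argument and body as children, and an `App` node has the two terms being applied.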

These terms, like most things, are trees. I will mostly ignore the convention of writing out horrible long strings of $\lambda$ s and variables, only partly mitigated by parenthesis-reducing rules, and instead draw the trees.

(When it appears in this post, the standard notation appears slightly more horrible than usual because, for simplicity, I neglect the parenthesis-reducing rules (they can be confusing at first).)

Here are a few examples of terms, together with standard representations:

This representation makes it clear that we're dealing with a tree where nodes are either variables, lambda terms where the left child is the argument and the right child is the body, or applications. (I've circled the variables to make clear that the argument variable in a $\lambda$ -term has a different role than a variable appearing elsewhere.)

It's not quite right to say that a $\lambda$ -term is a function; instead, think of $\lambda$ -terms as one representation of a (mathematical) function, when combined with the reduction rule we will look at soon.

If we interpret the above terms as representations of functions, we might rewrite them (in Pythonic pseudocode) as, from left to right:

lambda x -> x (i.e., the identity function) (lambda is a common keyword for an anonymous function in programming languages, for obvious reasons).
(lambda f -> f(y))(lambda x -> x) (apply a function that takes a function and calls that function on y to the identity function as an argument).
x(y)

Reduction

Execution in lambda calculus is driven by something that is called $\beta$ -reduction, presumably because Greek letters are cool. The basic idea of $\beta$ -reduction is this:

Pick an application (which I've represented by orange circles in the tree diagrams).
Check that the left child of the application node is a $\lambda$ -term (if not, you have to reduce it to a $\lambda$ -term before you can make that application).
Replace the variable in the left child of the $\lambda$ -term with the right child of the application node wherever it appears in the right child of the $\lambda$ -term, and then replace the application node with the right child of the $\lambda$ -term.

In illustrated form, on the middle example above, using both tree diagrams and the usual notation:

(The notation $M[N/x]$ means substitute the term $N$ for the variable $x$ in the term $M$ ; the general rule for $\beta$ -reduction is that given $((\lambda x. M) N)$ , you can replace it with $M[N/x]$ , subject to some details that we will mostly skip over shortly.)

In our example, we end up with another application term, so we can reduce it further:

In our Pythonic pseudocode, we might represent this as an execution trace like the following:

(lambda f -> f(y))(lambda x -> x)

-->

(lambda x -> x)(y)

-->

y

Reduction is not always so simple, even if there's only a single choice of what to reduce. You have to be careful if the same variable appears in different roles, and rename if necessary. The core rule is that within the tree rooted at a $\lambda$ -term that takes an argument $x$ , the variable $x$ always means whatever was given to that $\lambda$ -term, and never anything else. An $x$ bound in one $\lambda$ -term is distinct from an $x$ bound in another $\lambda$ -term.

The simplest way to get around problems is to make your first variable $x_1$ and, whenever you need a new one, call it $x_i$ where $i$ is one more than the maximum index of any existing variable. Unfortunately humans aren't good at remembering the difference between $x_9$ and $x_{17}$ , and humans like conventions (like using $x$ for generic variables, $f$ for things that will be $\lambda$ -terms, and so forth). Therefore we sometimes have to think about name collisions.

The principle that lets us out of name collision problems is that you can rename variables as you want (as long as distinct variables aren't renamed to the same thing). The name for this is $\alpha$ -equivalence (more Greek letters!); for example $(\lambda x .x)$ and $(\lambda y. y)$ are $\alpha$ -equivalent.

There are, of course, detailed rules for how to deal with name collisions when doing $\beta$ -reductions, but you should be fine if you think about how variable scoping should sensibly work to preserve meaning (something you've already had to reason about if you've ever programmed). (A helpful concept to keep in mind is the difference between free variables and bound variables – starting from a variable and following the path up the tree to the root node, does it run through a $\lambda$ -node with that variable as an argument?)
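One way to make the renaming rule concrete is a capture-avoiding substitution. The sketch below is a hedged illustration, not the definitive formulation: the term classes, `free_vars`, `substitute`, and `beta` are names made up here, and fresh names are built by appending `_0`, `_1`, ... on the assumption that no existing variable already uses such a name.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    func: object
    arg: object


_fresh = count()  # source of suffixes for renamed variables (assumed unused)


def free_vars(t):
    """Names that occur in t without an enclosing lambda binding them."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.func) | free_vars(t.arg)


def substitute(m, x, n):
    """Return M[N/x]: m with n put in place of every free occurrence of x."""
    if isinstance(m, Var):
        return n if m.name == x else m
    if isinstance(m, App):
        return App(substitute(m.func, x, n), substitute(m.arg, x, n))
    if m.param == x:
        return m  # x is rebound by this lambda, so nothing below is free
    if m.param in free_vars(n):
        # Name collision: alpha-rename the bound variable before substituting.
        new = f"{m.param}_{next(_fresh)}"
        renamed_body = substitute(m.body, m.param, Var(new))
        return Lam(new, substitute(renamed_body, x, n))
    return Lam(m.param, substitute(m.body, x, n))


def beta(t):
    """Reduce an application whose left child is a lambda term."""
    assert isinstance(t, App) and isinstance(t.func, Lam)
    return substitute(t.func.body, t.func.param, t.arg)


# No collision: ((λf. (f y)) (λx. x)) -> ((λx. x) y) -> y
start = App(Lam("f", App(Var("f"), Var("y"))), Lam("x", Var("x")))
once = beta(start)
twice = beta(once)

# Collision: ((λy. (λx. (y x))) x). The free x must not be captured.
clash = beta(App(Lam("y", Lam("x", App(Var("y"), Var("x")))), Var("x")))
print(twice)
print(clash)
```

Without the renaming branch, the last example would come out as a term that applies its argument to itself, which changes the meaning of the free `x`.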

An example of a name collision problem is this:

We can't do this because the $x$ in the innermost $\lambda$ -term on the left must mean whatever was passed to it, and the $y$ whatever was passed to the outer $\lambda$ -term. However, our reduction leaves us with an expression that applies its argument to itself. We can solve this by renaming the $x$ within the inner $\lambda$ -term:

The general way to think of lambda calculus terms is that they are partitioned in two ways into equivalence classes:

The first, rather trivial, set of equivalence classes is treating all $\alpha$ -equivalent terms as the same thing. "Equivalent" and $\alpha$ -equivalent are usually the same thing when we're talking about the lambda calculus; it's the "structure" of a term that matters, not the variable names.
The second set of equivalence classes is treating everything that can be $\beta$ -reduced into the same form as equivalent. This is less trivial – in fact, it's undecidable in the general case (as we will see in the post about computation theory).

That's it

Yes, really, that's all you need. There exists a lambda calculus term that beats you in chess.

You might ask: but hold on a moment, we have no data – no numbers, no pairs, no lists, no strings – how can we input chess positions into a term or get anything sensible as an answer? We will see later that it's possible to encode data as lambda terms. The chess-playing term would accept some massive mess of $\lambda$ -terms encoding the board configuration as an input, and after a lot of reductions it would become a term encoding the move to make – eventually checkmate, against you.

Before we start abstracting out data and more complex functions, let's make some simple syntax changes and look at some basic facts about reduction.

Some syntax simplifications

The pure lambda calculus does not have $\lambda$ -terms that take more than one argument. This is often inconvenient. However, there's a simple mapping between multi-argument $\lambda$ -terms and single-argument ones: instead of a two-argument function, say, just have a function that takes in an argument and returns a one-argument function that takes in an argument and returns a result using both arguments.

(In programming language terms, this is currying.)

In the standard notation, $(\lambda x.(\lambda y. M))$ is often written $(\lambda xy.M)$ . Likewise, we can do similar simplifications on our trees, remembering that this is a syntactic/visual difference, rather than introducing something new to the lambda calculus:

Once we've done this change, the next natural simplification to make is to allow one application node to apply many arguments to a $\lambda$ -term with "many arguments" (remember that it actually stands for a bunch of nested normal single-argument $\lambda$ -terms):

(The corresponding simplification in the standard syntax is that $(M \, A \, B\, C)$ means $(((M \, A)\, B)\, C)$ . In a standard programming language, this might be written M(A)(B)(C); that is, applying M to A to get a function that you apply to B, yielding another function that you apply to C. Sanity check: what's the difference between $((M \, A) \, B)$ and $(M \, (A \, B))$ ?)
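A small Python sketch of the same idea, assuming ordinary integers as stand-in data. The names `add3` and `add3_curried` are made up for this illustration.

```python
def add3(a, b, c):
    """An ordinary three-argument function."""
    return a + b + c


# The curried form: three nested one-argument functions.
add3_curried = lambda a: lambda b: lambda c: a + b + c

# M(A)(B)(C): each call consumes one argument and returns the next function.
partial = add3_curried(1)        # still waiting for b and c
print(add3_curried(1)(2)(3))
print(partial(2)(3))
```

Both calling styles compute the same result; the curried version just makes each intermediate function explicit.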

Some facts about reduction

$\beta$ -normal forms

A $\beta$ -normal form can be thought of as a "fully evaluated" term. More specifically, it is one where this configuration of nodes does not appear in the tree (after multi-argument $\lambda$ s and applications have been compiled into single-argument ones), where $M$ and $N$ are arbitrary terms:

Intuitively, if such a term does appear, then the reduction rules allow us to reduce the application (replacing this part of the tree with whatever you get when you substitute $N$ in place of $x$ within $M$ ), so our term is not fully reduced yet.

Terms without a $\beta$ -normal form

Does every term have a $\beta$ -normal form? If you've seen computation theory stuff before, you should be able to answer this immediately without considering anything about the lambda calculus itself.

The answer is no, because reducing to a $\beta$ -normal form is the lambda calculus equivalent of an algorithm halting. Lambda calculus has the same expressive power as Turing machines or any other model of computation, and some algorithms run forever, so there must exist lambda calculus terms that you can keep reducing without ever getting a $\beta$ -normal form.

Here's one example, often called $\Omega$ :

Note that even though we use the same variable $x$ in both branches, the variable means a different thing: in the left branch it's whatever is passed as an input to the left $\lambda$ -term – one reduction step onwards, that $x$ stands for the entire right branch, which has its own $x$ . In fact, before we start reducing, we will do an $\alpha$ -conversion on the right branch (a pretentious way of saying that we will rename the bound variable).

Now watch:

After one reduction step, we end up with the same term (as usual, we are treating $\alpha$ -equivalent terms as equivalent; the variable could be $x$ or $y$ or $å$ for all we care).
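The same behaviour can be seen in Python, with the caveat that Python evaluates eagerly and has a finite call stack, so "reducing forever" shows up as a `RecursionError`. The names `half` and `reduce_omega` are illustrative.

```python
half = lambda x: x(x)  # (λx. (x x)), one branch of Ω


def reduce_omega():
    """Try to evaluate Ω = (half half) and report whether it ever finished."""
    try:
        half(half)
    except RecursionError:
        return "no normal form reached"
    return "finished"


print(reduce_omega())
```

Each call to `half(half)` immediately produces another call to `half(half)`, mirroring the reduction step that returns the same term.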

Ambiguities with reduction

Does it matter how we reduce, or does every reduction path eventually lead to a $\beta$ -normal form, assuming that one exists in the first place? If you haven't seen this before, you might want to have a go at this before reading on.

Here's one example of a tricky term:

Imagine that $M$ has a $\beta$ -normal form, and $\Omega$ is as defined above and therefore can be reduced forever. If we start by reducing the application node, in a moment $\Omega$ and all its loopiness gets thrown away, and we're left with just $M$ , since the $\lambda$ -term takes two arguments and returns the first. However, if we start by reducing $\Omega$ , or are following a strategy like "evaluate the arguments before the application", we will at some point reduce $\Omega$ and get thrown in for a loop.

We can take a broader view here. In any programming language – I will use Lisp notation because it's the closest to lambda calculus – if we have a function like (define func (lambda (x y) [FUNCTION BODY])), and a function call like (func arg1 arg2) , the evaluator has a choice of what it does. The simplest strategies are to either:

Evaluate the arguments – arg1 and arg2 – first, and then inside the function func have x and y bound to the results of evaluating arg1 and arg2 respectively. This is called call-by-value, and is used by most programming languages.
Bind x and y inside func to be the unevaluated values of arg1 and arg2, and evaluate arg1 and arg2 only upon encountering them in the process of evaluating func. This is called call-by-name. It's rare to see it in programming languages (an exception being that it's possible with Lisp macros), but functional languages like Haskell often have a variant, call-by-need or "lazy evaluation", where the values of arg1 and arg2 are only executed when needed, but once executed the results are memoized so that the execution only needs to happen once.

Call-by-value reduces what you can express. Imagine trying to define your own if-function in a language with call-by-value:

(define IF
  (lambda (predicate consequent alternative)
    (if predicate
        consequent    ; if predicate is true, do this
        alternative))) ; if predicate is false, do this instead

(Note that IF is the new if-function that we're trying to define, and if is assumed to be a language primitive.)

Now consider:

(define factorial
  (lambda (n)
    (IF (= n 0)
        1
        (* n
           (factorial (- n 1))))))

You call (factorial 1), and for the first call the program evaluates the arguments to IF:

(= 1 0)
1
(* 1 (factorial 0))

The last one needs the value of (factorial 0), so we evaluate the arguments to the IF in the recursive call:

(= 0 0)
1
(* 0 (factorial -1))

... and so on. We can't define IF as a function, because in call-by-value the alternative gets evaluated as part of the function call even if predicate is true.
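Python is also call-by-value, so the same problem can be reproduced there. In this sketch, `IF`, `IF_lazy`, `factorial_eager`, and `factorial` are hypothetical names; the workaround shown (wrapping each branch in a zero-argument lambda) is one common way to delay evaluation by hand, not something taken from the Lisp example.

```python
def IF(predicate, consequent, alternative):
    """A user-defined 'if': both branches are evaluated before the call."""
    return consequent if predicate else alternative


def factorial_eager(n):
    # The third argument is computed before IF runs, so recursion never stops.
    return IF(n == 0, 1, n * factorial_eager(n - 1))


def IF_lazy(predicate, consequent, alternative):
    """Branches are passed as zero-argument functions and called on demand."""
    return consequent() if predicate else alternative()


def factorial(n):
    return IF_lazy(n == 0, lambda: 1, lambda: n * factorial(n - 1))


try:
    factorial_eager(1)
    eager_result = "finished"
except RecursionError:
    eager_result = "infinite recursion"

print(eager_result)
print(factorial(5))
```

Only the branch that is actually selected gets called in the second version, which is the behaviour call-by-name would give for free.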

(Most languages solve this by giving you a bunch of primitives and making you stick with them, perhaps with some fiddly mini-language for macros built in (consider C/C++). In Lisp, you can easily write macros that use all of the language features, and therefore extend the language by essentially defining your own primitives that can escape call-by-value or any other potentially limiting language feature.)

It's the same issue with our term $((\lambda xy.x) \, M \, \Omega)$ above: call-by-value goes into a silly loop because one of the arguments isn't even "meant to" be evaluated (from our perspective as humans with goals looking at the formal system from the outside).

Lambda calculus does not impose a reduction/"evaluation" order, so we can do what we like. However, this still leaves us with a problem: how do we know if our algorithm has gone into an infinite loop, or we just reduced terms in the wrong order?

Normal order reduction

It turns out that always doing the equivalent of call-by-name – reducing the leftmost, outermost term first – saves the day. If a $\beta$ -normal form exists, this strategy will lead you to it.

Intuitively, this is because with call-by-name, there is no "unnecessary" reduction. If some arguments in some call are never used (like in our example), they never reduce. If we start reducing an expression while doing leftmost/outermost-first reduction, that reduction must be standing in the way between us and a successful reduction to $\beta$ -normal form.

Formally: ... the proof is left as an exercise for the reader.
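Short of a proof, the strategy can at least be tried out. Below is a hedged sketch of a leftmost-outermost reducer in Python; the term classes and the functions `step` and `normalise` are illustrative names, fresh variables are generated with `_0`, `_1`, ... suffixes on the assumption that these are unused, and the step limit is an arbitrary cut-off because non-termination cannot be detected in general.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    func: object
    arg: object


_fresh = count()


def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.func) | free_vars(t.arg)


def substitute(m, x, n):
    """Capture-avoiding substitution M[N/x]."""
    if isinstance(m, Var):
        return n if m.name == x else m
    if isinstance(m, App):
        return App(substitute(m.func, x, n), substitute(m.arg, x, n))
    if m.param == x:
        return m
    if m.param in free_vars(n):
        new = f"{m.param}_{next(_fresh)}"
        return Lam(new, substitute(substitute(m.body, m.param, Var(new)), x, n))
    return Lam(m.param, substitute(m.body, x, n))


def step(t):
    """One leftmost-outermost step, or None if t is in beta-normal form."""
    if isinstance(t, App):
        if isinstance(t.func, Lam):          # outermost reducible application
            return substitute(t.func.body, t.func.param, t.arg)
        f = step(t.func)                     # otherwise look left first
        if f is not None:
            return App(f, t.arg)
        a = step(t.arg)                      # and only then at the argument
        return None if a is None else App(t.func, a)
    if isinstance(t, Lam):
        b = step(t.body)
        return None if b is None else Lam(t.param, b)
    return None


def normalise(t, max_steps=1000):
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no normal form found within max_steps")


half = Lam("x", App(Var("x"), Var("x")))
omega = App(half, half)
take_first = Lam("x", Lam("y", Var("x")))
term = App(App(take_first, Var("m")), omega)   # ((λxy.x) m Ω)
print(normalise(term))
```

Here the free variable `m` stands in for the term $M$. Because the argument $\Omega$ is never used, it is discarded before any step ever tries to reduce it.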

Church-Rosser theorem

The Church-Rosser theorem is the thing that guarantees we can talk about unique $\beta$ -normal forms for a term. It says that:

Letting $\Lambda$ be the set of terms in the lambda calculus, $\rightarrow_\beta$ the $\beta$ -reduction relation, and $\twoheadrightarrow_\beta$ its reflexive transitive closure (i.e. $M \twoheadrightarrow_\beta N$ iff $N$ can be reached from $M$ in zero or more $\beta$ -reduction steps: either $M$ and $N$ are the same term, or there exist terms $P_1$ , ..., $P_n$ with $n \geq 0$ such that $M \rightarrow_\beta P_1 \rightarrow_\beta ... \rightarrow_\beta P_n \rightarrow_\beta N$ ), then:

For all $M \in \Lambda$ , $M \twoheadrightarrow_\beta A$ and $M \twoheadrightarrow_\beta B$ implies that there exists $X \in \Lambda$ such that $A \twoheadrightarrow_\beta X$ and $B \twoheadrightarrow_\beta X$ .

Visually, if we have reduction chains like the black part, then the blue part must exist (a property known as confluence or the "diamond property"):

Therefore, even if there are many reduction paths, and even if some of them are non-terminating, for any two different starting $\beta$ -reductions we can make, we will not lose the existence of a reduction path to any $X$ . If $X$ is some $\beta$ -normal form reachable from $M$ , we know that any other reduction path that reaches a $\beta$ -normal form must have reached $X$ .

The fun begins

Now we will start making definitions within the lambda calculus. These definitions do not add any capabilities to the lambda calculus, but are simply conveniences to save us having to draw huge trees repeatedly when we get to doing more complex things.

There are two big ideas to keep in mind:

There are no data primitives in the lambda calculus (even the variables are just placeholders for terms to get substituted into, and don't even have consistent names – remember that we work within $\alpha$ -equivalence). As a result, the general idea is that you encode "data" as actions: the number 4 is represented by a function that takes a function and an input and applies the function to the input 4 times, a list might be encoded by a description of how to iterate over it, and so on.
There are no types. Nothing in the lambda calculus will stop you from passing a number to a function that expects a function, or vice versa. There exist typed lambda calculi, but they prevent you from doing some of the cool things with combinators that we'll see later in this post.

Pairs

We want to be able to associate two things into a pair, and then extract the first and second elements. In other words, we want things that work like this:

(fst (pair a b)) == a
(snd (pair a b)) == b

The simplest solution starts like this:

Now we can get the first of a pair by doing ((pair x y) first). If we want the exact semantics above, we can define simple helpers like

fst = (lambda p
        (p first))

(i.e. $\text{fst} = (\lambda p. (p \, \text{first}))$ ), and

snd = (lambda p
        (p second))

since now (snd (pair x y)) reduces to ((pair x y) second) reduces to y.
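For readers who want to poke at this, here is a Python transcription using nested one-argument lambdas. It assumes the usual Church-style encoding, in which `pair` stores two values and hands them to whatever selector it is given; the strings `"a"` and `"b"` are stand-in data.

```python
# pair = (λx y f. (f x y)): remember x and y, wait for a selector f
pair = lambda x: lambda y: lambda f: f(x)(y)

# The two selectors
first = lambda x: lambda y: x     # (λx y. x)
second = lambda x: lambda y: y    # (λx y. y)

# Helpers giving the (fst (pair a b)) == a style of interface
fst = lambda p: p(first)
snd = lambda p: p(second)

p = pair("a")("b")
print(p(first), p(second))
print(fst(p), snd(p))
```

The pair is nothing more than a closure that remembers its two components until a selector asks for one of them.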

Lists

A list can be constructed from pairs: [1, 2, 3] will be represented by (pair 1 (pair 2 (pair 3 False))) (we will define False later). If $l_1$ , $l_2$ , and $l_3$ are the list items, a three-element list looks like this:

We might also represent the same list like this instead:

This second representation makes it trivial to define things like a reduce function: ([1, 2, 3] 0 +) would return 0 plus the sum of the list [1, 2, 3], if [1, 2, 3] is represented as above. However, this representation would also make it harder to do other list operations, like getting all but the first element of a list, whereas our pair-based lists can do this trivially ((snd l) gets you all but the first element of the list l).
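The trade-off between the two representations can be sketched in Python. This is an illustration under assumptions: Python integers stand in for list items, `false` is taken to be the two-argument selector that returns its second argument, and the fold-style list is written in one plausible shape (taking a starting value and then a curried binary operation); the exact shape used by the second representation may differ.

```python
pair = lambda x: lambda y: lambda f: f(x)(y)
first = lambda x: lambda y: x
second = lambda x: lambda y: y
fst = lambda p: p(first)
snd = lambda p: p(second)
false = second  # assumed end-of-list marker

# Representation 1: nested pairs, (pair 1 (pair 2 (pair 3 False)))
pair_list = pair(1)(pair(2)(pair(3)(false)))

# Representation 2: the list *is* its own reduce function
fold_list = lambda init: lambda op: op(1)(op(2)(op(3)(init)))

plus = lambda a: lambda b: a + b   # curried stand-in for +

print(fold_list(0)(plus))          # reducing is trivial here
print(fst(snd(pair_list)))         # taking the rest is trivial here
```

Getting the tail of `fold_list`, or summing `pair_list`, takes noticeably more machinery than the operation each representation is built around.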

Numbers & arithmetic

Here is how the numbers work (using a system called Church numerals):

Since giving a function $f$ to a number $n$ (also a function) gives a function that applies $f$ to its input $n$ times, a lot of things are very convenient. Say you have this function to add one, which we'll call succ (for "successor"):

(Considering the above definition of numbers: why does it work?)

Now what is (42 succ)? It's a function that takes an argument and adds 42 to it. More generally, ((n succ) m) gives you m+n. However, there's also a more straightforward way to represent addition, which you can figure out from noticing that all we have to do to add m to n is to compose the "apply f" operation m more times to n, something we can do simply by calling (m f) on n, once we've "standardised" n to have the same f and x as in the $\lambda$ -term that represents m (that is why we have the (n f x) application, rather than just n):
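These definitions can be checked in Python. The sketch assumes the standard Church numeral definitions (0 is $(\lambda f x. x)$ , and succ wraps one more application of f around its argument); `church` and `to_int` are helper names invented here to convert to and from ordinary integers.

```python
zero = lambda f: lambda x: x                       # apply f zero times
succ = lambda n: lambda f: lambda x: f(n(f)(x))    # one more application of f

# add = (λm n f x. ((m f) (n f x))): apply f m more times on top of n
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))


def church(k):
    """Build the Church numeral for a non-negative Python int."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n):
    """Count how many times a numeral applies its function."""
    return n(lambda k: k + 1)(0)


add_42 = church(42)(succ)                  # a function that adds 42
print(to_int(add_42(church(3))))           # ((n succ) m) with n=42, m=3
print(to_int(add(church(2))(church(3))))
```

Both routes give the same sums; the second avoids going through `succ` by composing the applications of `f` directly.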

Now, want multiplication? One way is to see that we can define (mult m n) as ((n (adder m)) 0), assuming that (adder m) returns a function that adds m to its input. As we saw, that can be done with (m succ), so:

(mult m n) =
((n (m succ))
 0)

There's a more standard way too:

The idea here is simply that (n f) gives a $\lambda$ -term that takes an input and applies f to it $n$ times, and when we call m with that as its first argument, we get something that does the $n$ -fold application $m$ times, for a total of $mn$ times, and now all that remains is to pass the x to it.

A particularly neat thing is that exponentiation can be this simple:

Why? I'll let the trees talk. First, using the definition of n as a Church numeral (which I will underline in the trees below), and doing one $\beta$ -reduction, we have:

This does not look promising – a number needs to have two arguments, but we have a $\lambda$ -term taking in one. However, we'll soon see that the x in the tree on the right actually turns out to be the first argument, f, in the finished number. In fact, we'll make that renaming right away (since we're working under $\alpha$ -equivalence), and continue reducing (below we've taken the bottom-most m and expanded it into its Church numeral definition):

At this point, the picture gets clearer: the next thing we'd reduce is the lambda term at the bottom applied to m, but that's just going to do the lambda term (which applies f $m$ times) $m$ more times. We'll have done 2 steps, and gotten up to $m^2$ nestings of f. By the time we've done the remaining $n-1$ steps, we'll have the representation of $m^n$ ; the $n-1$ more applications between our bottom-most and topmost lambda term will reduce away, while the stack of applications of f increases by a factor of $m$ each time.
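A quick numeric check of the multiplication and exponentiation definitions, again as a Python sketch. It assumes the standard Church numerals, and takes exponentiation to be "apply n to m", which is what the argument above describes; `church` and `to_int` are illustrative helpers.

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))


def church(k):
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n):
    return n(lambda k: k + 1)(0)


# (mult m n) = ((n (m succ)) 0): add m to zero, n times over
mult_via_add = lambda m: lambda n: n(m(succ))(zero)

# The more standard form: do the n-fold application of f, m times
mult = lambda m: lambda n: lambda f: lambda x: m(n(f))(x)

# Exponentiation m^n: hand m to n as the function to repeat
power = lambda m: lambda n: n(m)

two, three = church(2), church(3)
print(to_int(mult_via_add(two)(three)))
print(to_int(mult(two)(three)))
print(to_int(power(two)(three)))
```

In `power`, each extra application of `m` multiplies the number of nested calls to `f` by m, matching the factor-of-$m$ growth traced in the reduction above.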

What about subtraction? It's a bit complicated. Okay, how about just subtraction by one, also known as the pred (predecessor) function? Also tricky (and a good puzzle if you want to think about it). Here's one way:

Church numerals make it easy to add, but not subtract. So instead, here's what we do. First (box 1), we make a pair like [0 0]. Next (polygon 2), we have a function that takes a pair p=[a b] and creates a new pair [b (succ b)], where succ is the successor function (one plus its input). Repeated application of this function on the pair in box 1 looks like this: [0 0], [0 1], [1 2], [2 3], and so on. Thus we see that if we start from [0 0] and apply the function in polygon 2 $n$ times (box 3), the first element of the pair is (the Church numeral for) $n-1$ , and the second element is $n$ , and we can simply call fst to get that first element.

As we saw before, we can define subtraction as repeated application of pred:

(minus m n) =
((n pred) m)
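The pair-shifting trick can be run as a Python sketch. It assumes the standard Church encodings of pairs and numerals; `shift`, `church`, and `to_int` are names introduced for this illustration. Note that with these definitions subtraction bottoms out at zero rather than going negative.

```python
pair = lambda x: lambda y: lambda f: f(x)(y)
fst = lambda p: p(lambda x: lambda y: x)
snd = lambda p: p(lambda x: lambda y: y)

zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))


def church(k):
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n):
    return n(lambda k: k + 1)(0)


# [a b] -> [b (succ b)]
shift = lambda p: pair(snd(p))(succ(snd(p)))

# Start from [0 0], shift n times, keep the first element
pred = lambda n: fst(n(shift)(pair(zero)(zero)))

# (minus m n) = ((n pred) m)
minus = lambda m: lambda n: n(pred)(m)

print(to_int(pred(church(5))))
print(to_int(minus(church(5))(church(2))))
```

Applying `shift` five times to `[0 0]` passes through `[0 1]`, `[1 2]`, `[2 3]`, `[3 4]`, and ends at `[4 5]`, whose first element is the predecessor.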

There's an alternative to Church numerals that's found in the more general Scott encoding. The advantages of Church vs Scott numerals, and their relative structures, are similar to the relative merits and structures of the two types of lists we discussed: one makes many operations natural by exploiting the fact that everything is a function, but also makes "throwing off a piece" (taking the rest/snd of a list, or subtracting one from a number) much harder.

Booleans, if, & equality

You might have noticed that we've defined second as $(\lambda x y. y)$ , and 0 as $(\lambda f x. x)$ . These two terms are a variable-renaming away from each other, so they are $\alpha$ -equivalent. In other words, second and 0 are the same thing. Because we don't have types, which is which depends only on our interpretation of the context it appears in.

Now let's define a True and False. Now False is kind of like 0, so let's just say they're also the same thing. The opposite of $(\lambda x y. y)$ is $(\lambda x y. x)$ , so let's define that to be True.

What sort of muddle have we landed ourselves in now? Quite a good one, actually. Let's define (if p c a) to be (p c a). If the predicate p is True, we select the consequent c, because (True c a) is exactly the same as (first c a), which is clearly c. Likewise, if p is False, then we evaluate the same thing as (second c a) and end up with the alternative a.

We will also want to test whether a number is 0/False (equality in general is hard in the lambda calculus, so what we end up with won't be guaranteed to work with things that aren't numbers). A simple way is:

eq0 =
(lambda x
  (x (lambda y
       False)
     True))

If x is 0, it's the same as second and will act as a conditional and pick out True. If it's not zero, we assume that it's some number $n$ , and therefore will be a function that applies its first argument $n$ times. Applying $(\lambda y.\text{False})$ any non-zero amount of times to anything will return False.
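Here is a Python sketch of these definitions. The trailing underscores only avoid clashing with Python keywords, and `church` and `to_bool` are helper names invented for the illustration. Since Python evaluates arguments eagerly, `if_` here merely selects between two already-computed values.

```python
true = lambda x: lambda y: x      # same term as first
false = lambda x: lambda y: y     # same term as second, and as 0
zero = false
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# (if p c a) = (p c a)
if_ = lambda p: lambda c: lambda a: p(c)(a)

# eq0 = (λx. (x (λy. False) True))
eq0 = lambda n: n(lambda y: false)(true)


def church(k):
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def to_bool(b):
    """Convert a Church boolean to a Python bool."""
    return b(True)(False)


print(if_(true)("consequent")("alternative"))
print(if_(false)("consequent")("alternative"))
print(to_bool(eq0(zero)), to_bool(eq0(church(3))))
```

For a non-zero numeral, the constant function is applied at least once, so `True` is thrown away and `False` comes out.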

Fixed points, combinators, and recursion

The big thing missing from the definitions we've put on top of the lambda calculus so far is recursion. Every lambda term represents an anonymous function, so there's no name within a $\lambda$ -term that we can "call" to recurse.

Rather than jumping in straight to recursion, we're going to start with Russell's paradox: does the set of all sets that do not contain themselves contain itself? Phrased mathematically: what the hell is $R = \{x \,|\,x\notin x\}$ ?

In computation theory, sets are often specified by a characteristic function: a function that is always defined if the set is computable, and returns true if an element is in the set and false otherwise.

In the lambda calculus (which was originally supposed to be a foundation for logic), here's a characteristic function for the Russell set $R$ :

$$R = (\lambda x. (\text{not} \, (x \, x)))$$

(where not can be straightforwardly defined on top of our existing definitions as (not b) = (b False True)).

This $\lambda$ -term takes in an element x, assumes that x is the (characteristic function for) the set itself, and asks: is it the case that x is not in the set? Call this term R, and consider (R R): the left R is working as the (characteristic function of) the set, and the right R as the element whose membership of the set we are testing.

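Evaluating:

$$(R \, R) = ((\lambda x. (\text{not} \, (x \, x))) \, R) \rightarrow_\beta (\text{not} \, (R \, R))$$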

So we start out saying (R R), and in one $\beta$ -reduction step we end up saying (not (R R)) (just as, with Russell's paradox, it first seems that the set must contain itself, because the set is not in itself, but once we've added the set to itself then suddenly it shouldn't be in itself anymore). One more step and we get, from (R R), (not (not (R R))). This is not ideal as a foundation for logic.
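We can watch this unfolding happen in a sketch, assuming R is the term $(\lambda x. (\text{not} \, (x \, x)))$ . Python evaluates arguments eagerly, so asking for (R R) keeps demanding another (R R) until the interpreter gives up with a RecursionError. That is a Python artefact rather than anything in the lambda calculus, but it shows the same lack of a normal form.

```python
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
NOT = lambda b: b(FALSE)(TRUE)

# R = (lambda x (not (x x)))
R = lambda x: NOT(x(x))

try:
    R(R)  # (R R) -> (not (R R)) -> (not (not (R R))) -> ...
    outcome = "terminated"
except RecursionError:
    outcome = "no normal form reached"

print(outcome)  # no normal form reached
```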

However, you might realise something: the not here doesn't play any role. We can replace it with any arbitrary f. In fact, let's do that, and create a simple wrapper $\lambda$ -term around it that lets us pass in any f we want:

$$Y = (\lambda f. ((\lambda x. (f \, (x \, x))) \, (\lambda x. (f \, (x \, x)))))$$

Now let's look at the properties that $Y$ has:

$$(Y \, f) \rightarrow_\beta (f \, (Y \, f)) \rightarrow_\beta (f \, (f \, (Y \, f))) \rightarrow_\beta \ldots$$

$Y$ is called the Y combinator ("combinator" is a generic term for a lambda calculus term with no free variables). It is part of the general class of fixed-point combinators: combinators $X$ such that $(X \, f) = (f \, (X\,f))$ . (Turing invented another one: $\Theta = (A \, A)$ , where $A$ is defined as $(\lambda x y. (y \,(x\, x\, y)))$ .)

A fixed-point combinator gives us recursion. Imagine we've almost written a recursive function, say for a factorial, except we've left a free function parameter for the recursive call:

(lambda f x
  (if (eq0 x)
      1
      (mult x
            (f (pred x)))))

(Also, take a moment to appreciate that we can already do everything necessary except for the recursion with our earlier definitions.)

Call the previous recursion-free factorial term F, and consider reducing ((Y F) 2) (where -BETA-> stands for one or more $\beta$ -reductions):

((Y F)
 2)

-BETA->

((F (Y F))
 2)

-BETA->

((lambda x
   (if (eq0 x)
       1
       (mult x
             ((Y F) (pred x)))))
 2)

-BETA->

(if (eq0 2)
    1
    (mult 2
          ((Y F) (pred 2))))

-BETA->

(mult 2
      ((Y F)
       1))

-BETA->

(mult 2
      ((F (Y F))
       1))

-BETA->

(mult 2
      ((lambda x
         (if (eq0 x)
             1
             (mult x
                   ((Y F) (pred x)))))
       1))

-BETA->
...
-BETA->

(mult 2
      (mult 1
            1))

-BETA->

2

It works! Get a fixed-point combinator, and recursion is solved.
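Here is a runnable sketch of the same idea in Python, with two substitutions that should be flagged. First, Python is eagerly evaluated, so the plain $Y$ would unfold forever before f is ever called; the sketch uses the closely related Z combinator instead, which wraps the self-application in an extra lambda. Second, it uses native Python integers rather than Church numerals to keep things readable.

```python
# Z combinator: Y with the self-application eta-expanded, so that eager
# evaluation does not unfold (x x) before f gets a chance to run.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# The recursion-free factorial term F: the recursive call is the parameter f.
F = lambda f: lambda x: 1 if x == 0 else x * f(x - 1)

factorial = Z(F)
print(factorial(2))  # 2
print(factorial(5))  # 120
```

Nothing in F or Z refers to itself by name; the recursion comes entirely from the combinator.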

Primitive recursion

The definition of the partial recursive functions (one of the ways to define computability, mentioned at the beginning) involves something called primitive recursion. Let's implement that, and along the way look at fixed-point combinators from another perspective.

Primitive recursion is essentially about implementing bounded for-loops / recursion stacks, where "bounded" means that the depth is known when we enter the loop. Specifically, there's a function $f$ that takes in zero or more parameters, which we'll abbreviate as $\overline{P}$ . At 0, the value of our primitive recursive function $h$ is $f(\overline{P})$ . At any integer $x+1$ for $x \geq 0$ , $h(\overline{P}, x+1)$ is defined as $g(\overline{P}, x, h(\overline{P}, x))$ : in other words, the value at $x+1$ is given by some function of:

fixed parameter(s) $\overline{P}$ ,
how many more steps there are in the loop before hitting the base case ( $x$ ), and
the value at $x$ (the recursive part).

For example, in our factorial example there are no parameters, so $f$ is just the constant function 1, and $g(x, r) = (x + 1) \times r$ , where $r$ is the recursive result for one less, and we have $x+1$ because (for a reason I can't figure out – ideas?) $g$ takes, by definition, not the current loop index but one less.

Now it's pretty easy to write the function for primitive recursion, leaving the recursive call as an extra parameter (r) once again, and assuming that we have $\lambda$ -terms F and G for $f$ and $g$ respectively:

(lambda r P x
  (if (eq0 x)
      (F P)
      (G P (pred x) (r P (pred x)))))

Slap a $Y$ in front, and we take care of the recursion and we're done.
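As a sketch of what that looks like when run, here is primitive recursion in Python. It assumes the eager-evaluation-friendly Z combinator in place of $Y$ , native integers instead of Church numerals, and curried versions of F and G; prim_rec is an illustrative name.

```python
# Z combinator (a stand-in for Y that works under eager evaluation).
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

def prim_rec(F, G):
    """Return h with h(P)(0) = F(P) and h(P)(x + 1) = G(P)(x)(h(P)(x))."""
    step = lambda r: lambda P: lambda x: (
        F(P) if x == 0 else G(P)(x - 1)(r(P)(x - 1))
    )
    return Z(step)

# Factorial: no real parameters, so P is a dummy; g(x, r) = (x + 1) * r.
fact = prim_rec(lambda P: 1, lambda P: lambda x: lambda r: (x + 1) * r)

# Addition: h(p, 0) = p and h(p, x + 1) = h(p, x) + 1.
add = prim_rec(lambda P: P, lambda P: lambda x: lambda r: r + 1)

print(fact(None)(5))  # 120
print(add(3)(4))      # 7
```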

The fixed point perspective

However, rather than viewing this whole "slap in the $Y$ " business as a hack for getting recursion, we can also interpret it as a fixed point operation.

A fixed point of a function $f$ is a value $x$ such that $x = f(x)$ . The fixed points of $f(x)=x^2$ are 0 and 1. In general, fixed points are often useful in maths stuff and there's a lot of deep theory behind them (for which you will have to look elsewhere).

Now $Y$ (or any other fixed point combinator) has the property that $(Y f) =_\beta (f \, (Y\, f))$ (remember that the equivalent of $f(x)$ is written $(f \,x)$ in the lambda calculus). In other words, $Y$ is a magic wand that takes a function and returns its fixed point (albeit in a mathematical sense that is not very useful for explicitly finding those fixed points).

Taking once again the example of defining primitive recursion, we can consider it as the fixed point problem of finding an $h$ such that $h = \Phi_{f,g}(h)$ , where $\Phi_{f,g}$ is a function like the following, where F and G are the lambda calculus representations of $f$ and $g$ respectively:

(lambda h
  (lambda P x
    (if (eq0 x)
        (F P)
        (G P (pred x) (h P (pred x))))))

That is, $\Phi_{f,g}$ takes in some function h, and then returns a function that does primitive recursion – under the assumption that h is the right function for the recursive call.

Imagine it like this: when we're finding the fixed point of $f(x)= x^2$ , we're asking for $x$ such that $x=x^2$ . We can imagine reaching into the set of values that $x$ can take (in this case, the real numbers), plugging them in, and seeing that in most cases the equation $x=x^2$ is false, but if we pick out a fixed point it becomes true. Similarly, solving $h=\Phi_{f,g}(h)$ is the problem of considering all possible functions $h$ (and it turns out all computable functions can be enumerated, so this is, if anything, less crazy than considering all possible real numbers), and requiring that plugging $h$ into $\Phi_{f,g}$ gives back $h$ . For almost any function that we plug in, this equation will be nonsense: instead of doing primitive recursion, on the first call to h, $\Phi_{f,g}(h)$ will do some crazy call that might loop forever or calculate the 17th digit of $\pi$ . But if $h$ is picked just right, $h$ and $\Phi_{f,g}(h)$ will happen to be the same thing. Unlike in the algebraic case, it's very difficult to iteratively improve on your guess for $h$ , so it's hard to think of how to use this weird way of defining the problem of finding $h$ to actually find it.

Except hold on – we're working in the lambda calculus, and fixed point combinators are easy: call $Y$ on a function and we have its fixed point, and, by the reasoning above, that is the recursive version of that function.
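The "plug in a guess and see whether it comes back unchanged" picture can be checked directly. The sketch below specialises $\Phi_{f,g}$ to the factorial case (no parameters, $f = 1$ , $g(x, r) = (x+1) \times r$ ), uses native integers, and compares functions only on a finite range, which is a spot check rather than a proof. The names Phi and agree are illustrative.

```python
import math

F = lambda: 1                  # base case f (no parameters)
G = lambda x, r: (x + 1) * r   # step function g

def Phi(h):
    """Map a candidate h to the function that uses h for the recursive call."""
    return lambda x: F() if x == 0 else G(x - 1, h(x - 1))

def agree(h1, h2, upto=10):
    """Compare two functions on 0..upto-1 (a finite spot check, not a proof)."""
    return all(h1(x) == h2(x) for x in range(upto))

wrong_guess = lambda x: 17
print(agree(wrong_guess, Phi(wrong_guess)))        # False: not a fixed point
print(agree(math.factorial, Phi(math.factorial)))  # True: factorial is one
```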

The lambda calculus in lambda calculus

There's one final powerful demonstration of a computation model's expressive power that we haven't looked at: being able to express itself. The most well-known case is the universal Turing machine, and those crop up a lot when you're thinking about computation theory.

Now there exists a trivial universal lambda term: $(\lambda \,f\,a\,.\,(f \,a))$ takes $f$ , the lambda representation of some function, and an argument $a$ , and returns the lambda calculus representation of $f$ applied to $a$ . However, this isn't exactly fair, since we've just forwarded all the work onto whatever is interpreting the lambda calculus. It's like noting that an eval function exists in a programming language, and then writing on your CV that you've written an evaluator for it.

Instead, a "fair" way to define a universal lambda term is to build on the data specifications we have to define a representation of variables, lambda terms, and application terms, and then write more definitions within the lambda calculus until we have a reduce function.

This is what I've done in Lambda Engine. The definitions specific to defining the lambda calculus within the lambda calculus start about halfway down this file. I won't walk through the details here (see the code and comments for more detail), but the core points are:

We distinguish term types by making each term a pair consisting of an identifier and then the data associated with it. The identifier for variables/ $\lambda$ s/applications is a function that takes a triple and returns the 1st/2nd/3rd member of it (this is simpler than tagging them with e.g. Church numerals, since testing numerical equality is complicated). The data is either a Church numeral (for variables) or a pair of a variable and a term ( $\lambda$ -terms) or a term and a term (applications).
We need case-based recursion, where we can take in a term, figure out what it is, and then perform a call to a function to handle that term and pass on the main recursive function to that handler function (for example, because when substituting in an application term, we need to call the main substitution function on both the left and right child of the application). The case-based recursion functions (different ones for the different number of arguments required by substitution and reduction) take a triple of functions (one for each term type) and exploit the fact that the identifier of a term is a function that picks some element from the triple (in this case, we call the identifier on the handler function triple to pick the right one).
We have helper functions to build our term types, extract out parts, and test for whether something is a $\lambda$ -term (exploiting the fact that the first element of the pair that a lambda term is is the "take the 2nd thing from a triple" function).
With the above, we can define substitution fairly straightforwardly. Note that we need to test Church numeral equality, which requires a generic Church numeral equality tester, which is a slow function (because it needs to recurse and take a lot of predecessors).
For reduction, the main tricky bit is doing it in normal order. This means that we have to be able to tell whether the left child in an application term is reducible before we try to reduce the right child (e.g. the left child might eventually reduce to a function that throws away its argument, and the right child might be a looping term like $\Omega$ ). We define a helper function to check whether something reduces, and then can write reduce-app and therefore reduce. For convenience we can define a function n-reduce that calls reduce on an expression n times, simply by exploiting how Church numerals work (((2 reduce) x) is (reduce (reduce x)), for example).
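The points above can be sketched in Python. This is not the Lambda Engine code: Python tuples stand in for Church pairs, Python ints stand in for Church-numeral variable names, and all the function names are illustrative. What it keeps is the core trick (a term's tag is a selector that picks its own handler from a triple) and normal-order reduction. Like the original, it assumes variable names are never reused.

```python
# Tags are selectors over a triple, so a tag can pick its own handler.
VAR = lambda a, b, c: a
LAM = lambda a, b, c: b
APP = lambda a, b, c: c

# Terms are (tag, data) pairs.
var = lambda n: (VAR, n)
lam = lambda n, body: (LAM, (n, body))
app = lambda left, right: (APP, (left, right))

def case(term, on_var, on_lam, on_app):
    """Call the handler selected by the term's own tag on the term's data."""
    tag, data = term
    return tag(on_var, on_lam, on_app)(data)

def subst(term, name, value):
    """Replace variable `name` by `value`; assumes names are never reused."""
    return case(
        term,
        lambda n: value if n == name else var(n),
        lambda d: lam(d[0], subst(d[1], name, value)),
        lambda d: app(subst(d[0], name, value), subst(d[1], name, value)),
    )

def is_lam(term):
    return term[0] is LAM

def reducible(term):
    return case(
        term,
        lambda n: False,
        lambda d: reducible(d[1]),
        lambda d: is_lam(d[0]) or reducible(d[0]) or reducible(d[1]),
    )

def reduce_step(term):
    """One normal-order step: the leftmost-outermost redex goes first."""
    def on_app(d):
        left, right = d
        if is_lam(left):
            return subst(left[1][1], left[1][0], right)
        if reducible(left):
            return app(reduce_step(left), right)
        return app(left, reduce_step(right))
    return case(term, var, lambda d: lam(d[0], reduce_step(d[1])), on_app)

# ((K 9) Omega): K throws away its second argument, so Omega is never touched.
K = lam(1, lam(2, var(1)))
omega = app(lam(3, app(var(3), var(3))), lam(4, app(var(4), var(4))))
term = app(app(K, var(9)), omega)

two = lambda f: lambda x: f(f(x))  # Church numeral 2
result = two(reduce_step)(term)    # ((2 reduce) x) is (reduce (reduce x))
print(result == var(9))            # True
```

Reducing the right child first would get stuck unfolding the looping term forever, which is why the reducibility check on the left child matters.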

What we don't have:

Variable renaming. We assume that terms in this lambda calculus are written so that a variable name (in this case, a Church numeral) is never reused.
Automatically reducing to $\beta$ -normal form. This could be done fairly simply by writing another function that calls itself with the reduce of its argument until our checker for whether something reduces is false.
Automatically checking whether we're looping (e.g. we've typed in the definition of $\Omega$ ).

The lambda calculus interpreter in this file has all three features above. You can play with it, and the lambda-calculus-in-lambda-calculus, by downloading Lambda Engine (and a Racket interpreter if you don't already have one) and using one of the evaluators in this file.

Towards Lisp

Let's see what we've defined in the lambda calculus so far:

pair
lists
fst
snd
True
False
if
eq0
numbers
recursion

This is most of what you need in a Lisp. Lisp was invented in 1958 by John McCarthy. It was intended as an alternative axiomatisation for computation, with the goal of not being too complicated to define while still being human friendly, unlike the lambda calculus or Turing machines. It borrows notation (in particular the keyword lambda) from the lambda calculus and its terms are also trees, but it is not directly based on the lambda calculus.

Lisp was not intended as a programming language, but Steve Russell (no relation to Bertrand Russell ... I'm pretty sure) realised you could write machine code to evaluate Lisp expressions, and went ahead and did so, making Lisp the second-oldest high-level programming language still in common use (after Fortran). Despite its age, Lisp is arguably the most elegant and flexible programming language (modern dialects include Clojure and Racket).

One way to think of what we've done in this post is that we've started from the lambda calculus – an almost stupidly simple theoretical model – and made definitions and syntax transformations until we got most of the way to being able to emulate Lisp, a very usable and practical programming language. The main takeaway is, hopefully, an intuitive sense of how something as simple as the lambda calculus can express any computation expressible in a higher-level language.

Nuclear power is good

2021-03-27T22:21:00.011+00:00

(Alternative title: burning things considered harmful)

5k words (about 17 minutes)

If you want usable energy, you need to use the forces between particles.

The weakest force is gravity, but if you happen to be near a gigantic amount of material (e.g. the Earth) with an uneven surface that has stuff flowing down it (e.g. water in a river), we can still use it to generate power. This insight gives us hydropower, which delivers about 16% of the world's electricity. The main downside is that because of how weak gravity is, dams have to be large and environmentally disruptive to generate useful power.

Moving to stronger forces, we have chemical interactions between atoms. In the form of burning fossil fuels, rearranging chemical bonds produces 66% of the world's electricity. The main downside is how weak chemical bonds are, and therefore how much matter has to be processed (i.e. burned) to produce energy. A lot of matter means a lot of waste products. Despite decades of work on possible safe waste-management strategies (e.g. carbon capture and storage), we still outrageously keep dumping over thirty billion tons of carbon dioxide into the atmosphere every year, with massive effects on the climate that will potentially last thousands of years, while also producing a long list of other harmful waste products that kill a lot of people per year.

Thankfully, atoms aren't atomic: we can rearrange atoms and get energy densities that blow puny chemistry out of the water. Currently 11% of the world's electricity comes from directly doing this. We're still playing catch up to God, who, in His infinite wisdom, saw it fit to create a universe where just about 100% of energy production is nuclear.

Our nearest God-sanctioned nuclear reactor is the sun. Harnessing the sun's light and heat gives us another 1% of the world's electricity; a slightly more indirect route where we first wait for the sun's heat to stir up the air gives us another 3.5%. An even more indirect route is letting the sun's light fall on plants so that they create chemical bonds that we can burn for power; this gives us another 2%. The most indirect route of all is to use the chemical bonds created by sunlight that fell on extinct plants hundreds of millions of years ago, which is what we're really doing when we burn fossil fuels. So actually it's all nuclear, with the only difference being how many hoops you jump through first.

The current state of nuclear power is that we can harness only fission (splitting atoms) for controlled energy production. Fusion (combining atoms) is potentially an even better technology: it requires less exotic materials, produces less dangerous waste, and is literally star-power. However, it takes extreme energies to get power out of fusion, and the only way we've found to do that is to blow up a (fission-based) nuclear bomb in a very controlled way that squeezes the stuff we want to fuse to create an even bigger bang. Technically we could use this for power – say, we build a massive underground chamber where we set off hydrogen bombs (the common name for a bomb that uses nuclear fusion) every once in a while to vaporise vast amounts of water into steam and then drive a generator – but let's just say there would be some difficulties. (Though, surprisingly, mostly economic and political ones rather than technical ones – this idea was seriously studied in the 1970s as Project Pacer.)

Controlled fusion power is in the works, but it's the poster child for technologies that are always twenty years away. At the moment scientists are playing around with lasers that have 25 times the power of the entire world's electricity generation (though only for a few picoseconds at a time) and magnets almost strong enough to levitate a frog* to bring it about, but don't expect commercial fusion power in the next decade at least.

(*Levitating a frog takes a field of about 16 teslas, according to research that won an Ig Nobel Prize in 2000, compared to ITER's 13-tesla field.)

Fusion is definitely a technology that we should develop. However, as J. Storrs Hall writes in Where is my flying car? (my review here):

"As a science fiction and technology fan, for most of my life I had been squarely in the “just you wait until we get fusion” camp. Then I was forced to compare the expected advantages fusion would bring to the ones we already had with fission. Fuel costs are already negligible. The process is already clean, with no emissions. Even though the national [US] waste repository at Yucca Mountain has been blocked by activists since it was designated in 1987 and never opened, fission produces so little waste that all our power plants have operated the entire period by basically sweeping it into the back closet."

We have already invented a miracle clean power source. And, surprise surprise, we should really use it.

The human case for nuclear power

Every year, there are almost five million deaths attributable to air pollution, a bit less than 1 in 10 of all deaths in the world, or one every six seconds. Since it's a bit tricky to know what counts as an "attributable death" in the case of some risk factor, here's another measure: almost 150 million years of health-weighted life are lost every year because of air pollution. The health effects of air pollution are right up there with the other biggest killers like high blood pressure, smoking, and obesity.

The biggest causes of air pollution are energy generation, traffic, and (especially in poor countries) heating. Getting global averages for power generation deadliness is hard, but doing some very rough estimation, more than one-tenth but less than one-third of air pollution deaths are directly related to power generation, for a total number in the hundreds of thousands per year. Imagine three Chernobyl-scale disasters a week, and you're in the right ballpark.

(There is major disagreement over the actual Chernobyl death toll. When making comparisons in this post, I use the number 4000. About 30 people died directly during the disaster; several thousand may die in the long run according to the best consensus estimates, though if you assume the contested linear no-threshold model (which seems to be the main crux of the debate) you can get numbers in the tens of thousands. If you want to be maximally pessimistic, you can multiply Chernobyl impact comparisons by 10, but you'll find this doesn't materially change the conclusions.)
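The "three Chernobyls a week" comparison is simple arithmetic on the numbers above, sketched here. It takes roughly five million air pollution deaths per year, 4000 deaths per Chernobyl, and the one-tenth and one-third bounds on the power generation share; the function name is illustrative.

```python
AIR_POLLUTION_DEATHS_PER_YEAR = 5_000_000  # "almost five million" in the text
CHERNOBYL_DEATHS = 4_000                   # the toll used in this post
WEEKS_PER_YEAR = 52

def chernobyls_per_week(power_share):
    """Chernobyl-equivalents per week for a given power generation share."""
    power_deaths = AIR_POLLUTION_DEATHS_PER_YEAR * power_share
    return power_deaths / CHERNOBYL_DEATHS / WEEKS_PER_YEAR

print(round(chernobyls_per_week(1 / 10), 1))  # 2.4
print(round(chernobyls_per_week(1 / 3), 1))   # 8.0
```

Three a week sits near the low end of that range.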

Which power sources cause these deaths? There's some disagreement over the exact numbers, but here's a chart for European energy production from Our World in Data:

(One terawatt-hour (3.6 petajoules) is roughly the annual energy consumption of 20 000 Europeans.)

The chart above has European numbers. In particular for fossil fuel sources, there's a lot of country-specific variation due to environmental regulations and population density: the paper that the above chart is largely based on mentions 77 deaths/TWh as a reasonable figure for a regulation-compliant Chinese coal plant, while this article says that 280 deaths/TWh is possible for coal.

Why do solar and wind produce any deaths at all? Both occasionally involve dangerous construction work (rooftop solar / tall wind turbines). In fact, if you look at recent decades (i.e., not including Chernobyl) and use the low-end estimates, solar and wind are deadlier than nuclear.

The estimates for hydropower can also swing a bit depending on whether or not you include the deadliest electricity generation disaster in history: the 1975 Banqiao Dam failure, which may have killed hundreds of thousands of people. Since 1965, hydropower has produced about 130 000 TWh; depending on which death toll estimate you believe, Banqiao single-handedly raises the deaths per TWh for hydropower by between 0.2 and 2. Compare this with nuclear power, which has produced about 92 000 TWh over the same timeframe; the long-term death estimates for Chernobyl add 0.04 to the deaths/TWh count for nuclear.
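Here is that arithmetic as a sketch. The generation totals and the Chernobyl toll are the ones used in this post; the low and high Banqiao death tolls (26 000 and 240 000) are assumed values chosen to illustrate the range, not figures stated above.

```python
HYDRO_TWH = 130_000       # hydropower generation since 1965 (from the text)
NUCLEAR_TWH = 92_000      # nuclear generation since 1965 (from the text)
CHERNOBYL_DEATHS = 4_000  # long-run toll used in this post

# Assumed low and high Banqiao death-toll estimates; not stated in the post.
BANQIAO_LOW, BANQIAO_HIGH = 26_000, 240_000

def deaths_per_twh(deaths, twh):
    """Deaths attributable to one event, spread over total generation."""
    return deaths / twh

print(round(deaths_per_twh(BANQIAO_LOW, HYDRO_TWH), 2))         # 0.2
print(round(deaths_per_twh(BANQIAO_HIGH, HYDRO_TWH), 2))        # 1.85
print(round(deaths_per_twh(CHERNOBYL_DEATHS, NUCLEAR_TWH), 2))  # 0.04
```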

(The total generation numbers are based on the raw data behind this and this graph, which you can download from the links. The nuclear number in the above chart is based on this paper, which Our World in Data says already includes Chernobyl, though I can't see where they add that in.)

The bottom line is that hydropower accidents are more common, more deadly, and higher variance than nuclear accidents, even though both power sources have produced comparable amounts of energy in recent decades.

Okay, actually that isn't the real bottom line. The real bottom line is this: when it comes to the human impacts of electricity generation, there are things that involve burning (fossil fuels & biomass), and then there is everything else, and the latter category is much much better. Also, if you absolutely must burn something, do not burn coal.

What has nuclear specifically done so far? One study finds that it has saved 1.8 million lives by reducing air pollution, or about 4 years' worth of deaths at the world's current malaria death rate.

What could it have done? Until the mid-1970s, the adoption of nuclear power was accelerating. Assume this trend had continued until today, and nuclear had replaced fossil fuels only (an optimistic assumption, but one that doesn't change the numbers much because renewables are a pretty small percentage). Under these assumptions, one study estimates that nuclear would now account for over half of the world's energy production, and a total of 9.5 million deaths would have been avoided – as much as if you saved everyone who would otherwise have died of cancer in the past year. Even if nuclear adoption had only been linear, 4.2 million deaths could have been avoided, the same number as saving everyone who has died in war since 1970 (the war deaths number is from the raw data behind this chart).

Therefore: in terms of the number of lives saved, keeping the nuclear power industry growing would have very likely been at least as good as achieving world peace in 1970.

Since these numbers are enormous, and involve difficult-to-estimate unknowns, here's something more concrete: Germany's decision in 2011 to get rid of nuclear is costing an average of 1100 lives per year (working paper; article).

The environmental case for nuclear power

Climate change is a big problem, but the scale of it as an environmental problem is better known than the scale of air pollution as a health problem, so I won't go into the statistics on its impact.

Nuclear power is obviously good for the climate. Here's a chart, based on this, which is summarised in a more readable format here:

The black bars span the range between the minimum and maximum numbers. The red dot is the median.

I've converted the numbers from the traditional grams of CO2 equivalent per kWh to tons of CO2 equivalent per TWh, to be consistent with the death rates graph above, and for easier conversion to national/international CO2 statistics (which are generally expressed in tons of CO2 – unless it's tons of carbon, in which case you divide by the ratio of carbon's mass in CO2, which is 12/44 or about 0.27).
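Both conversions are one-liners; here is a sketch, assuming metric tons. The input of 12 g/kWh is just an illustrative value, not a number read off the chart.

```python
GRAMS_PER_TON = 1e6
KWH_PER_TWH = 1e9

def g_per_kwh_to_t_per_twh(g_per_kwh):
    """Grams CO2e per kWh -> metric tons CO2e per TWh (a factor of 1000)."""
    return g_per_kwh * KWH_PER_TWH / GRAMS_PER_TON

def tons_carbon_to_tons_co2(tons_carbon):
    """Divide by carbon's mass fraction in CO2 (12/44)."""
    return tons_carbon / (12 / 44)

print(g_per_kwh_to_t_per_twh(12))            # 12000.0
print(round(tons_carbon_to_tons_co2(1), 2))  # 3.67
```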

(If you're wondering where hydropower is: its median is right around concentrated solar, but in some cases, especially in tropical climates, the reservoirs created by dams can release a lot of methane, making the maximum CO2-equivalent emissions for hydropower over twice as bad as coal and, more importantly, completely ruining my pretty chart.)

So far, the use of nuclear power is estimated to have reduced cumulative CO2 emissions to date by 64 billion tons, a bit less than two years of the world's total CO2 emissions at current rates. The same study linked in the previous section estimates that, had nuclear power grown at a steady linear rate, this number would be doubled, and if the accelerating trend in nuclear power adoption had continued, there would be 174 billion tons less CO2 in the atmosphere. We would have saved more emissions than we would have if we had made every car in the world emission free since 1990.

The problems

In Enlightenment Now (my review here), Steven Pinker writes:

"It’s often said that with climate change, those who know the most are the most frightened, but with nuclear power, those who know the most are the least frightened."

So why aren't the arguments against nuclear power enough to frighten those who know about it?

The short version: more nuclear power would save millions of lives from air pollution and be a big help in solving climate change. When these are the benefits, you need a hell of a drawback before the scales start tilting the other way.

The long version:

Radiation & accidents

(Radiation units are confusing. Activity, straightforwardly defined as the number of atoms that undergo decay per second, is measured in becquerels (Bq). The amount of radiation energy absorbed per kilogram of matter is measured in grays (Gy), which therefore have units of joules per kilogram. Measuring biological effects is harder, because the type of radiation and what tissue it hits both matter. If you adjust for the type of radiation by multiplying the absorbed dose in grays by some factor (scaled so that gamma rays have a factor 1), you get something called equivalent dose, which is measured in sieverts (Sv). If you also adjust for which tissue type was hit by multiplying by more estimated factors, you get effective dose, which is also measured in sieverts. If you want to get a sense of scale for radiation dose numbers, here's a good chart and here's a good table.)

In normal operation, a nuclear power plant produces significantly less radiation than a coal power plant (this is because everything radioactive is contained in a nuclear power plant, while coal power plants pump fly ash into the air). Neither is a significant dose.

In accidents, nuclear power plants can release insane amounts of radioactivity. Insane amounts of radiation are dangerous. However, the reaction to radiation risks is often out of proportion to the true risk – the Fukushima evacuations are considered excessive in hindsight, as argued in this study, though you probably don't need to make a study to guess it from this chart:

(In the long run, some more cancer deaths are expected to trickle in.)

It is critically important to remember the above statistics on health effects, and not let yourself be biased by vivid stories about horrible individual events. The fear of nuclear accidents is similar to the fear of flying rather than driving: statistically flying is much safer, but it is much easier to fear because when things go wrong, it comes in more story-worthy packages.

In particular: it is not the case that nuclear power is safer only because accidents are rare and therefore get left out of statistics; nuclear power would be overwhelmingly safer than fossil fuels even if there were a Chernobyl going off every year. As I said above, hydropower accidents are more common, more deadly, and higher variance, so any argument based on disaster risk that bans nuclear would also ban hydropower.

Nuclear proliferation

Nuclear power is good, but nuclear weapons are bad. It would be bad if the spread of civilian nuclear power technology led to nuclear proliferation. There is some overlap in technology, but neither civilian materials nor technologies automatically lead to weapons. The uranium used in power plants is typically only enriched to 3-5%, compared to more than 85% for weapons-grade uranium and 0.7% in natural uranium (though if you have uranium enrichment infrastructure, you can run it for more cycles than usual and let the enrichment levels slowly creep up – Iran has done this). There are also international agreements that prevent enrichment, and alternative nuclear technologies, like using thorium instead of uranium, with less weapon potential. Finally, a country trying to build nuclear weapons probably won't be stopped by a lack of a civilian industry; consider North Korea.

Terrorism and war risks

Another risk to consider is that nuclear power plants might be targeted by terrorists, or even by hostile nations, potentially leading to Chernobyl-scale disasters. This is a risk, but it's an acceptable one. Consider what it would mean if "hundreds or thousands of people could be killed if a determined and resourceful hostile actor targeted this piece of infrastructure" were a reason to not build some piece of infrastructure – we'd have to ban skyscrapers, airplanes, dams, water treatment plants, and so forth. Also considering the security that's (rightfully) present at nuclear power plants, it would probably take a 9/11-level of execution to do it, and the observed rate for 9/11-level events over a time interval of length T is, well, 1/T if the interval includes 9/11 and otherwise 0.

It is true that a complex civilisation has a lot of fragile points and someone should be thinking hard about minimising this kind of risk, and that nuclear power plants are a good example because the effects are expensive and long-lasting if an attack is successful. But as an argument against nuclear power, it proves too much.

Nuclear waste

Nuclear waste is awkward to deal with, but it's far from the worst sort of industrial waste we deal with – consider the over thirty billion tons of carbon dioxide we've dumped into the atmosphere over the past year, or the various horrible things that coal plants spew out that cause dozens of Chernobyl-equivalents per year.

Nuclear waste is not some miracle substance that effortlessly seeps everywhere and kills whatever it touches. Until 1993, countries (mostly the USSR and UK) were dumping nuclear waste into the ocean. This is rightly banned these days, but you can observe that we still have oceans; in fact, the environmental impacts have so far been negligible except for somewhat higher concentrations of some nasty isotopes exactly at the site.

In general, nuclear waste is a serious problem that has to be solved somehow, but solutions exist (currently, Finland's Onkalo repository is the closest to being operational). Though the timescale is long, it is not different in principle from some existing disposal methods for nasty things like mercury and arsenic.

Is it responsible to leave behind dangerous waste for future generations? It's far more responsible than leaving them with the almost astronomical amounts of CO2 emissions that a single kilogram of uranium prevents.

Future people looking back at our century won't despair about a few warm rocks deep underground. They'll despair at all the silent air pollution deaths, at how far we let climate change get, and at how much sooner we could've reached their living standards had we made better use of our technology. Then they'll travel on nuclear-powered airplanes to distant hiking grounds, and tell scare stories around an (artificial!) campfire about the barbarian past when we burned things for energy and piped the waste products straight into the atmosphere.

Uranium is limited

First, we have 200 years' worth of economically accessible uranium reserves. This is more than for fossil fuels, with the additional benefit that burning through the remaining uranium won't wreck the climate and kill millions.

Second, we have alternatives to uranium, like thorium.

Third, there are hundreds of times more uranium dissolved in the oceans than there is on land (and this uranium exists in equilibrium, so if you take it out, more will leach out of the seabed to replace it, a fact that might lead a pedant to call nuclear power renewable). Even though the concentrations are tiny, because of the energy density of uranium, at modern reactor efficiencies there's still half a megajoule of usable nuclear energy in the uranium in a single cubic metre of seawater, enough to power the lightbulb in my room for over five hours. As a result, extracting it is a project that is taken surprisingly seriously, and is surprisingly close to being economically viable, though some people are very skeptical.
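The lightbulb claim is easy to sanity-check. Here is a minimal sketch that takes the two figures from the paragraph above (half a megajoule per cubic metre, five hours) and works out what bulb wattage they imply; the function name is my own and nothing here is an independent estimate of seawater uranium content.

```python
# Sanity check of the seawater figure, using only numbers quoted in the text.
ENERGY_PER_M3_J = 0.5e6  # "half a megajoule" per cubic metre of seawater
HOURS = 5                # "over five hours"


def implied_bulb_power_watts(energy_joules: float, hours: float) -> float:
    """Average power of a device that `energy_joules` can run for `hours`."""
    return energy_joules / (hours * 3600)


bulb_watts = implied_bulb_power_watts(ENERGY_PER_M3_J, HOURS)
print(f"Implied bulb power: {bulb_watts:.1f} W")
```

The result is just under 28 W, so the claim holds for any bulb drawing less than that.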

Nuclear power is unnatural

Wrong: a few billion years ago a spontaneous natural nuclear reactor ran for a few hundred thousand years under what is now Gabon.

Using the best estimates for its running time and power output, even if this is the only natural reactor that ever formed, the energy it produced is several times higher than that of all human civilian nuclear power to date (both numbers are in the hundreds of petajoules range). Of sustained nuclear fission energy in our planet's history, more has been natural than artificial.

Nuclear is overpowered, so where is it?

Nuclear power is an almost overpowered technology. The reason why comes down to physics: an energy source based on nuclear reactions has extreme power density, and, all else being equal, the higher your power density, the less fuel you need, the fewer waste products you produce, and the cleaner your power plant is overall. Not surprisingly, nuclear power turns out to be – along with solar and wind – the cleanest and safest power source we have.

In Where is My Flying Car?, J. Storrs Hall gives some vivid facts to demonstrate the power and efficiency of nuclear: a wind turbine uses more lubricating oil per energy generated than a nuclear power plant uses uranium, and while the 7.5 TJ of energy a Boeing 747 burns through during a flight weighs 200 tons and costs a third of a million dollars when delivered as chemical fuel, getting the equivalent energy from nuclear takes 100 grams of reactor-grade uranium and costs 10 dollars.
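To make the 747 comparison concrete, here is a small sketch that turns Hall's figures into energy densities and ratios. All inputs are the numbers quoted above; the one assumption I add is that "tons" means metric tonnes (1000 kg), and the variable names are mine.

```python
# Figures quoted from Hall; "tons" is assumed to mean metric tonnes.
FLIGHT_ENERGY_J = 7.5e12           # 7.5 TJ for one Boeing 747 flight
JET_FUEL_KG = 200_000              # 200 tons of chemical fuel
JET_FUEL_COST_USD = 1_000_000 / 3  # "a third of a million dollars"
URANIUM_KG = 0.1                   # 100 grams of reactor-grade uranium
URANIUM_COST_USD = 10

# Energy density of each fuel, in megajoules per kilogram.
fuel_mj_per_kg = FLIGHT_ENERGY_J / JET_FUEL_KG / 1e6
uranium_mj_per_kg = FLIGHT_ENERGY_J / URANIUM_KG / 1e6

# How many times more mass, and money, the chemical route takes.
mass_ratio = JET_FUEL_KG / URANIUM_KG
cost_ratio = JET_FUEL_COST_USD / URANIUM_COST_USD

print(f"Jet fuel: {fuel_mj_per_kg:.1f} MJ/kg")
print(f"Uranium:  {uranium_mj_per_kg:.2e} MJ/kg")
print(f"Mass ratio: {mass_ratio:,.0f}x, cost ratio: {cost_ratio:,.0f}x")
```

On these numbers the uranium is about two million times lighter and tens of thousands of times cheaper for the same energy.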

So where is it? The simple reason is that it's either illegal (like in Italy), being phased out (like in Germany), or highly regulated and/or expensive. It wasn't always so:

Source: Where is my Flying Car?, by J. Storrs Hall.

The above graph shows the price per kilowatt of US nuclear power plants. The green line is the trend line before the Department of Energy was established in 1977. Note also that the Three Mile Island accident was in 1979, and, despite no one being hurt, this was a turning point for the US nuclear industry.

When the price of a technology starts increasing, it's not the natural learning curve of the technology at work. It's a regulatory choice. And while you obviously should regulate nuclear power, we're not doing it right.

J. Storrs Hall explains the cost increases:

"Nuclear power is probably the clearest case where regulation clobbered the learning curve. Innovation is strongly suppressed when you’re betting a few billion dollars on your ability to get a license to operate the plant. Besides the obvious cost increases due to direct imposition of rules, there was a major side effect of forcing the size of plants up (fewer licenses); fewer plants were built and fewer ideas tried. That also meant a greater cost for transmission (about half the total, according to my itemized bill), since plants are further from the average customer."

There is some hope that the tide is turning. New startups like NuScale are working on small modular reactors that might greatly reduce prices. Of course, in addition to difficulties with funding, and the not-so-easy task of building a literal nuclear reactor, they've spent years clearing regulatory hurdles and are not expected to produce power until 2029. So-called fourth-generation reactors are also being worked on, and there's always the hope we eventually get fusion.

But we're not going to get the benefits of cheap and plentiful nuclear power unless we stop treating it like it's the Antichrist.

Hall, never one to pass up the opportunity for a dramatic touch, quotes John Steinbeck's The Grapes of Wrath to sum up the sadness of our attitude to nuclear power:

“And men with hoses squirt kerosene on the oranges, and they are angry at the crime, angry at the people who have come to take the fruit. A million people hungry, needing the fruit—and kerosene sprayed over the golden mountains.

[...]

There is a crime here that goes beyond denunciation. There is a sorrow here that weeping cannot symbolize. There is a failure here that topples all our success. The fertile earth, the straight tree rows, the sturdy trunks, and the ripe fruit. And children dying of pellagra must die because a profit cannot be taken from an orange. And coroners must fill in the certificate—died of malnutrition—because the food must rot, must be forced to rot.”

More generally, human civilisation needs to get better at making decisions about technology. We shouldn't deny ourselves safe clean energy, but we should start working on mitigating the harms from actually scary technologies, like nuclear weapons, and make sure that new technologies like biotech and AI are used safely. Oh, and have I mentioned that burning things is bad for climate and health, and we should stop doing it?

A metaphor

I mentioned earlier that nuclear power and fossil fuels are like flying and driving. One of them is obviously safer, but the other seems scarier because the lizard-derived part of our brains can't multiply. Objecting to nuclear power on safety grounds but tolerating fossil fuels is like texting about how scared you are to board a plane while driving yourself to the airport. Let's make this metaphor more concrete, and hopefully create a memorable image.

The world consumes about 20 000 TWh per year as electricity (about one-eighth of total energy use – lots is used directly for transportation and heat). Let's compare this to making a drive across Europe that starts in Lisbon and ends in Tallinn. Each kilometre we travel represents a bit less than 5 TWh of energy towards our 20 000 TWh goal. Let's say walking is wind/solar/geothermal, biking is hydropower, flying is nuclear, and driving is fossil fuels.

(The numbers for fossil fuel related deaths below are significant underestimates of the global average, because, like the chart above, they're based on the European data in this study. Regulations are looser and population densities higher in many developing countries that make up most of the world's air pollution deaths. I was not able to find a good estimate of the global average, and besides, these numbers are terrifying enough as they are.)

First we walk some 450 km, ending north-west of Madrid, and then bike 650 km, just barely taking us into France. We're a bit careless and somehow we've managed to shove a hundred people off wind turbines along the way. Oops.

By this point we're getting tired of walking and biking, but thankfully there's a flight to Paris. The pilot has a bad day and lands on top of a crowd, flattening another hundred people.

We really hate flying, so we refuse all the other offers that the airline companies try to sell us. Instead we step out of the Paris airport, rent a car, and start carelessly careening down the remaining 2600 km.

Gas takes us approximately to Berlin, a distance of about 1000 km. During this entire distance we run over a pedestrian at every block (roughly 1 per 80 metres), killing some 10 000 people in total.

We're in a real hurry to get to Poland, where the traffic rules get even more lenient and we can start burning coal. The final leg of the journey from Berlin to the Polish border is powered by oil and isn't long, but still results in as many lethal hit-and-runs as the entire journey before it.

At the Polish border, we reach coal. From this point on, we text about the dangers of nuclear waste as we mow down one pedestrian every 8 metres for the entire rest of the coal-powered trip to Estonia (burning some other nasty things too). Driving at a reckless 120 km/h whatever road we're on, we run through four pedestrians a second – you'll hear a rapid thwack-thwack-thwack-thwack noise as the bodies hit the windshield – but it still takes 13 hours to make the trip. By the time we reach the Lithuanian border, the bodies of our victims, packed as tightly as possible, fill four Olympic swimming pools. Each of the three Baltic countries we drive through before reaching Tallinn fills another one.
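For anyone who wants to check the arithmetic of the coal leg, here is a short sketch using only the figures in the paragraph above (120 km/h, one pedestrian every 8 metres, 13 hours of driving). The variable names are mine, and the total is simply what those three numbers imply, not a new estimate.

```python
# Inputs taken from the metaphor.
SPEED_KMH = 120
METRES_PER_PEDESTRIAN = 8
TRIP_HOURS = 13

# Convert speed to metres per second, then to pedestrians per second.
speed_m_per_s = SPEED_KMH * 1000 / 3600
hits_per_second = speed_m_per_s / METRES_PER_PEDESTRIAN

# Length of the coal leg and the total it implies.
coal_leg_km = SPEED_KMH * TRIP_HOURS
total_hits = coal_leg_km * 1000 / METRES_PER_PEDESTRIAN

print(f"{hits_per_second:.2f} pedestrians per second")
print(f"{coal_leg_km} km of coal, {total_hits:,.0f} pedestrians in total")
```

That is a little over four per second, across roughly 1560 km, for a total of nearly two hundred thousand.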

Oh, and also every kilometre driven in our car had fifty times the environmental impact of flying.

Thank god we didn't fly: imagine how horrible it would be if another pilot had had a bad day.

The world makes this trip every year to meet our growing energy needs. We're getting fitter and walking a bit longer every year, as we should. But whenever someone suggests flying instead of driving, our collective response is: "What?! But that's so risky!"

Let's fly.

RELATED:

Technological progress

2021-03-25T16:12:00.005+00:00

4k words (about 13 minutes)

In this post, I've collected some thoughts on:

why technological progress probably matters more than you'd immediately expect;
what models we might try to fit to technological progress;
whether technological progress is stagnating; and
what we should hope future technological progress to look like.

Technological progress matters

The most obvious reason why technological progress matters is that it is the cause of the increase in human welfare after the industrial revolution, which, in moral terms at least, is the most important thing that's ever happened. "Everything was awful for a long time, and then the industrial revolution happened" isn't a bad summary of history. It's tempting to think that technology was just one factor working with many others, like changing politics and moral values, but there are strong cases to be made that a changed technological environment, and the economic growth it enabled, were the reasons for political and moral changes in the industrial era. Given this history, we should expect that more technological progress will be important for increasing human welfare in the future too (though not enough on its own – see below). This applies both to people in developed countries – we are not at utopia yet, after all – and to those in developing countries, who are already seeing vast benefits from information technology making development cheaper, and would especially benefit from decreases in the price of sustainable energy generation.

Then there are more subtle reasons to think that technological progress doesn't get the attention it deserves.

First, it works over long time horizons, so it is especially subject to all the kinds of short-termism that plague human decision-making.

Secondly, lost progress isn't visible: if the Internet hadn't been invented, very few would realise what they're missing out on, but try taking it away now and you might well spark a war. This means that stopping technological progress is politically cheap, because likely no one will realise the cost of what you've done.

Finally, making the right decisions about technology is going to decide whether or not the future is good. Debates about technology often become debates about whether we should be pessimistic or optimistic about the impacts of future technology. This is rarely a useful framing, because the only direct impact of technology is to let us make more changes to the world. Technology shouldn't be understood as a force automatically pulling the distribution of future outcomes in a good or bad direction, but as a force that blows up the distribution so that it spans all the way from an engineered super-pandemic that kills off humanity ten years from now to an interstellar civilisation of trillions of happy people that lasts until the stars burn down. Where on this distribution we end up depends in large part on the decisions we collectively make about technology. So, how about we get those decisions right?

But first, how should we even think about technological progress?

Modelling technological progress

Some people think that technological progress is stagnating relative to historical trends, and that, for example, we should have flying cars by now. To be able to answer this question, we need some model of what technological progress should be like. I can think of three general ones.

The first one I'll name the Kurzweilian model, after futurist Ray Kurzweil, who's made a big deal about how the intuitive linear model of technological progress is wrong, and history instead shows technological progress is exponential – the larger your technological base, the easier it is to invent new technologies, and hence a graph of anything tech-related should be a hockey-stick curve shooting into the sky.

The second I'll call the fruit tree model, after the metaphor that once the "low-hanging fruit" are picked off, progress gets harder. The strongest case for this model is in science; the physics discoveries you can make by watching apples fall down have (very likely) long since been picked off. However, it's not clear similar arguments should apply to technology. Perhaps we can model inventing a technology as finding a clever way to combine a number of already known parts into a new thing, and hence the number of possible inventions would be an increasing function of the number of things already invented, since this gives more combinations. For example, even if progress in pure aviation is slow, when we invent new things like lightweight computers we can combine the two to get drones. I haven't seen anyone propose a model to explain why the fruit tree model makes sense for technology in particular.
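The combinatorial point can be illustrated with a toy count. The sketch below assumes, purely for illustration, that an "invention" is any unordered combination of two or three already-known technologies; the cap of three and the function name are my own choices, not something from the literature.

```python
from math import comb


def possible_combinations(n_known: int, max_parts: int = 3) -> int:
    """Count ways to combine 2..max_parts distinct known technologies.

    Toy assumption: every such combination is a candidate invention.
    """
    return sum(comb(n_known, k) for k in range(2, max_parts + 1))


for n in (10, 20, 40):
    print(f"{n} known technologies -> {possible_combinations(n)} candidate combinations")
```

Doubling the number of known technologies multiplies the candidate count roughly eightfold here, which is the sense in which the number of possible inventions grows with the technological base. Of course, most combinations are not sensible inventions, which is exactly where a fruit tree argument would have to do its work.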

The third model is that technological change is mostly random. Any particular technological base satisfies the prerequisites for some set of inventions. Once invented, a new technology goes through an S-curve of increasing adoption and development, before reaching widespread adoption and a mature form. Sometimes there are many inventions just within reach, and you get an innovation burst, like the mid-20th century one when television, cars, passenger aircraft, nuclear weapons, birth control pills, and rocketry were all simultaneously going through the rapid improvement and adoption phase. Sometimes there are no plausible big inventions for very long periods of time, for example in medieval times.
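The S-curve mentioned above is usually drawn as a logistic function. Here is a minimal sketch, assuming a logistic shape with made-up parameters (the midpoint years and steepness values are hypothetical and not fitted to any real adoption data):

```python
import math


def adoption(year: float, midpoint: float, steepness: float = 0.2,
             ceiling: float = 1.0) -> float:
    """Logistic S-curve: fraction of eventual adopters reached by `year`."""
    return ceiling / (1.0 + math.exp(-steepness * (year - midpoint)))


# Two hypothetical technologies whose rapid-adoption phases overlap,
# which is what an "innovation burst" looks like in this picture.
for year in range(1930, 1991, 10):
    a = adoption(year, midpoint=1950)
    b = adoption(year, midpoint=1960, steepness=0.3)
    print(f"{year}: tech A {a:.2f}, tech B {b:.2f}")
```

Adoption is slow at first, fastest around the midpoint (where exactly half of the ceiling has been reached), and then flattens out as the technology matures.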

Here's an Our World in Data graph (source and interactive version here) showing more-or-less-S-curves for the adoption of a bunch of technologies:

(One can try to imagine an even more general model to unify the three models above, though we're getting to fairly extreme abstraction levels. Nevertheless, for the fun of it: let's model each technology as a set of prerequisite technologies, and assume there's a subset of technology-space that makes up the sensible technologies, and some cost function that describes how hard it is to go from a set of technologies to a given new technology (so infinity if all prerequisites of the new one aren't contained in the known set). Then slow progress would be modelled as the set of sensible ideas and the cost function being such that from any particular set of known technologies, there are only a few sensible ideas with prerequisites only in the known set, and these have high costs. Fast progress is the opposite. In the Kurzweilian model, the subspace of sensible ideas is in some sense uniform, so that the fraction of the $2^{|K|}$ possible prerequisite combinations for a known technology set $K$ that are contained within the sensible set does not go down with the cardinality of $K$, and also we require the cost function to not increase too rapidly as the complexity of the technologies grows. In the fruit tree model, the cost function increases, and possibly the frequency of sensible technologies becomes sparser as you get into the more complex parts of technology-space. In the random model, the cost function has no trend, and a lot of the advancements happen when a "key technology" is discovered that is the last unknown prerequisite for a lot of sensible technologies in technology-space.)
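To make this abstraction slightly less abstract, here is a toy sketch of it. Everything in it is hypothetical: the handful of technologies, their prerequisite sets, and the cost rule (cost grows with the number of prerequisites) are made up purely to show the structure, with infinite cost standing for "prerequisites not yet known".

```python
import math

# Hypothetical toy technology-space: each technology maps to its prerequisites.
PREREQS = {
    "steam engine": frozenset(),
    "transistor": frozenset(),
    "machine tools": frozenset({"steam engine"}),
    "locomotive": frozenset({"steam engine", "machine tools"}),
    "aircraft": frozenset({"machine tools"}),
    "drone": frozenset({"aircraft", "transistor"}),
}


def cost(known: set, tech: str, base_cost: float = 1.0) -> float:
    """Cost of reaching `tech` from `known`; infinite if a prerequisite is missing."""
    if not PREREQS[tech] <= known:
        return math.inf
    # Assumed rule for illustration: more prerequisites means higher cost.
    return base_cost * (1 + len(PREREQS[tech]))


def reachable(known: set) -> set:
    """Technologies not yet known whose cost from `known` is finite."""
    return {t for t in PREREQS if t not in known and math.isfinite(cost(known, t))}


print(reachable(set()))
print(reachable({"steam engine"}))
print(reachable({"steam engine", "machine tools"}))
```

In this toy world "machine tools" plays the role of a key technology: once it is known, both the locomotive and the aircraft come within reach at once. The three models then correspond to different assumptions about how the size of `reachable` and the values of `cost` change as the known set grows.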

(Question: has anyone drawn up a dependency tree of technologies across many industries (or even one large one), or some other database where each technology is linked to a set of prerequisites? That would be an incredible dataset to explore.)

In Where is my Flying Car?, J. Storrs Hall introduces his own abstraction of a civilisation's technology base that he calls the "technium": imagine some high-dimensional space representing possible technologies, and imagine a blob in this space representing existing technology. This blob expands as our technological base expands, but not uniformly: imagine some gradient in this space representing how hard it is to make progress in a given direction from a particular point, which you can visualise as a "terrain" which the technium has to move along as it expands. Some parts of the terrain are steep: for example, given technology that lets you make economical passenger airplanes moving at near the speed of sound, it takes a lot to progress beyond that because crossing the speed of sound is difficult. Hence the "aviation cliffs" in the image below; the technium is pressing against it, but progress will be slow:

(Image source: my own slides for an EA Cambridge talk.)

In other cases, there are valleys, where once the technium gets a toehold in it, progress is fast and the boundaries of what's possible gush forwards like a river breaking a dam. The best example is probably computing: figure out how to make transistors smaller and smaller, and suddenly a lot of possibilities open up.

We can visualise the three models above in terms of what we'd expect the terrain to look like as the technium expands further and further:

(Or maybe a better model would be one where the gradient is always positive, with 0 gradient meaning effortless progress?)

In the Kurzweilian model, the terrain gets easier and easier the further out you go; in the fruit tree it's the opposite; if there is no pattern, then we should expect cliffs and valleys and everything in between, with no predictable trend.

Hall comes out in favour of what I've called the random model, even going as far as to speculate that the valleys might follow a Zipf's law distribution. He concisely summarises the major valleys of the past and future:

"The three main phases of technology that drove the Industrial Revolution were first low-pressure steam engines, then machine tools, and then high-pressure engines enabled by the precision that the machine tools made possible. High-pressure steam had the power-to-weight ratios that allowed for engines in vehicles, notably locomotives and steamships. The three major, interacting, and mutually accelerating technologies in the twenty-first century are likely to be nuclear, nanotech (biotech is the “low-pressure steam” of nanotech), and AI, coming together in a synergy I have taken to calling the Second Atomic Age."

Personally, my views have shifted away from somewhat Kurzweilian ones and towards the random model, with the main factors being that the technological stagnation debate has made me less certain that the historical data fits a Kurzweilian trend, and that since there are no clear answers to whether there is a general pattern, it's sensible to shift the distribution of my beliefs towards the model that doesn't require assuming the truth of a general pattern. However, given some huge valleys that seem to be out there – AI is the obvious one, but also nanotechnology, which might bring physical technology to Moore's-law-like growth rates – it is possible that the difference between the Kurzweilian and random model looks largely academic in the next century.

Is technology stagnating?

Now that we have some idea of how to think about technological progress, we are better placed to answer the question of whether it has stagnated: if the fruit tree model is true we should expect a slowdown, whereas if the extreme Kurzweilian model is true, a single trend line that's not going to break past the top of the figure in the next decade is a failure. Even so, this question is very confusing; economists debate about total factor productivity (a debate I will stay out of), and in general it's hard to know what could have been.

However, it does seem true that compared to the mid-20th century, the post-1970 era has seen breakthroughs in fewer categories of innovation. Consider:

1920-1970:
- cars
- radio
- television
- antibiotics
- the green revolution
- nuclear power
- passenger aviation
- chemical space travel
- effective birth control
- radar
- lasers
1970-2020:
- personal computers
- mobile phones
- GPS
- DNA sequencing
- CRISPR
- mRNA vaccines

Of course, it's hard to compare inventions and put them in categories – is lumping everything computing-related as largely the same thing really fair? – but some people are persuaded by such arguments, and a general lack of big breakthroughs in big physical technologies does seem true. (Though this might soon change, since the clean energy, biotech, and space industries are making rapid progress.)

Why is this? If we accept the fruit tree model, there's nothing to be explained. If we accept the random one, we can explain it as a fluke of the shape of the idea space terrain that the technium is currently pressing into. To quote Hall again:

"The default [explanation for technological stagnation] seems to have been that the technium has, since the 70s, been expanding across a barren high desert, except for the fertile valley of information technology. I began this investigation believing that to be a likely explanation."

This, I think, is a pretty common view, and is a sensible null hypothesis in the absence of other evidence. We can also imagine variations, like the existence of a huge valley in the form of computing drawing all the talent that would otherwise have gone into pushing the technium forwards in other places. However, Hall rather dramatically concludes that this

"[...] is wrong. As the technium expanded, we have passed many fertile Gardens of Eden, but there has always been an angel with a flaming sword guarding against our access in the name of some religion or social movement, or simply bureaucracies barring entry in the name of safety or, most insanely, not allowing people to make money."

Is this ever actually the case? I think there is a case where a feasible (and economic, environmental, and health-improving) technology has been blocked: nuclear power, as I discuss here. We should therefore amend our model of the technium: not only does it have to contend with the cliffs inherent in the terrain, but sometimes someone comes along and builds a big fat wall on the border, preventing either development, deployment, or both.

In diagram form:

Are there other cases? Yes – GMOs, as I discuss in this review. There have also been some harmful technologies that have been controlled; for example biological and chemical weapons of mass destruction are more-or-less kept under control by two treaties (the Biological Weapons Convention and the Chemical Weapons Convention). However, such cases seem to be the exception, since the overall history is one of technology adoption steamrolling the luddites, from the literal Luddites to George W. Bush's attempts to limit stem cell research.

There are also cases where we put a lot of effort into expanding the technium in a specific direction (German subsidies for solar power are one successful example). We might think of this as adding stairs to make it easier to climb a hill.

How much of the technium's progress (or lack thereof) is determined by the terrain's inherent shape, and how much by the walls and stairs that we slap onto it? I don't know. The examples above show that as a civilisation we sometimes do build important walls in the technium terrain, but arguments like those Hall presents in Where is my Flying Car? are not strong enough to make me update my beliefs to thinking that this is the main factor determining how the technium expands. If I had to make a very rough guess, I'd say that though there is variation based on area (e.g. nuclear and renewable energy have a lot of walls and stairs respectively; computing has neither), overall the inherent terrain has at least several times the effect size on the decadal timescale. The power balance seems heavily dependent on the timescale too – George W. Bush can hold back stem cells for a few years, but imagine the sort of measures it would have taken to delay steam engines for the past few hundred years.

How should we guide technological progress?

How much should we try to guide technological progress?

A first step might be to look at how good we've been at it in the past, so that we get a reasonable baseline for likely future performance. Our track record is clearly mixed. On one hand, chemical and biological weapons of mass destruction have so far been largely kept under control, though under a rather shoestring system (Toby Ord likes to point out that the Biological Weapons Convention has a smaller budget than an average McDonald's), and subsidies have helped solar and wind to become mature technologies. On the other hand, there are over ten thousand nuclear weapons in the world and they don't seem likely to go away anytime soon (in particular, while New START was recently extended, Russia has a new ICBM coming into service this year and the US is probably going to go ahead with their next-generation ICBM project, almost ensuring that ICBMs – the most strategically volatile nuclear weapons – continue existing for decades more). We've mostly stopped ourselves benefiting from safe and powerful technologies like nuclear power and GMOs for no good reason. More recently, we've failed to allow human challenge trials for covid vaccines, despite massive net benefits (vaccine safety could be confirmed months faster, and the risk to healthy participants is lower than a year at some jobs), an army of volunteers, and broad public support.

Imagine your friend was really into picking stocks, and sure, they once bought some AAPL, but often they've managed to pick the Enrons and Lehman Brothers of the world. Would your advice to them be more like "stay actively involved in trading" or "you're better off investing in an index fund and not making stock-picking decisions"?

Would things be better if we had tried to steer technology less? We'd probably be saving money and the environment (and third-world children) by eating far more genetically engineered food, and air pollution would've claimed millions fewer lives because nuclear power would've done more to displace coal. Then again, we'd probably have significantly less solar power. (Also, depending on what counts as steering technology rather than just reacting to its misuses, we might include the eventual bans on lead in gasoline, DDT, and chlorofluorocarbons as major wins.) And maybe without the Biological Weapons Convention becoming effective in 1975, the Cold War arms race would've escalated to developing even more bioweapons than the Soviets already did (for more depth, read this), and an accidental leak might've released a civilisation-ending super-anthrax.

So though we haven't been particularly good at it so far, can we survive without steering technological progress in the future? I made the point above that technology increases the variance of future outcomes, and this very much includes in the negative direction. Maybe hypersonic glide vehicles make the nuclear arms race more unstable and eventually result in war. Maybe technology lets Xi Jinping achieve his dream of permanent dictatorship, and this model turns out to be easily exportable and usable by authoritarians in every country. Maybe we don't solve the AI alignment problem before someone goes ahead and builds a powerful AI, and the result is straight from Nick Bostrom's nightmares. And what exactly is the stable equilibrium in a world where a 150€ device that Amazon will drone-deliver to anyone in the world within 24 hours can take a genome and print out bacteria and viruses that have it?

This fragility is highlighted in a 2002 paper by Nick Bostrom, who shares the view that the technium can't be reliably held back, at least to the extent that some dangerous technologies might require:

"If a feasible technology has large commercial potential, it is probably impossible to prevent it from being developed. At least in today’s world, with lots of autonomous powers and relatively limited surveillance, and at least with technologies that do not rely on rare materials or large manufacturing plants, it would be exceedingly difficult to make a ban 100% watertight. For some technologies (say, ozone-destroying chemicals), imperfectly enforceable regulation may be all we need. But with other technologies, such as destructive nanobots that self-replicate in the natural environment, even a single breach could be terminal."

The solution is what he calls differential development:

"[We can affect] the rate of development of various technologies and potentially the sequence in which feasible technologies are developed and implemented. Our focus should be on what I want to call differential technological development: trying to retard the implementation of dangerous technologies and accelerate implementation of beneficial technologies, especially those that ameliorate the hazards posed by other technologies." [Emphasis in original]

(See here for more elaboration on this concept and variations.)

For example:

"In the case of nanotechnology, the desirable sequence would be that defense systems are deployed before offensive capabilities become available to many independent powers; for once a secret or a technology is shared by many, it becomes extremely hard to prevent further proliferation. In the case of biotechnology, we should seek to promote research into vaccines, anti-bacterial and anti-viral drugs, protective gear, sensors and diagnostics, and to delay as much as possible the development (and proliferation) of biological warfare agents and their vectors. Developments that advance offense and defense equally are neutral from a security perspective, unless done by countries we identify as responsible, in which case they are advantageous to the extent that they increase our technological superiority over our potential enemies. Such “neutral” developments can also be helpful in reducing the threat from natural hazards and they may of course also have benefits that are not directly related to global security."

One point to emphasise is that the dangerous technology probably can't be held back indefinitely. One day, if humanity continues advancing (as it should), it will be easy to create deadly diseases, build self-replicating nanobots, or spin up a superintelligent computer program in the way that you'd spin up a Heroku server today. The only thing that will save us is if the defensive technology (and infrastructure, and institutions) is in place by then. In The Diamond Age, Neal Stephenson imagines a future where there are defensive nanobots in the air and inside people that are constantly on patrol against hostile nanobots. I can't help but think that this is where we're heading. (It's also the strategy our bodies have already adopted to fight off organic nanobots like viruses.)

This is not how we've done technology harm mitigation in the past. Guns are kept in check through regulation, not by everyone wearing body armour. Sufficiently tight rules on, say, what gene sequences you can put into viruses or what you can order your nanotech universal fabricator to produce will almost certainly be part of the solution and go a long way on their own. However, a gun can't spin out of control and end humanity; an engineered virus or self-replicating nanobot might. And as we've seen, our ability to regulate technology isn't perfect, so maybe we should have a backup plan.

The overall picture therefore seems to be that our civilisation's track record at tech regulation is far from perfect, but the future of humanity may soon depend on it. Given this, perhaps it's better that we err on the side of too much regulation – not because it's probably going to be beneficial, but because it's a useful training ground to build up the institutional competence we're going to need to tackle the actually difficult tech choices that are heading our way. Better to mess up regulating Facebook and – critically – learn from it, than to make the wrong choices about AI.

It won't be easy to make the leap from a civilisation that isn't building much nuclear power despite being in the middle of a climate crisis to one that can reliably ensure we survive even when everyone and their dog plays with nanobots. However, an increase in humanity's collective competence at making complex choices about technology is something we desperately need.

RELATED:

Review: Where is my Flying Car?
Nuclear power is good
Review: Seeds of Science – GMOs are also good

Review: Where is my Flying Car?

2021-03-21T16:52:00.005+00:00

Book: Where is my Flying Car?: A Memoir of Future Past, by J. Storrs Hall (2018)
Words: 9.3k (about 31 minutes)

In the 50s and 60s, predictions of the future were filled with big physical technical marvels: spaceships, futuristic cities, and, most symbolically, flying cars. The lack of flying cars has become a cliche, whether as a point about the unpredictability of future technological progress, or a joke about hopeless techno-optimism.

For J. Storrs Hall, flying cars are not a joke. They are a feasible technology, as demonstrated by many historical prototypes that are surprisingly close to futurists' dreams, and practical too: likely to be more expensive than cars, yes, but providing many times more value to owners.

So, where are they?


Above: not a joke. (Public domain, original here)

The central motivating force behind Where is my Flying Car? is the disconnect between what is physically possible with modern science, and what our society is actually achieving. The immediate objection to such points is to say: "well, of course some engineer can imagine a world where all this fancy technology is somehow economically feasible and widespread, but in the real world everything is more complicated, and once you take these complications into account there's no surprising failure".

Hall's objection is that everything was going fine until 1970 or so.

Many people complain that technological progress has slowed. Flying cars, of course, but also: airliner cruising speeds have stagnated, the space age went on hiatus, cities are still single-level flat designs with traffic, nuclear power stopped replacing fossil fuels, and nanotechnology (in the long run, the most important technology for building anything) is growing slowly. Peter Thiel sums this up by saying "we wanted flying cars, instead we got 140 characters".

It's not just technology. There's an entire website devoted to throwing graphs at you about trends that changed around 1970 (and selling you Bitcoin on the side), and, while a bunch of it is Spurious Correlations material, they include enough important things, like a stagnation in median wages, that it's worth thinking about.

Perhaps the most fundamental indicator is that the energy available per person in the United States was increasing exponentially (a trend Hall names the Henry Adams curve), until, starting around 1970, it just wasn't:

Is this just because the United States is an outlier in energy use statistics? No; other developed countries have plateaued too, with the exception of Iceland and Singapore:

(Source: Our World in Data, one of the best websites on the internet. You can play around with an interactive version of this chart here.)

Hall tries to estimate what percentage of future predictions in some technical area have come true as a function of the energy intensity of the technology, and finds a strong inverse correlation: in less energy intensive areas (e.g. mobile phones) we've over-achieved relative to futurists' predictions, while the opposite is true with energy intensive big machines (e.g. flying cars). (This is necessarily very subjective, but Hall at least says he did not change any of his estimates after seeing the graph.)

Of course, we have to contrast the stagnation in some areas with enormous advancements during the same time. The most obvious example is computing, something that futurists generally missed. In biotechnology, the price of DNA sequencing has dropped exponentially and in just the past few years we've gotten powerful tools like CRISPR and mRNA vaccines. Meanwhile the average person is now twice as rich as in 1970, and life expectancy has increased by 15 years (and the numbers are not much lower if we restrict our attention just to developed countries).

Perhaps we should be content; maybe Peter Thiel should stop complaining now that we have 280 characters? After all, the problem is not that things are failing, but that they might be improving slower than they could be. That hardly seems like the end of the world. So why should we focus on technological progress? Has it really slowed? And how can we model it? I discuss these questions in another post. In this post, however, I will move straight onto Hall's favourite topic.

Cool technology

Flying cars

You might assume the case for flying cars looks something like this:

You get to places very fast.
Very cool.

However, there's a deeper case to be made for flying cars (or rapid transportation in general), and it starts with the observation that barefoot-walkers in Zambia tend to spend an hour or so a day travelling. Why is this interesting? Because this is the same as the average duration in the United States (of course Hall's other example is the US) or any other society.

Flying cars aren't about the speed – they're about the distance that this speed allows, given universal human preferences for daily travel duration. Cars on the road do about 60 km/h on average for any trip ("you might think that you could do better for a long trip where you can get on the highway and go a long way fast", Hall writes, but "the big highways, on the average, take you out of your way by an amount that is proportional to the distance you are trying to go"). A flying car that goes five times faster lets you travel within twenty-five times the area, potentially opening up a lot of choice.
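The area claim is just the geometry of a circle: reachable radius scales with speed, and area scales with radius squared. A minimal sketch, assuming straight-line travel, no overhead time, and a half-hour one-way trip (the half hour is my illustrative split of the hour-a-day budget, not a figure from Hall):

```python
import math

def reachable_area_km2(speed_km_h: float, one_way_hours: float) -> float:
    """Area of the circle reachable in a straight line at a constant speed."""
    radius_km = speed_km_h * one_way_hours
    return math.pi * radius_km ** 2

ONE_WAY_HOURS = 0.5                      # assumption: hour-a-day budget split into two trips
CAR_SPEED_KM_H = 60.0                    # average road speed quoted in the text
FLYING_SPEED_KM_H = 5 * CAR_SPEED_KM_H   # "five times faster"

car_area = reachable_area_km2(CAR_SPEED_KM_H, ONE_WAY_HOURS)
flying_area = reachable_area_km2(FLYING_SPEED_KM_H, ONE_WAY_HOURS)
area_ratio = flying_area / car_area

print(f"car: {car_area:,.0f} km^2, flying car: {flying_area:,.0f} km^2, ratio: {area_ratio:.0f}x")
```

The ratio is 25 whatever time budget you pick, because the time cancels out; only the speed multiple matters.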

Hall goes through some calculations about the utilities of different time-to-travel versus distance functions, given empirical results from travel theory, to produce this chart (which I've edited to improve the image quality and convert units) as a summary:

(The overhead time means how long it takes to transition into flying mode, for example if you have to attach wings to it, or drive to an airport to take off.)

Even a fairly lame flying car would easily be three times more valuable than a regular car, mainly by giving you more choice and therefore letting you visit places that you like more.

In terms of what a flying car would actually look like, you have several options. Helicopters are obvious, but they are about ten times the price of cars, mechanically complex (and with very low manufacturing tolerances), and limited by aerodynamics (the advancing blade pushes against the sound barrier, and the retreating one pushes against generating too little lift due to how slowly it moves) to a speed of 250 km/h or so.

Historically, many promising flying car designs that actually flew were autogyros, which generate thrust with a propeller but lift through an unpowered freely-rotating helicopter-like rotor. They generally can't take off vertically, but can land in a very small space.

Another design is a VTOL (vertical take-off and landing) aircraft. Some have been built and used as fighter jets, but they've gained limited use because they're slower and less manoeuvrable than conventional fighters and have less room for weapons. However, Hall notes that one experimental VTOL aircraft in particular – the XV-5 – would "have made one hell of a sports car" and its performance characteristics are recognisable as those of a hypothetical utopian flying car. It flew in 1964, but was cancelled because the Air Force wanted something as fast and manoeuvrable as a fighter jet, rather than "one hell of a sports car".

Of current flying car startups, Hall mentions Terrafugia and AeroMobil, which produce traditional gasoline-powered vehicles (both with fuel economies comparable in litres/km to ordinary cars). There's also Volocopter and EHang, both of which produce electric vehicles with constrained ranges.

Hall divides the roadblocks (or should I say NOTAMs?) for flying cars into four categories.

The first is that flying is harder than driving. To test this idea, Hall learned to fly a plane, and concluded that it is considerably harder, but not insurmountably so. Besides, we're not far from self-driving; commercial passenger flights are close to self-piloting already, the existing Volocopter is only "optionally piloted", and the EHang 184 flies itself.

The second is technological. The main challenges here are flying low and slow without stalling (you want to be able to land in small places, at least in emergencies), and reducing noise to manageable levels.

The third is economic. Even though the technology theoretically exists, it may be that we're not yet at a stage where personal flying machines are economically feasible. To some extent this is true; Hall admits that even on the pre-1970 trends in private aircraft ownership, the US private aircraft market would only be something like 30 000 - 40 000 per year (compared to the 2 000 or so that it currently is), about a hundredth of the number of cars sold. The economics means we should expect that the adoption curve is shallow, but not that it's necessarily non-existent.

The final reason is simple: even if you could make a flying car, you wouldn't be allowed to. Everything in aviation is heavily regulated, pushing up costs in a way that, Hall says, leads private pilots to joke about "hundred-dollar burgers". Of course, flying is hard, so you want standards high enough that at the very least you don't have to dodge other people's home-made flying motorbikes as they rain down from the sky, but in Hall's opinion the current balance is wrong.

And it's not just that the balance is wrong, but that the regulations are messed up. For example, making aircraft in the light sports aircraft category would be a great way to experiment with electric flight, but the FAA forbids them from being powered by anything other than a single internal combustion piston engine.

In particular, the FAA "has a deep allergy to people making money with flying machines". If you own a two-seat private aircraft, you can't charge a passenger you take on a flight more than half of the fuel cost, so no air Uber. Until the FAA stopped dragging its feet on drone regulation in 2016, drones were operated under model aircraft rules, and therefore could not be used for anything other than hobby or recreational purposes. Similar rules still apply to ultralights, with one suspicious exception: a candidate for a federal, state, or local election is allowed to pay for a flight.

(And of course, to all these rules it's usually possible to apply for a waiver – so if you're a big company with an army of lawyers, do what you want, but if you're two people in a garage, good luck.)

There's no clear smoking gun of one piece of regulation specifically causing significant harm to flying car innovation. However, the harms of regulation are often a death-by-a-thousand-cuts situation, where a million rules each clip away at what is permissible and each add a small cost. Hall's conclusion is harsh: "It’s clear that if we had had the same planners and regulators in 1910 that we have now, we would never have gotten the family car at all."

One particular effect of flying cars would be to weaken the pull of cities, another topic to which Hall brings a lot of opinions.

City design

"Designing a city whose transportation infrastructure consists of the flat ground between the boxes is insane."

This is true. Most traffic problems would go away if you could add enough levels. However, "[e]ven the recent flurry of Utopia-building projects are still basically rows of boxes sitting on the dirt plus built-in wifi so the self-driving cars can talk to each other as they sit in automated traffic jams".

As usual, Hall spies some sinister human factors lurking behind the scenes, delaying his visions of techno-utopia:

"There is a perverse incentive for bureaucrats and politicians to force people to interact as much as possible, and indeed to interact in contention, as that increases the opportunities for control and the granting of favors and privileges. This is probably one of the major reasons that our cities have remained flat, one-level no-man’s-lands where pedestrians (and beggars and muggers) and traffic at all scales are forced to compete for the same scarce space in the public sphere, while in the private sphere marvels of engineering have leapt a thousand feet into the sky, providing calm, safe, comfortable environments with free vertical transportation."

This is an interesting idea, and I've read enough Robin Hanson to not discount such perverse explanations immediately, but once again I'm not convinced how important this factor is, and Hall, as usual, is happy to paint only in broad strokes.

However, he makes a clearly strong point here:

"Densification proponents often point to an apparent paradox: removing a highway which crosses a community often does not increase traffic on the remaining streets, as the kind of hydraulic flow models used by traffic planners had assumed that it would. On the average, when a road is closed, 20% of the traffic it had handled simply vanishes. Traffic is assumed to be a bad thing, so closing (or restricting) roads is seen as beneficial. Well duh. If you closed all the roads, traffic would go to zero. If you cut off everybody’s right foot and forced them to use crutches, you’d get a lot less pedestrian traffic, too."

Hall takes a liberal principle of being strongly in favour of giving people choice, arguing that the goal of city design and transportation infrastructure should be to maximise how far people can travel quickly, rather than trying to ensure that they don't need to travel anywhere other than the set of choices the all-seeing, all-knowing urban designer saw fit to place nearby. Of course, once again flying cars are the best:

"The average American commute to work, one way by car, ranges from 20 minutes to half an hour (the longer times in denser areas). This gives you a working radius of about 15 miles [= 24 km], or [1800 square kilometres] around home to find a workplace (or around work to find a home). With a fast VTOL flying car, you get a [240-kilometre] radius or [180 thousand square kilometres] of commutable area. Cars, trucks, and highways were clearly one of the major causes of the postwar boom. It isn’t perhaps realized just how much the war on cars contributed to the great stagnation—or how much flying cars could have helped prolong the boom."

Nuclear power

I discuss nuclear power at length in another post.

Space travel?

What about the classic example of supposedly stalled innovation – we were on the moon in 1969, and won't return until at least 2024?

"With space travel, there’s a pretty straightforward answer: the Apollo project was a political stunt, albeit a grand and uplifting one; there was no compelling reason to continue going to the moon given the cost of doing so."

The general curve of space progress seems to be over-achievement relative to technological trends in the 60s, followed by stagnation, not because the technology is impossible – we did go to the moon after all – but because it just wasn't economical. Only now, with private space companies like SpaceX and Rocket Lab actually making a business out of taking things to space outside the realm of cosy cost-plus government contracts, is innovation starting to pick up again.

(In the past ten years, we've seen the first commercial crewed spacecraft, reuse of rocket stages, the first methane-fuelled rocket engine ever flown, the first full-flow staged-combustion rocket engine ever flown, and the first liquid-fuelled air-launched orbital rocket, just to pick some examples.)

Hall has some further comments about space. First, in this passage he shows an almost-religious deference to trend lines:

"As you can see from the airliner cruising speed trend curve, we shouldn’t have expected to have commercial passenger space travel yet, even if the Great Stagnation hadn’t happened."

I don't think it makes sense to take a trend line for atmospheric flight speeds and use that to estimate when we should have passenger space travel; the physics is completely different, and in particular speeds are very constrained in orbit (you need to go 8 km/s to stay in orbit, and you can't go faster around the Earth without constant thrusting to stop yourself from flying off – something Hall clearly understands, as he explains it more than once).

Secondly, he is of course in favour of everything high-energy and nuclear.

For example: Project Orion was an American plan for a spacecraft powered (potentially from the ground up, rather than just in space) by throwing nuclear bombs out the back and riding the plasma from the explosions. This is a good contender for the stupidest-sounding idea that actually makes for a solid engineering plan; it's a surprisingly feasible way of getting sci-fi performance characteristics from your spacecraft. Other feasible methods have either far lower thrust (like ion engines, meaning that you can't use them to take off or land), or have far lower exhaust velocity (which means much more of your spacecraft needs to be fuel). The obvious argument against Orion, at least for atmospheric launch, is the fallout, but Hall points out it's actually not that bad – the number of additional expected cancer deaths from radiation per launch is "only" in the single digits, and that's under a very conservative linear no-threshold model of radiation dangers, which is likely wrong. (The actual reasons for cancellation weren't related to radiation risks, but instead the prioritisation of Apollo, the Partial Test Ban Treaty of 1963 that banned atmospheric nuclear tests, and the fact that no one in the US government had a particularly pressing need to put a thousand tons into orbit.) Hall also mentions an interesting fact about Orion that I hadn't seen before: "the total atmospheric contamination for a launch was roughly the same no matter what size the ship; so that there would be an impetus toward larger ones" – perhaps Orion would have driven mass space launch.

A more controlled alternative to bombing yourself through space is to use a nuclear reactor to heat up propellant in order to expel it out the back of your rocket at high speeds, pushing you forwards. The main limit with these designs is that you can't turn the heat up too much without your reactor blowing up. Hall's favoured solution is a direct fission-to-jet process, where the products of your nuclear reaction go straight out the engine without all this intermediate fussing around with heating the propellant. A reaction that converts a proton and a lithium-7 atom into 2 helium nuclei would give an exhaust velocity of 20 Mm/s (7% of the speed of light), which is insane.
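The 20 Mm/s figure can be roughly reproduced from the energy the reaction releases. This is a sketch under stated assumptions, not Hall's calculation: the energy release of about 17.35 MeV and the helium-4 rest energy are textbook values I'm supplying, the energy is assumed to split evenly between the two helium nuclei, and the non-relativistic kinetic energy formula is used.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light
Q_MEV = 17.35               # assumed energy release of p + Li-7 -> 2 He-4 (not from the book)
ALPHA_REST_MEV = 3727.38    # rest energy of a helium-4 nucleus (not from the book)

# Assume the released energy is shared equally by the two helium nuclei.
kinetic_per_alpha_mev = Q_MEV / 2

# Non-relativistic: KE = (1/2) m v^2, so v/c = sqrt(2 * KE / (m c^2)).
beta = math.sqrt(2 * kinetic_per_alpha_mev / ALPHA_REST_MEV)
exhaust_velocity_m_per_s = beta * C_M_PER_S

print(f"v/c = {beta:.3f}, v = {exhaust_velocity_m_per_s / 1e6:.1f} Mm/s")
```

Under these assumptions the result lands at about 7% of the speed of light, in line with the number quoted above.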

To give some perspective: let's say your design parameters are that you have a 10-ton spacecraft, of which 1 ton can be fuel. With chemical rocket technology, this gives you a little toy with a total ∆V of some 400 m/s, meaning that if you light it up and let it run horizontally along a frictionless train track, it'll break the sound barrier by the time it's out of fuel, but it can't take you from an Earth-to-moon-intercept trajectory to a low lunar orbit even with the most optimal trajectories. With the proton + lithium-7 process Hall describes, your 10% fuel, 10-ton spaceship can accelerate at 1G for two days. If you want to go to Mars, instead of this whole modern business of waiting for the orbital alignment that comes once every 26 months and then doing a 9-month trip along the lowest-energy orbit possible, you can almost literally point your spaceship at Mars, accelerate yourself to a speed of 1 000 km/s over a day (for comparison, the speeds of the inner planets in their orbits are in the tens of kilometres per second range), coast for maybe a day at most, and then decelerate for another day. For most of the trip you get free artificial gravity because your engine is pushing you so hard. This would be technology so powerful even Hall feels compelled to tack on a safety note: "watch out where you point that exhaust jet".
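These numbers follow from the Tsiolkovsky rocket equation, ∆V = v_e · ln(m_wet / m_dry). A minimal sketch: the 4 km/s chemical exhaust velocity is my assumption (roughly what a good chemical engine manages), the 20 Mm/s figure is the one quoted above, and the burn time assumes the engine is throttled to hold a constant 1G.

```python
import math

G0 = 9.81                  # standard gravity, m/s^2
SECONDS_PER_DAY = 86_400

def delta_v(exhaust_velocity_m_s: float, wet_mass: float, dry_mass: float) -> float:
    """Tsiolkovsky rocket equation: total velocity change of an ideal rocket."""
    return exhaust_velocity_m_s * math.log(wet_mass / dry_mass)

WET_TONS, DRY_TONS = 10.0, 9.0   # 10-ton ship, 1 ton of which is fuel
CHEMICAL_VE = 4_000.0            # assumption: typical chemical exhaust velocity
FISSION_JET_VE = 20e6            # exhaust velocity quoted in the text

chemical_dv = delta_v(CHEMICAL_VE, WET_TONS, DRY_TONS)
fission_dv = delta_v(FISSION_JET_VE, WET_TONS, DRY_TONS)

# At a constant 1G, velocity change is simply g * t.
burn_days_at_1g = fission_dv / G0 / SECONDS_PER_DAY
speed_after_one_day_km_s = G0 * SECONDS_PER_DAY / 1e3

print(f"chemical dV: {chemical_dv:.0f} m/s")
print(f"fission-jet dV: {fission_dv / 1e3:.0f} km/s, or {burn_days_at_1g:.1f} days at 1G")
print(f"speed after one day at 1G: {speed_after_one_day_km_s:.0f} km/s")
```

With these inputs the chemical rocket gets a little over 400 m/s, while the fission jet gets roughly two and a half days of 1G thrust, enough for a day of acceleration and a day of deceleration.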

Nanotechnology!

Imagine if machine pieces could not be made on a scale smaller than a kilometre. Want a gear? Each tooth is a 1km x 1km x 1km cube at least. Want to build something more complicated, say an engine? If you're in a small country, it may well be a necessarily international project, and also better keep it fairly flat or it won't fit within the atmosphere. Want to cut down a single tree? Good luck.

This is roughly the scale at which modern technology operates compared to the atomic scale. Obviously this massively cuts down on what we can do. Having nanotechnology that lets us rearrange atoms on a fine level, instead of relying on astronomically blunt tools and bulk chemical reactions, could put the capabilities of physical technology on the kind of exponential Moore's law curve we've seen in information technology.

There are some problems in the way. As you get to smaller and smaller scales:

matter stops being continuous and starts being discrete (and therefore for example oil-based lubrication stops working);
the impact of gravity vanishes but the impact of adhesion increases massively;
heat dissipation rates increase;
everything becomes springy and nothing is stiff anymore; and
hydrogen atoms (other atoms are too heavy) can start doing weird quantum stuff like tunnelling.

Also, how do we even get started? If all we have are extremely blunt tools, how do you make sharp ones?

There are two approaches. The first, the top-down approach, was suggested in a 1959 talk by Richard Feynman, which is credited as introducing the concept of nanotechnology. First, note that we currently have an industrial tool-base at human scales that is, in a sense, self-replicating: it requires human inputs, but we can draw a graph of the dependencies and see that we have tools to make every tool. Now we take this tool-base, and create an analogous one at one-fourth the scale. We also create tools that let us transfer manipulations – the motions of a human engineer's hands, for example – to this smaller-scale version (today we can probably also automate large parts of it, but this isn't crucial). Now we have a tool-base that can produce itself at a smaller scale, and we can repeat the process again and again, making adjustments in line with the above points about how the engineering must change. If each step is one-fourth the previous, 8 iterations will take us from a millimetre-scale industrial base to a tens-of-nanometres-scale one.
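The arithmetic of the repeated shrinking is easy to check. A small sketch, taking the millimetre starting scale and the one-fourth step from the paragraph above:

```python
START_SCALE_M = 1e-3   # millimetre-scale tool-base
SHRINK_FACTOR = 4      # each generation is one-fourth the size of the last
ITERATIONS = 8

total_shrink = SHRINK_FACTOR ** ITERATIONS
final_scale_nm = START_SCALE_M / total_shrink * 1e9

print(f"total shrink: {total_shrink}x, final scale: {final_scale_nm:.0f} nm")
```

Eight generations shrink the tool-base by a factor of 65 536, which takes a millimetre down to roughly 15 nm.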

The other approach is bottom-up. We already have some ability to manipulate things on the single-digit nanometre scale: the smallest features on today's chips are in this range, we have atomic-scale microscopes that can also manipulate atoms, and of course we're surrounded by massively complicated nanotechnology called organic life that comes with pre-made nano-components. Perhaps these tools let us jump straight to making simple nano-scale machines, and a combination of these simple machines and our nano-manipulation tools lets us eventually build the critical self-sustaining tool-base at the atomic level.

Weather machines?!

Here's one thing you could do with nanotechnology: make 5 quintillion 1 cm controllable hydrogen balloons with mirrors, release them into the atmosphere, and then set sunlight levels to be whatever you want (without nanotechnology, this might also be doable, but nanotechnology lets you make very thin balloons and therefore removes the need to strip-mine an entire continent for the raw materials).
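A rough sanity check on that count, assuming (this is my guess at where the number comes from, not something Hall is quoted as saying) one balloon with a 1 cm² footprint for every square centimetre of the Earth's surface. The Earth radius is a standard figure, not from the book.

```python
import math

EARTH_RADIUS_M = 6.371e6           # mean Earth radius (standard value)
BALLOON_FOOTPRINT_M2 = 0.01 ** 2   # assumption: each balloon covers 1 cm x 1 cm

earth_surface_m2 = 4 * math.pi * EARTH_RADIUS_M ** 2
balloon_count = earth_surface_m2 / BALLOON_FOOTPRINT_M2

print(f"{balloon_count:.1e} balloons")
```

Under that assumption the count comes out at about 5 × 10^18, i.e. roughly 5 quintillion.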

Hall calls this a weather machine, and it is exactly what it says on the tin, both on a global and local level. He estimates that it would double global GDP by letting regions set optimal temperatures, since "you could make land in lots of places on the earth, such as Northern Canada and Russia, as valuable as California". Of course, this is assuming that we don't care about messing up every natural ecosystem and weather pattern on the planet, but if the machine is powerful enough we might choose to keep the still-wild parts of the world as they are. I don't know if this would work, though; sunlight control alone can do a lot to the weather, but perhaps you'd need something different to avoid, for example, the huge winds from regional temperature differences? However, with a weather machine, the sort of subtle global modifications needed to offset the roughly 1 watt per square metre of extra radiative forcing that anthropogenic emissions have caused would be trivial.

Weather machines are scary, because we're going to need very good institutions before that sort of power can be safely wielded. Hall thinks they're coming by the end of the century, if only because of the military implications: not only could you destroy agriculture wherever you want, but the mirrors could also focus sunlight onto a small spot. You could literally smite your enemies with the power of the sun.

Don't want things in the atmosphere, but still want to control the climate? Then put up sunshades into orbit, incentivising the development of a large-scale orbital launch infrastructure at the same time that we can afterwards use to settle Mars or whatever. As a bonus, put solar panels on your sunshade satellites, and you can generate more power than humanity currently uses.

As always, nothing is too big for Hall. He goes on to speculate about a weather machine Dyson sphere at half the width of the Earth's orbit. Put solar panels on it, and it would generate enormous amounts of power. Use it as a telescope, and you could see a phone lying on the ground on Proxima Centauri b. Or, if the Proxima Centaurians try to invade, you can use it as a weapon and "pour a quarter of the Sun’s power output, i.e. 100 trillion terawatts, into a [15-centimetre] spot that far away, making outer space safe for democracy."

Flying cities?!?

And because why the hell not: imagine a 15-kilometre airplane shaped like a manta ray and with a thickness of a kilometre (so the Burj Khalifa fits inside), with room for 10 million people inside. It takes 200 GW of power to stay flying – equivalent to 4 000 Boeing 747s – which could be provided by a line of nuclear power plants every 100 metres or so running along the back. This sounds like a lot, but Hall helpfully points out the reactors would only be 0.01% of the internal volume, so you could still cluster Burj Khalifas inside to your heart's content, and the energy consumption comes out to only 20 kW per person, about where we'd be today if energy use had continued growing on pre-1970s trends.
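The per-person and per-747 figures are simple divisions. A quick sketch using the numbers above, plus one assumption of mine: that the line of reactors runs the full 15 km length at 100 m spacing.

```python
TOTAL_POWER_W = 200e9       # power needed to stay flying
POPULATION = 10_000_000
BOEING_747_COUNT = 4_000
CITY_LENGTH_M = 15_000      # assumption: the reactor line spans the full length
REACTOR_SPACING_M = 100

power_per_person_kw = TOTAL_POWER_W / POPULATION / 1e3
power_per_747_mw = TOTAL_POWER_W / BOEING_747_COUNT / 1e6
reactor_count = CITY_LENGTH_M // REACTOR_SPACING_M
power_per_reactor_gw = TOTAL_POWER_W / reactor_count / 1e9

print(f"{power_per_person_kw:.0f} kW per person, {power_per_747_mw:.0f} MW per 747")
print(f"{reactor_count} reactors at {power_per_reactor_gw:.2f} GW each")
```

That gives 20 kW per person and 50 MW per 747, and under the spacing assumption about 150 reactors of a bit over 1 GW each.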

If you don't want to go to space but still want to leave the Earth untouched, this is one solution, as long as you don't mind a lot of very confused birds.

Technology is possible, but has risks

I worry that Where is my Flying Car? easily leaves the impression that everything Hall talks about is part of some uniform techno-wonderland, which, depending on your prior about technological progress, is somewhere between certainly going to happen and permanently relegated to the dreams of mad scientists. Hall does not work to dispel this impression: he goes back and forth between talking about how practical flying cars are and exotic nuclear spacecraft, or between reasonable ideas about traffic layout in cities and far-off speculation about city-sized airplanes. Credible world-changing technologies like nanotechnology easily seem like just another crazy thought Hall sketched out on the back of the envelope and could not stop being enthusiastic about.

So should we take Hall's more grounded speculation seriously and ignore the nano-nuclear-space-megapolises? I think this would be the wrong takeaway. First, I'm not sure Hall's crazy speculation is crazy enough to capture possible future weirdness within it; he restricts himself mainly to physical technologies, and thus leaves out potentially even weirder things like a move to virtual reality or the creation of superhuman intelligence (whether AI or augmented humans).

Second, Hall does have a consistent and in some way realist perspective: if you look at the world – not at the institutions humans have built, or whatever our current tech toolbox contains, but at the physical laws and particles at our disposal – what do you come up with?

After all, our world is ultimately not one of institutions and people and their tools. The "strata" go deeper, until you hit the bedrock of fundamental physics. We spend most of our time thinking about the upper layers, where the underlying physics is abstracted out and the particles partitioned into things like people and countries and knowledge. This is for good reason, because most of the time this is the perspective that lets you best think about things important to people. Occasionally, however, it's worth taking a less parochial perspective by looking right down to the bedrock, and remembering that anything that can be built on that is possible, and something we may one day deal with.

This perspective should also make clear another fact. The things we care about (e.g. people) exist many layers of abstraction up from the fundamental physics, and are therefore fragile, since they depend on the correct configuration of all levels below. If your physical environment becomes inhospitable, or an engineered virus prevents your cells from carrying out their function, the abstraction of you as a human with thoughts and feelings will crash, just like a program crashes if you fry the circuits of the computer it runs on.

So there are risks, new ones will appear as we get better at configuring physics, and stopping civilisation from accidentally destroying itself with some new technology is not something we're automatically guaranteed to succeed at.

Hall does not seem to recognise this. Despite all his talk about nanotechnology, the grey goo scenario of self-replicating nanobots going out of control and killing everyone doesn't get a mention. As far as I'm aware, there's no strong theoretical reason for this to be impossible – nanobots good at configuring carbon/oxygen/hydrogen atoms are a very reasonable sort of nanobot, and I can't help but notice that my body is mainly carbon, oxygen, and hydrogen atoms. "What do you replace oil lubrication with for your atomic scale machine parts" is a worthwhile question, as Hall notes, but I'd like to add that so is the problem of not killing everyone.

Hall does mention the problem of AI safety:

"The latest horror-industry trope is right out of science fiction [...]. People are trying to gin up worries that an AI will become more intelligent than people and thus be able to take over the world, with visions of Terminator dancing through their heads. Perhaps they should instead worry about what we have already done: build a huge, impenetrably opaque very stupid AI in the form of the administrative state, and bow down to it and serve it as if it were some god."

What's this whole thing with arguments of the form "people worry about AI, but the real AI is X", where X is whatever institution the author dislikes? Here's another example from a different political perspective (by sci-fi author Ted Chiang, whose fiction I enjoy). I don't think this is a useless perspective – there is an analogy between institutions that fail because their design optimises for the wrong thing, and the more general idea of powerful agents accidentally designed to optimise for the wrong thing – but at the end of the day, surprise surprise, the real AI is a very intelligent computer program.

Hall also mentions he "spent an entire book (Beyond AI) arguing that if we can make robots smarter than we are, it will be a simple task to make them morally superior as well." This sounds overconfident – morality is complicated, after all – but I haven't read it.

As for climate change, Hall acknowledges the problem but justifies largely dismissing it by citing “[t]he actual published estimates for the IPCC’s worst case scenario, RCP8.5, [which] are for a reduction in GDP of between 1% and 3%". This is true ... if you only consider the United States! (The EU is in the same range but the global estimates range up to 10%, because of a disproportionate effect on poor tropical countries.) As the authors of that very report also note, these numbers don't take into account non-market losses. If Hall wants to make an argument for techno-optimistic capitalism, he should consider taking more care to distinguish himself from the strawman version.

It's not the technology, stupid!

Hall does not think that we'd have all the technologies mentioned above if only technological progress had not "stagnated". The things he expects could've happened by now given past trends are:

The technological feasibility of flying cars would be demonstrated and sales would be on the rise; Hall goes as far as to estimate the private airplane market in the US could have been selling 30k-40k planes per year (a fairly tight confidence interval for something this uncertain); compare with the actual US market today, which sells around 16 million cars and a few thousand private aircraft per year.
Demonstrated examples of multi-level cities and floating cities.
Chemical spacecraft technology would be about where it is now, but with some chance that government funding would have resulted in Project Orion-style nuclear launch vehicles.
Nanotechnology: basic things like ammonia fuel cells might exist, but not fancier things like cell repair machines or universal fabricators.
Nuclear power would generate almost all electricity, and hence there would be a lot less CO2 in the atmosphere (this study estimates 174 billion fewer tons of CO2 had reasonable nuclear trends continued, but Hall optimistically gives the number as 500 billion tons).
AI and computers at the same level as today.
A small probability that something unexpected along the lines of cold fusion would have turned out to work and been commercialised.
A household income several times larger than today.

So what went wrong? Hall argues:

"The faith in technology reflected in Golden Age SF and Space Age America wasn’t misplaced. What they got wrong was faith in our culture and bureaucratic arrangements."

He gives two broad categories of reasons: concrete regulations, and a more general cultural shift from hard technical progress to worrying and signalling.

Regulation ruins everything?

Hall does not like regulation. He estimates that had regulation not grown as it did after 1970, the increased GDP growth might have been enough to make household incomes 1.5 to 2 times higher than they are today in the US. I can find some studies saying similar things – here is one claiming 0.8% lower GDP growth per year since 1980 due to regulation, which would imply today's economy would be about 1.3 times larger had this drag on growth not existed. As far as I can tell, these estimates also don't take into account the benefits of regulation, which are sometimes massive (e.g. banning lead in gasoline). However, I think most people agree that regardless of how much regulation there should be, it could be a lot smarter.

Hall's clearest case for regulation having a big negative impact on an industry is private aviation in the United States, which crashed around 1980 after more stringent regulations were introduced. The number of airplane shipments per year dropped something like six-fold and never recovered.

A much bigger example is nuclear power, which I will discuss in an upcoming post, and which Hall also has plenty to say about.

Strangely, Hall misses perhaps the most obvious case in modern times: GMOs pointlessly being almost regulated out of existence, a story told well in Mark Lynas' Seeds of Science (my review here). Perhaps this is because of Hall's focus on hard sciences, or his America-centrism (GMO regulation is worse in the EU than in the United States).

And speaking of America-centrism, the biggest question I had is why even if the US is bad at regulation, no country decides to do better and become the flying car capital of the world. Perhaps good regulation is hard enough that no one gets it right? Hall makes no mention of this question, though.

He does, however, throw plenty of shade on anything involving centralisation. For example:

"Unfortunately, the impulse of the Progressive Era reformers, following the visions of [H. G.] Wells (and others) of a “Scientific Socialism,” was to centralize and unify, because that led to visible forms of efficiency. They didn’t realize that the competition they decried as inefficient, whether between firms or states, was the discovery procedure, the dynamic of evolution, the genetic algorithm that is the actual mainspring of innovation and progress."

He brings some interesting facts to the table. For example, an OECD survey found a 0.26 correlation between private spending on research & development and economic growth, but a -0.37 between public R&D and growth. Here's Hall's once again somewhat dramatic explanation:

“Centralized funding of an intellectual elite makes it easier for cadres, cliques, and the politically skilled to gain control of a field, and they by their nature are resistant to new, outside, non-Ptolemaic ideas. The ivory tower has a moat full of crocodiles.”

He backs this up with his personal experience of US government spending on nanotechnology leading to a flurry of scientists trying to claim that their work counted as nanotechnology (up to and including medieval stained glass windows), as well as trying to discredit anything that actually was nanotechnology, to make sure that the nanotechnologists wouldn't steal more federal funding in the future.

Studies, not surprisingly, find that the issue is more complicated (see for example here, which includes a mention of the specific survey Hall references).

Hall also includes a graph of economic growth vs the Fraser Institute's economic freedom score in the United States. I've created my own version below, including some more information than Hall does:

In general, it seems sensible to expect economic freedom to increase GDP: the more a person's economic choices are limited, the more likely the limitations are to prevent them from taking the optimal action (the main counterexample being if optimal actions for an individual create negative externalities for society). We can also see that this is empirically the case – developed countries tend to have high economic freedom. However, in using this graph as clear evidence, I think Hall is once again trying to make too clear a case on the basis of one correlation.

Effective decentralised systems, whether markets or democracy, are always prone to attack by people who claim that things would be better if only we let them make the rules. Maybe it takes something of Hall's engineer mindset to resist this impulse and see the value of bloodless systems and of general design principles like feedback and competition. (And perhaps Hall should apply this mindset more when evaluating the strength of evidence for his economic ideas.)

As for what the future of societal structure looks like, Hall surprisingly manages to avoid proposing flying-car-ocracy:

"[It] may well be possible to design a better machine for social and economic control than the natural marketplace. But that will not be done by failing to understand how it works, or by adopting the simplistic, feedback-free methods of 1960s AI programs. And if ever it is done, it will be engineers, not politicians, who do it."

He goes further:

"As a futurist, I will go out on a limb and make this prediction: when someone invents a method of turning a Nicaragua into a Norway, extracting only a 1% profit from the improvement, they will become rich beyond the dreams of avarice and the world will become a much better, happier, place. Wise incorruptible robots may have something to do with it."

Risk perception and signalling

Hall's second reason for us not living up to expectations for technological progress is cultural. He starts with the idea of risk homeostasis in psychology: everyone has some tolerance for risk, and will seek to be safer when they perceive current risk to be higher, and take more risks when they perceive current risk to be lower. In developed countries, risks are of course ridiculously low compared to historical levels, so most people feel safer than ever. Some start skydiving in response, but Hall suggests there's another effect that happens when an entire society finds itself living below its risk tolerance:

"One obvious way [to increase perceived risk] is simply to start believing scare stories, from Corvairs to DDT to nuclear power to climate change. In other words, the Aquarian Eloi became phobic about everything specifically because we were actually safer, and needed something to worry about."

I know what you're thinking – what the hell are "Aquarian Eloi"? Hall likes to come up with his own terms for things, and in this case he is making a reference to H. G. Wells' The Time Machine, in which descendants of humanity live out idle and dissolute lives (modelled on England's idle rich of the time), in order to label what he claims is the modern zeitgeist. Yes, this book is weird at times.

Another cultural idea he touches on is increased virtue signalling. Using the idea of Maslow's hierarchy of needs, he explains that as more and more of the population is materially well-off, more people invest more effort into self-actualisation. Some of this is productive, but, humans being humans, a lot of this effort goes into trying to signal how virtuous you are. Of course, there's nothing inherently wrong with that, as long as your virtue signalling isn't preventing other people climbing up from lower levels of Maslow's hierarchy – or, Hall would probably add, from building those flying cars.

Environmentalism vs Greenism

A particular sub-case of cultural change that Hall has a lot to say about is the "Green religion", something he distinguishes (though sometimes with not enough care) from perfectly reasonable desires "to live in a clean, healthy environment and enjoy the natural world".

This ideological, fear-driven and generally anti-science faction within the environmentalist movement is much the same thing as what Steven Pinker calls "Greenism", which I talked about in my review of Enlightenment Now (search for "Greenism") and also features in my review of Mark Lynas' Seeds of Science (search for "torpedoes"). Unlike Lynas or even Pinker, Hall does not hold back when it comes to criticising this particular strand of environmentalism. He explains it as an outgrowth of the risk-averseness and virtue signalling trends described above. The "Green religion", he claims, is now the "default religion of western civilization, especially in academic circles", and "has developed into an apocalyptic nature cult". To explain its resistance to progress and improving the human condition, he writes:

"It seems likely that the fundamentalist Greens started with the notion that anything human was bad, and ran with the implication that anything that was good for humans was bad. In particular, anything that empowered ordinary people in their multitudes threatened the sanctity of the untouched Earth. The Green catechism seems lifted out of classic Romantic-era horror novels. Any science, any engineering, the “acquirement of knowledge,” can only lead to “destruction and infallible misery.” We must not aspire to become greater than our nature."

There are troubling tendencies in ideological Greenism (as there are with anything ideological), but I think "apocalyptic nature cult" takes it too far, and as a substitute religion for the west, it has some formidable competitors. Hall is right to point out the tension between improving human welfare and Greenist desires to limit humans, but I'd bet that the driving factor isn't direct disdain for humans, but rather the sort of sacrificial attitudes that are common in humans (consider the people who went around whipping themselves during the Black Death to try to atone for whatever God was punishing them for). Probably there's some part of human psychology or our cultural heritage that makes it easy to jump to sacrifice, disparaging ourselves (or even all of humanity), and repentance as the answer to any problem. While this is a nobly selfless approach, it's just less effective than, and sometimes in opposition to, actually building things: developing new technologies, building clean power plants, and so on.

Hall also goes too far in letting the Greenists tar his view of the entire environmentalist movement. Not only is climate change a more important problem than the 1-3% estimated GDP loss for the US suggests, but you'd think that the sort of big technical innovation that is happening with clean tech would be exactly the sort of progress Hall would be rooting for.

Hall does have an environmentalist proposal, and of course it involves flying cars:

"The two leading human causes of habitat destruction are agriculture and highways—the latter not so much by the land they take up, but by fragmenting ecosystems. One would think that Greens would be particularly keen for nuclear power, the most efficient, concentrated, high-tech factory farms, and for ... flying cars. "

[Ellipsis in original]

Energy matters!

Although Hall is partly blinded by his excessive anti-Greenism, there is one especially important correction to some strands of environmentalist thinking that he makes well: cheap energy really matters and we need more of it (and energy efficiency won't save the day).

Above, I used the stagnation in energy use per capita as an example of things going wrong. This may have raised some eyebrows; isn't it good that we're not consuming more and more energy? Don't we want to reduce our energy consumption for the sake of the environment?

First, it is obviously true that we need to reduce the environmental impact of energy generation. Decoupling GDP growth from CO2 emissions is one of the great achievements of western countries over the past decades, and we need to massively accelerate this trend.

However, our goal, if we're liberal humanists, should be to give people choices and let them lead happy lives (while applying the same considerations to any sentient non-human beings, and ideally not wrecking irreplaceable ecosystems). In our universe, this means energy. Improvements in the quality of life over history are, to a large extent, improvements in the amount of energy each person has access to. This is very true:

“Poverty is ameliorated by cheap energy. Bill Gates, nowadays perhaps the world’s leading philanthropist, puts it, “If you could pick just one thing to lower the price of—to reduce poverty—by far you would pick energy.”"

Even in the United States, "[e]nergy poverty is estimated to kill roughly 28,000 people annually in the US from cold alone, a toll that falls almost entirely on the poor".

Climate change cannot be solved by reducing energy consumption, because there are six billion people in the world who have not reached western living standards and who should be brought up to them as quickly as possible. This will take energy. What we need is to simultaneously massively increase the amount of energy that humanity uses, while also switching over to clean energy. If you think only one of these is enough, you have either failed to understand the gravity of the world's poverty situation or the gravity of its environmental one.

(Energy efficiency matters, because all else being equal, it reduces operating costs. It is near-useless for solving emissions problems, however, because the more efficiently we can use energy, the more of it we will use. Hall illustrates this with a thought experiment of a farmer who uses a truck to carry one crate of tomatoes at a time from their farm to a customer, and whose only expense is fuel for the truck. Double its fuel efficiency, and it's economical to drive twice as far, and hence service four times as many customers (assuming customer number is proportional to reachable area), plus each trip is twice as long on average. The net result is that the 2x increase in efficiency leads to 8x more kilometres driven and hence 4x higher fuel consumption. The general case is called Jevons paradox.)
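The arithmetic in the truck example is easy to check. Here is a minimal sketch; the function name and the exact scaling assumptions (delivery radius proportional to efficiency, customers spread evenly over the reachable area) are my own reading of the thought experiment rather than anything from Hall's book.

```python
def truck_scaling(efficiency_gain):
    """Scale factors for the tomato-truck thought experiment.

    Assumes (as in the thought experiment) that fuel is the only cost, that the
    economical delivery radius grows in proportion to fuel efficiency, and that
    customers are spread evenly over the reachable area.
    """
    radius = efficiency_gain              # can afford to drive this many times as far
    customers = radius ** 2               # customers scale with area, i.e. radius squared
    mean_trip_length = radius             # the average trip gets longer in proportion
    distance = customers * mean_trip_length   # total kilometres driven
    fuel = distance / efficiency_gain     # fuel burned = distance / efficiency
    return {"customers": customers, "distance": distance, "fuel": fuel}


result = truck_scaling(2)
print(result)  # {'customers': 4, 'distance': 8, 'fuel': 4.0}
```

Under these assumptions fuel use grows with the square of the efficiency gain, which is why better efficiency alone does not reduce consumption here.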

So yes, we need energy, most urgently in developing countries, but the more development and deployment of new energy sources there is, the cheaper they will be for everyone – consider Germany's highly successful subsidies for solar power – so developed countries have a role to play as well. (Also, are we sure there would be no human benefits to turning the plateauing in developed country energy use back into an increase?)

You'd think this is obvious. Unfortunately it isn't. In a section titled "AAUGHH!!", Hall presents these quotes:

“The prospect of cheap fusion energy is the worst thing that could happen to the planet. —Jeremy Rifkin

Giving society cheap, abundant energy would be the equivalent of giving an idiot child a machine gun. —Paul Ehrlich

It would be little short of disastrous for us to discover a source of clean, cheap, abundant energy, because of what we might do with it. —Amory Lovins”

They are what leads Hall to say, perhaps with too much pessimism:

"Should [a powerful new form of clean energy] prove actually usable on a large scale, they would be attacked just as viciously as fracking for natural gas, which would cut CO2 emissions in half, and nuclear power, which would eliminate them entirely, have been."

It is good to give people the choice to do what they want, and therefore good to give them as much energy as possible to play with, whether they want it to power the construction of their dream city or their flying car trips to Australia (I do draw the line at Death Stars, though).

Right now we're limited by the wealth of our societies, limiting us to about 10 kW/capita in developed countries, and by the unacceptable externalities of our polluting technology. The right goal isn't to enforce limits on what people can do (except indirectly through the likes of taxes and regulation to correct externalities), but to bring about a world where these limits are higher.

If energy is expensive, people are cheap – lives and experiences are lost for want of a few watts. This is the world we have been gradually dragging ourselves out of since the industrial revolution, and progress should continue. Energy should be cheap, and people should be dear.

Don't panic; build

Where is my Flying Car? is a weird book.

First of all, I'm not sure if it has a structure. Hall will talk about flying cars, zoom off to something completely different until you think he's said all he has to say on them, and just when you least expect it: more flying cars. The same pattern of presentation repeats with other topics. Also, sections begin and sometimes end with a long selection of quotes, including no less than three from Shakespeare.

Second, the ideas. There are the hundred speculative examples of crazy (big, physical) future technologies, the many often half-baked economic/political arguments, the unstated but unmissable America-centrism, and witty rants that wander the border between insightful social critique and intellectualised versions of stereotypical boomer complaints about modern culture.

Also, the cover is this:

Above: ... a joke?

However, I think overall there's a coherent and valuable perspective here. First, Hall is against pointless pessimism. He makes this point most clearly when talking about dystopian fiction, but I think it generalises:

"Dystopia used to be a fiction of resistance; it’s become a fiction of submission, the fiction of an untrusting, lonely, and sullen twenty-first century, the fiction of fake news and infowars, the fiction of helplessness and hopelessness. It cannot imagine a better future, and it doesn’t ask anyone to bother to make one. It nurses grievances and indulges resentments; it doesn’t call for courage; it finds that cowardice suffices. Its only admonition is: Despair more."

Hall's answer to this pessimism is to point out ten billion cool tech things that we could do one day. He veers too much to the techno-optimistic side by not acknowledging any risks, but overall this is an important message. Visions of the future are often dominated by the negatives: no war, no poverty, no death. Someone needs to fill in the positives, and while Hall focuses more on the "what" of it than the "how does it help humans" part, I think a hopeful look at future technologies is a good start.

In addition to being against pessimism about human capabilities, Hall also takes, at least implicitly, a liberal stand by being against pessimism about humans. His answer to "what should we do?" is to give people choice: let them travel far and easily, let them live where they want, let them command vast amounts of energy.

Hall also identifies two ways to keep a civilisation on track in terms of making technological progress and not getting consumed by signalling and politics: growing, and having a frontier.

On the topic of growth, he makes basically the same point as my post on growth and civilisation:

"One of the really towering intellectual achievements of the 20th Century, ranking with relativity, quantum mechanics, the molecular biology of life, and computing and information theory, was understanding the origins of morality in evolutionary game theory. The details are worth many books in themselves, but the salient point for our purposes is that the evolutionary pressures to what we consider moral behavior arise only in non-zero-sum interactions. In a dynamic, growing society, people can interact cooperatively and both come out ahead. In a static no-growth society, pressures toward morality and cooperation vanish; you can only improve your situation by taking from someone else. The zero-sum society is a recipe for evil."

Secondly, the idea of a frontier: something outside your culture that your society presses against (ideally nature, but I think this would also apply to another competing society). This is needed because "[w]ithout an external challenge, we degenerate into squabbling [and] self-deceiving".

"But on the frontier, where a majority of one’s efforts are not in competition with others but directly against nature, self-deception is considerably less valuable. A culture with a substantial frontier is one with at least a countervailing force against the cancerous overgrowth of largely virtue-signalling, cost-diseased institutions."

Frontiers often relate to energy-intensive technologies:

"High-power technologies promote an active frontier, be it the oceans or outer space. Frontiers in turn suppress self-deception and virtue signalling in the major institutions of society, with its resultant cost disease. We have been caught to some extent in a self-reinforcing trap, as the lack of frontiers foster those pathologies, which limit what our society can do, including exploring frontiers. But by the same token we should also get positive feedback by going in in the opposite direction, opening new frontiers and pitting our efforts against nature."

Finally, Hall's book is a reminder that an important measure to judge a civilisation against is its capacity to do physical things. Even if the bulk of progress and value is now coming from less material things, like information technology or designing ever fairer and more effective institutions, there are important problems – covid vaccinations, solving climate change, and building infrastructure, for example – that depend heavily on our ability to actually go out and move atoms in the real world. Let's make sure we continue to get better at that, whether or not it leads to flying cars.

Data science 2

2021-01-22T12:15:00.003+00:00

6.4k words, including equations (about 30 minutes)

See the first post for an introduction.

Monte Carlo methods

In the late 1940s, Stanislaw Ulam was trying to work out the probability of winning in a solitaire variant. After cranking out combinatorics equations for a while, he had the idea that simulating a large number of games starting from random starting configurations with the "fast" computers that were becoming available could be a more convenient method.

At the time, Ulam was working on nuclear weapons at Los Alamos, so he had the idea of using the same principle to solve some difficult neutron diffusion problems, and went on to develop such methods further with John von Neumann (no mid-20th century maths idea is complete without von Neumann's hand somewhere on it). Since this was secret research, it needed a codename, and a colleague suggested "Monte Carlo" after the casino in Monaco. (This group of geniuses managed to break rule #1 of codenames, which is "don't reveal the basic operating principle of your secret project in its codename".)

Ulam used this work to help himself become (along with Edward Teller) the father of the hydrogen bomb. Our purposes here will be a bit more modest.

The basic idea of Monte Carlo methods is just repeated random sampling. Have a way to generate a random variable $X$, but not to generate fancy maths stats like $P(X \in S)$, where $S$ is some subset of the sample space? Fear not – let $f(x)$, for values $x$ that $X$ can take, be 1 if $x \in S$ and 0 otherwise. Then $E(f(X))$ is $P(f(X) = 1) = P(X \in S)$ and we've solved the problem if we can estimate $P(f(X)=1)$. If we can randomly sample values from $X$ (and calculate the function $f$), then this is easy, because we simply sample many values and calculate for what fraction of them $f(X) = 1$.

In general,

$$E(f(X)) \approx \frac{1}{n} \sum_{i=1}^n f(x_i)$$

for large $n$ and with $x_i$ drawn independently at random from $X$, a result that comes from the law of the unconscious statistician (discussed in part 1) once you realise that, as $n$ increases, the fraction of samples equal to any particular value $x$ approaches $P(X=x)$.
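As a rough illustration of this estimator, here is a minimal sketch in Python. The example problem (the probability that two dice sum to at least 10) and all the function names are my own choices for illustration; the only thing taken from the text is the recipe of averaging an indicator function $f$ over random samples.

```python
import random


def monte_carlo_expectation(sample, f, n, rng):
    """Estimate E(f(X)) by averaging f over n independent samples of X."""
    return sum(f(sample(rng)) for _ in range(n)) / n


def two_dice(rng):
    """Draw one value of X: the sum of two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def in_S(x):
    """Indicator function f: 1 if x is in S = {10, 11, 12}, else 0."""
    return 1 if x >= 10 else 0


rng = random.Random(0)          # fixed seed so the run is repeatable
estimate = monte_carlo_expectation(two_dice, in_S, 100_000, rng)
exact = 6 / 36                  # 6 of the 36 equally likely rolls sum to >= 10
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

Swapping in a different `sample` and `f` gives an estimate of any other expectation, as long as you can draw from $X$.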

We can also do integration in a Monte Carlo style. The standard way to integrate a function $f$ is to sample it at uniform points, multiply each sampled value by the distance between the uniform points, and then add everything up. There's nothing special about uniformity though – as the number of samples increases, as long as we make sure to multiply each by the distance to the next sample, the result will converge to the integral.

Above on the left, we see standard integration, with undershoot in pink and overshoot in orange, and Monte Carlo integration, with random samplings, on the right.
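Here is a small sketch of both flavours of randomised integration, assuming we want $\int_0^\pi \sin(x)\,dx = 2$. The first function follows the description above literally (sort random sample points and multiply each sampled value by the distance to the next one); the second is the more common textbook form, which averages the sampled values and multiplies by the interval width. The function names and the test integral are mine.

```python
import math
import random


def sorted_sample_integral(f, a, b, n, rng):
    """Riemann-style sum over n random points: f(x) times distance to the next point."""
    xs = [a] + sorted(rng.uniform(a, b) for _ in range(n)) + [b]
    return sum(f(x) * (x_next - x) for x, x_next in zip(xs, xs[1:]))


def mean_value_integral(f, a, b, n, rng):
    """Standard Monte Carlo integration: interval width times the average of f."""
    return (b - a) * sum(f(rng.uniform(a, b)) for _ in range(n)) / n


rng = random.Random(0)
riemann_style = sorted_sample_integral(math.sin, 0.0, math.pi, 100_000, rng)
mean_style = mean_value_integral(math.sin, 0.0, math.pi, 100_000, rng)
print(f"sorted-sample: {riemann_style:.4f}, mean-value: {mean_style:.4f}, exact: 2")
```

The mean-value form is the one that generalises nicely to many dimensions, since it needs no notion of "the next sample".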

Sometimes a lot of the interesting stuff (e.g. expected value, area in the integral, etc.) comes from a part of the function's domain that's low-probability when values in the domain are generated via $X$. If this happens, you either crank up the $n$ in your Monte Carlo, or get smart about how exactly you sample (this is called importance sampling). If we're smart about this, our randomised integration can be faster than the standard method.
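To make importance sampling concrete, here is a sketch that estimates the tail probability $P(Z > 4)$ for a standard normal $Z$, which is about 3 in 100,000. The choice of problem, and of a normal distribution shifted to the threshold as the sampling distribution, are my own assumptions for illustration. Each sample drawn from the shifted distribution $q$ is re-weighted by $p(x)/q(x)$, the ratio of the original density to the one we actually sampled from.

```python
import math
import random

rng = random.Random(0)
n = 100_000
threshold = 4.0

# Plain Monte Carlo: almost none of the samples land in the region we care about.
naive = sum(1 for _ in range(n) if rng.gauss(0, 1) > threshold) / n


def weight(x):
    """Density ratio p(x)/q(x) for p = N(0, 1) and q = N(threshold, 1)."""
    return math.exp(-threshold * x + threshold ** 2 / 2)


# Importance sampling: draw from q, which puts about half its samples in the tail,
# and correct for the changed distribution with the weights.
total = 0.0
for _ in range(n):
    x = rng.gauss(threshold, 1)
    if x > threshold:
        total += weight(x)
importance = total / n

exact = 0.5 * math.erfc(threshold / math.sqrt(2))  # 1 - Phi(4)
print(f"naive = {naive:.2e}, importance = {importance:.2e}, exact = {exact:.2e}")
```

With the same number of samples, the naive estimate rests on a handful of hits, while the importance-sampled one uses roughly half of all the samples.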

We will look at examples of using Monte Carlo-style random simulation to do both Bayesian and frequentist statistics below.

Confidence

In addition to providing a best-guess estimate of something (the probability a coin comes up heads, say), useful statistics should be able to tell us about how confident we should be in a particular guess – the best estimate of the probability a coin lands heads after observing 1 head in 2 throws or 50 heads in 100 throws is the same, but the second one still allows us to say more.

The question of how to quantify confidence leads into the question of what probability is.

The frequentist approach is to say that probabilities are observed relative frequencies across many trials, and if you don't have many trials to look at, then you imagine some hypothetical set of trials that an event might be seen as being drawn from.

The Bayesian approach is that probabilities quantify the state of your own knowledge, and if you don't have data to look at, you should still be able to draw a probability distribution representing your knowledge.

Bayesianism

Bayesianism is the idea that you represent uncertainty in beliefs about the world using numbers, which come from starting out with some prior distribution, and then shifting the distribution back and forth as evidence comes in. These numbers follow the axioms of probability, and so we might as well call them probabilities.

(Why should these numbers follow the axioms of probability? Because if you do otherwise and base decisions on those beliefs, you will do stupid things. As a simple example, making bets consistent with a probability model where the probabilities do not sum to 1 makes you exploitable. Let's say you're buying three options, each of which pays out 100€ if the winner of the 2036 US presidential election is EterniTrump, GPT-7, or Xi Jinping respectively, and pay 40€ for each (consistent with assigning a probability of greater than 0.4 to each event occurring). You've paid 120€ and at most one option can pay out, so you're sure to be down at least 20€ that you could've spent on underground bunkers instead.)
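The betting example can be checked by enumerating the outcomes. This is just a sketch of the arithmetic; the "someone else" outcome is an extra case I've added for completeness, and the variable names are mine.

```python
PRICE = 40     # euros paid per option
PAYOUT = 100   # euros paid out by the one option that wins
options = ["EterniTrump", "GPT-7", "Xi Jinping"]

total_cost = PRICE * len(options)

# The events are mutually exclusive, so at most one option ever pays out.
net_by_winner = {}
for winner in options + ["someone else"]:
    payout = PAYOUT if winner in options else 0
    net_by_winner[winner] = payout - total_cost

best_case = max(net_by_winner.values())
implied_probability_sum = len(options) * PRICE / PAYOUT

print(net_by_winner)
print(f"best case: {best_case}€, implied probabilities sum to {implied_probability_sum}")
```

The implied probabilities add up to more than 1, and the guaranteed loss is exactly the excess: 20€ per 100€ of payout in the best case.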

In Bayesian statistics, you don't perform arcane statistical tests to reject hypotheses. Your subjective beliefs about something are a probability distribution (or at least they should be, if you want to reason perfectly). Once you've internalised the idea of what a probability distribution means, and know how to reason about updates to that probability distribution rather than in black-and-white terms of absolute truth or falsehood, Bayesianism is intuitive and will make your reasoning about probabilistic things (i.e., everything except pure maths) better.

(Why is Bayesianism named after Bayes? Bayes invented Bayes' theorem but not Bayesianism; however, Bayesian updating using Bayes' theorem is the core part of ideal Bayesian reasoning.)

There's one tricky part of Bayesianism, and it's a consequence of the Bayesian insistence that subjective uncertainty is represented by a probability distribution, and hence quantified. It's this: you always need to start with a quantified probability distribution (called a prior), even before you've seen any data.

There's a clear regress here, at least philosophically. Sure, you might be able to come up with a sensible prior for how effective masks are against a respiratory disease, but ask a baby for $P(\frac{P(\text{covid} | \text{mask})}{P(\text{covid}|\neg \text{mask})} = r)$ and you're not likely to get a coherent answer (and remember that your current prior should come from baby-you's prior in an unbroken series of Bayesian updates) – let alone if we're imagining some hypothetical platonic being existing beyond time and space who has never seen any data, or the World Health Organisation.

In practice, however, I don't think this is very worrying. Priors formalise the idea that you can apply background knowledge even when you don't have data for the specific case in front of you. Reject the use of priors, and you'll fall into another regress: "study suggests mask-wearing effective against the coronavirus variant in 40-60 year-old European females in green t-shirts; no information yet on 40-60 year-old European females in red t-shirts ..."

Computational Bayes

In general, the scenario we have when doing a Bayesian calculation is that there's some model $X$ that depends on parameter(s) $\theta$ , and we want to find what those parameters are given some sample $x$ from $X$ (since this is Bayesian, we have to assume that $\theta$ itself is a value of the random variable $\Theta$ describing the probabilities of each possible $\theta$ ). Now we could do this mathematically by calculating

\Pr_\Theta(\theta \, | \, X=x) = c \Pr_X(x | \Theta = \theta) \Pr_\Theta(\theta),

and then finding the constant $c$ with integration by the rule that probabilities must sum to 1. (Remember the interpretation of these terms: $\Pr_\Theta(\theta)$ is the prior distribution we assume for $\Theta$ before seeing evidence; $\Pr_\Theta(\theta \, | \, X=x)$ is the posterior likelihood distribution after seeing the data; see the previous post for some intuition on Bayes if these aren't clear to you.)

However, maybe some part of this (especially the integration) would be tricky, or you just happen to have a Jupyter notebook open on your computer. In any case, we can go about things in a different way, as long as we have a way to generate samples from our prior distribution and re-weight them appropriately.

The first thing we do is represent the prior distribution of $\Theta$ by sampling it many times. We don't need an equation for it, just some function (in the programming sense) that pulls from it.

Next, consider the impact of our data on the estimates. We can imagine each sample we took as a representation of a tiny blob of probability mass corresponding to some particular $\theta_i$ , and imagine rescaling it in the same way that we rescaled the odds of various outcomes when talking about the odds ratio form of Bayes' rule in the first post. How much do we rescale it by? By the likelihood of observing $x$ if $\Theta=\theta_i$ : this is the $\Pr_X(x|\Theta=\theta)$ term in the above equation.

Finally, we need to do the scaling. Thankfully, this doesn't take integration, since we can calculate the sum of our re-weighted samples and just divide all our scaled values by that – boom, we have (an approximation of) a posterior probability distribution.

To make things concrete, let's write code and visualise a simple case: estimating the probability that a coin lands heads. The first step in Bayesian calculations is usually the trickiest: we need a prior. For simplicity, let's say our prior is that the coin has an equal chance of having every possible probability (so the real numbers 0 to 1) of coming up heads.

(The fact that the thing we're estimating is itself a probability doesn't matter; don't be confused by the fact that we have two sorts of probability – our knowledge about the coin's probability of coming up heads, represented as a probability distribution, and the probability that the coin comes up heads (an empirical fact you can measure by throwing it many times). Equally well we might have talked about some non-probabilistic feature of the coin, like its diameter, but that would be a lot more boring.)

To write this out in actual Python, the first step (after importing NumPy for vectorised calculation and Matplotlib for the graphing we'll do later) is some way to generate samples from this distribution:

import numpy as np
import matplotlib.pyplot as plt

def prior_sample(n):
    return np.random.uniform(size=n)

(np.random.uniform(size=n) returns n samples from a uniform distribution over the range 0 to 1.)

To calculate the posterior:

def posterior(sample, throws, heads):
    """ This function calculates an approximation of the
        posterior distribution after seeing the coin
        thrown a certain number of times;
        sample is a sample of our prior distribution,
        throws is how many times we've thrown the coin,
        heads is how many times it has come up heads."""
    # The number of times the coin lands heads follows a binomial distribution.
    # Thus, below we reweight using the binomial pmf:
    # (note that we drop the throws-Choose-heads term because it's a constant
    # and we rescale at the end anyways)

    weighted_sample = sample ** heads * (1 - sample) ** (throws - heads)

    # Divide by the sum of every element in the weighted sample to normalise:

    return weighted_sample / np.sum(weighted_sample)

(Remember that the calculation of weighted_sample is done on every term in the sample array separately, in the standard vectorised way.)

Now we can generate a sample to model the prior distribution, and plot it as a histogram:

N = 100000
throws = 100
heads = 20

sample = prior_sample(N) # model the prior distribution

# Plot a histogram:
plt.hist(sample,
         # split the range 0-1 into 50 bins for the histogram
         # (50 bins need 51 bin edges):
         bins=np.linspace(0, 1, 51),
         # weight each item by the likelihood:
         weights=posterior(sample, throws, heads))

The result will look something like this:

This is an approximation of the posterior probability distribution after seeing 100 throws and 20 heads. We see that most of the probability mass is clustered around a probability of 0.2 of landing heads; the chance of it being a fair coin is negligible.

What if we had a different prior? Let's say we're reasonably sure it's roughly a standard coin, and model our prior for the probability of landing heads as a normal distribution with mean 0.5 and standard deviation 0.1. To visualise this prior, here's a histogram of 100k samples from it:

The posterior distribution looks almost identical to our previous posterior:

There's simply so much data (a hundred throws) that even very different priors will have converged on what the data indicates.
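A sampler for this normal prior might be sketched as follows. The name `prior_sample_2` and the clipping to the range 0 to 1 are assumptions made for illustration; the `posterior` function is repeated from above so the block runs on its own.

```python
import numpy as np

def prior_sample_2(n):
    # Normal prior centred on a fair coin (mean 0.5, standard deviation 0.1).
    # Assumption: clip to [0, 1] so that every sample is a valid probability.
    return np.clip(np.random.normal(loc=0.5, scale=0.1, size=n), 0, 1)

def posterior(sample, throws, heads):
    # Same reweighting as before: binomial likelihood, then normalise.
    weighted_sample = sample ** heads * (1 - sample) ** (throws - heads)
    return weighted_sample / np.sum(weighted_sample)

sample = prior_sample_2(100000)
weights = posterior(sample, throws=100, heads=20)

# Posterior mean of the probability of heads: a weighted average of the samples.
posterior_mean = np.sum(sample * weights)
print(posterior_mean)
```

Passing `sample` and `weights` to `plt.hist` as before gives the histogram; the printed posterior mean is a quick way to compare the result with the one from the uniform prior.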

A normal distribution might not be a very good model, though. Say we think there's a 49.5% chance the coin is fair, a 49.5% chance it's been rigged to come up tails with a probability arbitrarily close to 1, and the remaining 1% is spread uniformly between 0 and 1 (be very careful about assigning zero probability to something!). Then our prior distribution might be coded like this:

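def prior_sample_3(n):
    m = n // 100        # 1% of the samples: uniform over 0 to 1
    k = (n - m) // 2    # about 49.5%: rigged to always land tails (P(heads) = 0)
    # The remaining n - m - k samples (about 49.5%) are a fair coin (P(heads) = 0.5)
    return np.concatenate((np.random.uniform(size=m),
                           np.zeros(k),
                           np.full(n - m - k, 0.5)),
                           axis=0)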

and 100k samples might be distributed like this:

Let's also say we have less data than before – the coin has come up heads 8 times out of 40, say. Now our posterior distribution looks like this:

We've ruled out that the coin is rigged (a single heads was enough to nuke the likelihood of a completely rigged coin to zero – be very careful about assigning a probability of zero to something!), and most of the probability mass has shifted to a probability of landing heads of around 20%, as before, but because our prior was different, a noticeable chunk of our expectation is still that the coin is exactly fair.

As a final example, here's a big flowchart showing how the probability you should assign to different odds of the coin coming up heads shifts as you get data (red = tails, green = heads) up to 5 coin throws, assuming a prior that's the uniform distribution:

Two questions to think about, one simple and on-topic, the other open-ended and off-topic:

What is the simple function giving, within a constant, the posterior distribution after $n$ heads and 0 tails? What about for $n$ tails and 0 heads?
Doesn't the coin-throwing diagram look like Pascal's triangle? What's the connection between normal distributions, Pascal's triangle, and the central limit theorem (i.e., that the sum of sufficiently many independent copies of a random variable with finite variance is distributed roughly normally)? What extensions of Pascal's triangle can you think of, possibly with probabilistic interpretations?

Frequentism

Frequentists try to banish the subjectivity out of probability. The probability of event $E$ is not a statement about subjective belief, but an empirical fact: given $n$ trials, what is the fraction of times that $E$ comes up, in the limit as $n \rightarrow \infty$ ? And ditch the Bayesian idea of doing nothing but shifting around the probability mass we assign to different beliefs; once you've done a statistical test, you either reject or fail to reject the null hypothesis.

A standard frequentist tool is hypothesis testing with a $p$ -value. The procedure looks like this:

Pick a null hypothesis (usually denoted $H_0$ ). (For example, $H_0$ could be that a coin is fair; that is, that the probability $h$ of it coming up heads is 0.5.)
Pick a test statistic: a function $t$ from the dataset $x$ to a number. (For example, the maximum likelihood estimator for $h$ , using the fact that we expect the number of heads to follow a binomial distribution with parameters for the number of throws and the probability $h$ .)
Figure out a model for, or a way to sample from, the distribution of possible datasets given that $H_0$ is true. (For example, we might write code to generate synthetic datasets $X^*$ of the same size as $x$ based on $h=0.5$ .)
Find the probability of the test statistic $t$ returning a result that is as extreme or more extreme than $t(x)$ . We might do this using fancy maths that gives us cumulative distribution functions based on the model from the previous step, or by having our code generate many synthetic datasets $X^*$ , calculate $t(X^*)$ for each of them, and see how $t(x)$ compares – what percentile of extremeness is it in? The answer is called the $p$ -value.

(What is "more extreme"? That depends on our null hypothesis. If both low and high values of $t(x)$ are evidence against $H_0$ – as in our example – then we use a two-tailed test; if $t(x)$ is at the 90th percentile of the $t(X^*)$ distribution, then values in both the top 10% and the bottom 10% count as at least as extreme as the value we got, and $p=0.2$ . If only low or high values are evidence against $H_0$ , then we use a one-tailed test. Say only high values are evidence against $H_0$ and $t(x)$ is at the 90th percentile; then $p=0.1$ .)

Here's some example code to calculate a $p$ -value, using random simulation:

# Import NumPy and graphing library:
import numpy as np
import matplotlib.pyplot as plt

# Define our null hypothesis:
h0_h = 0.5 # the value of h under the null hypothesis

# Define the data we've gotten:
throws = 50
heads = 20
# Generate an array for it:
data = np.concatenate((np.zeros(throws - heads), np.ones(heads)), axis = 0)

def t(x): # test statistic function
    return np.mean(x)
    # ^ this is the MLE for the binomial distribution

def synth_x(n, p):
    # Create a synthetic dataset of some size n, assuming some p
    return np.random.binomial(1, p, size=n)

# Take a lot of samples from the distribution of t(X*)
# (where X* is a synthetic dataset):
t_sample = np.array([t(synth_x(throws, h0_h)) for _ in range(100000)])

# Calculate the p-value, using a two-tailed test:
p1 = np.mean(t_sample >= t(data))
p2 = np.mean(t_sample <= t(data))
p = 2 * min(p1, p2)

# Display p-value
print(f"p-value is {p}") # about 0.20 in this case

# Plot a histogram:
plt.hist(t_sample, bins=50, range=[0,1])
plt.axvline(x=t(data), color='black') # draw a line to show where t(data) falls

The main tricky part in the code is the calculation of the $p$ -value. A neat way to do it is the following: observe that a two-tailed $p$ -value is either twice the fraction of (synthetic) datasets with a test statistic no higher than $t(x)$ (in the case that the observation ended up on the lower side of the distribution of synthetic datasets), or twice the fraction of (synthetic) datasets with a test statistic at least as high.

Now, what exactly is a $p$ -value? It's tempting to think of the $p$ -value as the probability that the null hypothesis is correct: that is, that $p=0.05$ means there's only a 5% chance the null hypothesis is true. However, what a $p$ -value actually tells you is this: assuming that your null hypothesis is true (and you can correctly model the distribution of data you'd get if it is), what is the probability of getting a result at least as extreme as your data? In maths:

p\text{-value} \ne P(H_0 \text{ is correct}), (!!)

but instead

p\text{-value} = P(t(X^*) \geq t(x)),

for a right-tailed test (flip the $\geq$ for a left-tailed test), where $X^*$ is assumed drawn from the distribution resulting from assuming the null hypothesis $H_0$ , or

P(|t'(X^*)| \geq |t'(x)|),

for a two-tailed test, where $t'$ is the test statistic function, but shifted so that its median value under $H_0$ is 0, so that we can just take the absolute value to get an extremeness measure (for example, in the code above we'd subtract 0.5 from the current definition of t(x), since this is the median for the null hypothesis that the probability of heads is one-half).
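For the coin example the two-tailed $p$ -value can also be computed exactly rather than by simulation, which makes a useful cross-check. The following is a sketch using only the standard library; `binom_pmf` is a helper name introduced here, and the data (20 heads in 50 throws) are the same as in the simulation code above.

```python
from math import comb

throws = 50
heads = 20

def binom_pmf(k, n, p):
    # Probability of exactly k heads in n throws with heads-probability p.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# |t'(X*)| >= |t'(x)| with t'(x) = (fraction of heads) - 0.5 is equivalent to
# |k - throws/2| >= |heads - throws/2| in terms of head counts k.
observed_distance = abs(heads - throws / 2)

# Sum the null-hypothesis probabilities of every head count at least as extreme.
p_exact = sum(binom_pmf(k, throws, 0.5)
              for k in range(throws + 1)
              if abs(k - throws / 2) >= observed_distance)

print(f"exact two-tailed p-value: {p_exact:.4f}")
```

Because the null distribution here is symmetric around 25 heads, this sum is the same as twice the lower-tail probability, which is what the `2 * min(p1, p2)` trick in the simulation estimates.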

Probability bounds

Sometimes it's useful to be able to quickly estimate a bound on some probability or expectation. Here are some examples, with quick proofs.

Markov's inequality

For $a > 0$ , if $X$ takes only non-negative values,

P(X \geq a) \leq \frac{E(X)}{a}.

Why?

Short proof: Given $X \geq 0$ , $X \geq 1_{X \geq a} \cdot a$ (can be seen by considering cases $X < a$ , $X=a$ , and $X > a$ ), so, rearranging, $1_{X \geq a} \leq X/ a$ . Taking the expectation on both sides we get $E(1_{X \geq a}) \leq E(X) / a$ , and $E(1_{X \geq a}) = P(X \geq a)$ . $\square$

Intuitive proof: let's say you want to draw a probability density function to maximise $P(X \geq a)$ , given some value of the expectation $E(X)$ (and given that $X$ only takes non-negative values). Any probability density assigned to values greater than $a$ is more expensive in terms of expectation increase than assigning it exactly at $a$ , and has an identical effect on $P(X \geq a)$ . So to maximise $P(X \geq a)$ , assign as much probability density as you can to $a$ , and none to values greater than $a$ . Given the restriction that $X$ can only take non-negative values, the lowest value you can assign any probability to (to balance out the expectation if $a > E(X)$ ) is 0. If we allocate $p_1$ to $X=0$ and $p_2$ to $X=a$ , then to match the expectation $E(X)$ we must have

p_1 \cdot 0 + p_2 \cdot a = E(X),

p_2 = P(X\geq a) = \frac{E(X)}{a}

in the maximal scenario; any other pdf we draw must have $P(X \geq a)$ smaller.

The above equation can also be interpreted as saying that the fraction of values that are at least $k=a/E(X)$ times the average in a dataset of non-negative values can be at most $1/k$ (i.e. $E(X)/a$ ). For example, at most half of people can have at least twice the average income.
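A quick numerical sanity check of Markov's inequality, as a sketch: the exponential distribution, the seed, and the thresholds are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.exponential(scale=2.0, size=100_000)  # non-negative samples

for a in [1.0, 4.0, 10.0]:
    tail = np.mean(x >= a)    # empirical P(X >= a)
    bound = np.mean(x) / a    # Markov bound E(X) / a
    print(f"a={a}: P(X >= a) is about {tail:.4f}, bound is {bound:.4f}")
```

The bound is usually far from tight (and is trivial when it exceeds 1); its appeal is that it needs nothing but the mean.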

Chebyshev's inequality

(An extension of Markov's inequality.)

Let $X$ be a random variable with variance $\sigma^2$ and expected value $\mu$ . Then

P(|X-\mu| \geq x) \leq \frac{\sigma^2}{x^2},

since if $Y = (X-\mu)^2$ then, by Markov's inequality,

P(Y \geq x^2) \leq \frac{\mathbb{E}(Y)}{x^2} = \frac{\sigma^2}{x^2},

by the definition of variance as $\mathbb{E}((X - \mu)^2)$ . Finally, taking the square root inside the probability expression, $P(Y \geq x^2)=P(|X-\mu| \geq x)$ . $\square$
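The same kind of sanity check works for Chebyshev's inequality. This is a sketch; the normal distribution, the seed, and the distances `d` (playing the role of $x$ in the inequality above) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.normal(loc=3.0, scale=2.0, size=100_000)

mu = np.mean(x)
var = np.var(x)

for d in [1.0, 3.0, 6.0]:
    tail = np.mean(np.abs(x - mu) >= d)  # empirical P(|X - mu| >= d)
    bound = var / d ** 2                 # Chebyshev bound sigma^2 / d^2
    print(f"d={d}: P(|X - mu| >= d) is about {tail:.4f}, bound is {bound:.4f}")
```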

Jensen's inequality

Consider a concave function $f$ and the values $E(f(X))$ and $f(E(X))$ , where $X$ is (once again) a random variable.

Since $f$ is concave, if we plot $y=f(x)$ and the tangent line to $f$ at some $x_0$ , the tangent is an upper bound on $f(x)$ for all $x$ .

Let $E(X) = \mu$ , and let the tangent line to $y=f(x)$ at $x=\mu$ be $y=mx+b$ . We have that $f(x) \leq mx+b$ for all $x$ , and hence $f(X) \leq mX + b$ . Taking the expectation on both sides and using the linearity of expectation,

E(f(X)) \leq m \mu + b.

What is $m\mu +b$ ? It's the value of the tangent when it touches $f(x)$ at $x=\mu$ , and therefore it is also the value of $f$ at $\mu$ . Thus we can say

E(f(X)) \leq f(E(X)). \square
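As a numerical illustration of Jensen's inequality, here is a sketch with the concave function $f = \log$ ; the uniform distribution on 1 to 10 and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.uniform(1.0, 10.0, size=100_000)

mean_of_f = np.mean(np.log(x))  # estimate of E(f(X))
f_of_mean = np.log(np.mean(x))  # estimate of f(E(X))

print(f"E(log X) is about {mean_of_f:.4f}")
print(f"log(E(X)) is about {f_of_mean:.4f}")
```

For a convex function (for example $f(x) = x^2$ ) the inequality points the other way.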

Probability systems

Causal diagrams

The Perseverance rover is due to land on Mars on February 18th, 2021, carrying a small helicopter called Ingenuity, which will likely become the first aircraft to make a powered flight on a planet that's not Earth.

Imagine that Perseverance is currently known to be in a position $X$ (where $X$ is some random variable, as is any capital letter). Ingenuity has completed its first flight, starting from the location of Perseverance (which we know to a high degree of accuracy), but because of a Martian sandstorm we only have inaccurate readings of Ingenuity's current location and need to locate it quickly to know if it's in a place where it's going to run out of power due to dust blocking its solar panels unless we do a risky manoeuvre with its propellers. Specifically, we have two in-flight readouts of its position, $R_1$ and $R_2$ , which are known to be its true positions $Y_1$ and $Y_2$ at those times plus some random error modelled as a $\text{Normal}(0,\sigma_1^2)$ distribution, and similarly we have a more accurate readout $R_f$ of its final position $Y_f$ , this time with the error following $\text{Normal}(0, \sigma_2^2)$ . We also model $Y_1$ as being generated from $X$ with a parameter $h_1$ representing its starting heading and velocity (e.g. $h_1$ is a vector and the model could be $Y_1 = X + h_1 + \epsilon$ , where $\epsilon$ is another normally distributed error term), and likewise we have parameters $h_2$ and $h_f$ that influence how $Y_2$ and $Y_f$ are generated from the preceding positions. We know that its initial battery level was $b_0$ , and the battery level when it was at each of $Y_1$ , $Y_2$ , and $Y_f$ is $B_1$ , $B_2$ , and $B_f$ , where each of those is generated from the previous and the heading/velocity parameters $h_1$ , $h_2$ , and $h_f$ (e.g. $B_2 = B_1 - (1 + \epsilon) |h_1|$ – the amount of power lost is a normal error term plus a constant times the velocity).
We need to find the probability that the next battery level $B_n$ , a random variable generated from $B_f$ (the previous level) and depending on $Y_f$ (since storm intensity varies with position; say we have a function $s$ that takes in positions and returns how much the dust will decrease power output and hence battery level at a particular position, then we might have $B_n = B_f - s(Y_f)$ ), is below a critical threshold $c$ , given the starting $X$ , and the position readings $R_1$ , $R_2$ , and $R_f$ . Also the administrator of NASA is breathing down your neck because this is a 2 billion dollar mission, so better work fast and not make mistakes.

This problem seems almost intractably complicated. A handy way of making complex probability questions less unapproachable is to draw out a causal diagram: what are the key parameters, and which random variables are generated from which other ones? Here's an example for the above problem:

Arrows indicate random variables being generated from others; dotted lines note important parameters (note that some parameters are missing – those of $X$ , for example). The probability we were asked about is $P(B_n < c | X = x, R_1 = r_1, R_2 = r_2, R_f = r_f)$ ; it doesn't look so complicated when you have the causal relations visualised in front of you.

The rest of the solution is left as an exercise for the reader. Please be in touch with NASA in late February to get the values $x$ , $r_1$ , $r_2$ , and $r_f$ .

Markov chains

A Markov chain has the following causal diagram:

In words: the $n$ th state of a Markov chain is generated from the $(n-1)$ th state.

This might seem very restrictive. For example, the simplest text-generation Markov chain would just generate, say, one character based on the previous one, probably based on data for how often a letter follows another. It might tend to do some moderately reasonable things, like following "t" by "h" fairly often (assuming it was trained on English), but good luck getting anything too sensible out of it.

However, we can do a trick: generate letter $n$ from the previous $k$ letters. This seems like it's not a Markov chain; letter $X_n$ depends on $X_{n-k}$ through $X_{n-1}$ . But we can define $Y_0=(X_0, X_1, ..., X_{k-1})$ , $Y_1 = (X_1, X_2, ..., X_k)$ , and so on, and now $Y_n$ can be generated entirely from $Y_{n-1}$ , and so the $Y$ s form a Markov chain.
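Here is a minimal sketch of this trick as a text generator. The toy corpus, the function names, and the choice $k=3$ are all illustrative assumptions; the point is only that the state is the last $k$ letters, and the next state depends on nothing else.

```python
import random
from collections import defaultdict

def build_model(text, k):
    # Map each k-letter state to the list of letters observed right after it.
    model = defaultdict(list)
    for i in range(len(text) - k):
        model[text[i:i + k]].append(text[i + k])
    return model

def generate(model, start, length, rng):
    # start is the initial state Y_0: a string of k letters.
    k = len(start)
    out = start
    while len(out) < length:
        followers = model.get(out[-k:])
        if not followers:  # dead end: this state was never followed by anything
            break
        # The next letter depends only on the current state (the last k letters).
        out += rng.choice(followers)
    return out

# Toy corpus, far too small to produce anything sensible.
text = "the quick brown fox jumps over the lazy dog and the other dog "
k = 3
model = build_model(text, k)
sample = generate(model, text[:k], 40, random.Random(0))
print(sample)
```

Storing every observed follower in a list (duplicates included) means that picking uniformly from the list reproduces the observed transition frequencies.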

So on one hand, we can do these sorts of tricks to use Markov chains even when it seems like the problem is too complex for them. But perhaps even more importantly, if you reduce something to a Markov chain, you can immediately apply a lot of nice mathematical results.

A Markov chain can be visualised with a state diagram. Here's one for a Markov chain representing traffic light transitions:

The same information can be described with a transition matrix, showing the probability of each transition happening:

Note that this is a very boring Markov chain, because it's not probabilistic – every link has a probability of 1. Thankfully, our traffic light engineer is willing to add some randomness for the sake of making the system more mathematically interesting. For example, they might change the system to look like this (showing both the state diagram and transition matrix):

Now there's a 10% chance that the yellow light before red is skipped, and a 40% chance that red-yellow moves back to red instead of going green.

The key property with Markov chain calculations is memorylessness: $X_n$ depends only on $X_{n-1}$ . If you can use this property, you can work out a lot of Markov chain problems. For example, let's say that $X_0 = \text{R}$ (we'll use $\text{R, RY, G, Y}$ to denote the states), and we want to find the probability that you'll actually get to drive in two state transitions from now – that is, $\mathbb{P}(X_2 = \text{G} \, | \, X_0 = \text{R})$ (I use $\mathbb{P}$ here to differentiate a probability expression from the transition matrix $P$ ). Doing some straightforward algebra, you can figure out that this probability is $P_{\text{R},\text{RY}} \cdot P_{\text{RY},\text{G}}$ (where $P_{a,b}$ is the spot in the matrix with row label (i.e. start state) $a$ and column label (i.e. end state) $b$ ).

(Note that each row of the transition matrix is a probability distribution for the next state, starting from the state the row is labelled with. Writing it as a matrix is a trick for expressing the probability distribution from each state in the same mathematical object.)

More generally: for any transition matrix, $P_{a,b}$ is $\mathbb{P}(X_n = b \, | X_{n-1} = a)$ . Now consider point $a,b$ of $P^2$ : by matrix multiplication, it is

\sum_i P_{a,i}P_{i,b},

but by the definition of the transition matrix, this is the same as

\sum_i \mathbb{P}(X_{1} = i \,|\, X_{0} = a) \mathbb{P}(X_{2} = b \,|\, X_{1} = i),

which is just summing up the probabilities of all paths through the state space that start at $a$ , go to some $i$ , and then end up at $b$ ; in other words, it is the probability that if you're at $a$ , you end up at $b$ after two state transitions.

You should be able to see that this extends more generally:

\mathbb{P}(X_n = b \,|\,X_0 = a) = (P^n)_{a,b}.

Linear algebra comes to the rescue yet again; we've reduced the problem of finding the probability of going between any two states in a Markov chain's state space in $n$ steps into the problem of multiplying a matrix $n$ times with itself and looking up one item in it.
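In NumPy this is a call to `np.linalg.matrix_power`. The sketch below uses a transition matrix reconstructed from the prose description of the randomised traffic lights (10% chance of skipping yellow, 40% chance of red-yellow falling back to red); the exact entries are an assumption, since the matrix itself is only shown as a figure.

```python
import numpy as np

states = ["R", "RY", "G", "Y"]

# Assumed transition matrix: rows are start states, columns are end states.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],  # R  -> RY
    [0.4, 0.0, 0.6, 0.0],  # RY -> R (40%) or G (60%)
    [0.1, 0.0, 0.0, 0.9],  # G  -> R (10%, yellow skipped) or Y (90%)
    [1.0, 0.0, 0.0, 0.0],  # Y  -> R
])

def n_step_probability(P, start, end, n):
    # Entry (start, end) of P^n is the probability of going from
    # start to end in exactly n transitions.
    Pn = np.linalg.matrix_power(P, n)
    return Pn[states.index(start), states.index(end)]

p_two_steps = n_step_probability(P, "R", "G", 2)
print(p_two_steps)
```

With this matrix the only two-step path from R to G goes through RY, so the result agrees with $P_{\text{R},\text{RY}} \cdot P_{\text{RY},\text{G}}$ .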

Finding the stationary distribution

Given a starting state in a Markov chain, we can't say for sure what state it will be after $n$ transitions (unless it's entirely deterministic, like our initial boring traffic light model), but we can calculate exactly what the probability distribution over the states will be. This is usually denoted as a vector $\pi$ , with $\pi_a$ being the probability we're in state $a$ .

Here's something we might want to know: what is the stationary distribution; that is, how can we allocate probability mass amongst the different states in such a way that the total amount of probability mass in each state remains constant after a state transition?

Here's something you might ask: why is it interesting to know this? Perhaps most importantly, the stationary distribution of a Markov chain is the long-run average of time spent in each state (exercise: prove that this is the case); if you want to know how much time our probabilistic traffic lights will spend being green over a long period of time, you need to find the stationary distribution.

Now given our distribution $\pi$ (note: it's a row vector, not a column vector) and transition matrix $P$ , we can express the stationary distribution as the $\pi$ that satisfies two conditions. First,

\pi = \pi P.

This is the condition that $\pi$ must remain unchanged when transformed by our transition matrix $P$ during a state transition. You might have expected the transformation to be written $P \pi$ ; usually we'd express a matrix transforming a vector in this order. However, because of the way we've defined $P$ – start states on the vertical axis, end states on the horizontal – we need to do it this way. Here's a visualisation, with the result vector in red:

(Alternatively, we could take $\pi$ as a column vector, flip the meanings of the rows and columns in $P$ , and write $P\pi$ – equivalent to transposing both of the current definitions of $\pi$ and $P$ .)

The second condition (can you see why it's necessary?), where $\pmb{1}$ is a vector $(1,1,...,1,1)$ of the required length, is

\pi \cdot \pmb{1} = 1.

We can also write this as matrix multiplication, as long as we're clear about column and row vectors and transposing things as required. We can also be clever and write a single matrix that expresses both of these constraints, and then getting NumPy's linear algebra libraries to give us the answer becomes a single line of code.

(The second constraint is just the condition that any probability distribution sums to 1.)
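One way to set this up, as a sketch: stack the transposed stationarity condition and the sum-to-one condition into a single matrix and solve the resulting linear system with `np.linalg.lstsq`. The transition matrix is the same assumed reconstruction of the randomised traffic lights described in the text, with states ordered R, RY, G, Y.

```python
import numpy as np

# Assumed transition matrix: rows are start states, columns are end states.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],  # R  -> RY
    [0.4, 0.0, 0.6, 0.0],  # RY -> R (40%) or G (60%)
    [0.1, 0.0, 0.0, 0.9],  # G  -> R (10%) or Y (90%)
    [1.0, 0.0, 0.0, 0.0],  # Y  -> R
])
n = P.shape[0]

# pi = pi P  is the same as  (P^T - I) pi^T = 0; the extra row of ones
# encodes pi . 1 = 1. Together: A pi^T = b.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])

# The system has more equations than unknowns but is consistent, so the
# least-squares solution is the exact solution.
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)
```

Least squares is used here only because the stacked matrix is not square; dropping one redundant row and calling `np.linalg.solve` would work too.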

Uniqueness of the stationary distribution

Now for another question: when does a unique stationary distribution exist? You should be able to think of a state diagram for which there are an infinite number of stationary distributions.

For example:

The states $C$ , $B$ , and $D$ (in the dotted red circle) and $E$ , $F$ , $G$ , and $H$ (in the dotted blue circle) are "independent", in the sense that you can never get from one set of states to the other. Imagine that for the state set $\{C, B, D\}$ , we have a stationary distribution over only those states $\pmb{\pi}$ , and another stationary distribution $\pmb{\rho}$ over $\{E,F,G,H\}$ . (Let each of these vectors have a slot for every state, but let it be zero for states outside the corresponding state set – $\pmb{\pi} = (0, \pi_b, \pi_c, \pi_d, 0, 0, 0, 0)$ , for example.) Now, because there can be no probability mass flow between these two sets, we can see that any distribution $\pmb{\sigma} = a \pmb{\pi} + b \pmb{\rho}$ is also a stationary distribution, provided that $a$ and $b$ are non-negative and chosen such that $\pmb{\sigma} \cdot \pmb{1} = 1$ (probability distributions sum to one!).

It turns out that for any finite state set where each state is theoretically reachable from all the others – i.e., if we represent the state diagram as a directed graph, the graph is strongly connected – there does exist a unique stationary distribution.

Detailed balance

Sometimes it doesn't take matrix calculations to find a stationary distribution. In the general case, the condition is that the probability mass flow into a state, from all other states, must equal the outflow to all other states. The simplest case in which this can happen is when, for any pair of states $a$ and $b$ , $a$ sends as much probability mass to $b$ upon a state transition as $b$ sends to $a$ . If we can ensure that this is true "locally" for each pair of states, then we don't have to do complex "global" optimisation over all states.

This condition is known as detailed balance. Mathematically, letting $\pi$ be a distribution of probability mass over states and $P$ be the transition matrix, we can express it as

\pi_a P_{ab} = \pi_b P_{ba}, \text{ for all states } a \text{ and } b,

something that should be clear if you remember the interpretation of the transition matrix element $P_{ab}$ as the probability of an $a \rightarrow b$ transition.
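Detailed balance is easy to check numerically. In the sketch below, the helper name and both example chains are made up for illustration: a two-state chain whose stationary distribution satisfies detailed balance, and a deterministic three-state cycle whose stationary distribution does not.

```python
import numpy as np

def satisfies_detailed_balance(pi, P):
    # flow[a, b] = pi_a * P_ab is the probability mass sent from a to b
    # in one transition; detailed balance says this matrix is symmetric.
    flow = pi[:, None] * P
    return np.allclose(flow, flow.T)

# A two-state chain (numbers chosen purely for illustration).
P2 = np.array([[0.9, 0.1],
               [0.3, 0.7]])
pi2 = np.array([0.75, 0.25])
balanced_2 = satisfies_detailed_balance(pi2, P2)
stationary_2 = np.allclose(pi2 @ P2, pi2)

# A deterministic cycle 0 -> 1 -> 2 -> 0 with the uniform distribution.
P3 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0]])
pi3 = np.ones(3) / 3
balanced_3 = satisfies_detailed_balance(pi3, P3)
stationary_3 = np.allclose(pi3 @ P3, pi3)

print(balanced_2, stationary_2)
print(balanced_3, stationary_3)
```

The cycle shows that detailed balance is sufficient for stationarity but not necessary: mass can also be conserved by flowing around a loop.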

A final fun question: say we have an undirected graph and we consider a random walk over it (i.e., if we're at a given vertex, we take any edge going from it with equal probability). What is the stationary distribution over the states (i.e. the vertices of the graph)?

Data science 1

2020-12-31T14:43:00.019+00:00

8.3k words, including equations (about 40 minutes)

This is an overview of fundamental ideas in data science, mostly based on Damon Wischik's excellent data science course at Cambridge (if using these notes for revision for that course, be aware that I don't cover all examinable things and cover some things that aren't examinable; the criterion for inclusion is interestingness, not examinability).

The basic question is this: we're given data; what can we say about the world based on it?

These notes are split into two parts due to length. In part 1:

Notation
A few results in probability, including a look at Bayes' theorem leading up to an understanding of the continuous form.
Model-fitting
- Maximum likelihood estimation
- Supervised & unsupervised learning
- Linear models (fitting them and interpreting them)
- Empirical distributions (with a note on KL divergence)

In part 2:

Monte Carlo methods
A few theorems that let you bound probabilities or expectations.
Bayesianism & frequentism
Probability systems (specifically basic results about Markov chains).

Probability basics

The kind of background you want to have to understand this material:

The basic maths of probability: reasoning about sample spaces, probabilities summing to one, understanding and working with random variables, etc.
The ideas of expected value and variance.
Some idea of the most common probability distributions:
- normal/Gaussian,
- binomial,
- Poisson,
- geometric,
- etc.
What continuous and discrete distributions are.
Understanding probability density/mass functions, and cumulative distribution functions.

Notation

First, a few minor points:

It's easy to interpret $Y = f(X)$ , where $X$ and $Y$ are random variables, to mean "generate a value of $X$ , then apply $f$ to it, and this is $Y$ ". But $Y=f(X)$ is maths, not code; we're stating something is true, not saying how the values are generated. If $f$ is an invertible function, then $Y=f(X)$ and $X=f^{-1}(Y)$ are both equally good and equally true mathematical statements, and neither of them tell you what causes what.
Indicator functions are a useful trick when bounds are unknown; for example, write $1_{x \geq y}$ (or $1[x\geq y]$ ) to denote 1 if $x \geq y$ and 0 in all other cases.
- They also let you express logical AND as multiplication: $1_{f(x)} \cdot 1_{g(x)}$ , where $f$ and $g$ are boolean functions, is the same as $1_{f(x) \wedge g(x)}$ .
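In NumPy, an indicator function is just a boolean array cast to integers, and the AND-as-multiplication identity can be checked directly. This is a small sketch; the array and the two conditions are arbitrary examples.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2, 3])

ind_positive = (x > 0).astype(int)   # indicator of "x > 0"
ind_even = (x % 2 == 0).astype(int)  # indicator of "x is even"

# Logical AND expressed as multiplication of indicators...
product = ind_positive * ind_even
# ...compared with the indicator of the combined condition.
direct = ((x > 0) & (x % 2 == 0)).astype(int)

print(product)
print(direct)
```

Taking `np.mean` of an indicator array gives the fraction of elements satisfying the condition, which is how empirical probabilities are usually estimated.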

Likelihood notation

Discrete and continuous random variables are fundamentally different. In the discrete case, you deal with probability mass functions where there's a probability attached to each event; with the continuous case, you only get a probability density function that doesn't mean anything real and needs to be integrated to give you a probability. Many results apply to both discrete and continuous random variables though, and we might switch between continuous and discrete models in the same problem, so it's cumbersome to have to deal with the separate notation and semantics of them.

Enter likelihood notation: write $\Pr_X(x)$ to mean $P(X=x)$ if the distribution is discrete and $f(x)$ if the distribution of $X$ is continuous with probability density function $f$ .

Python & NumPy

Python is a good choice for writing code, for various reasons:

easy to read;
found almost everywhere;
easy to install if it isn't already installed;
not Java;

but particularly because it has excellent science/maths libraries:

NumPy for vectorised calculations, maths, and stats;
SciPy for, uh, science;
Matplotlib for graphing;
Pandas for data.

NumPy is a must-have.

To use it, the big thing to understand is the idea of vectorised calculations. Otherwise, you'll see code like this:

import numpy

xs = numpy.array([1, 2, 3])
ys = xs ** 2 + xs  # applied element-wise: ys holds 2, 6, 12

and wonder how we're adding and squaring arrays (we're not; the operations are implicitly applied to each element separately – and all of this runs in C so it's much faster than doing it natively in Python).
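To make the element-wise behaviour concrete, here is a small sketch comparing a vectorised expression with the equivalent explicit Python loop. The variable names are just for illustration.

```python
import numpy as np

xs = np.array([1, 2, 3])

# Vectorised: the operations are applied to each element separately.
ys_vectorised = xs ** 2 + xs

# The same calculation written as an explicit Python loop.
ys_loop = np.array([x ** 2 + x for x in xs])

print(ys_vectorised.tolist())                  # [2, 6, 12]
print(np.array_equal(ys_vectorised, ys_loop))  # True
```

On three elements the speed difference is invisible; it starts to matter once the arrays have thousands or millions of entries.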

Computation vs maths

Today we have computers. Statistics was invented before computers, though, and this affected the field; work was directed to all the areas and problems where progress could be made without much computation. The result is an excellent theoretical mathematical underpinning, but modern statistics can benefit a lot from a computational approach – running simulations to get estimates and so on. For the simple problems there's an (imprecise) computational method and a (precise) mathematical method; for complex problems you either spend all day doing integrals (provided they're solvable at all) or switch to a computer.

In this post, I will focus on the maths, because the maths concepts are more interesting than the intricacies of NumPy, and because if you understand them (and programming, especially in a vectorised style), the programming bit isn't hard.

Some probability results

The law of total probability

Here's something intuitive: if we have a sample space (e.g. outcomes of a die roll) and we partition it into non-overlapping events $E_1$ to $E_N$ that cover every possible outcome (e.g. showing the numbers 1, 2, ..., 6, and losing the dice under the carpet), and we have some other event $A$ (e.g. a player gets mad), then

P(A) = \sum_{n=1}^{N} P(A | E_n)P(E_n);

if we know the probability of $A$ given each event $E_n$ , we can find the total probability of $A$ by summing up the probabilities of each $E_n$ , weighted by the conditional probability that $A$ also happens. Visually, where the height of the red bars represents each $P(A|E_n)$ , and the area of each segment represents the different $P(E_n)$ s, we see that the total red area corresponds to the sum above:

You say this diagram is "messy and unprofessional"; I say it has an "informal aesthetic".

This is called the law of total probability; a fancy name to pull out when you want to use this idea.
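Here is the sum worked through for a fair die, as a minimal sketch. The conditional probabilities of the player getting mad are invented for illustration, and the lost-under-the-carpet outcome is left out to keep the partition simple.

```python
from fractions import Fraction

# Partition E_1..E_6: a fair six-sided die shows 1..6.
p_E = [Fraction(1, 6)] * 6

# Hypothetical conditional probabilities P(A | E_n) that the player gets mad.
p_A_given_E = [
    Fraction(1, 2), Fraction(1, 4), Fraction(1, 4),
    Fraction(1, 10), Fraction(1, 10), Fraction(0),
]

# The events must cover the whole sample space.
assert sum(p_E) == 1

# Law of total probability: weight each P(A | E_n) by P(E_n) and add up.
p_A = sum(pa * pe for pa, pe in zip(p_A_given_E, p_E))
print(p_A)  # 1/5
```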

The law of the unconscious statistician

Another useful law doesn't even sound like a law at first, which is why it's called the law of the unconscious statistician.

Remember that the expected value, in case of a discrete distribution for the random variable $X$ , is

E(X)=\sum_i x_iP(X=x_i).

Now say we're not interested in the value of $X$ itself, but rather some function $f$ of it. What is the expected value of $f(X)$ ? Well, the values $x_i$ are the possible values of $X$ , so let's just replace the $x_i$ above with $f(x_i)$ :

E(f(X)) = \sum_i f(x_i) P(X=x_i)

... and we're done – but for the wrong reasons. This result is actually more subtle than this; to prove it, consider a random variable $Y$ for which $Y=f(X)$ . By the definition of expected value,

E(Y)=\sum_i y_i P(Y=y_i).

Uh oh – suddenly the connection between the obvious result and what expected value is doesn't seem so obvious. The problem is that the mapping between the $y_i$ and $x_i$ could be anything – many $x_i$ , thrown into the blackbox $f$ , might produce the same $y_i$ – and we have to untangle this while keeping track of all the corresponding probabilities.

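For a start, we might notice that $Y$ takes the value $y_i$ exactly when $X$ takes one of the values $x_j$ for which $f(x_j) = y_i$, so $P(Y=y_i)$ is the sum of $P(X=x_j)$ over those values $x_j$ of $X$. So we might write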

E(Y)=\sum_i \Big( y_i \sum_{j \,|\, f(x_j)=y_i} P(X=x_j) \Big),

to sum over each possible value of $f(X)$ , and then within that, also loop over the possible values of $X$ that might have generated that $f(X)$ . We've managed to switch a term involving the probability that $Y$ takes some values to one about $X$ taking a specific value – progress!

Next, we realise that $y_i$ is the same for everything in the inner sum; $y_i = f(x_1) = f(x_2) = ... = f(x_j)$ . So we don't change anything if we write

E(Y)=\sum_i \Big( \sum_{j \,|\, f(x_j)=y_i} f(x_j) P(X=x_j) \Big)

instead. Now we just have to see that the above is equivalent to iterating once over all the $j$ s.

A diagram:

The yellow area is the expected value of $f(X) = Y$. By the definition of expected value, we can sum up the areas of the yellow rectangles to get $E(f(X))$. What we've now done is "reduced" this to a process like this: pick $y_1$, look at the $x_i$ that map to it with $f$ ($x_1$ and $x_2$ in this case), find their probabilities, and multiply them by $f(x_1)=f(x_2)=y_1$. So we add up the rectangles in the slots marked by the dotted lines, and we do it with this weird double-iteration of looking first at $y_i$ s and then at $x_i$ s.

But once we've put it this way, it's simple to see we get the same result if we iterate over the $x_i$ s, get the corresponding rectangle slice for each, and add it all up. This corresponds to the formula we had above (summing $f(x_i) P(X=x_i)$ over all possible $i$ ).
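The two routes can be checked against each other numerically. This is a minimal sketch with a made-up example: $X$ uniform on $\{-2, -1, 0, 1, 2\}$ and $f(x) = x^2$, so that several $x_i$ map to the same $y_i$.

```python
from collections import defaultdict
from fractions import Fraction

# Made-up example: X is uniform on {-2, -1, 0, 1, 2}.
p_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

def f(x):
    return x ** 2

# Route 1: build the distribution of Y = f(X) explicitly, collecting the
# probability of every x that maps to the same y, then apply the
# definition of expected value to Y.
p_Y = defaultdict(Fraction)
for x, p in p_X.items():
    p_Y[f(x)] += p
e_via_Y = sum(y * p for y, p in p_Y.items())

# Route 2: iterate once over the values of X, weighting f(x) by P(X = x).
e_via_X = sum(f(x) * p for x, p in p_X.items())

print(e_via_Y, e_via_X)  # 2 2
```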

Bayes' theorem (odds ratio and continuous form)

Above is a Venn diagram of a sample space (the box), with the probabilities of event $B$ and event $R$ marked by blue and red areas respectively (the hatched area represents that both happen).

By the definition of conditional probability,

P(R|B)=\frac{P(B \cap R)}{P(B)}, \text{ and} \\ P(B|R)=\frac{P(B \cap R)}{P(R)}.

Bayes' theorem is about answering questions like "if we know how likely we are to be in the red area given that we're in the blue area, how likely are we to be in the blue area if we're in the red?" (Or: "if we know how likely we are to have symptoms if we have covid, how likely are we to have covid if we have symptoms?").

Solving both of the above equations for $P(B \cap R)$ and equating them gives

P(R|B) P(B) = P(B|R) P(R),

which is the answer – just divide out by either $P(B)$ or $P(R)$ to get, for example,

P(B|R) = \frac{P(R|B)P(B)}{P(R)}.

Let's say the red area $R$ represents having symptoms. Let's say we split the blue area $B$ into $B_1$ and $B_2$ – two different variants of covid, say. Now instead of talking about probabilities, let's talk about odds: let's say the odds ratios that a random person has no covid, has variant 1, and has variant 2 are 40:2:1, and that symptoms are, compared to the no-covid population, ten times as likely in variant 1 and twenty times as likely in variant 2 (in symbols: $P(R| \neg B_1 \cap \neg B_2) = P(R|B_1) / 10 = P(R|B_2) / 20$). Now we learn that we have symptoms and want to calculate posterior probabilities, to use Bayes-speak.

To apply Bayes' rule, you could crank out the formula exactly as above: convert odds to probabilities, divide out by the total probability of no covid or having variant 1 or 2, and then get revised probabilities for your odds of having no covid or a variant. This is equivalent to keeping track of the absolute sizes of the intersections in the diagram below:

But this is unnecessary. When we learned we had symptoms, we've already zoomed in to the red blob; that is our sample space now, so blob size compared to the original sample space no longer interests us.

So let's take our odds ratios directly, and only focus on relative probabilities. Let's imagine each scenario fighting over a set amount of probability space, with the starting allocations determined by prior odds ratios:

Now Bayes' rule says to multiply each prior probability $P(B_i)$ by $P(R|B_i)$. To adjust our prior odds ratio 40:2:1 by the ratios 1:10:20 telling us how many times more likely we are to see $R$ (symptoms) given no covid or $B_1$ or $B_2$, just multiply term-by-term to get 40:20:20, or 2:1:1. You can imagine each outcome fighting it out with their newly-adjusted relative strengths, giving a new distribution of the sample space:

Now if we want to get absolute probabilities again, we just have to scale things right so that they add up to 1. This tiny bit of cleanup at the end (if we want to convert to probabilities again) is the only downside of working with odds ratios.
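The whole odds-ratio update, including the cleanup step at the end, fits in a few lines of NumPy. This sketch uses the numbers from the covid example; the array names are my own.

```python
import numpy as np

# Order of entries: no covid, variant 1, variant 2.
prior_odds = np.array([40.0, 2.0, 1.0])
likelihood_ratios = np.array([1.0, 10.0, 20.0])  # relative chance of symptoms

# Bayes in odds form: multiply term by term.
posterior_odds = prior_odds * likelihood_ratios  # 40 : 20 : 20

# Cleanup: rescale so the entries add up to 1.
posterior_probs = posterior_odds / posterior_odds.sum()
print(posterior_probs.tolist())  # [0.5, 0.25, 0.25]
```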

This gives us an idea about how to use Bayes when the sample space is continuous rather than discrete. For example, let's say the sample space is between 0 and 100, representing the blood oxygenation level $X$ of a coronavirus patient. We can imagine an approximation where we write an odds ratio that includes every integer from 0 to 100, and then refine that until, in the limit, we've assigned odds to every real number between 0 and 100. Of course, at this point the odds ratio interpretation starts looking a bit weird, but we can switch to another one: what we have is a probability distribution, if only we scale it so that the entire thing integrates to one.

The same logic applies as before, even though everything is now continuous. Let's say we want to calculate a conditional probability like the probability of $X$ (the random variable for the patient's blood oxygenation) taking the value $x$. At first we have no information, so our best guess is the prior across all patients, $\Pr_X(x)$. Say we now get some piece of evidence, like the patient's age, and know the likelihood ratios of the patient being that age given each blood oxygenation level. To get our updated belief distribution, we can just go through and multiply the prior likelihoods of each blood oxygenation level by the ratios given the new piece of evidence.

Above, the red line is the initial distribution of blood oxygenation $x$ across all patients. The yellow line represents the relative likelihoods of the patient's actual known age $a$ given a particular $x$. The green line at any particular $x$ is the product of the yellow and red function at that same $x$, and it's our relative posterior. To interpret it as a probability distribution, we have to scale it vertically so that it integrates to 1 (that's why we have a proportionality sign rather than an equals sign).

Now let's say more evidence comes in: the patient is unconscious (which we'll denote $U=\text{"yes"}$ ). We can repeat the same process of multiplying out relative likelihoods and the prior, this time with the prior being the result in the previous step:

We can see that in this case the blue line varies a lot more depending on $x$ , and hence our distribution for $x$ (the purple line) changes more compared to our prior (the green line). Now let's say we have a very good piece of evidence: the result $m$ of a blood oxygenation meter $M$ .

There's some error on the oxygenation measurement, so our final belief (that $x$ is distributed according to the black line) is very clearly a distribution of values rather than a single value, but it's clustered around a single point.

So to think through Bayes in practice, the lesson is this: throw out the denominator in the law. It's a constant anyways; if you really need it you can go through some integration at the end to find it. But it's not the central point of Bayes' theorem. Remember instead: prior times likelihood ratio gives posterior.
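On a computer, "prior times likelihood ratio, then rescale" becomes a multiplication on a grid. Below is a minimal sketch of one such update. The prior shape, the meter reading `m = 90`, and the meter's error width are all invented for illustration, not taken from real data.

```python
import numpy as np

# Grid of possible blood oxygenation levels between 0 and 100.
x = np.linspace(0.0, 100.0, 1001)
dx = x[1] - x[0]

def normalise(density):
    # Scale vertically so the (Riemann-sum) integral over the grid is 1.
    return density / (density.sum() * dx)

# Hypothetical prior: a bump centred on 95 with width 5.
prior = normalise(np.exp(-0.5 * ((x - 95.0) / 5.0) ** 2))

# Hypothetical evidence: the meter reads m = 90, with normally distributed
# error of standard deviation 2. Only relative likelihoods matter, so the
# normalising constant of the likelihood is dropped.
m = 90.0
likelihood = np.exp(-0.5 * ((m - x) / 2.0) ** 2)

# Prior times likelihood gives the relative posterior; rescale at the end.
posterior = normalise(prior * likelihood)

posterior_mean = (x * posterior).sum() * dx
print(posterior_mean)  # lands between the reading (90) and the prior centre (95)
```

Further evidence is handled by repeating the last step with `posterior` as the new prior.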

Fitting models

A probability model tries to tell you how likely things are. Fitting a probability model to data is about finding one that is useful for given data.

Above, we have two axes representing whatever, and the intensity of the red shading is the probability attributed to a particular pair of values.

The model on the left is simply bad. The one in the middle is also bad, though; it assigns no probability to many of the data points that were actually seen.

Choosing which distribution to fit – or whether to do something else entirely – is sometimes obvious, sometimes not. Complexity is rarely good.

Maximum likelihood estimation (MLE)

Let's say we do have a good idea of what the distribution is; the weight of stray cats in a city depends on a lot of small factors pushing both ways (when it last caught a mouse, the temperature over the past week, whether it was loved by its mother, etc.), so we should expect a normal distribution. Well, probably.

Let's say we have a dataset of cat weights, labelled $x_1$ to $x_n$ because we're serious maths people. How do we fit a distribution?

Step 1 is Wikipedia. Wikipedia tells us that a normal distribution has two parameters, $\mu$ (the mean) and $\sigma$ (the standard deviation), and that the likelihood (not probability! see above) that a normally distributed random variable $X$ with those parameters takes a value $x$ is

\Pr_X(x)= \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\big( \frac{x-\mu}{\sigma} \big)^2}.

Oh dear.

After a moment's thought, we can interpret it more clearly:

\Pr_X(x) = \frac{\text{blah}}{\sigma \text{ blah}} \text{blah}^{\text{-blah} {\big(\frac{x-\mu}{\sigma}\big)^2}}.

So it's just an exponential that decays in both directions from $\mu$ , and that's squeezed by $\sigma$ .

(Why are there constants then? Because it's a probability distribution, and must therefore integrate to 1 over its entire range or else all hell will break loose.)

Step 2 is philosophising. What does it really mean to get the best fit of a distribution?

The first thing we can notice is that there are only two dials we can adjust: the values of $\mu$ and $\sigma$ . For this particular problem at least, we've reduced the massive problem of picking the best model to one of finding the best spot in a 2D space (well, half of 2D space, since $\sigma$ must be greater than zero).

The second thing we can notice is that the only tool we have at our disposal here to tell us about the fit to the distribution is the likelihood function, and, well, as the saying goes: when all you have is a likelihood function ...

A good fit will give high likelihoods to the points in the data set (we can't get an arbitrarily good fit by giving everything a lot of likelihood, because there's only so much likelihood to go around – the probabilities that the likelihood function assigns across its domain must sum to 1).

Let's define the likelihood of the data, given some model, as the likelihood that we get that specific data set by independently generating samples from the model until we have the same number as in the data set (if we have a lot of data points, the likelihood of any particular set of them will usually be very low, since it's the product of the likelihood of a lot of individual points). And let's go ahead and try to tune the model so that the likelihood of our data is maximised.

(Remember, likelihood is probability, except for continuous random variables like our normal distribution, where we can't talk about the probability of a dataset (only about something like the probability of getting a dataset at least as close as [some metric] to the dataset).)

Step 3 is algebra. So what is the likelihood of all our data? Using basic probability, it's the product of the likelihoods of each data point (just like the probability of getting a set of independent events is the product of the probabilities of each event). Returning to our normal distribution with cat data $x_1$ to $x_n$ , the likelihood of the data given distribution $X$ with mean $\mu$ and standard deviation $\sigma$ is

\Pr_X(x_1) \cdot \Pr_X(x_2) \cdot ... \cdot \Pr_X(x_n) \\ = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\big( \frac{x_1-\mu}{\sigma} \big)^2} \cdot ... \cdot \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\big( \frac{x_n-\mu}{\sigma} \big)^2} \\ = \left(\frac{1}{\sigma \sqrt{2 \pi}} \right)^n e^{-\frac{1}{2}\big( \big( \frac{x_1 - \mu}{\sigma} \big)^2 + ... + \big(\frac{x_n - \mu}{\sigma} \big)^2 \big)}.

Oh dear. Maximising this is a pain.

Thankfully, there's a trick. We don't care about the likelihood, only that we set $\mu$ and $\sigma$ so that the likelihood is maximised. We can apply any monotonically increasing function to the likelihood, maximise that, and we'll have the $\mu$ and $\sigma$ that maximise the original mess.

Which monotonically increasing function? Logarithms are generally best, because they convert the products you get from calculating the likelihood of a dataset into sums (and in this case they're especially nice, because they'll also take out the exponentials in our distribution's likelihood function).

In fact, throw away the previous calculation, note that

\log\Pr_X(x) = -\log(\sigma \sqrt{2 \pi}) - \frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2 \\ = -\log(\sqrt{2 \pi}) - \log(\sigma) - \frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2, \\

from which we can throw away the $\log(\sqrt{2\pi})$ because it's the same in each term, and then sum all the rest up to get a total log likelihood of

-n\log(\sigma) - \sum_{i=1}^n \Big( \frac{1}{2} \left(\frac{x_i-\mu}{\sigma}\right)^2 \Big).

Call this $f$; the values of $\mu$ and $\sigma$ that maximise it are found where $\frac{\partial f}{\partial \mu} = 0$ and $\frac{\partial f}{\partial \sigma} = 0$; that's when we've found our peak on the 2D space of possible $(\mu, \sigma)$ pairs (technically this condition only tells us it's a stationary point, but it turns out to be the maximum, as you can prove by taking more derivatives).

So the maximum satisfies

\frac{\partial f}{\partial \mu} = \sum_{i=1}^n \Big( \frac{x_i-\mu}{\sigma^2} \Big) = 0, \text{ and} \\ \frac{\partial f}{\partial \sigma} = -\frac{n}{\sigma} + \sum_{i=1}^n \left( \frac{(x_i - \mu)^2}{\sigma^3} \right) = 0.

The first condition gives

\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i,

in other words that $\hat{\mu}$ , our best estimator function for the value of $\mu$ , is the average of the values in the data set.

From the second condition, we can do algebra to get

\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n(x_i-\mu)^2}.

We need to be careful here, though. When writing out the conditions, $\mu$ and $\sigma$ stood for specific values of the parameters of the normal distribution $X$ . We don't know these values; the best we can do is estimate them with estimators, which are technically not values but functions that take a data set and return an estimated value (and denoted by $\hat{\text{hats}}$ ). We can't have unknown values in our definition of $\hat{\sigma}$ , as we currently do with the $\mu$ in it; we have to replace it with the estimator for $\mu$ like this:

\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n(x_i-\hat{\mu})^2}

– making sure that the estimator $\hat{\mu}$ does not depend on $\hat{\sigma}$, since that would again make things undefined – or alternatively by writing out the $\hat{\mu}$ estimator in full like this:

\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left(x_i-\frac{1}{n}\sum_{j=1}^n x_j\right)^2},

which at least makes it very clear that the $x_i$ s and their number $n$ define $\hat{\sigma}$ .

When you're done defining your estimators, you should have a clear diagram in your head of how to pour data into the functions you've written down and come out with concrete numbers, with no dangling inputs anywhere – you're not done if you have any.
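Pouring data into the estimators looks like this in NumPy. It's a minimal sketch: the cat weights are invented, and `log_likelihood` is a helper I've written to spot-check that the estimates beat nearby parameter values.

```python
import numpy as np

# Hypothetical cat weights in kg (made up for illustration).
weights = np.array([3.1, 4.2, 3.8, 5.0, 4.4, 3.6])
n = len(weights)

# The estimators from the derivation: only the data and n go in.
mu_hat = weights.sum() / n
sigma_hat = np.sqrt(((weights - mu_hat) ** 2).sum() / n)

def log_likelihood(mu, sigma):
    # Sum of log Pr_X(x_i) for a normal distribution with parameters mu, sigma.
    return np.sum(-np.log(sigma * np.sqrt(2 * np.pi))
                  - 0.5 * ((weights - mu) / sigma) ** 2)

# NumPy's built-ins agree: np.std uses the 1/n form by default (ddof=0).
print(np.isclose(mu_hat, np.mean(weights)))    # True
print(np.isclose(sigma_hat, np.std(weights)))  # True

# Nudging either parameter away from the estimate lowers the log likelihood.
best = log_likelihood(mu_hat, sigma_hat)
print(best > log_likelihood(mu_hat + 0.1, sigma_hat))  # True
print(best > log_likelihood(mu_hat, sigma_hat * 1.1))  # True
```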

Supervised and unsupervised learning

There are two main types of fancy model fitting we can do:

Supervised learning, where we have a set of pairs (of numbers or anything else) and we try to design a system to predict one element from the other. For example, maybe we measure the length and weight of some stray cats, but get bored of trying to get them to stay on the scale long enough, so we want to ditch the weighing and predict a weight from the length alone – how well can we do this?
Unsupervised learning, where we have our data (as a set of tuples of associated data, like cat lengths, weights, and locations), and we try to fit a model to it so we can generate similar items; maybe we want to fake a larger stray cat population in our data than actually exists but not get caught by the statistics bureau. (This category also includes things like trying to identify clusters to interpret the data.) Fitting a distribution is perhaps the simplest example: using our one-dimensional cat weight database discussed in the MLE section, we can "generate" new cats by sampling from it, though the "cat" will just be the weight number. The more interesting case is when we have to generate a lot of associated data; for example, this website offers you a new face every time you reload it. Behind it is a probability distribution for a human face in some crazy-dimensional variable space that's detailed enough that sampling it gives you all the data needed to figure out the colours of each pixel in a photorealistic face picture.

The unifying idea is maximum likelihood estimation (MLE). Clearly, something like MLE is needed if you want to fit a distribution to data for unsupervised learning; we're going to need to generate something eventually, so we better have a probability model. It's less clear that supervised learning has anything to do with MLE though, and tempting to think of it as defining some random loss function to measure how bad a fit is, and then minimising that. It's possible to think of supervised learning this way, but then you'll end up with a lot of detail about loss functions in your head, all of which will seem to be pulled out of thin air.

Instead, think of supervised learning as MLE too. We specify a probability model, which will take in some parameters (e.g. the exponent $a$ and constant $b$ in a cat length/weight model like $\text{weight} = b \times \text{length}^a + \epsilon$, where $\epsilon$ is a normally distributed error term with mean 0 and some standard deviation we either know already or else ask the fitting procedure to find for us), and the value of the predictor variable(s) (e.g. the cat's length), and spit out its prediction of the variable(s) of interest.

(Note that often the variable of interest is not numerical, but a label: "spam", "tumour", "Eurasian oystercatcher", etc.)

In fact, seen from the MLE perspective, it can almost be hard to see the difference – if so, good. Just look at the processes:

Unsupervised learning:
1. Get your dataset $x = (x_1, x_2, ..., x_n)$ .
2. Decide on a probability model (e.g. a simple distribution) $X$ with a parameter set $\theta = (\theta_1, \theta_2, ..., \theta_m)$ .
3. Find the $\theta$ that maximises $\Pr_X(x_1; \theta) \times ... \times \Pr_X(x_n; \theta)=\Pr_X(x;\theta)$ ,* since assuming our data points are drawn independently, this is the likelihood of the dataset.
Supervised learning:
1. Get your dataset of pairs of the form (thing to predict, thing to predict from): $((y_1, x_1), (y_2, x_2), ..., (y_n, x_n))$ .
2. Decide on a probability model $Y$ that relies on a parameter set $\theta = (\theta_1, \theta_2, ..., \theta_m)$, and also on $x_i$, to predict $y_i$.
3. Find the $\theta$ that maximises $\Pr_Y(y_1;x_1, \theta) \times ... \times \Pr_Y(y_n; x_n, \theta) = \Pr_Y(y_1, ..., y_n; x_1, ..., x_n, \theta)$.*

*(We write $\Pr_X(x_i;\theta)$ to mean the likelihood that $X$ takes the value $x_i$ if the parameters are $\theta$ ; we avoid writing it as a conditional probability $\Pr_X(x \, |\, \theta)$ because interpreting this as a conditional probability is technically only valid with a Bayesian interpretation.)

Linear models

You can invent any model you choose. As always, simplicity pays though, and it turns out that there's a class of probability models which are easy to work with and reason about, for which general algorithms and mathematical tools exist, and which is often good enough: linear models.

The word "linear" immediately brings to mind straight lines. That's not what it means in this context. The linearity in linear models is because the output is a linear combination of "features" (predictor variables).

The general form is

\hat{y_i} = c_1 e_{1,i} + c_2 e_{2,i} + ... +c_n e_{n,i},

where $\hat{y_i}$ is the predicted value, $c_1$ through $c_n$ are constants, and $e_{1,i}$ through $e_{n,i}$ are the features describing the $i$ th set of data. In the simplest case, a feature might be a value we measure directly, but in general it can be any function of data we measure. Ideally, we want that the true value $y_i \approx c_1 e_{1,i} + ... + c_n e_{n,i}$ .

In the above diagram, we see we measure the data $x_i$ (note that it can be a tuple of values rather than a single value), pass it through some blackbox function to generate features, and take the prediction $\hat{y_i}$ to be the sum of multiplying together each feature by the weight assigned to it.

Note that the linear model above is a prediction-maker but not a probability model because it doesn't assign likelihoods. The probability model for a linear model is often taken to be

y_i = c_1 e_{1,i} + c_2 e_{2,i} + ... +c_n e_{n,i} + \epsilon

that is, there's an error term $\epsilon$ that we assume to be normally distributed with mean 0 and standard deviation $\sigma$ (which may be known, or finding it may be part of fitting the model).

The above is also an equation for predicting one specific output ( $y_i$ ) from one specific set of features, which in turn are determined by one specific input (e.g. a single data point). More generally we can write it in vector form:

\pmb{y} \approx c_1 \pmb{e_1} + ... + c_n \pmb{e_n},

where $\pmb{y}=(y_1, y_2, ..., y_{n})$ , and likewise $\pmb{e_j}$ is a vector whose $i$ th position corresponds to the $j$ th feature of the $i$ th data item.

Note that we can read this equation in two ways: as a vector equation about data, as just described, that's fitted to give $\pmb{y}$ from its features, or as a prediction, saying that the value of a particular $y_i$ will be roughly this.

There's a set of standard tricks to use in linear modelling:

"One-hot coding": using a function that is 0 unless the input data satisfies some condition (having a label, exceeding a value, etc.).
If we have the data point $x_i$ , using the features $e_{0,i} = 1$ , $e_{1,i} = x_i$ , and $e_{2,i} = x_i^2$ to fit a quadratic (if you fit a polynomial of degree higher than 2 without a very solid reason, you're probably overfitting).
We often have a pattern with a known period $T$ (days, years, etc.), and some non-zero starting phase $\phi$. Therefore we'd want a feature like $\sin((2\pi/T)x+\phi)$, where $x$ is an input, to fit this pattern to. If $\phi$ is known, we don't have a problem, but if we want to fit the phase, it doesn't work: the model is not linear in $\phi$. To fix this, use a trig angle addition identity; the above becomes $\sin(\phi) \cos((2\pi/T)x) + \cos(\phi) \sin((2\pi/T)x)$, where $\sin(\phi)$ and $\cos(\phi)$ are just constants so can be forgotten about because the fitting model will determine the constants of our features. (Recovering $\phi$ from the final constants will take a bit of maths; note that the constant of the cosine and sine terms in the fitted model will have the amplitude mixed in, in addition to $\phi$.)
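The "bit of maths" for recovering the phase can be sketched as follows. This assumes noise-free synthetic data with a made-up period, amplitude, and phase, and uses `numpy.linalg.lstsq` as the fitting routine; any least squares fitter would do.

```python
import numpy as np

# Hypothetical pattern: known period, unknown amplitude and phase.
T = 7.0
true_amp, true_phi = 2.0, 0.8
w = 2 * np.pi / T

x = np.linspace(0.0, 28.0, 200)
y = true_amp * np.sin(w * x + true_phi)  # noise-free, for illustration

# Two features that are linear in their coefficients: cos(wx) and sin(wx).
features = np.column_stack([np.cos(w * x), np.sin(w * x)])
(c_cos, c_sin), *_ = np.linalg.lstsq(features, y, rcond=None)

# From the angle addition identity:
#   c_cos = amp * sin(phi),   c_sin = amp * cos(phi)
amp = np.hypot(c_cos, c_sin)
phi = np.arctan2(c_cos, c_sin)

print(np.isclose(amp, true_amp), np.isclose(phi, true_phi))  # True True
```

`arctan2` returns the phase in $(-\pi, \pi]$, so a phase outside that range comes back shifted by a multiple of $2\pi$.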

Here's an annotated linear model with parameter interpretation:

The features in this model:

$e_0=1$ (a constant feature, which the $c_0$ below multiplies).
$e_1=x$.
$e_2$ is 0 if $x < A$ and 1 otherwise.
$e_3$ is 0 if $x < A$ and $x$ otherwise.

(If we want to fit the best value of $A$ , we'll have to do some maths and reconfigure the model. Right now $A$ is a constant that's defined in the functions that calculate the features from the input data.)

The interpretation of the constants:

$c_0$ is the prediction for $x=0$ .
$c_1$ is the base slope.
$c_2$ is the difference between the prediction for $x=0$ (the $y$ -intercept of the $x < A$ line) and the $y$ -intercept of the $x>A$ line.
$c_3$ is how much the slope changes after $x=A$ .

We could have chosen different features (for example, letting $e_1 = 0$ for $x > A$ ), and then gotten perhaps more readable constants ( $c_3$ would become just the slope, not the difference in slope). We could also have added a feature like $e_4 = x^2$ , and then the model would no longer look like just straight lines. But whatever we do, we need to be careful to interpret the constants we get correctly, especially when the model gets complicated.
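To see these features and constants in action, here is a minimal sketch that builds the feature columns and fits them with `numpy.linalg.lstsq`. It assumes a constant feature $e_0 = 1$ (which the interpretation of $c_0$ implies), and the breakpoint `A` and the "true" constants are invented for illustration.

```python
import numpy as np

A = 5.0  # breakpoint, fixed in advance rather than fitted
x = np.linspace(0.0, 10.0, 50)
above = (x >= A).astype(float)  # 0 if x < A and 1 otherwise

# Columns: e_0 = 1, e_1 = x, e_2 = 1[x >= A], e_3 = x * 1[x >= A].
E = np.column_stack([np.ones_like(x), x, above, x * above])

# Hypothetical constants used to generate noise-free data.
true_c = np.array([1.0, 0.5, -2.0, 1.5])
y = E @ true_c

c, *_ = np.linalg.lstsq(E, y, rcond=None)
print(np.allclose(c, true_c))  # True

# Reading the constants: to the right of A the line has
# y-intercept c_0 + c_2 and slope c_1 + c_3.
right_intercept = c[0] + c[2]
right_slope = c[1] + c[3]
```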

For our cat weight prediction example, we might expect weight $W$ and length $L$ to have a relation like $W \approx c L^3$ , where $c$ is a constant that the model will fit. If we want to ask questions about whether a cubic relation really is the best, take logs and fit something like $\log(W) = c_1 + c_2 \log(L)$ – $c_2$ tells us the exponent.
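A sketch of the log-log fit, on synthetic data. The lengths, the constant, and the multiplicative noise are all invented, and `numpy.polyfit` with degree 1 stands in for whatever linear fitting routine you prefer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cats: lengths in cm, weights following W = c * L^3
# with multiplicative noise (which becomes additive after taking logs).
L = rng.uniform(30.0, 60.0, size=200)
W = 2e-5 * L ** 3 * np.exp(rng.normal(0.0, 0.05, size=200))

# Fit log(W) = c_1 + c_2 * log(L). polyfit returns the highest power first,
# so the slope c_2 comes before the intercept c_1.
c2, c1 = np.polyfit(np.log(L), np.log(W), deg=1)

print(c2)          # the fitted exponent, close to 3 for this synthetic data
print(np.exp(c1))  # the fitted constant c
```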

Feature spaces and fitting linear models

The main benefit of linear models is that by talking about linear combinations of data vectors we reduce the maths of fitting parameters to linear algebra. Linear algebra is about transformations of space and the vectors in it, so it also allows for a visual interpretation of everything.

Let's say we have a model like this:

\pmb{y} \approx c_1 \pmb{e_1} + c_2 \pmb{e_2}.

Here, $\pmb{y}$ is the actual measured data, and $\pmb{e_i}$ are functions of the (also measured) predictor variables. Let's say $\pmb{y} = (y_1, y_2, y_3)$ – i.e., we have three data points. We can imagine $\pmb{y}$ as a vector pointing somewhere in 3D space, with $y_1$ , $y_2$ , and $y_3$ the distances along the $x$ , $y$ , and $z$ axes. Likewise, $\pmb{e_1}$ and $\pmb{e_2}$ can be thought of as 3D vectors encoding some (function of the) data we've measured.

Now the only dials a linear model gives us to adjust are the weights of $\pmb{e_1}$ and $\pmb{e_2}$: $c_1$ and $c_2$. There's a 2D space of them (since there are two constants to adjust – $c_1$ and $c_2$), and as it happens, there's a nice geometric interpretation: each pair $(c_1, c_2)$ corresponds to a point on the plane spanned by $\pmb{e_1}$ and $\pmb{e_2}$ (specifically, the point you get to if you move $c_1$ times along $\pmb{e_1}$ and then $c_2$ times along $\pmb{e_2}$).

So what are the best values of $c_1$ and $c_2$ ? The intuitive answer is that we want to get as close as possible to $\pmb{y}$ :

In this case, the closest to $\pmb{y}$ that we can reach on the plane spanned by $\pmb{e_1}$ and $\pmb{e_2}$ is the green vector, and the black vector is the difference between the predicted data vector and actual data vector.

Mathematically, what are we doing here? We're minimising the distance between the vector $\hat{\pmb{y}} = c_1 \pmb{e_1} + c_2 \pmb{e_2}$ (where $c_1$ and $c_2$ can be varied) and $\pmb{y}$ ; this distance is given by

\sqrt{(\hat{y_1} - y_1)^2 + (\hat{y_2} - y_2)^2 + (\hat{y_3} - y_3)^2 }.

Previously we simplified optimisation by applying a logarithm (a monotonically increasing function) and optimising that; this time we do the same by applying the squaring function (which is monotonically increasing for positive numbers, which our distance is limited to). This means that the quantity to minimise is

(\hat{y_1} - y_1)^2 + (\hat{y_2} - y_2)^2 + (\hat{y_3} - y_3)^2.

In other words, we minimise the sum of squared errors ("least squares estimation" is the most common phrase).

If we have more than three data points, then we can't picture it, but the idea is exactly the same. Fitting an $n$ -dimensional dataset to a linear model of $m$ features boils down to moving as close as possible in $n$ D space to the observed data vector, while limited to the $m$ -dimensional (at most; see below) space spanned by the features.

(Above, $n=3$ and $m=2$ . Generally $n$ is huge because datasets can be huge, while $m$ is much smaller since it's the number of features we've written down into the model.)
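The $n=3$, $m=2$ picture can be reproduced numerically. In this minimal sketch the data vector and feature vectors are made up, and `numpy.linalg.lstsq` does the minimising. The check at the end is the geometric statement that the leftover error vector is perpendicular to the plane spanned by the features.

```python
import numpy as np

# Made-up data vector and two feature vectors in 3D.
y = np.array([1.0, 2.0, 2.0])
e1 = np.array([1.0, 1.0, 1.0])
e2 = np.array([0.0, 1.0, 2.0])
E = np.column_stack([e1, e2])

# Least squares: the (c_1, c_2) that gets closest to y on the plane.
c, *_ = np.linalg.lstsq(E, y, rcond=None)

y_hat = E @ c         # closest reachable point on the plane
residual = y - y_hat  # difference between actual and predicted data vectors

print(c)  # approximately [7/6, 1/2]
print(np.isclose(residual @ e1, 0.0), np.isclose(residual @ e2, 0.0))  # True True
```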

A maths lecturer is giving a lecture about 5-dimensional geometry.

A student asks a question: "I can follow the algebra just fine, but it would be helpful if I could visualise it. Is there any way to do that?"

The lecturer replies: "Oh, it's easy. Just imagine everything in $n$ dimensions, and then let $n=5$ ."

(variants of this joke are common; see for example here.)

Linear independence

A set of vectors is linearly dependent if there exists a vector in it that can be written as a linear combination of the other vectors. If your feature vectors are linearly dependent, you will get the same predictions out of your model, but you can't interpret the coefficients.

(For visual intuition: two vectors in 2D are linearly dependent if they lie on the same line, three vectors in 3D are linearly dependent if they lie on the same plane (a superset of the case that they lie on the same line), and so on.)

An easy way to make this mistake is if you're doing one-hot coding of categories. Let's say you're fitting a linear model to estimate student exam grades $y$ based on their university, with a model that looks like this:

y \approx \alpha + \beta \cdot 1_{\text{Oxford}}+\gamma\cdot1_{\text{Cambridge}}+...,

using indicator function notation. Whatever linear fitting routine you use will happily give you coefficient values, and the predictions it gives will be sensible, but you won't be able to interpret the coefficients. To see what's happening, consider an Oxford student: their predicted grade $y$ is $\alpha + \beta$ . What are $\alpha$ and $\beta$ ? Good question – we can only assign meaning to their combination. If instead we eliminate one university and write

y \approx \alpha + \beta \cdot 1_{\text{Cambridge}} + ...,

when we now fit the coefficients, $\alpha$ will be the predicted grade for Oxford students, and $\alpha+\beta$ the predicted grade for Cambridge students, so we can interpret $\alpha$ as the Oxford average, and $\beta$ as the difference between Oxford and Cambridge. (The predictions given by the model won't change though.)

The vector interpretation is that if our dataset contains, say, 3 Oxford students followed by 2 Cambridge students, the (5D) data vectors in the first model will be

\alpha \begin{pmatrix}1 \\ 1 \\ 1 \\ 1 \\ 1\end{pmatrix} + \beta \begin{pmatrix}1 \\ 1 \\ 1 \\ 0 \\ 0\end{pmatrix} + \gamma \begin{pmatrix}0 \\ 0 \\ 0 \\ 1 \\ 1\end{pmatrix}.

But these vectors aren't linearly independent: the last two vectors sum up to the first one, and therefore there will be many triplets $(\alpha, \beta, \gamma)$ that give identical predictions.
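A quick sketch of this in plain Python, using the same 3 Oxford + 2 Cambridge layout. The grade values (65 and 62) are made up for illustration, and `predict` is a hypothetical helper.

```python
# Feature vectors for 3 Oxford students followed by 2 Cambridge students.
ones = [1, 1, 1, 1, 1]
oxford = [1, 1, 1, 0, 0]
cambridge = [0, 0, 0, 1, 1]

# The dependence: the two indicator vectors sum to the constant vector.
is_dependent = [o + c for o, c in zip(oxford, cambridge)] == ones


def predict(alpha, beta, gamma):
    """Predicted grade for each of the five students."""
    return [alpha * k + beta * o + gamma * c
            for k, o, c in zip(ones, oxford, cambridge)]


# Three different coefficient triplets (hypothetical numbers) ...
p1 = predict(0, 65, 62)
p2 = predict(62, 3, 0)
p3 = predict(100, -35, -38)
# ... all produce the same predictions: 65 for Oxford, 62 for Cambridge.
```

Since any amount can be moved between $\alpha$ and the pair $(\beta, \gamma)$ without changing the predictions, no single coefficient means anything on its own.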

Linear fitting and MLE

We talked about MLE being the holy grail of model fitting, and then about linear models and how fitting them comes down to a geometry problem. As it turns out, MLE lurks behind least squares estimation as well.

I mentioned earlier that linear models often assume a normal distribution for errors. Let's assume that, and do MLE.

Our model is that

Y_i = c_1 e_{1,i} + ... + c_n e_{n,i} + \epsilon,

where $\epsilon \sim N(0,\sigma^2)$ (i.e. follows a normal distribution with mean zero and standard deviation $\sigma$ ).

A useful property of normal distributions is that if we add a constant $c$ to a normal distribution with mean $\mu$ , the result has a normal distribution with mean $\mu + c$ and the same standard deviation (this isn't true of all distributions!). Therefore we can write the above as

Y_i \sim N(c_1 e_{1,i} + ... + c_n e_{n,i}, \sigma^2).

The likelihood for getting $y$ is

\Pr_Y(y;c_1...c_n, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left( \frac{y - (c_1 e_{1,i} + ... + c_n e_{n,i})} {\sigma} \right)^2},

once again copying out the likelihood function for normal distributions.

Now remember that we just want to fit $c_1$ through $c_n$ . These only occur in the exponent, so we can ignore all the constants out front, and also we can see that since there's a negative in the exponent, maximising it is equivalent to minimising the stuff in the exponent. Taking out $\sigma$ and constants, the relevant stuff to minimise is

(y-(c_1 e_{1,i} + ... + c_n e_{n,i}))^2,

where we can see that the thing we subtract from $y$ is our model's prediction of $y$ (one component of what we previously denoted $\hat{\pmb{y}}$ ). Once again, we can see we're minimising a square of the error. Of course, we have many $y$ -values to fit; to see that it's the sum of these that we minimise, rather than some other function of them, just note that if we take a logarithm we'll get a term like the above (times constants) for each data point we're using to fit.

So least-squares fitting comes from MLE and the assumption of normally distributed errors.
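As a sanity check, here is a small sketch that searches a coarse grid of candidate coefficients twice, once minimising the sum of squared errors and once maximising the normal log likelihood. The data, the grid, and the value of $\sigma$ are all arbitrary assumptions made for the illustration.

```python
import math


def predictions(c1, c2, e1, e2):
    """Model predictions c1*e1 + c2*e2, one per data point."""
    return [c1 * a + c2 * b for a, b in zip(e1, e2)]


def sum_squared_errors(y, y_hat):
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))


def gaussian_log_likelihood(y, y_hat, sigma):
    """Log likelihood of y under independent N(y_hat_i, sigma^2) errors."""
    n = len(y)
    constant = -n * math.log(sigma * math.sqrt(2 * math.pi))
    return constant - sum_squared_errors(y, y_hat) / (2 * sigma ** 2)


# Hypothetical data and an arbitrary fixed sigma.
e1 = [1.0, 1.0, 1.0]
e2 = [0.0, 1.0, 2.0]
y = [1.0, 2.0, 2.0]
sigma = 0.5

# Coarse grid of candidate coefficients: c1 in 0.0..2.0, c2 in 0.0..1.0.
candidates = [(i / 10, j / 10) for i in range(21) for j in range(11)]

best_by_sse = min(
    candidates,
    key=lambda c: sum_squared_errors(y, predictions(*c, e1, e2)),
)
best_by_likelihood = max(
    candidates,
    key=lambda c: gaussian_log_likelihood(y, predictions(*c, e1, e2), sigma),
)
```

Both searches pick the same grid point, because for fixed $\sigma$ the log likelihood is a constant minus a positive multiple of the sum of squared errors.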

(Are errors normally distributed? Often yes. Remember though that our features are functions of things we measure; even if $x$ has normally-distributed errors, after we apply an arbitrary function to it to generate feature $e$ , the resulting $e$ might not have normally distributed errors (but for many simple functions it still will). We could be more fancy, and devise other fitting procedures, but often least squares is good enough.)

Empirical distributions

What's the simplest probability model we can fit to a dataset? It's tempting to think of an answer like "a normal distribution", or "a linear model with one linear feature". But we can be even more radical: treat the dataset itself as a distribution.

On the left, we've plotted the number of data points that take different values of $x$ (this is a discrete distribution; for a continuous distribution, the probability that any two samples drawn are equal is infinitesimal). On the right, all we've done is normalised the distribution, by rescaling the vertical axis so that the heights of all the bars sum to one. Once we've done that, we can go ahead and call it a probability distribution, and assign the meaning that the height of the bar at $x$ is the probability that the distribution $X$ that we've just defined takes the value $x$ . This is called an empirical distribution.

Sampling from an empirical distribution is easy – just pick a value at random from the dataset. (Of course, the likelihood such a distribution assigns to any value not in the dataset is zero, which can be a problem for many use cases.)

In fact, you've probably already dealt with empirical distributions, at least implicitly. When you calculate the mean and variance of a dataset, you can interpret this as calculating the properties of the empirical distribution given by that dataset. An empirical distribution as an abstract thing apart from your dataset may seem ad hoc, but it's not any less defined than a normal distribution.
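A minimal sketch of an empirical distribution in plain Python; the dataset is made up, and the variable names are just for illustration.

```python
import random
from collections import Counter

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical dataset

# Normalise the counts so the bar heights sum to one.
counts = Counter(data)
n = len(data)
pmf = {value: count / n for value, count in counts.items()}

# Mean and variance of the empirical distribution. Note the variance
# divides by n (not n - 1), since we treat the dataset as the distribution.
mean = sum(v * p for v, p in pmf.items())
variance = sum((v - mean) ** 2 * p for v, p in pmf.items())

# Sampling: just pick values at random from the dataset.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [rng.choice(data) for _ in range(5)]

# Any value not in the dataset gets probability zero.
prob_of_unseen = pmf.get(6, 0.0)
```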

The standard way to illustrate an empirical distribution is by plotting its cumulative distribution function (cdf); an empirical one is known as an ecdf. This is almost necessary for continuous variables. In general, the ecdf of a dataset is a very useful and general way to visualise it: it saves you from the pains of histograms (how large to make the bins? if you take logs or squares first, do you take them before or after binning? etc. etc.), and is also complete in the sense of technically displaying every point in the dataset.

The ecdf for the above distribution would look something like this:

(Like any cdf, it takes the value 0 up until the first data point and the value 1 after the last data point.)
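Computing an ecdf takes only a few lines. This is a sketch on a made-up dataset (not the one plotted above), with `make_ecdf` as a hypothetical helper name.

```python
from bisect import bisect_right


def make_ecdf(data):
    """Return a function x -> fraction of data points that are <= x."""
    sorted_data = sorted(data)
    n = len(sorted_data)

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(sorted_data, x) / n

    return ecdf


data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical dataset
ecdf = make_ecdf(data)

below_first = ecdf(0)     # 0.0: nothing lies at or below 0
at_three = ecdf(3)        # 6 of the 9 points are <= 3
between = ecdf(2.5)       # 3 of the 9 points are <= 2.5
at_last = ecdf(5)         # 1.0: every point is <= 5
```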

If we now fit any parametric (i.e. non-empirical) distribution, comparing its cdf to the ecdf is a good test of how good the fit is.

Measuring the goodness of a model fit with KL divergence

The empirical distribution is the best possible fit to a given dataset, and therefore it's a good benchmark to measure the fit of a proposed model against.

Let's say our data is $x=x_1, ... ,x_n$ , and the empirical distribution is $X^*$ . The likelihood of drawing $x$ from $X^*$ is (under the assumption of each $x_i$ being drawn independently)

\Pr_{X^*}(x_1) \cdot ... \cdot \Pr_{X^*}(x_n).

Now $\Pr_{X^*}(x_i)$ is just the fraction of how many $x_j$ in $x$ are equal to $x_i$ . Writing $N_{x_i}$ to mean the number of values equal to $x_i$ in the data, we can write

\Pr_{X^*}(x_i) = \frac{N_{x_i}}{n}.

Taking logs, and writing $q_v = N_{v} / n = \Pr_{X^*}(v)$ , the above product for the likelihood becomes the sum, over possible values $v$ of $x_i$ , for the log likelihood:

\sum_{v} N_{v} \log(q_v).

Now we'll do one last trick, which is to scale by $1/n$ ; otherwise, the term in front of the log will tend to be bigger if we have more data points, while we want something that means the same regardless of how many data points there are. After we do that, we notice a nice symmetry:

\sum_{v} q_v \log(q_v).

This is a good baseline to compare any other model to. For example, let's say we fit to this a (discrete) distribution $X$ (with the same sample space as $X^*$ ) with parameters $\theta$ . Write $p_v = \Pr_X(v; \theta)$ , and we can express the log likelihood of the dataset as

\sum_{v} N_{v} \log(p_v).

Normalising by $1/n$ as before, we get

\sum_{v} q_v \log(p_v).

Now to get a measure of fit goodness, just subtract, and do some algebra on top if you feel like it:

\sum_{v} q_v \log(q_v) - \sum_{v} q_v \log(p_v) \\ = \sum_{v} q_v \log(q_v/p_v) \\ = \sum_{v} \Pr_{X^*}(v) \log\left(\frac{\Pr_{X^*}(v)}{\Pr_X(v;\theta)}\right).

(In the last step, I've just expanded out our earlier definitions of $p_v$ and $q_v$ .)

This is called the Kullback-Leibler divergence (KL divergence). If $X=X^*$ , then it comes out to 0; for worse fits, the value becomes greater.
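Here is a sketch of the calculation in plain Python. The dataset and the uniform "model" are made up, and `empirical_pmf` and `kl_divergence` are hypothetical helper names. It assumes the model assigns positive probability to every value that appears in the data; otherwise the divergence is infinite, and this code would fail with an error.

```python
import math
from collections import Counter


def empirical_pmf(data):
    """q_v = N_v / n for each value v in the data."""
    n = len(data)
    return {v: count / n for v, count in Counter(data).items()}


def kl_divergence(q, p):
    """sum_v q_v * log(q_v / p_v), over values v that appear in q.

    Assumes p[v] > 0 wherever q[v] > 0.
    """
    return sum(qv * math.log(qv / p[v]) for v, qv in q.items() if qv > 0)


data = [1, 2, 2, 3, 3, 3, 4, 4, 5]          # hypothetical dataset
q = empirical_pmf(data)
uniform = {v: 1 / 5 for v in range(1, 6)}   # a deliberately crude model

kl_self = kl_divergence(q, q)               # a perfect fit gives zero
kl_uniform = kl_divergence(q, uniform)      # a worse fit gives more

# The same number as the difference of the two normalised log likelihoods.
baseline = sum(qv * math.log(qv) for qv in q.values())
model_ll = sum(qv * math.log(uniform[v]) for v, qv in q.items())
difference = baseline - model_ll
```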

There's a nice information theoretic interpretation of this result. $- \sum_{v} q_v \log_2(p_v)$ is the average number of bits needed to most efficiently represent a value randomly drawn from the dataset, using a coding scheme optimised for the distribution $X$ .

Review: Foragers, Farmers, and Fossil Fuels

2020-12-17T08:24:00.006+00:00

Book: Foragers, Farmers, and Fossil Fuels: How Human Values Evolve, by Ian Morris (2015)
7.8k words (about 26 minutes)

This post has also been published here.

Two hundred years ago, most people lived in societies that considered slavery, war, and discrimination based on class, ethnicity, and gender to be justifiable. Today, most people live in societies that hold the opposite beliefs.

What changed? A simple and tempting narrative is that we have simply become wiser; that various Enlightenment philosophers, thoughtful activists, and other principled people figured out that the pre-industrial moral order is wrong and managed to persuade everyone to change.

It is true that many smart and principled people had good ideas and that this was a big proximate driver of better values. But is it a coincidence that this change in values happened around the same time as the industrial revolution?

What about the previous economic revolution, the agricultural one? Did that also coincide with a change in the values that people held? The evidence says yes – foraging societies tend to be more accepting of violence and far less accepting of hierarchy than farming ones.

The argument of Ian Morris' Foragers, Farmers, and Fossil Fuels is that these timings are not a coincidence. Societies that change their main method of getting energy also change their values, because some sets of values give greater success for a certain type of society. Farming societies that stick to anti-hierarchical forager attitudes won't survive competition with farming societies that learn to believe in hierarchies (maybe they won't be economically competitive and won't be able to field as big an army to defend themselves as the god-king next door can field to conquer them). Likewise, industrial societies that stick to inflexible hierarchies and elite-focused economies can't compete with more equal democracies that don't squander the talents of the non-elite, and maintain a well-looked-after middle-class of rich consumers and educated workers.

We can contrast two ways of trying to explain the history of values. The first says that the history of values is a history of ideas; a battle of ideas against other ideas, waged in the minds of people. The second says that the history of values is a history of what works best. The battle is between the benefits conferred by believing in certain ideas and those conferred by other ones, and it is waged out in the real world, where empires fall or rise based on whether they value the things that will lead them to success.

It is clear that neither style of explanation is enough on its own. No matter how persuasive it can be made, a sufficiently destructive idea – as an extreme example, that everyone should commit suicide – will not find its adherents in charge of the future (or coming from the opposite direction: why do you think many religions are so big on the "be fruitful and multiply" point?). On the other hand, no matter how practically useful a certain idea is, someone has to have the idea and persuade other people to adopt it as a value before it has a chance of spreading because of its practical benefits.

The question, then, is just how far we can push the deterministic account, where the methods of energy capture constrain values. In Ian Morris' telling, the answer is surprisingly far, and if his account of the history of values is correct, I agree with him (in particular, the similarity of farming society values across continents is hard to explain otherwise). However, I think Morris, along with most people who advance or accept similar arguments, goes too far with the moral pragmatism that these ideas may be thought to imply.

But first: what values did foragers, farmers, and fossil fuel users actually hold, and what is Morris' energy-based explanation of the changes between them?

Foragers

Everyone has some idea of what a forager or hunter-gatherer is, but since we want to deal with differences between foragers and farmers, we want a clear idea of where the line is. Morris cites a good definition by Catherine Panter-Brick: foragers are people who "exercise no deliberate alteration of the gene pool of exploited resources". If you plant and harvest a few naturally occurring plants, you're still a forager, but when you start refining the crops generation by generation or breeding the animals, that's the point when you become a farmer.

Of course, there is a vast amount of variance in culture, lifestyle, and values between different forager bands. To almost every generalisation about foragers, there exists some tribe that does the opposite. However, Morris argues that for each main type of human society (foraging/farming/industrial), it is useful to talk about the average set of values such societies held or tended to develop towards, at least in terms of the broad categories of tolerance of political/economic/gender hierarchy and propensity to violence. This covers up lots of important questions – different societies may have justified violence under different circumstances, or had different reasons for why economic inequality was acceptable, but such differences are sucked up into one category and ignored in this sort of analysis. That this makes sense will become apparent once we see that foragers, farmers, and fossil fuel users can be sensibly compared and contrasted even at this very general level.

In some ways, forager values are familiar. Even among foragers, possession and ownership are big deals, with every item generally having an owner. In other ways, they're surprisingly different.

Take violence. Though it's very difficult to come up with exact figures for anything to do with foragers (ancient foragers left behind only bones and tools, and modern foragers only live in places that farmers didn't want, so might not be a representative sample), the chance of dying by murder may have been around 10% in an average forager tribe, compared to 0.7% today, 1-2% across the 1900s (including all wars), roughly 5% in your average farming society or in the most murderous countries of today, and 20% for Poland during World War II.

This was not recognised by anthropologists until the 1990s or so because, as Morris explains:

"[T]he social scale imposed by foraging is so small that even high rates of murder are difficult for outsiders to detect. If a band with a dozen members has a 10% rate of violent death, it will suffer roughly one homicide every twenty-five years, and since anthropologists rarely stay in the field for even twenty-five months, they will witness very few violent deaths."

This is why Elizabeth Marshall Thomas' !Kung ethnography was called "The Gentle People", even though "their murder rate was much the same as what Detroit would endure at the peak of its crack cocaine epidemic".

Foragers are also extremely averse to hierarchy. Perhaps the best summary is given by a !Kung San forager asked about the absence of chiefs:

"Of course we have headmen! In fact we’re all headmen … Each one of us is headman over himself!"

It's not just that foragers don't have strict hierarchies and this behaviour falls out naturally as a result; they are actively opposed to any sort of hierarchy or inequality. Material inequality is considered morally wrong, and fairness essential. Pressure to share spoils is applied liberally. And as in any group of humans, you'll have upstarts who try to achieve greatness and power, but such people usually have opposition groups immediately form to hold them back. Anthropologist Christopher Boehm calls these "reverse dominance hierarchies"; Morris translates this as "coalitions of losers".

The one sort of inequality that foragers aren't opposed to is gender inequality, with the dominant role in politics and violence generally falling to men (as an example of this attitude, Morris cites a forager of the Ona people (also known as the Selk'nam or Onawo) saying "the men are all captains and the women are sailors"). However, the gender inequality in forager societies is still on a different level from the extreme gender inequality and regimentation of farmer societies, and attitudes about sex were looser too. Morris writes that "abused wives regularly just walk away [...] without much fuss or criticism, and attitudes towards marital fidelity and premarital virginity tend to be quite relaxed".

Farmers

As with foragers, Morris lumps together farming societies into one ideal type, labelled Agraria by Ernest Gellner. As before, this covers up a lot of variation (in particular, he identifies horticulturalists, city states like classical Athens or medieval Venice, and proto-industrial nations like Qing dynasty China, Mughal India, Ottoman Turkey, and Enlightenment Western Europe as the three extremes of Agraria), but Morris argues "the exceptions and sub-categories should not be allowed to obscure the reality of an ideal type representing in abstract terms the core features of peasant farming society". He cites Robert Redfield:

"[I]f a peasant from [any one of widely separated farming societies] could have been transported by some convenient genie to any one of the others and equipped with a knowledge of the language in the village to which he had been moved, he would very quickly come to feel at home. And this would be because the fundamental orientations of life would be unchanged. The compass of his career would continue to point to the same moral north."

So what is the moral north of farming societies? Perhaps surprisingly, it's almost as hard to make definite conclusions about what anyone other than the elite thought in agrarian societies as it is to make conclusions about foragers.

While the elite read and wrote a lot, they didn't care much about what the peasants thought, and peasants were not literate. The most literate ancient societies – for example Athens in the 4th and 5th centuries BCE – had a rudimentary literacy rate of 10%, so one person in ten might be able to glean some meaning from words, but how well they could set down their thoughts on moral values is a different question. To get higher literacy rates, you have to move in time to the early second millennium, and in space to urban China or western Europe. Morris writes that "genuine mass literacy, with half or more of the population able to read simple sentences, belongs to the age of fossil fuels", and because of this, most of "our evidence for peasant experience comes from archaeology and accounts by twentieth-century anthropologists, rural sociologists, and development economists." If history is the written record of the past, then the majority of the population lived their lives outside history until the past century or two. (Perhaps we might even say that history in this sense only began with the internet age, when the private lives of everyone began being set down.)

Before going into the trickier question of values, we can compare foragers and farmers in some simple ways. First, farmers' energy consumption was higher. Foragers, like all humans, need to eat about eight and a half megajoules (2000 kilocalories) of energy as food per person per day to stay alive. Add cooking, and total energy consumption roughly doubles. The energy use of agrarian societies starts out at a forager level of around 20 MJ/person/day (5000 kcal), and goes up to the 100-150 MJ/person/day level (compare to 500 MJ/person/day (120 000 kcal), plus/minus a factor of two or so, for modern rich industrial nations).

Second, farming societies had, very roughly, perhaps half as many violent deaths as foragers, due to the existence of governments that at least occasionally kept the peace.

However, their life wasn't better on most metrics. In contrast to the literature (both then and now) full of "tales of vagabonds, wandering minstrels, and young men striking out to make their fortunes", "most farmers lived in worlds much smaller than most foragers had done, and never went much more than a day or two’s walk from the villages they were born in". Not only this, but:

"Excavated skeletons suggest that ancient farmers tended to suffer more than foragers from repetitive stress injuries; their teeth were often terrible, thanks to restricted diets heavy on sugary carbohydrates; and their stature, which is a fairly good proxy for overall nutrition, tended to fall slightly with the onset of agriculture, not increasing noticeably until the twentieth century AD."

No farming society even managed to escape the repeating cycles of population growth and starvation that foragers were also prone to, despite having more direct control over their food supplies. Populations would increase to keep pace with the good times until all farmers were slaving away to stay at subsistence levels given the crowdedness and quality of the land. Then many would starve to death when the bad times came.

Another trend across the history of farming societies is three things coinciding: energy consumption rises above 40 MJ/person/day (twice the minimum agrarian level and the typical forager level), towns grow past 10 000 people, and a few people take charge and start bossing around the others with their governments.

In farming societies, widespread respect and reverence for hierarchy was internalised by everyone. Morris writes that “[f]arming society often seemed obsessed with the symbolism of rank”, and twentieth century anthropologists "regularly found that having a healthy respect for authority – knowing your place – was a key part of their informants’ sense of themselves as good people". This often came, and still comes, as a surprise to non-farmers:

"[W]hen European reformers began venturing outside their urban enclaves into the countryside in the eighteenth century, they were often astonished that instead of complaining about inequality and demanding the redistribution of property, peasants largely took it as right and proper that most people were poor and weak while a few were rich and strong."

Especially revered was the "Old Deal", Morris' term for the generalised social contract between classes in agrarian societies: that some have the duty to be commanders (or "shepherds of the people", in the preferred phrasing of many a king), others to obey those commands, and if everyone follows this script then things work fine.

Even when the powerful were questioned, the questioning didn't go as far as the Old Deal itself. In fact it rarely reached the king. "The tsar is good but the boyars [aristocrats] are bad", goes a Russian saying; even those who protested the powerful assumed that the highest levels of power must be good and holy, and the problems came from their will being incorrectly carried out by lesser lords. Even when the king himself came under fire, the Old Deal itself, or the inequality it entailed, were not questioned. The most common sort of rebellion against a king took what Morris calls a "good-old-days form": the justification was that the king had broken the Old Deal (or been abandoned by the gods or lost the Mandate of Heaven) and the urgent need was to restore the days when the right dictator was in charge, not abolish the dictatorship in the first place.

There were exceptions – in the 1640s some Chinese peasants called themselves "Levelling Kings" and went around questioning who gave their rulers the right to call them serfs, and of course there's the gradual English case and the rather more abrupt French case – but these only came when the societies in question started reaching energy consumption of 150 MJ/person/day, the very highest end that agrarian societies could achieve without a full-on industrial revolution.

(Morris implies that the energy consumption is the cause. This seems backwards; an explanation running through the institutions and organisation needed to sustain this energy level seems much more reasonable. In general, perhaps when Morris talks about "energy consumption", you should read "the societal factors that enable higher energy consumption" in its place.)

Given how anti-hierarchy foragers were, how did this come to be? Were the peasants all forced into a rigid hierarchy by ruthless elites?

'“You may fool all the people some of the time; you can even fool some of the people all the time; but you can’t fool all the people all the time,” Abraham Lincoln is supposed to have said (unless it was P. T. Barnum). But Korsgaard and Seaford apparently think that Lincoln/Barnum was wrong, and that for ten thousand years everyone in Agraria was led by the nose—women by men, poor by rich, everyone by priests—and robbed blind. This I just cannot credit. Humans are the cleverest animals on the planet (for all we know, the cleverest in the whole universe). We have worked out the answers to almost every problem we have ever encountered. So how, if farming values were really just a trick perpetrated by wicked elites, did they survive for ten millennia? Most of the farmers I have met have been canny folk; so why could farmers in the past not figure out what was going on behind the wizard’s veil?

The answer, in my opinion, is that there was no veil. The veil is a figment of modern academics’ imaginations, made necessary by the assumption that only a tiny elite could possibly have thought that hierarchy was a good thing. In reality, farmers had farming values not because they fell for a trick but because they had common sense.'

It is clearly a mistake to think that farmers participated in farming societies and their values through gritted teeth. However, I don't think it was so much farmers' common sense that made them adopt farming values. Societies that brainwashed their members into sincerely accepting farming-era hierarchies did better, and eventually all farming societies mastered this art.

Specific inequalities: forced labour and patriarchy

In addition to the general extreme hierarchy of farming societies, there are two specific types of inequality that are both interesting in their causes and tragic in their consequences.

The first is slavery, and forced labour more generally. Both are almost entirely absent in foraging bands, which might take captives from other tribes but usually eventually integrate them into the tribe rather than keeping them forever as slaves. In contrast, some form of forced labour is found in almost every agrarian society.

Why? Because financial institutions weren't strong enough. Markets for labour existed almost everywhere, but there was a problem: “anyone who had enough land to support a family preferred to make a living by working it rather than by selling labor”, because, without reliable banks for everyone, keeping a good farm was the only robust way to accumulate and maintain wealth, especially for your children. When it was time for a big construction project (maybe the pharaoh died and you need a pyramid to bury him in), even wealthy employers like the state couldn't always hire enough workers. Often they resorted to violence to lower the costs of labour. Violence, after all, came cheap.

The second specific kind of inequality was male domination and strict gender roles. Morris offers a two-pronged explanation. First, farmer men had more reason than forager men to keep women under control:

“The main reason that male foragers generally care less than male farmers about controlling women [...] is that foragers have much less to inherit than farmers. [...] [Q]uestions about the legitimacy of children matter a lot less than they do when only legitimate offspring will inherit land and capital.”

(We might ask why farming societies were so strict about only legitimate offspring inheriting property, but perhaps this is a case of biological values limiting the space of cultural variation.)

Second, gender roles became more regimented out of necessity. Agricultural work – plowing, manuring, and irrigation – relies on brute upper body strength, which favours males. Farmers worked harder in general than foragers, so more male-specific strength-based work also pushed everything else – home upkeep (which foragers didn't need to do) and food processing – onto women. As early as 7000 BCE, skeletons from Syria suggest that both genders regularly carried heavy loads, but only women had an arthritic condition caused by kneeling and footwork, probably as a result of grinding grain.

Finally, child bearing is obviously restricted to women. With the advent of farming, the doubling time for populations fell by a factor of five, from ten thousand to two thousand years. Infant mortality seems not to have changed, so this is due to increased birth rates alone.

Morris writes that this decision on gender norms seems so obvious that "no farming society that moved beyond horticulture ever seems to have decided anything else". According to him, "if we sit theorizing in our fossil-fuel studies" we might imagine an alternative where women had the upper hand, "sending otherwise-useless men out to labor for them in the fields, but in reality, the organizational needs of farming societies gave men the means to inflict devastating economic pain on faithless wives while also raising the costs for men of failing to deter women from bringing cuckoos back to the nest". The empirical correlation between gender inequality and farming societies seems strong and Morris' arguments are plausible, but whether they're the final word is less clear.

Of course, you can't hold everyone down all the time. Morris lists many historical cases of people who were slaves and/or women, but nevertheless defied expectations and attained great success. For example, Morris tells the story of an Athenian slave banker called Pasion, who did so well that he was eventually not only able to buy his own freedom but also the bank itself.

(Interestingly, Wikipedia tells the story slightly differently, saying he was manumitted as a reward for his work, and inherited the bank after his former owners retired, rather than by buying it outright. Wikipedia cites the 1971 Athenian Propertied Families by J. K. Davies; Morris cites Edward Cohen's Athenian Economy and Society and Jeremy Trevett's Apollodoros the Son of Pasion, both from 1992. I don't know who to believe, or whether a consensus exists.)

Morris' harsh conclusion is that both forced labour and patriarchy were "functionally necessary to farming societies that generated more than 10k kcal/cap/day [42 MJ/cap/day]".

Fossil-fuel users

Many places underwent the agricultural revolution independently of each other, because farming spread slowly enough that distant people could invent it on their own before the waves of someone else's discovery of farming reached them. In contrast, the industrial revolution happened in north-west Europe fast enough, and gave big enough advantages, that no other region had an independent industrial revolution.

The culture and values of the post-industrial West – democracy, human rights, individualism, market-orientedness, and so on – are often labelled Western. In some sense this is a tautology; by definition, these are the values that Western countries have at the moment. The label is also used in a deeper sense, to mean that there is some kernel of Westernness in these values that makes them the logical conclusion of pre-industrial Western thought, and perhaps incompatible with different cultural bases.

One consequence of Morris' arguments is that this perspective is wrong. What we might call Western values are no more Western values than farming-era values are Sumerian values (or Indus Valley values or Mesoamerican values or ...); the reason Western values are called Western values but farming values aren't called Sumerian values is that the industrial revolution spread faster than the agricultural one. To explain Western values we should look not at ancient Greek philosophers and whatnot but at the demands of industrialised societies.

This does not mean that every industrialised society will approach the West in its values, only that the pressures are there (and wily enough dictators or future technological trends may be enough to avoid them). It might also be that the reason that Europe underwent an industrial revolution while other societies at the edges of agrarian achievement did not is that, by accidents of history and geography, pre-industrial north-west European values were closer to modern industrial values than those of the other societies that have stood at the cusp of industrialisation.

But the overall conclusion remains: "Western" values are the universal values that industrialised societies tend towards. The conflict between Boko Haram or the Taliban and the West, to use two of Morris' examples, is not so much a conflict of culture versus culture, but of era versus era; a last stand of the hierarchy- and patriarchy-obsessed farming values that were held by everyone (except a forager here or there) until a few hundred years ago. On a more granular level, the steady retreat of discrimination and formality from Western societies is simply the gradual acceptance that these vestiges of the farming era are no longer useful.

As with the transition to farming society, there's the question of how people eventually reached stances almost opposite to what their ancestors had believed. Unlike with the agricultural revolution, the question is especially pressing because the timescale of the changes is so short. But once again, a lot of it was driven by economics.

The first step was people moving from countryside farming to factory jobs:

"Nineteenth-century sources make it very clear that entering the wage-labor market could be a traumatic experience, requiring workers to submit to strict time discipline and factory conditions unlike anything they had known in the countryside; and yet millions chose to do so, because the alternative—hunger—was worse.

So eager were poor farmers for dirty, dangerous factory jobs that British employers only needed to increase wages by 5 percent (in real terms) between 1780 and 1830, although output per worker grew by 25 percent. Wage increases accelerated only in the 1830s, and even then only for urban workers. The great motor was productivity, which was now rising so high that employers began finding it cheaper to share some of their profits with their workers than to try to break strikes. (In another great irony, by the time that Dickens, Marx, and Engels were writing, wages were rising faster than ever before in history.) For the next fifty years, wages rose as fast as productivity; after 1880, they rose even faster. By then, incomes were beginning to rise in the countryside too.”

One resulting value change was the abolition of forced labour:

“By making wage labour attractive enough to draw in millions of free workers, higher wages made forced labor less necessary, and because impoverished serfs and slaves—unlike the increasingly prosperous wage labourers—could rarely buy the manufactured goods being churned out by factories, forced labour increasingly struck business interests as an obstacle to growth (especially when it was competitors who were using it).”

The farmer-era justifications for gender hierarchy also broke down. First, industrialised societies had less need for brute strength and more need for organisational work, in which there is no gender disparity. Second, birth rates eventually went down, reducing the amount of time women spent on children. As a result, almost universal male dominance during the farming era has given way to a world where 81% of people say gender equality is important, including 98% in Britain but also over 90% of Indonesians and Turks and even 78% of Iranians (India, with a very low 60% and a huge population, is probably the biggest drag on the average).

Morris offers a great summary of the principles of success in agrarian versus industrial societies:

“Agraria had worked by drawing lines, not just between elite and mass or men and women, but also between believers and nonbelievers, pure and defiled, free and slave, and countless other categories. Each group was assigned its place in a complex hierarchy of mutual obligations and privileges, tied together by the Old Deal and guaranteed by the gods and the threat of violence. Fossil-fuel societies, however, work best by erasing lines. The more a group replaces the rigid structure of figure 3.6 with the anti-structure of figure 4.7—a completely empty box, made up of interchangeable citizens—the bigger and more efficient its markets will be and the better it will function in the fossil-fuel world.”

The most successful agrarian societies have a social structure like the one above; the most successful industrial societies look like this instead:

This, in a nutshell, is why agrarian societies tend towards extreme hierarchy while industrial societies tend towards a social structure of interchangeable mobile individuals, free to do what they want and incentivised to slot themselves wherever they create the most value (at least economically).

With industrialisation, we've managed to roll back the discrimination and hierarchy of the farming age. We've even gone back to valuing fairly flat political hierarchies like the foragers (though we maintain them through democratic institutions rather than "coalitions of losers"), and become more egalitarian about gender than the foragers were, all the while living in societies far less violent than the average hunter-gatherer band.

There is one area where we're more tolerant of hierarchy than foragers, though: economic inequality. Once again the reason is practical:

"[...] Industria can flourish only if it has affluent middle and working classes that create effective demand for all the goods and services that fossil-fuel economies generate, but on the other, it also needs a dynamic entrepreneurial class that expects material rewards for providing leadership and management. In response, fossil-fuel values have evolved across the last two hundred years to favor government intervention to reduce wealth equality—but not too much.”

However, even then we still abhor the farmer-era standard of seeing it as fair when the elite extract as much as they can from everyone under them. In fact, merely the fact that calling elites extractive has become a good political weapon shows how far we've come – as discussed in the farming section, farming-era people saw ruthlessly extractive elites as part of a fair social contract.

A summary of value evolution?

We've just gone over a lot of detail about foragers, farmers, and fossil-fuel user values, and some reasons why values might have developed in the way they did. Is this a story of a random path through the stages of technological development, with harsh selection pressures making sure that societal values are dragged along for the ride? Or is there some pattern to the madness?

Morris' summary table does a good job of summing up the "what" of it:

Two things leap out from this table, especially if we plot it graphically: when it comes to attitudes towards hierarchy, fossil-fuel users are much closer to foragers than farmers are to anyone, and violence has gone down all along.

(Slide from a talk I gave at EA Cambridge)

Other people have noticed this; economist and futurist Robin Hanson has written about the modern conservative-liberal axis mapping onto how willing people are to abandon farming ways and revert to more forager-like lifestyles and values as societies grow richer (as some people inexplicably prefer writing in digestible chunks rather than monolithic book-length blog posts, it's hard to give just one or two key links, but see for example here, here, here, and here).

Perhaps we can tell a story like this: in the beginning there were foragers. They tended to live as people tend to do, and value the things that evolution had crafted people to want. Humans being humans, there was a lot of politicking, and with no institutions to restrain it, a fair amount of violence. The outside world was harsh and outside anyone's control.

Then the agricultural revolution slowly crept across the world. At first people lived as before, but generation by generation it turned out that the societies that managed to best persuade people to accept a bit more hierarchy – to show a bit more obedience to the chiefs, grant a bit less non-reproductive status to women – did a bit better than the others. Over millennia, such societies either had their tricks independently discovered or copied by others, or went on the warpath outright to subjugate other societies to their rule – and, of course, preach their values, which (given human adaptability) they held sincerely, and with no idea that they thought differently from their distant ancestors. Eventually, the big tricks – organised religion and the god-kings keeping power by letting their henchmen extract as much as they could from their subjects – became almost universal. They also lowered the level of violence by imposing some amount of internal order and perhaps a culture promoting peaceful conflict resolution, if only to spare more strength to throw at neighbouring societies.

Then came the industrial revolution, and suddenly what mattered was how well a society could harness the talents of its members and establish efficient, competitive markets to drive innovation. This created pressures to democratise and erase lines between people. Technology and wealth also increased people's ability to control their lives. Rich and comfortable industrialised people no longer needed to abide by strict farming-era social rules to survive, and so slowly gave up on them, reverting to more forager-like ways, though with the added advantages of unprecedented peace and material wellbeing.

How selection pressures change values

The reasons why societies tend to adopt pragmatic values are subtle; it's not as if people go around cynically holding the values that will best contribute to their tribe's or society's long-term success. As a result, Morris' descriptions of how selection pressures do their work are worth quoting at length.

First, here's how farmers ended up dominating the world in the first place:

“The first farmers had free will, just like us. As their families grew, their landscapes filled up. […] For all we know, some foragers in the Jordan Valley ten thousand years ago [chose to remain foragers]. The problem, though, was that they were not making a one-time choice. Tens of thousands of other people were asking the same question, and each family had to revisit the decision of whether to intensify or go hungry multiple times every year. Most important of all, each time one family chose to work harder and intensify its management of plants and animals, the payoffs from sticking with the old ways declined a little further for everyone else. Every time cultivators started thinking of the plants and animals on which they lavished care and attention as their personal gardens and flocks, not part of a common stock, hunting and gathering would become that much more difficult for those who stuck to it. Foragers who clung stubbornly and/or heroically to the old ways were doomed because the odds kept tilting against them.”

But how did this result in a world of dictator kings? Morris:

“We should probably assume that people tried lots of different ways to solve the collective action problem of how to create larger, more integrated societies with more complex divisions of labor as they moved from foraging to farming, but almost everywhere, it seems that the solution that worked best was the idea of the godlike king.”

Morris isn't very clear on why godlike kings, out of all possible forms of social organisation, worked best. We can imagine that it's hard to coordinate big armies for defence or offence without one, or that the symbolism of a godlike figurehead is the most reliable way to unite masses in a largely illiterate society, or vaguely gesture like Morris at the challenges of managing complex societies, but there doesn't seem to be much hard evidence or reason for a precise mechanism one way or the other, at least in Foragers, Farmers, and Fossil Fuels.

In general, collective action problems are important in any large organisation, and the simplest solution is complete centralisation; effectively reducing collective action problems back into individual action problems. Of course, this comes with all the cruelties and inefficiencies of real-world non-omnibenevolent, non-omniscient centralised decision-making. Given this, was the centralisation-vs-decentralisation tradeoff really so simple in the farming era that "godlike kings everywhere" was the only effective answer? Perhaps the tradeoffs really were that one-sided in the farming age, and this became a trickier question only in the industrial age when nurturing human talent and prosperity became key societal goals, and we created effective decentralised institutions like free markets and democracy. Or maybe there was a high but not extreme level of optimal centralisation, but the greed of individual rulers often pushed their societies past this level despite selection pressures working in favour of more responsibly led societies, and it was only with the industrial age that these pressures became high enough to force the world away from the godlike king model.

Morris also describes the rise of capitalism:

“Capitalism took off in early-modern Western Europe because practical people figured out that this was the most effective way to get things done in an increasingly energy-rich world. Other people disagreed, and did things differently. Conflicts and compromises ensued as the competitive logic of cultural evolution went to work and drove the less effective ways extinct.”

Once again, I think the concept of selection pressures is a powerful lens, but the details of what drives the relationship are missing. What exactly was it about an energy-rich environment that made capitalism ideal? Even by Morris' own account, it seems the methods (e.g. complex manufacturing chains, mature financial institutions, etc.) required to most effectively extract and use energy given a particular technology level are what matter, not the raw total of joules consumed per person per day.

Respondents

Foragers, Farmers, and Fossil Fuels originated from the Tanner Lectures at Princeton. As part of the format, the book includes four responses to Morris' arguments, by Richard Seaford, Jonathan Spence, Christine Korsgaard, and Margaret Atwood.

On the whole, these responses don't add much to the book, though they are helpful in making Morris elaborate on his arguments in the final chapter (cheekily entitled "My Correct Views on Everything").

Seaford and Spence provide short chapters that seem to be more about their own interests than Morris' arguments, and have the tone of questions asked by professors who slept through the talk but are still trying to say something insightful at the questions session.

Atwood, of The Handmaid's Tale fame, brings an arsenal of literary flair to bear on the task. She manages to make some good points (what about horse-riding pastoralists, who may have been the first large-scale war-makers?), along with some ridiculous statements:

“Several billion years ago, marine algae produced the atmosphere that allows us to breathe, and these algae continue to produce from 60 to 80 percent of our oxygen. Without marine algae, we ourselves cannot survive. During the Vietnam War, huge vats of Agent Orange were being shipped across the Pacific. Should they have sunk and leaked, we would not be having this conversation today.”

Let's do some very rough calculations. If all the Agent Orange deployed in Vietnam had been uniformly distributed across the Pacific, the mass concentration of its component acids (making the highest assumptions about what concentration it was sprayed at) would have been lower than one part in tens of trillions, a hundred thousand times lower than the mass concentrations of either lead or mercury already in the oceans. I couldn't find any study of what happens to algae in oceans if you dump Agent Orange on them, but one article about using algaecide in swimming pools says applying one ten-thousandth of the pool volume is typical. Another article mentions 5-10% as a common concentration, giving an algae-killing active ingredient concentration of maybe 1 in 100 000 in water. Agent Orange would need to kill algae at ten million times lower concentrations in oceans than commercial algaecide does in swimming pools for the Pacific's oxygen production to be destroyed.
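The arithmetic can be checked in a few lines of Python. This is only a rough sketch that reuses the figures quoted above; reading "one part in tens of trillions" as an upper bound of one part in ten trillion is my assumption, as are the variable names.

```python
# Rough order-of-magnitude check using only the figures quoted in the text.

# Swimming-pool algaecide: a dose of about 1/10,000 of the pool volume,
# of which roughly 5-10% is active ingredient.
dose_fraction = 1e-4
active_low, active_high = 0.05, 0.10
pool_low = dose_fraction * active_low    # about 1 part in 200,000
pool_high = dose_fraction * active_high  # about 1 part in 100,000

# Agent Orange spread evenly through the Pacific: "lower than one part in
# tens of trillions". ASSUMPTION: use 1 part in 10 trillion as a generous
# upper bound on the concentration.
agent_orange_max = 1e-13

# How many times more dilute Agent Orange would be than a working algaecide dose.
ratio_low = pool_low / agent_orange_max
ratio_high = pool_high / agent_orange_max

print(f"pool algaecide concentration: {pool_low:.0e} to {pool_high:.0e}")
print(f"dilution gap: {ratio_low:.0e} to {ratio_high:.0e}")
```

Under these assumptions the gap comes out at roughly fifty to a hundred million, so "ten million times" is, if anything, on the cautious side.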

(Or maybe Atwood means the literal sense that, because of various butterfly effects, any such change in history makes any present event, including this conversation, unlikely?)

By far the most substantive response comes from the philosopher Christine Korsgaard. She also has the idea that the farming era was an aberration, with a fresh interpretation:

“Instead of thinking that values are determined by modes of energy capture, perhaps we should think that as human beings began to be in a position to amass power and property in the agricultural age, forms of ideology set in that distorted real moral values [i.e. the values a society should hold], distortions that we are only now, in the age of science and extensive literacy, beginning to overcome.”

More significantly, she makes a distinction between the values a society holds and values that should be held (“positive values” and “real moral values” respectively), in contrast to Morris' arguments that such a distinction is meaningless and the only real distinction is between biological values and the form they take in a given society. Her response manages to pick away at Morris' nonchalant bulldozing of all philosophical subtleties.

Responding to this in the last chapter, Morris quotes, and then dismisses, Ernest Gellner's response to a social theory presentation at an archaeology conference: "They tell me you're a good archaeologist, so why are you trying to be a bad philosopher?". Perhaps he should have taken the question more to heart.

The future

The experiment of how to switch from foraging to farming was run many times. Forager bands in many places adopted farming techniques. Some of them had good ideas about how to structure their now-farming societies and succeeded, while others had bad ideas and perished, or were forced to copy techniques from the more successful.

In contrast, today the entire world has been thrust into the industrial age in the space of a few hundred years. There is only one experiment going on, and only one chance to get it right. There's no one to copy from to see what we should do, and no one to pick up the job if our attempt fails.

A successful transition to the industrial world, and whatever we might mark as the next step after that, is therefore less certain than the successful transition from foragers to farmers. The values that industrial life imposes on us might be better than those of the farming age, but it is not yet clear if they will become as universal as hierarchies and kings once were.

(Better by which standard? I think humans are similar enough that there is a context-independent universal human ethical framework.)

Morris' arguments also lead to the question of how values might change in the future. Will the set of values that a society tends towards continue to improve as technology and wealth increase, or is the cuddliness of industrial values (compared to farming ones) a fluke?

The significance of Foragers, Farmers, and Fossil Fuels for this question is that we won't necessarily be the ones deciding. Over a span of years or decades, we can maintain our values through argument and education. Over a span of centuries, though, we can argue all we like, just as countless luddites and aristocrats railed against industrial/Western values, but if the game has changed and someone else's values make them play it better, it won't be enough. The harsh logic of evolution-like selection pressures can't be resisted forever; those that are best at spreading themselves into the future will eventually claim it.

Yuval Noah Harari, author of Sapiens, says that once we can engineer desires, the question is not "what do we want to become?", but "what do we want to want?". Morris counters that the real question is instead "what are we going to want, whether we want it or not?", and his answer is bleak yet pragmatic: "each age gets the thought it needs" ("needs" referring to "survival needs").

I don't think we need to be either nihilistic (in thinking that every set of societal values is as good as any other; some do a better job of serving universal human wants) or pessimistic (in thinking that we can't do anything about a slide to worse values; we've never had more control over the future of our world).

Morris writes:

“Trying to imagine people who are somehow divorced from the demands of capturing energy and then speculating about what their moral values would be is an odd activity.”

I disagree. Of course we can imagine people living without being constrained by energy needs. How many science fiction writers or futurists haven't imagined a post-scarcity society?

In fact, aren't we well on our way towards such a world? Forager and farmer lives were significantly shaped by the need to get food, water, light, and warmth. Today in developed countries, these aren't free, but our lives aren't shaped by worrying about them. Sure, you need to work a job, but what you worry about in the job is likely very far separated from survival needs, and provided you have one and aren't massively wasteful, the water and light flow exactly as you want them. Technological progress removes difficulty and scarcity. Ultimately, there's no physical limit stopping us from removing scarcity considerations from our lives (or, more precisely, making them trivial enough that we don't need to worry about them; nothing is ever entirely free in this universe).

Once we've done so, we no longer have to make compromises between what we should do and what we as a society are forced to value in order to survive. And so I think it is reasonable to imagine humans whose values aren't warped by survival needs; in fact such values might be good ones to aim for.

(Or maybe the need to focus at least a bit on survival is the one anchor to objective reality that prevents societies from losing themselves entirely to petty politicking and status games.)

Of course, there's always the problem of competition. What happens to our happy post-scarcity society when the people next door ratchet up the competition, say by throwing off all the safeguards around capitalism, or developing AIs or nanomachines or Robin Hanson's emulated minds, and then outcompeting us by adopting values more suitable to exploiting those technologies? Even if we ourselves don't suffer – say we have a big enough wall – in the long run we'd give up the rest of the world (or solar system or galaxy) to the pragmatic-valued competitors. At best, the long-term future looks like an oasis of human flourishing, surrounded by a galaxy-spanning alien economy with weird but morally neutral ways. (Imagine a forager tribe considering the massive and weird industrialised world around them; now imagine we're the foragers.) At worst, any good in our oasis would be outweighed by the morally bad machinations that fuel the endless growth of that weird galaxy-spanning alien economy.

So will we be forced to compromise ever more and more to avoid being outrun by those with fewer scruples about changing their values? Or can we build a world where human values are a winning strategy?

Looking at our track record, I think we have a chance.

Related:
Growth and civilisation

EA ideas 4: utilitarianism

2020-08-10T17:43:00.004+01:00

4.9k words (≈17 minutes)


Many ideas in effective altruism (EA) do not require a particular moral theory. However, while there is no common EA moral theory, much EA moral thinking leans consequentialist (i.e. morality is fundamentally about consequences), and often specifically utilitarian (i.e. wellbeing and/or preference fulfilment are the consequences we care about).

Utilitarian morality can be thought of as rigorous humanism, where by “humanism” I mean the general post-Enlightenment secular value system that emphasises caring about people, rather than upholding, say, religious rules or the honour of nations. Assume that the welfare of a conscious mind matters. Assume that our moral system should be impartial: that wellbeing/preferences should count the same regardless of who has them, and also in the sense of being indifferent to whose perspective it is being wielded from (for example, a moral system that says to only value yourself would give you different advice than it gives me). The simplest conclusion you can draw from these assumptions is to consider welfare to be good and seek to increase it.

I will largely ignore differences between the different types of utilitarianism. Examples of divisions within utilitarianism include preference vs hedonic/classical utilitarianism (do we care about the total satisfied preferences, or the total wellbeing; how different are these?) and act vs rule utilitarianism (is the right act the one with the greatest good as its consequence, or the one that conforms to a rule which produces the greatest good as its consequences – and, once again, are they different?).

Utilitarianism is decisive

We want to do things that are “good”, so we have to define what we mean by it. But once we’ve done this, this concept of good is of no help unless it lets us make decisions on how to act. I will refer to the general property of a moral system being capable of making non-paradoxical decisions as decisiveness.

Decisiveness can fail if a moral system leads to contradiction. Imagine a deontological system with the rules “do not lie” and “do not take actions that result in someone dying”. Now consider the classic thought experiment of what such a deontologist would do if the Gestapo knocked on their door and asked if they’re hiding any Jews. A tangle of absolute rules almost ensures the existence of some case where they cannot all be satisfied, or where following them strictly will cause immense harm.

Decisiveness fails if our system allows circular preferences, since then you cannot make a consistent choice. Imagine you follow a moral system that says volunteering at a soup kitchen is better than helping old people across the street, collecting money for charity is better than soup kitchen volunteering, and helping old people across the street is better than collecting money. You arrive at the soup kitchen and decide to immediately walk out to go collect money. You stop collecting money to help an old person across the street. Halfway through, you abandon them and run off back to the soup kitchen.

Decisiveness fails if there are tradeoffs our system cannot make. Imagine highway engineers deciding whether to bulldoze an important forest ecosystem or a historical monument considered sacred. If your moral system cannot weigh environment against historical artefacts (and economic growth, and the time of commuters, and …), it is not decisive.

So for any two choices, a decisive moral system must be able to compare them, and the comparisons it makes cannot form circular preferences. This implies a ranking: X is better than Y translates to X is before Y in the ranking list.

(If we allow circular preferences, we obviously can’t make a list, since the graph of “better-than” relations would include cycles. If there are tradeoffs we can’t make – X and Y such that X is neither better than, equal to, nor worse than Y – we can generate a ranking list but not a unique one (in set theory terms, we have a partial order rather than a total order).)
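Both failure modes are easy to see in a small Python sketch. The helper names (`rankings`, `has_cycle`) and the third option "empty lot" are hypothetical, made up for illustration; `graphlib` is in the standard library from Python 3.9, and the brute-force enumeration is only sensible for a handful of options.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import permutations


def rankings(options, better_than):
    """Return every ranking (best first) consistent with the (x, y) pairs,
    where each pair means "x is better than y"."""
    valid = []
    for order in permutations(options):
        position = {option: i for i, option in enumerate(order)}
        if all(position[x] < position[y] for x, y in better_than):
            valid.append(order)
    return valid


def has_cycle(better_than):
    """True if the "better than" graph contains a circular preference."""
    sorter = TopologicalSorter()
    for x, y in better_than:
        sorter.add(y, x)  # x must be ranked before y
    try:
        sorter.prepare()
    except CycleError:
        return True
    return False


# The soup kitchen example: every option is beaten by some other option.
deeds = ["soup kitchen", "old people", "charity"]
circular = [
    ("soup kitchen", "old people"),
    ("charity", "soup kitchen"),
    ("old people", "charity"),
]
print(has_cycle(circular), rankings(deeds, circular))

# The highway example: forest and monument both beat a (hypothetical) empty
# lot, but cannot be weighed against each other.
sites = ["forest", "monument", "empty lot"]
incomparable = [("forest", "empty lot"), ("monument", "empty lot")]
print(has_cycle(incomparable), rankings(sites, incomparable))
```

The circular preferences admit no ranking at all, while the incomparable pair admits two equally valid ones, which is exactly the non-uniqueness of a partial order.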

Decisiveness also fails if our system can’t handle numbers. It is better to be happy for two minutes than for one minute, and for one minute than for fifty-nine seconds. More generally, to practically any good we can either add or subtract a bit: one more happy thought, one less bit of pain.

Therefore a decisive moral system must rank all possible choices (or actions or world states or whatever), with no circular preferences, and with arbitrarily many notches between each ranking. It sounds like what we need is numbers: if we can assign a number to choices, then there must exist a non-circular ranking (you can always sort numbers), and there’s no problem with handling the quantitativeness of many moral questions.

There can’t be one axis to measure the value of pleasure, one to measure meaning, and another for art. Or there can – but at the most basic level of moral decision-making, we must be able to project everything onto the same scale, or else we’re doomed to have important moral questions where we can only shrug our shoulders. This leads to the idea of all moral questions being decidable by comparing how the alternatives measure up in terms of “utility”, the abstract unit of the basic value axis.
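To make the point concrete, here is a minimal Python sketch. The options, their scores on each axis, and the weights are all invented for illustration, and a weighted sum is just one assumed way of projecting onto a single scale, not a claim about how the projection ought to be done.

```python
def dominates(a, b):
    """True if a is at least as good as b on every axis and better on one."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def compare_axes(a, b):
    """Compare two options axis by axis; None means we can only shrug."""
    if a == b:
        return "equal"
    if dominates(a, b):
        return "first"
    if dominates(b, a):
        return "second"
    return None


def utility(scores, weights):
    """Project per-axis scores onto one scale (ASSUMPTION: a weighted sum)."""
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical scores on the axes (pleasure, meaning, art).
concert = (3, 1, 4)
volunteering = (1, 5, 0)

# With separate axes, neither option beats the other everywhere.
print(compare_axes(concert, volunteering))

# Hypothetical weights: how much one unit on each axis is worth in "utility".
weights = (1.0, 2.0, 0.5)
print(utility(concert, weights), utility(volunteering, weights))
```

All the hard moral judgement lives in the choice of weights; the sketch only shows that without some common scale the comparison has no answer, and with one it always does.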

You might say that requiring this extreme level of decisiveness may sometimes be necessary in practice, but it’s not what morality is about; perhaps moral philosophy should concern itself with high-minded philosophical debates over the nature of goodness, not ranking the preferability of everything. Alright, have it your way. But since being able to rank tricky “ought”-questions is still important, we’ll make a new word for this discipline: fnergality. You can replace “morality” or “ethics” with “fnergality” in the previous argument and in the rest of this post, and the points will still stand.

What is utility?

So far, we have argued that a helpful moral system is decisive, and that this implies it needs a single utility scale for weighing all options.

I have not specified what utility is. Without this definition, utilitarianism is not decisive at all.

How you define utility will depend on which version of utilitarianism you endorse. The basic theme across all versions of utilitarianism is that utility is assigned without prejudice against arbitrary factors (like location, appearance, or being someone other than the one who is assigning utilities), and is related to ideas of welfare and preference.

A hedonic utilitarian might define the utility of a state of the world as total wellbeing minus total suffering across all sentient minds. A preference utilitarian might ascribe utility to each instance of a sentient mind having a preference fulfilled or denied, depending on the weight of the preference (not being killed is likely a deeper wish than hearing a funny joke), and the sentience of the preferrer (a human’s preference is generally more important than a cat’s). Both would likely want to maximise the total utility that exists over the entire future.

These definitions leave a lot of questions unanswered. For example, take the hedonic utilitarian definition. What is wellbeing? What is suffering? Exactly how many wellbeing units are being experienced per second by a particular jogger blissfully running through the early morning fog?

The fact that we can’t answer “4.7, ±0.5 depending on how runny their nose is” doesn’t mean utilitarianism is useless. First, we might say that an answer exists in principle, even if we can’t figure it out. For example, a hedonic utilitarian might say that there is some way to calculate the net wellbeing experienced by any sentient mind. Maybe it requires knowing every detail of their brain activity, or a complete theory of what consciousness is. But – critically – these are factual questions, not moral ones. There would be moral judgements involved in specifying exactly how to carry out this calculation, or how to interpret the theory of consciousness. There would also be disagreements, in the same way that preference and hedonic utilitarians disagree today (and it is a bad idea to specify one Ultimate Goodness Function and declare morality solved forever). But in theory and given enough knowledge, a hedonic utilitarian theory could be made precise.

Second, even if we can only approximate utilities, doing so is still an important part of difficult real-world decision-making.

For example, Quality- and Disability-Adjusted Life Years (QALYs and DALYs) try to put a number on the value of a year of life with some disease burden. Obviously it is not an easy judgement to make (usually the judgement is made by having a lot of people answer carefully designed questions on a survey), and the results are far more imprecise than the 3-significant-figure numbers in the table on page 17 here would suggest. However, the principle that we should ask people and do studies to try to figure out how much they’re suffering, and then make the decisions that reduce suffering the most across all people, seems like the most fair and just way to make medical decisions.

Using QALYs may seem coldly numerical, but if you care about reducing suffering, not just as a lofty abstract statement but as a practical goal, you will care about every second. It can also be hard to accept QALY-based judgements, especially if they prefer others to people close to you. However, taking an impartial moral view, it is hard not to accept that the greatest good is better than a lesser good that includes you.

(Using opposition to QALYs as an example, Robin Hanson argues with his characteristic bluntness that people favour discretion over mathematical precision in their systems and principles “as a way to promote an informal favoritism from which they expect to benefit”. In addition to the ease of sounding just and wise while repeating vague platitudes, this may be a reason why the decisiveness and precision of utilitarianism become disadvantages on the PR side of things.)

Morality is everywhere

By achieving decisiveness, utilitarianism makes every choice a moral one.

One possible understanding of morality is that it splits actions into three planes. There are rules for what to do (“remember the sabbath day”). There are rules for what not to do (“thou shalt not kill, and if thy doest, thy goeth to hell”). And then there’s the earthly realm, of questions like whether to have sausages for dinner, which – thankfully – morality, god, and your local preacher have nothing to say about.

Utilitarianism says sausages are a moral issue. Not a very important one, true, but the happiness you get from eating them, your preferences one way or the other, and the increased risk of heart attack thirty years from now, can all be weighed under the same principles that determine how much effort we should spend on avoiding nuclear war. This is not an overreach: a moral theory is a way to answer “ought”-questions, and a good one should cover all of them.

This leads to a key strength of utilitarianism: it scales, and this matters, especially when you want to apply ethics to big uncertain things. But first, a slight detour.

Demandingness

A common objection to utilitarianism is that it is too demanding.

First of all, I find this funny. Which principle of meta-ethics is it, exactly, that guarantees your moral obligations won’t take more than the equivalent of a Sunday afternoon each week?

However, I can also see why consequentialist ethics can seem daunting. For someone who is used to thinking of ethics in terms of specific duties that must always be carried out, a theory that paints everything with some amount of moral importance and defines good in terms of maximising something vague and complicated can seem like too much of a burden. (I think this is behind the misinterpretation that utilitarianism says you have a duty to calculate that each action you take is the best one possible, which is neither utilitarian nor an effective way to achieve anything.)

Utilitarianism is a consequentialist moral theory. Demands and duties are not part of it. It settles for simply defining what is good.

(As it should. The definition is logically separate from the implications and the implementation. Good systems, concepts, and theories are generally narrow.)

Scaling ethics to the sea

There are many moral questions that are, in practice, settled. All else being equal, it is good to be kind, have fun, and help the needy.

To make an extended metaphor: we can imagine that there is an island of settled moral questions; ones that no one except psychopaths or philosophy professors would think to question.

This island of settled moral questions provides a useful test for moral systems. A moral system that doesn’t advocate kindness deserves to go in the rubbish. But though there is important intellectual work to be done in figuring out exactly what grounds this island (the geological layers it rests on, if you will), the real problem of morality in our world is how we extrapolate from this island to the surrounding sea.

In the shallows near the island you have all kinds of conventional dilemmas – for example, consider our highway engineers in the previous example weighing nature against art against economy. Go far enough in any direction and you will encounter all sorts of perverse thought experiment monsters dreamt up by philosophers, which try to tear apart your moral intuitions with analytically sharp claws and teeth.

You might think we can keep to the shallows. That is not an option. We increasingly need to make moral decisions about weird things, due to the increasing strangeness of the world: complex institutions, new technologies, and the sheer scale of there being over seven billion people around.

A moral system based on rules for everyday things is like a constant-sized knife: fine for cutting up big fish (should I murder someone?), but clumsy at dealing with very small fish (what to have for dinner?), and often powerless against gargantuan eldritch leviathans from the deep (existential risk? mind uploading? insect welfare?).

Utilitarianism scales both across sizes of questions and across different kinds of situations. This is because it isn’t based on rules, but on a concept (preference/wellbeing) that manages to turn up whenever there are morally important questions. This gives us something to aim for, no matter how big or small. It also makes us value preference/wellbeing wherever it turns up, whether in people we don’t like, the mind of a cow, or in aliens.

Utilitarianism and other kinds of ethics

Utilitarianism, and consequentialist ethics more broadly, lacks one property that is a common social (if not philosophical) use of morality.

Consider confronting a thief robbing a jewellery store. A deontological argument is “stealing is wrong; don’t do it”. A utilitarian argument would need to spell out the harms: “don’t steal, because you will cause suffering to the owner of the shop”. But the thief may well reply: “yes, but the wellbeing I gain from distributing the proceeds to my family is greater, so my act is right”. And now you’d have to point out that the costs to the shop workers who will lose their jobs if the shop goes bankrupt, plus more indirect costs like the effect on people’s trust in others or feelings of safety, outweigh these benefits – if they even do. Meanwhile the thief makes their escape.

By making moral questions depend heavily on facts about the world, utilitarianism does not admit smackdown moral arguments (you can always be wrong about the facts, after all). This is a feature, not a bug. Putting people in their place is sometimes a necessary task (as in the case of law enforcement), but in general it is the province of social status games, not morality.

Of course, nations need laws and people need principles. The insight of utilitarianism is that, important as these things are, their rightness is not axiomatic. There is a notion of good, founded on the reality of minds doing well and fulfilling their wishes, that cuts deeper than any arbitrary rule can. It is an uncomfortable thought that there are cases where you should break any absolute moral rule. But would it be better if there were rules for which we had to sacrifice anything?

Recall the example of the Gestapo asking if you’re hiding Jews in your house. Given an extreme enough case, whether or not a moral rule (e.g. “don’t lie”) should be followed does depend on the effects of an action.

At first glance, while utilitarianism captures the importance of happiness, selflessness, and impartiality, it doesn’t say anything about many other common moral topics. We talk about human rights, but consequentialism admits no rights. We talk about good people and bad people, but utilitarianism judges only consequences, not the people who bring them about. In utilitarian morality, good intentions alone count for nothing.

First, remember that utilitarianism is a set of axioms about what the most fundamental definition of good is. Just like simple mathematical axioms can lead to incredible complexity and depth, if you follow utilitarian reasoning down to daily life, you get a lot of subtlety and complexity, including a lot of common-sense ethics.

For example, knowledge has no intrinsic value in utilitarianism. But having an accurate picture of what the world is like is so important for judging what is good that, in practice, you can basically regard accurate knowledge as a moral end in itself. (I think that unless you never intend to be responsible for others or take actions that significantly affect other people, when deciding whether to consider something true you should care only about its literal truth value, and not at all about whether it will make you feel good to believe it.)

To take another example: integrity, in the sense of being honest and keeping commitments, clearly matters. This is not obvious if you look at the core ideas of utilitarianism, in the same way that the Chinese Remainder Theorem is not obvious if you look at the axioms of arithmetic. That doesn’t somehow make it un-utilitarian; for some examples of arguments, see here.

See also this article for ideas on why strictly following rules can make sense even for strict consequentialists, given only the fact that human brains are fallible in predictable ways.

As a metaphor, consider scientists. They are (in some idealised hypothetical world) committed only to the pursuit of truth: they care about nothing except the extent to which their theories precisely explain the world. But the pursuit of this goal in the real world will be complicated, and involve things – say, wild conjectures, or following hunches – that might even seem to go against the end goal. In the same way, real-world utilitarianism is not a cartoon caricature of endlessly calculating consequences and compromising principles for “the greater good”, but instead a reminder of what really matters in the end: the wishes and wellbeing of minds. Rights, duties, justice, fairness, knowledge, and integrity are not the most basic elements of (utilitarian) morality, but that doesn’t make them unimportant.

Utilitarianism is horrible

Utilitarianism may have countless arguments on its side, but one fact remains: it can be pretty horrible.

Many thought experiments show this. The most famous is the trolley problem, where the utilitarian answer requires diverting a trolley from a track containing 5 people to one containing only a single person (an alternative telling is doctors killing a random patient to get the organs to save five others). Another is the mere addition paradox, also known as the repugnant conclusion: we should consider a few people living very good lives as a worse situation than many people living mediocre lives.

Of course, the real world is never as stark as philosophers’ thought experiments. But a moral system should still give an answer – the right one – to every moral dilemma.

Many alternatives to utilitarianism seem to fail at this step; they are not decisive. It is always easier to wallow in platitudes than to make a difficult choice.

If a moral system gives an answer we find intuitively unappealing, we need to either reject the moral system, or reject our intuitions. The latter is obviously dangerous: get carried away by abstract morals, and you might find yourself denying common-sense morals (the island in the previous metaphor). However, particularly when dealing with things that are big or weird, we should expect our moral intuitions to occasionally fail.

As an example, I think the repugnant conclusion is correct: for any quantity of people living extremely happy lives, there is some larger quantity of people living mediocre lives that would be a better state for the world to be in.

First, rejecting the repugnant conclusion means rejecting total utilitarianism: the principle that you sum up individual utilities to get total utility (for example, you might average utilities instead). Rejecting total utilitarianism implies weird things, like the additional moral worth of someone’s life depending on how many people are already in the world. Why should a happy life in a world with ten billion people be worth less than one in a world with a thousand people?

Alternatives also bring up their own issues. To take a simple example, if you value average happiness instead, eliminating everyone who is less happy than the average is a good idea (in the limit, every world of more than one person should be reduced to a world of one person).
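To make the averaging problem concrete, here is a minimal sketch. The utility numbers are made up purely for illustration, and the function names are hypothetical; the point is only that repeatedly removing everyone below the average keeps raising the average while the total falls.

```python
def average(utilities):
    """Average utility of a (non-empty) population."""
    return sum(utilities) / len(utilities)


def remove_below_average(utilities):
    """Keep only the people whose utility is at least the current average."""
    current_average = average(utilities)
    return [u for u in utilities if u >= current_average]


# Hypothetical starting world: four people with made-up utility levels.
world = [2.0, 5.0, 6.0, 9.0]

# Apply the "improvement" until it no longer changes the population.
history = [world]
while True:
    next_world = remove_below_average(history[-1])
    if len(next_world) == len(history[-1]):
        break
    history.append(next_world)

averages = [average(w) for w in history]
totals = [sum(w) for w in history]

for w, avg, total in zip(history, averages, totals):
    print(f"population={w} average={avg} total={total}")
```

Under the average view every step in this loop counts as an improvement, even though it ends with a single person; under the total view every step is a loss.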

Finally, there is a specific bias that explains why the repugnant conclusion seems so repugnant. Humans tend to show scope neglect. If our brains were built differently, and assigned due weight to the greater quantity of life in the “repugnant” choice, I think we’d find it the intuitive one.

However, population ethics is both notoriously tricky and a fairly new discipline, so there is always the chance there exists a better alternative population axiology than totalism.

Is utilitarianism complete and correct?

I’m not sure what evidence or reasoning would let us say that a moral system is complete and correct.

I do think the basic elements of utilitarianism are fairly solid. First, I showed above how requiring decisiveness leads to most of the utilitarian character of the theory (quantitativeness, the idea of utility). The reasons are similar to the ones for using expected value reasoning: if you don’t, you either can’t make some decisions, or introduce cases where you make stupid ones. Second, ideas of impartiality and universality seem like fundamental moral ideas. I’d be surprised if you could build a consistent, decisive, and humane moral theory without the ideas of quantified utility and impartiality.

Though this skeleton may be solid, the real mess lies with defining utility.

Do we care about preferences or wellbeing? It seems that if we define either in a broad enough way to be reasonable, the ideas start to converge. Is this a sign that we’re on the right track because the two main variants of utilitarianism talk about a similar thing, or that we’re on the wrong track and neither concept means much at all?

Wellbeing as pleasure leaves out most of what people actually value. Sometimes people prefer to feel sadness; we have to include this. How? Notice the word I used – “prefer”. It seems like this broad-enough “wellbeing” concept might just mean “what people prefer”. But try defining the idea of preference. Ideal preferences should be sincere and based on perfect information – after all, if you hear information that changes your preference, it’s your estimate of the consequences that changed, not the morally right action. So when we talk about preference, we need complete information, which means trying to answer the question “given perfect information about what you will experience (or even the entire state of the universe, depending on what preferences count) in option A and in option B, which do you prefer?” Now how is this judgement made? Might there be something – wellbeing, call it – which is what a preferrer always prefers?

Capturing any wellbeing/preference concept is difficult. Some things are very simple: a healthy life is preferable to death, for example, and given the remaining horribleness in the real world (e.g. sixty million people dying each year) a lot of our important moral decisions are about the simple cases. Even the problem of assigning QALY values to disease burdens has proven tractable, if not easy or uncontroversial. But solving the biggest problems is only the start.

An important empirical fact about human values is that they’re complex. Any simple utopia is a dystopia. Maybe the simplest way to construct a dystopia is to imagine a utopia and remove one subtle thing we care about (e.g. variety, choice, or challenge).

On one hand, we have strong theoretical reasons why we need to reduce everything to utilities to make moral decisions. On the other, we have the empirical fact that what counts as utility to people is very complex and subtle.

I think the basic framework of utilitarian ideas gives us a method, in the way that the ruler and compass gave the Greeks a method to begin toying with maths. Thinking quantitatively about how all minds everywhere are doing is probably a good way to start our species’ serious exploration of weird and/or big moral questions. However, modern utilitarianism may be an approximation, like Newton’s theory of gravity (except with a lot more ambiguity in its definitions), and the equivalent of general relativity may be centuries away. It also seems certain that most of the richness of the topic still eludes us.

Indirect arguments: what people think, and the history of ethics

In addition to the theoretical arguments above, we can try to weigh utilitarianism indirectly.

First, we can see what people think (we are talking about morality after all – if everyone hates it, that’s cause for concern). On one hand, out of the friends with whom I’ve talked about these topics (the median example being an undergraduate STEM student), basically everyone favours some form of utilitarianism. On the other hand, a survey of almost a thousand philosophers found only a quarter accepting or leaning towards consequentialist ethics (slightly lower than the number of deontologists, and less than the largest group of a third of respondents who chose “other”). (However, two thirds endorse the utilitarian choice in the trolley problem, compared to only 8% saying not to switch; the rest were undecided.) My assumption is that a poll of everyone would find a significant majority against utilitarianism, but I think this would be largely because of the negative connotations of the word.

Second, we can look at history. A large part of what we consider moral progress can be summarised as a move to more utilitarian morality.

I am not an expert in the history of ethics (though I’d very much like to hear from one), but the general trend from rule- and duty-based historical morality to welfare-oriented modern morality seems clear. Consider perhaps the standard argument in favour of gay marriage: it’s good for some people and it hurts no one, so why not? Arguments do not get much more utilitarian. (Though of course, other arguments can be made with different starting points, for example a natural right to various freedoms.) In contrast, the common counter-argument – that it violates the law of nature or god or at least social convention – is rooted in decidedly non-utilitarian principles. Whereas previously social disapproval was a sufficient reason to deny people happiness, today we assume a heavy, even insurmountable, burden of proof on any custom or rule that increases suffering on net.

A second trend in moral attitudes is often summarised as an “expanding moral circle”: granting moral significance to more and more entities. The view that only particular people of particular races, genders, or nationalities count as moral patients has come to be seen as wrong, and the expansion of moral patienthood to non-humans is already underway.

A concern for anything capable of experiencing welfare is built into utilitarianism. Utilitarianism also ensures that this process will not blow up to absurdities: rather than blindly granting rights to every ant, utilitarianism allows for the fact that the welfare of some entities deserves greater weight, and assures us there’s no need to worry about rocks.

It would be a mistake to say that our moral progress has been driven by explicit utilitarianism. Abolitionists, feminists, and civil rights activists had diverse moral philosophies, and the deontological language of rights and duties has played a big role. But consider carefully why today we value the rights and duties that we do, rather than those of past eras, and I think you’ll find that the most concise way to summarise the difference is that we place more value on welfare and preferences. In short, we are more utilitarian.

Two of the great utilitarian philosophers were Jeremy Bentham and John Stuart Mill, who died in the early and late 1800s respectively (today we have Peter Singer). On the basis of his utilitarian ethics, Bentham advocated for the abolition of slavery and capital punishment, for gender equality, and for decriminalising homosexuality (in an essay so radical at its time that it went unpublished for over a hundred years after Bentham’s death); he is especially known as one of the first defenders of animal rights. Mill also argued against slavery, and is especially known as an early advocate of women’s rights. Both were also important all-around liberals.

Nineteenth century utilitarians were good at holding moral views that were ahead of their time. I would not be surprised if the same were true today.

EA ideas 3: uncertainty

2020-07-26T09:42:00.006+01:00

2.0k words (7 minutes)

Moral uncertainty is uncertainty over the definition of good. For example, you might broadly accept utilitarianism, but still have some credence in deontological principles occasionally being more right.

Moral uncertainty is different from epistemic uncertainty (uncertainty about our knowledge, its sources, and uncertainty over our degree of uncertainty about these things). In practice these often mix – uncertainty over an action can easily involve both moral and epistemic uncertainty – but since is-ought confusions are a common trap in any discussion, it is good to keep these ideas firmly separate.

Dealing with moral uncertainty

Thinking about moral uncertainty quickly gets us into deep philosophical waters.

How do we decide which action to take? One approach is called “My Favourite Theory” (MFT), which is to act entirely in accordance with the moral theory you think is most likely to be correct. There are a number of counterarguments, many of which revolve around problems of how we draw boundaries between theories: if you have 0.1 credence in each of 8 consequentialist theories and 0.2 credence in a deontological theory, should you really be a strict deontologist? (More fundamentally: say we have some credence in a family of moral systems with a continuous range of variants – say, differing by arbitrarily small differences in the weights assigned to various forms of happiness – does MFT require we reject this family of theories in favour of ones that vary only discretely, since in the former case the probability of a particular variant being correct is infinitesimal?) For a defence of MFT, see this paper.
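The boundary-drawing problem can be shown with the credences from the example above. This is only a sketch: the theory names and the grouping rule are hypothetical, and the only thing taken from the text is the 8 × 0.1 versus 0.2 split.

```python
# Credences from the example: eight consequentialist variants at 0.1 each,
# and one deontological theory at 0.2. The names are placeholders.
credences = {f"consequentialist_variant_{i}": 0.1 for i in range(1, 9)}
credences["deontology"] = 0.2


def my_favourite_theory(credences):
    """MFT: pick the single theory with the highest credence."""
    return max(credences, key=credences.get)


def group_by_family(credences):
    """Redraw the boundaries: merge all consequentialist variants into one theory."""
    families = {}
    for theory, credence in credences.items():
        family = "consequentialism" if theory.startswith("consequentialist") else theory
        families[family] = families.get(family, 0.0) + credence
    return families


favourite_fine_grained = my_favourite_theory(credences)
families = group_by_family(credences)
favourite_coarse_grained = my_favourite_theory(families)

print("Fine-grained boundaries:", favourite_fine_grained)
print("Coarse-grained boundaries:", favourite_coarse_grained)
```

The same beliefs give a different "favourite" depending on how the theories are individuated, which is the worry.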

If we reject MFT, when making decisions we have to somehow make comparisons between the recommendations of different moral systems. Some regard this as nonsensical; others write theses on how to do it (some of the same ground is covered in a much shorter space here; this paper also discusses the same concerns with MFT that I mentioned in the last paragraph, and problems with switching to “My Favourite Option” – acting according to the option that is most likely to be correct, summed over all moral theories you have credence in).
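As a rough illustration of how “My Favourite Option” differs from MFT, here is a sketch with three invented theories. The theory names, credences, and recommended options are all assumptions made up for the example.

```python
# Each hypothetical theory has a credence and recommends exactly one option.
theories = {
    "theory_a": {"credence": 0.40, "recommends": "keep the promise"},
    "theory_b": {"credence": 0.35, "recommends": "break the promise"},
    "theory_c": {"credence": 0.25, "recommends": "break the promise"},
}


def my_favourite_theory_option(theories):
    """MFT: follow whatever the single most credible theory recommends."""
    favourite = max(theories, key=lambda name: theories[name]["credence"])
    return theories[favourite]["recommends"]


def my_favourite_option(theories):
    """MFO: sum credence over theories per option, then pick the top option."""
    support = {}
    for theory in theories.values():
        option = theory["recommends"]
        support[option] = support.get(option, 0.0) + theory["credence"]
    return max(support, key=support.get), support


mft_choice = my_favourite_theory_option(theories)
mfo_choice, option_support = my_favourite_option(theories)

print("MFT chooses:", mft_choice)
print("MFO chooses:", mfo_choice, "with support", option_support)
```

Note that both rules only look at which option each theory ranks first; neither uses how much each theory cares about the outcome, which is where the harder comparison problems come in.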

Another less specific idea is the parliamentary model. Imagine that all moral theories you have some credence in send delegates to a parliament, who can then negotiate, bargain, and vote their way to a conclusion. We can imagine delegates for a low-credence theory generally being overruled, but, on the issues most important to that theory, being able to bargain their way to changing the result.

(In a nice touch of subtlety, the authors take care to specify that though the parliament acts according to a typical 50%-to-pass principle, the delegates act as if they believe that the percent of votes for an action is the probability that it will happen, removing the perverse incentives generated by an arbitrary threshold.)

As an example of other sorts of meta-ethical considerations, Robin Hanson compares the process of fitting a moral theory to our moral intuitions to fitting a curve (the theory) to a set of data points (our moral intuitions). He argues that there’s enough uncertainty over these intuitions that we should take heed of a basic principle of curve-fitting: keep it simple, or otherwise you will overfit, and your curve will veer off in one direction or another when you try to extrapolate.
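The curve-fitting point can be sketched in a few lines. Everything here is an illustrative assumption rather than anything from Hanson’s argument: the “true” trend is a straight line, the “intuitions” are eight points with a small fixed error of ±0.1, the simple theory is a least-squares line, and the complicated theory is the polynomial that passes exactly through every point.

```python
def true_trend(x):
    """The underlying pattern that the noisy data points come from (assumed)."""
    return 2 * x + 1


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_exact_polynomial(xs, ys):
    """Lagrange interpolation: the polynomial that hits every data point exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


# Eight data points between 0 and 1, each off from the trend by +0.1 or -0.1.
xs = [i / 7 for i in range(8)]
errors = [0.1 if i % 2 == 0 else -0.1 for i in range(8)]
ys = [true_trend(x) + e for x, e in zip(xs, errors)]

simple = fit_line(xs, ys)
flexible = fit_exact_polynomial(xs, ys)

# Extrapolate to a point well outside the data.
x_new = 2.0
simple_error = abs(simple(x_new) - true_trend(x_new))
flexible_error = abs(flexible(x_new) - true_trend(x_new))

print("Simple curve extrapolation error:", simple_error)
print("Flexible curve extrapolation error:", flexible_error)
```

The flexible curve matches every data point perfectly, and that is exactly why it goes so badly wrong away from them: it has fitted the errors as well as the trend.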

Mixed moral and epistemic uncertainty

Cause X

We are probably committing a moral atrocity without being aware of it.

This is argued here. The first argument is that past societies have been unaware of serious moral problems and we don’t have strong enough reasons to believe ourselves exempt from this rule. The second is that there are many sources of potential moral catastrophe – there are very many ways of being wrong about ethics or being wrong about key facts – so though we can’t point to any specific likely failure mode with huge consequences, the probability that at least one exists isn’t low.
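The second argument is at heart a piece of arithmetic. As a sketch, assume (purely for illustration, and unrealistically) that the possible failure modes are independent and each has the same small probability of being real; the numbers below are invented.

```python
def probability_at_least_one(p_each, n_sources):
    """Chance that at least one of n independent possibilities is real."""
    return 1 - (1 - p_each) ** n_sources


# Hypothetical numbers: 50 candidate failure modes, each only 2% likely.
p_each = 0.02
n_sources = 50
p_any = probability_at_least_one(p_each, n_sources)

print(f"Chance that at least one is real: {p_any:.2f}")
```

Even when no single candidate looks likely, the chance that none of them is real shrinks quickly as the list of candidates grows.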

In addition to an ongoing moral catastrophe, it could be that we are overlooking an opportunity to achieve a lot of good for cheap. In either case there would be a cause, dubbed Cause X, which would be a completely unknown but extremely important way of improving the world.

(In either case, the cause would likely involve both moral and epistemic failure: we’ve both failed to think carefully enough about ethics to see what it implies, and failed to spot important facts about the world.)

“Overlooked moral problem” immediately invites everyone to imagine their pet cause. That is not what Cause X is about. Imagine a world where every cause you support triumphed. What would still be wrong about this world? Some starting points for answering this are presented here.

If you say “nothing”, consider MacAskill’s anecdote in the previous link: Aristotle was smart and spent his life thinking about ethics, but still thought slavery made sense.

Types of epistemic uncertainty

I use the term "epistemic uncertainty" because the concept is broader than just uncertainty over facts. For example, our brains are flawed in predictable ways, and dealing with this is different from dealing with being wrong or having incomplete information about a specific fact.

Flawed brains

A basic cause for uncertainty is that human brains make mistakes. Especially important are biases, which consistently make our thinking wrong in the same way. This is a big and important topic; the classic book is Kahneman’s Thinking, Fast and Slow, but if you prefer sprawling and arcane chains of blog posts, you’ll find plenty here. I will only briefly mention some examples.

The most important bias to avoid when thinking about EA may be scope neglect. In short, people don’t automatically multiply. It is the image of a starving child that counts in your brain, and your brain gives this image the same weight whether the number you see on the page has three zeros or six after it. Trying to reason about any big problem without being very mindful of scope neglect is like trying to captain a ship that has no bottom: you will sink before you move anywhere.

Many biases are difficult to counter, but occasionally someone thinks of a clever trick. Status quo bias is a preference for keeping things as they are. It can often be spotted through the reversal test. For example, say you argue that we shouldn’t lengthen human lifespans further. Ask yourself: should we then decrease life expectancy? If you think that we should have neither more nor less of something, you should also have a good reason for why it just so happens that we have an optimum amount already. What are the chances that the best possible lifespan for humans also happens to be the highest one that present technology can achieve?

Crucial considerations

A crucial consideration is something that flips (or otherwise radically changes) the value of achieving a general goal.

For example, imagine your goal is to end raising cows for meat, because you want to prevent suffering. Now say there’s a fancy new brain-scanner that lets you determine that even though the cow ends up getting chucked into a meat grinder, on average the cow’s happiness is above the threshold below which non-existence would be preferable to existence (assume this is a well-defined concept in your moral system). Your morals are the same as before, but now they’re telling you to raise more cows for meat.

An example of a chain of crucial considerations is whether or not we should develop some breakthrough but potentially dangerous technology, like AI or synthetic biology. We might think that the economic and personal benefits make it worth the expense, but a potential crucial consideration is the danger of accidents or misuse. There might be another crucial consideration that it’s better to have the technology developed internationally and in the open, rather than have advances made by rogue states.

There are probably many crucial considerations that are either unknown or unacknowledged, especially in areas that we haven’t thought about for very long.

Cluelessness

The idea of cluelessness is that we are extremely uncertain about the impact of every action. For example, making a car stop as you cross the street might affect a conception later that day, and might make the difference between the birth of a future Gandhi or Hitler later on. (Note that many non-consequentialist moral systems seem even more prone to cluelessness worries – William MacAskill points this out in this paper, and argues for it more informally here.)

I’m not sure I fully understand the concerns. I’m especially confused about what the practical consequences of cluelessness should be on our decision-making. Even if we’re mostly clueless about the consequences of our actions, we should base them on the small amount of information we do have. However, at the very least it’s worth keeping in mind just how big uncertainty over consequences can be, and there are a bunch of philosophy paper topics here.

For more on cluelessness, see for example:

Simplifying Cluelessness (an argument that cluelessness is an important and real consideration)
an in-depth look at different forms of cluelessness
the author of the previous paper discussing related ideas in a podcast (a transcript is available).

Reality is underpowered

Imagine we resolve all of our uncertainties over moral philosophy, iron out the philosophical questions posed by cluelessness, confidently identify Cause X, avoid biases, find all crucial considerations, and all that remains is the relatively down-to-earth work of figuring out which interventions are most effective. You might think this is simple: run a bunch of randomised controlled trials (RCTs) on different interventions, publish the papers, and maybe wait for a meta-analysis to combine the results of all relevant papers before concluding that the matter is solved.

Unfortunately, it’s often the case that reality is underpowered (in the statistical sense): we can’t run the experiments or collect the data that we’d need to answer our questions.

To take an extreme example, there are many different factors that affect a country’s development. To really settle the issue, we might make groups of, say, a dozen countries each, give them different amounts of the development factors (holding everything else fairly constant), watch them develop over 100 years, run a statistical analysis of the outcomes, and then draw conclusions about how much the factors matter. But try finding hundreds of identical countries with persuadable national leaders (and at least one country must have a science ethics board that lets this study go forwards).

To make a metaphor with a different sort of power: the answers to our questions (on what effects are the most important in driving some phenomenon, or which intervention is the most effective) exist, sharp and clear, but the telescopes with which we try to see them aren’t good enough. The best we can do is interpret the smudges we do see, inferring as much as we can without the brute force of an RCT.

This is an obvious point, but an important one to keep in mind to temper the rush to say we can answer everything if only we run the right study.

Conclusions?

All this uncertainty might seem to imply two conclusions. I support one of them but not the other.

The first conclusion is that the goal of doing good is complicated and difficult (as is the subgoal of having accurate beliefs about the world). This is true, and important to remember. It is tempting to forget analysis and fall back on feelings of righteousness, or to switch to easier questions like “what feels right?” or “what does society say is right?”

The second conclusion is that this uncertainty means we should try less. This is wrong. Uncertainties may rightly redirect efforts towards more research, and reducing key uncertainties is probably one of the best things we can do, but there’s no reason why they should make us reduce our efforts.

Uncertainty and confusion are properties of minds, not reality; they exist on the map, not the territory. To every well-formed question there is an answer. We need only find it.

EA ideas 2: expected value and risk neutrality

2020-07-25T10:27:00.011+01:00

2.6k words (9 minutes)

The expected value (EV) of an event / choice / random variable is the sum, over all possible outcomes, of {value of outcome} times {probability of that outcome} (if all outcomes are equally likely, it is the average; if they’re not, it’s the probability-weighted average).

In general, a rational agent makes decisions that maximise the expected value of the things they care about. However, EV reasoning involves more subtleties than its mathematical simplicity suggests, in both the real world and in thought experiments.

Is a 50% chance of 1000€ exactly as good as a certain gain of 500€ (that is, are we risk-neutral)? And what about a 50% chance of gaining 2000€ paired with a 50% chance of losing 1000€?

Not necessarily. A bunch of research (and common sense) says people put decreasing value on an additional unit of money: the thousandth euro is worth more than the ten-thousandth. For example, average happiness scales roughly logarithmically with per-capita GDP. The thing to maximise in a monetary tradeoff is not the money, but the value you place on money; with a logarithmic relationship, the diminishing returns mean that more certain bets are better than naive EV-of-money reasoning implies. A related reason is that people weight losses more than gains, which makes the third case look worse than the first even if you don’t assume a logarithmic money->value function.
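To make the diminishing-returns point concrete, here is a minimal sketch comparing the three gambles above. It assumes a purely logarithmic value-of-money function and a hypothetical starting wealth of 10 000€; both are illustrative assumptions, not claims about anyone's real preferences, and the function names are made up for this example.

```python
import math

def expected_value(outcomes):
    """Probability-weighted average over (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

def expected_log_value(outcomes, wealth):
    """Expected log of final wealth (assumed logarithmic value function)."""
    return sum(p * math.log(wealth + v) for p, v in outcomes)

WEALTH = 10_000  # hypothetical starting wealth in euros

# Each gamble is a list of (probability, change in euros).
gambles = {
    "50% chance of 1000": [(0.5, 1000), (0.5, 0)],
    "certain 500": [(1.0, 500)],
    "50% of +2000, 50% of -1000": [(0.5, 2000), (0.5, -1000)],
}

for name, outcomes in gambles.items():
    ev_money = expected_value(outcomes)
    ev_log = expected_log_value(outcomes, WEALTH)
    print(f"{name}: EV in euros = {ev_money}, expected log-wealth = {ev_log:.5f}")
```

All three gambles have the same EV in euros (500), but under these assumptions the certain gain has the highest expected log-wealth and the gamble that includes a possible loss has the lowest.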

However, a (selfish) rational agent will still maximise EV in such decisions – not of money, but of what they get from it.

(If you’re not selfish and live in a world where money can be transferred easily, the marginal benefit curve of efficiently targeted donations is essentially flat for a very long time – a single person will hit quickly diminishing returns after getting some amount of money, but there are enough poor people in the world that enormous resources are needed before you need to worry about everyone reaching the point of very low marginal benefit from more money. To fix the old saying, albeit with some hit to its catchiness: “money can buy happiness only (roughly) logarithmically for yourself, but (almost) linearly in the world at large, given efficient targeting”.)

In some cases, we don’t need to worry about wonky thing->value functions. Imagine the three scenarios above, but instead of euros we have lives. Each life has the same value; there’s no reasonable argument for the thousandth life being worth less than the first. Simple EV reasoning is the right tool.

Why expected value?

This conclusion easily invites a certain hesitation. Any decision involving hundreds of lives is a momentous one; how can we be sure of exactly the right way to value these decisions, even in simplified thought experiments? What’s so great about EV?

A strong argument is that maximising EV is the strategy that leads to the greatest good over many decisions. In a single decision, a risky but EV-maximising choice can backfire – you might take a 50-50 bet of saving 1000 lives and lose, in which case you’ll have done much worse than picking an option of certainly saving 400. However, it’s a mathematical fact that given enough such choices, the actual average value will tend towards the EV. So maximising EV is what results in the most value in the long run.
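The "mathematical fact" here is the law of large numbers. A small simulation can illustrate it, assuming (as a simplification) that every decision is the same independent 50-50 bet between saving 1000 lives and saving none; the function name and the fixed random seed are choices made for this sketch only.

```python
import random

def average_lives_saved(n_decisions, seed=0):
    """Average outcome per decision when always taking the 50-50 bet on 1000 lives."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    for _ in range(n_decisions):
        if rng.random() < 0.5:  # the bet pays off
            total += 1000
    return total / n_decisions

# The EV of the bet is 500 lives; the safe option saves 400 every time.
for n in (1, 10, 1_000, 100_000):
    print(n, average_lives_saved(n))
```

With one decision the average is either 0 or 1000, but as the number of decisions grows it settles close to the EV of 500, which is above the 400 per decision from always playing safe.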

You might argue that we’re not often met with dozens of similar momentous decisions. Say that we’re reasonably confident the same choice will never pop up again, and certainly not many times; does the above argument then stop applying? Take a slightly broader view though, and consider which strategy gets you the most value across all decisions you make (of which there will realistically be many, even if no single decision occurs twice): the answer is still EV maximisation. We could go on to construct crazier thought experiments – toy universes in which only one decision ever occurs, for example – and then the argument really begins to break down (though you might try to save it by some wild scheme of imagining many hypothetical agents faced with the same choice and consider a Kantian / rule-utilitarian principle of deciding by answering the question of which strategy would be right if it were the one adopted across all countless hypothetical instances of this decision).

There are other arguments too. Imagine 1000 people are about to die of a disease, and you have to decide between a cure that will certainly cure 400 versus an experimental one that will, with equal probability, either cure everyone or save no-one. Imagine you are one of these people. In the first scenario, you have a 40% chance of living; in the second, a 50% chance. Which would you prefer?

On a more mathematical level, von Neumann (an all-around polymath) and Morgenstern (co-founder of game theory with von Neumann) have proved that under fairly basic assumptions of what is rational behaviour, a rational agent acts as if they’re maximising the EV of some preference function.

Problems with EV

Diabolical philosophers have managed to dream up many challenges for EV reasoning. For example, imagine there are two dollars on the table. You toss a coin; if it’s heads you take the money on the table, if it’s tails the money on the table doubles and you toss again. You have a 1/2 chance of winning 2 dollars, 1/4 chance of winning 4, 1/8 chance of winning 8, and so on, for a total EV of 1/2 x 2 + 1/4 x 4 + … = 1 + 1 + … . The series diverges to infinity.

Imagine a choice: one game of the “St. Petersburg lottery” described above, or a million dollars. You’d be crazy not to pick the latter.

Is this a challenge to the principle of maximising EV? Not in our universe. We know that whatever casino we’re playing at can’t have an infinite amount of money, so we’re wise to intuitively reject the St. Petersburg lottery. (This section on Wikipedia has a very nice demonstration of why, even if the casino is backed by Bill Gates’s net worth, the EV of the St. Petersburg game is less than $40.)
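The effect of a finite bankroll is easy to compute directly. The sketch below assumes the casino simply pays out everything it has once the pot would exceed its bankroll; the 100 billion dollar figure is a hypothetical stand-in for a very rich backer, not a sourced number, and the function name is invented for this example.

```python
def capped_st_petersburg_ev(bankroll):
    """EV of the St. Petersburg game when payouts are capped at `bankroll`.

    The uncapped game pays 2**k dollars with probability 2**-k (first heads
    on toss k), so every outcome the casino can fully pay adds exactly 1.
    """
    ev = 0.0
    k = 1
    while 2 ** k <= bankroll:
        ev += 1.0  # (1 / 2**k) * 2**k
        k += 1
    # Every remaining outcome (first heads on toss k or later) pays the whole
    # bankroll; together these outcomes have probability 2**-(k - 1).
    ev += bankroll * 2.0 ** -(k - 1)
    return ev

print(capped_st_petersburg_ev(1024))   # small casino
print(capped_st_petersburg_ev(100e9))  # hypothetical 100 billion dollar backer
```

Each doubling of the bankroll adds only one dollar to the EV, which is why even an enormous bankroll leaves the game worth a few tens of dollars.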

The St. Petersburg lottery isn’t the weirdest EV paradox by half, though. In the Pasadena game, the EV is undefined (see the link for a definition, analysis, and an argument that such scenarios are points against EV-only decision-making). Nick Bostrom writes about the problems of consequentialist ethics in an infinite universe (or a universe that has a non-zero probability of being infinite) here.

There’s also the classic: Pascal’s wager, the idea that even if the probability of god existing is extremely low, the benefits (an eternity in heaven) are great enough that you should seek to believe in god and live a life of Christian virtue.

Unlike even Bostrom’s infinite ethics, Pascal’s wager is straightforwardly silly. We have no reason to privilege the hypothesis of a Christian god over the hypothesis – equally probable given the evidence we have – that there’s a god who punishes us exactly for what the Christian god rewards us for, or that god is a chicken and condemns all chicken-eaters to an eternity of hell. So even if you accept the mathematically dubious multiplication of infinities, Pascal’s wager doesn’t let you make an informed decision one way or another.

However, the general format of Pascal’s wager – big values multiplied by small probabilities – is the cause of much EV-related craziness, and dealing with such situations is a good example of how naive EV reasoning can go wrong. The more general case is often referred to as Pascal’s mugging, and exemplified by the scenario (see link) where a mugger threatens to torture an astronomical number of people unless you give them a small amount of money.

Tempering EV extremeness with Bayesian updating

Something similar to Pascal’s mugging easily happens if you calculate EVs by multiplying together very rough guesses involving small probabilities and huge outcomes.

The best and most general approach to these sorts of issues is laid out here.

The key insight is to remember two things. First, every estimate is a probability distribution: if you measure a nail or estimate the effectiveness of a charity, the result isn’t just your best-guess value, but also the uncertainty surrounding it. Second, Bayesian updating is how you change your estimates when given new evidence (and hence you should pay attention to your prior: the estimate you have before getting the new information).

Using some maths detailed here, it can be shown that if your prior and measurement both follow normal distributions, then your new (Bayesian) estimate will be another normal distribution, with a mean (=expected value) that is an average of the prior and measurement means, weighted by the inverse variance of the two distributions. (Note that the link does it with log-normal distributions, but the result is the same; just switch between variables and their logarithms.)
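The inverse-variance weighting can be written out in a few lines. This is a minimal sketch of the standard normal-prior, normal-measurement update; the function name and the example numbers are illustrative assumptions.

```python
def normal_update(prior_mean, prior_var, meas_mean, meas_var):
    """Posterior mean and variance for a normal prior and a normal measurement."""
    prior_precision = 1.0 / prior_var  # precision = inverse variance
    meas_precision = 1.0 / meas_var
    post_var = 1.0 / (prior_precision + meas_precision)
    # Posterior mean: average of the two means, weighted by precision.
    post_mean = post_var * (prior_precision * prior_mean + meas_precision * meas_mean)
    return post_mean, post_var

# Prior centred on zero with variance 1; the measurement always reads 10.
for meas_var in (0.01, 1.0, 100.0):
    mean, var = normal_update(0.0, 1.0, 10.0, meas_var)
    print(f"measurement variance {meas_var}: posterior mean {mean:.3f}, variance {var:.3f}")
```

A precise measurement pulls the estimate almost all the way to 10, an equally uncertain one moves it halfway, and a very noisy one barely moves it from the prior. In every case the posterior variance is smaller than both input variances.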

Here’s an interactive graph that lets you visualise this.

The results are pretty intuitive. Let’s say our prior for the effectiveness of some intervention has a mean of zero. If we take a measurement with low variance, our updated probability distribution will shift most of the way towards our new measurement, and its variance will decrease (it will become narrower):

Red is the probability distribution of our prior estimate. Green is our measurement. Black is our new belief, after a Bayesian update of our prior with the measurement. Dotted lines show the EV (=average, since the distributions are symmetrical) for each probability distribution. You can imagine the x-axis as either a linear or log scale.

If the same measurement has greater variance, our estimates shift less:

And if we have a very imprecise measurement – for example, we’ve multiplied a bunch of rough guesses together – the estimate barely shifts even if the measured value is high:

Of course, we can argue about what our priors should be – perhaps, for many of the hypothetical scenarios with potentially massive benefits (for instance concerning potential space colonisation in the future), the variance of our prior should be very large, in which case even highly uncertain guesses will shift our best-guess EV a lot. But the overall point still stands: if you go to your calculator, punch in some numbers, and conclude you’ve discovered something massively more important than anything else, it’s time to think very carefully about how much you can really conclude.

Overall, I think this is a good example of how a bit of maths can knock off quite a few teeth from a philosophical problem.

(Here’s a link to a wider look at pitfalls of overly simple EV reasoning with a different framing, by the same author as this earlier link. And here is another exploration of the special considerations involved with low-probability, high-stakes risks.)

Risk neutrality

An implication of EV maximisation as a decision framework is risk neutrality: when you’ve measured things in units of what you actually care about (e.g. converting money to the value it has for you as discussed above), you should be neutral about the choice between a 10% chance of 10 value units and a 100% chance of 1, and you really should prefer a 10% chance of 11 “value units” over a 100% chance of 1 “value unit”, or a 50-50 bet between losing 10 and gaining 20 (an EV of 5) over a certain gain of 4.

This is not an intuitive conclusion, but I think we can be fairly confident in its correctness. Not only do we have robust theoretical reasons for using EV, but we can point to specific bugs in our brains that make us balk at risk-neutrality: biases like scope neglect, which makes humans underestimate the difference between big and small effects, or loss aversion, which makes losses more salient than gains, or a preference for certainty.

Stochastic dominance (an aside)

Risk neutrality is not necessarily specific to EV maximisation. There’s a far more lenient, though also far more incomplete, principle of rational decision making that goes under the clumsy name of “stochastic dominance”: given options $$A$$ and $$B$$, if the probability of a payoff of $$X$$ or greater is higher under option $$A$$ than under option $$B$$ for all values of $$X$$, then $$A$$ “stochastically dominates” $$B$$ and should be preferred. It’s very hard to argue against stochastic dominance.

Consider a risky and a safe bet; to be precise, call them option $$A$$, with a small probability $$p$$ of a large payoff $$L$$, and option $$B$$, with a certain small payoff $$S$$. Assume that $$pL > S$$, so EV maximising says to take option $$A$$. However, we don’t have stochastic dominance: the probability of getting at least a small amount of value $$v$$ ($$v < S$$) is greater with $$B$$ than $$A$$, whereas the probability of getting at least a large amount of value $$v$$ ($$S < v < L$$) is greater with option $$A$$.

The insight of this paper (summarised here) is that if we care about the total amount of value in the universe, are sufficiently uncertain about this total amount, and make some assumptions about its distribution, then stochastic dominance alone implies a high level of risk neutrality.

The argument goes as follows: we have some estimate of the probability distribution $$U$$ of value that might exist in the universe. We care about the entire universe, not just the local effects of our decision, so what we consider is $$A + U$$ and $$B + U$$ rather than $$A$$ and $$B$$. Now consider an amount of value $$v$$. The probability that $$A + U$$ exceeds $$v$$ is the probability that $$U > v$$, plus the probability that $$(v - L) < U < v$$ and $$A$$ pays off $$L$$ (we called this probability $$p$$ earlier). The probability that $$B + U$$ exceeds $$v$$ is the probability that $$U > v - S$$.

Is the first probability greater? This depends on the shape of the distribution of $$U$$ (to be precise, we’re asking whether $$P(U > v) + p P(v - L < U < v) > P(U > v - S)$$, which clearly depends on $$U$$). If you do a bunch of maths (which is present in the paper linked above; I haven’t looked through it), it turns out that this is true for all $$v$$ – and hence we have stochastic dominance of $$A$$ over $$B$$ – if the distribution of $$U$$ is wide enough and has a fat tail (i.e. trails off slowly as $$v$$ increases).

What’s especially neat is that this automatically excludes Pascal’s mugging. The smaller the probability $$p$$ of our payoff is, the more stringent the criteria get: we need a wider and wider distribution of $$U$$ before $$A$$ stochastically dominates $$B$$, and at some point even the most stringent Pascalian must admit $$U$$ can’t plausibly have that wide of a distribution.
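The inequality above can be checked numerically for a particular choice of $$U$$. The sketch below assumes $$U$$ follows a zero-centred Cauchy distribution (a convenient fat-tailed example chosen here, not necessarily what the paper uses) and that $$A$$'s payoff is independent of $$U$$. It only tests a finite grid of $$v$$ values, so it is an illustration rather than a proof; all names and numbers are hypothetical.

```python
import math

def cauchy_sf(x, scale):
    """P(U > x) for a zero-centred Cauchy distribution (assumed shape for U)."""
    return 0.5 - math.atan(x / scale) / math.pi

def dominates_on_grid(p, L, S, scale, grid):
    """Check P(A + U > v) > P(B + U > v) at every v in the grid."""
    for v in grid:
        # Same as P(U > v) + p * P(v - L < U < v), rearranged.
        prob_a = (1 - p) * cauchy_sf(v, scale) + p * cauchy_sf(v - L, scale)
        prob_b = cauchy_sf(v - S, scale)
        if prob_a <= prob_b:
            return False
    return True

GRID = range(-1_000_000, 1_000_001, 250)

# Narrow background uncertainty: no dominance.
print(dominates_on_grid(p=0.1, L=100, S=5, scale=1, grid=GRID))
# Wide background uncertainty: A dominates B on this grid.
print(dominates_on_grid(p=0.1, L=100, S=5, scale=10_000, grid=GRID))
# Pascalian bet (tiny p, huge L, same pL): the same width no longer suffices.
print(dominates_on_grid(p=1e-6, L=10_000_000, S=5, scale=10_000, grid=GRID))
```

Under these assumed numbers the three checks print False, True and False: widening $$U$$ turns the EV-maximising bet into the stochastically dominant one, but a bet with the same $$pL$$ and a much smaller $$p$$ would need a far wider $$U$$.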

It’s far from clear what $$U$$’s shape is, and hence how strong this reasoning is (see the links above for that). However, it is a good example of how easily benign background assumptions introduce risk neutrality into the problem of rational choice.

Implications of risk neutrality: hits-based giving

What does risk neutrality imply about real-world altruism? In short, that we should be willing to take risks.

A good overview of these considerations is given in this article. The key point:

[W]e suspect that, in fact, much of the best philanthropy is likely to fail.

For example, GiveWell thinks that Deworm the World Initiative probably has low impact, but still recommends them as one of their top charities because there’s a chance of massive impacts.

Hits-based giving comes with its own share of problems. As the article linked above notes, it can provide a cover for arrogance and make it harder to be open about decision-making. However, just as high-risk high-reward projects make up a disproportionate share of successes in scientific research and entrepreneurship, we shouldn’t be surprised if the bulk of returns on charity comes from a small number of risky bets.