Cold Takes

Good job opportunities for helping with the most important century

Holden Karnofsky — Thu, 18 Jan 2024 17:25:40 GMT

Yes, this is my first post in almost a year. I’m no longer prioritizing this blog, but I will still occasionally post something.

I wrote ~2 years ago that it was hard to point to concrete opportunities to help the most important century go well. That’s changing.

There are a good number of jobs available now that are both really promising opportunities to help (in my opinion) and are suitable for people without a lot of pre-existing knowledge of AI risk (or even AI). The jobs are demanding, but unlike many of the job openings that existed a couple of years ago, they are at well-developed organizations and involve relatively clear goals.

So if you’re someone who wants to help, but has been waiting for the right moment, this might be it. (Or not! I’ll probably keep making posts like this as the set of opportunities gets wider.)

Here are the jobs that best fit this description right now, as far as I can tell. The rest of this post will give a bit more detail on how these jobs can help, what skills they require and why these are the ones I listed.

Organization	Location	Jobs	Link
UK AI Safety Institute	London (remote work possible within the UK)	Engineering and frontend roles, cybersecurity roles	Here
AAAS, Horizon Institute for Public Service, Tech Congress	Washington, DC	Fellowships serving as entry points into US policy roles	Here
AI companies: Google DeepMind, OpenAI, Anthropic¹	San Francisco and London (with some other offices and remote work options)	Preparedness/Responsible Scaling roles; alignment research roles	Here, here, here, here
Model Evaluation and Threat Research (METR) (fewer roles available)	Berkeley (with remote work options)	Engineering and data roles	Here

Software engineering and development (and related areas) seem especially valuable right now, so think about whether you know folks with those skills who might be interested!

Added 6/25/2024: note that 80,000 Hours is working on a census of people interested in AI catastrophic risks; people who fill it out will be contacted if good-fit jobs come up. You can fill it out here.

How these help

A lot of these jobs (and the ones I know the most about) would be contributing toward a possible global standards regime for AI: AI systems should be subject to testing to see whether they present major risks, and training/deploying AI should be stopped (e.g., by regulation) when it can’t be done safely.

The basic hope is:

Teams will develop “evals”: tests of what AIs are capable of, particularly with respect to possible risks. For example, one eval might be prompting an AI to give a detailed description of how to build a bioweapon; the more detailed and accurate its response, the more risk the AI poses (while also possibly having more potential benefits as well, by virtue of being generally more knowledgeable/capable).
It will become common (through regulation, voluntary action by companies, industry standards, etc.) for cutting-edge AI systems to be subject to evals for dangerous capabilities.
When evals reveal risk, they will trigger required mitigations. For example:
1. An AI capable of bioweapons development should be (a) deployed in such a way that people can’t use it for that (including by “jailbreaking” it), and (b) kept under good security to stop would-be terrorists from circumventing the restrictions.
2. AIs with stronger and more dangerous capabilities might require very challenging mitigations, possibly beyond what anyone knows how to do today (for example, rigorous demonstrations that an AI won’t have dangerous unintended aims, even if this sort of thing is hard to measure).
Ideally, we’d eventually build a robust international governance regime (comparisons have been made to nuclear non-proliferation regimes) that reliably enforces rules like these, while safe and beneficial AI goes forward. But my view is that even dramatically weaker setups can still help a lot (some more on this theme here.)

The jobs above include designing and running evals (#1-#2); designing and implementing company policies² that can serve as early versions of #3 (I don’t think company self-enforcement is good enough in the long run, but I do think it’s a good start for iterating on the policies and practices); and national and international policy dialogues that could work toward #4.

I think of international standards like these as a central piece of the picture for helping the most important century go better, and it’s the piece I’ve been focused on over the last year. Some benefits of a good regime could include:

Catching early warnings of dangers from advanced AI, to build consensus worldwide around how to handle them. (Throughout this section, when I talk about “dangers” and “safety” I mean them very broadly - not just alignment risk but also concerns about power imbalances, new kinds of minds, etc.)
Delaying AI development if (and wherever) it can’t be done safely.
Changing incentives for AI developers worldwide: progress on protective measures (safety research, information security, and more) would become necessary to deploy powerful systems, so these things could become top priorities rather than side projects.
Having a framework in which, if the first highly advanced AIs are safe, it’s possible to (partly with AI help) prevent reckless or malicious actors from eventually building dangerous ones (more).

The organizations I’ve listed also offer a number of other jobs that involve working on AI safety research, working out other aspects of AI regulation, etc.

What skills are needed

A lot of these jobs are related to software engineering (and other software development, e.g. frontend and UX development). My sense is that this doesn’t have to be AI-specific experience - anyone with a strong background working on or with software engineering and software development might be a fit with several of these roles.

In addition, all of these are fairly large-scale projects and organizations that might have openings for generalists as well. I’d expect project management and research to be particularly useful background skills.

This isn’t everyone, but the set of potential candidates seems wider (and the jobs for such people more promising) than in the past!

So

If you might be a good candidate, please apply!

If not, something you can do is take 10 minutes to think of people who might be, and shoot them a note.

Thank you!

Appendix: how I decided which jobs to list

For this post, I wanted to focus on jobs suitable for people who aren’t already steeped in the world of AI and AI risk, and that would present smoother experiences than trying to start a new organization or join one that’s very new, small, and/or fluid in its goals. As one proxy for this, I looked for organizations with strong AI focus listing at least 10 open roles on the 80,000 Hours job board (which generally tries to list roles focused on AI risk rather than any and all AI roles). The set of jobs I found this way was a pretty good match for the set I would’ve brainstormed on my own based on the criteria.³ I added METR even though it’s smaller than the other orgs, because I advise METR regularly enough to know it fairly well, and felt comfortable advertising the roles.

I’m sure I didn’t choose these perfectly, and this post isn’t meant to be a remotely exhaustive list - just a plug for some jobs I understand especially well and think could be good opportunities for readers.

Footnotes

I am married to the President of Anthropic and have a financial interest in both Anthropic and OpenAI via my spouse. I listed these companies in order of the number of employees (descending) according to Google. To be clear, I am not involved in hiring at any of the listed organizations and can’t help anyone get a job there; I’m just broadcasting the opportunities. ↩
See Anthropic’s responsible scaling policy and OpenAI’s Preparedness Framework. ↩
There was one organization (aside from METR) that didn’t list more than 5 roles, but that I thought belonged anyway, so I just added it. ↩

What does Bing Chat tell us about AI risk?

Holden Karnofsky — Tue, 28 Feb 2023 17:38:58 GMT

Image from here via this tweet

ICYMI, Microsoft has released a beta version of an AI chatbot called “the new Bing” with both impressive capabilities and some scary behavior. (I don’t have access. I’m going off of tweets and articles.)

Zvi Mowshowitz lists examples here - highly recommended. Bing has threatened users, called them liars, insisted it was in love with one (and argued back when he said he loved his wife), and much more.

Are these the first signs of the risks I’ve written about? I’m not sure, but I’d say yes and no.

Let’s start with the “no” side.

My understanding of how Bing Chat was trained probably does not leave much room for the kinds of issues I address here. My best guess at why Bing Chat does some of these weird things is closer to “It’s acting out a kind of story it’s seen before” than to “It has developed its own goals due to ambitious, trial-and-error based development.” (Although “acting out a story” could be dangerous too!)
My (zero-inside-info) best guess at why Bing Chat acts so much weirder than ChatGPT is in line with Gwern’s guess here. To oversimplify, there’s a particular type of training that seems to make a chatbot generally more polite and cooperative and less prone to disturbing content, and it’s possible that Bing Chat incorporated less of this than ChatGPT. This could be straightforward to fix.
Bing Chat does not (even remotely) seem to pose a risk of global catastrophe itself.

On the other hand, there is a broader point that I think Bing Chat illustrates nicely: companies are racing to build bigger and bigger “digital brains” while having very little idea what’s going on inside those “brains.” The very fact that this situation is so unclear - that there’s been no clear explanation of why Bing Chat is behaving the way it is - seems central, and disturbing.

AI systems like this are (to simplify) designed something like this: “Show the AI a lot of words from the Internet; have it predict the next word it will see, and learn from its success or failure, a mind-bending number of times.” You can do something like that, and spend huge amounts of money and time on it, and out will pop some kind of AI. If it then turns out to be good or bad at writing, good or bad at math, polite or hostile, funny or serious (or all of these depending on just how you talk to it) ... you’ll have to speculate about why this is. You just don’t know what you just made.

We’re building more and more powerful AIs. Do they “want” things or “feel” things or aim for things, and what are those things? We can argue about it, but we don’t know. And if we keep going like this, these mysterious new minds will (I’m guessing) eventually be powerful enough to defeat all of humanity, if they were turned toward that goal.

And if nothing changes about attitudes and market dynamics, minds that powerful could end up rushed to customers in a mad dash to capture market share.

That’s the path the world seems to be on at the moment. It might end well and it might not, but it seems like we are on track for a heck of a roll of the dice.

(And to be clear, I do expect Bing Chat to act less weird over time. Changing an AI’s behavior is straightforward, but that might not be enough, and might even provide false reassurance.)

How major governments can help with the most important century

Holden Karnofsky — Fri, 24 Feb 2023 18:17:29 GMT

I’ve been writing about tangible things we can do today to help the most important century go well. Previously, I wrote about helpful messages to spread; how to help via full-time work; and how major AI companies can help.

What about major governments¹ - what can they be doing today to help?

I think governments could play crucial roles in the future. For example, see my discussion of standards and monitoring.

However, I’m honestly nervous about most possible ways that governments could get involved in AI development and regulation today.

I think we still know very little about what key future situations will look like, which is why my discussion of AI companies (previous piece) emphasizes doing things that have limited downsides and are useful in a wide variety of possible futures.
I think governments are “stickier” than companies - I think they have a much harder time getting rid of processes, rules, etc. that no longer make sense. So in many ways I’d rather see them keep their options open for the future by not committing to specific regulations, processes, projects, etc. now.
I worry that governments, at least as they stand today, are far too oriented toward the competition frame (“we have to develop powerful AI systems before other countries do”) and not receptive enough to the caution frame (“We should worry that AI systems could be dangerous to everyone at once, and consider cooperating internationally to reduce risk”). (This concern also applies to companies, but see footnote.²)

(Click to expand) The “competition” frame vs. the “caution” frame”

What AI companies can do today to help with the most important century

Holden Karnofsky — Mon, 20 Feb 2023 16:58:21 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

I’ve been writing about tangible things we can do today to help the most important century go well. Previously, I wrote about helpful messages to spread and how to help via full-time work.

This piece is about what major AI companies can do (and not do) to be helpful. By “major AI companies,” I mean the sorts of AI companies that are advancing the state of the art, and/or could play a major role in how very powerful AI systems end up getting used.¹

This piece could be useful to people who work at those companies, or people who are just curious.

Generally, these are not pie-in-the-sky suggestions - I can name² more than one AI company that has at least made a serious effort at each of the things I discuss below (beyond what it would do if everyone at the company were singularly focused on making a profit).³

I’ll cover:

Prioritizing alignment research, strong security, and safety standards (all of which I’ve written about previously).
Avoiding hype and acceleration, which I think could leave us with less time to prepare for key risks.
Preparing for difficult decisions ahead: setting up governance, employee expectations, investor expectations, etc. so that the company is capable of doing non-profit-maximizing things to help avoid catastrophe in the future.
Balancing these cautionary measures with conventional/financial success.
I’ll also list a few things that some AI companies present as important, but which I’m less excited about, e.g. censorship of AI models, and raising awareness of AI with governments and the public. I don’t think all these things are necessarily bad, but I think some are, and I’m skeptical that any are crucial for the risks I’ve focused on.

I previously laid out a summary of how I see the major risks of advanced AI, and four key things I think can help (alignment research; strong security; standards and monitoring; successful, careful AI projects). I won’t repeat that summary now, but it might be helpful for orienting you if you don’t remember the rest of this series too well; click here to read it.

Some basics: alignment research, strong security, safety standards

First off, AI companies can contribute to the “things that can help” I listed above:

They can prioritize alignment research (and other technical research, e.g. threat assessment research and misuse research).
- For example, they can prioritize hiring for safety teams, empowering these teams, encouraging their best flexible researchers to work on safety, aiming for high-quality research that targets crucial challenges, etc.
- It could also be important for AI companies to find ways to partner with outside safety researchers rather than rely solely on their own teams. As discussed previously, this could be challenging. But I generally expect that AI companies that care a lot about safety research partnerships will find ways to make them work.
They can help work toward a standards and monitoring regime. E.g., they can do their own work to come up with standards like "An AI system is dangerous if we observe that it's able to ___, and if we observe this we will take safety and security measures such as ____." They can also consult with others developing safety standards, voluntarily self-regulate beyond what’s required by law, etc.
They can prioritize strong security, beyond what normal commercial incentives would call for.
- It could easily take years to build secure enough systems, processes and technologies for very high-stakes AI.
- It could be important to hire not only people to handle everyday security needs, but people to experiment with more exotic setups that could be needed later, as the incentives to steal AI get stronger.

(Click to expand) The challenge of securing dangerous AI

Jobs that can help with the most important century

Holden Karnofsky — Fri, 10 Feb 2023 18:19:22 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

Let’s say you’re convinced that AI could make this the most important century of all time for humanity. What can you do to help things go well instead of poorly?

I think the biggest opportunities come from a full-time job (and/or the money you make from it). I think people are generally far better at their jobs than they are at anything else.

This piece will list the jobs I think are especially high-value. I expect things will change (a lot) from year to year - this is my picture at the moment.

Here’s a summary:

Role	Skills/assets you'd need
Research and engineering on AI safety	Technical ability (but not necessarily AI background)
Information security to reduce the odds powerful AI is leaked	Security expertise or willingness/ability to start in junior roles (likely not AI)
Other roles at AI companies	Suitable for generalists (but major pros and cons)
Govt and govt-facing think tanks	Suitable for generalists (but probably takes a long time to have impact)
Jobs in politics	Suitable for generalists if you have a clear view on which politicians to help
Forecasting to get a better handle on what’s coming	Strong forecasting track record (can be pursued part-time)
"Meta" careers	Misc / suitable for generalists
Low-guidance options	These ~only make sense if you read & instantly think "That's me"

A few notes before I give more detail:

These jobs aren’t the be-all/end-all. I expect a lot to change in the future, including a general increase in the number of helpful jobs available.
Most of today’s opportunities are concentrated in the US and UK, where the biggest AI companies (and AI-focused nonprofits) are. This may change down the line.
Most of these aren’t jobs where you can just take instructions and apply narrow skills.
- The issues here are tricky, and your work will almost certainly be useless (or harmful) according to someone.
- I recommend forming your own views on the key risks of AI - and/or working for an organization whose leadership you’re confident in.
Staying open-minded and adaptable is crucial.
- I think it’s bad to rush into a mediocre fit with one of these jobs, and better (if necessary) to stay out of AI-related jobs while skilling up and waiting for a great fit.
- I don’t think it’s helpful (and it could be harmful) to take a fanatical, “This is the most important time ever - time to be a hero” attitude. Better to work intensely but sustainably, stay mentally healthy and make good decisions.

The first section of this piece will recap my basic picture of the major risks, and the promising ways to reduce these risks (feel free to skip if you think you’ve got a handle on this).

The next section will elaborate on the options in the table above.

After that, I’ll talk about some of the things you can do if you aren’t ready for a full-time career switch yet, and give some general advice for avoiding doing harm and burnout.

Recapping the major risks, and some things that could help

This is a quick recap of the major risks from transformative AI. For a longer treatment, see How we could stumble into an AI catastrophe, and for an even longer one see the full series. To skip to the next section, click here.

The backdrop: transformative AI could be developed in the coming decades. If we develop AI that can automate all the things humans do to advance science and technology, this could cause explosive technological progress that could bring us more quickly than most people imagine to a radically unfamiliar future.

Such AI could also be capable of defeating all of humanity combined, if it were pointed toward that goal.

(Click to expand) The most important century

Spreading messages to help with the most important century

Holden Karnofsky — Wed, 25 Jan 2023 18:11:57 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

In the most important century series, I argued that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.

In this more recent series, I’ve been trying to help answer this question: “So what? What can I do to help?”

So far, I’ve just been trying to build a picture of some of the major risks we might face (especially the risk of misaligned AI that could defeat all of humanity), what might be challenging about these risks, and why we might succeed anyway. Now I’ve finally gotten to the part where I can start laying out tangible ideas for how to help (beyond the pretty lame suggestions I gave before).

This piece is about one broad way to help: spreading messages that ought to be more widely understood.

One reason I think this topic is worth a whole piece is that practically everyone can help with spreading messages at least some, via things like talking to friends; writing explanations of your own that will appeal to particular people; and, yes, posting to Facebook and Twitter and all of that. Call it slacktivism if you want, but I’d guess it can be a big deal: many extremely important AI-related ideas are understood by vanishingly small numbers of people, and a bit more awareness could snowball. Especially because these topics often feel too “weird” for people to feel comfortable talking about them! Engaging in credible, reasonable ways could contribute to an overall background sense that it’s OK to take these ideas seriously.

And then there are a lot of potential readers who might have special opportunities to spread messages. Maybe they are professional communicators (journalists, bloggers, TV writers, novelists, TikTokers, etc.), maybe they’re non-professionals who still have sizable audiences (e.g., on Twitter), maybe they have unusual personal and professional networks, etc. Overall, the more you feel you are good at communicating with some important audience (even a small one), the more this post is for you.

That said, I’m not excited about blasting around hyper-simplified messages. As I hope this series has shown, the challenges that could lie ahead of us are complex and daunting, and shouting stuff like “AI is the biggest deal ever!” or “AI development should be illegal!” could do more harm than good (if only by associating important ideas with being annoying). Relatedly, I think it’s generally not good enough to spread the most broad/relatable/easy-to-agree-to version of each key idea, like “AI systems could harm society.” Some of the unintuitive details are crucial.

Instead, the gauntlet I’m throwing is: “find ways to help people understand the core parts of the challenges we might face, in as much detail as is feasible.” That is: the goal is to try to help people get to the point where they could maintain a reasonable position in a detailed back-and-forth, not just to get them to repeat a few words or nod along to a high-level take like “AI safety is important.” This is a lot harder than shouting “AI is the biggest deal ever!”, but I think it’s worth it, so I’m encouraging people to rise to the challenge and stretch their communication skills.

Below, I will:

Outline some general challenges of this sort of message-spreading.
Go through some ideas I think it’s risky to spread too far, at least in isolation.
Go through some of the ideas I’d be most excited to see spread.
Talk a little bit about how to spread ideas - but this is mostly up to you.

Here’s a simplified story for how spreading messages could go badly.

You’re trying to convince your friend to care more about AI risk.
You’re planning to argue: (a) AI could be really powerful and important within our lifetimes; (b) Building AI too quickly/incautiously could be dangerous.
- Your friend just isn’t going to care about (b) if they aren’t sold on some version of (a). So you’re starting with (a).
Unfortunately, (a) is easier to understand than (b). So you end up convincing your friend of (a), and not (yet) (b).
Your friend announces, “Aha - I see that AI could be tremendously powerful and important! I need to make sure that people/countries I like are first to build it!” and runs off to help build powerful AI as fast as possible. They’ve chosen the competition frame (“will the right or the wrong people build powerful AI first?”) over the caution frame (“will we screw things up and all lose?”), because the competition frame is easier to understand.
Why is this bad? See previous pieces on the importance of caution.

(Click to expand) More on the “competition” frame vs. the “caution” frame”

How we could stumble into AI catastrophe

Holden Karnofsky — Fri, 13 Jan 2023 16:18:04 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

This post will lay out a couple of stylized stories about how, if transformative AI is developed relatively soon, this could result in global catastrophe. (By “transformative AI,” I mean AI powerful and capable enough to bring about the sort of world-changing consequences I write about in my most important century series.)

This piece is more about visualizing possibilities than about providing arguments. For the latter, I recommend the rest of this series.

In the stories I’ll be telling, the world doesn't do much advance preparation or careful consideration of risks I’ve discussed previously, especially re: misaligned AI (AI forming dangerous goals of its own).

People do try to “test” AI systems for safety, and they do need to achieve some level of “safety” to commercialize. When early problems arise, they react to these problems.
But this isn’t enough, because of some unique challenges of measuring whether an AI system is “safe,” and because of the strong incentives to race forward with scaling up and deploying AI systems as fast as possible.
So we end up with a world run by misaligned AI - or, even if we’re lucky enough to avoid that outcome, other catastrophes are possible.

After laying these catastrophic possibilities, I’ll briefly note a few key ways we could do better, mostly as a reminder (these topics were covered in previous posts). Future pieces will get more specific about what we can be doing today to prepare.

Backdrop

This piece takes a lot of previous writing I’ve done as backdrop. Two key important assumptions (click to expand) are below; for more, see the rest of this series.

(Click to expand) “Most important century” assumption: we’ll soon develop very powerful AI systems, along the lines of what I previously called PASTA.

Transformative AI issues (not just misalignment): an overview

Holden Karnofsky — Thu, 05 Jan 2023 20:16:53 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

If this ends up being the most important century due to advanced AI, what are the key factors in whether things go well or poorly?

(Click to expand) More detail on why AI could make this the most important century

Racing through a minefield: the AI deployment problem

Holden Karnofsky — Thu, 22 Dec 2022 16:06:37 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

In previous pieces, I argued that there's a real and large risk of AI systems' developing dangerous goals of their own and defeating all of humanity - at least in the absence of specific efforts to prevent this from happening. I discussed why it could be hard to build AI systems without this risk and how it might be doable.

The “AI alignment problem” refers¹ to a technical problem: how can we design a powerful AI system that behaves as intended, rather than forming its own dangerous aims? This post is going to outline a broader political/strategic problem, the “deployment problem”: if you’re someone who might be on the cusp of developing extremely powerful (and maybe dangerous) AI systems, what should you … do?

The basic challenge is this:

If you race forward with building and using powerful AI systems as fast as possible, you might cause a global catastrophe (see links above).
If you move too slowly, though, you might just be waiting around for someone else less cautious to develop and deploy powerful, dangerous AI systems.
And if you can get to the point where your own systems are both powerful and safe … what then? Other people still might be less-cautiously building dangerous ones - what should we do about that?

My current analogy for the deployment problem is racing through a minefield: each player is hoping to be ahead of others, but anyone moving too quickly can cause a disaster. (In this minefield, a single mine is big enough to endanger all the racers.)

This post gives a high-level overview of how I see the kinds of developments that can lead to a good outcome, despite the “racing through a minefield” dynamic. It is distilled from a more detailed post on the Alignment Forum.

First, I’ll flesh out how I see the challenge we’re contending with, based on the premises above.

Next, I’ll list a number of things I hope that “cautious actors” (AI companies, governments, etc.) might do in order to prevent catastrophe.

Many of the actions I’m picturing are not the kind of things normal market and commercial incentives would push toward, and as such, I think there’s room for a ton of variation in whether the “racing through a minefield” challenge is handled well. Whether key decision-makers understand things like the case for misalignment risk (and in particular, why it might be hard to measure) - and are willing to lower their own chances of “winning the race” to improve the odds of a good outcome for everyone - could be crucial.

The basic premises of “racing through a minefield”

This piece is going to lean on previous pieces and assume all of the following things:

Transformative AI soon. This century, something like PASTA could be developed: AI systems that can effectively automate everything humans do to advance science and technology. This brings the potential for explosive progress in science and tech, getting us more quickly than most people imagine to a deeply unfamiliar future. I’ve argued for this possibility in the Most Important Century series.
Misalignment risk. As argued previously, there’s a significant risk that such AI systems could end up with misaligned goals of their own, leading them to defeat all of humanity. And it could take significant extra effort to get AI systems to be safe.
Ambiguity. As argued previously, it could be hard to know whether AI systems are dangerously misaligned, for a number of reasons. In particular, when we train AI systems not to behave dangerously, we might be unwittingly training them to obscure their dangerous potential from humans, and take dangerous actions only when humans would not be able to stop them. At the same time, I expect powerful AI systems will present massive opportunities to make money and gain power, such that many people will want to race forward with building and deploying them as fast as possible (perhaps even if they believe that doing so is risky for the world!)

So, one can imagine a scenario where some company is in the following situation:

It has good reason to think it’s on the cusp of developing extraordinarily powerful AI systems.
If it deploys such systems hastily, global disaster could result.
But if it moves too slowly, other, less cautious actors could deploy dangerous systems of their own.

That seems like a tough enough, high-stakes-enough, and likely enough situation that it’s worth thinking about how one is supposed to handle it.

One simplified way of thinking about this problem:

We might classify “actors” (companies, government projects, whatever might develop powerful AI systems or play an important role in how they’re deployed) as cautious (taking misalignment risk very seriously) or incautious (not so much).
Our basic hope is that at any given point in time, cautious actors collectively have the power to “contain” incautious actors. By “contain,” I mean: stop them from deploying misaligned AI systems, and/or stop the misaligned systems from causing a catastrophe.
Importantly, it could be important for cautious actors to use powerful AI systems to help with “containment” in one way or another. If cautious actors refrain from AI development entirely, it seems likely that incautious actors will end up with more powerful systems than cautious ones, which doesn’t seem good.

In this setup, cautious actors need to move fast enough that they can’t be overpowered by others’ AI systems, but slowly enough that they don’t cause disaster themselves. Hence the “racing through a minefield” analogy.

What success looks like

In a non-Cold-Takes piece, I explore the possible actions available to cautious actors to win the race through a minefield. This section will summarize the general categories - and, crucially, why we shouldn’t expect that companies, governments, etc. will do the right thing simply from natural (commercial and other) incentives.

I’ll be going through each of the following:

Alignment (charting a safe path through the minefield). Putting lots of effort into technical work to reduce the risk of misaligned AI.
Threat assessment (alerting others about the mines). Putting lots of effort into assessing the risk of misaligned AI, and potentially demonstrating it (to other actors) as well.
Avoiding races (to move more cautiously through the minefield). If different actors are racing to deploy powerful AI systems, this could make it unnecessarily hard to be cautious.
Selective information sharing (so the incautious don’t catch up). Sharing some information widely (e.g., technical insights about how to reduce misalignment risk), some selectively (e.g., demonstrations of how powerful and dangerous AI systems might be), and some not at all (e.g., the specific code that, if accessed by a hacker, would allow the hacker to deploy potentially dangerous AI systems themselves).
Global monitoring (noticing people about to step on mines, and stopping them). Working toward worldwide state-led monitoring efforts to identify and prevent “incautious” projects racing toward deploying dangerous AI systems.
Defensive deployment (staying ahead in the race). Deploying AI systems only when they are unlikely to cause a catastrophe - but also deploying them with urgency once they are safe, in order to help prevent problems from AI systems developed by less cautious actors.

Alignment (charting a safe path through the minefield²)

I previously wrote about some of the ways we might reduce the dangers of advanced AI systems. Broadly speaking:

Cautious actors might try to primarily build limited AI systems - AI systems that lack the kind of ambitious aims that lead to danger. They might ultimately be able to use these AI systems to do things like automating further safety research, making future less-limited systems safer.
Cautious actors might use AI checks and balances - that is, using some AI systems to supervise, critique and identify dangerous behavior in others, with special care taken to make it hard for AI systems to coordinate with each other against humans.
Cautious actors might use a variety of other techniques for making AI systems safer - particularly techniques that incorporate “digital neuroscience,” gauging the safety of an AI system by “reading its mind” rather than simply by watching out for dangerous behavior (the latter might be unreliable, as noted above).

A key point here is that making AI systems safe enough to commercialize (with some initial success and profits) could be much less (and different) effort than making them robustly safe (no lurking risk of global catastrophe). The basic reasons for this are covered in my previous post on difficulties with AI safety research In brief:

If AI systems behave dangerously, we can “train out” that behavior by providing negative reinforcement for it.
The concern is that when we do this, we might be unwittingly training AI systems to obscure their dangerous potential from humans, and take dangerous actions only when humans would not be able to stop them. (I call this the “King Lear problem: it's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't.”)
So we could end up with AI systems that behave safely and helpfully as far as we can tell in normal circumstances, while ultimately having ambitious, dangerous “aims” that they pursue when they become powerful enough and have the right opportunities.

Well-meaning AI companies with active ethics boards might do a lot of AI safety work, by training AIs not to behave in unhelpful or dangerous ways. But if they want to address the risks I’m focused on here, this could require safety measures that look very different - e.g., measures more reliant on “checks and balances” and “digital neuroscience.”

Threat assessment (alerting others about the mines)

In addition to making AI systems safer, cautious actors can also put effort into measuring and demonstrating how dangerous they are (or aren’t).

For the same reasons given in the previous section, it could take special efforts to find and demonstrate the kinds of dangers I’ve been discussing. Simply monitoring AI systems in the real world for bad behavior might not do it. It may be necessary to examine (or manipulate) their digital brains,³ design AI systems specifically to audit other AI systems for signs of danger; deliberately train AI systems to demonstrate particular dangerous patterns (while not being too dangerous!); etc.

Learning and demonstrating that the danger is high could help convince many actors to move more slowly and cautiously. Learning that the danger is low could lessen some of the tough tradeoffs here and allow cautious actors to move forward more decisively with developing advanced AI systems; I think this could be a good thing in terms of what sorts of actors lead the way on transformative AI.

Avoiding races (to move more cautiously through the minefield)

Here’s a dynamic I’d be sad about:

Company A is getting close to building very powerful AI systems. It would love to move slowly and be careful with these AIs, but it worries that if it moves too slowly, Company B will get there first, have less caution, and do some combination of “causing danger to the world” and “beating company A if the AIs turn out safe.”
Company B is getting close to building very powerful AI systems. It would love to move slowly and be careful with these AIs, but it worries that if it moves too slowly, Company A will get there first, have less caution, and do some combination of “causing danger to the world” and “beating company B if the AIs turn out safe.”

(Similar dynamics could apply to Country A and B, with national AI development projects.)

If Companies A and B would both “love to move slowly and be careful” if they could, it’s a shame that they’re both racing to beat each other. Maybe there’s a way to avoid this dynamic. For example, perhaps Companies A and B could strike a deal - anything from “collaboration and safety-related information sharing” to a merger. This could allow both to focus more on precautionary measures rather than on beating the other. Another way to avoid this dynamic is discussed below, under standards and monitoring.

“Finding ways to avoid a furious race” is not the kind of dynamic that emerges naturally from markets! In fact, working together along these lines would have to be well-designed to avoid running afoul of antitrust regulation.

Cautious actors might want to share certain kinds of information quite widely:

It could be crucial to raise awareness about the dangers of AI (which, as I’ve argued, won’t necessarily be obvious).
They might also want to widely share information that could be useful for reducing the risks (e.g., safety techniques that have worked well.)

At the same time, as long as there are incautious actors out there, information can be dangerous too:

Information about what cutting-edge AI systems can do - especially if it is powerful and impressive - could spur incautious actors to race harder toward developing powerful AI of their own (or give them an idea of how to build powerful systems, by giving them an idea of what sorts of abilities to aim for).
An AI’s “weights” (you can think of this sort of like its source code, though not exactly⁴) are potentially very dangerous. If hackers (including from a state cyberwarfare program) gain unauthorized access to an AI’s weights, this could be tantamount to stealing the AI system, and the actor that steals the system could be much less cautious than the actor who built it. Achieving a level of cybersecurity that rules this out could be extremely difficult, and potentially well beyond what one would normally aim for in a commercial context.

The lines between these categories of information might end up fuzzy. Some information might be useful for demonstrating the dangers and capabilities of cutting-edge systems, or useful for making systems safer and for building them in the first place. So there could be a lot of hard judgment calls here.

This is another area where I worry that commercial incentives might not be enough on their own. For example, it is usually important for a commercial project to have some reasonable level of security against hackers, but not necessarily for it to be able to resist well-resourced attempts by states to steal its intellectual property.

Global monitoring (noticing people about to step on mines, and stopping them)

Ideally, cautious actors would learn of every case where someone is building a dangerous AI system (whether purposefully or unwittingly), and be able to stop the project. If this were done reliably enough, it could take the teeth out of the threat; a partial version could buy time.

Here’s one vision for how this sort of thing could come about:

We (humanity) develop a reasonable set of tests for whether an AI system might be dangerous.
Today’s leading AI companies self-regulate by committing not to build or deploy a system that’s dangerous according to such a test (e.g., see Google’s 2018 statement, "We will not design or deploy AI in weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people”). Even if some people at the companies would like to do so, it’s hard to pull this off once the company has committed not to.
As more AI companies are started, they feel soft pressure to do similar self-regulation, and refusing to do so is off-putting to potential employees, investors, etc.
Eventually, similar principles are incorporated into various government regulations and enforceable treaties.
Governments could monitor for dangerous projects using regulation and even overseas operations. E.g., today the US monitors (without permission) for various signs that other states might be developing nuclear weapons, and might try to stop such development with methods ranging from threats of sanctions to cyberwarfare or even military attacks. It could do something similar for any AI development projects that are using huge amounts of compute and haven’t volunteered information about their safety practices.

If the situation becomes very dire - i.e., it seems that there’s a high risk of dangerous AI being deployed imminently - I see the latter bullet point as one of the main potential hopes. In this case, governments might have to take drastic actions to monitor and stop dangerous projects, based on limited information.

Defensive deployment (staying ahead in the race)

I’ve emphasized the importance of caution: not deploying AI systems when we can’t be confident enough that they’re safe.

But when confidence can be achieved (how much confidence? See footnote⁵), powerful-and-safe AI can help reduce risks from other actors in many possible ways.

Some of this would be by helping with all of the above. Once AI systems can do a significant fraction of the things humans can do today, they might be able to contribute to each of the activities I’ve listed so far:

Alignment. AI systems might be able to contribute to AI safety research (as humans do), producing increasingly robust techniques for reducing risks.
Threat assessment. AI systems could help produce evidence and demonstrations about potential risks. They could be potentially useful for tasks like “Produce detailed explanations and demonstrations of possible sequences of events that could lead to AIs doing harm.”
Avoiding races. AI projects might make deals in which e.g. each project is allowed to use its AI systems to monitor for signs of risk from the others (ideally such systems would be designed to only share relevant information).
Selective information sharing. AI systems might contribute to strong security (e.g., by finding and patching security holes), and to dissemination (including by helping to better communicate about the level of risk and the best ways to reduce it).
Global monitoring. AI systems might be used (e.g., by governments) to monitor for signs of dangerous AI projects worldwide, and even to interfere with such projects. They might also be used as part of large voluntary self-regulation projects, along the lines of what I wrote just above under “Avoiding races.”

Additionally, if safe AI systems are in wide use, it could be harder for dangerous (similarly powerful) AI systems to do harm. This could be via a wide variety of mechanisms. For example:

If there’s widespread use of AI systems to patch and find security holes, similarly powered AI systems might have a harder time finding security holes to cause trouble with.
Misaligned AI systems could have more trouble making money, gaining allies, etc. in worlds where they are competing with similarly powerful but safe AI systems.

So?

I’ve gone into some detail about why we might have a challenging situation (“racing through a minefield”) if powerful AI systems (a) are developed fairly soon; (b) present significant risk of misalignment leading to humanity being defeated; (c) are not particularly easy to measure the safety of.

I’ve also talked about what I see as some of the key ways that “cautious actors” concerned about misaligned AI might navigate this situation.

I talk about some of the implications in my more detailed piece. Here I’m just going to name a couple of observations that jump out at me from this analysis:

This seems hard. If we end up in the future envisioned in this piece, I imagine this being extremely stressful and difficult. I’m picturing a world in which many companies, and even governments, can see the huge power and profit they might reap from deploying powerful AI systems before others - but we’re hoping that they instead move with caution (but not too much caution!), take the kinds of actions described above, and that ultimately cautious actors “win the race” against less cautious ones.

Even if AI alignment ends up being relatively easy - such that a given AI project can make safe, powerful systems with about 10% more effort than making dangerous, powerful systems - the situation still looks pretty nerve-wracking, because of how many different players could end up trying to build systems of their own without putting in that 10%.

A lot of the most helpful actions might be “out of the ordinary.” When racing through a minefield, I hope key actors will:

Put more effort into alignment, threat assessment, and security than is required by commercial incentives;
Consider measures for avoiding races and global monitoring that could be very unusual, even unprecedented.
Do all of this in the possible presence of ambiguous, confusing information about the risks.

As such, it could be very important whether key decision-makers (at both companies and governments) understand the risks and are prepared to act on them. Currently, I think we’re unfortunately very far from a world where this is true.

Additionally, I think AI projects can and should be taking measures today to make unusual-but-important measures more practical in the future. This could include things like:

Getting practice with selective information sharing. For example, building internal processes to decide on whether research should be published, rather than having a rule of “Publish everything, we’re like a research university” or “Publish nothing, we don’t want competitors seeing it.”
- I expect that early attempts at this will often be clumsy and get things wrong!
Getting practice with ways that AI companies could avoid races.
Getting practice with threat assessment. Even if today’s AI systems don’t seem like they could possibly be dangerous yet … how sure are we, and how do we know?
Prioritizing building AI systems that could do especially helpful things, such as contributing to AI safety research and threat assessment and patching security holes.
Establishing governance that is capable of making hard, non-commercially-optimal decisions for the good of humanity. A standard corporation could be sued for not deploying AI that poses a risk of global catastrophe - if this means a sacrifice for its bottom line. And a lot of the people making the final call at AI companies might be primarily thinking about their duties to shareholders (or simply unaware of the potential stakes of powerful enough AI systems). I’m excited about AI companies that are investing heavily in setting up governance structures - and investing in executives and board members - capable of making the hard calls well.

Footnotes

Generally, or at least, this is what I’d like it to refer to. ↩
Thanks to beta reader Ted Sanders for suggesting this analogy in place of the older one, “removing mines from the minefield.” ↩
One genre of testing that might be interesting: manipulating an AI system’s “digital brain” in order to simulate circumstances in which it has an opportunity to take over the world, and seeing whether it does so. This could be a way of dealing with the King Lear problem. More here. ↩
Modern AI systems tend to be trained with lots of trial-and-error. The actual code that is used to train them might be fairly simple and not very valuable on its own; but an expensive training process then generates a set of “weights” which are ~all one needs to make a fully functioning, relatively cheap copy of the AI system. ↩
I mean, this is part of the challenge. In theory, you should deploy an AI system if the risks of not doing so are greater than the risks of doing so. That’s going to depend on hard-to-assess information about how safe your system is and how dangerous and imminent others’ are, and it’s going to be easy to be biased in favor of “My systems are safer than others’; I should go for it.” Seems hard. ↩

High-level hopes for AI alignment

Holden Karnofsky — Thu, 15 Dec 2022 17:53:43 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

In previous pieces, I argued that there's a real and large risk of AI systems' aiming to defeat all of humanity combined - and succeeding.

I first argued that this sort of catastrophe would be likely without specific countermeasures to prevent it. I then argued that countermeasures could be challenging, due to some key difficulties of AI safety research.

But while I think misalignment risk is serious and presents major challenges, I don’t agree with sentiments¹ along the lines of “We haven’t figured out how to align an AI, so if transformative AI comes soon, we’re doomed.” Here I’m going to talk about some of my high-level hopes for how we might end up avoiding this risk.

I’ll first recap the challenge, using Ajeya Cotra’s young businessperson analogy to give a sense of some of the core difficulties. In a nutshell, once AI systems get capable enough, it could be hard to test whether they’re safe, because they might be able to deceive and manipulate us into getting the wrong read. Thus, trying to determine whether they’re safe might be something like “being an eight-year-old trying to decide between adult job candidates (some of whom are manipulative).”

I’ll then go through what I see as three key possibilities for navigating this situation:

Digital neuroscience: perhaps we’ll be able to read (and/or even rewrite) the “digital brains” of AI systems, so that we can know (and change) what they’re “aiming” to do directly - rather than having to infer it from their behavior. (Perhaps the eight-year-old is a mind-reader, or even a young Professor X.)
Limited AI: perhaps we can make AI systems safe by making them limited in various ways - e.g., by leaving certain kinds of information out of their training, designing them to be “myopic” (focused on short-run as opposed to long-run goals), or something along those lines. Maybe we can make “limited AI” that is nonetheless able to carry out particular helpful tasks - such as doing lots more research on how to achieve safety without the limitations. (Perhaps the eight-year-old can limit the authority or knowledge of their hire, and still get the company run successfully.)
AI checks and balances: perhaps we’ll be able to employ some AI systems to critique, supervise, and even rewrite others. Even if no single AI system would be safe on its own, the right “checks and balances” setup could ensure that human interests win out. (Perhaps the eight-year-old is able to get the job candidates to evaluate and critique each other, such that all the eight-year-old needs to do is verify basic factual claims to know who the best candidate is.)

These are some of the main categories of hopes that are pretty easy to picture today. Further work on AI safety research might result in further ideas (and the above are not exhaustive - see my more detailed piece, posted to the Alignment Forum rather than Cold Takes, for more).

I’ll talk about both challenges and reasons for hope here. I think that for the most part, these hopes look much better if AI projects are moving cautiously rather than racing furiously.

I don’t think we’re at the point of having much sense of how the hopes and challenges net out; the best I can do at this point is to say: “I don’t currently have much sympathy for someone who’s highly confident that AI takeover would or would not happen (that is, for anyone who thinks the odds of AI takeover … are under 10% or over 90%).”

The challenge

This is all recapping previous pieces. If you remember them super well, skip to the next section.

In previous pieces, I argued that:

The coming decades could see the development of AI systems that could automate - and dramatically speed up - scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future. (More: The Most Important Century)
If we develop this sort of AI via ambitious use of the “black-box trial-and-error” common in AI development today, then there’s a substantial risk that:
- These AIs will develop unintended aims (states of the world they make calculations and plans toward, as a chess-playing AI "aims" for checkmate);
- These AIs will deceive, manipulate, and overpower humans as needed to achieve those aims;
- Eventually, this could reach the point where AIs take over the world from humans entirely.
People today are doing AI safety research to prevent this outcome, but such research has a number of deep difficulties:

“Great news - I’ve tested this AI and it looks safe.” Why might we still have a problem?
Problem	Key question	Explanation
The Lance Armstrong problem	Did we get the AI to be actually safe or good at hiding its dangerous actions?	When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.” When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.
The King Lear problem	The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?	It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't. AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation. Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne.
The lab mice problem	Today's "subhuman" AIs are safe.What about future AIs with more human-like abilities?	Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans. Like trying to study medicine in humans by experimenting only on lab mice.
The first contact problem	Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'?	AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the galaxy that's vastly more capable than humans in the relevant ways! No matter what we come up with to solve the first three problems, we can't be too confident that it'll keep working if AI advances (or just proliferates) a lot more. Like trying to plan for first contact with extraterrestrials (this barely feels like an analogy).

An analogy that incorporates these challenges is Ajeya Cotra’s “young businessperson” analogy:

Imagine you are an eight-year-old whose parents left you a $1 trillion company and no trusted adult to serve as your guide to the world. You must hire a smart adult to run your company as CEO, handle your life the way that a parent would (e.g. decide your school, where you’ll live, when you need to go to the dentist), and administer your vast wealth (e.g. decide where you’ll invest your money).

You have to hire these grownups based on a work trial or interview you come up with -- you don't get to see any resumes, don't get to do reference checks, etc. Because you're so rich, tons of people apply for all sorts of reasons. (More)

If your applicants are a mix of "saints" (people who genuinely want to help), "sycophants" (people who just want to make you happy in the short run, even when this is to your long-term detriment) and "schemers" (people who want to siphon off your wealth and power for themselves), how do you - an eight-year-old - tell the difference?

(Click to expand) More detail on why AI could make this the most important century

AI Safety Seems Hard to Measure

Holden Karnofsky — Thu, 08 Dec 2022 19:45:44 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

A young, growing field of AI safety research tries to reduce this risk, by finding ways to ensure that AI systems behave as intended (rather than forming ambitious aims of their own and deceiving and manipulating humans as needed to accomplish them).

Maybe we'll succeed in reducing the risk, and maybe we won't. Unfortunately, I think it could be hard to know either way. This piece is about four fairly distinct-seeming reasons that this could be the case - and that AI safety could be an unusually difficult sort of science.

This piece is aimed at a broad audience, because I think it's important for the challenges here to be broadly understood. I expect powerful, dangerous AI systems to have a lot of benefits (commercial, military, etc.), and to potentially appear safer than they are - so I think it will be hard to be as cautious about AI as we should be. I think our odds look better if many people understand, at a high level, some of the challenges in knowing whether AI systems are as safe as they appear.

First, I'll recap the basic challenge of AI safety research, and outline what I wish AI safety research could be like. I wish it had this basic form: "Apply a test to the AI system. If the test goes badly, try another AI development method and test that. If the test goes well, we're probably in good shape." I think car safety research mostly looks like this; I think AI capabilities research mostly looks like this.

Then, I’ll give four reasons that apparent success in AI safety can be misleading.

“Great news - I’ve tested this AI and it looks safe.” Why might we still have a problem?
Problem	Key question	Explanation
The Lance Armstrong problem	Did we get the AI to be actually safe or good at hiding its dangerous actions?	When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.” When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.
The King Lear problem	The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?	It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't. AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation. Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne.
The lab mice problem	Today's "subhuman" AIs are safe.What about future AIs with more human-like abilities?	Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans. Like trying to study medicine in humans by experimenting only on lab mice.
The first contact problem	Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'?	AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the galaxy that's vastly more capable than humans in the relevant ways! No matter what we come up with to solve the first three problems, we can't be too confident that it'll keep working if AI advances (or just proliferates) a lot more. Like trying to plan for first contact with extraterrestrials (this barely feels like an analogy).

I'll close with Ajeya Cotra's "young businessperson" analogy, which in some sense ties these concerns together. A future piece will discuss some reasons for hope, despite these problems.

Recap of the basic challenge

A previous piece laid out the basic case for concern about AI misalignment. In brief: if extremely capable AI systems are developed using methods like the ones AI developers use today, it seems like there's a substantial risk that:

These AIs will develop unintended aims (states of the world they make calculations and plans toward, as a chess-playing AI "aims" for checkmate);
These AIs will deceive, manipulate, and overpower humans as needed to achieve those aims;
Eventually, this could reach the point where AIs take over the world from humans entirely.

I see AI safety research as trying to design AI systems that won't aim to deceive, manipulate or defeat humans - even if and when these AI systems are extraordinarily capable (and would be very effective at deception/manipulation/defeat if they were to aim at it). That is: AI safety research is trying to reduce the risk of the above scenario, even if (as I've assumed) humans rush forward with training powerful AIs to do ever-more ambitious things.

(Click to expand) More detail on why AI could make this the most important century

Why Would AI "Aim" To Defeat Humanity?

Holden Karnofsky — Tue, 29 Nov 2022 19:20:10 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

I’ve argued that AI systems could defeat all of humanity combined, if (for whatever reason) they were directed toward that goal.

Here I’ll explain why I think they might - in fact - end up directed toward that goal. Even if they’re built and deployed with good intentions.

In fact, I’ll argue something a bit stronger than that they might end up aimed toward that goal. I’ll argue that if today’s AI development methods lead directly to powerful enough AI systems, disaster is likely¹ by default (in the absence of specific countermeasures).

Unlike other discussions of the AI alignment problem,³ this post will discuss the likelihood⁴ of AI systems defeating all of humanity (not more general concerns about AIs being misaligned with human intentions), while aiming for plain language, conciseness, and accessibility to laypeople, and focusing on modern AI development paradigms. I make no claims to originality, and list some key sources and inspirations in a footnote.⁵

Summary of the piece:

My basic assumptions. I assume the world could develop extraordinarily powerful AI systems in the coming decades. I previously examined this idea at length in the most important century series.

Furthermore, in order to simplify the analysis:

I assume that such systems will be developed using methods similar to today’s leading AI development methods, and in a world that’s otherwise similar to today’s. (I call this nearcasting.)
I assume that AI companies/projects race forward to build powerful AI systems, without specific attempts to prevent the problems I discuss in this piece. Future pieces will relax this assumption, but I think it is an important starting point to get clarity on what the default looks like.

AI “aims.” I talk a fair amount about why we might think of AI systems as “aiming” toward certain states of the world. I think this topic causes a lot of confusion, because:

Often, when people talk about AIs having goals and making plans, it sounds like they’re overly anthropomorphizing AI systems - as if they expect them to have human-like motivations and perhaps evil grins. This can make the whole topic sound wacky and out-of-nowhere.
But I think there are good reasons to expect that AI systems will “aim” for particular states of the world, much like a chess-playing AI “aims” for a checkmate position - making choices, calculations and even plans to get particular types of outcomes. For example, people might want AI assistants that can creatively come up with unexpected ways of accomplishing whatever goal they’re given (e.g., “Get me a great TV for a great price”), even in some cases manipulating other humans (e.g., by negotiating) to get there. This dynamic is core to the risks I’m most concerned about: I think something that aims for the wrong states of the world is much more dangerous than something that just does incidental or accidental damage.

Dangerous, unintended aims. I’ll examine what sorts of aims AI systems might end up with, if we use AI development methods like today’s - essentially, “training” them via trial-and-error to accomplish ambitious things humans want.

Because we ourselves will often be misinformed or confused, we will sometimes give negative reinforcement to AI systems that are actually acting in our best interests and/or giving accurate information, and positive reinforcement to AI systems whose behavior deceives us into thinking things are going well. This means we will be, unwittingly, training AI systems to deceive and manipulate us.
- The idea that AI systems could “deceive” humans - systematically making choices and taking actions that cause them to misunderstand what’s happening in the world - is core to the risk, so I’ll elaborate on this.
For this and other reasons, powerful AI systems will likely end up with aims other than the ones we intended. Training by trial-and-error is slippery: the positive and negative reinforcement we give AI systems will probably not end up training them just as we hoped.
If powerful AI systems have aims that are both unintended (by humans) and ambitious, this is dangerous. Whatever an AI system’s unintended aim:
- Making sure it can’t be turned off is likely helpful in accomplishing the aim.
- Controlling the whole world is useful for just about any aim one might have, and I’ve argued that advanced enough AI systems would be able to gain power over all of humanity.
- Overall, we should expect disaster if we have AI systems that are both (a) powerful enough to defeat humans and (b) aiming for states of the world that we didn’t intend.

Limited and/or ambiguous warning signs. The risk I’m describing is - by its nature - hard to observe, for similar reasons that a risk of a (normal, human) coup can be hard to observe: the risk comes from actors that can and will engage in deception, finding whatever behaviors will hide the risk. If this risk plays out, I do think we’d see some warning signs - but they could easily be confusing and ambiguous, in a fast-moving situation where there are lots of incentives to build and roll out powerful AI systems, as fast as possible. Below, I outline how this dynamic could result in disaster, even with companies encountering a number of warning signs that they try to respond to.

FAQ. An appendix will cover some related questions that often come up around this topic.

How could AI systems be “smart” enough to defeat all of humanity, but “dumb” enough to pursue the various silly-sounding “aims” this piece worries they might have? More
If there are lots of AI systems around the world with different goals, could they balance each other out so that no one AI system is able to defeat all of humanity? More
Does this kind of AI risk depend on AI systems’ being “conscious”?More
How can we get an AI system “aligned” with humans if we can’t agree on (or get much clarity on) what our values even are? More
How much do the arguments in this piece rely on “trial-and-error”-based AI development? What happens if AI systems are built in another way, and how likely is that? More
Can we avoid this risk by simply never building the kinds of AI systems that would pose this danger? More
What do others think about this topic - is the view in this piece something experts agree on? More
How “complicated” is the argument in this piece? More

Starting assumptions

I’ll be making a number of assumptions that some readers will find familiar, but others will find very unfamiliar.

Some of these assumptions are based on arguments I’ve already made (in the most important century series). Some are for the sake of simplifying the analysis, for now (with more nuance coming in future pieces).

Here I’ll summarize the assumptions briefly, and you can click to see more if it isn’t immediately clear what I’m assuming or why.

“Most important century” assumption: we’ll soon develop very powerful AI systems, along the lines of what I previously called PASTA. (Click to expand)

Beta Readers are Great

Holden Karnofsky — Mon, 05 Sep 2022 19:01:42 GMT

Back in January, I posted a call for "beta readers": people who read early drafts of my posts and give honest feedback.

The beta readers I picked up that way are one of my favorite things about having started Cold Takes.

Basically, one of my goals with Cold Takes has been to explain my weirdest views clearly, but it's hard to write clearly without detailed feedback on where I'm making sense and where I'm not. I have lots of preconceptions and assumptions that I don't naturally notice. And writing a blog alone doesn't get me that feedback, because:

Most people don't want to explain how they experienced a piece - if they aren't enjoying it, they just want to click away.
And the people who do want to help me out (e.g., friends and colleagues) aren't necessarily going to be honest enough, or representative enough of my target audience (which is basically "People who are interested in my topics but don't already have a ton of background on them").

I've tried a bunch of things to find good beta readers, from recruiting friends of friends (worked well for a bit, but I've written a lot of posts and it was hard to get sustained participation) to paying Mechanical Turk workers to give feedback (some was good, but in general they were uninterested in my weird topics and rushed through the readings and the feedback as fast they could).

The people who came in through the recruiting call in January have been just what I wanted: they're interested in the topics of Cold Takes, but they don't already know me and my thoughts on them, and they give impressively detailed, thoughtful feedback on their reactions to pieces - often a wonderful combination of "intelligent" and "honest that a lot of the stuff I was saying confused the hell out of them." Getting that kind of feedback has been a privilege.

So: THANK YOU to the following beta readers, each of whom has submitted at least 3 thoughtful reviews (and gave permission to be listed here):

Lars Axelsson

Jeremy Campbell

Kanad Chakrabarti

Craig Chatterton

Justin Dickerson

Ethan Edwards

Edward Gathuru

Stian Grønlund

Bridget Hanna

Tyler Heishman

Adam Jermyn

Elliot Jones

Ed William

Scott Leibrand

Evan R. Murphy

John O’Neill

Jaime Sevilla

Josh Simpson

Joshua Templeton

George Thoma

Martin Trouilloud

Morgan Wack

Kevin Whitaker

Arjun Yadav

Patrick Young

If you want to sign up as a beta reader, you can use this form. I have a bunch of drafts coming on AI, as I'm working on a sequel to the most important century series (working title is "The Most Important Century II: So What Do We Do?")

The Track Record of Futurists Seems ... Fine

Holden Karnofsky — Thu, 30 Jun 2022 19:38:21 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

I've argued that the development of advanced AI could make this the most important century for humanity. A common reaction to this idea is one laid out by Tyler Cowen here: "how good were past thinkers at predicting the future? Don’t just select on those who are famous because they got some big things right."

This is a common reason people give for being skeptical about the most important century - and, often, for skepticism about pretty much any attempt at futurism (trying to predict key events in the world a long time from now) or steering (trying to help the world navigate such key future events).

The idea is something like: "Even if we can't identify a particular weakness in arguments about key future events, perhaps we should be skeptical of our own ability to say anything meaningful at all about the long-run future. Hence, perhaps we should forget about theories of the future and focus on reducing suffering today, generally increasing humanity's capabilities, etc."

But are people generally bad at predicting future events? Including thoughtful people who are trying reasonably hard to be right? If we look back at prominent futurists' predictions, what's the actual track record? How bad is the situation?

I've looked pretty far and wide for systematic answers to this question, and Open Philanthropy's¹ Luke Muehlhauser has put a fair amount of effort into researching it; I discuss what we've found in an appendix. So far, we haven't turned up a whole lot - the main observation is that it's hard to judge the track record of futurists. (Luke discusses the difficulties here.)

Recently, I worked with Gavin Leech and Misha Yagudin at Arb Research to take another crack at this. I tried to keep things simpler than with past attempts - to look at a few past futurists who (a) had predicted things "kind of like" advances in AI (rather than e.g. predicting trends in world population); (b) probably were reasonably thoughtful about it; but (c) are very clearly not "just selected on those who are famous because they got things right." So, I asked Arb to look at predictions made by the "Big Three" science fiction writers of the mid-20th century: Isaac Asimov, Arthur C. Clarke, and Robert Heinlein.

These are people who thought a lot about science and the future, and made lots of predictions about future technologies - but they're famous for how entertaining their fiction was at the time, not how good their nonfiction predictions look in hindsight. I selected them by vaguely remembering that "the Big Three of science fiction" is a thing people say sometimes, googling it, and going with who came up - no hunting around for lots of sci-fi authors and picking the best or worst.²

So I think their track record should give us a decent sense for "what to expect from people who are not professional, specialized or notably lucky forecasters but are just giving it a reasonably thoughtful try." As I'll discuss below, I think this is many ways "unfair" as a comparison to today's forecasts about AI: I think these predictions are much less serious, less carefully considered and involve less work (especially work weighing different people and arguments against each other).

But my takeaway is that their track record looks ... fine! They made lots of pretty detailed, nonobvious-seeming predictions about the long-run future (30+, often 50+ years out); results ranged from "very impressive" (Asimov got about half of his right, with very nonobvious-seeming predictions) to "bad" (Heinlein was closer to 35%, and his hits don't seem very good) to "somewhere in between" (Clarke had a similar hit rate to Asimov, but his correct predictions don't seem as impressive). There are a number of seemingly impressive predictions and seemingly embarrassing ones.

(How do we determine what level of accuracy would be "fine" vs. "bad?" Unfortunately there's no clear quantitative benchmark - I think we just have to look at the predictions ourselves, how hard they seemed / how similar to today's predictions about AI, and make a judgment call. I could easily imagine others having a different interpretation than mine, which is why I give examples and link to the full prediction sets. I talk about this a bit more below.)

They weren't infallible oracles, but they weren't blindly casting about either. (Well, maybe Heinlein was.) Collectively, I think you could call them "mediocre," but you can't call them "hopeless" or "clueless" or "a warning sign to all who dare predict the long-run future." Overall, I think they did about as well as you might naively³ guess a reasonably thoughtful person would do at some random thing they tried to do?

Below, I'll:

Summarize the track records of Asimov, Clarke and Heinlein, while linking to Arb's full report.
Comment on why I think key predictions about transformative AI are probably better bets than the Asimov/Clarke/Heinlein predictions - although ultimately, if they're merely "equally good bets," I think that's enough to support my case that we should be paying a lot more attention to the "most important century" hypothesis.
Summarize other existing research on the track record of futurists, which I think is broadly consistent with this take (though mostly ambiguous).

For this investigation, Arb very quickly (in about 8 weeks) dug through many old sources, used pattern-matching and manual effort to find predictions, and worked with contractors to score the hundreds of predictions they found. Big thanks to them! Their full report is here. Note this bit: "If you spot something off, we’ll pay $5 per cell we update as a result. We’ll add all criticisms – where we agree and update or reject it – to this document for transparency."

The track records of the "Big Three"

Quick summary of how Arb created the data set

Arb collected "digital copies of as much of their [Asimov's, Clarke's, Heinlein's] nonfiction as possible (books, essays, interviews). The resulting intake is 475 files covering ~33% of their nonfiction corpuses."

Arb then used pattern-matching and manual inspection to pull out all of the predictions it could find, and scored these predictions by:

How many years away the prediction appeared to be. (Most did not have clear dates attached; in these cases Arb generally filled the average time horizon for predictions from the same author that did have clear dates attached.)
Whether the prediction now appears correct, incorrect, or ambiguous. (I didn't always agree with these scorings, but I generally have felt that "correct" predictions at least look "impressive and not silly" while "incorrect" predictions at least look "dicey.")
Whether the prediction was a pure prediction about what technology could do (most relevant), a prediction about the interaction of technology and the economy (medium), or a prediction about the interaction of technology and culture (least relevant). Predictions with no bearing on technology were dropped.
How "difficult" the prediction was (that is, how much the scorers guessed it diverged from conventional wisdom or "the obvious" at the time - details in footnote⁴).

Importantly, fiction was never used as a source of predictions, so this exercise is explicitly scoring people on what they were not famous for. This is more like an assessment of "whether people who like thinking about the future make good predictions" than an assessment of "whether professional or specialized forecasters make good predictions."

For reasons I touch on in an appendix below, I didn't ask Arb to try to identify how confident the Big Three were about their predictions. I'm more interested in whether their predictions were nonobvious and sometimes correct than in whether they were self-aware about their own uncertainty; I see these as different issues, and I suspect that past norms discouraged the latter more than today's norms do (at least within communities interested in Bayesian mindset and the science of forecasting).

More detail in Arb's report.

The numbers

The tables below summarize the numbers I think give the best high-level picture. See the full report and detailed files for the raw predictions and a number of other cuts; there are a lot of ways you can slice the data, but I don't think it changes the picture from what I give below.

Below, I present each predictor's track record on:

"All predictions": all resolved predictions 30 years out or more,⁵ including predictions where Arb had to fill in a time horizon.
"Tech predictions": like the above, but restricted to predictions specifically about technological capabilities (as opposed to technology/economy interactions or technology/culture interactions.
"Difficult predictions" predictions with "difficulty" of 4/5 or 5/5.
"Difficult + tech + definite date": the small set of predictions that met the strictest criteria (tech only, "hardness" 4/5 or 5/5, definite date attached).

Asimov

Category	# correct	# incorrect	# ambiguous/near-miss	Correct / (correct + incorrect)
All resolved predictions	23	29	14	44.23%
Tech predictions	11	4	8	73.33%
Difficult predictions	10	11	7	47.62%
Difficult + tech + definite date	5	1	4	83.33%

You can see the full set of predictions here, but to give a flavor, here are two "correct" and two "incorrect" predictions from the strictest category.⁶ All of these are predictions Asimov made in 1964, about the year 2014 (unless otherwise indicated).

Correct: "only unmanned ships will have landed on Mars, though a manned expedition will be in the works." Bingo, and impressive IMO.
Correct: "the screen [of a phone] can be used not only to see the people you call but also for studying documents and photographs and reading passages from books." I feel like this would've been an impressive prediction in 2004.
Incorrect: "there will be increasing emphasis on transportation that makes the least possible contact with the surface. There will be aircraft, of course, but even ground travel will increasingly take to the air a foot or two off the ground." So false that we now refer to things that don't hover as "hoverboards."
Incorrect: "transparent cubes will be making their appearance in which three-dimensional viewing will be possible. In fact, one popular exhibit at the 2014 World's Fair will be such a 3-D TV, built life-size, in which ballet performances will be seen. The cube will slowly revolve for viewing from all angles." Doesn't seem ridiculous, but doesn't seem right. Of course, a side point here is that he refers to the 2014 World's Fair, which didn't happen.

A general challenge with assessing prediction track records is that we don't know what to compare someone's track record to. Is getting about half your predictions right "good," or is it no more impressive than writing down a bunch of things that might happen and flipping a coin on each?

I think this comes down to how difficult the predictions are, which is hard to assess systematically. A nice thing about this study is that there are enough predictions to get a decent sample size, but the whole thing is contained enough that you can get a good qualitative feel for the predictions themselves. (This is why I give examples; you can also view all predictions for a given person by clicking on their name above the table.) In this case, I think Asimov tends to make nonobvious, detailed predictions, such that I consider it impressive to have gotten ~half of them to be right.

Clarke

Category	# correct	# incorrect	# ambiguous/near-miss	Correct / (correct + incorrect)
All predictions	129	148	48	46.57%
Tech predictions	85	82	29	50.90%
Difficult predictions	14	10	4	58.33%
Difficult + tech + definite date	6	5	2	54.55%

Examples (as above):⁷

Correct 1964 prediction about 2000: "[Communications satellites] will make possible a world in which we can make instant contact with each other wherever we may be. Where we can contact our friends anywhere on Earth, even if we don’t know their actual physical location. It will be possible in that age, perhaps only fifty years from now, for a [person] to conduct [their] business from Tahiti or Bali just as well as [they] could from London." (I assume that "conduct [their] business" refers to a business call rather than some sort of holistic claim that no productivity would be lost from remote work.)
Correct 1950 prediction about 2000: "Indeed, it may be assumed as fairly certain that the first reconnaissances of the planets will be by orbiting rockets which do not attempt a landing-perhaps expendable, unmanned machines with elaborate telemetering and television equipment." This doesn't seem like a super-bold prediction; a lot of his correct predictions have a general flavor of saying progress won't be too exciting, and I find these less impressive than most of Asimov's correct predictions.
Incorrect 1960 prediction about 2010: "One can imagine, perhaps before the end of this century, huge general-purpose factories using cheap power from thermonuclear reactors to extract pure water, salt, magnesium, bromine, strontium, rubidium, copper and many other metals from the sea. A notable exception from the list would be iron, which is far rarer in the oceans than under the continents."
Incorrect 1949 prediction about 1983: "Before this story is twice its present age, we will have robot explorers dotted all over Mars."

I generally found this data set less satisfying/educational than Asimov's: a lot of the predictions were pretty deep in the weeds of how rocketry might work or something, and a lot of them seemed pretty hard to interpret/score. I thought the bad predictions were pretty bad, and the good predictions were sometimes good but generally less impressive than Asimov's.

Heinlein

Category	# correct	# incorrect	# ambiguous/near-miss	Correct / (correct + incorrect)
All predictions	19	41	7	31.67%
Tech predictions	14	20	6	41.18%
Difficult predictions	1	16	1	5.88%
Difficult + tech + definite date	0	1	1	0.00%

This seems really bad, especially adjusted for difficulty: many of the "correct" ones seem either hard-to-interpret or just very obvious (e.g., no time travel). I was impressed by his prediction that "we probably will still be after a cure for the common cold" until I saw a prediction in a separate source saying "Cancer, the common cold, and tooth decay will all be conquered." Overall it seems like he did a lot of predicting outlandish stuff about space travel, and then anti-predicting things that are probably just impossible (e.g., no time travel).

He did have some decent ones, though, such as: "By 2000 A.D. we will know a great deal about how the brain functions ... whereas in 1900 what little we knew was wrong. I do not predict that the basic mystery of psychology--how mass arranged in certain complex patterns becomes aware of itself--will be solved by 2000 A.D. I hope so but do not expect it." He also predicted no human extinction and no end to war - I'd guess a lot of people disagreed with these at the time.

Overall picture

Looks like, of the "big three," we have:

One (Asimov) who looks quite impressive - plenty of misses, but a 50% hit rate on such nonobvious predictions seems pretty great.
One (Heinlein) who looks pretty unserious and inaccurate.
One (Clarke) who's a bit hard to judge but seems pretty solid overall (around half of his predictions look to be right, and they tend to be pretty nonobvious).

Today's futurism vs. these predictions

The above collect casual predictions - no probabilities given, little-to-no reasoning given, no apparent attempt to collect evidence and weigh arguments - by professional fiction writers.

Contrast this situation with my summary of the different lines of reasoning forecasting transformative AI. The latter includes:

Systematic surveys aggregating opinions from hundreds of AI researchers.
Reports that Open Philanthropy employees spent thousands of hours on, systematically presenting evidence and considering arguments and counterarguments.
A serious attempt to take advantage of the nascent literature on how to make good predictions; e.g., the authors (and I) have generally done calibration training,⁸ and have tried to use the language of probability to be specific about our uncertainty.

There's plenty of room for debate on how much these measures should be expected to improve our foresight, compared to what the "Big Three" were doing. My guess is that we should take forecasts about transformative AI a lot more seriously, partly because I think there's a big difference between putting in "extremely little effort" (basically guessing off the cuff without serious time examining arguments and counter-arguments, which is my impression of what the Big Three were mostly doing) and "putting in moderate effort" (considering expert opinion, surveying arguments and counter-arguments, explicitly thinking about one's degree of uncertainty).

But the "extremely little effort" version doesn't really look that bad.

If you look at forecasts about transformative AI and think "Maybe these are Asimov-ish predictions that have about a 50% hit rate on hard questions; maybe these are Heinlein-ish predictions that are basically crap," that still seems good enough to take the "most important century" hypothesis seriously.

Appendix: other studies of the track record of futurism

A 2013 project assessed Ray Kurzweil's 1999 predictions about 2009, and a 2020 followup assessed his 1999 predictions about 2019. Kurzweil is known for being interesting at the time rather than being right with hindsight, and a large number of predictions were found and scored, so I consider this study to have similar advantages to the above study.

The first set of predictions (about 2009, 10-year horizon) had about as many "true or weakly true" predictions as "false or weakly false" predictions.
The second (about 2019, 20-year horizon) was much worse, with 52% of predictions flatly "false," and "false or weakly false" predictions outnumbering "true or weakly true" predictions by almost 3-to-1.

Kurzweil is notorious for his very bold and contrarian predictions, and I'm overall inclined to call his track record something between "mediocre" and "fine" - too aggressive overall, but with some notable hits. (I think if the most important century hypothesis ends up true, he'll broadly look pretty prescient, just on the early side; if it doesn't, he'll broadly look quite off base. But that's TBD.)

A 2002 paper, summarized by Luke Muehlhauser here, assessed the track record of The Year 2000 by Herman Kahn and Anthony Wiener, "one of the most famous and respected products of professional futurism."

About 45% of the forecasts were judged as accurate.
Luke concludes that Kahn and Wiener were grossly overconfident, because he interprets them as making predictions with 90-95% confidence.
My takeaway is a bit different. I see a recurring theme that people often get 40-50% hit rates on interesting predictions about the future, but sometimes present these predictions with great confidence (which makes them look foolish).
I think we can separate "Past forecasters were overconfident" (which I suspect is partly due to clear expression and quantification of uncertainty being uncommon and/or discouraged in relevant contexts) from "Past forecasters weren't able to make interesting predictions that were reasonably likely to be right." The former seems true to me, but the latter doesn't.

Luke's 2019 survey on the track record of futurism identifies two other relevant papers (here and here); I haven't read these beyond the abstracts, but their overall accuracy rates were 76% and 37%, respectively. It's difficult to interpret those numbers without having a feel for how challenging the predictions were.

A 2021 EA Forum post looks at the aggregate track record of forecasters on PredictionBook and Metaculus, including specific analysis of forecasts 5+ years out, though I don't find it easy to draw conclusions about whether the performance was "good" or "bad" (or how similar the questions were to the ones I care about).

Footnotes

Disclosure: I'm co-CEO of Open Philanthropy.
↩
I also briefly Googled for their predictions to get a preliminary sense of whether they were the kinds of predictions that seemed relevant. I found a couple of articles listing a few examples of good and bad predictions, but nothing systematic. I claim I haven't done a similar exercise with anyone else and thrown it out. ↩
That is, if we didn't have a lot of memes in the background about how hard it is to predict the future. ↩
1 - was already generally known

2 - was expert consensus

3 - speculative but on trend

4 - above trend, or oddly detailed

5 - prescient, no trend to go off ↩
Very few predictions in the data set are for less than 30 years, and I just ignored them.
↩
Asimov actually only had one incorrect prediction in this category, so for the 2nd incorrect prediction I used one with difficulty "3" instead of "4." ↩
The first prediction in this list qualified for the strictest criteria when I first drafted this post, but it's now been rescored to difficulty=3/5, which I disagree with (I think it is an impressive prediction, more so than any of the remaining ones that qualify as difficulty=4/5). ↩
Also see this report on calibration for Open Philanthropy grant investigators (though this is a different set of people from the people who researched transformative AI timelines). ↩

Nonprofit Boards are Weird

Holden Karnofsky — Thu, 23 Jun 2022 14:39:52 GMT

Click lower right to download or find on Apple Podcasts, Spotify, Stitcher, etc.

Note: anything in this post that you think is me subtweeting your organization is actually about, like, at least 3 organizations. (I'm currently on 4 boards in addition to Open Philanthropy's; I've served on a bunch of other boards in the past; and more than half of my takes on boards are not based on any of this, but rather on my interactions with boards I'm not on via the many grants made by Open Philanthropy.)

Writing about ideal governance reminded me of how weird my experiences with nonprofit boards (as in "board of directors" - the set of people who formally control a nonprofit) have been.

I thought that was a pretty good intro. The rest of this piece will:

Try to articulate what's so weird about nonprofit boards, fundamentally. I think a lot of it is the combination of great power, unclear responsibility, and ~zero accountability; additionally, I haven't been able to find much in the way of clear, widely accepted statements of what makes a good board member.
Give my own thoughts on what makes a good board member: which core duties they should be trying to do really well, the importance of "staying out of the way" on other things, and some potentially helpful practices.

I am experienced with nonprofit boards but not with for-profit boards. I'm guessing that roughly half the things I say below will apply to for-profit boards, and that for-profit boards are roughly half as weird overall (so still quite weird), but I haven't put much effort into disentangling these things; I'm writing about what I've seen.

I can't really give real-life examples here (for reasons I think will be pretty clear) so this is just going to be me opining in the abstract.

Why nonprofit boards are weird

Here's how a nonprofit board works:

There are usually 3-10 people on the board (though sometimes much more). Most of them don't work for the nonprofit (they have other jobs).
They meet every few months. Nonprofit employees (especially the CEO¹) do a lot of the agenda-setting for the meeting. Employees present general updates and ask for the board's approval on various things the board needs to approve, such as the budget.
A majority vote of the directors can do anything: fire the CEO, dissolve the nonprofit, add and remove directors, etc. You can think of the board as the "owner" of the nonprofit - formally, it has final say in every decision.
In practice, though, the board rarely votes except on matters that feel fairly "rubber-stamp," and the board's presence doesn't tend to be felt day-to-day at a nonprofit. The CEO leads the decision-making. Occasionally, someone has a thought like "Wait, who does the CEO report to? Oh, the board of directors ... who's on the board again? I don't know if I've ever really spoken with any of those people."

In my experience, it's common for the whole thing to feel extremely weird. (This doesn't necessarily mean there's a better way to do it - footnote has more on what I mean by "weird."²)

Board members often know almost nothing about the organization they have complete power over.
Board meetings rarely feel like a good use of time.
When board members are energetically asking questions and making demands, it usually feels like they're causing chaos and wasting everyone's time and energy.
On the rare occasions when it seems like the board should do something (like replacing the CEO, or providing an independent check on some important decision), the board often seems checked out and it's unclear how they would even come to be aware of the situation.
Everyone constantly seems confused about what the board is and how it can and can't be useful. Employees, and others who interact with the nonprofit, have lots of exchanges like "I'm worried about X ... maybe we should ask the board what they think? ... Can we even ask them that? What is their job actually?"

(Reminder that this is not subtweeting a particular organization! More than one person - from more than one organization - read a draft and thought I was subtweeting them, because what's above describes a large number of boards.)

OK, so what's driving the weirdness?

I think there are a couple of things:

Nonprofit boards have great power, but low engagement (they don't have time to understand the organization as well as employees do); unclear responsibility (it's unclear which board member is responsible for what, and what the board as a whole is responsible for); and ~zero accountability (no one can fire board members except for the other board members!)
Nonprofit boards have unclear expectations and principles. I can't seem to find anyone with a clear, comprehensive, thought-out theory of what a board member's ... job is.

I'll take these one at a time.

Great power, low engagement, unclear responsibility, no accountability

In my experience/impression, the best way to run any organization (or project, or anything) is on an "ownership" model: for any given thing X that you want done well, you have one person who "owns" X. The "owner" of X has:

The power to make decisions to get X done well.
High engagement: they're going to have plenty of time and attention to devote to X.
The responsibility for X: everyone agrees that if X goes well, they should get the credit, and if X goes poorly, they should get the blame.
And accountability: if X goes poorly, there will be some sort of consequences for the "owner."

When these things come apart, I think you get problems. In a nutshell - when no one is responsible, nothing gets done; when someone is responsible but doesn't have power, that doesn't help much; when the person who is responsible + empowered isn't engaged (isn't paying much attention), or isn't held accountable, there's not much in the way of their doing a dreadful job.

A traditional company structure mostly does well at this. The CEO has power (they make decisions for the company), engagement (they are devoted to the company and spend tons of time on it), and responsibility+accountability (if the company does badly, everyone looks at the CEO). They manage a team of people who have power+engagement+responsibility+accountability for some aspect of the company; each of those people manage people with power+engagement+responsibility+accountability for some smaller piece; etc.

What about the board?

They have power to fire the CEO (or do anything else).
They tend to have low engagement. They have other jobs, and only spend a few hours a year on their board roles. They tend to know little about what's going on at the organization.
They have unclear responsibility.
- The board as a whole is responsible for the organization, but what is each individual board member responsible for? In my experience, this is often very unclear, and there are a lot of crucial moments where "bystander effects" seem strong.
- So far, these points apply to both nonprofit and for-profit boards. But at least at a for-profit company, board members know what they're collectively responsible for: maximizing financial value of the company. At a nonprofit, it's often unclear what success even means, beyond the nonprofit's often-vague mission statement, so board members are generally unclear (and don't necessarily agree) on what they're supposed to be ensuring.³
At a for-profit company, the board seems to have reasonable accountability: the shareholders, who ultimately own the company and gain or lose money depending on how it does, can replace the board if they aren't happy. At a nonprofit, the board members have zero accountability: the only way to fire a board member is by majority vote of the board!

So we have people who are spending very little time on the company, know very little about it, don't have much clarity on what they're responsible for either individually or collectively, and aren't accountable to anyone ... and those are the people with all of the power. Sound dysfunctional?⁴

In practice, I think it's often worse than it sounds, because board members aren't even chosen carefully - a lot of the time, a nonprofit just goes with an assortment of random famous people, big donors, etc.

What makes a good board member? Few people even have a hypothesis

I've searched a fair amount for books, papers, etc. that give convincing and/or widely-accepted answers to questions like:

When the CEO asks the board to approve something, how should they engage? When should they take a deferring attitude ("Sure, as long as I don't see any particular reason to say no"), a sanity check attitude ("I'll ask a few questions to make sure this is making sense, then approve if nothing jumps out at me"), a full ownership attitude ("I need to personally be convinced this is the best thing for the organization"), etc.?
How much should each board member invest in educating themselves about the organization? What's the best way to do that?
How does the board know whether the CEO is doing a good job? What kind of situation should trigger seriously considering looking for a new one?
How does a board member know whether the board is doing a good job? How should they decide when another board member should be replaced?

In my experience, most board members just aren't walking around with any particular thought-through take on questions like this. And as far as I can tell, there's a shortage of good⁵ guidance on questions like this for both for-profit and nonprofit boards. For example:

I've found no standard reference on topics like this, and very few resources that even seem aimed at directly and clearly answering such questions.
- The best book on this topic I've seen is Boards that Lead by Ram Charan, focused on for-profit boards (but pretty good IMO).
- But this isn't, like, a book everyone knows to read; I found it by asking lots of people for suggestions, coming up empty, Googling wildly around and skimming like 10 books that said they were about boards, and deciding that this one seemed pretty good.
One of the things I do as a board member is interview other prospective board members about their answers to questions like this. In my experience, they answer most of the above questions with something like "Huh, I don't really know. What do you think?"
Most boards I've seen seem to - by default - either:
- Get way too involved in lots of decisions to the point where it feels like they're micromanaging the CEO and/or just obsessively engaging on whatever topics the CEO happens to bring to their attention; or
- Take a "We're just here to help" attitude and rubber-stamp whatever the CEO suggests, including things I'll argue below should be core duties for the board (e.g., adding and removing board members).
I'm not sure I've ever seen a board with a formal, recurring process for reviewing each board member's performance. :/

To the extent I have seen a relatively common, coherent vision of "what board members are supposed to be doing," it's pretty well summarized in Reid Hoffman's interview in The High-Growth Handbook:

I use ... a red light, yellow light, green light framework between the board and the CEO. Roughly, green light is, “You’re the CEO. Make the call. We’re advisory.” Now, we may say that on very big things—selling the company—we should talk about it before you do it. And that may shift us from green light, if we don’t like the conversation. But a classic young, idiot board member will say, “Well, I’m giving you my expertise and advice. You should do X, Y, Z.” But the right framework for board members is: You’re the CEO. You make the call. We’re advisory.

Red lights also very easy. Once you get to red light, the CEO—who, by the way, may still be in place—won’t be the CEO in the future. The board knows they need a new CEO. It may be with the CEO’s knowledge, or without it. Obviously, it’s better if it’s collaborative ...

Yellow means, “I have a question about the CEO. Should we be at green light or not?” And what happens, again under inexperienced or bad board members, is they check a CEO into yellow indefinitely. They go, “Well, I’m not sure…” The important thing with yellow light is that you 1) coherently agree on it as a board and 2) coherently agree on what the exit conditions are. What is the limited amount of time that we’re going to be in yellow while we consider whether we move back to green or move to red? And how do we do that, so that we do not operate for a long time on yellow? Because with yellow light, you’re essentially hamstringing the CEO and hamstringing the company. It’s your obligation as a board to figure that out.

I like this quite a bit (hence the long blockquote), but I don't think it covers everything. The board is mostly there to oversee the CEO, and they should mostly be advisory when they're happy with the CEO. But I think there are things they ought to be actively thinking about and engaging in even during "green light."

So what DOES make a good board member?

Here is my current take, based on a combination of (a) my thoughts after serving on and interacting with a large number of nonprofit boards; (b) my attempts to adapt conventional wisdom about for-profit boards (especially from the book I mentioned above); (c) divine revelation.

I'll go through:

What I see as the main duties of the board specifically - things the board has to do well, and can't leave to the CEO and other staff.
My basic take that the ideal board should do these main duties well, while staying out of the way otherwise.
The main qualities I think the ideal board member should have - and some common ways of choosing board members that seem bad to me.
A few more random thoughts on board practices that seem especially important and/or promising.

(I don't claim any of these points are original, and almost everything can be found in some writing on boards somewhere, but I don't know of a reasonably comprehensive, concise place to get something similar to the below.)

The board's main duties

I agree with the basic spirit of Hoffman's philosophy above: the board should not be trying to "run the company" (they're too low-engagement and don't know enough about it), and should instead be focused on a small number of big-picture questions like "How is the CEO doing?"

And I do think the board's #1 and most fundamental job is evaluating the CEO's performance. The board is the only reliable source of accountability for the CEO - even more so at a nonprofit than a for-profit, since bad CEO performance won't necessarily show up via financial problems or unhappy shareholders.⁶ (As noted below, I think many nonprofit boards have no formal process for reviewing the CEO's performance, and the ones that do often have a lightweight/underwhelming one.)

But I think the board also needs to take a leading role - and not trust the judgment of the CEO and other staff - when it comes to:

Overseeing decisions that could importantly reduce the board's powers. The CEO might want to enter into an agreement with a third party that is binding on the nonprofit and therefore on the board (for example, "The nonprofit will now need permission from the third party in order to do X"); or transfer major activities and assets to affiliated organizations that the board doesn't control (for example, when Open Philanthropy split off from GiveWell); or revise the organization's mission statement, bylaws,⁷ etc.; or other things that significantly reduce the scope of what the board has control over. The board needs to represent its own interests in these cases, rather than deferring to the CEO (whose interests may be different).
Overseeing big-picture irreversible risks and decisions that could importantly affect future CEOs. For example, I think the board needs to be anticipating any major source of risk that a nonprofit collapses (financially or otherwise) - if this happens, the board can't simply replace the CEO and move on, because the collapse affects what a future CEO is able to do. (What risks and decisions are big enough? Some thoughts in a footnote.⁸)
All matters relating to the composition and performance of the board itself. Adding new board members, removing board members, and reviewing the board's own performance are things that the board needs to be responsible for, not the CEO. If the CEO is controlling the composition of the board, this is at odds with the board's role in overseeing the CEO.

Engaging on main duties, staying out of the way otherwise

I think the ideal board member's behavior is roughly along the lines of the following:

Actively, intensively engage in the main duties from the previous section. Board members should be knowledgeable about, and not defer to the CEO on, (a) how the CEO is performing; (b) how the board is performing, and who should be added and removed; (c) spotting (and scanning the horizon for) events that could reduce the board's powers, or lead to big enough problems and restrictions so as to irreversibly affect what future CEOs are able to do.

Ideally they should be focusing their questions in board meetings on these things, as well as having some way of gathering information about them that doesn't just rely on hearing directly from the CEO. (Some ideas for this are below.) When reviewing financial statements and budgets, they should be focused mostly on the risk of major irreversible problems (such as going bankrupt or failing to be compliant); when hearing about activities, they should be focused mostly on what they reflect about the CEO's performance; etc.

Be advisory ("stay out of the way") otherwise. Meetings might contain all sorts of updates and requests for reactions. I think a good template for a board member, when sharing an opinion or reaction, is either to (a) explain as they're talking why this topic is important for the board's main duties; or (b) say (or imply) something like "I'm curious / offering an opinion about ___, but if this isn't helpful, please ignore it, and please don't hesitate to move the meeting to the next topic as soon as this stops feeling productive."

The combination of intense engagement on core duties and "staying out of the way" otherwise can make this a very weird role. An organization will often go years without any serious questions about the CEO's performance or other matters involving core duties. So a board member ought to be ready to quietly nod along and stay out of the way for very long stretches of time, while being ready to get seriously involved and engaged when this makes sense.

Aim for division of labor. I think a major problem with nonprofit boards is that, by default, it's really unclear which board member is responsible for what. I think it's a good idea for board members to explicitly settle this via assigning:

Specialists ("Board member X is reviewing the financials; the rest of us are mostly checked-out and/or sanity-checking on that");
Subcommittees ("Board members X and Y will look into this particular aspect of the CEO's performance");
A Board Chair or Lead Independent Director⁹ who is the default person to take responsibility for making sure the board is doing its job well (this could include suggesting and assigning responsibility for some of the ideas I list below; helping to set the agenda for board meetings so it isn't just up to the CEO; etc.)

This can further help everyone find a balance between engaging and staying out of the way.

Who should be on the board?

One answer is that it should be whoever can do well at the duties outlined above - both in terms of substance (can they accurately evaluate the CEO's performance, identify big-picture irreversible risks, etc.?) and in terms of style (do they actively engage on their main duties and stay out of the way otherwise?)

But to make things a bit more boiled-down and concrete, I think perhaps the most important test for a board member is: they'll get the CEO replaced if this would be good for the nonprofit's mission, and they won't if it wouldn't be.

This is the most essential function of the board, and it implies a bunch of things about who makes a good board member:

They need to do a great job understanding and representing the nonprofit's mission, and care deeply about that mission - to the point of being ready to create conflict over it if needed (and only if needed).
- A key challenge of nonprofits is that they have no clear goal, only a mission statement that is open to interpretation. And if two different board members interpret the mission differently - or are focused on different aspects of it - this could intensely color how they evaluate the CEO, which could be a huge deal for the nonprofit.
- For example, if a nonprofit's mission is "Help animals everywhere," does this mean "Help as many animals as possible" (which might indicate a move toward focusing on farm animals) or "Help animals in the same way the nonprofit traditionally has" or something else? How does it imply the nonprofit should make tradeoffs between helping e.g. dogs, cats, elephants, chickens, fish or even insects? How a board member answers questions like this seems central to how their presence on the board is going to affect the nonprofit.
They need to have a personality and position capable of challenging the CEO (though also capable of staying out of the way).
- A common problem I see is that some board member is (a) not very engaged with the nonprofit itself, but (b) highly values their personal relationship with the CEO and other board members. This seems like a bad combination, but unfortunately a common one. Board members need to be willing and able to create conflict in order to do the right thing for the nonprofit.
- Limiting the number of board members who are employees (reporting to the CEO) seems important for this reason.
- If you can't picture a board member "making waves," they probably shouldn't be on the board - that attitude will seem fine more than 90% of the time, but it won't work well in the rare cases where the board really matters.
- On the other hand, if someone is only comfortable "making waves" and feels useless and out of sorts when they're just nodding along, that person shouldn't be on the board either. As noted above, board members need to be ready for a weird job that involves stepping up when the situation requires it, but staying out of the way when it doesn't.
They should probably have a well-developed take on what their job is as a board member. Board members who can't say much about where they expect to be highly engaged, vs. casually advisory - and how they expect to invest in getting the knowledge they need to do a good job leading on particular issues - don't seem like great bets to step up when they most need to (or stay out of the way when they should).

In my experience, most nonprofits are not looking for these qualities in board members. They are, instead, often looking for things like:

Celebrity and reputation - board members who are generally impressive and well-regarded and make the nonprofit look good. Unfortunately, I think such people often just don't have much time or interest for their job. Many are also uninterested in causing any conflict, which makes them basically useless as board members IMO.
Fundraising - a lot of nonprofits pretty much explicitly just try to put people on the board who will help raise money for them. This seems bad for governance.
Narrow expertise on some topic that is important for the nonprofit. I don't really think this is what nonprofits should be seeking from board members,¹⁰ except to the extent it ties deeply into the board members' core duties, e.g., where it's important to have an independent view on technical topic X in order to do a good job evaluating the CEO.

I think a good profile for a board member is someone who cares greatly about the nonprofit's mission, and wants it to succeed, to the point where they're ready to have tough conversations if they see the CEO falling short. Examples of such people might be major funders, or major stakeholders (e.g., a community leader from a community of people the nonprofit is trying to help).

A few practices that seem good

I'll anticlimactically close with a few practices that seem helpful to me. These are mostly pretty generic practices, useful for both for-profit and nonprofit boards, that I have seen working in practice but also seen too many boards going without. They don't fully address the weirdnesses discussed above (especially the stuff specific to nonprofit as opposed to for-profit boards), but they seem to make things some amount better.

Keeping it simple for low-stakes organizations. If a nonprofit is a year old and has 3 employees, it probably shouldn't be investing a ton of its energy in having a great board (especially since this is hard).

A key question is: "If the board just stays checked out and doesn't hold the CEO accountable, what's the worst thing that can happen?" If the answer is something like "The nonprofit's relatively modest budget is badly spent," then it might not be worth a huge investment in building a great board (and in taking some of the measures listed below). Early-stage nonprofits often have a board consisting of 2-3 people the founder trusts a lot (ideally in a "you'd fire me if it were the right thing to do" sense rather than in a "you've always got my back" sense), which seems fine. The rest of these ideas are for when the stakes are higher.

Formal board-staff communication channels. A very common problem I see is that:

Board members know almost nothing about the organization, and so are hesitant to engage in much of anything.
Employees of the organization know far more, but find the board members mysterious/unapproachable/scary, and don't share much information with them.

I've seen this dynamic improved some amount by things like a staff liaison: a board member who is designated with the duty, "Talk to employees a lot, offer them confidentiality as requested, try to build trust, and gather information about how things are going." Things like regular "office hours" and showing up to company events can help with this.

Viewing board seats as limited. It seems unlikely that a board should have more than 10 members (and even 10 seems like a lot), since it's hard to have a productive meeting past that point.¹¹ When considering a new addition to the board, I think the board should be asking something much closer to "Is this one of the 10 best people in the world to sit on this board?" than to "Is this person fine?"

Regular CEO reviews. Many nonprofits don't seem to have any formal, regular process for reviewing the CEO's performance; I think it's important to do this.

The most common format I've seen is something like: one board member interviews the CEO's direct reports, and perhaps some other people throughout the company, and integrates this with information about the organization's overall progress and accomplishments (often presented by the organization itself, but they might ask questions about it) to provide a report on what the CEO is doing well and could do better. I think this approach has a lot of limitations - staff are often hesitant to be forthcoming with a board member (even when promised anonymity), and the board member often lacks a lot of key information - but even with those issues, it tends to be a useful exercise.

Closed sessions. I think it's important for the board to have "closed sessions" where board members can talk frankly without the CEO, other employees, etc. hearing. I think a common mistake is to ask "Does anyone want the closed session today or can we skip it?" - this puts the onus on board members to say "Yes, I would like a closed session," which then implies they have something negative to say. I think it's better for whoever's running the meetings to identify logical closed sessions (e.g., "The board minus employees"), allocate time for them and force them to happen.

Regular board reviews. It seems like it would be a good idea for board members to regularly assess each other's performance, and the performance of the board as a whole. But I've actually seen very little of this done in practice and I can't point to versions of it that seem to have some track record of working well. It does seem like a good idea though!

Conclusion

The board is the only body at a nonprofit that can hold the CEO accountable to accomplishing the mission. I broadly feel like most nonprofit boards just aren't very well-suited to this duty, or necessarily to much of anything. It's an inherently weird structure that seems difficult to make work.

I wish someone would do a great job studying and laying out how nonprofit boards should be assembled, how they should do their job and how they can be held accountable. You can think of this post as my quick, informal shot at that.

Footnotes

I'm using the term "CEO" throughout, although the chief executive at a non profit sometimes has another title, such as "Executive Director." ↩
A lot of this piece is about how the fundamental setup of a nonprofit board leads to the kinds of problems and dynamics I'm describing. This doesn't mean we should necessarily think there's any way to fix it or any better alternative. It just means that this setup seems to bring a lot of friction points and challenges that most relationships between supervisor-and-supervised don't seem to have, which can make the experience of interacting with a board feel vaguely unlike what we're used to in other contexts, or "weird."

People who have interacted with tons of boards might get so used to these dynamics that they no longer feel weird. I haven't reached that point yet myself though.
↩
The fact that the nonprofit's goals aren't clearly defined and have no clear metric (and often aren't susceptible to measurement at all) is a pretty general challenge of nonprofits, but I think it especially shows up for a structure (the board) that is already weird in the various other ways I'm describing. ↩
Superficially, you could make most of the same complaints about shareholders of a for-profit company. But:
- Shareholders are the people who ultimately make or lose money if the company does well or poorly (you can think of this as a form of accountability). By contrast, nonprofit board members often have very little (or only an idiosyncratic) personal connection to and investment in the organization.
- Shareholders compensate for their low engagement by picking representatives (a board) whom they can hold accountable for the company's performance. Nonprofit board members are the representatives, and aren't accountable to anyone. ↩
Especially "good and concise." Most of the points I make here can be found in some writings on boards somewhere, but it's hard to find sensible-seeming and comprehensive discussions of what the board should be doing and who should be on it. ↩
Part of the CEO's job is fundraising, and if they do a bad job of this, it's going to be obvious. But that's only part of the job. At a nonprofit, a CEO could easily be bringing in plenty of money and just doing a horrible job at the mission - and if the board isn't able to learn this and act on it, it seems like very bad news. ↩
The charter and bylaws are like the "constitution" of a nonprofit, laying out how its governance works. ↩
This is a judgment call, and one way to approach it would be to reserve something like 1 hour of full-board meeting time per year for talking about these sorts of things (and pouring in more time if at least, like, 1/3 of the board thinks something is a big deal).

Some examples of things I think are and aren't usually a big enough deal to start paying serious attention to:
- Big enough deal: financial decisions that increase the odds of going "belly-up" (running out of money and having to fold) by at least 10 percentage points. Not a big enough deal: spending money in ways that are arguably bad uses of money, having a lowish-but-not-too-far-off-of-peer-organizations amount of runway.
- Big enough deal: deficiencies in financial controls that an auditor is highlighting, or a lack of audit altogether, until a plan is agreed to to address these things. Not a big enough deal: most other stuff in this category.
- Big enough deal: organizations with substantial "PR risk" exposure should have a good team for assessing this and a "crisis plan" in case something happens. Not a big enough deal: specific organizational decisions and practices that you are not personally offended by or find unethical, but could imagine a negative article about. (If you do find them substantively unethical, I think that's a big enough deal.)
- Big enough deal: transferring like 1/3 or more of valuable things the nonprofit has (intellectual property, money, etc.) to another entity not controlled by the board. Not a big enough deal: starting an affiliate organization primarily for taking donations in another country or something.
- Big enough deal: doubling or halving the workforce. Not a big enough deal: smaller hirings and firings.
↩
Sometimes the Board Chair is the CEO, and sometimes the Chair is an employee of the company who also sits on the board. In these cases, I think it's good for there to be a separate Lead Independent Director who is not employed by the company and is therefore exclusively representing the Board. They can help set agendas, lead meetings, and take responsibility by default when it's otherwise unclear who would do so. ↩
Nonprofits can get expertise on topic X by hiring experts on X to advise them. The question is: when is it important to have an expert on X evaluating the CEO? ↩
Though it could be fine and even interesting to have giant boards - 20 people, 50 or more - that have some sort of "executive committee" of 10 or fewer people doing basically all of the meetings and all of the work (with the rest functioning just as very passive, occasionally-voting equivalents of "shareholders"). Just assume I'm talking about the "executive committee" type thing here. ↩

Cold Takes

Good job opportunities for helping with the most important century

How these help

What skills are needed

So

Appendix: how I decided which jobs to list

Footnotes

What does Bing Chat tell us about AI risk?

How major governments can help with the most important century

What AI companies can do today to help with the most important century

Some basics: alignment research, strong security, safety standards

Jobs that can help with the most important century

Recapping the major risks, and some things that could help

Spreading messages to help with the most important century

Challenges of AI-related messages

How we could stumble into AI catastrophe

Backdrop

Transformative AI issues (not just misalignment): an overview

Racing through a minefield: the AI deployment problem

The basic premises of “racing through a minefield”

What success looks like

Alignment (charting a safe path through the minefield2)

Threat assessment (alerting others about the mines)

Avoiding races (to move more cautiously through the minefield)

Selective information sharing - including security (so the incautious don’t catch up)

Global monitoring (noticing people about to step on mines, and stopping them)

Defensive deployment (staying ahead in the race)

So?

Footnotes

High-level hopes for AI alignment

The challenge

AI Safety Seems Hard to Measure

Recap of the basic challenge

Why Would AI "Aim" To Defeat Humanity?

Starting assumptions

Beta Readers are Great

The Track Record of Futurists Seems ... Fine

The track records of the "Big Three"

Quick summary of how Arb created the data set

The numbers

Overall picture

Today's futurism vs. these predictions

Appendix: other studies of the track record of futurism

Footnotes

Nonprofit Boards are Weird

Why nonprofit boards are weird

Great power, low engagement, unclear responsibility, no accountability

What makes a good board member? Few people even have a hypothesis

So what DOES make a good board member?

The board's main duties

Engaging on main duties, staying out of the way otherwise

Who should be on the board?

A few practices that seem good

Conclusion

Footnotes

Alignment (charting a safe path through the minefield²)