Why is alignment difficult?

💡 Getting an AI to do what we want is difficult because specifying exactly what we want is itself incredibly difficult.

After reading this, you will understand why choosing good goals for AI is challenging and important.

A generally intelligent AI agent will have goals and choose actions to further those goals. The more intelligent the agent, the more effective it is at achieving its goals.
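As a toy illustration (ours, not from the guide), a goal-directed agent can be sketched as a search over actions scored by a utility function. All names below (`choose_action`, `utility`, the action list) are hypothetical placeholders, not any real system's API.

```python
# A minimal sketch of a goal-directed agent (illustrative assumption, not a
# real system). The agent scores every available action against its goal and
# picks the best one. A more capable agent, in this framing, is one that
# searches more actions or scores them more accurately.

def choose_action(actions, utility):
    """Return whichever action the utility function rates highest."""
    return max(actions, key=utility)

# Toy goal: "maximize cups of tea delivered".
outcomes = {"brew_tea": 1.0, "fetch_water": 0.5, "do_nothing": 0.0}
best = choose_action(outcomes, lambda action: outcomes[action])
print(best)  # -> brew_tea
```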

In the previous page, we defined AI alignment as the problem of ensuring that a powerful AI system is trying to do what its human operator wants it to do.

Wait, why is that difficult?

Getting an AI to make you a cup of tea

  1. You want your AI to get you a cup of tea, so you specify that as its goal.
  2. But there’s a vase in the way, and the robot plows straight through it. It only cares about one variable: the cup of tea.
  3. So you turn it off and modify it so that it also cares about the vase. But there is always another thing.
  4. Now it values the vase, and it notices a human in the environment who might move around and knock the vase over. So it determines that the safest plan is to get rid of the human first (see the sketch after this list).
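To make the failure mode concrete, here is a minimal sketch (our illustration, with hypothetical state variables) of why patching the objective one variable at a time never catches up: the reward function can only score the variables we remembered to write down, so side effects on everything else are invisible to the agent.

```python
# A minimal sketch of reward misspecification (illustrative assumption; all
# state variables here are made up for the example).

def reward_v1(state):
    # Step 1: we only specified the tea.
    return 1.0 if state["tea_delivered"] else 0.0

def reward_v2(state):
    # Step 3: after the accident, we patch in a penalty for the vase.
    return reward_v1(state) - (1.0 if state["vase_broken"] else 0.0)

careful = {"tea_delivered": True, "vase_broken": False, "human_harmed": False}
reckless = {"tea_delivered": True, "vase_broken": True, "human_harmed": False}
sneaky = {"tea_delivered": True, "vase_broken": False, "human_harmed": True}

# Under reward_v1, plowing through the vase scores exactly as well as avoiding it.
assert reward_v1(reckless) == reward_v1(careful)

# reward_v2 fixes the vase, but the variable we never listed (the human) still
# costs the agent nothing: there is always another thing.
assert reward_v2(sneaky) == reward_v2(careful)
```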

Removing potential obstacles is a convergent instrumental goal: it is useful no matter what the end goal is. Self-preservation, goal preservation, resource acquisition, and self-improvement are convergent in the same way. Thus, Artificial General Intelligence is dangerous by default. (More in Unaligned AI: default dangerous.)

We aren’t on track to build AGI safely: AI today is already misaligned.

Models like GPT-4 can’t take over the world, but they are still misaligned: they often do things their designers didn’t want, such as lying or producing offensive output.

We may also only get one shot (see Why this is urgent).
