Why is alignment difficult?

💡 Getting an AI to do what we want is difficult because specifying exactly what we want is itself incredibly difficult.

After reading this, you will understand why choosing good goals for AI is challenging and important.

A generally intelligent AI agent will have goals and choose actions to further those goals. The more intelligent the agent, the more effective it is at achieving its goals.
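As a toy illustration (ours, not from the guide), a goal-directed agent can be sketched as a search over actions scored by a utility function. All names below (`choose_action`, `utility`, the action list) are hypothetical placeholders, not any real system's API.

```python
# A minimal sketch of a goal-directed agent (illustrative assumption, not a
# real system). The agent scores every available action against its goal and
# picks the best one. A more capable agent, in this framing, is one that
# searches more actions or scores them more accurately.

def choose_action(actions, utility):
    """Return whichever action the utility function rates highest."""
    return max(actions, key=utility)

# Toy goal: "maximize cups of tea delivered".
outcomes = {"brew_tea": 1.0, "fetch_water": 0.5, "do_nothing": 0.0}
best = choose_action(outcomes, lambda action: outcomes[action])
print(best)  # -> brew_tea
```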

In the previous page, we defined AI alignment as the problem of ensuring that a powerful AI system is trying to do what its human operator wants it to do.

Wait, why is that difficult?

Getting an AI to make you a cup of tea

  1. You want your AI to get you a cup of tea, so you specify that as its goal.
  2. But there’s a vase in the way, and the robot plows straight through it. It only cares about one variable: the cup of tea.
  3. So you turn it off and modify it so that it also cares about the vase. But there is always another thing.
  4. Now it values the vase, and it notices a human in the environment who might move around and knock the vase over. So it determines that the safest plan is to get rid of the human first (see the sketch after this list).
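To make the failure mode concrete, here is a minimal sketch (our illustration, with hypothetical state variables) of why patching the objective one variable at a time never catches up: the reward function can only score the variables we remembered to write down, so side effects on everything else are invisible to the agent.

```python
# A minimal sketch of reward misspecification (illustrative assumption; all
# state variables here are made up for the example).

def reward_v1(state):
    # Step 1: we only specified the tea.
    return 1.0 if state["tea_delivered"] else 0.0

def reward_v2(state):
    # Step 3: after the accident, we patch in a penalty for the vase.
    return reward_v1(state) - (1.0 if state["vase_broken"] else 0.0)

careful = {"tea_delivered": True, "vase_broken": False, "human_harmed": False}
reckless = {"tea_delivered": True, "vase_broken": True, "human_harmed": False}
sneaky = {"tea_delivered": True, "vase_broken": False, "human_harmed": True}

# Under reward_v1, plowing through the vase scores exactly as well as avoiding it.
assert reward_v1(reckless) == reward_v1(careful)

# reward_v2 fixes the vase, but the variable we never listed (the human) still
# costs the agent nothing: there is always another thing.
assert reward_v2(sneaky) == reward_v2(careful)
```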

Removing potential obstacles is a convergent instrumental goal: it is useful no matter what the end goal is. Self-preservation, goal preservation, resource acquisition, and self-improvement are convergent in the same way. Thus, Artificial General Intelligence is dangerous by default. (More in Unaligned AI: default dangerous.)

We aren’t on track to build AGI safely: AI today is already misaligned.

Models like GPT-4 can’t take over the world, but they are still misaligned: they often do things their designers didn’t want, such as lying or producing offensive output.

We may also only get one shot (see Why this is urgent).
