Research directions
We expand upon the specific technical research directions that prominent organizations are currently pursuing to tackle alignment.
The research problem
How can we steer and control AI systems that are much smarter than us?
While reinforcement learning from human feedback (RLHF) has been largely successful in aligning today’s models, it won’t reliably scale to AI systems much smarter than us.
We will need new methods and scientific breakthroughs to ensure that superhuman AI systems reliably follow their operators' intent.
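To ground what the current baseline looks like, here is a minimal sketch of the reward-modeling step at the core of RLHF: a reward model is trained on human preference comparisons with a Bradley-Terry style loss, and its scores are later used to fine-tune the policy. The model, feature dimensions, and random data below are illustrative placeholders, not any lab's actual implementation.

```python
# Toy reward-model step from RLHF (illustrative only).
# A scalar reward head is trained so that human-preferred responses
# score higher than rejected ones via the Bradley-Terry preference loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # maps a response representation to a scalar reward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(rm: RewardModel, chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r(chosen) - r(rejected)): push preferred responses above rejected ones.
    return -F.logsigmoid(rm(chosen) - rm(rejected)).mean()

# Toy usage: random vectors stand in for frozen language-model features
# of (prompt, response) pairs compared by human raters.
rm = RewardModel(dim=16)
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
chosen, rejected = torch.randn(32, 16), torch.randn(32, 16)
opt.zero_grad()
loss = preference_loss(rm, chosen, rejected)
loss.backward()
opt.step()
```

The scaling worry is that every training signal here bottoms out in human judgments of which response is better, and that signal stops being reliable once the systems being evaluated are much smarter than the evaluators.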
Directions
Comments
For its superalignment grants, OpenAI has published a Research directions page with much more information, covering weak-to-strong generalization (W2SG), interpretability (both mechanistic and top-down), scalable oversight, and other directions.
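To illustrate one of those directions concretely, below is a minimal weak-to-strong generalization sketch on a toy classification task: a small "weak supervisor" is trained on ground truth, a larger "strong student" is trained only on the supervisor's labels, and both are then compared against held-out ground truth. The dataset and model choices are illustrative assumptions, not OpenAI's experimental setup, which uses weak and strong language models.

```python
# Toy weak-to-strong generalization (W2SG) setup (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=40, n_informative=20, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=0.25, random_state=0)
X_student, X_test, _, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 1. Train the weak supervisor on a small slice of ground-truth labels.
weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)

# 2. The strong student never sees ground truth, only the weak model's (imperfect) labels.
weak_labels = weak.predict(X_student)
strong = GradientBoostingClassifier(random_state=0).fit(X_student, weak_labels)

# 3. Evaluate both on held-out ground truth: how well does the student do
#    despite being trained only on noisy weak supervision?
print("weak supervisor accuracy:", weak.score(X_test, y_test))
print("strong student accuracy: ", strong.score(X_test, y_test))
```

The question OpenAI's weak-to-strong work studies is the analogue at scale: whether a strong model supervised only by a weaker one can recover much of the performance it would have had with perfect supervision, as a proxy for humans supervising superhuman systems.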
New researchers have the opportunity to make enormous contributions: the alignment field is still young, with many tractable research problems and new opportunities.
“There has never been a better time to start working on alignment.” - OpenAI