Alignment

An algorithm is said to be aligned if its objective function matches what we would really want to optimize.
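
As a toy illustration (not from the original article), the sketch below contrasts an algorithm that optimizes a measurable proxy objective with one that optimizes what we actually care about. The video titles, the watch-time signal, and the reflective-value scores are all made-up assumptions for the example.

```python
# Minimal sketch: a proxy objective can diverge from the objective we really want.
from dataclasses import dataclass

@dataclass
class Video:
    title: str
    expected_watch_time: float  # proxy signal the algorithm can measure
    reflective_value: float     # what the viewer would endorse on reflection

CANDIDATES = [
    Video("clickbait outrage", expected_watch_time=12.0, reflective_value=1.0),
    Video("in-depth documentary", expected_watch_time=8.0, reflective_value=9.0),
]

def recommend(videos, objective):
    """Pick the video that maximizes the given objective function."""
    return max(videos, key=objective)

# A misaligned recommender optimizes the measurable proxy...
misaligned_pick = recommend(CANDIDATES, lambda v: v.expected_watch_time)
# ...while an aligned one would optimize what we actually want.
aligned_pick = recommend(CANDIDATES, lambda v: v.reflective_value)

print("Proxy objective picks:   ", misaligned_pick.title)
print("Intended objective picks:", aligned_pick.title)
```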

The core of AI Safety?

Alignment has been argued to be the central solution to AI Safety, especially as algorithms become increasingly difficult to monitor [CITATION NEEDED].

Though the claim is somewhat controversial, alignment has also been argued to be already critical for recommendation algorithms [ElMhamdi-Hoang-19] [StrayAM-20] [Hoang-20].

Volitions and social choice

It has been argued [CITATION NEEDED] that, instead of aligning algorithms with our immediate preferences, we should aim to align them with our volitions. Roughly and intuitively, a volition would correspond to what we would want to want if we thought longer and better.

Another difficulty is the likely case where even our volitions disagree with one another about what an algorithm ought to do. This calls for social choice solutions.
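
As one hedged illustration of what a social choice solution could look like, the sketch below aggregates disagreeing per-person scores with a median rule. The participants, their scores, and the choice of the median are assumptions for the example only, not a description of Tournesol's actual mechanism.

```python
# Minimal sketch: combining disagreeing judgments with a simple social choice rule.
from statistics import median

# Each person's "volition" is summarized here as a score per candidate action.
judgments = {
    "alice":   {"recommend_A": 9.0, "recommend_B": 2.0},
    "bob":     {"recommend_A": 1.0, "recommend_B": 8.0},
    "charlie": {"recommend_A": 6.0, "recommend_B": 5.0},
}

def aggregate(judgments, rule=median):
    """Combine individual scores into one collective score per option."""
    options = next(iter(judgments.values())).keys()
    return {opt: rule([scores[opt] for scores in judgments.values()])
            for opt in options}

collective = aggregate(judgments)
chosen = max(collective, key=collective.get)
print(collective)             # {'recommend_A': 6.0, 'recommend_B': 5.0}
print("Chosen action:", chosen)
```

The median is used here only because it is a simple, outlier-resistant aggregation rule; many other social choice rules could be substituted in its place.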