# Tournesol's model

Tournesol collects pairwise content comparisons along different quality criteria, and infers individual and global scores based on such comparisons. To do this, Tournesol's model combines the Bradley-Terry model and the Licchavi framework for robust personalized collaborative learning.

Note: A paper on Licchavi is currently being finalized. It will be available within a month.

Upcoming...

## Basic mathematical formulation

Denote by $[N]=\{1,\ldots,N\}$ the set of contributors and by $[V]=\{1,\ldots,V\}$ the set of videos on Tournesol. For simplicity, we assume for now that there is only one quality criterion.

Each contributor $n\in[N]$ provides a dataset $\mathcal{D}_n$ of ratings, each of the form $(v,w,r)$, where $v,w\in[V]$ are the two videos being compared and $r\in[-1,1]$ is the rating given by the contributor. The value $r=-1$ means that the contributor strongly prefers $v$ to $w$, and $r=1$ means the reverse.

By slightly generalizing the Bradley-Terry model, we assume that each contributor $n$ implicitly assigns a score $\theta_{nv}\in\mathbb{R}$ to each video $v$. Intuitively, the odds that contributor $n$ rates $v$ above $w$ then grow exponentially with the difference $\theta_{nv}-\theta_{nw}$ between the implicit scores of the two videos. More formally, we assume that, given the implicit scores of contributor $n$, the rating $r$ follows the probability density function $p(r)=\frac{1}{1+\exp(r(\theta_{nv}-\theta_{nw}))}$ on $[-1,1]$.
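For this density, $p(r)+p(-r)=1$, so $p$ integrates to 1 over $[-1,1]$, and $p(-1)/p(1)=\exp(\theta_{nv}-\theta_{nw})$ exactly, which is the exponential growth of the odds described above. The following minimal Python sketch evaluates both; the helper names `rating_density` and `odds_v_over_w` are hypothetical and the scores are passed as plain floats.

```python
import math

def rating_density(r: float, theta_v: float, theta_w: float) -> float:
    """p(r) = 1 / (1 + exp(r * (theta_v - theta_w))) for a rating r in [-1, 1]."""
    return 1.0 / (1.0 + math.exp(r * (theta_v - theta_w)))

def odds_v_over_w(theta_v: float, theta_w: float) -> float:
    """Density ratio p(-1) / p(1): a full vote for v versus a full vote for w."""
    return rating_density(-1.0, theta_v, theta_w) / rating_density(1.0, theta_v, theta_w)

# A score gap of 1 gives a ratio of exp(1), about 2.718.
print(odds_v_over_w(1.5, 0.5))
```

Equal scores give $p(r)=1/2$ for every $r$, i.e. a uniform rating on $[-1,1]$.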

Assuming that the contributor's ratings are independent (conditionally on the selected videos $v$ and $w$ and on the parameters $\theta_n$), the negative log-likelihood of the dataset $\mathcal{D}_n$ is then $L_n(\theta_n,\mathcal{D}_n)=\sum_{(v,w,r)\in\mathcal{D}_n}\ln\left(1+\exp(r\theta_{nv}-r\theta_{nw})\right)$.
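A minimal sketch of $L_n$, assuming ratings are stored as `(v, w, r)` tuples and $\theta_n$ as a dict from video id to score (both representation choices are ours, for illustration only). The term $\ln(1+e^x)$ is computed in a numerically stable way.

```python
import math
from typing import Dict, Iterable, Tuple

Rating = Tuple[int, int, float]  # (v, w, r), r in [-1, 1]; r = -1 favors v

def softplus(x: float) -> float:
    """Numerically stable ln(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def negative_log_likelihood(theta_n: Dict[int, float], dataset: Iterable[Rating]) -> float:
    """L_n(theta_n, D_n) = sum over (v, w, r) of ln(1 + exp(r * theta_nv - r * theta_nw))."""
    return sum(softplus(r * (theta_n[v] - theta_n[w])) for v, w, r in dataset)

# Scores that agree with the rating (v preferred, r = -1) give a lower loss.
theta = {0: 1.0, 1: 0.0}
print(negative_log_likelihood(theta, [(0, 1, -1.0)]), negative_log_likelihood(theta, [(0, 1, 1.0)]))
```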

As proposed by the Licchavi framework, we introduce global scores $\rho_v\in\mathbb{R}$ for all videos $v\in[V]$. We then penalize discrepancies between the global scores and the individual scores, and add a regularization term on the global scores to guarantee their uniqueness (and robustness). This yields the following global loss: $\mathrm{Loss}(\rho,\vec{\theta},\vec{\mathcal{D}})=\sum_{n\in[N]}L_n(\theta_n,\mathcal{D}_n)+\lambda\sum_{n\in[N]}\sum_{v\in[V]}w_{nv}\left|\theta_{nv}-\rho_v\right|+\mu\sum_{v\in[V]}\rho_v^2$.
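The global loss is $\sum_n L_n$ plus two Licchavi terms. A sketch of those two terms, with hypothetical dict-based inputs: videos that contributor $n$ never rated are simply absent from `thetas[n]`, which is consistent with the weight formula $w_{nv}=R_{nv}/(C+R_{nv})$ vanishing when $R_{nv}=0$.

```python
from typing import Dict

Scores = Dict[int, float]  # video id -> score

def licchavi_terms(
    rho: Scores,                           # global scores rho_v
    thetas: Dict[int, Scores],             # contributor id -> individual scores theta_n
    weights: Dict[int, Dict[int, float]],  # contributor id -> video id -> w_nv
    lam: float,
    mu: float,
) -> float:
    """lambda * sum_n sum_v w_nv |theta_nv - rho_v| + mu * sum_v rho_v^2."""
    discrepancy = sum(
        weights[n][v] * abs(theta_nv - rho[v])
        for n, theta_n in thetas.items()
        for v, theta_nv in theta_n.items()
    )
    return lam * discrepancy + mu * sum(r * r for r in rho.values())

print(licchavi_terms({0: 0.5, 1: -0.5}, {0: {0: 1.0, 1: -1.0}}, {0: {0: 0.5, 1: 0.5}}, 1.0, 1.0))
```

The absolute value (rather than a square) is what bounds each contributor's pull on $\rho_v$.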

The weights $w_{nv}\in[0,1]$ are defined by $w_{nv}=\frac{R_{nv}}{C+R_{nv}}$, where $R_{nv}$ is the number of ratings of video $v$ by contributor $n$. As contributor $n$ provides more ratings, the weights first grow roughly linearly in $R_{nv}$ (approximately $R_{nv}/C$ while $R_{nv}\ll C$), then saturate, approaching 1 without ever exceeding it. This gives each contributor a bounded maximal voting power.
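The saturation can be seen by tabulating $w_{nv}$ for a few values of $R_{nv}$, using the current value $C=3$ as a default (the function name `weight` is hypothetical).

```python
def weight(num_ratings: int, C: float = 3.0) -> float:
    """w_nv = R_nv / (C + R_nv): about R_nv / C for few ratings, tends to 1 for many."""
    return num_ratings / (C + num_ratings)

for R in (0, 1, 3, 8, 30, 300):
    print(R, round(weight(R), 3))
# 0 0.0 | 1 0.25 | 3 0.5 | 8 0.727 | 30 0.909 | 300 0.99
```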

Note that the global loss is convex. We currently minimize it by gradient descent, which takes less than 10 minutes on a CPU for about 5,000 ratings. We are also investigating ways to scale the optimization to millions or billions of ratings.
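A toy sketch of this minimization in plain Python; it is not Tournesol's actual code. Two assumptions are ours: each comparison $(v,w,r)$ counts as one rating of $v$ and one of $w$ when computing $R_{nv}$, and, since $|\theta_{nv}-\rho_v|$ is not differentiable at 0, the sketch uses a subgradient (taking 0 there) with a fixed step size. The function `fit` and its parameters `lr` and `steps` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[int, int, float]  # (v, w, r)

def sigmoid(x: float) -> float:
    """Derivative of ln(1 + exp(x)), computed stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sign(x: float) -> int:
    """A subgradient of |x| (0 is chosen at x = 0)."""
    return (x > 0) - (x < 0)

def fit(datasets: Dict[int, List[Rating]], lam: float = 1.0, mu: float = 1.0,
        C: float = 3.0, lr: float = 0.05, steps: int = 5000):
    # Assumption: a comparison (v, w, r) counts as one rating of v and one of w.
    counts = {n: defaultdict(int) for n in datasets}
    for n, data in datasets.items():
        for v, w, _ in data:
            counts[n][v] += 1
            counts[n][w] += 1
    weights = {n: {v: R / (C + R) for v, R in c.items()} for n, c in counts.items()}
    thetas = {n: {v: 0.0 for v in c} for n, c in counts.items()}
    rho = {v: 0.0 for c in counts.values() for v in c}

    for _ in range(steps):
        g_theta = {n: {v: 0.0 for v in t} for n, t in thetas.items()}
        g_rho = {v: 2.0 * mu * rho[v] for v in rho}  # gradient of mu * rho_v^2
        for n, data in datasets.items():
            t = thetas[n]
            for v, w, r in data:  # gradient of ln(1 + exp(r * (theta_nv - theta_nw)))
                s = r * sigmoid(r * (t[v] - t[w]))
                g_theta[n][v] += s
                g_theta[n][w] -= s
            for v, theta_nv in t.items():  # subgradient of lam * w_nv * |theta_nv - rho_v|
                d = lam * weights[n][v] * sign(theta_nv - rho[v])
                g_theta[n][v] += d
                g_rho[v] -= d
        for n, t in thetas.items():
            for v in t:
                t[v] -= lr * g_theta[n][v]
        for v in rho:
            rho[v] -= lr * g_rho[v]
    return rho, thetas

# Two contributors who both prefer video 0 to video 1.
rho, thetas = fit({0: [(0, 1, -1.0)] * 4, 1: [(0, 1, -0.5)] * 2})
print(rho)
```

With a fixed step size, subgradient descent only hovers near the minimizer; a decreasing step size or a proximal method would settle more precisely.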

Currently, the hyperparameters are set to $\lambda=1$, $\mu=1$ and $C=3$.

## Resilience to a small number of malicious contributors

A single contributor can have an effect of at most $\frac{\lambda}{2\mu}\frac{R_{nv}}{C+R_{nv}}$ on the score of a video $v$. For $\lambda=\mu=1$, $C=3$ and $R_{nv}=8$ ratings, this equals $4/11\approx 0.364$. This is why some scores of this video are stuck at 1.364.
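A sketch of where the factor $\frac{\lambda}{2\mu}$ comes from, under the simplifying assumption that all individual scores are held fixed (this argument does not cover the joint optimization over $\vec{\theta}$). The terms of the loss that depend on $\rho_v$ split into a part without contributor $n$ and contributor $n$'s own term:

$$f_v(\rho_v)=\underbrace{\mu\rho_v^2+\lambda\sum_{m\neq n}w_{mv}\left|\theta_{mv}-\rho_v\right|}_{g(\rho_v)}+\underbrace{\lambda w_{nv}\left|\theta_{nv}-\rho_v\right|}_{h(\rho_v)}.$$

Here $g$ is $2\mu$-strongly convex and $h$ is $\lambda w_{nv}$-Lipschitz. Let $\rho^*$ minimize $g$ (contributor $n$ absent) and $\rho'$ minimize $g+h$. Then $0\in\partial g(\rho^*)$, and there is some $s\in\partial g(\rho')$ with $|s|\le\lambda w_{nv}$, so strong convexity gives

$$2\mu\left(\rho'-\rho^*\right)^2\le s\left(\rho'-\rho^*\right)\le\lambda w_{nv}\left|\rho'-\rho^*\right|,\quad\text{hence}\quad\left|\rho'-\rho^*\right|\le\frac{\lambda}{2\mu}w_{nv}=\frac{\lambda}{2\mu}\frac{R_{nv}}{C+R_{nv}}.$$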

This implies that, currently, a contributor can affect the global score of a video $v$ by at most $\frac{\lambda}{2\mu}\frac{R_{nv}}{C+R_{nv}}\leq 1/2$ point. For instance, a contributor who provides $C=3$ ratings can influence global scores by at most 1/4 point.
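The figures above follow directly from the bound; a small check, with a hypothetical helper `max_influence` defaulting to the current hyperparameters:

```python
def max_influence(num_ratings: int, lam: float = 1.0, mu: float = 1.0, C: float = 3.0) -> float:
    """Bound (lambda / (2 mu)) * R / (C + R) on one contributor's effect on a global score."""
    return lam / (2.0 * mu) * num_ratings / (C + num_ratings)

print(round(max_influence(8), 3))   # 0.364
print(max_influence(3))             # 0.25
print(max_influence(10**6) < 0.5)   # True: the bound stays below lambda / (2 mu)
```

Doubling $\mu$ (or halving $\lambda$) halves the bound, which is why a larger ratio $\mu/\lambda$ means more robustness.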

Larger values of $\mu/\lambda$ increase the robustness of our model to malicious contributors, since the bound above scales as $\lambda/(2\mu)$. As the number of contributors grows, we plan to increase this ratio as well.

## Research

Tournesol's model is currently being studied both theoretically and empirically. To contribute to this research, please reach out to Lê Nguyên Hoang, e.g. on Discord.