recbole.evaluator.metrics

recbole.evaluator.metrics.auc_(trues, preds)[source]

AUC (also known as Area Under Curve) is used to evaluate two-class models; it refers to the area under the ROC curve

Note

This metric does not calculate group-based AUC, which considers the AUC scores averaged across users. It is also not limited to the top-\(k\) results. Instead, it calculates the score over the entire set of prediction results, regardless of the user.

\[\mathrm {AUC} = \frac{\sum\limits_{i=1}^M rank_{i} - \frac{{M} \times {(M+1)}}{2}} {{M} \times {N}}\]

\(M\) is the number of positive samples. \(N\) is the number of negative samples. \(rank_i\) is the rank of the \(i\)-th positive sample.
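
As a rough, self-contained illustration of the rank formula (a sketch only, not RecBole's implementation; tied scores, which would require average ranks, are ignored):

    import numpy as np

    def auc_sketch(trues, preds):
        # 1-based ranks of all predictions, smallest score -> rank 1.
        ranks = np.argsort(np.argsort(preds)) + 1
        m = trues.sum()          # number of positive samples M
        n = len(trues) - m       # number of negative samples N
        # Sum of positive ranks minus M*(M+1)/2, over the M*N pairs.
        return (ranks[trues == 1].sum() - m * (m + 1) / 2) / (m * n)

    trues = np.array([1, 0, 1, 0])
    preds = np.array([0.9, 0.3, 0.6, 0.4])
    print(auc_sketch(trues, preds))  # 1.0 -- every positive outranks every negative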

recbole.evaluator.metrics.hit_(pos_index, pos_len)[source]

Hit (also known as hit ratio at \(K\)) is a way of calculating how many ‘hits’ you have in a \(K\)-sized list of ranked items.

\[\mathrm {HR@K} =\frac{Number \space of \space Hits @K}{|GT|}\]

The numerator counts the users with at least one positive sample in their top-\(K\) recommendation list, and \(|GT|\) is the total number of ground-truth lists (one per user) in the test set.
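
A minimal NumPy sketch of this idea, assuming (as the signature suggests) that pos_index is a boolean (n_users, K) matrix marking relevant items by rank; this is an illustration, not a verified description of RecBole's internals:

    import numpy as np

    def hit_sketch(pos_index):
        # A user "hits" at cut-off k if any of the first k items is relevant;
        # averaging over users gives HR@1 ... HR@K in one pass.
        return np.cumsum(pos_index, axis=1).astype(bool).mean(axis=0)

    pos_index = np.array([[False, True, False],
                          [False, False, False]])
    print(hit_sketch(pos_index))  # [0.  0.5 0.5] -> HR@1, HR@2, HR@3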

recbole.evaluator.metrics.log_loss_(trues, preds)[source]

Log loss, also known as logistic loss or cross-entropy loss

\[-\log {P(y_t|y_p)} = -(({y_t}\ \log{y_p}) + {(1-y_t)}\ \log{(1 - y_p)})\]

For a single sample, \(y_t\) is the true label in \(\{0,1\}\), and \(y_p\) is the estimated probability that \(y_t = 1\).
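
A minimal sketch of the loss averaged over samples (the clipping constant eps is an assumption added here to keep the logarithms finite):

    import numpy as np

    def log_loss_sketch(trues, preds, eps=1e-15):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        preds = np.clip(preds, eps, 1 - eps)
        return -np.mean(trues * np.log(preds) + (1 - trues) * np.log(1 - preds))

    trues = np.array([1, 0, 1])
    preds = np.array([0.9, 0.2, 0.7])
    print(log_loss_sketch(trues, preds))  # ~0.228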

recbole.evaluator.metrics.mae_(trues, preds)[source]

Mean absolute error regression loss

\[\mathrm{MAE}=\frac{1}{|{T}|} \sum_{(u, i) \in {T}}\left|\hat{r}_{u i}-r_{u i}\right|\]

\(T\) is the test set, \(\hat{r}_{u i}\) is the score predicted by the model, and \(r_{u i}\) is the actual score in the test set.
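
The formula translates almost directly into NumPy; a sketch, with trues and preds standing in for the actual and predicted scores over the test set \(T\):

    import numpy as np

    def mae_sketch(trues, preds):
        # Mean absolute deviation between predicted and actual ratings.
        return np.abs(trues - preds).mean()

    print(mae_sketch(np.array([4.0, 3.0]), np.array([3.5, 3.0])))  # 0.25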

recbole.evaluator.metrics.map_(pos_index, pos_len)[source]

MAP (also known as Mean Average Precision) calculates the Average Precision over the relevant items for each user and then averages it across all users.

Note

In this case the normalization factor used is \(\frac{1}{\min (m,N)}\), which prevents your AP score from being unfairly suppressed when your number of recommendations couldn’t possibly capture all the correct ones.

\[\begin{split}\begin{align*} \mathrm{AP@N} &= \frac{1}{\min(m,N)}\sum_{k=1}^N P(k) \cdot rel(k) \\ \mathrm{MAP@N}& = \frac{1}{|U|}\sum_{u=1}^{|U|}(\mathrm{AP@N})_u \end{align*}\end{split}\]

\(m\) is the number of relevant items for the user, \(P(k)\) is the precision at cut-off \(k\), \(rel(k)\) equals 1 if the item at rank \(k\) is relevant and 0 otherwise, and \(|U|\) is the number of users.
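
A sketch of AP@N for a single user under these definitions (hits and num_relevant are illustrative names, not RecBole's arguments):

    import numpy as np

    def ap_at_n_sketch(hits, num_relevant):
        # hits: bool array over the top-N ranks; plays the role of rel(k).
        n = len(hits)
        # P(k): precision at each cut-off k = 1..N.
        precision_at = np.cumsum(hits) / np.arange(1, n + 1)
        # Sum P(k)*rel(k), normalized by min(m, N) as in the note above.
        return (precision_at * hits).sum() / min(num_relevant, n)

    hits = np.array([True, False, True])         # relevant items at ranks 1 and 3
    print(ap_at_n_sketch(hits, num_relevant=2))  # (1/1 + 2/3) / 2 = 0.833...

MAP@N is then this per-user value averaged over all users.
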
recbole.evaluator.metrics.mrr_(pos_index, pos_len)[source]

The MRR (also known as mean reciprocal rank) is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness.

\[\mathrm {MRR} = \frac{1}{|{U}|} \sum_{i=1}^{|{U}|} \frac{1}{rank_i}\]

\(|U|\) is the number of users, and \(rank_i\) is the rank of the first relevant item in the recommendation list for user \(i\).
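
A minimal sketch under the same (n_users, K) boolean-matrix assumption used above; users with no relevant item in the list contribute 0:

    import numpy as np

    def mrr_sketch(pos_index):
        result = np.zeros(pos_index.shape[0])
        for u, row in enumerate(pos_index):
            hits = np.flatnonzero(row)           # 0-based ranks of relevant items
            if hits.size:
                result[u] = 1.0 / (hits[0] + 1)  # reciprocal rank of first hit
        return result.mean()

    pos_index = np.array([[False, True, False],
                          [True, False, False]])
    print(mrr_sketch(pos_index))  # (1/2 + 1/1) / 2 = 0.75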

recbole.evaluator.metrics.ndcg_(pos_index, pos_len)[source]

NDCG (also known as normalized discounted cumulative gain) is a measure of ranking quality. By normalizing the score, the recommendation lists of different users across the whole test set can be compared and averaged.

\[\begin{split}\begin{gather} \mathrm {DCG@K}=\sum_{i=1}^{K} \frac{2^{rel_i}-1}{\log_{2}{(i+1)}}\\ \mathrm {IDCG@K}=\sum_{i=1}^{K}\frac{1}{\log_{2}{(i+1)}}\\ \mathrm {NDCG_u@K}=\frac{\mathrm{DCG_u@K}}{\mathrm{IDCG_u@K}}\\ \mathrm {NDCG@K}=\frac{\sum \nolimits_{u \in U^{te}} \mathrm{NDCG_u@K}}{|U^{te}|} \end{gather}\end{split}\]

\(K\) stands for recommending \(K\) items, and \(rel_i\) is the relevance of the item at position \(i\) in the recommendation list. With binary relevance, \(2^{rel_i}-1\) equals 1 if the item is a hit and 0 otherwise. \(U^{te}\) denotes all users in the test set.
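
A per-user sketch for the binary-relevance case described above; the ideal list is assumed to place all relevant items at the top, so the sketch caps IDCG at min(m, K) positions (argument names are illustrative):

    import numpy as np

    def ndcg_at_k_sketch(hits, num_relevant):
        k = len(hits)
        # Discount 1/log2(i+1) for positions i = 1..K.
        discounts = 1.0 / np.log2(np.arange(2, k + 2))
        dcg = (hits * discounts).sum()   # 2^rel_i - 1 is 1 on hits, 0 otherwise
        idcg = discounts[:min(num_relevant, k)].sum()
        return dcg / idcg

    hits = np.array([False, True, True])           # hits at ranks 2 and 3
    print(ndcg_at_k_sketch(hits, num_relevant=2))  # ~0.693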

recbole.evaluator.metrics.precision_(pos_index, pos_len)[source]

Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances

\[\mathrm {Precision@K} = \frac{|Rel_u \cap Rec_u|}{|Rec_u|}\]

\(Rel_u\) is the set of items relevant to user \(u\), and \(Rec_u\) is the set of top-\(K\) items recommended to \(u\). The reported value is the average \(Precision@K\) over all users.
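
Continuing the (n_users, K) boolean-matrix sketch, Precision@k for every cut-off k at once:

    import numpy as np

    def precision_sketch(pos_index):
        k = pos_index.shape[1]
        # Hits among the first k items, divided by the list length k.
        per_user = np.cumsum(pos_index, axis=1) / np.arange(1, k + 1)
        return per_user.mean(axis=0)    # average over users

    pos_index = np.array([[True, False, True]])
    print(precision_sketch(pos_index))  # [1.  0.5  0.667] -> P@1, P@2, P@3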

recbole.evaluator.metrics.recall_(pos_index, pos_len)[source]

Recall (also known as sensitivity) is the fraction of the total amount of relevant instances that were actually retrieved

\[\mathrm {Recall@K} = \frac{|Rel_u\cap Rec_u|}{|Rel_u|}\]

\(Rel_u\) is the set of items relevant to user \(u\), and \(Rec_u\) is the set of top-\(K\) items recommended to \(u\). The reported value is the average \(Recall@K\) over all users.
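
The same sketch adapted for recall; here a per-user count of relevant items (matching the pos_len argument in the signature) supplies the denominator:

    import numpy as np

    def recall_sketch(pos_index, pos_len):
        # Hits among the first k items, divided by |Rel_u| per user.
        per_user = np.cumsum(pos_index, axis=1) / pos_len.reshape(-1, 1)
        return per_user.mean(axis=0)

    pos_index = np.array([[True, False, True]])
    pos_len = np.array([4])                    # this user has 4 relevant items
    print(recall_sketch(pos_index, pos_len))   # [0.25 0.25 0.5] -> R@1, R@2, R@3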

recbole.evaluator.metrics.rmse_(trues, preds)[source]

Root mean squared error (RMSE) regression loss

\[\mathrm{RMSE} = \sqrt{\frac{1}{|{T}|} \sum_{(u, i) \in {T}}(\hat{r}_{u i}-r_{u i})^{2}}\]

\(T\) is the test set, \(\hat{r}_{u i}\) is the score predicted by the model, and \(r_{u i}\) is the actual score in the test set.
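
As with MAE, the formula maps directly onto NumPy; a sketch:

    import numpy as np

    def rmse_sketch(trues, preds):
        # Square root of the mean squared prediction error.
        return np.sqrt(np.mean((trues - preds) ** 2))

    print(rmse_sketch(np.array([4.0, 3.0]), np.array([3.0, 3.0])))  # ~0.707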