recbole.evaluator.metrics
recbole.evaluator.metrics.auc_(trues, preds)

AUC (also known as Area Under Curve) is used to evaluate a two-class model; it refers to the area under the ROC curve.
Note
This metric does not calculate group-based AUC, which averages the AUC scores across users, and it is not limited to a cutoff k. Instead, it computes the score over the entire set of prediction results, regardless of the user.
\[\mathrm{AUC} = \frac{\sum\limits_{i=1}^{M} rank_{i} - \frac{M \times (M+1)}{2}}{M \times N}\]\(M\) is the number of positive samples, \(N\) is the number of negative samples, and \(rank_i\) is the ascending rank of the \(i\)-th positive sample among all predicted scores.
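A minimal NumPy sketch of this rank-based formula (an illustration, not RecBole's implementation; the name auc_sketch and the tie handling are my own):

    import numpy as np

    def auc_sketch(trues, preds):
        # trues: binary labels in {0, 1}; preds: real-valued scores.
        trues = np.asarray(trues)
        preds = np.asarray(preds)
        m = int(trues.sum())                 # number of positive samples M
        n = trues.size - m                   # number of negative samples N
        # 1-based ranks by ascending score (ties broken arbitrarily here;
        # a full implementation would assign average ranks to ties).
        order = preds.argsort()
        ranks = np.empty(preds.size)
        ranks[order] = np.arange(1, preds.size + 1)
        rank_sum = ranks[trues == 1].sum()   # sum of positive-sample ranks
        return (rank_sum - m * (m + 1) / 2) / (m * n)

    print(auc_sketch([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.4]))  # -> 1.0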
recbole.evaluator.metrics.hit_(pos_index, pos_len)

Hit (also known as hit ratio at \(N\)) measures how many ‘hits’ appear in an \(n\)-sized list of ranked items.
\[\mathrm {HR@K} =\frac{Number \space of \space Hits@K}{|GT|}\]\(Number \space of \space Hits@K\) is the number of users with at least one positive sample in their top-\(K\) recommendation list, and \(|GT|\) is the total number of samples in the test set.
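A sketch of Hit@K averaged over users, assuming (as an illustration, not necessarily RecBole's internal format) that pos_index is a boolean matrix of shape (n_users, K) marking which ranked positions are hits:

    import numpy as np

    def hit_at_k(pos_index):
        # A user scores 1 if any of their top-K items is relevant, else 0;
        # the metric is the mean over users.
        pos_index = np.asarray(pos_index, dtype=bool)
        return pos_index.any(axis=1).mean()

    # user 1 has a hit at rank 2, user 2 has none -> HR@3 = 0.5
    print(hit_at_k([[False, True, False], [False, False, False]]))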
recbole.evaluator.metrics.log_loss_(trues, preds)

Log loss, also known as logistic loss or cross-entropy loss.
\[-\log {P(y_t|y_p)} = -(({y_t}\ \log{y_p}) + {(1-y_t)}\ \log{(1 - y_p)})\]For a single sample, \(y_t\) is the true label in \(\{0,1\}\) and \(y_p\) is the estimated probability that \(y_t = 1\).
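A direct NumPy sketch of this expression averaged over samples (illustrative only; the clipping constant eps is my own choice to avoid log(0)):

    import numpy as np

    def log_loss_sketch(trues, preds, eps=1e-15):
        trues = np.asarray(trues, dtype=float)
        # clip predicted probabilities away from 0 and 1 before taking logs
        preds = np.clip(np.asarray(preds, dtype=float), eps, 1 - eps)
        return -np.mean(trues * np.log(preds) + (1 - trues) * np.log(1 - preds))

    print(log_loss_sketch([1, 0, 1], [0.9, 0.2, 0.7]))  # -> ~0.228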
recbole.evaluator.metrics.mae_(trues, preds)

Mean absolute error regression loss.

\[\mathrm{MAE}=\frac{1}{|{T}|} \sum_{(u, i) \in {T}}\left|\hat{r}_{u i}-r_{u i}\right|\]\(T\) is the test set, \(\hat{r}_{u i}\) is the score predicted by the model, and \(r_{u i}\) is the actual score in the test set.
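The formula maps directly to NumPy; a minimal sketch, not the library's code:

    import numpy as np

    def mae_sketch(trues, preds):
        # mean of |r_hat - r| over all (user, item) pairs in the test set
        return np.mean(np.abs(np.asarray(preds, dtype=float)
                              - np.asarray(trues, dtype=float)))

    print(mae_sketch([3.0, 4.0, 5.0], [2.5, 4.0, 4.0]))  # -> 0.5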
recbole.evaluator.metrics.map_(pos_index, pos_len)

MAP (also known as Mean Average Precision) is the mean over users of the Average Precision (AP), which in turn averages the precision at the positions of the relevant items.
Note
In this case the normalization factor is \(\frac{1}{\min (m,N)}\), which prevents the AP score from being unfairly suppressed when the number of recommendations \(N\) cannot possibly cover all \(m\) relevant items.
\[\begin{split}\begin{align*} \mathrm{AP@N} &= \frac{1}{\mathrm{min}(m,N)}\sum_{k=1}^N P(k) \cdot rel(k) \\ \mathrm{MAP@N}& = \frac{1}{|U|}\sum_{u=1}^{|U|}(\mathrm{AP@N})_u \end{align*}\end{split}\]\(P(k)\) is the precision at cutoff \(k\), \(rel(k)\) equals 1 if the item at rank \(k\) is relevant and 0 otherwise, and \(U\) is the set of users.
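A per-user AP@N plus its mean, sketched under the assumption that pos_index is a boolean array over the top-N ranked items and pos_len is the user's total number of relevant items \(m\) (an illustrative shape, not a claim about RecBole's internals):

    import numpy as np

    def ap_at_n(pos_index, pos_len):
        # P(k) at every rank, kept only where rel(k) = 1, then normalized
        # by min(m, N) as in the note above.
        rel = np.asarray(pos_index, dtype=float)
        n = rel.size
        p_at_k = np.cumsum(rel) / np.arange(1, n + 1)
        return (p_at_k * rel).sum() / min(pos_len, n)

    def map_at_n(pos_index_rows, pos_lens):
        # MAP@N is the mean of AP@N over users.
        return float(np.mean([ap_at_n(r, m)
                              for r, m in zip(pos_index_rows, pos_lens)]))

    # two users, N = 3: hits at ranks (1, 3) and (2,)
    print(map_at_n([[1, 0, 1], [0, 1, 0]], [2, 1]))  # -> ~0.667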
recbole.evaluator.metrics.mrr_(pos_index, pos_len)

MRR (also known as mean reciprocal rank) is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness.
\[\mathrm {MRR} = \frac{1}{|{U}|} \sum_{i=1}^{|{U}|} \frac{1}{rank_i}\]\(|U|\) is the number of users and \(rank_i\) is the rank position of the first relevant item in the recommendation list for user \(i\).
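A sketch of MRR over a batch of users, again assuming an illustrative boolean (n_users, K) hit matrix pos_index; users with no hit in the list contribute 0:

    import numpy as np

    def mrr_sketch(pos_index):
        pos_index = np.asarray(pos_index, dtype=bool)
        rr = np.zeros(pos_index.shape[0])
        has_hit = pos_index.any(axis=1)
        first_hit = pos_index.argmax(axis=1)       # 0-based position of first hit
        rr[has_hit] = 1.0 / (first_hit[has_hit] + 1)
        return rr.mean()

    # first hits at ranks 1 and 3 -> (1/1 + 1/3) / 2 = ~0.667
    print(mrr_sketch([[True, False, False], [False, False, True]]))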
recbole.evaluator.metrics.ndcg_(pos_index, pos_len)

NDCG (also known as normalized discounted cumulative gain) is a measure of ranking quality. Normalizing the score makes the results comparable across users, so the recommendation lists of all users in the test set can be evaluated together.
\[\begin{split}\begin{gather} \mathrm {DCG@K}=\sum_{i=1}^{K} \frac{2^{rel_i}-1}{\log_{2}{(i+1)}}\\ \mathrm {IDCG@K}=\sum_{i=1}^{K}\frac{1}{\log_{2}{(i+1)}}\\ \mathrm {NDCG_u@K}=\frac{\mathrm{DCG_u@K}}{\mathrm{IDCG_u@K}}\\ \mathrm {NDCG@K}=\frac{\sum\nolimits_{u \in U^{te}} \mathrm{NDCG_u@K}}{|U^{te}|} \end{gather}\end{split}\]\(K\) stands for recommending \(K\) items, and \(rel_i\) is the binary relevance of the item at position \(i\) in the recommendation list, so \(2^{rel_i}-1\) equals 1 if the item is a hit and 0 otherwise. \(U^{te}\) is the set of all users in the test set.
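A per-user sketch with binary relevance. One common choice, an assumption on my part since the IDCG line above sums over all \(K\) positions, is to cap the ideal list at \(\min(m, K)\) hits:

    import numpy as np

    def ndcg_at_k(pos_index, pos_len):
        rel = np.asarray(pos_index, dtype=float)        # rel_i in {0, 1}
        k = rel.size
        discounts = 1.0 / np.log2(np.arange(2, k + 2))  # 1 / log2(i + 1)
        dcg = (rel * discounts).sum()                   # 2^rel_i - 1 is just rel_i here
        idcg = discounts[: min(pos_len, k)].sum()       # ideal list: hits ranked first
        return dcg / idcg if idcg > 0 else 0.0

    # one hit at rank 2 out of K = 3, with m = 1 relevant item
    print(ndcg_at_k([0, 1, 0], 1))  # -> 1/log2(3) = ~0.631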
recbole.evaluator.metrics.precision_(pos_index, pos_len)

Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances.
\[\mathrm {Precision@K} = \frac{|Rel_u \cap Rec_u|}{|Rec_u|}\]\(Rel_u\) is the set of items relevant to user \(u\) and \(Rec_u\) is the top-\(K\) items recommended to user \(u\). The result is the average \(Precision@K\) over all users.
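A sketch of per-user Precision@K averaged over users, with the same assumed (n_users, K) boolean hit matrix:

    import numpy as np

    def precision_at_k(pos_index):
        pos_index = np.asarray(pos_index, dtype=float)
        # |Rel_u intersect Rec_u| / K for each user, then the mean over users
        return pos_index.mean(axis=1).mean()

    # users with 2/3 and 1/3 hits -> (2/3 + 1/3) / 2 = 0.5
    print(precision_at_k([[1, 1, 0], [0, 0, 1]]))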
recbole.evaluator.metrics.recall_(pos_index, pos_len)

Recall (also known as sensitivity) is the fraction of the total amount of relevant instances that were actually retrieved.
\[\mathrm {Recall@K} = \frac{|Rel_u\cap Rec_u|}{|Rel_u|}\]\(Rel_u\) is the set of items relevant to user \(u\) and \(Rec_u\) is the top-\(K\) items recommended to user \(u\). The result is the average \(Recall@K\) over all users.
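A matching sketch for Recall@K, dividing each user's hit count by their number of relevant items pos_len (the input shapes remain my illustrative assumption):

    import numpy as np

    def recall_at_k(pos_index, pos_len):
        # |Rel_u intersect Rec_u| per user
        hits = np.asarray(pos_index, dtype=float).sum(axis=1)
        return (hits / np.asarray(pos_len, dtype=float)).mean()

    # users with 2 of 4 and 1 of 2 relevant items retrieved -> 0.5
    print(recall_at_k([[1, 1, 0], [0, 0, 1]], [4, 2]))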
recbole.evaluator.metrics.rmse_(trues, preds)

Root mean squared error regression loss.

\[\mathrm{RMSE} = \sqrt{\frac{1}{|{T}|} \sum_{(u, i) \in {T}}(\hat{r}_{u i}-r_{u i})^{2}}\]\(T\) is the test set, \(\hat{r}_{u i}\) is the score predicted by the model, and \(r_{u i}\) is the actual score in the test set.
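And the corresponding sketch for RMSE:

    import numpy as np

    def rmse_sketch(trues, preds):
        trues = np.asarray(trues, dtype=float)
        preds = np.asarray(preds, dtype=float)
        # square root of the mean squared prediction error
        return float(np.sqrt(np.mean((preds - trues) ** 2)))

    print(rmse_sketch([3.0, 4.0, 5.0], [2.5, 4.0, 4.0]))  # -> ~0.645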