recbole.config.eval_setting
-
class recbole.config.eval_setting.EvalSetting(config)
Bases: object
Class containing settings about model evaluation.
- Evaluation setting contains four parts:
Group
Sort
Split
Negative Sample
APIs are provided so that users can set up or modify their evaluation setting easily and clearly.
Several presets are also provided, and using them is the recommended approach.
- For example:
RO: Random Ordering
TO: Temporal Ordering
RS: Ratio-based Splitting
LS: Leave-one-out Splitting
full: rank against the entire item set (excluding ground-truth items)
uniXX: sample XX negative items uniformly
popXX: sample XX negative items by popularity
Note that records are grouped by user_id by default when you use these presets.
For example, RO_RS combined with full means: shuffle, group by user, split by ratio, and evaluate against all non-ground-truth items.
For details on the preset evaluation settings, see Wayne Xin Zhao et al., "Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms", CIKM 2020.
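The preset abbreviations above can be expanded mechanically. The following is a minimal sketch of that naming scheme only; `describe_preset` and its lookup tables are hypothetical helpers written for illustration and are not part of RecBole.

```python
import re

# Abbreviations taken from the preset list above.
ORDERING = {"RO": "random ordering", "TO": "temporal ordering"}
SPLITTING = {"RS": "ratio-based splitting", "LS": "leave-one-out splitting"}


def describe_preset(split_preset, sample_preset):
    """Expand preset names such as ('RO_RS', 'full') into readable parts.

    Hypothetical helper for illustration; not a RecBole function.
    """
    order_key, split_key = split_preset.split("_")
    parts = {
        "group": "user_id",  # presets group by user by default
        "ordering": ORDERING[order_key],
        "splitting": SPLITTING[split_key],
    }
    if sample_preset == "full":
        parts["negative_sampling"] = "rank all non-ground-truth items"
    else:
        match = re.fullmatch(r"(uni|pop)(\d+)", sample_preset)
        if match is None:
            raise ValueError(f"unknown sampling preset: {sample_preset}")
        kind = "uniform" if match.group(1) == "uni" else "popularity"
        parts["negative_sampling"] = f"{kind} sampling of {int(match.group(2))} items"
    return parts
```

Calling `describe_preset("TO_LS", "pop100")` would describe temporal ordering, leave-one-out splitting and popularity-based sampling of 100 items.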
- Parameters
config (Config) – Global configuration object.
-
group_field
If None, records are not grouped; otherwise they are grouped by this field before splitting. Records are usually grouped by user id.
- Type
str or None
-
ordering_args
Arguments for ordering. Records are usually sorted by timestamp or shuffled.
- Type
dict
-
split_args
Arguments for splitting. Records are usually split by ratio (e.g. 8:1:1) or by the 'leave one out' strategy, in which the last purchase record of each user is held out for evaluation.
- Type
dict
-
neg_sample_args
Arguments for negative sampling. Negative sampling is widely used in training and evaluation.
Two strategies are provided:
neg_sample_by: sample several negative records for each positive record.
full_sort: do no negative sampling; all unused items are used for evaluation.
- Type
dict
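To make the difference between the two strategies concrete, here is a minimal sketch of how the candidate list for one user might be built. It assumes that "unused items" means items outside both the user's training history and the held-out positives; `eval_candidates` is a hypothetical helper, not RecBole's sampler.

```python
import random


def eval_candidates(all_items, history, positives, strategy, by=1, seed=0):
    """Build the list of items to rank for one user (illustrative only).

    history   -- items the user interacted with in training
    positives -- held-out ground-truth items
    """
    unused = sorted(set(all_items) - set(history) - set(positives))
    if strategy == "full":
        # No sampling: every unused item becomes a candidate.
        negatives = unused
    elif strategy == "by":
        # Draw `by` negatives per positive, capped by what is available.
        rng = random.Random(seed)
        k = min(by * len(positives), len(unused))
        negatives = rng.sample(unused, k)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return list(positives) + negatives
```

With the "full" strategy the candidate list grows with the item catalogue, while "by" keeps it at a fixed size per positive record.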
-
RO_LS
(leave_one_num=1, group_by_user=True)
Preset for Random Ordering and Leave-one-out Splitting.
- Parameters
leave_one_num (int) – Number of sub-datasets held out for evaluation, e.g. leave_one_num=2 if you have one validation dataset and one test dataset.
group_by_user (bool) – Set the group field to user_id if True.
-
RO_RS
(ratios=(0.8, 0.1, 0.1), group_by_user=True)
Preset for Random Ordering and Ratio-based Splitting.
- Parameters
ratios (list of float) – Ratio of each part. The values need not be normalized: [0.8, 0.1, 0.1], [8, 1, 1] and [56, 7, 7] are all accepted.
group_by_user (bool) – Set the group field to user_id if True.
-
TO_LS
(leave_one_num=1, group_by_user=True)
Preset for Temporal Ordering and Leave-one-out Splitting.
- Parameters
leave_one_num (int) – Number of sub-datasets held out for evaluation, e.g. leave_one_num=2 if you have one validation dataset and one test dataset.
group_by_user (bool) – Set the group field to user_id if True.
-
TO_RS
(ratios=(0.8, 0.1, 0.1), group_by_user=True)
Preset for Temporal Ordering and Ratio-based Splitting.
- Parameters
ratios (list of float) – Ratio of each part. The values need not be normalized: [0.8, 0.1, 0.1], [8, 1, 1] and [56, 7, 7] are all accepted.
group_by_user (bool) – Set the group field to user_id if True.
-
group_by
(field=None)
Set the grouping field.
- Parameters
field (str) – Field to group the dataset by. Defaults to None (no grouping).
Example
>>> es.group_by('month')
>>> es.group_by_user()
-
leave_one_out
(leave_one_num=1)
Set splitting to the 'leave-one-out' strategy.
Note
Requires a group field to be set first.
- Parameters
leave_one_num (int) – Number of sub-datasets held out for evaluation, e.g. leave_one_num=2 if you have one validation dataset and one test dataset.
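The leave-one-out strategy can be sketched in plain Python as follows. This is an illustration under stated assumptions, not RecBole's implementation: records are (user, item) pairs already in the desired order, grouping is by user, and a user with too few records keeps at least one for training and fills the last parts first. The name `leave_one_out_split` is hypothetical.

```python
from collections import defaultdict


def leave_one_out_split(records, leave_one_num=1):
    """Split ordered (user, item) records into [train, eval_1, ..., eval_n].

    Illustrative sketch; short-history handling is an assumption.
    """
    by_user = defaultdict(list)
    for user, item in records:
        by_user[user].append((user, item))

    parts = [[] for _ in range(leave_one_num + 1)]
    for user_records in by_user.values():
        # Hold out up to leave_one_num records, keeping one for training.
        held = min(leave_one_num, len(user_records) - 1)
        cut = len(user_records) - held
        parts[0].extend(user_records[:cut])
        # The last record always lands in the last part (e.g. the test set).
        for offset, record in enumerate(user_records[cut:]):
            parts[leave_one_num - held + 1 + offset].append(record)
    return parts
```

With leave_one_num=2, each user's last record goes to the final part and the one before it to the middle part, which matches the "one validation dataset and one test dataset" case.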
-
neg_sample_by
(by, distribution='uniform')
Set negative sampling to draw several negative records for each positive record.
- Parameters
by (int) – Number of negative cases per positive case.
distribution (str) – Distribution of the sampler, either uniform or popularity.
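The two distributions differ only in how candidate items are weighted. The sketch below assumes that popularity means the interaction count of an item and that sampling is done with replacement; both are assumptions for illustration, and `sample_negatives` is a hypothetical helper rather than RecBole's sampler.

```python
import random
from collections import Counter


def sample_negatives(interactions, user, by, distribution="uniform", seed=0):
    """Sample `by` negative items for one positive record of `user`.

    interactions -- list of (user, item) pairs
    Illustrative sketch; draws with replacement.
    """
    rng = random.Random(seed)
    popularity = Counter(item for _, item in interactions)
    seen = {item for u, item in interactions if u == user}
    candidates = sorted(set(popularity) - seen)

    if distribution == "uniform":
        weights = [1] * len(candidates)
    elif distribution == "popularity":
        # Items with more interactions are drawn more often.
        weights = [popularity[item] for item in candidates]
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return rng.choices(candidates, weights=weights, k=by)
```

Popularity-based sampling tends to produce harder negatives, because frequently consumed items are more likely to be ranked highly by a model.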
-
pop100
()
Preset for negative sampling that draws 100 items per positive record by popularity.
-
pop1000
()
Preset for negative sampling that draws 1000 items per positive record by popularity.
-
set_neg_sampling
(strategy='none', distribution='uniform', **kwargs)
Set the negative sampling strategy.
- Parameters
strategy (str) – Either none, full or by.
by (int) – Number of negative cases to sample for each positive case.
distribution (str) – Distribution of the sampler, either 'uniform' or 'popularity'.
Example
>>> es.neg_sample_to(100)
>>> es.neg_sample_by(1)
-
set_ordering
(strategy='none', **kwargs)
Set the ordering strategy.
- Parameters
strategy (str) – Either none, shuffle or by.
field (str or list of str) – Name or list of names.
ascending (bool or list of bool) – Sort ascending vs. descending. Specify a list for multiple sort orders; a list of bools must match the length of field.
Example
>>> es.set_ordering('shuffle')
>>> es.set_ordering('by', field='timestamp')
>>> es.set_ordering('by', field=['timestamp', 'price'], ascending=[True, False])
or
>>> es.random_ordering()
>>> es.sort_by('timestamp')  # ascending by default
>>> es.sort_by(field=['timestamp', 'price'], ascending=[True, False])
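The three ordering strategies can be illustrated on a list of dict records. This is a minimal sketch, assuming records are plain dicts keyed by field name; `order_records` is a hypothetical stand-in and does not reflect how RecBole orders its datasets internally.

```python
import random


def order_records(records, strategy="none", field=None, ascending=True, seed=0):
    """Return a reordered copy of `records` (illustrative only)."""
    records = list(records)
    if strategy == "none":
        return records
    if strategy == "shuffle":
        random.Random(seed).shuffle(records)
        return records
    if strategy == "by":
        fields = [field] if isinstance(field, str) else list(field)
        if isinstance(ascending, bool):
            flags = [ascending] * len(fields)
        else:
            flags = list(ascending)
        if len(flags) != len(fields):
            raise ValueError("ascending must match the length of field")
        # Stable sorts applied from the last key to the first give a
        # multi-key ordering with a separate direction per key.
        for name, asc in reversed(list(zip(fields, flags))):
            records.sort(key=lambda r: r[name], reverse=not asc)
        return records
    raise ValueError(f"unknown strategy: {strategy}")
```

Sorting by ['timestamp', 'price'] with ascending=[True, False] orders records by increasing timestamp and, within equal timestamps, by decreasing price.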
-
set_splitting
(strategy='none', **kwargs)
Set the splitting method.
- Parameters
strategy (str) – Either none, by_ratio, by_value or loo.
ratios (list of float) – The dataset will be split into len(ratios) parts.
field (str) – Split by the values of this field.
values (list of float or float) – The dataset will be split into len(values) + 1 parts. The first part contains interactions whose field value is in (*, values[0]].
ascending (bool) – Order of values after splitting.
Example
>>> es.leave_one_out()
>>> es.split_by_ratio(ratios=[0.8, 0.1, 0.1])
>>> es.split_by_value(field='month', values=[6, 7], ascending=False)  # (*, 7], (7, 6], (6, *)
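Value-based splitting assigns each record to an interval defined by the boundary values. The sketch below covers only the ascending case, with intervals (*, values[0]], (values[0], values[1]], and so on; the descending case is not modelled. `value_split` is a hypothetical helper, not a RecBole function.

```python
from bisect import bisect_left


def value_split(records, field, values):
    """Split dict records into len(values) + 1 parts by `field`.

    Ascending case only; illustrative sketch.
    """
    if isinstance(values, (int, float)):
        values = [values]
    bounds = sorted(values)
    parts = [[] for _ in range(len(bounds) + 1)]
    for record in records:
        # bisect_left places a value equal to a boundary in the lower part,
        # which yields right-closed intervals such as (*, bounds[0]].
        parts[bisect_left(bounds, record[field])].append(record)
    return parts
```

For months 5 to 8 and values=[6, 7], months 5 and 6 fall into the first part, month 7 into the second, and month 8 into the third.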
-
sort_by
(field, ascending=None)
Set sorting.
Similar to pandas' sort_values.
- Parameters
field (str or list of str) – Name or list of names.
ascending (bool or list of bool) – Sort ascending vs. descending. Specify a list for multiple sort orders; a list of bools must match the length of field.
-
split_by_ratio
(ratios)
Set Ratio-based Splitting.
- Parameters
ratios (list of float) – Ratio of each part. The values need not be normalized: [0.8, 0.1, 0.1], [8, 1, 1] and [56, 7, 7] are all accepted.
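The statement that ratios need not be normalized can be checked with a small sketch: dividing each ratio by the sum makes [0.8, 0.1, 0.1], [8, 1, 1] and [56, 7, 7] equivalent. `ratio_split` is a hypothetical helper, and its rounding of part boundaries is an assumption, not RecBole's exact behaviour.

```python
def ratio_split(records, ratios):
    """Split `records` into len(ratios) consecutive parts (illustrative only)."""
    total = sum(ratios)
    n = len(records)
    parts, start, running = [], 0, 0.0
    for ratio in ratios[:-1]:
        running += ratio
        # Normalize by the total so unnormalized ratios behave the same.
        end = round(running / total * n)
        parts.append(records[start:end])
        start = end
    # The last part takes whatever remains, so no record is dropped.
    parts.append(records[start:])
    return parts
```

All three ratio specifications split ten records into parts of sizes 8, 1 and 1.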