recbole.config.eval_setting

class recbole.config.eval_setting.EvalSetting(config)

Bases: object

Class containing settings about model evaluation.

Evaluation setting contains four parts:

Group
Sort
Split
Negative Sample

APIs are provided for users to set up or modify their evaluation setting easily and clearly.

Several presets are also provided, and using them is the recommended approach.

For example:

RO: Random Ordering
TO: Temporal Ordering

RS: Ratio-based Splitting
LS: Leave-one-out Splitting

full: adopt the entire item set (excluding ground-truth items) for ranking
uniXX: uniformly sample XX items during negative sampling
popXX: sample XX items by popularity during negative sampling

Note that records are grouped by user_id by default if you use these presets.

For example, RO_RS, full stands for: shuffle the records, group them by user, split them by ratio, and evaluate against all non-ground-truth items.
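As a rough illustration of how a preset string decomposes into the parts listed above, the sketch below uses a hypothetical helper, describe_preset, which is not part of RecBole. It relies only on the abbreviations documented here and assumes the string has the form "<ordering>_<splitting>, <sampling>".

```python
import re

# Abbreviation tables taken from the preset list above.
ORDERING = {"RO": "random ordering", "TO": "temporal ordering"}
SPLITTING = {"RS": "ratio-based splitting", "LS": "leave-one-out splitting"}


def describe_preset(preset: str) -> dict:
    """Hypothetical helper: split e.g. 'RO_RS, full' into its documented parts."""
    split_part, sample_part = [p.strip() for p in preset.split(",")]
    order_key, split_key = split_part.split("_")
    result = {
        "group": "user_id",  # presets group records by user by default
        "order": ORDERING[order_key],
        "split": SPLITTING[split_key],
    }
    if sample_part == "full":
        result["negative_sampling"] = "rank all non-ground-truth items"
    else:
        match = re.fullmatch(r"(uni|pop)(\d+)", sample_part)
        if match is None:
            raise ValueError(f"unknown sampling preset: {sample_part!r}")
        kind = "uniform" if match.group(1) == "uni" else "popularity"
        result["negative_sampling"] = f"{kind}, {match.group(2)} negatives per positive"
    return result
```

Calling describe_preset("TO_LS, pop100") would describe temporal ordering, leave-one-out splitting, and popularity-based sampling of 100 negatives.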

For details about the evaluation presets, see Wayne Xin Zhao et al., "Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms", CIKM 2020.

Parameters: config (Config) – Global configuration object.

group_field

Field to group records by before splitting; if None, records are not grouped. Records are usually grouped by user ID.

Type: str or None

ordering_args

Arguments for ordering. Records are usually sorted by timestamp or shuffled.

Type: dict

split_args

Arguments for splitting. Records are usually split by ratio (e.g. 8:1:1) or by the ‘leave one out’ strategy, in which the last purchase record of each user is held out for evaluation.

Type: dict

neg_sample_args

Arguments for negative sampling. Negative sampling is widely used in both training and evaluation.

We provide two strategies:

neg_sample_by: sample several negative records for each positive record.
full_sort: do not sample negatives; all unused items are used for evaluation.

Type: dict

RO_LS(leave_one_num=1, group_by_user=True)

Preset for Random Ordering and Leave-one-out Splitting.

Parameters

leave_one_num (int) – number of sub datasets for evaluation. E.g. leave_one_num=2 if you have one validation dataset and one test dataset.
group_by_user (bool) – set group field to user_id if True

RO_RS(ratios=(0.8, 0.1, 0.1), group_by_user=True)

Preset for Random Ordering and Ratio-based Splitting.

Parameters

ratios (list of float) – ratio of each part. Ratios need not be normalized: [0.8, 0.1, 0.1], [8, 1, 1] and [56, 7, 7] are all accepted.
group_by_user (bool) – set group field to user_id if True

TO_LS(leave_one_num=1, group_by_user=True)

Preset for Temporal Ordering and Leave-one-out Splitting.

Parameters

leave_one_num (int) – number of sub datasets for evaluation. E.g. leave_one_num=2 if you have one validation dataset and one test dataset.
group_by_user (bool) – set group field to user_id if True

TO_RS(ratios=(0.8, 0.1, 0.1), group_by_user=True)

Preset for Temporal Ordering and Ratio-based Splitting.

Parameters

ratios (list of float) – ratio of each part. Ratios need not be normalized: [0.8, 0.1, 0.1], [8, 1, 1] and [56, 7, 7] are all accepted.
group_by_user (bool) – set group field to user_id if True

full(): Preset that adopts the entire item set (excluding ground-truth items) for ranking.
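To make the idea of full ranking concrete, here is a minimal sketch of how the candidate set could be formed for one user. The function name full_ranking_candidates and its arguments are hypothetical and are not RecBole's internals.

```python
def full_ranking_candidates(all_items, ground_truth):
    """Illustrative only: every item except the user's ground-truth items."""
    ground_truth = set(ground_truth)
    return [item for item in all_items if item not in ground_truth]
```

Under this reading, no sampling takes place: each ground-truth item is ranked against every remaining item.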

group_by(field=None)

Sets the field used for grouping.

Parameters: field (str) – Dataset field to group by. Defaults to None (no grouping).

Example

>>> es.group_by('month')
>>> es.group_by_user()
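Conceptually, grouping partitions the records by the value of one field before any splitting is applied. The following sketch shows that idea on plain dictionaries; group_records is a hypothetical helper, and the record layout is an assumption made for illustration.

```python
from collections import defaultdict


def group_records(records, field=None):
    """Illustrative grouping: one bucket per distinct value of `field`."""
    if field is None:
        return {None: list(records)}  # no grouping: a single bucket
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)
```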

group_by_user(): Group by user.

Note

Requires USER_ID_FIELD in config

leave_one_out(leave_one_num=1)

Sets the ‘leave-one-out’ splitting strategy.

Note

Requires a group field to be set.

Parameters: leave_one_num (int) – number of sub datasets for evaluation. E.g. leave_one_num = 2 if you have one validation dataset and one test dataset.
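The leave-one-out idea can be sketched in a few lines, assuming records are already grouped by user and ordered. The helper below is hypothetical; in particular, keeping users with too few records entirely in the training part is an assumption of this sketch, not documented behavior.

```python
def leave_one_out(records_by_user, leave_one_num=1):
    """Illustrative split: hold out the last `leave_one_num` records per user.

    Returns `leave_one_num + 1` lists: the training part first, then one list
    per held-out position (e.g. validation and test when leave_one_num=2).
    """
    parts = [[] for _ in range(leave_one_num + 1)]
    for user, records in records_by_user.items():
        if len(records) <= leave_one_num:
            # Assumption: users with too few records stay entirely in train.
            parts[0].extend((user, r) for r in records)
            continue
        cut = len(records) - leave_one_num
        parts[0].extend((user, r) for r in records[:cut])
        for offset, record in enumerate(records[cut:], start=1):
            parts[offset].append((user, record))
    return parts
```

With leave_one_num=2, the second-to-last record of each user goes to the first held-out part and the last record to the second.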

neg_sample_by(by, distribution='uniform')

Sets negative sampling to the ‘by’ strategy: several negative records are sampled for each positive record.

Parameters

by (int) – Number of negative cases sampled for each positive case.
distribution (str) – distribution of sampler, either uniform or popularity.
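The difference between the two distributions can be illustrated with the standard library alone. The sketch below is not RecBole's sampler: sample_negatives, item_counts and seed are hypothetical names, popularity is assumed to mean the item's interaction count, and sampling is done with replacement for brevity.

```python
import random


def sample_negatives(user_positives, all_items, by, distribution="uniform",
                     item_counts=None, seed=0):
    """Illustrative sampler: draw `by` negatives for each positive item."""
    rng = random.Random(seed)
    positives = set(user_positives)
    candidates = [item for item in all_items if item not in positives]
    if distribution == "uniform":
        weights = None  # every candidate is equally likely
    elif distribution == "popularity":
        # Assumption: an item's popularity is its interaction count.
        weights = [item_counts[item] for item in candidates]
    else:
        raise ValueError(f"unknown distribution: {distribution!r}")
    # Sampling with replacement keeps the sketch short.
    return {pos: rng.choices(candidates, weights=weights, k=by)
            for pos in user_positives}
```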

pop100(): Preset that samples 100 negative items for each positive record using popularity-based sampling.

pop1000(): Preset that samples 1000 negative items for each positive record using popularity-based sampling.

random_ordering(): Shuffle setting.

set_neg_sampling(strategy='none', distribution='uniform', **kwargs)

Sets the negative sampling strategy.

Parameters

strategy (str) – Either none, full or by.
by (int) – Number of negative cases sampled for each positive case.
distribution (str) – distribution of sampler, either ‘uniform’ or ‘popularity’.

Example

>>> es.full()
>>> es.neg_sample_by(1)

set_ordering(strategy='none', **kwargs)

Sets the ordering strategy.

Parameters

strategy (str) – Either none, shuffle or by
field (str or list of str) – Name or list of names
ascending (bool or list of bool) – Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, its length must match the length of field.

Example

>>> es.set_ordering('shuffle')
>>> es.set_ordering('by', field='timestamp')
>>> es.set_ordering('by', field=['timestamp', 'price'], ascending=[True, False])

or

>>> es.random_ordering()
>>> es.sort_by('timestamp') # ascending default
>>> es.sort_by(field=['timestamp', 'price'], ascending=[True, False])
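To show what a multi-field ordering with mixed directions means in practice, here is a small pure-Python sketch. The helper sort_records is hypothetical and operates on plain dictionaries; it relies on Python's sort being stable, so sorting by the least significant key first yields the combined order.

```python
def sort_records(records, field, ascending=True):
    """Illustrative multi-key sort using repeated stable sorts."""
    fields = [field] if isinstance(field, str) else list(field)
    if isinstance(ascending, bool):
        orders = [ascending] * len(fields)
    else:
        orders = list(ascending)
    if len(orders) != len(fields):
        raise ValueError("ascending must match the length of field")
    result = list(records)
    # Sort by the least significant key first; Python's sort is stable.
    for name, asc in reversed(list(zip(fields, orders))):
        result.sort(key=lambda r: r[name], reverse=not asc)
    return result
```

For field=['timestamp', 'price'] and ascending=[True, False], records are ordered by increasing timestamp, and ties are broken by decreasing price.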

set_splitting(strategy='none', **kwargs)

Sets the splitting method.

Parameters

strategy (str) – Either none, by_ratio, by_value or loo.
ratios (list of float) – Dataset will be split into len(ratios) parts.
field (str) – Split by values of field.
values (list of float or float) – Dataset will be split into len(values) + 1 parts. The first part contains the interactions whose field value lies in (*, values[0]].
ascending (bool) – Order of values after splitting.

Example

>>> es.leave_one_out()
>>> es.split_by_ratio(ratios=[0.8, 0.1, 0.1])
>>> es.split_by_value(field='month', values=[6, 7], ascending=False)    # (*, 7], (7, 6], (6, *)
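The interval semantics of value-based splitting can be sketched for the ascending case as follows. The helper below is hypothetical and covers only ascending boundaries; it assumes each boundary is the inclusive upper end of its part, as described for the first part above.

```python
from bisect import bisect_left


def split_by_value(records, field, values):
    """Illustrative split into len(values) + 1 parts, ascending boundaries.

    Part 0 holds records whose field value lies in (*, values[0]], part 1
    those in (values[0], values[1]], and so on; the last part is open-ended.
    """
    if not isinstance(values, (list, tuple)):
        values = [values]  # a single float is treated as one boundary
    bounds = sorted(values)
    parts = [[] for _ in range(len(bounds) + 1)]
    for record in records:
        # bisect_left places a value equal to a boundary in the lower part.
        parts[bisect_left(bounds, record[field])].append(record)
    return parts
```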

sort_by(field, ascending=None)

Sets the sort order.

Similar to pandas’ sort_values.

Parameters

field (str or list of str) – Name or list of names
ascending (bool or list of bool) – Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, its length must match the length of field.

split_by_ratio(ratios)

Sets ratio-based splitting.

Parameters: ratios (list of float) – ratio of each part. Ratios need not be normalized: [0.8, 0.1, 0.1], [8, 1, 1] and [56, 7, 7] are all accepted.
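Why unnormalized ratios work can be seen from a short sketch that divides each cumulative ratio by the total. The function below is a hypothetical stand-alone helper operating on a plain list; the rounding of part boundaries is an assumption and may differ from the library's behavior.

```python
def split_by_ratio(records, ratios):
    """Illustrative ratio split; ratios are normalized by their sum."""
    total = sum(ratios)
    n = len(records)
    parts, start, cumulative = [], 0, 0
    for ratio in ratios:
        cumulative += ratio
        # Boundary of this part as a rounded share of all records.
        end = round(n * cumulative / total)
        parts.append(records[start:end])
        start = end
    return parts
```

With ten records, each of the three ratio lists above yields parts of sizes 8, 1 and 1.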

temporal_ordering(): Sort by timestamp.

Note

Requires TIME_FIELD in config

uni100(): Preset that samples 100 negative items for each positive record using uniform sampling.

uni1000(): Preset that samples 1000 negative items for each positive record using uniform sampling.