recbole.config.eval_setting

class recbole.config.eval_setting.EvalSetting(config)[source]

Bases: object

Class containing settings about model evaluation.

Evaluation setting contains four parts:
  • Group

  • Sort

  • Split

  • Negative Sample

APIs are provided for users to set up or modify their evaluation setting easily and clearly.

Several presets are also provided, and using them is the recommended approach.

For example:

  • RO: Random Ordering

  • TO: Temporal Ordering

  • RS: Ratio-based Splitting

  • LS: Leave-one-out Splitting

  • full: adopt the entire item set (excluding ground-truth items) for ranking

  • uniXX: uniformly sample XX items during negative sampling

  • popXX: sample XX items by popularity during negative sampling

Note that records are grouped by user_id by default if you use these presets.

Thus RO_RS, full stands for: shuffle the records, group them by user, split by ratio, and evaluate on all non-ground-truth items.

For details about the preset evaluation settings, see "Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms", Wayne Xin Zhao et al., CIKM 2020.
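As a reading aid, the naming scheme above can be decoded mechanically. The sketch below is not part of RecBole: `describe_preset` is a hypothetical helper, written under the assumption that a preset string follows the `<ordering>_<splitting>, <negative sampling>` pattern described above.

```python
import re

ORDERINGS = {"RO": "random ordering", "TO": "temporal ordering"}
SPLITS = {"RS": "ratio-based splitting", "LS": "leave-one-out splitting"}


def describe_preset(preset):
    """Decode a preset string such as 'RO_RS, full' (hypothetical helper)."""
    order_split, sampling = [part.strip() for part in preset.split(",")]
    ordering, splitting = order_split.split("_")

    if sampling == "full":
        negatives = "all non-ground-truth items"
    else:
        # uniXX / popXX: XX sampled negatives per positive record.
        match = re.fullmatch(r"(uni|pop)(\d+)", sampling)
        if match is None:
            raise ValueError(f"unknown sampling preset: {sampling!r}")
        kind = "uniform" if match.group(1) == "uni" else "popularity-based"
        negatives = f"{match.group(2)} {kind} negatives per positive"

    return {
        "group_by": "user_id",  # presets group records by user by default
        "ordering": ORDERINGS[ordering],
        "splitting": SPLITS[splitting],
        "negatives": negatives,
    }
```

For instance, `describe_preset("TO_LS, uni100")` reports temporal ordering, leave-one-out splitting and 100 uniform negatives per positive.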

Parameters

config (Config) – Global configuration object.

group_field

If None, records are not grouped; otherwise they are grouped by this field before splitting. Records are usually grouped by user id.

Type

str or None

ordering_args

Args about ordering. Usually records are sorted by timestamp, or shuffled.

Type

dict

split_args

Args about splitting. Records are usually split by ratio (e.g. 8:1:1) or by the ‘leave one out’ strategy, in which the last purchase record of each user is used for evaluation.

Type

dict

neg_sample_args

Args about negative sampling. Negative sampling is widely used in both training and evaluation.

We provide two strategies:

  • neg_sample_by: sample several negative records for each positive record.

  • full_sort: no negative sampling; all unused items are used for evaluation.

Type

dict

RO_LS(leave_one_num=1, group_by_user=True)[source]

Preset about Random Ordering and Leave-one-out Splitting.

Parameters
  • leave_one_num (int) – number of sub datasets for evaluation. E.g. leave_one_num=2 if you have one validation dataset and one test dataset.

  • group_by_user (bool) – set group field to user_id if True

RO_RS(ratios=(0.8, 0.1, 0.1), group_by_user=True)[source]

Preset about Random Ordering and Ratio-based Splitting.

Parameters
  • ratios (list of float) – ratio of each part. No need to normalize. It’s ok with either [0.8, 0.1, 0.1], [8, 1, 1] or [56, 7, 7]

  • group_by_user (bool) – set group field to user_id if True

TO_LS(leave_one_num=1, group_by_user=True)[source]

Preset about Temporal Ordering and Leave-one-out Splitting.

Parameters
  • leave_one_num (int) – number of sub datasets for evaluation. E.g. leave_one_num=2 if you have one validation dataset and one test dataset.

  • group_by_user (bool) – set group field to user_id if True

TO_RS(ratios=(0.8, 0.1, 0.1), group_by_user=True)[source]

Preset about Temporal Ordering and Ratio-based Splitting.

Parameters
  • ratios (list of float) – ratio of each part. No need to normalize. It’s ok with either [0.8, 0.1, 0.1], [8, 1, 1] or [56, 7, 7]

  • group_by_user (bool) – set group field to user_id if True

full()[source]

Preset that adopts the entire item set (excluding ground-truth items) for ranking.
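To make "entire item set excluding ground-truth items" concrete, here is a minimal sketch of which items end up being ranked as negatives. It is an illustration only, not RecBole's implementation; `full_ranking_candidates` and its arguments are hypothetical names.

```python
def full_ranking_candidates(all_items, ground_truth):
    """Return the items ranked as negatives under full ranking (illustrative)."""
    ground_truth = set(ground_truth)
    # Every item that is not a ground-truth item is kept as a negative.
    return [item for item in all_items if item not in ground_truth]


negatives = full_ranking_candidates(all_items=range(1, 7), ground_truth=[2, 5])
```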

group_by(field=None)[source]

Setting about group

Parameters

field (str) – The dataset field to group by. Defaults to None (no grouping).

Example

>>> es.group_by('month')
>>> es.group_by_user()

group_by_user()[source]

Group by user

Note

Requires USER_ID_FIELD in config

leave_one_out(leave_one_num=1)[source]

Setting about Splitting by ‘leave-one-out’ strategy.

Note

Requires setting group by.

Parameters

leave_one_num (int) – number of sub datasets for evaluation. E.g. leave_one_num = 2 if you have one validation dataset and one test dataset.
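The effect of leave_one_num can be sketched in plain Python. This is a simplified illustration under stated assumptions, not RecBole's code: records are `(user, item)` tuples that are already ordered, grouping is by user, and every user has more than leave_one_num records. The name `leave_one_out_split` is hypothetical.

```python
from collections import defaultdict


def leave_one_out_split(records, leave_one_num=1):
    """Split ordered (user, item) records into 1 + leave_one_num parts."""
    by_user = defaultdict(list)
    for user, item in records:
        by_user[user].append((user, item))

    # parts[0] is the training part; parts[1:] are the held-out parts.
    parts = [[] for _ in range(leave_one_num + 1)]
    for user, user_records in by_user.items():
        if len(user_records) <= leave_one_num:
            # Assumption of this sketch: short histories are rejected.
            raise ValueError(f"user {user!r} has too few records")
        train_end = len(user_records) - leave_one_num
        parts[0].extend(user_records[:train_end])
        # Each of the last leave_one_num records goes to its own held-out part.
        for k, record in enumerate(user_records[train_end:], start=1):
            parts[k].append(record)
    return parts


records = [("u1", "a"), ("u1", "b"), ("u1", "c"),
           ("u2", "x"), ("u2", "y"), ("u2", "z")]
train, valid, test = leave_one_out_split(records, leave_one_num=2)
```

With leave_one_num=2, each user's last record lands in the test part and the second-to-last in the validation part.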

neg_sample_by(by, distribution='uniform')[source]

Setting for negative sampling with the by strategy: several negative records are sampled for each positive record.

Parameters
  • by (int) – The number of neg cases for one pos case.

  • distribution (str) – distribution of sampler, either uniform or popularity.
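The difference between the two distributions can be illustrated with the standard library. This is a rough sketch rather than RecBole's sampler: `sample_negatives`, `item_counts` and `seed` are hypothetical names, negatives are drawn with replacement, and all of the user's positive items are excluded from the candidates.

```python
import random
from collections import Counter


def sample_negatives(positives, all_items, by, distribution="uniform",
                     item_counts=None, seed=0):
    """Draw `by` negative items for each positive item of one user."""
    rng = random.Random(seed)
    positive_set = set(positives)
    candidates = [item for item in all_items if item not in positive_set]
    if distribution == "uniform":
        weights = None  # every candidate is equally likely
    elif distribution == "popularity":
        # Weight each candidate by how often it occurs in the interactions.
        weights = [item_counts[item] for item in candidates]
    else:
        raise ValueError(f"unknown distribution: {distribution!r}")
    # One list of `by` negatives per positive item (with replacement).
    return {pos: rng.choices(candidates, weights=weights, k=by)
            for pos in positives}


item_counts = Counter(["a", "b", "b", "c", "c", "c", "d"])
negatives = sample_negatives(["a"], ["a", "b", "c", "d"], by=3,
                             distribution="popularity",
                             item_counts=item_counts)
```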

pop100()[source]

Preset for popularity-based negative sampling of 100 items for each positive record.

pop1000()[source]

Preset for popularity-based negative sampling of 1000 items for each positive record.

random_ordering()[source]

Shuffle Setting

set_neg_sampling(strategy='none', distribution='uniform', **kwargs)[source]

Setting about negative sampling

Parameters
  • strategy (str) – Either none, full or by.

  • by (int) – Number of negative cases sampled for each positive case (used with the by strategy).

  • distribution (str) – distribution of sampler, either ‘uniform’ or ‘popularity’.

Example

>>> es.full()
>>> es.neg_sample_by(1)

set_ordering(strategy='none', **kwargs)[source]

Setting about ordering

Parameters
  • strategy (str) – Either none, shuffle or by

  • field (str or list of str) – Name or list of names

  • ascending (bool or list of bool) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the field

Example

>>> es.set_ordering('shuffle')
>>> es.set_ordering('by', field='timestamp')
>>> es.set_ordering('by', field=['timestamp', 'price'], ascending=[True, False])

or

>>> es.random_ordering()
>>> es.sort_by('timestamp') # ascending default
>>> es.sort_by(field=['timestamp', 'price'], ascending=[True, False])

set_splitting(strategy='none', **kwargs)[source]

Setting about split method

Parameters
  • strategy (str) – Either none, by_ratio, by_value or loo.

  • ratios (list of float) – Dataset will be split into len(ratios) parts.

  • field (str) – Split by values of field.

  • values (list of float or float) – Dataset will be split into len(values) + 1 parts. The first part contains the interactions whose field value lies in (*, values[0]].

  • ascending (bool) – Order of values after splitting.
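The interval rule for by_value splitting can be sketched with `bisect`. The helper below is an illustration only and covers just the ascending case; `split_by_thresholds` is a hypothetical name, and records are assumed to be plain dicts.

```python
from bisect import bisect_left


def split_by_thresholds(records, field, values):
    """Split dict records into len(values) + 1 parts by thresholds on `field`."""
    thresholds = sorted(values)
    parts = [[] for _ in range(len(thresholds) + 1)]
    for record in records:
        # bisect_left counts the thresholds strictly below the value, so a
        # value equal to thresholds[i] falls into part i: (..., thresholds[i]].
        parts[bisect_left(thresholds, record[field])].append(record)
    return parts


records = [{"month": m} for m in (5, 6, 7, 8)]
first, second, third = split_by_thresholds(records, "month", values=[6, 7])
```

Here the three parts cover the intervals (*, 6], (6, 7] and (7, *).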

Example

>>> es.leave_one_out()
>>> es.split_by_ratio(ratios=[0.8, 0.1, 0.1])
>>> es.split_by_value(field='month', values=[6, 7], ascending=False)    # (*, 7], (7, 6], (6, *)

sort_by(field, ascending=None)[source]

Setting about Sorting.

Similar to pandas’ sort_values.

Parameters
  • field (str or list of str) – Name or list of names

  • ascending (bool or list of bool) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the field
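For readers unfamiliar with multi-key sorting, the semantics of a list-valued field and ascending can be sketched without pandas. This is an illustrative stand-in, not the library's implementation: `sort_records` is a hypothetical helper operating on plain dicts, and it defaults ascending to True.

```python
def sort_records(records, field, ascending=True):
    """Sort dict records by one or several fields (illustrative sketch)."""
    fields = [field] if isinstance(field, str) else list(field)
    if isinstance(ascending, bool):
        orders = [ascending] * len(fields)
    else:
        orders = list(ascending)
    if len(fields) != len(orders):
        raise ValueError("ascending must match the length of field")

    result = list(records)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-key ordering.
    for name, asc in reversed(list(zip(fields, orders))):
        result.sort(key=lambda record: record[name], reverse=not asc)
    return result


rows = [
    {"timestamp": 2, "price": 10},
    {"timestamp": 1, "price": 5},
    {"timestamp": 1, "price": 9},
]
ordered = sort_records(rows, ["timestamp", "price"], ascending=[True, False])
```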

split_by_ratio(ratios)[source]

Setting about Ratio-based Splitting.

Parameters

ratios (list of float) – ratio of each part. No need to normalize. It’s ok with either [0.8, 0.1, 0.1], [8, 1, 1] or [56, 7, 7]
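Why normalization is unnecessary can be seen from a small sketch: only the proportions between the entries matter. The code below is an assumption-laden illustration, not RecBole's splitting code; `ratio_split` is a hypothetical name, and the boundary rounding rule is a choice made for this sketch.

```python
def ratio_split(records, ratios):
    """Split a list into len(ratios) consecutive parts proportional to ratios."""
    total = sum(ratios)
    n = len(records)
    parts, start, cumulative = [], 0, 0.0
    for ratio in ratios:
        cumulative += ratio
        # Boundary position of this part; dividing by total normalizes ratios.
        end = round(n * cumulative / total)
        parts.append(records[start:end])
        start = end
    return parts


records = list(range(10))
sizes = [
    [len(part) for part in ratio_split(records, ratios)]
    for ratios in ([0.8, 0.1, 0.1], [8, 1, 1], [56, 7, 7])
]
```

All three ratio lists produce parts of the same sizes for this 10-record example.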

temporal_ordering()[source]

Setting about Sorting by timestamp.

Note

Requires TIME_FIELD in config

uni100()[source]

Preset for uniform negative sampling of 100 items for each positive record.

uni1000()[source]

Preset for uniform negative sampling of 1000 items for each positive record.