Config Settings

RecBole provides configurable parameters that control the experiment setup (e.g., data processing, data splitting, training and evaluation). Users can select the settings that suit their own requirements.

The different parameter configurations are introduced as follows:

Parameters Introduction

The parameters in RecBole can be divided into three categories: Basic Parameters, Dataset Parameters and Model Parameters.

Basic Parameters

Basic Parameters are used to build the general environment, including the settings for model training and evaluation.

Environment Setting

  • gpu_id (int or str) : The id of GPU device. Defaults to 0.

  • use_gpu (bool) : Whether or not to use GPU. If True, the GPU is used; otherwise the CPU is used. Defaults to True.

  • seed (int) : Random seed. Defaults to 2020.

  • state (str) : Logging level. Defaults to 'INFO'. Range in ['INFO', 'DEBUG', 'WARNING', 'ERROR', 'CRITICAL'].

  • reproducibility (bool) : If True, the tool will use deterministic convolution algorithms, which makes the results reproducible. If False, the tool will benchmark multiple convolution algorithms and select the fastest one, which makes the results non-reproducible but can speed up model training in some cases. Defaults to True.

  • data_path (str) : The path of input dataset. Defaults to 'dataset/'.

  • checkpoint_dir (str) : The path to save checkpoint file. Defaults to 'saved/'.
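
As an illustration of the state parameter, the following minimal sketch maps the string to a level constant of Python's standard logging module. The helper name resolve_log_level is hypothetical and not part of RecBole; RecBole's own logger setup may differ.

```python
import logging

# The values listed for `state` above.
VALID_STATES = ('INFO', 'DEBUG', 'WARNING', 'ERROR', 'CRITICAL')


def resolve_log_level(state='INFO'):
    """Hypothetical helper: translate a `state` string into a logging level."""
    name = state.upper()
    if name not in VALID_STATES:
        raise ValueError(f'state must be one of {VALID_STATES}, got {state!r}')
    # Each valid name is an integer constant on the logging module.
    return getattr(logging, name)


# Configure the root logger so that DEBUG messages are emitted.
logging.basicConfig(level=resolve_log_level('DEBUG'))
logging.getLogger(__name__).debug('debug messages are now visible')
```

Rejecting unknown names early gives a clearer error than silently falling back to a default level.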

Training Setting

  • epochs (int) : The number of training epochs. Defaults to 300.

  • train_batch_size (int) : The training batch size. Defaults to 2048.

  • learner (str) : The name of the optimizer to use. Defaults to 'adam'. Range in ['adam', 'sgd', 'adagrad', 'rmsprop', 'sparse_adam'].

  • learning_rate (float) : Learning rate. Defaults to 0.001.

  • training_neg_sample_num (int) : The number of negative samples during training. If it is set to 0, the negative sampling operation will not be performed. Defaults to 1.

  • training_neg_sample_distribution (str) : Distribution of the negative items in the training phase. Defaults to 'uniform'. Range in ['uniform', 'popularity'].

  • eval_step (int) : The number of training epochs between evaluations on the validation dataset. If it is less than 1, the model will not be evaluated on the validation dataset. Defaults to 1.

  • stopping_step (int) : The threshold for validation-based early stopping, i.e. the number of evaluations without improvement that is tolerated before training stops. Defaults to 10.

  • clip_grad_norm (dict) : The arguments passed to clip_grad_norm_, which clips the gradient norm of the model. Defaults to None.
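
The interaction between epochs, eval_step and stopping_step can be sketched as a plain training loop. This is a minimal sketch under stated assumptions, not RecBole's trainer: the function train_with_early_stopping and the validate callback are hypothetical, a larger validation score is assumed to be better, and training is assumed to stop once the score has failed to improve for more than stopping_step evaluations in a row.

```python
def train_with_early_stopping(validate, epochs=300, eval_step=1, stopping_step=10):
    """Hypothetical loop; returns (best_epoch, last_epoch_run)."""
    best_score, best_epoch = float('-inf'), None
    stale_evals = 0  # consecutive evaluations without improvement
    for epoch in range(epochs):
        # ... one epoch of training would run here ...
        if eval_step < 1 or (epoch + 1) % eval_step != 0:
            continue  # no validation in this epoch
        score = validate(epoch)
        if score > best_score:
            best_score, best_epoch = score, epoch
            stale_evals = 0
        else:
            stale_evals += 1
            if stale_evals > stopping_step:
                return best_epoch, epoch  # early stop
    return best_epoch, epochs - 1


# Toy validation curve that peaks at epoch 5 and then degrades.
print(train_with_early_stopping(lambda e: -abs(e - 5), stopping_step=3))  # (5, 9)
```

With eval_step below 1 the loop never validates, so early stopping cannot trigger and all epochs are run.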

Evaluation Setting

  • eval_setting (str): The evaluation settings. Defaults to 'RO_RS,full'. The parameter has two parts separated by a comma. The first part controls the splitting method; its range is ['RO_RS', 'TO_LS', 'RO_LS', 'TO_RS']. The second part (optional) controls the ranking mechanism; its range is ['full', 'uni100', 'uni1000'].

  • group_by_user (bool): Whether or not to group the users. It must be True when eval_setting is in ['RO_LS', 'TO_LS']. Defaults to True.

  • split_ratio (list): The split ratio between train data, valid data and test data. It only takes effect when the first part of eval_setting is in ['RO_RS', 'TO_RS']. Defaults to [0.8, 0.1, 0.1].

  • leave_one_num (int): It only takes effect when the first part of eval_setting is in ['RO_LS', 'TO_LS']. Defaults to 2.

  • metrics (list or str): Evaluation metrics. Defaults to ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']. Range in ['Recall', 'MRR', 'NDCG', 'Hit', 'MAP', 'Precision', 'AUC', 'MAE', 'RMSE', 'LogLoss'].

  • topk (list or int or None): The value of k for topk evaluation metrics. Defaults to 10.

  • valid_metric (str): The evaluation metric used for early stopping. It must be one of the metrics in use. Defaults to 'MRR@10'.

  • eval_batch_size (int): The evaluation batch size. Defaults to 4096.

Please refer to Evaluation Support for more details about the parameters in Evaluation Setting.
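
To make the two-part format of eval_setting and the 'MRR@10' naming of valid_metric concrete, here is a minimal sketch. Both helpers (parse_eval_setting and expand_topk_metrics) are hypothetical names introduced for illustration and are not RecBole functions; how RecBole treats a missing second part is not covered here, so the sketch simply returns None for it.

```python
SPLIT_MODES = ('RO_RS', 'TO_LS', 'RO_LS', 'TO_RS')
RANK_MODES = ('full', 'uni100', 'uni1000')


def parse_eval_setting(eval_setting='RO_RS,full'):
    """Hypothetical helper: split eval_setting into (split_mode, rank_mode)."""
    parts = [part.strip() for part in eval_setting.split(',')]
    if len(parts) > 2:
        raise ValueError(f'expected at most two parts, got {eval_setting!r}')
    split_mode = parts[0]
    # The second part is optional; None marks it as not given.
    rank_mode = parts[1] if len(parts) == 2 else None
    if split_mode not in SPLIT_MODES:
        raise ValueError(f'unknown splitting method: {split_mode!r}')
    if rank_mode is not None and rank_mode not in RANK_MODES:
        raise ValueError(f'unknown ranking mechanism: {rank_mode!r}')
    return split_mode, rank_mode


def expand_topk_metrics(metrics, topk):
    """Hypothetical helper: build names such as 'MRR@10' for top-k metrics."""
    metrics = [metrics] if isinstance(metrics, str) else list(metrics)
    topk = [topk] if isinstance(topk, int) else list(topk)
    return [f'{metric}@{k}' for metric in metrics for k in topk]


print(parse_eval_setting('RO_RS,full'))            # ('RO_RS', 'full')
print(expand_topk_metrics(['Recall', 'MRR'], 10))  # ['Recall@10', 'MRR@10']
```

A valid_metric such as 'MRR@10' can then be checked for membership in the expanded list before training starts.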

Dataset Parameters

Dataset Parameters are used to describe the dataset information and control the dataset loading and filtering.

Please refer to Args for Data for more details.

Model Parameters

Model Parameters are used to describe the model structures.

Please refer to Model Introduction for more details.

Parameters Configuration

RecBole supports three types of parameter configurations: Config files, Parameter Dicts and Command Line. The parameters are assigned via the Configuration module.

Config Files

Config Files should be written in YAML format. Users write their parameters following the YAML syntax rules, and the configuration module processes the resulting files to complete the parameter settings.

To begin with, we write the parameters into the yaml files (e.g. example.yaml).

gpu_id: 1
train_batch_size: 1024

Then, the yaml files are passed to the configuration module to finish the parameter settings.

from recbole.config import Config

# Load the default settings, then apply the values from example.yaml.
config = Config(model='BPR', dataset='ml-100k', config_file_list=['example.yaml'])
print('gpu_id:', config['gpu_id'])
print('train_batch_size:', config['train_batch_size'])

output:

gpu_id: 1
train_batch_size: 1024

The parameter config_file_list supports multiple yaml files.

For more details on yaml, please refer to YAML.

When using our toolkit, we recommend writing the Dataset Parameters and the Evaluation Setting parameters of Basic Parameters into config files, which makes the configurations easier to reuse.

Parameter Dicts

A Parameter Dict is implemented with the dict data structure in Python, where the key is the parameter name and the value is the parameter value. Users can write their parameters into a dict and pass it to the configuration module.

An example is as follows:

from recbole.config import Config

# Keys are parameter names, values are the parameter values.
parameter_dict = {
    'gpu_id': 2,
    'train_batch_size': 512
}
config = Config(model='BPR', dataset='ml-100k', config_dict=parameter_dict)
print('gpu_id:', config['gpu_id'])
print('train_batch_size:', config['train_batch_size'])

output:

gpu_id: 2
train_batch_size: 512

Command Line

We can also assign parameters on the command line. The configuration module reads the parameters given on the command line. The format is: --parameter_name=[parameter_value].

Write the following code to a Python file (e.g. run.py):

from recbole.config import Config

# Command-line arguments are picked up by Config automatically.
config = Config(model='BPR', dataset='ml-100k')
print('gpu_id:', config['gpu_id'])
print('train_batch_size:', config['train_batch_size'])

Running:

python run.py --gpu_id=3 --train_batch_size=256

output:

gpu_id: 3
train_batch_size: 256
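
The --parameter_name=[parameter_value] format can be read with a few lines of standard-library Python. The following is a minimal sketch of that idea, not RecBole's actual parser: the function parse_cmd_args is hypothetical, and converting values with ast.literal_eval (falling back to the raw string) is an assumption made for this example.

```python
import ast


def parse_cmd_args(argv):
    """Hypothetical helper: collect --name=value arguments into a dict."""
    params = {}
    for arg in argv:
        if not arg.startswith('--') or '=' not in arg:
            continue  # ignore anything that is not --name=value
        name, raw_value = arg[2:].split('=', 1)
        try:
            # Turn '3' into 3, '0.01' into 0.01, 'True' into True, etc.
            params[name] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            params[name] = raw_value  # keep plain strings such as 'sgd'
    return params


print(parse_cmd_args(['--gpu_id=3', '--learning_rate=0.01', '--learner=sgd']))
# {'gpu_id': 3, 'learning_rate': 0.01, 'learner': 'sgd'}
```

In a script, the same helper could be called as parse_cmd_args(sys.argv[1:]) after importing sys.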

Priority

RecBole supports combining the three types of parameter configurations.

The priority of the configuration methods is: Command Line > Parameter Dicts > Config Files > Default Settings

An example is as follows:

example.yaml:

gpu_id: 1
train_batch_size: 1024

run.py:

from recbole.config import Config

parameter_dict = {
    'gpu_id': 2,
    'train_batch_size': 512
}
# All three sources set the same parameters; the command line wins.
config = Config(model='BPR', dataset='ml-100k', config_file_list=['example.yaml'], config_dict=parameter_dict)
print('gpu_id:', config['gpu_id'])
print('train_batch_size:', config['train_batch_size'])

Running:

python run.py --gpu_id=3 --train_batch_size=256

output:

gpu_id: 3
train_batch_size: 256
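
The priority order can be pictured as layered dictionary updates, where each later layer overwrites the earlier ones. This is only a minimal sketch of the rule, assuming each source has already been loaded into a plain dict; the function merge_config and its argument names are hypothetical, and RecBole's configuration module may implement the merge differently.

```python
def merge_config(defaults, file_params=None, dict_params=None, cmd_params=None):
    """Hypothetical helper: apply sources from lowest to highest priority."""
    merged = dict(defaults)
    # Later layers overwrite earlier ones:
    # Default Settings < Config Files < Parameter Dicts < Command Line.
    for layer in (file_params, dict_params, cmd_params):
        merged.update(layer or {})
    return merged


merged = merge_config(
    defaults={'gpu_id': 0, 'learning_rate': 0.001, 'epochs': 300},
    file_params={'gpu_id': 1, 'learning_rate': 0.01},
    dict_params={'gpu_id': 2},
    cmd_params={'gpu_id': 3},
)
print(merged)  # {'gpu_id': 3, 'learning_rate': 0.01, 'epochs': 300}
```

Here gpu_id comes from the command line, learning_rate from the config file (no higher-priority source sets it), and epochs keeps its default.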