Config Settings
===================
RecBole can configure different parameters for controlling the experiment
setup (e.g., data processing, data splitting, training and evaluation).
Users can select the settings according to their own requirements.

The parameter configurations are introduced as follows:

Parameters Introduction
-----------------------------
The parameters in RecBole can be divided into three categories:
Basic Parameters, Dataset Parameters and Model Parameters.

Basic Parameters
^^^^^^^^^^^^^^^^^^^^^^
Basic parameters are used to build the general environment, including the settings for
model training and evaluation.

**Environment Setting**

- ``gpu_id (int or str)`` : The id of the GPU device. Defaults to ``0``.
- ``use_gpu (bool)`` : Whether or not to use the GPU. If True, the GPU is used; otherwise the CPU is used. Defaults to ``True``.
- ``seed (int)`` : Random seed. Defaults to ``2020``.
- ``state (str)`` : Logging level. Defaults to ``'INFO'``. Range in ``['INFO', 'DEBUG', 'WARNING', 'ERROR', 'CRITICAL']``.
- ``reproducibility (bool)`` : If True, the tool will use deterministic convolution algorithms, which makes the result reproducible. If False, the tool will benchmark multiple convolution algorithms and select the fastest one, which makes the result not reproducible but can speed up model training in some cases. Defaults to ``True``.
- ``data_path (str)`` : The path of the input dataset. Defaults to ``'dataset/'``.
- ``checkpoint_dir (str)`` : The path where checkpoint files are saved. Defaults to ``'saved/'``.

**Training Setting**

- ``epochs (int)`` : The number of training epochs. Defaults to ``300``.
- ``train_batch_size (int)`` : The training batch size. Defaults to ``2048``.
- ``learner (str)`` : The name of the optimizer to use. Defaults to ``'adam'``. Range in ``['adam', 'sgd', 'adagrad', 'rmsprop', 'sparse_adam']``.
- ``learning_rate (float)`` : Learning rate. Defaults to ``0.001``.
- ``training_neg_sample_num (int)`` : The number of negative samples during training. If it is set to 0, negative sampling will not be performed. Defaults to ``1``.
- ``training_neg_sample_distribution (str)`` : Distribution of the negative items in the training phase. Defaults to ``'uniform'``. Range in ``['uniform', 'popularity']``.
- ``eval_step (int)`` : The number of training epochs before an evaluation on the valid dataset. If it is less than 1, the model will not be evaluated on the valid dataset. Defaults to ``1``.
- ``stopping_step (int)`` : The threshold for validation-based early stopping. Defaults to ``10``.
- ``clip_grad_norm (dict)`` : The arguments of PyTorch's ``clip_grad_norm_``, which clips the gradient norm of the model. Defaults to ``None``.

**Evaluation Setting**

- ``eval_setting (str)``: The evaluation settings. Defaults to ``'RO_RS,full'``. The parameter has two parts. The first part controls the splitting method, the range is ``['RO_RS','TO_LS','RO_LS','TO_RS']``. The second part (optional) controls the ranking mechanism, the range is ``['full','uni100','uni1000']``.
- ``group_by_user (bool)``: Whether or not to group the users. It must be ``True`` when the first part of ``eval_setting`` is in ``['RO_LS', 'TO_LS']``. Defaults to ``True``.
- ``split_ratio (list)``: The split ratio between train data, valid data and test data. It only takes effect when the first part of ``eval_setting`` is in ``['RO_RS', 'TO_RS']``. Defaults to ``[0.8, 0.1, 0.1]``.
- ``leave_one_num (int)``: It only takes effect when the first part of ``eval_setting`` is in ``['RO_LS', 'TO_LS']``. Defaults to ``2``.
- ``metrics (list or str)``: Evaluation metrics. Defaults to ``['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']``. Range in ``['Recall', 'MRR', 'NDCG', 'Hit', 'MAP', 'Precision', 'AUC', 'MAE', 'RMSE', 'LogLoss']``.
- ``topk (list or int or None)``: The value of k for top-k evaluation metrics. Defaults to ``10``.
- ``valid_metric (str)``: The evaluation metric used for early stopping. It must be one of the ``metrics`` in use. Defaults to ``'MRR@10'``.
- ``eval_batch_size (int)``: The evaluation batch size. Defaults to ``4096``.

Please refer to :doc:`evaluation_support` for more details about the parameters in Evaluation Setting.

Dataset Parameters
^^^^^^^^^^^^^^^^^^^^^^^
Dataset Parameters are used to describe the dataset information and control
the dataset loading and filtering.

Please refer to :doc:`data/data_args` for more details.

Model Parameters
^^^^^^^^^^^^^^^^^^^^^
Model Parameters are used to describe the model structures.

Please refer to :doc:`model_intro` for more details.

Parameters Configuration
------------------------------
RecBole supports three types of parameter configurations: Config Files,
Parameter Dicts and Command Line. The parameters are assigned via the
Configuration module.

Config Files
^^^^^^^^^^^^^^^^
Config files should be written in YAML format. Users write their parameters
following YAML syntax, and the configuration module then processes the
config files to complete the parameter settings.

To begin with, we write the parameters into a YAML file (e.g. `example.yaml`).

.. code:: yaml

    gpu_id: 1
    train_batch_size: 1024

Then, the YAML file is passed to the configuration module to finish the
parameter settings.

.. code:: python

    from recbole.config import Config

    # Load the parameters from example.yaml on top of the defaults.
    config = Config(model='BPR', dataset='ml-100k', config_file_list=['example.yaml'])
    print('gpu_id:', config['gpu_id'])
    print('train_batch_size:', config['train_batch_size'])

Output:

.. code:: bash

    gpu_id: 1
    train_batch_size: 1024

The parameter ``config_file_list`` supports multiple YAML files.

For more details on YAML, please refer to YAML_.

.. _YAML: https://yaml.org/

When using our toolkit, we recommend writing the **Dataset Parameters** and
the Evaluation Setting parameters of **Basic Parameters** into config files,
which makes the configurations easier to reuse.

Parameter Dicts
^^^^^^^^^^^^^^^^^^
A Parameter Dict is a Python dict in which each key is a parameter name and
each value is the corresponding parameter value. Users can write their
parameters into a dict and pass it to the configuration module.

An example is as follows:

.. code:: python

    from recbole.config import Config

    # Keys are parameter names, values are the settings to apply.
    parameter_dict = {
        'gpu_id': 2,
        'train_batch_size': 512
    }
    config = Config(model='BPR', dataset='ml-100k', config_dict=parameter_dict)
    print('gpu_id:', config['gpu_id'])
    print('train_batch_size:', config['train_batch_size'])

Output:

.. code:: bash

    gpu_id: 2
    train_batch_size: 512

Command Line
^^^^^^^^^^^^^^^^^^^^^^^^
We can also assign parameters on the command line. The configuration module
reads the parameters given on the command line.
The format is: ``--parameter_name=[parameter_value]``.

Write the following code to a Python file (e.g. `run.py`):

.. code:: python

    from recbole.config import Config

    # No file or dict is given; values come from the command line or the defaults.
    config = Config(model='BPR', dataset='ml-100k')
    print('gpu_id:', config['gpu_id'])
    print('train_batch_size:', config['train_batch_size'])

Running:

.. code:: bash

    python run.py --gpu_id=3 --train_batch_size=256

Output:

.. code:: bash

    gpu_id: 3
    train_batch_size: 256

Priority
^^^^^^^^^^^^^^^^^
RecBole supports combining the three types of parameter configurations.

The priority of the configuration methods is:
Command Line > Parameter Dicts > Config Files > Default Settings

An example is as follows:

`example.yaml`:

.. code:: yaml

    gpu_id: 1
    train_batch_size: 1024

`run.py`:

.. code:: python

    from recbole.config import Config

    parameter_dict = {
        'gpu_id': 2,
        'train_batch_size': 512
    }
    # All three sources are combined; the command line has the highest priority.
    config = Config(model='BPR', dataset='ml-100k', config_file_list=['example.yaml'], config_dict=parameter_dict)
    print('gpu_id:', config['gpu_id'])
    print('train_batch_size:', config['train_batch_size'])

Running:

.. code:: bash

    python run.py --gpu_id=3 --train_batch_size=256

Output:

.. code:: bash

    gpu_id: 3
    train_batch_size: 256
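
The priority rule can be pictured as a series of layered dictionary updates,
applied from the lowest-priority source to the highest. The following is a
minimal sketch in plain Python and is not RecBole's actual implementation:
``merge_settings`` and the four dictionaries are hypothetical names
introduced only for illustration, and the non-default values are made up.

.. code:: python

    # Hypothetical stand-ins for the four configuration sources.
    default_settings = {'gpu_id': 0, 'seed': 2020, 'epochs': 300, 'learning_rate': 0.001}
    config_file = {'gpu_id': 1, 'epochs': 100, 'learning_rate': 0.01}  # e.g. from a YAML file
    parameter_dict = {'gpu_id': 2, 'epochs': 50}                       # e.g. from config_dict
    command_line = {'gpu_id': 3}                                       # e.g. from --gpu_id=3


    def merge_settings(defaults, file_params, dict_params, cmd_params):
        """Apply sources from lowest to highest priority; later updates win."""
        merged = dict(defaults)
        for source in (file_params, dict_params, cmd_params):
            merged.update(source)
        return merged


    final = merge_settings(default_settings, config_file, parameter_dict, command_line)
    for name in ('gpu_id', 'seed', 'epochs', 'learning_rate'):
        print(name + ':', final[name])

In this sketch, ``gpu_id`` is taken from the command line, ``epochs`` from the
parameter dict, ``learning_rate`` from the config file, and ``seed`` keeps its
default, because each parameter takes its value from the highest-priority
source that sets it.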