Training Settings
Training settings control the parameters of model training.
epochs (int)
: The number of training epochs. Defaults to 300.

train_batch_size (int)
: The training batch size. Defaults to 2048.

learner (str)
: The name of the optimizer to use. Defaults to 'adam'. Range in ['adam', 'sgd', 'adagrad', 'rmsprop', 'sparse_adam'].

learning_rate (float)
: The learning rate. Defaults to 0.001.

train_neg_sample_args (dict)
: This parameter has 4 keys: distribution, sample_num, dynamic, and candidate_num (a combined example follows these four keys).

distribution (str)
: Decides the distribution of negative items in the sampling pool. Two kinds of distribution are currently supported: ['uniform', 'popularity']. uniform means negative items are selected uniformly, while popularity means negative items are selected based on their popularity (Counter(item) in the .inter file). The default value is uniform.

sample_num (int)
: Decides the number of negative samples to take. The default value is 1.

dynamic (bool)
: Decides whether to adopt dynamic negative sampling. The default value is False.

candidate_num (int)
: Decides the number of candidate negative items when dynamic negative sampling is enabled. The default value is 0.
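For concreteness, here is a minimal sketch of a train_neg_sample_args entry in a config dict, assuming the RecBole-style parameter names described above; the specific values are illustrative, not recommendations.

```python
# Hypothetical config_dict fragment; values are illustrative only.
config_dict = {
    "train_neg_sample_args": {
        "distribution": "popularity",  # sample negatives proportionally to item popularity
        "sample_num": 5,               # take 5 negative items per positive interaction
        "dynamic": True,               # enable dynamic negative sampling
        "candidate_num": 16,           # candidates considered per dynamic sample
    }
}
```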
eval_step (int)
: The number of training epochs between evaluations on the validation dataset. If it is less than 1, the model will not be evaluated on the validation dataset. Defaults to 1.

stopping_step (int)
: The threshold for validation-based early stopping. Defaults to 10.

clip_grad_norm (dict)
: The arguments of clip_grad_norm_, which clips the gradient norm of the model. Defaults to None.

loss_decimal_place (int)
: The number of decimal places displayed for the training loss. Defaults to 4.

weight_decay (float)
: The weight decay (L2 penalty) used by the optimizer. Defaults to 0.0.

require_pow (bool)
: Whether the power operation is performed on the norm in EmbLoss. Defaults to False.

enable_amp (bool)
: Whether to use automatic mixed precision training. Defaults to False.

enable_scaler (bool)
: Whether to use GradScaler, which is often used with mixed precision training to avoid losing gradient precision to underflow. Defaults to False.
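Putting the settings above together, the following is a hedged sketch of a full training configuration, assuming RecBole's run_recbole quick-start helper; the model name, dataset name, and all values are placeholders for illustration.

```python
from recbole.quick_start import run_recbole

# Illustrative training settings; tune these for your own model and data.
config_dict = {
    "epochs": 100,
    "train_batch_size": 4096,
    "learner": "adam",
    "learning_rate": 0.001,
    "eval_step": 2,         # evaluate on the validation set every 2 epochs
    "stopping_step": 10,    # early-stop after 10 evaluations without improvement
    "clip_grad_norm": {"max_norm": 5.0, "norm_type": 2.0},  # kwargs of torch.nn.utils.clip_grad_norm_
    "weight_decay": 0.0,
    "loss_decimal_place": 4,
    "enable_amp": True,     # automatic mixed precision training
    "enable_scaler": True,  # GradScaler alongside AMP
}

run_recbole(model="BPR", dataset="ml-100k", config_dict=config_dict)
```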