Parameter Tuning¶
RecBole is featured in the capability of automatic parameter (or hyper-parameter) tuning. One can readily optimize a given model according to the provided hyper-parameter spaces.
The general steps are given as follows:
To begin with, the user has to claim a
HyperTuning
instance in the running python file (e.g., run.py):
from recbole.trainer import HyperTuning
from recbole.quick_start import objective_function
hp = HyperTuning(objective_function=objective_function, algo='exhaustive',
params_file='model.hyper', fixed_config_file_list=['example.yaml'])
objective_function
is the optimization objective,
the input of objective_function
is the parameter,
and the output is the optimal result of these parameters.
The users can design this objective_function
according to their own requirements.
The user can also use an encapsulated objective_function
, that is:
def objective_function(config_dict=None, config_file_list=None):
config = Config(config_dict=config_dict, config_file_list=config_file_list)
init_seed(config['seed'])
dataset = create_dataset(config)
train_data, valid_data, test_data = data_preparation(config, dataset)
model = get_model(config['model'])(config, train_data).to(config['device'])
trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model)
best_valid_score, best_valid_result = trainer.fit(train_data, valid_data, verbose=False)
test_result = trainer.evaluate(test_data)
return {
'best_valid_score': best_valid_score,
'valid_score_bigger': config['valid_metric_bigger'],
'best_valid_result': best_valid_result,
'test_result': test_result
}
algo
is the optimization algorithm. RecBole realizes this module based
on hyperopt. In addition, we also support grid search tunning method.
from hyperopt import tpe
# hyperopt 自带的优化算法
hp1 = HyperTuning(algo=tpe.suggest)
# Grid Search
hp2 = HyperTuning(algo='exhaustive')
params_file
is the ranges of the parameters, which is exampled as
(e.g., model.hyper):
learning_rate loguniform -8,0
embedding_size choice [64,96,128]
mlp_hidden_size choice ['[64,64,64]','[128,128]']
Each line represents a parameter and the corresponding search range. There are three components: parameter name, range type, range.
HyperTuning
supports four range types,
the details are as follows:
range type |
range |
discription |
---|---|---|
choice |
options(list) |
search in options |
uniform |
low(int),high(int) |
search in uniform distribution: (low,high) |
loguniform |
low(int),high(int) |
search in uniform distribution: exp(uniform(low,high)) |
quniform |
low(int),high(int),q(int) |
search in uniform distribution: round(uniform(low,high)/q)*q |
It should be noted that if the parameters are list and the range type is choice,
then the inner list should be quoted, e.g., mlp_hidden_size
in model.hyper.
fixed_config_file_list
is the fixed parameters, e.g., dataset related parameters and evaluation parameters.
These parameters should be aligned with the format in config_file_list
. See details as Config Introduction.
Calling method of HyperTuning like:
from recbole.trainer import HyperTuning
from recbole.quick_start import objective_function
hp = HyperTuning(objective_function=objective_function, algo='exhaustive',
params_file='model.hyper', fixed_config_file_list=['example.yaml'])
# run
hp.run()
# export result to the file
hp.export_result(output_file='hyper_example.result')
# print best parameters
print('best params: ', hp.best_params)
# print best result
print('best result: ')
print(hp.params2result[hp.params2str(hp.best_params)])
Run like:
python run.py --dataset=[dataset_name] --model=[model_name]
dataset_name
is the dataset name, model_name
is the model name, which can be controlled by the command line or the yaml configuration files.
For example:
dataset: ml-100k
model: BPR
A simple example is to search the learning_rate
and embedding_size
in BPR, that is,
running_parameters:
{'embedding_size': 128, 'learning_rate': 0.005}
current best valid score: 0.3795
current best valid result:
{'recall@10': 0.2008, 'mrr@10': 0.3795, 'ndcg@10': 0.2151, 'hit@10': 0.7306, 'precision@10': 0.1466}
current test result:
{'recall@10': 0.2186, 'mrr@10': 0.4388, 'ndcg@10': 0.2591, 'hit@10': 0.7381, 'precision@10': 0.1784}
...
best params: {'embedding_size': 64, 'learning_rate': 0.001}
best result: {
'best_valid_result': {'recall@10': 0.2169, 'mrr@10': 0.4005, 'ndcg@10': 0.235, 'hit@10': 0.7582, 'precision@10': 0.1598}
'test_result': {'recall@10': 0.2368, 'mrr@10': 0.4519, 'ndcg@10': 0.2768, 'hit@10': 0.7614, 'precision@10': 0.1901}
}