Parameter Tuning

RecBole features automatic parameter (or hyper-parameter) tuning, and integrates Hyperopt and Ray for this purpose. One can readily optimize a given model over a provided hyper-parameter space.

The general steps are given as follows:

Hyperopt

To begin with Hyperopt, the user has to declare a HyperTuning instance in the running Python file (e.g.,

from recbole.trainer import HyperTuning
from recbole.quick_start import objective_function

hp = HyperTuning(objective_function=objective_function, algo='exhaustive', early_stop=10,
                max_evals=100, params_file='model.hyper', fixed_config_file_list=['example.yaml'])

objective_function is the optimization objective: its input is a set of parameter values, and its output is the evaluation result under those parameters. Users can design this objective_function according to their own requirements, or use the encapsulated one provided by RecBole, that is:

from recbole.config import Config
from import create_dataset, data_preparation
from recbole.utils import get_model, get_trainer

def objective_function(config_dict=None, config_file_list=None):

    config = Config(config_dict=config_dict, config_file_list=config_file_list)
    dataset = create_dataset(config)
    train_data, valid_data, test_data = data_preparation(config, dataset)
    model_name = config['model']
    model = get_model(model_name)(config, train_data._dataset).to(config['device'])
    trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model)
    # train on the training set and select the best checkpoint on the validation set
    best_valid_score, best_valid_result =, valid_data, verbose=False)
    test_result = trainer.evaluate(test_data)

    return {
        'model': model_name,
        'best_valid_score': best_valid_score,
        'valid_score_bigger': config['valid_metric_bigger'],
        'best_valid_result': best_valid_result,
        'test_result': test_result
    }
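
As a quick sanity check, this encapsulated objective_function can also be called directly, outside HyperTuning (a minimal sketch; the parameter values and example.yaml are illustrative):

result = objective_function(config_dict={'learning_rate': 0.001, 'embedding_size': 64},
                            config_file_list=['example.yaml'])
print(result['best_valid_score'])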

algo is the optimization algorithm. RecBole supports three tuning methods: 'exhaustive' (grid search; in this case max_evals is set automatically), 'random' (random search; in this case max_evals needs to be set manually), and 'bayes' (Bayesian optimization via Hyperopt; in this case max_evals needs to be set manually). In addition, user-defined tuning methods are also supported.

# Grid Search
hp1 = HyperTuning(algo='exhaustive')

# Random Search
hp2 = HyperTuning(algo='random')

# Bayesian HyperOpt
hp3 = HyperTuning(algo='bayes')

# User-Defined Search
hp4 = HyperTuning(algo=your_function)
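
A user-defined algo follows Hyperopt's suggest interface. For instance, one can wrap an existing Hyperopt algorithm with functools.partial to pin its options (a sketch, assuming Hyperopt's standard suggest signature; the TPE setting is illustrative):

from functools import partial
from hyperopt import tpe

from recbole.trainer import HyperTuning
from recbole.quick_start import objective_function

# TPE with a custom number of random startup trials (illustrative)
hp5 = HyperTuning(objective_function=objective_function,
                  algo=partial(tpe.suggest, n_startup_jobs=20),
                  max_evals=100,
                  params_file='model.hyper',
                  fixed_config_file_list=['example.yaml'])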

params_file specifies the search ranges of the parameters, for example (model.hyper):

learning_rate loguniform -8,0
embedding_size choice [64,96,128]
mlp_hidden_size choice ['[64,64,64]','[128,128]']

Each line represents a parameter and the corresponding search range. There are three components: parameter name, range type, range.

HyperTuning supports four range types; the details are as follows:

range type    range        description
choice        options      search in options
uniform       low,high     search in uniform distribution: (low,high)
loguniform    low,high     search in uniform distribution: exp(uniform(low,high))
quniform      low,high,q   search in uniform distribution: round(uniform(low,high)/q)*q

It should be noted that if a parameter's value is itself a list and the range type is choice, then each inner list should be quoted, e.g., mlp_hidden_size in model.hyper.
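
For instance, a params file exercising all four range types might look like this (a sketch; the parameter names and ranges are illustrative):

learning_rate loguniform -8,0
dropout_prob uniform 0.0,0.5
embedding_size choice [64,96,128]
train_batch_size quniform 256,2048,256
mlp_hidden_size choice ['[64,64,64]','[128,128]']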

fixed_config_file_list contains the fixed parameters, e.g., dataset-related parameters and evaluation parameters. These parameters should follow the same format as those in config_file_list. See Config Introduction for details.
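
A minimal fixed config file (the example.yaml referenced above) might simply pin the dataset and model while the hyper-parameters are tuned (a sketch; the exact keys depend on your setup):

dataset: ml-100k
model: BPR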

HyperTuning is then called like this:

from recbole.trainer import HyperTuning
from recbole.quick_start import objective_function

hp = HyperTuning(objective_function=objective_function, algo='exhaustive', early_stop=10,
                max_evals=100, params_file='model.hyper', fixed_config_file_list=['example.yaml'])

# run
# export result to the file
# print best parameters
print('best params: ', hp.best_params)
# print best result
print('best result: ')

Run like:

python --config_files=[config_files] --params_file=[params_file] --output_file=[output_file] --tool=Hyperopt

config_files specifies the config files containing fixed parameters, params_file is the file containing the parameter search ranges, output_file is the output file containing the results, and tool selects the tuning backend (one of ['Hyperopt', 'Ray']); all of these can be set via the command line or the yaml configuration files.
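
For instance, with the files used above (file names are illustrative):

python --config_files=example.yaml --params_file=model.hyper --output_file=hyper_example.result --tool=Hyperopt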

For example, the config file may contain:

dataset: ml-100k
model: BPR

A simple example is to search learning_rate and embedding_size in BPR; the tuning output looks like:

{'embedding_size': 128, 'learning_rate': 0.005}
current best valid score: 0.3795
current best valid result:
{'recall@10': 0.2008, 'mrr@10': 0.3795, 'ndcg@10': 0.2151, 'hit@10': 0.7306, 'precision@10': 0.1466}
current test result:
{'recall@10': 0.2186, 'mrr@10': 0.4388, 'ndcg@10': 0.2591, 'hit@10': 0.7381, 'precision@10': 0.1784}


best params:  {'embedding_size': 64, 'learning_rate': 0.001}
best result: {
    'best_valid_result': {'recall@10': 0.2169, 'mrr@10': 0.4005, 'ndcg@10': 0.235, 'hit@10': 0.7582, 'precision@10': 0.1598}
    'test_result': {'recall@10': 0.2368, 'mrr@10': 0.4519, 'ndcg@10': 0.2768, 'hit@10': 0.7614, 'precision@10': 0.1901}
}

After running, an HTML file is also generated, containing a line chart that shows the progress of the hyper-parameter search.


Ray

To begin with Ray, the user has to initialize Ray in the running Python file (e.g.,

import ray


Similar to Hyperopt, Ray also requires an objective_function as the optimization target. For details of the objective_function, please refer to the Hyperopt introduction above.

Schedulers are optimization algorithms that can terminate bad trials early, pause trials, clone trials, and alter the hyperparameters of a running trial. All trial schedulers take in a metric, which is a value returned in the result dict of your trainable and is maximized or minimized according to mode.

from ray.tune.schedulers import ASHAScheduler

# ASHA: asynchronous successive halving; aggressively stops underperforming trials
asha_scheduler = ASHAScheduler(
    metric='recall@10',
    mode='max',
    reduction_factor=2)
...
result =..., scheduler=asha_scheduler)

Call and analyze the result like this:

from ray import tune

# `config` below is the Ray Tune search space,
# e.g. built with tune.choice / tune.loguniform
result =
    tune.with_parameters(objective_function, config_file_list=config_file_list),
    config=config,
    resources_per_trial={
        "gpu": 1
    })
best_trial = result.get_best_trial("recall@10", "max", "last")
print("best params: ", best_trial.config)
print("best result: ", best_trial.last_result)

To leverage GPUs, you must set gpu in resources_per_trial. This will automatically set CUDA_VISIBLE_DEVICES for each trial.
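
Fractional values are also allowed, letting several trials share one GPU (a sketch, assuming Ray Tune's fractional-GPU support; the values are illustrative):

from ray import tune

# two concurrent trials per physical GPU
result =
    tune.with_parameters(objective_function, config_file_list=config_file_list),
    resources_per_trial={"cpu": 2, "gpu": 0.5})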

Run like:

python --config_files=[config_files] --output_file=[output_file] --tool=Ray
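
For instance (file names are illustrative):

python --config_files=example.yaml --output_file=ray_example.result --tool=Ray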

Note that when using Ray to tune parameters, the working directory becomes the local_dir set in, so you need to set the absolute path of the dataset in the config file. For example:

dataset: ml-100k
model: BPR
data_path: /home/user/RecBole/dataset

A simple example is to search learning_rate and embedding_size in BPR; the tuning output looks like:

== Status ==
 Current time: 2022-07-23 22:33:19 (running for 00:02:12.90)
 Memory usage on this node: 19.5/125.8 GiB
 Using AsyncHyperBand: num_stopped=0
 Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
 Resources requested: 5.0/40 CPUs, 0/2 GPUs, 0.0/77.29 GiB heap, 0.0/37.12 GiB objects (0.0/1.0 accelerator_type:K40)
 Result logdir: /home/wangzhenlei/wanglei/dev-bole/RecBole/ray_log/objective_function_2022-07-23_22-31-06
 Number of trials: 5/5 (5 RUNNING)
 | Trial name                     | status   | loc                  |   embedding_size |   learning_rate |
 | objective_function_16400_00000 | RUNNING  | ***.***.***.**:21392 |                8 |     0.0542264   |
 | objective_function_16400_00001 | RUNNING  | ***.***.***.**:21443 |                8 |     0.00055313  |
 | objective_function_16400_00002 | RUNNING  | ***.***.***.**:21446 |                8 |     0.000639818 |
 | objective_function_16400_00003 | RUNNING  | ***.***.***.**:21448 |                8 |     0.00456223  |
 | objective_function_16400_00004 | RUNNING  | ***.***.***.**:21449 |                8 |     0.00265045  |


 2022-07-23 22:35:22,868 INFO -- Total run time: 256.58 seconds (256.42 seconds for the tuning loop).
 best params:  {'embedding_size': 8, 'learning_rate': 0.004562228847261371}
 best result:  {'recall@10': 0.2148, 'mrr@10': 0.4161, 'ndcg@10': 0.2489, 'hit@10': 0.7444, 'precision@10': 0.1761, 'time_this_iter_s': 227.5052626132965, 'done': True, 'timesteps_total': None, 'episodes_total': None, 'training_iteration': 1, 'trial_id': '16400_00003', 'experiment_id': '3864900644e743d5b75c67a2e904183a', 'date': '2022-07-23_22-34-59', 'timestamp': 1658586899, 'time_total_s': 227.5052626132965, 'pid': 21448, 'hostname': 'aibox-94', 'node_ip': '', 'config': {'embedding_size': 8, 'learning_rate': 0.004562228847261371}, 'time_since_restore': 227.5052626132965, 'timesteps_since_restore': 0, 'iterations_since_restore': 1, 'warmup_time': 0.004939079284667969, 'experiment_tag': '3_embedding_size=8,learning_rate=0.0046'}

Users can run Ray distributed tuning by changing the ray.init call as follows:

import ray

# connect to an existing Ray cluster started with `ray start` (one common option)
ray.init(address='auto')

For details, please refer to Ray's official website.
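
For reference, a typical multi-node setup starts the cluster with Ray's CLI before launching the tuning script (a sketch; the port and address are illustrative):

# on the head node
ray start --head --port=6379
# on each worker node
ray start --address='<head_node_ip>:6379'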