Quick Start: Sequential Recommendation¶
For sequential recommendation, we choose GRU4Rec model to show you how to train and test it on the ml-100k dataset from both API and source code.
Quick-start From API¶
1. Prepare your data:¶
Before running a model, firstly you need to prepare and load data. To help users quickly get start, RecBole has a build-in dataset ml-100k and you can directly use it. However, if you want to use other datasets, you can read Running New Dataset for more information.
Then, you need to set data config for data loading. You can create a yaml file called test.yaml and write the following settings:
# dataset config : Sequential Recommendation
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
TIME_FIELD: timestamp
load_col:
inter: [user_id, item_id,timestamp]
ITEM_LIST_LENGTH_FIELD: item_length
LIST_SUFFIX: _list
MAX_ITEM_LIST_LENGTH: 50
As you can see, unlike the example of BPR,sequential recommendation models utilize the historical interaction sequences to predict the next item, so it needs to specify and load the user, item and time columns of the dataset. And you should set the maximum length of the sequence.
For sequential dataset, RecBole supports Augmentation processing. For example, suppose u1
interacts with i1,i2,i3,i4,i5
in time and MAX_ITEM_LIST_LENGTH = 3
user_id:token item_id_list:token_seq item_id:token
0 0 1 2 3 4 5
After augmentation, sequential dataset will generate these cases.
user_id:token item_id_list:token_seq item_id:token
0 1 0 0 2
0 1 2 0 3
0 1 2 3 4
0 2 3 4 5
For more details of data config, please refer to Data settings.
2. Choose a model:¶
You can choose a model from our Model Introduction. Here we choose GRU4Rec model to demonstrate how to train and test the sequence recommendation model.
Then, you need to set the parameter for GRU4Rec model. You can check the GRU4Rec and add the model settings into the test.yaml, like:
# model config
embedding_size: 64
hidden_size: 128
num_layers: 1
dropout_prob: 0.3
loss_type: 'CE'
If you want to run different models, you can read Running Different Models for more information.
3. Set training and evaluation config:¶
In RecBole, we support multiple training and evaluation methods. You can choose how to train and test model by simply setting the config.
Here we want to train and test the GRU4Rec model in training-validation-test method (optimize model parameters on the training set, do parameter selection according to the results on the validation set, and finally report the results on the test set) and evaluate the model performance by full ranking with all item candidates, so we can add the following settings into the test.yaml.
# Training and evaluation config
epochs: 500
train_batch_size: 4096
eval_batch_size: 4096
train_neg_sample_args: ~
eval_args:
group_by: user
order: TO
split: {'LS': 'valid_and_test'}
mode: full
metrics: ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk: 10
valid_metric: MRR@10
For more details of training and evaluation config, please refer to Training Settings and Evaluation Settings.
4. Run the model and collect the result¶
Now you have finished all the preparations, it’s time to run the model!
You can create a new python file (e.g., run.py), and write the following code:
from recbole.quick_start import run_recbole
run_recbole(model='GRU4Rec', dataset='ml-100k', config_file_list=['test.yaml'])
Then run the following command:
python run.py
And you will obtain the output like:
16 Jul 21:12 INFO ml-100k
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['user_id', 'item_id', 'timestamp']
16 Jul 21:12 INFO [Training]: train_batch_size = [4096] negative sampling: [None]
16 Jul 21:12 INFO [Evaluation]: eval_batch_size = [4096] eval_args: [{'group_by': 'user', 'order': 'TO', 'split': {'LS': 'valid_and_test'}, 'mode': 'full'}]
16 Jul 21:12 INFO GRU4Rec(
(item_embedding): Embedding(1683, 64, padding_idx=0)
(emb_dropout): Dropout(p=0.3, inplace=False)
(gru_layers): GRU(64, 128, bias=False, batch_first=True)
(dense): Linear(in_features=128, out_features=64, bias=True)
(loss_fct): CrossEntropyLoss()
)
Trainable parameters: 189696
Train 0: 100%|█████████████████████████| 24/24 [00:01<00:00, 15.97it/s, GPU RAM: 1.46 G/31.75 G]
16 Jul 21:12 INFO epoch 0 training [time: 1.50s, train loss: 176.3402]
Evaluate : 100%|██████████████████████████| 1/1 [00:00<00:00, 106.42it/s, GPU RAM: 1.46 G/31.75 G]
16 Jul 21:12 INFO epoch 0 evaluating [time: 0.02s, valid_score: 0.008100]
......
Train 43: 100%|█████████████████████████| 24/24 [00:01<00:00, 17.43it/s, GPU RAM: 1.46 G/31.75 G]
16 Jul 21:13 INFO epoch 43 training [time: 1.38s, train loss: 134.4222]
Evaluate : 100%|███████████████████████████| 1/1 [00:00<00:00, 86.71it/s, GPU RAM: 1.46 G/31.75 G]
16 Jul 21:13 INFO epoch 43 evaluating [time: 0.02s, valid_score: 0.043600]
16 Jul 21:13 INFO valid result:
recall@10 : 0.1326 mrr@10 : 0.0436 ndcg@10 : 0.0641 hit@10 : 0.1326 precision@10 : 0.0133
16 Jul 21:13 INFO Finished training, best eval result in epoch 32
16 Jul 21:13 INFO Loading model structure and parameters from saved/GRU4Rec-Jul-16-2022_21-12-43.pth
Evaluate : 100%|██████████████████████████| 1/1 [00:00<00:00, 238.76it/s, GPU RAM: 1.46 G/31.75 G]
16 Jul 21:13 INFO best valid : OrderedDict([('recall@10', 0.1442), ('mrr@10', 0.0501), ('ndcg@10', 0.0717), ('hit@10', 0.1442), ('precision@10', 0.0144)])
16 Jul 21:13 INFO test result: OrderedDict([('recall@10', 0.1103), ('mrr@10', 0.0337), ('ndcg@10', 0.0513), ('hit@10', 0.1103), ('precision@10', 0.011)])
Finally you will get the model’s performance on the test set and the model file will be saved under the /saved. Besides, RecBole allows tracking and visualizing train loss and valid score with TensorBoard, please read the Use Tensorboard for more details.
The above is the whole process of running a model in RecBole, and you can read other docs for depth usage.
Quick-start From Source¶
Besides using API, you can also directly run the source code of RecBole. The whole process is similar to Quick-start From API. You can create a yaml file called test.yaml and set all the config as follow:
# dataset config : Sequential Recommendation
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
TIME_FIELD: timestamp
load_col:
inter: [user_id, item_id,timestamp]
ITEM_LIST_LENGTH_FIELD: item_length
LIST_SUFFIX: _list
MAX_ITEM_LIST_LENGTH: 50
# model config
embedding_size: 64
hidden_size: 128
num_layers: 1
dropout_prob: 0.3
loss_type: 'CE'
# Training and evaluation config
epochs: 500
train_batch_size: 4096
eval_batch_size: 4096
train_neg_sample_args: ~
eval_args:
group_by: user
order: TO
split: {'LS': 'valid_and_test'}
mode: full
metrics: ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk: 10
valid_metric: MRR@10
metric_decimal_place: 4
Then run the following command:
python run_recbole.py --model=GRU4Rec --dataset=ml-100k --config_files=test.yaml
And you will get the output of running the GRU4Rec model on the ml-100k dataset.
If you want to change the parameters, such as embedding_size
, just
set the additional command parameters as you need:
python run_recbole.py --model=GRU4Rec --dataset=ml-100k --config_files=test.yaml --embedding_size=100