Label of data
In the recommendation field, there are two kinds of data scenarios: explicit feedback and implicit feedback.
Explicit feedback, such as ratings for items, provides an explicit label for model training. For implicit feedback, such as clicks and purchases, the label is vague; generally, all observed interactions are regarded as positive samples, and negative samples are selected from the unobserved interactions (known as negative sampling).
To support both the explicit feedback and the implicit feedback scenarios, RecBole provides three ways to set the label of data.
1. Set label field
If your data has already been labeled, you only need to set LABEL_FIELD to tell the model which column holds the label, and then set train_neg_sample_args to None so that no negative sampling is performed during training.
For example, if your .inter file is like:
user_id:token | item_id:token | label:float | timestamp:float |
---|---|---|---|
1 | 1193 | 1 | 978300760 |
1 | 661 | 0 | 978302109 |
2 | 11 | 1 | 978302009 |
2 | 112 | 1 | 978312344 |
2 | 555 | 0 | 978302321 |
3 | 234 | 1 | 978302109 |
Then, you can set the config like:
LABEL_FIELD: label
train_neg_sample_args: None
Note that the value of your label column should only be 0 or 1 (0 represents the negative label and 1 represents the positive label).
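Because the label column must contain only 0 or 1, it can be worth checking the data before training. The following is a minimal sketch in plain Python, not part of RecBole: the helper name `check_binary_labels` and the list-of-dicts representation of the .inter rows are assumptions made for illustration.

```python
def check_binary_labels(rows, label_field="label"):
    """Return every row whose label is neither 0 nor 1.

    `rows` is a hypothetical in-memory form of the .inter file:
    one dict per interaction, keyed by column name.
    """
    return [row for row in rows if float(row[label_field]) not in (0.0, 1.0)]


# The example interactions from the table above.
interactions = [
    {"user_id": "1", "item_id": "1193", "label": 1.0, "timestamp": 978300760.0},
    {"user_id": "1", "item_id": "661", "label": 0.0, "timestamp": 978302109.0},
    {"user_id": "2", "item_id": "11", "label": 1.0, "timestamp": 978302009.0},
    {"user_id": "2", "item_id": "112", "label": 1.0, "timestamp": 978312344.0},
    {"user_id": "2", "item_id": "555", "label": 0.0, "timestamp": 978302321.0},
    {"user_id": "3", "item_id": "234", "label": 1.0, "timestamp": 978302109.0},
]

# An empty list means every label is valid.
invalid_rows = check_binary_labels(interactions)
```

If `invalid_rows` is not empty, the offending interactions should be fixed or relabeled (for example with a threshold) before pointing LABEL_FIELD at that column.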
2. Set threshold
If your data doesn't have labels but does have user feedback (such as ratings for items) that shows users' preferences, a general way to label the interactions is to set a threshold.
For example, if your .inter file is like:
user_id:token | item_id:token | rating:float | timestamp:float |
---|---|---|---|
1 | 1193 | 5 | 978300760 |
1 | 661 | 1 | 978302109 |
2 | 11 | 4 | 978302009 |
2 | 112 | 4 | 978312344 |
2 | 555 | 1 | 978302321 |
3 | 234 | 3 | 978302109 |
To label these interactions, you can set 3 as the threshold of rating: an interaction is labeled as positive if its rating is no less than 3, and as negative otherwise.
You can set the config like:
threshold:
    rating: 3
train_neg_sample_args: None
RecBole will then automatically label the interactions based on their rating column.
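The thresholding rule can be sketched in a few lines of plain Python. This only illustrates the rule described above (positive if the rating is no less than the threshold); it is not RecBole's internal code, and the helper name `label_by_threshold` and the list-of-dicts data layout are assumptions.

```python
def label_by_threshold(rows, field="rating", threshold=3.0, label_field="label"):
    """Return copies of `rows` with a binary label derived from `field`.

    A row is positive (1.0) if row[field] >= threshold, else negative (0.0).
    """
    labeled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        new_row[label_field] = 1.0 if row[field] >= threshold else 0.0
        labeled.append(new_row)
    return labeled


# The example ratings from the table above (timestamps omitted for brevity).
ratings = [
    {"user_id": "1", "item_id": "1193", "rating": 5.0},
    {"user_id": "1", "item_id": "661", "rating": 1.0},
    {"user_id": "2", "item_id": "11", "rating": 4.0},
    {"user_id": "2", "item_id": "112", "rating": 4.0},
    {"user_id": "2", "item_id": "555", "rating": 1.0},
    {"user_id": "3", "item_id": "234", "rating": 3.0},
]

labels = [row["label"] for row in label_by_threshold(ratings)]
# labels == [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
```

Note that the interaction with rating 3 is labeled positive, because the comparison is "no less than" rather than "greater than".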
3. Negative sampling
If you only have implicit feedback data, with neither labels nor user feedback such as ratings, a general way to label the data is negative sampling. We assume that, for each user, all observed interactions are positive and all unobserved ones are negative. We then assign a positive label to every observed interaction and select some negative samples from the unobserved interactions according to a certain strategy.
You can set the config like:
train_neg_sample_args:
    uniform: 1
And then, RecBole will automatically select one negative sample for each positive sample uniformly from the unobserved interactions.
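To make the idea concrete, here is a minimal sketch of uniform negative sampling in plain Python. It is an illustration under stated assumptions, not RecBole's sampler: the function name `sample_negatives`, the `(user, item)` tuple layout, and the fixed seed are all hypothetical.

```python
import random


def sample_negatives(interactions, all_items, num_neg=1, seed=0):
    """Pair each observed (user, item) with `num_neg` unobserved items.

    Returns (user, item, label) triples: 1.0 for observed interactions,
    0.0 for sampled negatives. Negatives are drawn uniformly from the
    items the user has never interacted with.
    """
    rng = random.Random(seed)

    # Collect the items each user has already interacted with.
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)

    samples = []
    for user, item in interactions:
        samples.append((user, item, 1.0))
        # Sort the candidates so the result is reproducible for a given seed.
        candidates = sorted(set(all_items) - seen[user])
        if len(candidates) < num_neg:
            raise ValueError(f"user {user} has too few unobserved items")
        # Distinct negatives per positive; they may repeat across positives.
        for negative in rng.sample(candidates, num_neg):
            samples.append((user, negative, 0.0))
    return samples


# Observed (user, item) pairs, reusing the IDs from the earlier examples.
interactions = [(1, 1193), (1, 661), (2, 11), (2, 112), (2, 555), (3, 234)]
all_items = [1193, 661, 11, 112, 555, 234]

samples = sample_negatives(interactions, all_items, num_neg=1)
```

With `num_neg=1`, the result contains one negative triple after each positive triple, so six observed interactions yield twelve training samples.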
For more details about the label config, please read Data settings and Training Settings.