Label of data

In recommendation filed, there are two kinds of data scenes: explicit feedback scene and implicit feedback scene.

Explicit feedback, like rating for items, has explicit label for model training. While for implicit feedback, like clicks and purchases, the label of data is vague, and generally we will regard all the observed interaction as the positive samples and select negative samples from unobserved interactions (known as negative sampling).

To supports both explicit feedback scene and implicit feedback scene, RecBole design three ways to set label of data.

1. Set label field

If your data has already been labeled, you only need to set LABEL_FIELD to tell the model which column represents the label of data, and then set train_neg_sample_args as None.

For example, if your .inter file is like:

user_id:token

item_id:token

label:float

timestamp:float

1

1193

1

978300760

1

661

0

978302109

2

11

1

978302009

2

112

1

978312344

2

555

0

978302321

3

234

1

978302109

Then, you can set the config like:

LABEL_FIELD: label
train_neg_sample_args: None

Note that the value of your label column should only be 0 or 1 (0 represents the negative label and 1 represents the positive label).

2. Set threshold

If your data doesn’t have labels but has users’ feedback information (like rating for items) to show their preferences, a general way to label them is to set threshold.

For example, if you .inter file is like:

user_id:token

item_id:token

rating:float

timestamp:float

1

1193

5

978300760

1

661

1

978302109

2

11

4

978302009

2

112

4

978312344

2

555

1

978302321

3

234

3

978302109

To set label for these interactions, you can set 3 as the threshold of rating, and the interactions will be labeled as positive if their rating no less than 3.

You can set the config like:

threshold:
    rating: 3
train_neg_sample_args: None

And then RecBole will automatically set label for interactions based on their rating column.

3. Negative sampling

If your only have implicit feedback data, without label or users’ feedback information. A general way to label these kinds of data is negative sampling. We will assume that for each user, all the observed interactions are positive, and the unobserved ones are negative. And then, we will set positive label for all the observed interactions, and select some negative samples from the unobserved interactions according to a certain strategy.

You can set the config like:

train_neg_sample_args:
    uniform: 1

And then, RecBole will automatically select one negative sample for each positive sample uniformly from the unobserved interactions.

At last, for more details about the label config, please read Data settings and Training Settings.