recbole.data.sequential_dataset

class recbole.data.dataset.sequential_dataset.SequentialDataset(config, saved_dataset=None)

Bases: recbole.data.dataset.dataset.Dataset

SequentialDataset is based on Dataset and provides an augmentation interface for sequential recommendation, which can accelerate the data loader.

uid_list

List of user IDs after augmentation.

Type: numpy.ndarray

item_list_index

List of indexes of the item sequences after augmentation.

Type: numpy.ndarray

target_index

List of indexes of the target item IDs after augmentation.

Type: numpy.ndarray

item_list_length

List of item sequence lengths after augmentation.

Type: numpy.ndarray

leave_one_out(group_by, leave_one_num=1)

Split interaction records using the leave-one-out strategy.

Parameters

group_by (str) – Name of the field that interaction records should be grouped by before splitting.
leave_one_num (int, optional) – Number of parts whose length is expected to be 1. Defaults to 1.

Returns

List of Dataset objects whose interaction features have been split.

Return type

list
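The leave-one-out idea can be illustrated with a minimal, library-independent sketch. It is not RecBole's implementation: the helper name `leave_one_out_split`, the use of plain dictionaries as records, and the handling of groups that are too short are all assumptions made for illustration. Records are assumed to be in chronological order already.

```python
from collections import defaultdict


def leave_one_out_split(records, group_by, leave_one_num=1):
    """Hypothetical helper: split records into leave_one_num + 1 parts.

    The first part keeps the leading records of every group; each of the
    remaining parts receives at most one record per group.
    """
    groups = defaultdict(list)
    for rec in records:  # assumed to be in chronological order
        groups[rec[group_by]].append(rec)

    parts = [[] for _ in range(leave_one_num + 1)]
    for recs in groups.values():
        # Assumption: always keep at least one record of a group in the first part.
        held_out = min(leave_one_num, len(recs) - 1)
        cut = len(recs) - held_out
        parts[0].extend(recs[:cut])
        for k, rec in enumerate(recs[cut:], start=1):
            parts[k].append(rec)
    return parts


records = [
    {"user": "u1", "item": "i1"},
    {"user": "u1", "item": "i2"},
    {"user": "u1", "item": "i3"},
    {"user": "u1", "item": "i4"},
    {"user": "u2", "item": "i5"},
    {"user": "u2", "item": "i6"},
    {"user": "u2", "item": "i7"},
]
# leave_one_num=2 yields three parts, e.g. train / validation / test.
train, valid, test = leave_one_out_split(records, group_by="user", leave_one_num=2)
```

With `leave_one_num=2`, each user contributes one record to each of the last two parts, and all earlier records stay in the first part.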

prepare_data_augmentation()

Augmentation processing for sequential dataset.

For example, if u1 has the purchase sequence <i1, i2, i3, i4>, augmentation generates three cases.

u1, <i1> | i2

(That is, given user_id u1 and item_seq <i1>, we need to predict the next item i2.)

The other cases are below:

u1, <i1, i2> | i3

u1, <i1, i2, i3> | i4

Returns: Tuple of self.uid_list, self.item_list_index, self.target_index, self.item_list_length. See SequentialDataset’s attributes for details.

Note

In practice, these new item sequences are not actually generated. Each user’s item sequence is stored only once in memory. Only the index (slice) of each item sequence after augmentation is stored, which saves memory and speeds up processing considerably.
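The slice-based augmentation described above can be sketched with NumPy as follows. This is an illustrative sketch rather than the library's code: the helper name `augment_sequences` is hypothetical, and it assumes the interactions are already sorted by user and then by time. It also ignores any maximum sequence length.

```python
import numpy as np


def augment_sequences(user_ids):
    """Hypothetical helper: build augmentation indexes without copying items.

    Assumes user_ids is sorted so that each user's interactions are
    contiguous and in chronological order.
    """
    uid_list, item_list_index, target_index, item_list_length = [], [], [], []
    seq_start = 0
    for i in range(len(user_ids)):
        if i > 0 and user_ids[i] != user_ids[i - 1]:
            seq_start = i  # a new user's sequence begins here
        if i > seq_start:
            uid_list.append(user_ids[i])
            # Store only the slice covering the history, not the items themselves.
            item_list_index.append(slice(seq_start, i))
            target_index.append(i)
            item_list_length.append(i - seq_start)
    return (
        np.array(uid_list),
        np.array(item_list_index, dtype=object),
        np.array(target_index),
        np.array(item_list_length),
    )


user_ids = np.array(["u1", "u1", "u1", "u1"])
item_ids = np.array(["i1", "i2", "i3", "i4"])
uid_list, item_list_index, target_index, item_list_length = augment_sequences(user_ids)

# Each case is recovered on demand by slicing the single stored item array.
for uid, idx, tgt in zip(uid_list, item_list_index, target_index):
    print(uid, item_ids[idx], "|", item_ids[tgt])
```

Indexing `item_ids` with a slice returns a view of the original array, so no per-case copy of the history is made until a batch is actually assembled.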