recbole.data.sequential_dataset

class recbole.data.dataset.sequential_dataset.SequentialDataset(config, saved_dataset=None)[source]

Bases: recbole.data.dataset.dataset.Dataset

SequentialDataset extends Dataset and provides an augmentation interface for sequential recommendation, which can speed up the data loader.

uid_list

List of user IDs after augmentation.

Type

numpy.ndarray

item_list_index

List of indexes of item sequence after augmentation.

Type

numpy.ndarray

target_index

List of indexes of target item id after augmentation.

Type

numpy.ndarray

item_list_length

List of item sequence lengths after augmentation.

Type

numpy.ndarray

build(eval_setting)[source]

Process the dataset according to the evaluation setting, including Group, Order and Split. See EvalSetting for details.

Parameters

eval_setting (EvalSetting) – Object containing the evaluation settings that guide the data processing procedure.

Returns

List of built Dataset.

Return type

list

inter_matrix(form='coo', value_field=None)[source]

Get a sparse matrix that describes interactions between user_id and item_id. The sparse matrix has shape (user_num, item_num). For a row of <src, tgt>, matrix[src, tgt] = 1 if value_field is None, else matrix[src, tgt] = self.inter_feat[src, tgt].

Parameters
  • form (str, optional) – Sparse matrix format. Defaults to coo.

  • value_field (optional) – Data of sparse matrix, which should exist in df_feat. Defaults to None.

Returns

Sparse matrix in form coo or csr.

Return type

scipy.sparse
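As an illustration of the matrix described above, here is a minimal sketch that builds it directly with SciPy. It is not RecBole's implementation: the function name build_inter_matrix and its arguments (user_ids, item_ids, values) are hypothetical stand-ins for the dataset's interaction columns.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_inter_matrix(user_ids, item_ids, user_num, item_num,
                       values=None, form="coo"):
    """Hypothetical stand-in for inter_matrix (assumed inputs, see lead-in)."""
    user_ids = np.asarray(user_ids)
    item_ids = np.asarray(item_ids)
    if values is None:
        # No value field: every observed <src, tgt> pair is stored as 1.
        data = np.ones(len(user_ids))
    else:
        # Value field given: store that value for each interaction row.
        data = np.asarray(values, dtype=float)
    mat = coo_matrix((data, (user_ids, item_ids)), shape=(user_num, item_num))
    if form == "coo":
        return mat
    if form == "csr":
        return mat.tocsr()
    raise NotImplementedError(f"Sparse matrix format [{form}] is not supported.")


# Three interactions: user 0 with items 1 and 3, user 2 with item 0.
m = build_inter_matrix([0, 0, 2], [1, 3, 0], user_num=3, item_num=4)
rated = build_inter_matrix([0, 2], [1, 0], 3, 4, values=[4.0, 2.5], form="csr")
```

The coo form is convenient for construction, while csr is usually the better choice when rows are sliced or multiplied afterwards.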

leave_one_out(group_by, leave_one_num=1)[source]

Split interaction records by the leave-one-out strategy.

Parameters
  • group_by (str) – Field name that interaction records should be grouped by before splitting.

  • leave_one_num (int, optional) – Number of parts whose length is expected to be 1. Defaults to 1.

Returns

List of Dataset, whose interaction features have been split.

Return type

list
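The splitting rule can be sketched on plain record indexes. The following is a simplified illustration, not the library's code: leave_one_out_indices is a hypothetical helper, and it assumes the records of each group are already in the desired order so that the last records of a group are the ones held out.

```python
from collections import defaultdict


def leave_one_out_indices(group_ids, leave_one_num=1):
    """Return leave_one_num + 1 index lists (hypothetical helper).

    The first list holds the remaining records; each of the other lists
    holds at most one record per group.
    """
    by_group = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        by_group[gid].append(idx)

    parts = [[] for _ in range(leave_one_num + 1)]
    for indices in by_group.values():
        # Always keep at least one record of the group in the first part.
        held = min(leave_one_num, len(indices) - 1)
        cut = len(indices) - held
        parts[0].extend(indices[:cut])
        # The held-out records fill the last `held` parts, one record each.
        for k in range(held):
            parts[-held + k].append(indices[cut + k])
    return parts


groups = ["u1", "u1", "u1", "u1", "u2", "u2", "u2"]
parts = leave_one_out_indices(groups, leave_one_num=2)
# parts == [[0, 1, 4], [2, 5], [3, 6]]
```

With leave_one_num=2 the three parts correspond to a typical train / validation / test split in which each user contributes one record to validation and one to test.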

prepare_data_augmentation()[source]

Augmentation processing for sequential dataset.

For example, if u1 has the purchase sequence <i1, i2, i3, i4>, augmentation generates three cases.

u1, <i1> | i2

(That is, given user_id u1 and item_seq <i1>, we need to predict the next item i2.)

The other cases are below:

u1, <i1, i2> | i3

u1, <i1, i2, i3> | i4

Note

These new item sequences are not actually materialized. Each user's item sequence is stored only once in memory, and only the index (slice) of each augmented item sequence is stored, which saves memory and speeds up processing considerably.
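The index-based idea in the note can be sketched as follows. This is a simplified illustration rather than the library's code: augment_by_index and its max_item_list_len parameter are assumptions, the input is assumed to be sorted by user and then by time, and plain Python lists are returned instead of numpy arrays. The four returned lists mirror the uid_list, item_list_index, target_index and item_list_length attributes documented above.

```python
import numpy as np


def augment_by_index(user_ids, max_item_list_len=50):
    """Hypothetical sketch: record slices instead of copying sequences."""
    uid_list, item_list_index, target_index, item_list_length = [], [], [], []
    seq_start = 0
    last_uid = None
    for i, uid in enumerate(user_ids):
        if uid != last_uid:
            # First record of a new user: it has no history, so no case yet.
            last_uid = uid
            seq_start = i
        else:
            # Slide the window forward once the history exceeds the cap.
            if i - seq_start > max_item_list_len:
                seq_start += 1
            uid_list.append(uid)
            item_list_index.append(slice(seq_start, i))  # history positions
            target_index.append(i)                       # next-item position
            item_list_length.append(i - seq_start)
    return uid_list, item_list_index, target_index, item_list_length


users = np.array(["u1", "u1", "u1", "u1"])
items = np.array(["i1", "i2", "i3", "i4"])
uids, seq_idx, tgt_idx, lengths = augment_by_index(users)

# Sequences are materialized only when needed, by slicing the single copy.
cases = [(list(items[s]), str(items[t])) for s, t in zip(seq_idx, tgt_idx)]
# cases == [(['i1'], 'i2'), (['i1', 'i2'], 'i3'), (['i1', 'i2', 'i3'], 'i4')]
```

Each augmented case costs one slice and two integers no matter how long the history is, which is where the memory saving comes from.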