recbole.data.sequential_dataset
-
class recbole.data.dataset.sequential_dataset.SequentialDataset(config)
Bases: recbole.data.dataset.dataset.Dataset
SequentialDataset is based on Dataset and provides an augmentation interface adapted to sequential recommendation, which can accelerate the data loader.
-
uid_list
List of user IDs after augmentation.
- Type
numpy.ndarray
-
item_list_index
List of indexes of the item sequences after augmentation.
- Type
numpy.ndarray
-
target_index
List of indexes of the target item IDs after augmentation.
- Type
numpy.ndarray
-
item_list_length
List of item sequence lengths after augmentation.
- Type
numpy.ndarray
-
build(eval_setting)
Process the dataset according to the evaluation setting, including Group, Order and Split. See EvalSetting for details.
- Parameters
eval_setting (EvalSetting) – Object containing the evaluation settings, which guide the data processing procedure.
- Returns
List of built Dataset objects.
- Return type
list
-
inter_matrix(form='coo', value_field=None)
Get a sparse matrix that describes the interactions between user_id and item_id. The sparse matrix has shape (user_num, item_num). For a row <src, tgt>, matrix[src, tgt] = 1 if value_field is None, else matrix[src, tgt] = self.inter_feat[src, tgt].
- Parameters
form (str, optional) – Sparse matrix format. Defaults to coo.
value_field (str, optional) – Data of the sparse matrix, which should exist in df_feat. Defaults to None.
- Returns
Sparse matrix in form coo or csr.
- Return type
scipy.sparse
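The construction described above can be sketched with plain SciPy. This is a minimal illustration, not RecBole's implementation: build_inter_matrix and its src, tgt and values arguments are hypothetical names standing in for the user_id column, the item_id column and the column selected by value_field.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_inter_matrix(src, tgt, user_num, item_num, values=None, form="coo"):
    """Build a (user_num, item_num) sparse interaction matrix.

    Hypothetical helper for illustration; not part of RecBole.
    """
    src = np.asarray(src)
    tgt = np.asarray(tgt)
    # With no value field, every observed interaction is stored as 1.
    data = np.ones(len(src)) if values is None else np.asarray(values)
    mat = coo_matrix((data, (src, tgt)), shape=(user_num, item_num))
    if form == "coo":
        return mat
    if form == "csr":
        return mat.tocsr()
    raise NotImplementedError(f"Sparse matrix format [{form}] is not supported.")


# Toy interactions: (user 0, item 1), (user 0, item 2), (user 2, item 0).
users = [0, 0, 2]
items = [1, 2, 0]
ratings = [5.0, 3.0, 4.0]

binary = build_inter_matrix(users, items, user_num=3, item_num=4)
rated = build_inter_matrix(users, items, 3, 4, values=ratings, form="csr")
```

The coo form is cheap to build from coordinate lists, while csr is usually the better choice when rows are sliced or multiplied afterwards.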
-
leave_one_out(group_by, leave_one_num=1)
Split interaction records by the leave-one-out strategy.
- Parameters
group_by (str) – Field name that interaction records should be grouped by before splitting.
leave_one_num (int, optional) – Number of parts whose length is expected to be 1. Defaults to 1.
- Returns
List of Dataset, whose interaction features have been split.
- Return type
list
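As a rough sketch of the leave-one-out idea, the snippet below groups toy records by a field and holds out the last leave_one_num records of each group, one per extra part. It assumes records are already in chronological order within each group and that a group never gives up its only record; leave_one_out_split is a hypothetical helper, not the library method.

```python
from collections import defaultdict


def leave_one_out_split(records, group_by, leave_one_num=1):
    """Split records into leave_one_num + 1 parts (illustrative helper)."""
    # Collect the positions of the records belonging to each group.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[group_by]].append(idx)

    parts = [[] for _ in range(leave_one_num + 1)]
    for indices in groups.values():
        # Never hold out every record of a group: keep at least one in part 0.
        held_out = min(leave_one_num, len(indices) - 1)
        cut = len(indices) - held_out
        parts[0].extend(indices[:cut])
        # Each of the last `held_out` records goes to its own trailing part.
        for offset in range(held_out):
            parts[len(parts) - held_out + offset].append(indices[cut + offset])
    return [[records[i] for i in part] for part in parts]


records = [
    {"user_id": "u1", "item_id": "i1"},
    {"user_id": "u1", "item_id": "i2"},
    {"user_id": "u1", "item_id": "i3"},
    {"user_id": "u2", "item_id": "i4"},
    {"user_id": "u2", "item_id": "i5"},
]
train, valid, test = leave_one_out_split(records, "user_id", leave_one_num=2)
```

With leave_one_num=2 the result has three parts, which would typically serve as training, validation and test data. Here u2 has only two records, so it contributes one held-out record to the last part and none to the middle one.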
-
prepare_data_augmentation()
Augmentation processing for a sequential dataset.
E.g., if u1 has the purchase sequence <i1, i2, i3, i4>, then augmentation generates three cases.
u1, <i1> | i2
(This means that given user_id u1 and item_seq <i1>, we need to predict the next item i2.)
The other cases are below:
u1, <i1, i2> | i3
u1, <i1, i2, i3> | i4
Note
These new item sequences are not actually materialized. Each user’s item sequence is stored only once in memory. We store the index (slice) of each item sequence after augmentation, which saves memory and speeds up processing considerably.
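The slice-based bookkeeping in the note can be illustrated with a short sketch. It assumes the interactions are sorted so that each user's records are contiguous and in chronological order; augment_by_slices is a hypothetical helper, and plain Python lists stand in for the numpy.ndarray attributes listed at the top of this page.

```python
def augment_by_slices(user_ids):
    """Return (uid_list, item_list_index, target_index, item_list_length).

    Illustrative helper only; assumes each user's records are contiguous
    and chronologically ordered.
    """
    uid_list, item_list_index, target_index, item_list_length = [], [], [], []
    seq_start = 0
    last_uid = None
    for i, uid in enumerate(user_ids):
        if uid != last_uid:
            # A new user's block begins; its first item has no history yet.
            last_uid = uid
            seq_start = i
            continue
        uid_list.append(uid)
        # Store only where the history lives, not a copy of the items.
        item_list_index.append(slice(seq_start, i))
        target_index.append(i)
        item_list_length.append(i - seq_start)
    return uid_list, item_list_index, target_index, item_list_length


user_ids = ["u1", "u1", "u1", "u1", "u2", "u2"]
item_ids = ["i1", "i2", "i3", "i4", "i5", "i6"]

uids, seq_slices, targets, lengths = augment_by_slices(user_ids)
# Materialize the cases on demand by applying the stored slices.
cases = [(u, item_ids[s], item_ids[t]) for u, s, t in zip(uids, seq_slices, targets)]
```

For u1 this yields the three cases from the example above, while the item column itself is never duplicated: each case costs one slice and one target index.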
-