For extensibility and reusability, our data module defines a data flow that transforms raw data into the model input.
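The overall data flow runs from raw data through atomic files and the Dataset class to the DataLoader, which produces the model input.
The details of the two main components are as follows: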
Dataset: Mainly based on pandas.DataFrame, the primary data structure of the Pandas library. During the transformation from atomic files to the Dataset class, we provide functions for common preprocessing steps in recommender systems, such as k-core data filtering and missing value imputation. See [ API ] for details.
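The two preprocessing steps named above can be sketched directly on a pandas.DataFrame. This is a minimal illustration, not the library's implementation: the function names k_core_filter and impute_missing, the column names, and the "[UNK]" placeholder are all hypothetical, and the imputation rule (column mean for numeric features, a placeholder token otherwise) is an assumption.

```python
import pandas as pd


def k_core_filter(inter, k, user_col="user_id", item_col="item_id"):
    """Repeatedly drop users and items that have fewer than k interactions.

    Hypothetical helper: removing a row can push another user or item
    below k, so the filter loops until every remaining row passes.
    """
    while True:
        # Per-row interaction counts of the row's user and item.
        user_counts = inter[user_col].map(inter[user_col].value_counts())
        item_counts = inter[item_col].map(inter[item_col].value_counts())
        keep = (user_counts >= k) & (item_counts >= k)
        if keep.all():
            return inter.reset_index(drop=True)
        inter = inter[keep]


def impute_missing(df, token_fill="[UNK]"):
    """Fill missing values: column mean for numeric columns (assumed rule),
    a placeholder token for everything else."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())
        else:
            df[col] = df[col].fillna(token_fill)
    return df


# Toy interaction table standing in for data loaded from atomic files.
raw = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3"],
    "item_id": ["i1", "i2", "i1", "i2", "i3"],
    "rating": [5.0, None, 3.0, 4.0, 1.0],
})

filtered = k_core_filter(raw, k=2)   # u3 and i3 appear only once and are dropped
imputed = impute_missing(filtered)   # the missing rating becomes the column mean
```

Filtering before imputation means the mean is computed only over interactions that survive the k-core step; the opposite order is equally possible and depends on the dataset.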
DataLoader: Mainly based on a general internal data structure implemented by our library, called Interaction. Interaction is the internal data structure that is fed into the recommendation algorithms. It is implemented as a new abstract data type based on the Python dict, a key-value indexed data structure. The keys correspond to input features, which can be conveniently referenced by feature name when writing recommendation algorithms; the values are tensors (implemented as torch.Tensor), which are used for updates and computation in the learning algorithms. Specifically, the value for a given key stores all of the corresponding tensor data in a batch or mini-batch. See [ API ] for details.
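The dict-of-tensors idea described above can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the library's Interaction class: the name MiniInteraction, its methods, and the feature names are hypothetical and chosen only to show how feature-name keys map to batched torch.Tensor values.

```python
import torch


class MiniInteraction:
    """Hypothetical, simplified container: feature name -> batched tensor."""

    def __init__(self, data):
        # Every feature tensor must hold the same number of rows (the batch size).
        sizes = {v.shape[0] for v in data.values()}
        if len(sizes) > 1:
            raise ValueError("all feature tensors must share the same batch size")
        self.data = dict(data)

    def __getitem__(self, key):
        # A feature name returns the whole tensor for that feature.
        if isinstance(key, str):
            return self.data[key]
        # A slice selects the same rows from every feature.
        if isinstance(key, slice):
            return MiniInteraction({k: v[key] for k, v in self.data.items()})
        raise TypeError("key must be a feature name or a slice")

    def __len__(self):
        # Batch size, or 0 when no features are stored.
        for v in self.data.values():
            return v.shape[0]
        return 0


# One mini-batch of four interactions.
batch = MiniInteraction({
    "user_id": torch.tensor([0, 1, 2, 3]),
    "item_id": torch.tensor([10, 11, 12, 13]),
    "rating": torch.tensor([5.0, 3.0, 4.0, 1.0]),
})

user_ids = batch["user_id"]   # referenced by feature name, as a model would
first_two = batch[:2]         # a smaller batch with the same feature keys
```

Keeping each feature as a single tensor per batch lets a model read batch["user_id"] and pass it straight to tensor operations without per-row unpacking.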