Clarifications on some practical issues

Q1:

Why is the result of Dataset.item_num always one more than the actual number of items in the dataset?

A1:

We add [PAD] to all token-like fields. After the IDs are remapped, 0 is reserved for [PAD], so the result of Dataset.item_num is one more than the actual number of items.

Note that for knowledge-based models, we add one more relation called the U-I Relation. It describes the historical user-item interactions and is used in recbole.data.dataset.kg_dataset.KnowledgeBasedDataset.ckg_graph(). Thus, counting [PAD] as well, the result of KGDataset.relation_num is two more than the actual number of relations.
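The counting described in A1 can be illustrated with a small, self-contained sketch. This is not RecBole's actual remapping code: the helper remap_tokens, the sample tokens, and the "[UI-Relation]" label are hypothetical names chosen for illustration only.

```python
def remap_tokens(tokens, pad_token="[PAD]"):
    """Map raw tokens to consecutive integer IDs, reserving 0 for padding.

    Hypothetical helper for illustration; not part of RecBole's API.
    """
    token2id = {pad_token: 0}
    for tok in tokens:
        if tok not in token2id:
            token2id[tok] = len(token2id)
    return token2id


# Three distinct items appear in the raw data (one of them twice).
raw_items = ["i_10", "i_42", "i_10", "i_7"]
item2id = remap_tokens(raw_items)
item_num = len(item2id)  # includes the [PAD] entry
print(item_num, item_num - 1)

# Two real relations, plus one extra relation for user-item interactions
# (the label used here is an assumption).
raw_relations = ["r_a", "r_b"]
relation2id = remap_tokens(raw_relations + ["[UI-Relation]"])
relation_num = len(relation2id)  # [PAD] + 2 real relations + U-I relation
print(relation_num, relation_num - 2)
```

In this sketch the size of each mapping plays the role of item_num and relation_num, so subtracting one (or two, for relations) recovers the number of real entries.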

Q2:

Why are the test results usually better than the best valid results?

A2:

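For a more rigorous evaluation, user-item interaction records from the validation set are not ranked during testing. As a result, the candidate items ranked during testing differ from those ranked during validation, so the distributions of the validation and test sets may be inconsistent.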

However, this does not affect the comparison between models, since every model is evaluated under the same protocol.
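The effect described in A2 can be seen in a toy example. The sketch below is an assumption-laden illustration, not RecBole's evaluation code: rank_candidates, the scores, and the item sets are all made up. It only shows that removing validation items from the candidate pool can move a test item up the ranking.

```python
def rank_candidates(scores, excluded):
    """Return item IDs sorted by descending score, skipping excluded IDs.

    Hypothetical helper for illustration; not part of RecBole's API.
    """
    candidates = [i for i in range(len(scores)) if i not in excluded]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)


# Made-up model scores for one user; index 0 stands for [PAD].
scores = [0.0, 0.9, 0.8, 0.7, 0.6, 0.5]
train_items = {1}
valid_items = {2}
test_item = 4

# Test-time ranking if validation items were left in the candidate pool.
unmasked = rank_candidates(scores, excluded={0} | train_items)
rank_unmasked = unmasked.index(test_item) + 1

# Test-time ranking with validation items excluded, as described above.
masked = rank_candidates(scores, excluded={0} | train_items | valid_items)
rank_masked = masked.index(test_item) + 1

print(rank_unmasked, rank_masked)
```

Here the test item's position improves once the validation item is removed, which is one way the test metrics can end up higher than the validation metrics.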