Atomic files are introduced to format the input of mainstream recommendation tasks in a flexible way.
So far, our library defines six atomic file types, which are identified by their suffixes.
Suffix | Content | Example Format |
---|---|---|
.inter | User-item interaction | user_id, item_id, rating, timestamp, review |
.user | User feature | user_id, age, gender |
.item | Item feature | item_id, category |
.kg | Triplets in a knowledge graph | head_entity, tail_entity, relation |
.link | Item-entity linkage data | entity, item_id |
.net | Social graph data | source, target |
Atomic files are combined to support the input of different recommendation tasks. To load the
corresponding atomic files, list their suffixes in the config argument `load_col`.
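As a rough illustration, the fragment below maps each suffix to the columns that should be read from that file. The layout of the mapping (suffix without the dot as key, list of column names as value) is an assumption about how `load_col` is structured, and the helper `suffixes_to_load` is hypothetical and not part of the library; check the configuration reference for the accepted form.

```python
# Hypothetical config fragment (assumed layout): keys are atomic file
# suffixes without the leading dot, values are the columns to load.
config_dict = {
    "load_col": {
        "inter": ["user_id", "item_id", "rating", "timestamp"],
        "user": ["user_id", "age", "gender"],
        "item": ["item_id", "category"],
    }
}


def suffixes_to_load(config):
    """Return the atomic file suffixes named under 'load_col', sorted."""
    return sorted("." + key for key in config.get("load_col", {}))


print(suffixes_to_load(config_dict))
```

With this fragment, only the `.inter`, `.user` and `.item` files would be requested; the other atomic files are ignored even if they exist.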
Each recommendation task requires a set of mandatory atomic files:
Tasks | Mandatory atomic files |
---|---|
General | .inter |
Context-aware | .inter, .user, .item |
Knowledge-aware | .inter, .kg, .link |
Sequential | .inter |
Social | .inter, .net |
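The table above can be turned into a simple pre-flight check. The sketch below is only an illustration: `MANDATORY_FILES` restates the table, and `missing_files` is a hypothetical helper, not a library function.

```python
# Mandatory atomic files per task, copied from the table above.
MANDATORY_FILES = {
    "General": {".inter"},
    "Context-aware": {".inter", ".user", ".item"},
    "Knowledge-aware": {".inter", ".kg", ".link"},
    "Sequential": {".inter"},
    "Social": {".inter", ".net"},
}


def missing_files(task, available_suffixes):
    """Return the mandatory suffixes for `task` that are not available, sorted."""
    return sorted(MANDATORY_FILES[task] - set(available_suffixes))


# A knowledge-aware task with no .link file is incomplete.
print(missing_files("Knowledge-aware", [".inter", ".kg"]))
```

Extra files beyond the mandatory ones are not reported, since a task may still make use of them.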
Each atomic file can be viewed as an `m \times n` table (excluding the header), where `n` is the
number of features and `m` is the number of data records.
The first row holds the feature names, where each entry has the form `feat_name:feat_type`,
indicating the feature name and the feature type.
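The header convention can be sketched in a few lines of Python. The function `parse_header` is a hypothetical helper written for this illustration; it splits each header entry at its last colon and does not validate the type name.

```python
def parse_header(header_cells):
    """Split 'feat_name:feat_type' entries into (name, type) pairs."""
    fields = []
    for cell in header_cells:
        # rpartition splits at the last colon, so a name containing ':' survives.
        name, sep, feat_type = cell.strip().rpartition(":")
        if not sep or not name or not feat_type:
            raise ValueError(f"malformed header entry: {cell!r}")
        fields.append((name, feat_type))
    return fields


print(parse_header(["user_id:token", "item_id:token", "rating:float"]))
```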
We support four feature types, all of which can be processed in batches as tensors.
feat_type | Explanations | Examples |
---|---|---|
token | single discrete feature | user_id, age |
token_seq | discrete feature sequence | review |
float | single continuous feature | rating, timestamp |
float_seq | continuous feature sequence | vector |
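To make the four types concrete, the sketch below converts a raw cell into a Python value according to its `feat_type`. It assumes that sequence elements are separated by a single space, as the ML-1M genre column further down suggests; `convert_value` and its `seq_separator` parameter are hypothetical names used only for this example.

```python
def convert_value(raw, feat_type, seq_separator=" "):
    """Convert one raw cell to a Python value based on its feature type."""
    if feat_type == "token":
        return raw  # discrete value, kept as a string
    if feat_type == "float":
        return float(raw)
    if feat_type == "token_seq":
        return raw.split(seq_separator)
    if feat_type == "float_seq":
        return [float(x) for x in raw.split(seq_separator)]
    raise ValueError(f"unknown feature type: {feat_type}")


print(convert_value("Animation Children's Comedy", "token_seq"))
print(convert_value("0.5 1.5", "float_seq"))
```

Tokens stay as strings here; mapping them to integer ids for tensor processing is a separate step that this sketch does not cover.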
As an example, we present the formatted ML-1M dataset below.
ml-1m.inter
user_id:token | item_id:token | rating:float | timestamp:float |
---|---|---|---|
1 | 1193 | 5 | 978300760 |
1 | 661 | 3 | 978302109 |
ml-1m.user
user_id:token | age:token | gender:token | occupation:token | zip_code:token |
---|---|---|---|---|
1 | 1 | F | 10 | 48067 |
2 | 56 | M | 16 | 70072 |
ml-1m.item
item_id:token | movie_title:token_seq | release_year:token | genre:token_seq |
---|---|---|---|
1 | Toy Story | 1995 | Animation Children's Comedy |
2 | Jumanji | 1995 | Adventure Children's Fantasy |
ml-1m.kg
head_id:token | relation_id:token | tail_id:token |
---|---|---|
m.0gs6m | film.film_genre.films_in_this_genre | m.01b195 |
m.052_dz | film.film.actor | m.02nrdp |
ml-1m.link
item_id:token | entity_id:token |
---|---|
2694 | m.02hxhz |
2079 | m.0kvcr9 |
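The sample rows above can be read with the standard library alone. The sketch below assumes that columns are tab-separated on disk, which this page does not state, so `field_separator` is exposed as a parameter. `read_atomic` is a hypothetical helper, and for brevity it converts only `float` columns, leaving every other type as a raw string.

```python
import csv
import io


def read_atomic(text, field_separator="\t"):
    """Parse atomic-file text into (fields, rows); only float columns are converted."""
    reader = csv.reader(
        io.StringIO(text), delimiter=field_separator, quoting=csv.QUOTE_NONE
    )
    header = next(reader)
    # Each header entry has the form feat_name:feat_type.
    fields = [tuple(cell.rsplit(":", 1)) for cell in header]
    rows = []
    for record in reader:
        row = {}
        for (name, feat_type), raw in zip(fields, record):
            row[name] = float(raw) if feat_type == "float" else raw
        rows.append(row)
    return fields, rows


# The two ml-1m.inter records shown above, written with the assumed tab separator.
inter_text = (
    "user_id:token\titem_id:token\trating:float\ttimestamp:float\n"
    "1\t1193\t5\t978300760\n"
    "1\t661\t3\t978302109\n"
)
fields, rows = read_atomic(inter_text)
print(fields)
print(rows[0])
```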