Atomic Files

Atomic files are introduced to format the input of mainstream recommendation tasks in a flexible way.

So far, our library introduces six atomic file types, which are identified by their suffixes.

Suffix   Content                        Example Format
.inter   User-item interaction          user_id, item_id, rating, timestamp, review
.user    User feature                   user_id, age, gender
.item    Item feature                   item_id, category
.kg      Triplets in a knowledge graph  head_entity, tail_entity, relation
.link    Item-entity linkage data       entity, item_id
.net     Social graph data              source, target

Atomic files are combined to support the input of different recommendation tasks. To load an atomic file, write its suffix into the config arg load_col.
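
As a rough illustration of the load_col argument, the sketch below builds a configuration dictionary in Python. It assumes that load_col maps each suffix (written without the leading dot) to the list of columns to load from that file. This key layout is an assumption of the sketch, and the column names are simply taken from the suffix table above.

```python
# Hypothetical configuration sketch: the exact schema of load_col is assumed,
# with each suffix (no leading dot) mapped to the columns to load.
config_dict = {
    "load_col": {
        "inter": ["user_id", "item_id", "rating", "timestamp"],
        "user": ["user_id", "age", "gender"],
        "item": ["item_id", "category"],
    }
}

# The suffixes listed here determine which atomic files would be read.
suffixes_to_load = sorted(config_dict["load_col"])
print(suffixes_to_load)
```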

Each recommendation task requires certain mandatory atomic files:

Task             Mandatory atomic files
General          .inter
Context-aware    .inter, .user, .item
Knowledge-aware  .inter, .kg, .link
Sequential       .inter
Social           .inter, .net
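
The table of mandatory files can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the library: the function name missing_atomic_files is hypothetical, and it assumes that each file is named <dataset_name><suffix> inside a single dataset directory.

```python
from pathlib import Path

# Mandatory suffixes per task, copied from the table above.
MANDATORY_FILES = {
    "General": [".inter"],
    "Context-aware": [".inter", ".user", ".item"],
    "Knowledge-aware": [".inter", ".kg", ".link"],
    "Sequential": [".inter"],
    "Social": [".inter", ".net"],
}


def missing_atomic_files(task, dataset_dir, dataset_name):
    """Return the mandatory suffixes whose files are absent.

    Assumes files are named <dataset_name><suffix> inside dataset_dir;
    this naming scheme is an assumption of the sketch.
    """
    base = Path(dataset_dir)
    return [
        suffix
        for suffix in MANDATORY_FILES[task]
        if not (base / f"{dataset_name}{suffix}").is_file()
    ]
```

For example, calling missing_atomic_files("Social", "dataset/demo", "demo") on a directory that only contains demo.inter would report that the .net file is missing.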

Each atomic file can be viewed as an `m \times n` table (excluding the header row), where `n` is the number of features and `m` is the number of data records. The first row holds the feature names, and each entry has the form feat_name:feat_type, indicating the feature name and the feature type. We support four feature types, all of which can be processed as tensors in batches.

feat_type  Explanation                      Examples
token      single discrete feature          user_id, age
token_seq  sequence of discrete features    review
float      single continuous feature        rating, timestamp
float_seq  sequence of continuous features  vector
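
To make the feat_name:feat_type convention concrete, here is a small sketch that splits a header row into (name, type) pairs and rejects unknown types. The helper name parse_header is hypothetical, and the tab separator between columns is an assumption, since the separator is not specified above.

```python
# The four feature types listed in the table above.
FEATURE_TYPES = {"token", "token_seq", "float", "float_seq"}


def parse_header(header_line, sep="\t"):
    """Split a header row into (feat_name, feat_type) pairs.

    The tab separator is an assumption of this sketch.
    """
    fields = []
    for entry in header_line.rstrip("\n").split(sep):
        # Split on the last colon so the type is always the final part.
        name, _, feat_type = entry.rpartition(":")
        if not name or feat_type not in FEATURE_TYPES:
            raise ValueError(f"malformed header entry: {entry!r}")
        fields.append((name, feat_type))
    return fields


print(parse_header("user_id:token\titem_id:token\trating:float\ttimestamp:float"))
```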


As an example, we present the formatted ML-1M dataset below.


Interaction file (.inter):

user_id:token  item_id:token  rating:float  timestamp:float
1              1193           5             978300760
1              661            3             978302109


User file (.user):

user_id:token  age:token  gender:token  occupation:token  zip_code:token
1              1          F             10                48067
2              56         M             16                70072


Item file (.item):

item_id:token  movie_title:token_seq  release_year:token  genre:token_seq
1              Toy Story              1995                Animation Children's Comedy
2              Jumanji                1995                Adventure Children's Fantasy

Knowledge graph file (.kg):

head_id:token  relation_id:token                     tail_id:token
m.0gs6m        film.film_genre.films_in_this_genre   m.01b195
m.052_dz                                             m.02nrdp

Link file (.link):

item_id:token  entity_id:token
2694           m.02hxhz
2079           m.0kvcr9
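
A file in this layout can be read with a few lines of Python. The sketch below is illustrative only and is not the library's loader: the names convert and read_atomic are hypothetical, and it assumes that columns are separated by tabs and that the elements of a sequence feature are separated by single spaces.

```python
import io


def convert(value, feat_type, seq_sep=" "):
    """Convert one raw string according to its declared feature type."""
    if feat_type == "token":
        return value
    if feat_type == "float":
        return float(value)
    if feat_type == "token_seq":
        return value.split(seq_sep) if value else []
    if feat_type == "float_seq":
        return [float(x) for x in value.split(seq_sep)] if value else []
    raise ValueError(f"unknown feature type: {feat_type!r}")


def read_atomic(stream, field_sep="\t", seq_sep=" "):
    """Read an atomic file into a list of dicts (separators are assumed)."""
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    # Each header entry has the form feat_name:feat_type.
    header = [tuple(entry.rsplit(":", 1)) for entry in lines[0].split(field_sep)]
    records = []
    for line in lines[1:]:
        values = line.split(field_sep)
        records.append(
            {name: convert(v, t, seq_sep) for (name, t), v in zip(header, values)}
        )
    return records


# One row of the item example above, written with tab separators.
sample = io.StringIO(
    "item_id:token\tmovie_title:token_seq\trelease_year:token\tgenre:token_seq\n"
    "1\tToy Story\t1995\tAnimation Children's Comedy\n"
)
records = read_atomic(sample)
print(records[0]["genre"])
```

Keeping token values as strings means that IDs such as 1193 are treated as discrete labels rather than numbers, which matches the "single discrete feature" description of the token type.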