Atomic Files
===================

Atomic files are introduced to format the input of mainstream recommendation tasks in a flexible way.

So far, our library defines six atomic file types, which are identified by their suffixes.

=========  ==============================  ========================================================
Suffix     Content                         Example Format
=========  ==============================  ========================================================
`.inter`   User-item interaction           `user_id`, `item_id`, `rating`, `timestamp`, `review`
`.user`    User feature                    `user_id`, `age`, `gender`
`.item`    Item feature                    `item_id`, `category`
`.kg`      Triplets in a knowledge graph   `head_entity`, `tail_entity`, `relation`
`.link`    Item-entity linkage data        `entity`, `item_id`
`.net`     Social graph data               `source`, `target`
=========  ==============================  ========================================================

Atomic files are combined to support the input of different recommendation tasks.

One can write the suffixes into the config arg ``load_col`` to load the corresponding atomic files.

Each recommendation task requires certain mandatory files:

================  ================================
Tasks             Mandatory atomic files
================  ================================
General           `.inter`
Context-aware     `.inter`, `.user`, `.item`
Knowledge-aware   `.inter`, `.kg`, `.link`
Sequential        `.inter`
Social            `.inter`, `.net`
================  ================================

Format
--------

Each atomic file can be viewed as an m x n table, where n is the number of features and m-1 is the number of data records (one line is reserved for the header).

The first row corresponds to feature names, in which each entry has the form ``feat_name:feat_type``, indicating the feature name and feature type.

We support four feature types, which can be processed by tensors in batch.


============  ===========================  =====================
feat_type     Explanations                 Examples
============  ===========================  =====================
`token`       single discrete feature      `user_id`, `age`
`token_seq`   discrete feature sequence    `review`
`float`       single continuous feature    `rating`, `timestamp`
`float_seq`   continuous feature sequence  `vector`
============  ===========================  =====================

Examples
----------

We present two example data rows from each atomic file of the formatted ML-1M dataset.

**ml-1m.inter**

=============  =============  ============  ===============
user_id:token  item_id:token  rating:float  timestamp:float
=============  =============  ============  ===============
1              1193           5             978300760
1              661            3             978302109
=============  =============  ============  ===============

**ml-1m.user**

=============  =========  ============  ================  ==============
user_id:token  age:token  gender:token  occupation:token  zip_code:token
=============  =========  ============  ================  ==============
1              1          F             10                48067
2              56         M             16                70072
=============  =========  ============  ================  ==============

**ml-1m.item**

=============  =====================  ==================  ============================
item_id:token  movie_title:token_seq  release_year:token  genre:token_seq
=============  =====================  ==================  ============================
1              Toy Story              1995                Animation Children's Comedy
2              Jumanji                1995                Adventure Children's Fantasy
=============  =====================  ==================  ============================

**ml-1m.kg**

=============  ===================================  =============
head_id:token  relation_id:token                    tail_id:token
=============  ===================================  =============
m.0gs6m        film.film_genre.films_in_this_genre  m.01b195
m.052_dz       film.film.actor                      m.02nrdp
=============  ===================================  =============

**ml-1m.link**

=============  ===============
item_id:token  entity_id:token
=============  ===============
2694           m.02hxhz
2079           m.0kvcr9
=============  ===============

Additional Atomic Files
----------------------------

For users who want to load features from additional atomic files (e.g. pretrained entity embeddings), we provide a simple way, as follows.

Firstly, prepare your additional atomic file (e.g. ``ml-1m.ent``).

=============  ===============================
ent_id:token   ent_emb:float_seq
=============  ===============================
m.0gs6m        -115.08 13.60 113.69
m.01b195       -130.97 263.05 -129.88
=============  ===============================

Secondly, update the args as:

.. code:: yaml

    additional_feat_suffix: [ent]
    load_col:
        # inter/user/item/...: As usual
        ent: [ent_id, ent_emb]

Then, this additional atomic file will be loaded into the :class:`Dataset` object. These new features can be accessed as follows.

.. code:: python

    # `config` is the configuration object that carries the args shown above
    dataset = create_dataset(config)
    print(dataset.ent_feat)

Note that these features can be preprocessed in the same way as the other features.

For example, if you want to map the tokens of ``ent_id`` into the same space as ``entity_id``, then update the args as:

.. code:: yaml

    additional_feat_suffix: [ent]
    load_col:
        # inter/user/item/...: As usual
        ent: [ent_id, ent_emb]
    alias_of_entity_id: [ent_id]
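
To make the ``feat_name:feat_type`` header convention concrete, the following is a minimal sketch of how an atomic file such as ``ml-1m.ent`` could be parsed by hand. It is not the library's loader: ``parse_header``, ``convert`` and ``read_atomic`` are hypothetical helper names, and the sketch assumes tab-separated fields and space-separated sequence elements, neither of which is specified on this page.

.. code:: python

    # Hypothetical helpers for illustration only; not part of the library.
    VALID_TYPES = {"token", "token_seq", "float", "float_seq"}


    def parse_header(header_line, field_sep="\t"):
        """Split each ``feat_name:feat_type`` entry into a (name, type) pair."""
        fields = []
        for entry in header_line.rstrip("\n").split(field_sep):
            name, _, feat_type = entry.rpartition(":")
            if not name or feat_type not in VALID_TYPES:
                raise ValueError(f"bad header entry: {entry!r}")
            fields.append((name, feat_type))
        return fields


    def convert(value, feat_type, seq_sep=" "):
        """Convert one raw cell according to its declared feature type."""
        if feat_type == "token":
            return value
        if feat_type == "float":
            return float(value)
        if feat_type == "token_seq":
            return value.split(seq_sep)
        # Remaining case: float_seq
        return [float(v) for v in value.split(seq_sep)]


    def read_atomic(text, field_sep="\t", seq_sep=" "):
        """Parse atomic-file text into (fields, records)."""
        lines = text.splitlines()
        fields = parse_header(lines[0], field_sep)
        records = []
        for line in lines[1:]:
            if not line:
                continue  # skip blank lines
            values = line.split(field_sep)
            if len(values) != len(fields):
                raise ValueError(f"expected {len(fields)} columns: {line!r}")
            records.append(
                {name: convert(v, t, seq_sep) for (name, t), v in zip(fields, values)}
            )
        return fields, records


    # Assumed on-disk content of ``ml-1m.ent`` (tab-separated).
    ENT_TEXT = (
        "ent_id:token\tent_emb:float_seq\n"
        "m.0gs6m\t-115.08 13.60 113.69\n"
        "m.01b195\t-130.97 263.05 -129.88\n"
    )

    fields, records = read_atomic(ENT_TEXT)
    print(fields)
    print(records[0])

With three lines of text (one header plus two data rows), the sketch yields two records, which matches the m-1 rule from the Format section.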