recbole.data.kg_dataset

class recbole.data.dataset.kg_dataset.KnowledgeBasedDataset(config, saved_dataset=None)

Bases: recbole.data.dataset.dataset.Dataset

KnowledgeBasedDataset is based on Dataset and additionally loads the .kg and .link files.

Entities are remapped together with item_id. All entities are remapped into three consecutive ID sections:

  • virtual entities that only exist in interaction data.

  • entities that exist both in interaction data and kg triplets.

  • entities that exist only in kg triplets.
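The three-section layout can be illustrated with a small standalone sketch. This is not RecBole's implementation: the helper `remap_entities`, the toy tokens, and the sorted ordering inside each section are assumptions made for illustration, and any reserved padding index is ignored.

```python
def remap_entities(interaction_items, item2entity, kg_entities):
    """Assign consecutive IDs in three sections (hypothetical helper).

    interaction_items: item tokens seen in interaction data
    item2entity: dict linking some item tokens to entity tokens
    kg_entities: entity tokens appearing in kg triplets
    """
    items = set(interaction_items)
    kg = set(kg_entities)

    # Section 1: items with no linked entity in the kg ("virtual" entities).
    virtual = sorted(i for i in items if item2entity.get(i) not in kg)
    # Section 2: items whose linked entity also appears in kg triplets.
    shared = sorted(i for i in items if item2entity.get(i) in kg)
    # Section 3: entities that appear only in kg triplets.
    linked = {item2entity[i] for i in shared}
    kg_only = sorted(kg - linked)

    # IDs are handed out section by section, so each section is contiguous.
    token2id = {}
    for token in virtual + shared + kg_only:
        token2id[token] = len(token2id)
    return token2id, (len(virtual), len(shared), len(kg_only))


token2id, sizes = remap_entities(
    interaction_items=["i1", "i2", "i3"],
    item2entity={"i2": "e2", "i3": "e3"},
    kg_entities=["e2", "e3", "e4", "e5"],
)
```

In this toy input, `i1` has no link and lands in the first section, `i2` and `i3` are linked to entities present in the triplets, and `e4` and `e5` fill the last section.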

It also provides several interfaces to convert .kg features into a COO sparse matrix, a CSR sparse matrix, a DGL.Graph or a PyG.Data object.

head_entity_field

The same as config['HEAD_ENTITY_ID_FIELD'].

Type

str

tail_entity_field

The same as config['TAIL_ENTITY_ID_FIELD'].

Type

str

relation_field

The same as config['RELATION_ID_FIELD'].

Type

str

entity_field

The same as config['ENTITY_ID_FIELD'].

Type

str

kg_feat

Internal data structure that stores the kg triplets. It is loaded from the .kg file.

Type

pandas.DataFrame

item2entity

Dict that maps item_id to entity, loaded from the .link file.

Type

dict

entity2item

Dict that maps entity to item_id, loaded from the .link file.

Type

dict
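As a rough illustration of how the two dictionaries relate, the sketch below builds both from the same list of (item, entity) pairs. The `link_rows` data is hypothetical and stands in for the parsed contents of a .link file; the actual loading code is not shown here.

```python
# Hypothetical rows of a .link file: (item_id, entity_id) pairs.
link_rows = [("item_1", "ent_a"), ("item_2", "ent_b")]

# Forward and inverse mappings built from the same pairs.
item2entity = {item: entity for item, entity in link_rows}
entity2item = {entity: item for item, entity in link_rows}
```

Each mapping is the inverse of the other as long as every item links to a distinct entity.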

Note

entity_field does not exist as an actual field. It is only a symbol representing entity features. E.g. it can be written into config['fields_in_same_space'].

[UI-Relation] is a special relation token.

ckg_graph(form='coo', value_field=None)

Get the graph or sparse matrix that describes the relations of the CKG, which combines interactions and kg triplets into the same graph.

Item ids and entity ids are temporarily offset by user_num.

For an edge of <src, tgt>, graph[src, tgt] = 1 if value_field is None, else graph[src, tgt] = self.kg_feat[self.relation_field][src, tgt] or graph[src, tgt] = [UI-Relation].

Currently, we support graphs in DGL and PyG, and two types of sparse matrices, coo and csr.

Parameters
  • form (str, optional) – Format of sparse matrix, or library of graph data structure. Defaults to coo.

  • value_field (str, optional) – self.relation_field or None. Defaults to None.

Returns

Graph / Sparse matrix of kg triplets.
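The edge construction described for ckg_graph can be sketched in plain Python. This is a simplified illustration, not the library's code: `build_ckg_coo` is a hypothetical helper, each edge is added in one direction only, and the output is three parallel lists in COO layout rather than a real sparse matrix or graph object.

```python
UI_RELATION = "[UI-Relation]"  # special relation token named in the docs


def build_ckg_coo(user_num, interactions, kg_triplets, with_relations=False):
    """Return (rows, cols, data) lists in COO layout (hypothetical helper).

    interactions: (user_id, item_id) pairs
    kg_triplets: (head_id, relation, tail_id) triplets
    Item and entity ids share one ID space and are shifted by user_num so
    that they do not collide with user ids.
    """
    rows, cols, data = [], [], []
    # User-item edges: the item side is offset by user_num.
    for user, item in interactions:
        rows.append(user)
        cols.append(item + user_num)
        data.append(UI_RELATION if with_relations else 1)
    # Entity-entity edges: both ends are offset by user_num.
    for head, relation, tail in kg_triplets:
        rows.append(head + user_num)
        cols.append(tail + user_num)
        data.append(relation if with_relations else 1)
    return rows, cols, data


interactions = [(0, 0), (1, 1)]
kg_triplets = [(0, "r1", 2)]
plain = build_ckg_coo(2, interactions, kg_triplets)
typed = build_ckg_coo(2, interactions, kg_triplets, with_relations=True)
```

With `with_relations=False` every edge carries the value 1, which mirrors the `value_field is None` case; otherwise user-item edges carry the special token and kg edges carry their relation.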

property ent_level_ent_fields

Get entity fields remapped together with entity_id.

Returns

List of field names.

Return type

list

property entities

Returns

List of entity ids, including virtual entities.

Return type

numpy.ndarray

property entity_num

Get the number of different tokens of entities, including virtual entities.

Returns

Number of different tokens of entities, including virtual entities.

Return type

int

property head_entities

Returns

List of head entities of kg triplets.

Return type

numpy.ndarray

kg_graph(form='coo', value_field=None)

Get the graph or sparse matrix that describes the relations between entities.

For an edge of <src, tgt>, graph[src, tgt] = 1 if value_field is None, else graph[src, tgt] = self.kg_feat[value_field][src, tgt].

Currently, we support graphs in DGL and PyG, and two types of sparse matrices, coo and csr.

Parameters
  • form (str, optional) – Format of sparse matrix, or library of graph data structure. Defaults to coo.

  • value_field (str, optional) – Edge attributes of the graph, or data of the sparse matrix. Defaults to None.

Returns

Graph / Sparse matrix of kg triplets.
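To make the difference between the coo and csr forms concrete, the sketch below stores the same toy triplets both ways using plain lists. The helpers `kg_to_coo` and `coo_to_csr` are hypothetical, and relations are assumed to be already remapped to integer ids; the library itself returns sparse matrix or graph objects rather than lists.

```python
def kg_to_coo(kg_triplets, use_relation=False):
    """COO layout: one (row, col, value) entry per triplet."""
    rows = [h for h, _, _ in kg_triplets]
    cols = [t for _, _, t in kg_triplets]
    data = [r if use_relation else 1 for _, r, _ in kg_triplets]
    return rows, cols, data


def coo_to_csr(rows, cols, data, n_rows):
    """CSR layout: entries grouped by row, with a row pointer array."""
    # Stable sort of entry positions by row index.
    order = sorted(range(len(rows)), key=lambda k: rows[k])
    # indptr[i]..indptr[i + 1] delimits the entries of row i.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    indices = [cols[k] for k in order]
    values = [data[k] for k in order]
    return indptr, indices, values


# (head, relation, tail) triplets over 3 entities.
triplets = [(2, 1, 0), (0, 2, 1), (0, 1, 2)]
rows, cols, data = kg_to_coo(triplets, use_relation=True)
indptr, indices, values = coo_to_csr(rows, cols, data, n_rows=3)
```

COO keeps the triplet order and is convenient to build; CSR groups entries by head entity, which makes looking up all tails of one head a single slice.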

property rec_level_ent_fields

Get entity fields remapped together with item_id.

Returns

List of field names.

Return type

list

property relation_num

Get the number of different tokens of self.relation_field.

Returns

Number of different tokens of self.relation_field.

Return type

int

property relations

Returns

List of relations of kg triplets.

Return type

numpy.ndarray

save(filepath)

Save this Dataset object to a local path.

Parameters

filepath (str) – Path of the directory in which the dataset is saved.

property tail_entities

Returns

List of tail entities of kg triplets.

Return type

numpy.ndarray