recbole.sampler

class recbole.sampler.sampler.AbstractSampler(distribution)[source]

Bases: object

AbstractSampler is an abstract class that all samplers should inherit from. Given an input key_id, it returns a specified number of random value_ids, and it can prohibit certain key-value pairs through used_ids. For efficiency, random numbers are generated by moving a pointer, random_pr, through random_list, so every subclass must implement the get_random_list() method.
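The pointer mechanism described above can be sketched in a few lines. This is an illustration only, not RecBole's implementation: the class name PointerSampler, the seed argument, and the wrap-around behaviour at the end of the list are assumptions made for the example. Only the names random_list, random_pr, random() and random_num() come from this page.

```python
import numpy as np


class PointerSampler:
    """Hypothetical illustration of drawing from a pre-shuffled list."""

    def __init__(self, candidates, seed=0):
        rng = np.random.default_rng(seed)
        # Shuffle once up front; later draws only read from this array.
        self.random_list = rng.permutation(np.asarray(candidates))
        # Read position inside random_list.
        self.random_pr = 0

    def random(self):
        # Assumption: wrap around when the end of the list is reached.
        value_id = self.random_list[self.random_pr % len(self.random_list)]
        self.random_pr += 1
        return int(value_id)

    def random_num(self, num):
        # Take the next `num` entries in one vectorised read.
        idx = (self.random_pr + np.arange(num)) % len(self.random_list)
        self.random_pr += num
        return self.random_list[idx]
```

Reading from a shuffled array avoids calling the random number generator once per draw, which is presumably the efficiency gain the description refers to.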

Parameters

distribution (str) – The name of the sampling distribution, which is used by subclasses.

random_list

The shuffled result of get_random_list().

Type

list or numpy.ndarray

used_ids

The result of get_used_ids().

Type

numpy.ndarray

get_random_list()[source]
Returns

Random list of value_id.

Return type

numpy.ndarray or list

get_used_ids()[source]
Returns

Used ids. Index is key_id, and element is a set of value_ids.

Return type

numpy.ndarray

random()[source]
Returns

Random value_id. Generated by random_list.

Return type

value_id (int)

random_num(num)[source]
Parameters

num (int) – Number of random value_ids.

Returns

Random value_ids. Generated by random_list.

Return type

value_ids (numpy.ndarray)

sample_by_key_ids(key_ids, num)[source]

Sampling by key_ids.

Parameters
  • key_ids (numpy.ndarray or list) – Input key_ids.

  • num (int) – Number of sampled value_ids for each key_id.

Returns

Sampled value_ids. value_ids[0], value_ids[len(key_ids)], value_ids[len(key_ids) * 2], …, value_ids[len(key_ids) * (num - 1)] are sampled for key_ids[0]; value_ids[1], value_ids[len(key_ids) + 1], value_ids[len(key_ids) * 2 + 1], …, value_ids[len(key_ids) * (num - 1) + 1] are sampled for key_ids[1]; and so on.

Return type

torch.Tensor
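The index layout of the returned values is easier to see in code. The sketch below is a NumPy-only illustration under stated assumptions, not RecBole's implementation: sample_by_key_ids_sketch, n_values and seed are hypothetical names, plain rejection sampling stands in for whatever the library does internally, and the result is a NumPy array rather than a torch tensor.

```python
import numpy as np


def sample_by_key_ids_sketch(key_ids, num, used_ids, n_values, seed=0):
    """Hypothetical rejection-sampling sketch with the documented layout."""
    rng = np.random.default_rng(seed)
    key_ids = np.asarray(key_ids)
    # Repeat the whole block of keys `num` times: [k0, k1, ..., k0, k1, ...].
    tiled_keys = np.tile(key_ids, num)
    value_ids = rng.integers(0, n_values, size=len(tiled_keys))
    # Redraw every position whose (key, value) pair is prohibited.
    # Note: this loops forever if a key prohibits all n_values values.
    while True:
        bad = np.array([v in used_ids[k] for k, v in zip(tiled_keys, value_ids)])
        if not bad.any():
            break
        value_ids[bad] = rng.integers(0, n_values, size=int(bad.sum()))
    return value_ids


# used_ids: index is key_id, element is the set of prohibited value_ids.
used_ids = np.array([{0, 1}, {2}, set()], dtype=object)
key_ids = [0, 1, 2]
num = 4

flat = sample_by_key_ids_sketch(key_ids, num, used_ids, n_values=6)
# Row i of per_key holds flat[i], flat[i + len(key_ids)], ...,
# which are the samples drawn for key_ids[i].
per_key = flat.reshape(num, len(key_ids)).T
```

Reshaping to (num, len(key_ids)) and transposing recovers one row of samples per key, which matches the stride of len(key_ids) described in the return value.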

set_distribution(distribution)[source]

Set the distribution of sampler.

Parameters

distribution (str) – Distribution of the negative samples.

class recbole.sampler.sampler.KGSampler(dataset, distribution='uniform')[source]

Bases: recbole.sampler.sampler.AbstractSampler

KGSampler is used to sample negative entities in a knowledge graph.

Parameters
  • dataset (Dataset) – The knowledge graph dataset, which contains triplets in a knowledge graph.

  • distribution (str, optional) – Distribution of the negative entities. Defaults to ‘uniform’.

get_random_list()[source]
Returns

Random list of entity_id.

Return type

numpy.ndarray or list

get_used_ids()[source]
Returns

Used entity_ids, which are the same as the tail_entity_ids in the knowledge graph. The index is head_entity_id, and each element is a set of tail_entity_ids.

Return type

numpy.ndarray
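As an illustration of the structure described above, the following sketch builds an array of sets indexed by head entity from parallel lists of heads and tails. The function name build_used_ids and its arguments are hypothetical; how KGSampler actually reads triplets from the dataset is not shown on this page.

```python
import numpy as np


def build_used_ids(head_ids, tail_ids, n_entities):
    """Hypothetical helper: index is head_entity_id, element is a set of tails."""
    # dtype=object keeps one Python set per head entity.
    used_ids = np.array([set() for _ in range(n_entities)], dtype=object)
    for head, tail in zip(head_ids, tail_ids):
        used_ids[head].add(int(tail))
    return used_ids
```

For example, with heads [0, 0, 2] and tails [1, 3, 1], the entry for head 0 is the set containing 1 and 3, and a negative sampler would avoid returning either of them for that head.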

sample_by_entity_ids(head_entity_ids, num=1)[source]

Sampling by head_entity_ids.

Parameters
  • head_entity_ids (numpy.ndarray or list) – Input head_entity_ids.

  • num (int, optional) – Number of sampled entity_ids for each head_entity_id. Defaults to 1.

Returns

Sampled entity_ids. entity_ids[0], entity_ids[len(head_entity_ids)], entity_ids[len(head_entity_ids) * 2], …, entity_ids[len(head_entity_ids) * (num - 1)] are sampled for head_entity_ids[0]; entity_ids[1], entity_ids[len(head_entity_ids) + 1], entity_ids[len(head_entity_ids) * 2 + 1], …, entity_ids[len(head_entity_ids) * (num - 1) + 1] are sampled for head_entity_ids[1]; and so on.

Return type

torch.Tensor

class recbole.sampler.sampler.RepeatableSampler(phases, dataset, distribution='uniform')[source]

Bases: recbole.sampler.sampler.AbstractSampler

RepeatableSampler is used to sample negative items for each input user. Unlike Sampler, it can only sample items that have not appeared in any phase.

Parameters
  • phases (str or list of str) – All the phases of input.

  • dataset (Dataset) – The union of the datasets of all phases.

  • distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.

phase

The phase of the sampler. It is not set until set_phase() is called.

Type

str

get_random_list()[source]
Returns

Random list of item_id.

Return type

numpy.ndarray or list

get_used_ids()[source]
Returns

Used item_ids, which are the same as the positive item_ids. The index is user_id, and each element is a set of item_ids.

Return type

numpy.ndarray

sample_by_user_ids(user_ids, num)[source]

Sampling by user_ids.

Parameters
  • user_ids (numpy.ndarray or list) – Input user_ids.

  • num (int) – Number of sampled item_ids for each user_id.

Returns

Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.

Return type

torch.Tensor

set_phase(phase)[source]

Get the sampler of the corresponding phase.

Parameters

phase (str) – The phase of the new sampler.

Returns

A copy of this sampler, with phase set to the input phase.

Return type

Sampler

class recbole.sampler.sampler.Sampler(phases, datasets, distribution='uniform')[source]

Bases: recbole.sampler.sampler.AbstractSampler

Sampler is used to sample negative items for each input user. To prevent positive items from the train phase from being sampled in the valid phase, and positive items from the train or valid phase from being sampled in the test phase, the datasets of all phases must be passed in for pre-processing. Before using this sampler, call set_phase() to get the sampler of the corresponding phase.

Parameters
  • phases (str or list of str) – All the phases of input.

  • datasets (Dataset or list of Dataset) – The datasets for each phase.

  • distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.

phase

The phase of the sampler. It is not set until set_phase() is called.

Type

str

get_random_list()[source]
Returns

Random list of item_id.

Return type

numpy.ndarray or list

get_used_ids()[source]
Returns

Used item_ids, which are the same as the positive item_ids. The key is the phase, and the value is a numpy.ndarray whose index is user_id and whose elements are sets of item_ids.

Return type

dict
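One way to obtain such a dictionary, consistent with the class description (positives from earlier phases stay prohibited in later phases), is to accumulate a running set per user and take a snapshot after each phase. This is a sketch under that assumption, not RecBole's code: build_phase_used_ids and its arguments are hypothetical, and interactions are given as plain (user_id, item_id) pairs instead of Dataset objects.

```python
import numpy as np


def build_phase_used_ids(phases, interactions_per_phase, n_users):
    """Hypothetical helper: returns {phase: ndarray of sets indexed by user_id}."""
    used_by_phase = {}
    running = [set() for _ in range(n_users)]
    for phase, interactions in zip(phases, interactions_per_phase):
        for user_id, item_id in interactions:
            running[user_id].add(item_id)
        # Snapshot with copied sets so later phases do not alter earlier ones.
        used_by_phase[phase] = np.array([set(s) for s in running], dtype=object)
    return used_by_phase
```

With phases ordered as train, valid, test, the entry for valid contains the train and valid positives, and the entry for test contains the positives of all three phases.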

sample_by_user_ids(user_ids, num)[source]

Sampling by user_ids.

Parameters
  • user_ids (numpy.ndarray or list) – Input user_ids.

  • num (int) – Number of sampled item_ids for each user_id.

Returns

Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.

Return type

torch.Tensor

set_phase(phase)[source]

Get the sampler of the corresponding phase.

Parameters

phase (str) – The phase of the new sampler.

Returns

A copy of this sampler, with phase set to the input phase and used_ids set to the value for that phase.

Return type

Sampler
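The copy-and-switch behaviour of set_phase() can be sketched as follows. The class PhaseSamplerSketch and its private attribute are hypothetical stand-ins, and the use of a shallow copy and a ValueError for unknown phases are assumptions; the page only states that a copy is returned with phase and used_ids set.

```python
import copy


class PhaseSamplerSketch:
    """Hypothetical stand-in showing how set_phase() could return a copy."""

    def __init__(self, phases, used_ids_by_phase):
        self.phases = list(phases)
        self._used_ids_by_phase = used_ids_by_phase
        # Neither attribute is meaningful until set_phase() is called.
        self.phase = None
        self.used_ids = None

    def set_phase(self, phase):
        if phase not in self.phases:
            raise ValueError(f"Phase [{phase}] does not exist.")
        # Shallow copy: shared data is reused, only the two
        # phase-specific attributes differ on the new object.
        new_sampler = copy.copy(self)
        new_sampler.phase = phase
        new_sampler.used_ids = self._used_ids_by_phase[phase]
        return new_sampler
```

Because the original object is left untouched, one base sampler can hand out separate train, valid and test samplers without recomputing the per-phase used_ids.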