recbole.sampler

class recbole.sampler.sampler.AbstractSampler(distribution)[source]

Bases: object

AbstractSampler is an abstract class from which all samplers should inherit. It returns a given number of random value_ids for each input key_id, and it can prohibit certain key-value pairs through used_ids.

Parameters

distribution (str) – Name of the sampling distribution, which is used by subclasses.

used_ids

The result of get_used_ids().

Type

numpy.ndarray

get_used_ids()[source]
Returns

Used ids. Index is key_id, and element is a set of value_ids.

Return type

numpy.ndarray

sample_by_key_ids(key_ids, num)[source]

Sample value_ids for the given key_ids.

Parameters
  • key_ids (numpy.ndarray or list) – Input key_ids.

  • num (int) – Number of sampled value_ids for each key_id.

Returns

Sampled value_ids. value_ids[0], value_ids[len(key_ids)], value_ids[len(key_ids) * 2], …, value_ids[len(key_ids) * (num - 1)] are sampled for key_ids[0]; value_ids[1], value_ids[len(key_ids) + 1], value_ids[len(key_ids) * 2 + 1], …, value_ids[len(key_ids) * (num - 1) + 1] are sampled for key_ids[1]; and so on.

Return type

torch.Tensor
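The index layout described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the standalone function, the `n_values` and `seed` parameters, and the use of lists instead of a tensor are all assumptions made for the example. It redraws any candidate that appears in used_ids for the key, then stores the j-th sample for key_ids[i] at position j * len(key_ids) + i.

```python
import random


def sample_by_key_ids(key_ids, num, used_ids, n_values, seed=0):
    """Hypothetical stand-in: draw `num` allowed value_ids per key_id.

    Assumes every key has at least one allowed value in range(n_values);
    otherwise the rejection loop would never terminate.
    """
    rng = random.Random(seed)
    n_keys = len(key_ids)
    out = [None] * (n_keys * num)
    for j in range(num):                      # j-th sample for every key
        for i, key in enumerate(key_ids):
            value = rng.randrange(n_values)
            while value in used_ids[key]:     # reject prohibited key-value pairs
                value = rng.randrange(n_values)
            out[j * n_keys + i] = value
    return out


# Index is key_id, element is a set of prohibited value_ids.
used_ids = [{0, 1}, {2}, set()]
key_ids = [0, 1, 2]
num = 2
value_ids = sample_by_key_ids(key_ids, num, used_ids, n_values=5)

# The samples for key_ids[i] sit at positions i, i + len(key_ids), ...
per_key = {key: value_ids[i::len(key_ids)] for i, key in enumerate(key_ids)}
```

Slicing with a stride of len(key_ids), as in `per_key`, recovers the samples that belong to each key.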

sampling(sample_num)[source]

Sample sample_num item_ids.

Parameters

sample_num (int) – The number of samples.

Returns

A list of samples of length sample_num.

Return type

numpy.ndarray

set_distribution(distribution)[source]

Set the distribution of the sampler.

Parameters

distribution (str) – Distribution of the negative items.

class recbole.sampler.sampler.KGSampler(dataset, distribution='uniform')[source]

Bases: recbole.sampler.sampler.AbstractSampler

KGSampler is used to sample negative entities in a knowledge graph.

Parameters
  • dataset (Dataset) – The knowledge graph dataset, which contains triplets in a knowledge graph.

  • distribution (str, optional) – Distribution of the negative entities. Defaults to ‘uniform’.

get_used_ids()[source]
Returns

The used entity_ids are the tail_entity_ids in the knowledge graph. The index is head_entity_id, and each element is a set of tail_entity_ids.

Return type

numpy.ndarray
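As a rough sketch of the structure returned here, the following builds a per-head collection of tail sets from a list of triplets. The helper name `build_used_ids`, its arguments, and the use of a plain list in place of a numpy.ndarray are assumptions made for illustration; they are not part of the library.

```python
def build_used_ids(head_ids, tail_ids, entity_num):
    """Hypothetical helper: index is head_entity_id, element is a set of tail_entity_ids."""
    used = [set() for _ in range(entity_num)]
    for head, tail in zip(head_ids, tail_ids):
        used[head].add(tail)
    return used


# Three triplets (relations omitted): 0 -> 1, 0 -> 3, 2 -> 1
heads = [0, 0, 2]
tails = [1, 3, 1]
kg_used_ids = build_used_ids(heads, tails, entity_num=4)
```

Entities that never appear as a head keep an empty set, so any entity may be sampled for them.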

sample_by_entity_ids(head_entity_ids, num=1)[source]

Sample entity_ids for the given head_entity_ids.

Parameters
  • head_entity_ids (numpy.ndarray or list) – Input head_entity_ids.

  • num (int, optional) – Number of sampled entity_ids for each head_entity_id. Defaults to 1.

Returns

Sampled entity_ids. entity_ids[0], entity_ids[len(head_entity_ids)], entity_ids[len(head_entity_ids) * 2], …, entity_ids[len(head_entity_ids) * (num - 1)] are sampled for head_entity_ids[0]; entity_ids[1], entity_ids[len(head_entity_ids) + 1], entity_ids[len(head_entity_ids) * 2 + 1], …, entity_ids[len(head_entity_ids) * (num - 1) + 1] are sampled for head_entity_ids[1]; and so on.

Return type

torch.Tensor

class recbole.sampler.sampler.RepeatableSampler(phases, dataset, distribution='uniform')[source]

Bases: recbole.sampler.sampler.AbstractSampler

RepeatableSampler is used to sample negative items for each input user. The difference from Sampler is that it can only sample items that have not appeared in all phases.

Parameters
  • phases (str or list of str) – All the phases of input.

  • dataset (Dataset) – The union of all datasets for each phase.

  • distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.

phase

The phase of the sampler. It is not set until set_phase() is called.

Type

str

get_used_ids()[source]
Returns

The used item_ids are the positive item_ids. The index is user_id, and each element is a set of item_ids.

Return type

numpy.ndarray

sample_by_user_ids(user_ids, item_ids, num)[source]

Sampling by user_ids.

Parameters
  • user_ids (numpy.ndarray or list) – Input user_ids.

  • item_ids (numpy.ndarray or list) – Input item_ids.

  • num (int) – Number of sampled item_ids for each user_id.

Returns

Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.

Return type

torch.Tensor

set_phase(phase)[source]

Get the sampler for the corresponding phase.

Parameters

phase (str) – The phase of the new sampler.

Returns

A copy of this sampler with phase set to the input phase.

Return type

Sampler

class recbole.sampler.sampler.Sampler(phases, datasets, distribution='uniform')[source]

Bases: recbole.sampler.sampler.AbstractSampler

Sampler is used to sample negative items for each input user. To keep positive items from the train phase from being sampled in the valid phase, and positive items from the train or valid phase from being sampled in the test phase, the datasets of all phases must be passed in for pre-processing. Before using this sampler, call set_phase() to get the sampler for the corresponding phase.

Parameters
  • phases (str or list of str) – All the phases of input.

  • datasets (Dataset or list of Dataset) – The datasets for each phase.

  • distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.

phase

The phase of the sampler. It is not set until set_phase() is called.

Type

str

get_used_ids()[source]
Returns

The used item_ids are the positive item_ids. The key is the phase, and the value is a numpy.ndarray whose index is user_id and whose elements are sets of item_ids.

Return type

dict
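One way to picture this phase-keyed dictionary is to accumulate positives phase by phase, so that each phase also excludes the positives of the phases before it. The sketch below assumes that cumulative behaviour from the class description. The helper `build_phase_used_ids`, the (user_id, item_id) pair format, and the plain lists standing in for numpy.ndarray are hypothetical.

```python
def build_phase_used_ids(phases, interactions_per_phase, user_num):
    """Hypothetical helper: map each phase to per-user sets of used item_ids.

    Positives accumulate in phase order, so a later phase also excludes
    the positives seen in every earlier phase.
    """
    used_by_phase = {}
    running = [set() for _ in range(user_num)]
    for phase, interactions in zip(phases, interactions_per_phase):
        for user_id, item_id in interactions:
            running[user_id].add(item_id)
        # Snapshot the running sets so later phases do not mutate this one.
        used_by_phase[phase] = [set(items) for items in running]
    return used_by_phase


phases = ["train", "valid", "test"]
interactions = [
    [(0, 1), (1, 2)],   # train positives as (user_id, item_id)
    [(0, 3)],           # valid positives
    [(1, 4)],           # test positives
]
used_by_phase = build_phase_used_ids(phases, interactions, user_num=2)
```

In this toy data, user 0 has item 1 excluded during training and items 1 and 3 excluded during validation and testing.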

sample_by_user_ids(user_ids, item_ids, num)[source]

Sampling by user_ids.

Parameters
  • user_ids (numpy.ndarray or list) – Input user_ids.

  • item_ids (numpy.ndarray or list) – Input item_ids.

  • num (int) – Number of sampled item_ids for each user_id.

Returns

Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.

Return type

torch.Tensor

set_phase(phase)[source]

Get the sampler for the corresponding phase.

Parameters

phase (str) – The phase of the new sampler.

Returns

A copy of this sampler with phase set to the input phase and used_ids set to the value for that phase.

Return type

Sampler
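The copy-and-narrow behaviour described for set_phase() might look roughly like the sketch below. `PhasedSampler` is a hypothetical stand-in class, not the library's Sampler. Using a shallow copy and raising ValueError for an unknown phase are assumptions made for the example.

```python
import copy


class PhasedSampler:
    """Hypothetical stand-in that holds used_ids for every phase."""

    def __init__(self, used_ids_by_phase):
        self.phase = None                   # not set until set_phase() is called
        self.used_ids = used_ids_by_phase   # dict: phase -> per-user sets

    def set_phase(self, phase):
        if phase not in self.used_ids:
            raise ValueError(f"Unknown phase: {phase!r}")
        new_sampler = copy.copy(self)       # shallow copy; the original is untouched
        new_sampler.phase = phase
        new_sampler.used_ids = self.used_ids[phase]
        return new_sampler


base = PhasedSampler({"train": [{1}], "valid": [{1, 3}]})
valid_sampler = base.set_phase("valid")
```

Returning a copy leaves the original sampler usable for the other phases, since it still holds the full phase-keyed dictionary.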

class recbole.sampler.sampler.SeqSampler(dataset, distribution='uniform')[source]

Bases: recbole.sampler.sampler.AbstractSampler

SeqSampler is used to sample negative item sequence.

Parameters
  • dataset (Dataset) – The dataset from which negative item sequences are sampled.

  • distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.

get_used_ids()[source]
Returns

Used ids. Index is key_id, and element is a set of value_ids.

Return type

numpy.ndarray

sample_neg_sequence(pos_sequence)[source]

For each moment, sample one item from all items except the one the user clicked at that moment.

Parameters

pos_sequence (torch.Tensor) – The item history sequences of all users, with shape (N, ).

Returns

The negative item sequences of all users.

Return type

torch.Tensor
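The per-position rule can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the library's code: the standalone function, the `item_num` and `seed` parameters, the candidate range of 0 to item_num - 1, and the use of a list in place of a tensor are all hypothetical.

```python
import random


def sample_neg_sequence(pos_sequence, item_num, seed=0):
    """Hypothetical stand-in: for each position, draw an item that differs
    from the positive item at that position.

    Assumes item_num >= 2 so a different item always exists.
    """
    rng = random.Random(seed)
    neg_sequence = []
    for pos_item in pos_sequence:
        candidate = rng.randrange(item_num)
        while candidate == pos_item:        # redraw on collision with the positive
            candidate = rng.randrange(item_num)
        neg_sequence.append(candidate)
    return neg_sequence


pos_sequence = [3, 1, 4, 1, 5]
neg_sequence = sample_neg_sequence(pos_sequence, item_num=6)
```

The output has the same length as the input, and each negative differs only from the positive at its own position, so it may still equal a positive found elsewhere in the sequence.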