recbole.sampler¶

class recbole.sampler.sampler.AbstractSampler(distribution)[source]¶

Bases: object

AbstractSampler is an abstract class; all samplers should inherit from it. This sampler returns a given number of random value_ids for each input key_id, and it can exclude certain key-value pairs through used_ids. To improve efficiency, it generates random numbers by moving the pointer random_pr through random_list, so subclasses need to implement the get_random_list() method.

Parameters: distribution (str) – Name of the sampling distribution, which is used by subclasses.

random_list¶

The shuffled result of get_random_list().

Type: list or numpy.ndarray

used_ids¶

The result of get_used_ids().

Type: numpy.ndarray

get_random_list()[source]¶

Returns: Random list of value_id.
Return type: np.ndarray or list

get_used_ids()[source]¶

Returns: Used ids. Index is key_id, and element is a set of value_ids.
Return type: np.ndarray

random()[source]¶

Returns: A random value_id, generated from random_list.
Return type: value_id (int)
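The pointer-based approach described for AbstractSampler can be illustrated with a small toy class. This is only a sketch of the idea, not RecBole's implementation: the class name PointerRandom, the seed argument, and the wrap-around with a modulo are assumptions made for illustration.

```python
import numpy as np


class PointerRandom:
    """Toy illustration: shuffle the candidates once, then walk a pointer through them."""

    def __init__(self, candidates, seed=0):
        rng = np.random.default_rng(seed)
        # Shuffled once up front, playing the role of random_list.
        self.random_list = rng.permutation(candidates)
        # Position of the next value to hand out, playing the role of random_pr.
        self.random_pr = 0

    def random(self):
        # Assumption: wrap around when the end of the list is reached.
        value_id = self.random_list[self.random_pr % len(self.random_list)]
        self.random_pr += 1
        return int(value_id)


gen = PointerRandom(np.arange(1, 6))
draws = [gen.random() for _ in range(7)]
```

Each call costs one array lookup and one increment, which is why a pre-shuffled list can be cheaper than calling a random number generator for every draw.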

sample_by_key_ids(key_ids, num, used_ids)[source]¶

Sampling by key_ids.

Parameters

key_ids (np.ndarray or list) – Input key_ids.
num (int) – Number of sampled value_ids for each key_id.
used_ids (np.ndarray) – Used ids. Index is key_id, and element is a set of value_ids.

Returns

Sampled value_ids. value_ids[0], value_ids[len(key_ids)], value_ids[len(key_ids) * 2], …, value_ids[len(key_ids) * (num - 1)] are sampled for key_ids[0]; value_ids[1], value_ids[len(key_ids) + 1], value_ids[len(key_ids) * 2 + 1], …, value_ids[len(key_ids) * (num - 1) + 1] are sampled for key_ids[1]; and so on. In general, value_ids[i + len(key_ids) * j] is the j-th sample for key_ids[i].

Return type

np.ndarray
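The layout of the returned array can be easier to see with a reshape. The sketch below uses a hand-written stand-in array instead of a real sampler call, so the values are hypothetical; only the indexing pattern follows the description above.

```python
import numpy as np

key_ids = np.array([10, 20, 30])
num = 2

# Stand-in for a sampler's flat output of length len(key_ids) * num.
value_ids = np.array([101, 201, 301, 102, 202, 302])

# Row i of per_key holds the num samples that belong to key_ids[i].
per_key = value_ids.reshape(num, len(key_ids)).T

# Repeating key_ids num times lines each key up with its flat samples.
tiled_keys = np.tile(key_ids, num)
```

Here per_key[0] collects value_ids[0] and value_ids[3], the two samples for key_ids[0].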

class recbole.sampler.sampler.KGSampler(dataset, distribution='uniform')[source]¶

Bases: recbole.sampler.sampler.AbstractSampler

KGSampler is used to sample negative entities in a knowledge graph.

Parameters

dataset (Dataset) – The knowledge graph dataset, which contains triplets in a knowledge graph.
distribution (str, optional) – Distribution of the negative entities. Defaults to ‘uniform’.

get_random_list()[source]¶

Returns: Random list of entity_id.
Return type: np.ndarray or list

get_used_ids()[source]¶

Returns: Used entity_ids, which are the same as the tail_entity_ids in the knowledge graph. Index is head_entity_id, and element is a set of tail_entity_ids.
Return type: np.ndarray
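As a rough illustration of how such a structure could be built and used, the sketch below creates one set of tail entities per head entity from a hypothetical list of triplets, then draws negative tails by rejection. The triplets, n_entities, and the helper sample_negative_tail are assumptions for this example and are not part of RecBole; the toy also ignores any reserved ids a real dataset may have.

```python
import numpy as np

n_entities = 5
# Hypothetical (head, relation, tail) triplets.
triplets = [(0, 0, 1), (0, 1, 2), (3, 0, 4)]

# One set per head_entity_id, stored in an object array.
used_ids = np.empty(n_entities, dtype=object)
for entity_id in range(n_entities):
    used_ids[entity_id] = set()
for head, _relation, tail in triplets:
    used_ids[head].add(tail)

rng = np.random.default_rng(0)


def sample_negative_tail(head):
    """Draw uniformly until the candidate is not a known tail of head."""
    while True:
        candidate = int(rng.integers(n_entities))
        if candidate not in used_ids[head]:
            return candidate


negatives = [sample_negative_tail(0) for _ in range(20)]
```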

sample_by_entity_ids(head_entity_ids, num=1)[source]¶

Sampling by head_entity_ids.

Parameters

head_entity_ids (np.ndarray or list) – Input head_entity_ids.
num (int, optional) – Number of sampled entity_ids for each head_entity_id. Defaults to 1.

Returns

Sampled entity_ids. entity_ids[0], entity_ids[len(head_entity_ids)], entity_ids[len(head_entity_ids) * 2], …, entity_ids[len(head_entity_ids) * (num - 1)] are sampled for head_entity_ids[0]; entity_ids[1], entity_ids[len(head_entity_ids) + 1], entity_ids[len(head_entity_ids) * 2 + 1], …, entity_ids[len(head_entity_ids) * (num - 1) + 1] are sampled for head_entity_ids[1]; and so on.

Return type

np.ndarray

class recbole.sampler.sampler.RepeatableSampler(phases, dataset, distribution='uniform')[source]¶

Bases: recbole.sampler.sampler.AbstractSampler

RepeatableSampler is used to sample negative items for each input user. Unlike Sampler, it samples only items that have not appeared in any phase.

Parameters

phases (str or list of str) – All the phases of input.
dataset (Dataset) – The union of all datasets for each phase.
distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.

phase¶

The phase of the sampler. It is not set until set_phase() is called.

Type: str

get_random_list()[source]¶

Returns: Random list of item_id.
Return type: np.ndarray or list

get_used_ids()[source]¶

Returns: Used item_ids, which are the same as the positive item_ids. Index is user_id, and element is a set of item_ids.
Return type: np.ndarray

sample_by_user_ids(user_ids, num)[source]¶

Sampling by user_ids.

Parameters

user_ids (np.ndarray or list) – Input user_ids.
num (int) – Number of sampled item_ids for each user_id.

Returns

Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.

Return type

np.ndarray

set_phase(phase)[source]¶

Get the sampler of corresponding phase.

Parameters: phase (str) – The phase of new sampler.
Returns: A copy of this sampler, with phase set to the input phase.
Return type: Sampler

class recbole.sampler.sampler.Sampler(phases, datasets, distribution='uniform')[source]¶

Bases: recbole.sampler.sampler.AbstractSampler

Sampler is used to sample negative items for each input user. To prevent positive items from the train phase from being sampled in the valid phase, and positive items from the train or valid phase from being sampled in the test phase, the datasets of all phases must be passed in for pre-processing. Before using this sampler, call set_phase() to get the sampler for the corresponding phase.

Parameters

phases (str or list of str) – All the phases of input.
datasets (Dataset or list of Dataset) – The datasets for each phase.
distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.

phase¶

The phase of the sampler. It is not set until set_phase() is called.

Type: str

get_random_list()[source]¶

Returns: Random list of item_id.
Return type: np.ndarray or list

get_used_ids()[source]¶

Returns: Used item_ids, which are the same as the positive item_ids. Key is phase, and value is a np.ndarray whose index is user_id and whose elements are sets of item_ids.
Return type: dict
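One way to picture the per-phase dictionary is as a running union of positives: each phase also excludes everything seen in earlier phases, which matches the behavior described for Sampler above. The sketch below is a toy under that reading; the positives data and all variable names are hypothetical, and this is not RecBole's code.

```python
import numpy as np

n_users = 2
phases = ["train", "valid", "test"]
# Hypothetical positive (user_id, item_id) pairs for each phase.
positives = {
    "train": [(0, 1)],
    "valid": [(0, 2), (1, 3)],
    "test": [(1, 4)],
}

used_by_phase = {}
running = [set() for _ in range(n_users)]
for phase in phases:
    # Add this phase's positives on top of those from earlier phases.
    for user_id, item_id in positives[phase]:
        running[user_id].add(item_id)
    # Store an independent copy so later phases do not alter earlier ones.
    snapshot = np.empty(n_users, dtype=object)
    for user_id in range(n_users):
        snapshot[user_id] = set(running[user_id])
    used_by_phase[phase] = snapshot
```

With this data, user 0 has {1} excluded in the train phase and {1, 2} excluded in the valid phase.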

sample_by_user_ids(user_ids, num)[source]¶

Sampling by user_ids.

Parameters

user_ids (np.ndarray or list) – Input user_ids.
num (int) – Number of sampled item_ids for each user_id.

Returns

Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.

Return type

np.ndarray

set_phase(phase)[source]¶

Get the sampler of corresponding phase.

Parameters: phase (str) – The phase of new sampler.
Returns: A copy of this sampler, with phase set to the input phase and used_ids set to the value for that phase.
Return type: Sampler
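The copy-and-set behavior of set_phase() can be sketched with a toy class. ToySampler, its used_ids_by_phase attribute, the ValueError for unknown phases, and the use of a shallow copy are all assumptions made for illustration; the real class may differ in these details.

```python
import copy


class ToySampler:
    """Toy stand-in that only models the phase-switching behavior."""

    def __init__(self, used_ids_by_phase):
        self.used_ids_by_phase = used_ids_by_phase
        self.phase = None      # not set until set_phase() is called
        self.used_ids = None

    def set_phase(self, phase):
        if phase not in self.used_ids_by_phase:
            raise ValueError(f"unknown phase: {phase!r}")
        # Assumption: a shallow copy is enough, so the original stays unchanged.
        new_sampler = copy.copy(self)
        new_sampler.phase = phase
        new_sampler.used_ids = self.used_ids_by_phase[phase]
        return new_sampler


base = ToySampler({"train": [{1}], "valid": [{1, 2}]})
train_sampler = base.set_phase("train")
valid_sampler = base.set_phase("valid")
```

Returning a copy means one pre-processed sampler can serve every phase, while each phase gets its own phase and used_ids values.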