recbole.sampler¶
-
class
recbole.sampler.sampler.
AbstractSampler
(distribution)[source]¶ Bases:
object
AbstractSampler
is an abstract class; all samplers should inherit from it. This sampler returns a given number of random value_ids for each input key_id, and it can prohibit certain key-value pairs through used_ids. To improve efficiency, we use random_pr to move through random_list when generating random numbers, so each subclass needs to implement the get_random_list() method.
- Parameters
distribution (str) – The name of the distribution, which is used by subclasses.
-
random_list
¶ The shuffled result of get_random_list().
- Type
list or numpy.ndarray
-
used_ids
¶ The result of get_used_ids().
- Type
numpy.ndarray
-
get_used_ids
()[source]¶ - Returns
Used ids. Index is key_id, and element is a set of value_ids.
- Return type
np.ndarray
-
random
()[source]¶ - Returns
Random value_id, generated from random_list.
- Return type
value_id (int)
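The pointer mechanism described for `random_pr` and `random_list` can be sketched as follows. This is a minimal illustration, not RecBole's implementation: the class `ToyPointerSampler`, its `n_values` and `seed` arguments, and the wrap-around behavior are assumptions made for the example.

```python
import random


class ToyPointerSampler:
    """Hypothetical stand-in for a sampler subclass; not RecBole's code."""

    def __init__(self, n_values, seed=0):
        self.n_values = n_values
        # Build the candidate list once, then shuffle it once.
        self.random_list = self.get_random_list()
        random.Random(seed).shuffle(self.random_list)
        self.random_pr = 0  # pointer into random_list

    def get_random_list(self):
        # Uniform case (assumed): every value_id appears exactly once.
        return list(range(self.n_values))

    def random(self):
        # Read the value under the pointer, then advance it,
        # wrapping around when the end of the list is reached.
        value_id = self.random_list[self.random_pr % len(self.random_list)]
        self.random_pr += 1
        return value_id


sampler = ToyPointerSampler(n_values=5)
draws = [sampler.random() for _ in range(10)]
```

Shuffling once and walking a pointer avoids calling a random number generator for every draw, which is presumably the efficiency gain the class description refers to. The trade-off in this sketch is that the sequence repeats after one pass over the list.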
-
sample_by_key_ids
(key_ids, num, used_ids)[source]¶ Sampling by key_ids.
- Parameters
key_ids (np.ndarray or list) – Input key_ids.
num (int) – Number of sampled value_ids for each key_id.
used_ids (np.ndarray) – Used ids. Index is key_id, and element is a set of value_ids.
- Returns
Sampled value_ids. value_ids[0], value_ids[len(key_ids)], value_ids[len(key_ids) * 2], …, value_ids[len(key_ids) * (num - 1)] are sampled for key_ids[0]; value_ids[1], value_ids[len(key_ids) + 1], value_ids[len(key_ids) * 2 + 1], …, value_ids[len(key_ids) * (num - 1) + 1] are sampled for key_ids[1]; and so on.
- Return type
np.ndarray
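The flat output layout and the role of used_ids can be illustrated with a small NumPy sketch. The helper `sample_by_key_ids_sketch`, its `n_values` and `seed` arguments, and the rejection loop are assumptions for illustration only; they are not taken from the RecBole source.

```python
import numpy as np


def sample_by_key_ids_sketch(key_ids, num, used_ids, n_values, seed=0):
    """Hypothetical helper: draw value_ids that are not in used_ids[key_id]."""
    rng = np.random.default_rng(seed)
    key_ids = np.asarray(key_ids)
    # Tile the keys num times so that position i + len(key_ids) * j
    # holds the j-th sample for key_ids[i].
    tiled_keys = np.tile(key_ids, num)
    value_ids = np.empty(len(tiled_keys), dtype=np.int64)
    for pos, key in enumerate(tiled_keys):
        candidate = int(rng.integers(n_values))
        # Redraw prohibited pairs. This loops forever if used_ids[key]
        # already contains every possible value_id.
        while candidate in used_ids[key]:
            candidate = int(rng.integers(n_values))
        value_ids[pos] = candidate
    return value_ids


# Index is key_id, element is the set of prohibited value_ids.
used_ids = np.array([{0, 1}, {2}, set()], dtype=object)
key_ids = [0, 1, 2]
num = 2

value_ids = sample_by_key_ids_sketch(key_ids, num, used_ids, n_values=5)
# Row j, column i is the j-th sample for key_ids[i].
per_key = value_ids.reshape(num, len(key_ids))
```

Reshaping to `(num, len(key_ids))` recovers the per-key grouping: column `i` of `per_key` contains all samples for `key_ids[i]`.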
-
class
recbole.sampler.sampler.
KGSampler
(dataset, distribution='uniform')[source]¶ Bases:
recbole.sampler.sampler.AbstractSampler
KGSampler
is used to sample negative entities in a knowledge graph.
- Parameters
dataset (Dataset) – The knowledge graph dataset, which contains triplets in a knowledge graph.
distribution (str, optional) – Distribution of the negative entities. Defaults to ‘uniform’.
-
get_used_ids
()[source]¶ - Returns
Used entity_ids, which are the same as the tail_entity_ids in the knowledge graph. Index is head_entity_id, and element is a set of tail_entity_ids.
- Return type
np.ndarray
-
sample_by_entity_ids
(head_entity_ids, num=1)[source]¶ Sampling by head_entity_ids.
- Parameters
head_entity_ids (np.ndarray or list) – Input head_entity_ids.
num (int, optional) – Number of sampled entity_ids for each head_entity_id. Defaults to 1.
- Returns
Sampled entity_ids. entity_ids[0], entity_ids[len(head_entity_ids)], entity_ids[len(head_entity_ids) * 2], …, entity_ids[len(head_entity_ids) * (num - 1)] are sampled for head_entity_ids[0]; entity_ids[1], entity_ids[len(head_entity_ids) + 1], entity_ids[len(head_entity_ids) * 2 + 1], …, entity_ids[len(head_entity_ids) * (num - 1) + 1] are sampled for head_entity_ids[1]; and so on.
- Return type
np.ndarray
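The used-ids structure described for KGSampler (index is head_entity_id, element is a set of tail_entity_ids) could be built along the following lines. The triplets, `entity_num`, and variable names are hypothetical toy data, and the construction is a sketch rather than the library's code.

```python
import numpy as np

# Hypothetical toy triplets (head_entity_id, relation_id, tail_entity_id).
triplets = [(0, 0, 2), (0, 1, 3), (1, 0, 2), (3, 1, 0)]
entity_num = 4

# One set per head entity; entities that never act as a head keep an empty set.
used_ids = np.array([set() for _ in range(entity_num)], dtype=object)
for head, _relation, tail in triplets:
    used_ids[head].add(tail)

# Valid negative tails for head entity 0 are all entities not already linked to it.
negatives_for_head0 = set(range(entity_num)) - used_ids[0]
```

A negative sampler would then draw only from entities outside `used_ids[head]`, so that no sampled entity coincides with a known tail of that head.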
-
class
recbole.sampler.sampler.
RepeatableSampler
(phases, dataset, distribution='uniform')[source]¶ Bases:
recbole.sampler.sampler.AbstractSampler
RepeatableSampler
is used to sample negative items for each input user. Unlike Sampler, it can only sample items that have not appeared in any phase.
- Parameters
phases (str or list of str) – All the phases of input.
dataset (Dataset) – The union of all datasets for each phase.
distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.
-
phase
¶ The phase of the sampler. It is not set until set_phase() is called.
- Type
str
-
get_used_ids
()[source]¶ - Returns
Used item_ids, which are the same as the positive item_ids. Index is user_id, and element is a set of item_ids.
- Return type
np.ndarray
-
sample_by_user_ids
(user_ids, num)[source]¶ Sampling by user_ids.
- Parameters
user_ids (np.ndarray or list) – Input user_ids.
num (int) – Number of sampled item_ids for each user_id.
- Returns
Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.
- Return type
np.ndarray
-
class
recbole.sampler.sampler.
Sampler
(phases, datasets, distribution='uniform')[source]¶ Bases:
recbole.sampler.sampler.AbstractSampler
Sampler
is used to sample negative items for each input user. To prevent positive items from the train phase from being sampled in the valid phase, and positive items from the train or valid phase from being sampled in the test phase, the datasets of all phases must be passed in for pre-processing. Before using this sampler, call set_phase() to get the sampler for the corresponding phase.
- Parameters
phases (str or list of str) – All the phases of input.
datasets (Dataset or list of Dataset) – The datasets for each phase.
distribution (str, optional) – Distribution of the negative items. Defaults to ‘uniform’.
-
phase
¶ The phase of the sampler. It is not set until set_phase() is called.
- Type
str
-
get_used_ids
()[source]¶ - Returns
Used item_ids, which are the same as the positive item_ids. Key is phase, and value is a np.ndarray whose index is user_id and whose element is a set of item_ids.
- Return type
dict
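One way to obtain a per-phase dictionary with the behavior described for Sampler is to accumulate positives phase by phase. The sketch below uses hypothetical toy interactions and assumes that each phase's entry also contains that phase's own positives; it is an illustration, not the RecBole source.

```python
import numpy as np

phases = ["train", "valid", "test"]
# Hypothetical (user_id, item_id) positives for each phase.
interactions = {
    "train": [(0, 1), (1, 2)],
    "valid": [(0, 3)],
    "test": [(1, 0)],
}
user_num = 2

used_item_id = {}
last = [set() for _ in range(user_num)]
for phase in phases:
    # Copy the sets from the previous phase so earlier entries stay unchanged.
    cur = np.array([set(items) for items in last], dtype=object)
    for user_id, item_id in interactions[phase]:
        cur[user_id].add(item_id)
    used_item_id[phase] = cur
    last = cur
```

With this accumulation, `used_item_id["valid"]` excludes train positives and `used_item_id["test"]` excludes both train and valid positives, matching the constraint stated in the class description.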
-
sample_by_user_ids
(user_ids, num)[source]¶ Sampling by user_ids.
- Parameters
user_ids (np.ndarray or list) – Input user_ids.
num (int) – Number of sampled item_ids for each user_id.
- Returns
Sampled item_ids. item_ids[0], item_ids[len(user_ids)], item_ids[len(user_ids) * 2], …, item_ids[len(user_ids) * (num - 1)] are sampled for user_ids[0]; item_ids[1], item_ids[len(user_ids) + 1], item_ids[len(user_ids) * 2 + 1], …, item_ids[len(user_ids) * (num - 1) + 1] are sampled for user_ids[1]; and so on.
- Return type
np.ndarray