FEARec¶
- Reference:
Xinyu Du et al. “Frequency Enhanced Hybrid Attention Network for Sequential Recommendation.” In SIGIR 2023.
- Reference code:
- class recbole.model.sequential_recommender.fearec.FEABlock(n_heads, hidden_size, intermediate_size, hidden_dropout_prob, attn_dropout_prob, hidden_act, layer_norm_eps, n, config)[source]¶
Bases:
Module
One transformer layer consists of a multi-head self-attention layer and a point-wise feed-forward layer.
- Parameters:
hidden_states (torch.Tensor) – the input of the multi-head self-attention sublayer
attention_mask (torch.Tensor) – the attention mask for the multi-head self-attention sublayer
- Returns:
The output of the point-wise feed-forward sublayer, which is the output of the transformer layer.
- Return type:
feedforward_output (torch.Tensor)
- forward(hidden_states, attention_mask)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
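As a rough sketch of the sublayer composition described above, the layer applies its attention sublayer and then the point-wise feed-forward sublayer. The class below is illustrative only (the attention and feed-forward modules are placeholders), not the FEABlock source:

```python
import torch.nn as nn

class TransformerLayerSketch(nn.Module):
    """Illustrative composition: attention sublayer followed by point-wise feed-forward."""

    def __init__(self, attention: nn.Module, feed_forward: nn.Module):
        super().__init__()
        self.attention = attention        # e.g. a HybridAttention-style sublayer
        self.feed_forward = feed_forward  # e.g. a FeedForward-style sublayer

    def forward(self, hidden_states, attention_mask):
        attention_output = self.attention(hidden_states, attention_mask)
        feedforward_output = self.feed_forward(attention_output)
        return feedforward_output
```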
- training: bool¶
- class recbole.model.sequential_recommender.fearec.FEAEncoder(n_layers=2, n_heads=2, hidden_size=64, inner_size=256, hidden_dropout_prob=0.5, attn_dropout_prob=0.5, hidden_act='gelu', layer_norm_eps=1e-12, config=None)[source]¶
Bases:
Module
One TransformerEncoder consists of several TransformerLayers.
- Parameters:
n_layers (num) – number of transformer layers in the transformer encoder. Default: 2
n_heads (num) – number of attention heads for the multi-head attention layer. Default: 2
hidden_size (num) – the input and output hidden size. Default: 64
inner_size (num) – the dimensionality of the feed-forward layer. Default: 256
hidden_dropout_prob (float) – probability of an element to be zeroed. Default: 0.5
attn_dropout_prob (float) – probability of an attention score to be zeroed. Default: 0.5
hidden_act (str) – activation function in the feed-forward layer. Default: ‘gelu’. Candidates: ‘gelu’, ‘relu’, ‘swish’, ‘tanh’, ‘sigmoid’
layer_norm_eps (float) – a value added to the denominator for numerical stability. Default: 1e-12
- forward(hidden_states, attention_mask, output_all_encoded_layers=True)[source]¶
- Parameters:
hidden_states (torch.Tensor) – the input of the TransformerEncoder
attention_mask (torch.Tensor) – the attention mask for the input hidden_states
output_all_encoded_layers (bool) – whether to output the outputs of all transformer layers
- Returns:
If output_all_encoded_layers is True, return a list containing the outputs of all transformer layers; otherwise, return a list containing only the output of the last transformer layer.
- Return type:
all_encoder_layers (list)
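A minimal sketch of the output_all_encoded_layers behavior (an assumption about the typical encoder loop, not the exact source):

```python
def encode(layers, hidden_states, attention_mask, output_all_encoded_layers=True):
    # Hypothetical helper: `layers` is an iterable of transformer layers.
    all_encoder_layers = []
    for layer in layers:
        hidden_states = layer(hidden_states, attention_mask)
        if output_all_encoded_layers:
            all_encoder_layers.append(hidden_states)
    # Keep only the final layer's output when intermediate outputs are not requested.
    if not output_all_encoded_layers:
        all_encoder_layers.append(hidden_states)
    return all_encoder_layers
```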
- training: bool¶
- class recbole.model.sequential_recommender.fearec.FEARec(config, dataset)[source]¶
Bases:
SequentialRecommender
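A minimal way to train and evaluate FEARec end to end is RecBole's quick-start entry point (a sketch; the dataset name and the default hyperparameters are only examples):

```python
from recbole.quick_start import run_recbole

# Trains and evaluates FEARec with the library's default configuration;
# ml-100k is used here purely as an example dataset.
run_recbole(model='FEARec', dataset='ml-100k')
```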
- calculate_loss(interaction)[source]¶
Calculate the training loss for a batch of data.
- Parameters:
interaction (Interaction) – Interaction class of the batch.
- Returns:
Training loss, shape: []
- Return type:
torch.Tensor
- decompose(z_i, z_j, origin_z, batch_size)[source]¶
We do not sample negative examples explicitly. Instead, given a positive pair, similar to (Chen et al., 2017), we treat the other 2(N − 1) augmented examples within a minibatch as negative examples.
- forward(item_seq, item_seq_len)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- full_sort_predict(interaction)[source]¶
Full sort prediction function. Given users, calculate the scores between the users and all candidate items.
- Parameters:
interaction (Interaction) – Interaction class of the batch.
- Returns:
Predicted scores for given users and all candidate items, shape: [n_batch_users * n_candidate_items]
- Return type:
torch.Tensor
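To make the documented return shape concrete, here is a toy sketch (all tensors are dummies; the real method derives the sequence representations and item embeddings from the model and the Interaction):

```python
import torch

n_batch_users, n_candidate_items, hidden_size = 4, 100, 64
seq_output = torch.randn(n_batch_users, hidden_size)           # final sequence representations
item_embeddings = torch.randn(n_candidate_items, hidden_size)

scores = seq_output @ item_embeddings.t()   # [n_batch_users, n_candidate_items]
scores = scores.view(-1)                    # [n_batch_users * n_candidate_items]
```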
- get_attention_mask(item_seq)[source]¶
Generate a left-to-right uni-directional attention mask for multi-head attention.
- get_bi_attention_mask(item_seq)[source]¶
Generate a bidirectional attention mask for multi-head attention.
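A minimal sketch of how such masks are commonly built from a padded item sequence, assuming padding id 0 and an additive mask convention; the exact shapes and mask value inside FEARec may differ:

```python
import torch

def extended_attention_mask(item_seq, bidirectional=False, pad_id=0):
    # Padding positions are masked out; the uni-directional case additionally
    # applies a lower-triangular (causal) constraint.
    mask = (item_seq != pad_id).long()                  # [B, L]
    extended = mask.unsqueeze(1).unsqueeze(2)           # [B, 1, 1, L]
    if not bidirectional:
        seq_len = item_seq.size(-1)
        causal = torch.tril(
            torch.ones(seq_len, seq_len, dtype=torch.long, device=item_seq.device)
        )
        extended = extended * causal                    # [B, 1, L, L]
    # Convert to an additive mask: 0 for visible positions, a large negative
    # value for masked positions (added to the attention logits).
    return (1.0 - extended.float()) * -10000.0
```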
- info_nce(z_i, z_j, temp, batch_size, sim='dot')[source]¶
We do not sample negative examples explicitly. Instead, given a positive pair, similar to (Chen et al., 2017), we treat the other 2(N − 1) augmented examples within a minibatch as negative examples.
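A self-contained sketch of this in-batch InfoNCE (NT-Xent) objective with ‘dot’ similarity, as described above; this is a hypothetical helper, not the exact FEARec code:

```python
import torch
import torch.nn.functional as F

def info_nce_sketch(z_i, z_j, temp):
    # z_i, z_j: [N, d] embeddings of the two views of the same N sequences.
    N = z_i.size(0)
    z = torch.cat([z_i, z_j], dim=0)                     # [2N, d]
    sim = torch.mm(z, z.t()) / temp                      # pairwise 'dot' similarities
    # Mask self-similarity so each row's negatives are the other 2(N - 1) samples.
    sim = sim.masked_fill(torch.eye(2 * N, dtype=torch.bool, device=z.device), float('-inf'))
    # The positive for row i is its paired view in the other half of the batch.
    positives = torch.cat([torch.arange(N, 2 * N), torch.arange(0, N)]).to(z.device)
    return F.cross_entropy(sim, positives)
```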
- predict(interaction)[source]¶
Predict the scores between users and items.
- Parameters:
interaction (Interaction) – Interaction class of the batch.
- Returns:
Predicted scores for given users and items, shape: [batch_size]
- Return type:
torch.Tensor
- training: bool¶
- class recbole.model.sequential_recommender.fearec.FeedForward(hidden_size, inner_size, hidden_dropout_prob, hidden_act, layer_norm_eps)[source]¶
Bases:
Module
The point-wise feed-forward layer is implemented by two dense layers.
- Parameters:
input_tensor (torch.Tensor) – the input of the point-wise feed-forward layer
- Returns:
the output of the point-wise feed-forward layer
- Return type:
hidden_states (torch.Tensor)
- forward(input_tensor)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- gelu(x)[source]¶
Implementation of the gelu activation function.
For information: OpenAI GPT’s gelu is slightly different (and gives slightly different results):
0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
Also see https://arxiv.org/abs/1606.08415
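For reference, the exact (erf-based) gelu that the docstring contrasts with the tanh approximation above is commonly written as follows (a sketch, assuming the standard formulation):

```python
import math
import torch

def gelu(x):
    # Exact gelu via the Gaussian error function; the tanh expression quoted
    # above is OpenAI GPT's approximation of this.
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
```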
- training: bool¶
- class recbole.model.sequential_recommender.fearec.HybridAttention(n_heads, hidden_size, hidden_dropout_prob, attn_dropout_prob, layer_norm_eps, i, config)[source]¶
Bases:
Module
Hybrid Attention layer: combines a time-domain self-attention layer and a frequency-domain attention layer.
- Parameters:
input_tensor (torch.Tensor) – the input of the multi-head Hybrid Attention layer
attention_mask (torch.Tensor) – the attention mask for input tensor
- Returns:
the output of the multi-head Hybrid Attention layer
- Return type:
hidden_states (torch.Tensor)
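For intuition about the frequency-domain side, autocorrelation-style attention scores can be computed with FFTs (Wiener-Khinchin theorem); the shapes below are assumptions for illustration, not the exact FEARec layout:

```python
import torch

q = torch.randn(2, 4, 16, 50)   # [batch, heads, channels, length] (illustrative shapes)
k = torch.randn(2, 4, 16, 50)

q_fft = torch.fft.rfft(q, dim=-1)
k_fft = torch.fft.rfft(k, dim=-1)
# Cross-correlation over the time axis via frequency-domain multiplication.
corr = torch.fft.irfft(q_fft * torch.conj(k_fft), n=q.size(-1), dim=-1)  # [2, 4, 16, 50]
```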
- forward(input_tensor, attention_mask)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- time_delay_agg_inference(values, corr)[source]¶
Sped-up version of autocorrelation (a batch-normalization style design). This is for the inference phase.
- time_delay_agg_training(values, corr)[source]¶
Sped-up version of autocorrelation (a batch-normalization style design). This is for the training phase.
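A hedged sketch of this kind of autocorrelation aggregation (in the Autoformer style the docstrings refer to): select the top-k delays by average autocorrelation and combine time-rolled copies of the values with softmax weights. The shapes, the per-batch loop, and the log-based top-k are assumptions, not the exact FEARec code:

```python
import math
import torch

def time_delay_agg_sketch(values, corr, factor=1):
    # values, corr: [batch, heads, channels, length]
    B, H, C, L = values.shape
    top_k = max(1, int(factor * math.log(L)))
    mean_corr = corr.mean(dim=1).mean(dim=1)                  # [B, L]
    weights, delays = torch.topk(mean_corr, top_k, dim=-1)    # [B, top_k]
    tmp_corr = torch.softmax(weights, dim=-1)
    agg = torch.zeros_like(values)
    for i in range(top_k):
        # Roll each sequence by its selected delay and accumulate with its weight.
        rolled = torch.stack(
            [torch.roll(values[b], -int(delays[b, i]), dims=-1) for b in range(B)]
        )
        agg = agg + rolled * tmp_corr[:, i].view(B, 1, 1, 1)
    return agg
```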
- training: bool¶