FEARec

Reference:

Xinyu Du et al. “Frequency Enhanced Hybrid Attention Network for Sequential Recommendation.” In SIGIR 2023.

Reference code:

https://github.com/sudaada/FEARec

class recbole.model.sequential_recommender.fearec.FEABlock(n_heads, hidden_size, intermediate_size, hidden_dropout_prob, attn_dropout_prob, hidden_act, layer_norm_eps, n, config)[source]

Bases: torch.nn.modules.module.Module

One transformer layer consists of a multi-head self-attention layer and a point-wise feed-forward layer.

Parameters
  • hidden_states (torch.Tensor) – the input of the multi-head self-attention sublayer

  • attention_mask (torch.Tensor) – the attention mask for the multi-head self-attention sublayer

Returns

The output of the point-wise feed-forward sublayer, which is also the output of the whole transformer layer.

Return type

feedforward_output (torch.Tensor)

forward(hidden_states, attention_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
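For orientation, a minimal sketch of how such a layer composes its two sublayers. The sublayer modules stand in for the HybridAttention and FeedForward classes documented below; the exact wiring in the source may differ.

    import torch.nn as nn

    class TransformerLayerSketch(nn.Module):
        """Illustration only: an attention sublayer followed by a
        point-wise feed-forward sublayer. Residual connections and
        layer norm are assumed to live inside the sublayers."""

        def __init__(self, attention, feed_forward):
            super().__init__()
            self.attention = attention        # e.g. a HybridAttention module
            self.feed_forward = feed_forward  # e.g. a FeedForward module

        def forward(self, hidden_states, attention_mask):
            attention_output = self.attention(hidden_states, attention_mask)
            feedforward_output = self.feed_forward(attention_output)
            return feedforward_output  # also the output of the whole layer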

training: bool
class recbole.model.sequential_recommender.fearec.FEAEncoder(n_layers=2, n_heads=2, hidden_size=64, inner_size=256, hidden_dropout_prob=0.5, attn_dropout_prob=0.5, hidden_act='gelu', layer_norm_eps=1e-12, config=None)[source]

Bases: torch.nn.modules.module.Module

One FEAEncoder consists of several FEABlock transformer layers.

  • n_layers (num): number of transformer layers in the transformer encoder. Default: 2

  • n_heads (num): number of attention heads for the multi-head attention layer. Default: 2

  • hidden_size (num): the input and output hidden size. Default: 64

  • inner_size (num): the dimensionality of the feed-forward layer. Default: 256

  • hidden_dropout_prob (float): probability of an element to be zeroed. Default: 0.5

  • attn_dropout_prob (float): probability of an attention score to be zeroed. Default: 0.5

  • hidden_act (str): activation function in the feed-forward layer. Default: ‘gelu’

    candidates: ‘gelu’, ‘relu’, ‘swish’, ‘tanh’, ‘sigmoid’

  • layer_norm_eps (float): a value added to the denominator for numerical stability. Default: 1e-12

forward(hidden_states, attention_mask, output_all_encoded_layers=True)[source]
Parameters
  • hidden_states (torch.Tensor) – the input of the TransformerEncoder

  • attention_mask (torch.Tensor) – the attention mask for the input hidden_states

  • output_all_encoded_layers (Bool) – whether output all transformer layers’ output

Returns

If output_all_encoded_layers is True, return a list containing every transformer layer’s output; otherwise, return a list containing only the output of the last transformer layer.

Return type

all_encoder_layers (list)
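A hedged usage sketch; the shapes below are assumptions, and the encoder is taken as already constructed, since building one requires the RecBole config that carries the frequency-domain settings.

    import torch

    # `encoder` is assumed to be an already-constructed FEAEncoder with
    # n_layers=2 and hidden_size=64.
    hidden_states = torch.randn(32, 50, 64)        # [batch, seq_len, hidden]
    attention_mask = torch.zeros(32, 1, 50, 50)    # additive mask, 0 = visible

    all_layers = encoder(hidden_states, attention_mask,
                         output_all_encoded_layers=True)
    assert len(all_layers) == 2                    # one entry per layer
    last_only = encoder(hidden_states, attention_mask,
                        output_all_encoded_layers=False)
    final_output = last_only[0]                    # list with a single tensor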

training: bool
class recbole.model.sequential_recommender.fearec.FEARec(config, dataset)[source]

Bases: recbole.model.abstract_recommender.SequentialRecommender

static alignment(x, y)[source]
calculate_loss(interaction)[source]

Calculate the training loss for a batch of data.

Parameters

interaction (Interaction) – Interaction class of the batch.

Returns

Training loss, shape: []

Return type

torch.Tensor
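A minimal training-step sketch, assuming a model instance and a dataloader that yields batched Interaction objects; in practice RecBole’s Trainer runs this loop (and device placement) for you, and the optimizer setup here is illustrative.

    import torch

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for interaction in train_dataloader:             # yields Interaction batches
        optimizer.zero_grad()
        loss = model.calculate_loss(interaction)     # scalar tensor, shape []
        loss.backward()
        optimizer.step()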

decompose(z_i, z_j, origin_z, batch_size)[source]

We do not sample negative examples explicitly. Instead, given a positive pair, similar to (Chen et al., 2017), we treat the other 2(N − 1) augmented examples within a minibatch as negative examples (a sketch of this construction appears under info_nce() below).

forward(item_seq, item_seq_len)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

full_sort_predict(interaction)[source]

Full-sort prediction function. Given users, calculate the scores between those users and all candidate items.

Parameters

interaction (Interaction) – Interaction class of the batch.

Returns

Predicted scores for given users and all candidate items, shape: [n_batch_users * n_candidate_items]

Return type

torch.Tensor

get_attention_mask(item_seq)[source]

Generate a left-to-right unidirectional attention mask for multi-head attention (sketched together with the bidirectional variant below).

get_bi_attention_mask(item_seq)[source]

Generate a bidirectional attention mask for multi-head attention.
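Both masks can be sketched as follows, assuming item_seq is a [batch_size, seq_len] tensor of item ids with 0 as padding; the exact dtype and fill-value conventions in the source may differ.

    import torch

    def causal_attention_mask(item_seq):
        """Left-to-right mask: position i may attend to positions <= i,
        and padding (item id 0) is never attended to."""
        padding = (item_seq != 0).unsqueeze(1).unsqueeze(2)   # [B, 1, 1, L]
        seq_len = item_seq.size(1)
        causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=item_seq.device))
        visible = padding & causal                            # [B, 1, L, L]
        # additive form expected by attention: 0 visible, -10000 hidden
        return (~visible).float() * -10000.0

    def bi_attention_mask(item_seq):
        """Bidirectional mask: only padding positions are hidden."""
        visible = (item_seq != 0).unsqueeze(1).unsqueeze(2)   # [B, 1, 1, L]
        return (~visible).float() * -10000.0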

get_same_item_index(dataset)[source]
info_nce(z_i, z_j, temp, batch_size, sim='dot')[source]

We do not sample negative examples explicitly. Instead, given a positive pair, similar to (Chen et al., 2017), we treat the other 2(N − 1) augmented examples within a minibatch as negative examples.
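The in-batch-negative construction described above can be sketched as follows. This is a simplified version: the actual method also supports a ‘cos’ similarity and uses mask_correlated_samples (documented next) to drop self- and positive-pairs from the similarity matrix.

    import torch
    import torch.nn.functional as F

    def info_nce_sketch(z_i, z_j, temp):
        """z_i, z_j: [N, d] embeddings of the two views of N sequences.
        Each of the 2N samples gets 1 positive and 2(N - 1) negatives."""
        N = z_i.size(0)
        z = torch.cat([z_i, z_j], dim=0)                  # [2N, d]
        sim = torch.mm(z, z.t()) / temp                   # dot-product similarity
        # positives sit N rows apart (view i of sample k <-> view j of sample k)
        pos = torch.cat([torch.diag(sim, N), torch.diag(sim, -N)])  # [2N]
        # mask out self-similarity and the positive pair -> 2(N-1) negatives
        mask = ~torch.eye(2 * N, dtype=torch.bool, device=z.device)
        idx = torch.arange(N, device=z.device)
        mask[idx, idx + N] = False
        mask[idx + N, idx] = False
        neg = sim[mask].view(2 * N, -1)                   # [2N, 2(N-1)]
        logits = torch.cat([pos.unsqueeze(1), neg], dim=1)
        labels = torch.zeros(2 * N, dtype=torch.long, device=z.device)
        return F.cross_entropy(logits, labels)            # positive is column 0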

mask_correlated_samples(batch_size)[source]
predict(interaction)[source]

Predict the scores between users and items.

Parameters

interaction (Interaction) – Interaction class of the batch.

Returns

Predicted scores for given users and items, shape: [batch_size]

Return type

torch.Tensor
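An evaluation-time sketch. The Interaction field names below (‘item_id_list’, ‘item_length’, ‘item_id’) are RecBole’s common defaults and may differ per dataset config.

    import torch
    from recbole.data.interaction import Interaction

    interaction = Interaction({
        'item_id_list': torch.tensor([[5, 9, 2, 0, 0]]),   # padded item sequence
        'item_length': torch.tensor([3]),                  # true sequence length
        'item_id': torch.tensor([7]),                      # candidate item
    })
    score = model.predict(interaction)                 # shape: [1]
    all_scores = model.full_sort_predict(interaction)  # shape: [1 * n_items]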

training: bool
truncated_normal_(tensor, mean=0, std=0.09)[source]
static uniformity(x)[source]
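alignment and uniformity follow the representation-quality metrics of Wang & Isola (2020); a common formulation is sketched below, with the normalization and the temperature constant as assumptions rather than the exact source.

    import torch
    import torch.nn.functional as F

    def alignment_sketch(x, y):
        """Mean squared distance between positive pairs (lower is better)."""
        x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
        return (x - y).norm(p=2, dim=1).pow(2).mean()

    def uniformity_sketch(x, t=2):
        """Log mean Gaussian potential over all pairs (lower is better)."""
        x = F.normalize(x, dim=-1)
        return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()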
class recbole.model.sequential_recommender.fearec.FeedForward(hidden_size, inner_size, hidden_dropout_prob, hidden_act, layer_norm_eps)[source]

Bases: torch.nn.modules.module.Module

The point-wise feed-forward layer is implemented with two dense layers.

Parameters

input_tensor (torch.Tensor) – the input of the point-wise feed-forward layer

Returns

the output of the point-wise feed-forward layer

Return type

hidden_states (torch.Tensor)

forward(input_tensor)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
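A minimal sketch of such a point-wise feed-forward sublayer. The residual connection and post-norm placement mirror the original Transformer encoder and are assumptions about the exact source.

    import torch.nn as nn

    class PointWiseFFNSketch(nn.Module):
        def __init__(self, hidden_size, inner_size, hidden_dropout_prob,
                     layer_norm_eps):
            super().__init__()
            self.dense_1 = nn.Linear(hidden_size, inner_size)
            self.act = nn.GELU()                  # one of the candidates above
            self.dense_2 = nn.Linear(inner_size, hidden_size)
            self.dropout = nn.Dropout(hidden_dropout_prob)
            self.layer_norm = nn.LayerNorm(hidden_size, eps=layer_norm_eps)

        def forward(self, input_tensor):
            hidden_states = self.act(self.dense_1(input_tensor))
            hidden_states = self.dropout(self.dense_2(hidden_states))
            # post-norm residual, applied position-wise
            return self.layer_norm(hidden_states + input_tensor)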

gelu(x)[source]

Implementation of the gelu activation function.

For information: OpenAI GPT’s gelu is slightly different (and gives slightly different results):

0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))

Also see https://arxiv.org/abs/1606.08415
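For comparison, the exact (erf-based) form next to the tanh approximation quoted above:

    import math
    import torch

    def gelu_exact(x):
        # GELU(x) = x * Phi(x), with Phi the standard normal CDF
        return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

    def gelu_tanh(x):
        # OpenAI GPT's approximation, as quoted above
        return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi)
                                         * (x + 0.044715 * torch.pow(x, 3))))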

get_hidden_act(act)[source]
swish(x)[source]
training: bool
class recbole.model.sequential_recommender.fearec.HybridAttention(n_heads, hidden_size, hidden_dropout_prob, attn_dropout_prob, layer_norm_eps, i, config)[source]

Bases: torch.nn.modules.module.Module

Hybrid Attention layer: combines a time-domain self-attention layer with a frequency-domain attention layer.

Parameters
  • input_tensor (torch.Tensor) – the input of the multi-head Hybrid Attention layer

  • attention_mask (torch.Tensor) – the attention mask for input tensor

Returns

the output of the multi-head Hybrid Attention layer

Return type

hidden_states (torch.Tensor)

forward(input_tensor, attention_mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

time_delay_agg_inference(values, corr)[source]

Sped-up version of autocorrelation (a batch-normalization-style design). This is for the inference phase.

time_delay_agg_training(values, corr)[source]

Sped-up version of autocorrelation (a batch-normalization-style design). This is for the training phase.
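Both aggregators follow the auto-correlation mechanism of Autoformer (Wu et al., 2021): correlations are computed once in the frequency domain, then the value sequence is rolled by the top-k most correlated delays and the rolls are combined with softmax weights. A simplified single-series sketch; the real methods operate on [batch, head, channel, length] tensors and differ between training and inference in how the top-k delays are selected.

    import torch

    def time_delay_agg_sketch(values, corr, k=3):
        """values, corr: [length] tensors; corr[tau] is the autocorrelation
        at delay tau (e.g. obtained via irfft(rfft(q) * conj(rfft(k))))."""
        weights, delays = torch.topk(corr, k)              # k strongest delays
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(values)
        for w, tau in zip(weights, delays):
            out = out + w * torch.roll(values, -int(tau))  # align by delay tau
        return out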

training: bool
transpose_for_scores(x)[source]
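transpose_for_scores is the standard BERT-style reshape that splits the hidden dimension into attention heads before computing attention scores; a minimal sketch (names are illustrative):

    import torch

    def transpose_for_scores_sketch(x, n_heads):
        """[batch, seq_len, hidden] -> [batch, n_heads, seq_len, head_size]."""
        batch, seq_len, hidden = x.shape
        head_size = hidden // n_heads
        x = x.view(batch, seq_len, n_heads, head_size)
        return x.permute(0, 2, 1, 3)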