# Hyper-parameter Search Results (Context-aware Models)

Datasets: MovieLens-1m, Avazu-2m, Criteo-4m

Notes: The hyper-parameter search ranges in this table are for reference only. You can adjust them to the dataset at hand; for example, narrow the ranges on large datasets to reduce search time. The bold orange text in the table marks the recommended value within each search range. The symbol "\" indicates that the model either ran out of memory on a GPU with 12 GB of memory, or that the run took too long to complete the hyper-parameter search.
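If these searches are reproduced with RecBole's built-in tuning script (`run_hyper.py` with a `--params_file` argument), a row of the table maps onto a params file with one `choice` line per hyper-parameter. The file name and the exact row used below (AFM) are illustrative:

```
learning_rate choice [5e-5,1e-4,5e-4]
dropout_prob choice [0.0,0.1]
attention_size choice [20,30]
reg_weight choice [2,5]
```

Each line names a parameter, a sampling strategy (`choice` enumerates the listed values), and the candidate values; the script then evaluates every sampled configuration on the validation set.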
Model | MovieLens-1m | Avazu-2m | Criteo-4m |
---|---|---|---|
AFM | learning_rate in [5e-5,1e-4,5e-4]<br>dropout_prob in [0.0,0.1]<br>attention_size in [20,30]<br>reg_weight in [2,5] | learning_rate in [5e-5,1e-4,5e-4]<br>dropout_prob in [0.0,0.1]<br>attention_size in [20,30]<br>reg_weight in [2,5] | learning_rate in [5e-5,1e-4,5e-4]<br>dropout_prob in [0.0,0.1]<br>attention_size in [20,30]<br>reg_weight in [2,5] |
AutoInt | learning_rate in [1e-3,5e-3]<br>dropout_prob in [0.0,0.1]<br>attention_size in [8,16,32]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]'] | learning_rate in [1e-3,5e-3]<br>dropout_prob in [0.0,0.1]<br>attention_size in [8,16,32]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]'] | learning_rate in [1e-3,5e-3]<br>dropout_prob in [0.0,0.1]<br>attention_size in [8,16,32]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]'] |
DCN | learning_rate in [1e-4,5e-4,1e-3,5e-3,6e-3]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]','[512,512,512]','[1024,1024,1024]']<br>reg_weight in [1,2,5]<br>cross_layer_num in [6]<br>dropout_prob in [0.1,0.2] | learning_rate in [1e-4,5e-4,1e-3,5e-3,6e-3]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]','[512,512,512]','[1024,1024,1024]']<br>reg_weight in [1,2,5]<br>cross_layer_num in [6]<br>dropout_prob in [0.1,0.2] | learning_rate in [1e-4,5e-4,1e-3,5e-3,6e-3]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]','[512,512,512]','[1024,1024,1024]']<br>reg_weight in [1,2,5]<br>cross_layer_num in [6]<br>dropout_prob in [0.1,0.2] |
DCN V2 (stacked) | learning_rate in [5e-3,1e-3,5e-4]<br>mlp_hidden_size in ['[256,256]','[512,512]','[768,768]','[1024,1024]']<br>cross_layer_num in [2,3,4]<br>dropout_prob in [0.1,0.2]<br>reg_weight in [1,2,5] | learning_rate in [5e-3,1e-3,5e-4]<br>mlp_hidden_size in ['[256,256]','[512,512]','[768,768]','[1024,1024]']<br>cross_layer_num in [2,3,4]<br>dropout_prob in [0.1,0.2]<br>reg_weight in [1,2,5] | learning_rate in [5e-3,1e-3,5e-4]<br>mlp_hidden_size in ['[256,256]','[512,512]','[1024,1024]']<br>cross_layer_num in [2,3,4]<br>dropout_prob in [0.1,0.2]<br>reg_weight in [1,2,5] |
DeepFM | learning_rate in [1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]'] | learning_rate in [1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]'] | learning_rate in [1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]'] |
DIEN | learning_rate in [1e-4,1e-3]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]']<br>dropout_prob in [0.0,0.1] | \ | \ |
DIN | learning_rate in [1e-4,5e-4,1e-3,3e-3,5e-3,6e-3,1e-2]<br>dropout_prob in [0.0,0.1,0.2,0.3]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]','[512,512,512]']<br>pooling_mode in ['mean','max','sum'] | \ | \ |
FFM | learning_rate in [1e-4,5e-4,1e-3,5e-3,5e-2] | \ | \ |
FM | learning_rate in [5e-5,1e-4,2e-4,5e-4,1e-3,5e-3] | learning_rate in [5e-5,1e-4,2e-4,5e-4,1e-3,5e-3] | learning_rate in [5e-5,1e-4,2e-4,5e-4,1e-3,5e-3] |
FNN | learning_rate in [5e-4,1e-3,3e-3,5e-3]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,256,128]','[128,128,128]'] | learning_rate in [5e-4,1e-3,3e-3,5e-3]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,256,128]','[128,128,128]'] | learning_rate in [5e-4,1e-3,3e-3,5e-3]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,256,128]','[128,128,128]'] |
FwFM | learning_rate in [1e-4,5e-4,1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.2,0.4] | learning_rate in [1e-4,5e-4,1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.2,0.4] | learning_rate in [1e-4,5e-4,1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.2,0.4] |
LR | learning_rate in [5e-5,1e-4,2e-4,5e-4,1e-3,5e-3] | learning_rate in [5e-5,1e-4,2e-4,5e-4,1e-3,5e-3] | learning_rate in [5e-5,1e-4,2e-4,5e-4,1e-3,5e-3] |
NFM | learning_rate in [5e-5,8e-5,1e-4,5e-4,1e-3]<br>dropout_prob in [0.1,0.2,0.3]<br>mlp_hidden_size in ['[20,20,20]','[40,40,40]','[50,50,50]'] | learning_rate in [5e-5,8e-5,1e-4,5e-4,1e-3]<br>dropout_prob in [0.1,0.2,0.3]<br>mlp_hidden_size in ['[20,20,20]','[40,40,40]','[50,50,50]'] | learning_rate in [5e-5,8e-5,1e-4,5e-4,1e-3]<br>dropout_prob in [0.1,0.2,0.3]<br>mlp_hidden_size in ['[20,20,20]','[40,40,40]','[50,50,50]'] |
PNN(inner) | learning_rate in [1e-3,3e-3,5e-3,6e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]']<br>reg_weight in [0.0] | learning_rate in [1e-3,3e-3,5e-3,6e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]']<br>reg_weight in [0.0] | learning_rate in [1e-3,3e-3,5e-3,6e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]']<br>reg_weight in [0.0] |
PNN(outer) | learning_rate in [1e-3,3e-3,5e-3,6e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]']<br>reg_weight in [0.0] | learning_rate in [1e-3,3e-3,5e-3,6e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]']<br>reg_weight in [0.0] | learning_rate in [1e-3,3e-3,5e-3,6e-3,1e-2]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]']<br>reg_weight in [0.0] |
WideDeep | learning_rate in [5e-4,1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.2]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]'] | learning_rate in [5e-4,1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.2]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]'] | learning_rate in [5e-4,1e-3,5e-3,1e-2]<br>dropout_prob in [0.0,0.2]<br>mlp_hidden_size in ['[64,64,64]','[128,128,128]','[256,256,256]'] |
xDeepFM | learning_rate in [1e-4,1e-3,5e-3,6e-3]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]','[512,512,512]']<br>cin_layer_size in ['[60,60,60]','[100,100,100]']<br>reg_weight in [1e-5,5e-4] | learning_rate in [1e-4,1e-3,5e-3,6e-3]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]','[512,512,512]']<br>cin_layer_size in ['[60,60,60]','[100,100,100]']<br>reg_weight in [1e-5,5e-4] | learning_rate in [1e-4,1e-3,5e-3,6e-3]<br>dropout_prob in [0.0,0.1]<br>mlp_hidden_size in ['[128,128,128]','[256,256,256]','[512,512,512]']<br>cin_layer_size in ['[60,60,60]','[100,100,100]']<br>reg_weight in [1e-5,5e-4] |
FiGNN | learning_rate in [5e-3,1e-3,5e-4]<br>attention_size in [8,16,32]<br>n_layers in [2,3,4] | learning_rate in [5e-3,1e-3,5e-4]<br>attention_size in [8,16,32]<br>n_layers in [2,3,4] | learning_rate in [5e-3,1e-3,5e-4]<br>attention_size in [8,16,32]<br>n_layers in [2,3,4] |
KD_DAGFM | learning_rate in [1e-6,5e-6,1e-5,5e-5,1e-4,5e-4,1e-3,3e-3] | learning_rate in [1e-6,5e-6,1e-5,5e-5,1e-4,5e-4,1e-3,3e-3] | learning_rate in [1e-6,5e-6,1e-5,5e-5,1e-4,5e-4,1e-3,3e-3] |
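The search spaces above are small discrete grids, so exhaustive enumeration is cheap. A minimal, library-agnostic sketch that expands one row of the table (the AFM row is used here for illustration) into the full list of concrete configurations:

```python
from itertools import product

# Search space taken from the AFM row of the table above.
space = {
    "learning_rate": [5e-5, 1e-4, 5e-4],
    "dropout_prob": [0.0, 0.1],
    "attention_size": [20, 30],
    "reg_weight": [2, 5],
}

def grid(space):
    """Yield every hyper-parameter combination as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(space))
print(len(configs))  # 3 * 2 * 2 * 2 = 24 combinations
```

Each dict yielded by `grid` would then be merged into the model's base configuration, trained, and scored on the validation set; the bolded values in the table are the configurations that won this comparison.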