DL techniques on MoA dataset

Deep Learning on MoA

NN model

Code for NN model

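# Imports assumed by this snippet (TensorFlow 2.x Keras API)
from tensorflow.keras import initializers, losses
from tensorflow.keras.layers import Average, BatchNormalization, Concatenate, Dense, Dropout, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Architecture of the two-input dense NN model: the PCA components go into the first head,
# the raw gene expression, cell viability and row-wise statistical features into the second
def nn_model():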
   
    input1 = Input(shape=(60,), batch_size=None, name='input1')
    input2 = Input(shape=(887,), batch_size=None, name='input2')

    x = BatchNormalization(name= 'norm0')(input1)
    x = Dropout(rate = 0.1, name= 'drop0')(x)
    x = Dense(512, kernel_initializer = initializers.glorot_normal(),activation='elu', name= 'dense0')(x)
    x = BatchNormalization(name= 'norm1')(x)
    x = Dense(256, kernel_initializer = initializers.glorot_normal(),activation='elu', name= 'dense1')(x)


    answer1 = Concatenate(axis=-1)([x,input2])


    y = BatchNormalization(name= 'norm2')(answer1)
    y = Dropout(rate = 0.3, name= 'drop1')(y)
    y = Dense(512, kernel_initializer = initializers.glorot_normal(),activation='relu', name= 'dense2')(y)
    y = BatchNormalization(name= 'norm3')(y)
    y = Dense(512, kernel_initializer = initializers.glorot_normal(),activation='relu', name= 'dense3')(y)
    y = BatchNormalization(name= 'norm4')(y)
    y = Dense(256, kernel_initializer = initializers.glorot_normal(),activation='relu', name= 'dense4')(y)
    y = BatchNormalization(name= 'norm5')(y)
    y = Dense(256, kernel_initializer = initializers.glorot_normal(),activation='relu', name= 'dense5')(y)

    z = Average()([x, y])
    z = BatchNormalization(name= 'norm6')(z)
    z = Dense(256, kernel_initializer = initializers.LecunNormal(),activation='relu', name= 'dense6')(z)
    z = BatchNormalization(name= 'norm7')(z)
    z = Dense(206, kernel_initializer = initializers.LecunNormal(),activation='relu', name= 'dense7')(z)
    z = BatchNormalization(name= 'norm8')(z)
    model_output = Dense(206, activation='sigmoid', name= 'output')(z)

    model = Model(inputs=[input1, input2], outputs=model_output)
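    # `logloss` is a custom metric that must be defined before this function is called (not shown here)
    model.compile(loss=losses.BinaryCrossentropy(label_smoothing=0.001), optimizer=Adam(learning_rate=0.0001, decay=0.0001), metrics=[logloss])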

    return model

Summary of NN model

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input1 (InputLayer)             [(None, 60)]         0                                            
__________________________________________________________________________________________________
norm0 (BatchNormalization)      (None, 60)           240         input1[0][0]                     
__________________________________________________________________________________________________
drop0 (Dropout)                 (None, 60)           0           norm0[0][0]                      
__________________________________________________________________________________________________
dense0 (Dense)                  (None, 512)          31232       drop0[0][0]                      
__________________________________________________________________________________________________
norm1 (BatchNormalization)      (None, 512)          2048        dense0[0][0]                     
__________________________________________________________________________________________________
dense1 (Dense)                  (None, 256)          131328      norm1[0][0]                      
__________________________________________________________________________________________________
input2 (InputLayer)             [(None, 887)]        0                                            
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 1143)         0           dense1[0][0]                     
                                                                 input2[0][0]                     
__________________________________________________________________________________________________
norm2 (BatchNormalization)      (None, 1143)         4572        concatenate[0][0]                
__________________________________________________________________________________________________
drop1 (Dropout)                 (None, 1143)         0           norm2[0][0]                      
__________________________________________________________________________________________________
dense2 (Dense)                  (None, 512)          585728      drop1[0][0]                      
__________________________________________________________________________________________________
norm3 (BatchNormalization)      (None, 512)          2048        dense2[0][0]                     
__________________________________________________________________________________________________
dense3 (Dense)                  (None, 512)          262656      norm3[0][0]                      
__________________________________________________________________________________________________
norm4 (BatchNormalization)      (None, 512)          2048        dense3[0][0]                     
__________________________________________________________________________________________________
dense4 (Dense)                  (None, 256)          131328      norm4[0][0]                      
__________________________________________________________________________________________________
norm5 (BatchNormalization)      (None, 256)          1024        dense4[0][0]                     
__________________________________________________________________________________________________
dense5 (Dense)                  (None, 256)          65792       norm5[0][0]                      
__________________________________________________________________________________________________
average (Average)               (None, 256)          0           dense1[0][0]                     
                                                                 dense5[0][0]                     
__________________________________________________________________________________________________
norm6 (BatchNormalization)      (None, 256)          1024        average[0][0]                    
__________________________________________________________________________________________________
dense6 (Dense)                  (None, 256)          65792       norm6[0][0]                      
__________________________________________________________________________________________________
norm7 (BatchNormalization)      (None, 256)          1024        dense6[0][0]                     
__________________________________________________________________________________________________
dense7 (Dense)                  (None, 206)          52942       norm7[0][0]                      
__________________________________________________________________________________________________
norm8 (BatchNormalization)      (None, 206)          824         dense7[0][0]                     
__________________________________________________________________________________________________
output (Dense)                  (None, 206)          42642       norm8[0][0]                      
==================================================================================================
Total params: 1,384,292
Trainable params: 1,376,866
Non-trainable params: 7,426
__________________________________________________________________________________________________
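The parameter counts in the summary can be checked by hand. The sketch below assumes the standard Keras accounting: a Dense layer has `n_in * n_out + n_out` parameters, and a BatchNormalization layer has four per feature, two of which (moving mean and moving variance) are non-trainable. The helper names are illustrative only.

```python
# Layer widths are copied from the NN model code above.
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n_features

# (input width, output width) of dense0 ... dense7 and the output layer
dense_layers = [(60, 512), (512, 256), (1143, 512), (512, 512), (512, 256),
                (256, 256), (256, 256), (256, 206), (206, 206)]
# Feature widths seen by norm0 ... norm8
batchnorm_widths = [60, 512, 1143, 512, 512, 256, 256, 256, 206]

dense_total = sum(dense_params(i, o) for i, o in dense_layers)
bn_total = sum(batchnorm_params(w) for w in batchnorm_widths)

total = dense_total + bn_total
non_trainable = bn_total // 2      # moving mean and variance only
trainable = total - non_trainable

print(total, trainable, non_trainable)
```

The width 1143 for `dense2` and `norm2` comes from concatenating the 256 outputs of `dense1` with the 887 features of `input2`.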

Inputs: The model has two input heads.
- The first head takes 60 PCA components as input (50 from the gene expression features and 10 from the cell viability features).
- The second head takes all 872 gene expression and cell viability features plus the 15 row-wise statistical features discussed in the previous ML part, 887 features in total.
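As a rough illustration of how the two input arrays could be assembled, here is a minimal NumPy sketch. It uses random stand-in data, computes PCA with a plain SVD instead of whatever library the notebook used, and assumes the 872 features split into 772 gene expression and 100 cell viability columns (the split in the public MoA dataset, not stated in this post). The 15 row-wise statistics are replaced by a random placeholder array.

```python
import numpy as np

def pca_components(features, n_components):
    """Project rows onto the top principal components (PCA via SVD)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
n_samples = 200                                # hypothetical sample count
genes = rng.normal(size=(n_samples, 772))      # stand-in for the gene expression columns
cells = rng.normal(size=(n_samples, 100))      # stand-in for the cell viability columns
row_stats = rng.normal(size=(n_samples, 15))   # stand-in for the 15 row-wise statistics

# First head: 50 gene PCA components + 10 cell PCA components
input1 = np.hstack([pca_components(genes, 50), pca_components(cells, 10)])
# Second head: all raw features + row-wise statistics
input2 = np.hstack([genes, cells, row_stats])

print(input1.shape, input2.shape)
```

The resulting widths match the `Input(shape=(60,))` and `Input(shape=(887,))` layers in the model code.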
Training:
- The model was trained for 50 epochs with a batch size of 128, using 7-fold cross-validation.
- See training plots
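The cross-validation loop can be sketched as follows. This is a plain shuffled k-fold split written in NumPy; the post does not say which splitter was used (a stratified variant is also plausible), so treat the splitting strategy, the seed and the sample count as assumptions.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        valid_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, valid_idx

splits = list(kfold_indices(n_samples=700, n_folds=7))
for fold, (train_idx, valid_idx) in enumerate(splits):
    # In the real pipeline, a fresh model would be built here and trained on the
    # train_idx rows for 50 epochs with batch_size=128, then validated on valid_idx.
    print(fold, len(train_idx), len(valid_idx))
```

Each sample lands in exactly one validation fold, so the 7 trained models together see every row as held-out data once.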
Prediction:
- The output layer has 206 neurons, one for each target label. The final prediction is the average of the predictions from the 7 cross-validation models.
- The loss is binary cross-entropy (with label smoothing of 0.001, as set in the code above).
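A minimal NumPy sketch of the two steps above: averaging the fold predictions, and binary cross-entropy with label smoothing. The smoothing formula `y * (1 - s) + 0.5 * s` is the one Keras applies for binary targets; the function name, the clipping constant and the random stand-in data are assumptions made for illustration.

```python
import numpy as np

def smoothed_bce(y_true, y_pred, label_smoothing=0.0, eps=1e-7):
    """Mean binary cross-entropy with targets pulled slightly towards 0.5."""
    y_true = y_true * (1.0 - label_smoothing) + 0.5 * label_smoothing
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
# Stand-in predictions: 7 fold models, 5 test rows, 206 target labels
fold_preds = rng.uniform(size=(7, 5, 206))
final_pred = fold_preds.mean(axis=0)           # average over the fold axis

# Stand-in sparse multi-label targets
y_true = (rng.uniform(size=(5, 206)) < 0.01).astype(float)
print(final_pred.shape, smoothed_bce(y_true, final_pred, label_smoothing=0.001))
```

Averaging happens on the predicted probabilities, so the result stays in the range [0, 1] and keeps one value per target label.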
Conclusion: The submission and score of the NN model are shown here.
Click here to view the submission notebook

The neural network approach seems promising. Let's try a 1D CNN next.

1D CNN model

Code for 1D CNN model

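# Imports assumed by this snippet (TensorFlow 2.x Keras API)
from tensorflow.keras import initializers
from tensorflow.keras.layers import Conv1D, Dense, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Architecture of the 1D CNN model, which works on the gene expression and cell viability features
def cnn_1d_model():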
    model_input = Input(shape=(872, 1), batch_size=None, name='input')
    x = Conv1D(filters = 32, kernel_size = 3, kernel_initializer = initializers.glorot_normal(),
               activation ='relu', name='conv_1d_1')(model_input)
    x = Conv1D(filters = 16, kernel_size = 3, kernel_initializer = initializers.glorot_normal(), 
               activation ='relu', name='conv_1d_2')(x)
    x = Conv1D(filters = 8, kernel_size = 3, kernel_initializer = initializers.glorot_normal(), 
               activation ='relu', name='conv_1d_3')(x)
    x = Flatten(name= 'flat1')(x)

    x = Dense(2048, kernel_initializer = initializers.glorot_normal(seed=3), activation='relu', name= 'dense1')(x)

    x = Dense(1048, kernel_initializer = initializers.glorot_normal(seed=3), activation='relu', name= 'dense2')(x)

    model_output = Dense(206, kernel_initializer = initializers.glorot_normal(seed=3), activation='sigmoid', name='output')(x)
    
    model = Model(inputs=model_input, outputs=model_output)
    model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.0001), metrics=['accuracy'])
    
    return model

Summary of 1D CNN model

Model: "model_1"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input (InputLayer)           [(None, 872, 1)]          0         
    _________________________________________________________________
    conv_1d_1 (Conv1D)           (None, 870, 32)           128       
    _________________________________________________________________
    conv_1d_2 (Conv1D)           (None, 868, 16)           1552      
    _________________________________________________________________
    conv_1d_3 (Conv1D)           (None, 866, 8)            392       
    _________________________________________________________________
    flat1 (Flatten)              (None, 6928)              0         
    _________________________________________________________________
    dense1 (Dense)               (None, 2048)              14190592  
    _________________________________________________________________
    dense2 (Dense)               (None, 1048)              2147352   
    _________________________________________________________________
    output (Dense)               (None, 206)               216094    
    =================================================================
    Total params: 16,556,110
    Trainable params: 16,556,110
    Non-trainable params: 0
    _________________________________________________________________
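The output shapes and parameter counts in this summary follow from a few simple rules, sketched below. The code assumes stride-1 convolutions with "valid" padding (the Keras defaults, which the model code does not override); the helper names are illustrative only.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return length - kernel_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    """One kernel of size (kernel_size, in_channels) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

length, channels, total = 872, 1, 0
for filters in (32, 16, 8):                      # conv_1d_1, conv_1d_2, conv_1d_3
    total += conv1d_params(channels, filters, kernel_size=3)
    length, channels = conv1d_out_len(length, 3), filters

flat = length * channels                         # size after the Flatten layer
for n_in, n_out in [(flat, 2048), (2048, 1048), (1048, 206)]:
    total += dense_params(n_in, n_out)

print(length, flat, total)
```

Almost all of the parameters sit in `dense1`, because the flattened convolution output is 6,928 values wide.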

Inputs: The model has a single input head.
- Only the 872 numerical gene expression and cell viability features are passed to the model.
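Because `Conv1D` expects a channel axis, the flat feature matrix has to be reshaped from `(n_samples, 872)` to `(n_samples, 872, 1)` before it is fed to the model. A minimal sketch with stand-in data (the array names are hypothetical):

```python
import numpy as np

# Stand-in for 4 rows of the 872 numerical features
features = np.zeros((4, 872), dtype=np.float32)

# Add a trailing channel axis so each row becomes a length-872 sequence with 1 channel
cnn_input = features[..., np.newaxis]

print(cnn_input.shape)
```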
Training:
- The model was trained for 20 epochs with a batch size of 300, using 6-fold cross-validation.
- See training plots
Prediction:
- The output layer has 206 neurons, one for each target label. The final prediction is the average of the predictions from the 6 cross-validation models.
- The loss is binary cross-entropy.
Conclusion: The submission and score of the 1D CNN model are shown here.
Click here to view the submission notebook

It performs slightly better than the NN model (private score 0.01727 vs. 0.01768). Now let's try the TabNet model.

TABNET model

Features: A large set of features was passed to the model:
- 875 dataset features.
- 600 PCA components of the gene expression features.
- 50 PCA components of the cell viability features.
- Cluster labels obtained by clustering the gene expression features.
- Cluster labels obtained by clustering the cell viability features.
- Cluster labels obtained by clustering all PCA components.
- 15 row-wise statistical features.
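To illustrate what a cluster-label feature looks like, here is a minimal sketch that clusters a feature block and appends the resulting label as an extra column. The post does not say which clustering algorithm or how many clusters were used, so the plain k-means below, the choice of 5 clusters and the small random stand-in data are all assumptions.

```python
import numpy as np

def kmeans_labels(x, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen rows (fancy indexing returns a copy of them)
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every row to every center, shape (n_rows, n_clusters)
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):      # keep the old center if a cluster is empty
                centers[k] = x[labels == k].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
genes = rng.normal(size=(100, 20))       # small stand-in for the gene expression block
gene_cluster = kmeans_labels(genes, n_clusters=5)

# Append the cluster label as one additional feature column
with_cluster = np.hstack([genes, gene_cluster[:, None]])
print(with_cluster.shape)
```

The same step would be repeated for the cell viability block and for the PCA components, giving the three cluster-label features listed above.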
Training:
- The model was trained for 200 epochs with a batch size of 1024, using 13-fold cross-validation.
- See training plots
Prediction:
- The output layer has 206 neurons, one for each target label. The final prediction is the average of the predictions from the 13 cross-validation models.
- The loss is binary cross-entropy.
Conclusion: The submission and score of the TabNet model are shown here.
Click here to view the submission notebook

Of all the models tried so far, TabNet performs best.

Performance comparison (scores are log loss, so lower is better):

Model	PRIVATE SCORE	PUBLIC SCORE
Baseline Model	0.02154	0.02398
LR model	0.0335	0.04085
SVM model	0.09797	0.10802
XGBOOST model	0.01739	0.01961
NN model	0.01768	0.01991
1d CNN model	0.01727	0.01987
TABNET	0.01627	0.0185
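The table can be ranked programmatically. The short sketch below copies the scores from the table, sorts the models by private score (assuming, as with the competition's log loss metric, that lower is better) and computes the relative improvement of the best model over the baseline.

```python
# model: (private score, public score), copied from the table above
scores = {
    "Baseline Model": (0.02154, 0.02398),
    "LR model": (0.0335, 0.04085),
    "SVM model": (0.09797, 0.10802),
    "XGBOOST model": (0.01739, 0.01961),
    "NN model": (0.01768, 0.01991),
    "1d CNN model": (0.01727, 0.01987),
    "TABNET": (0.01627, 0.0185),
}

# Sort model names from best (lowest) to worst private score
ranking = sorted(scores, key=lambda name: scores[name][0])
best = ranking[0]

# Relative reduction in private score compared with the baseline
gain = 1 - scores[best][0] / scores["Baseline Model"][0]

for name in ranking:
    print(f"{name:15s} private={scores[name][0]:.5f} public={scores[name][1]:.5f}")
print(f"{best} improves on the baseline private score by {gain:.1%}")
```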

Conclusion: TabNet achieves the best private and public scores of all the models compared.

Blog parts
1. MoA problem definition
2. EDA on LISH MoA dataset
3. Feature Engineering and Baseline model for MoA
4. ML techniques on MoA dataset