Deep Learning on MoA
NN model
Code for NN model
# Architecture of the two-input NN model, which works on gene expression and cell viability features
from tensorflow.keras import initializers, losses
from tensorflow.keras.layers import Average, BatchNormalization, Concatenate, Dense, Dropout, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def nn_model():
    # Head 1: 60 PCA components. Head 2: raw features plus row-wise statistics.
    input1 = Input(shape=(60,), batch_size=None, name='input1')
    input2 = Input(shape=(887,), batch_size=None, name='input2')
    # Branch x: processes the PCA components only
    x = BatchNormalization(name='norm0')(input1)
    x = Dropout(rate=0.1, name='drop0')(x)
    x = Dense(512, kernel_initializer=initializers.glorot_normal(), activation='elu', name='dense0')(x)
    x = BatchNormalization(name='norm1')(x)
    x = Dense(256, kernel_initializer=initializers.glorot_normal(), activation='elu', name='dense1')(x)
    # Branch y: concatenates the branch-x output with the second input
    answer1 = Concatenate(axis=-1)([x, input2])
    y = BatchNormalization(name='norm2')(answer1)
    y = Dropout(rate=0.3, name='drop1')(y)
    y = Dense(512, kernel_initializer=initializers.glorot_normal(), activation='relu', name='dense2')(y)
    y = BatchNormalization(name='norm3')(y)
    y = Dense(512, kernel_initializer=initializers.glorot_normal(), activation='relu', name='dense3')(y)
    y = BatchNormalization(name='norm4')(y)
    y = Dense(256, kernel_initializer=initializers.glorot_normal(), activation='relu', name='dense4')(y)
    y = BatchNormalization(name='norm5')(y)
    y = Dense(256, kernel_initializer=initializers.glorot_normal(), activation='relu', name='dense5')(y)
    # Average the two 256-wide branches, then map to the 206 target labels
    z = Average()([x, y])
    z = BatchNormalization(name='norm6')(z)
    z = Dense(256, kernel_initializer=initializers.LecunNormal(), activation='relu', name='dense6')(z)
    z = BatchNormalization(name='norm7')(z)
    z = Dense(206, kernel_initializer=initializers.LecunNormal(), activation='relu', name='dense7')(z)
    z = BatchNormalization(name='norm8')(z)
    model_output = Dense(206, activation='sigmoid', name='output')(z)
    model = Model(inputs=[input1, input2], outputs=model_output)
    # `logloss` is a custom metric function that is not shown in this post
    model.compile(loss=losses.BinaryCrossentropy(label_smoothing=0.001),
                  optimizer=Adam(learning_rate=0.0001, decay=0.0001), metrics=[logloss])
    return model

Summary of NN model
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input1 (InputLayer) [(None, 60)] 0
__________________________________________________________________________________________________
norm0 (BatchNormalization) (None, 60) 240 input1[0][0]
__________________________________________________________________________________________________
drop0 (Dropout) (None, 60) 0 norm0[0][0]
__________________________________________________________________________________________________
dense0 (Dense) (None, 512) 31232 drop0[0][0]
__________________________________________________________________________________________________
norm1 (BatchNormalization) (None, 512) 2048 dense0[0][0]
__________________________________________________________________________________________________
dense1 (Dense) (None, 256) 131328 norm1[0][0]
__________________________________________________________________________________________________
input2 (InputLayer) [(None, 887)] 0
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 1143) 0 dense1[0][0]
input2[0][0]
__________________________________________________________________________________________________
norm2 (BatchNormalization) (None, 1143) 4572 concatenate[0][0]
__________________________________________________________________________________________________
drop1 (Dropout) (None, 1143) 0 norm2[0][0]
__________________________________________________________________________________________________
dense2 (Dense) (None, 512) 585728 drop1[0][0]
__________________________________________________________________________________________________
norm3 (BatchNormalization) (None, 512) 2048 dense2[0][0]
__________________________________________________________________________________________________
dense3 (Dense) (None, 512) 262656 norm3[0][0]
__________________________________________________________________________________________________
norm4 (BatchNormalization) (None, 512) 2048 dense3[0][0]
__________________________________________________________________________________________________
dense4 (Dense) (None, 256) 131328 norm4[0][0]
__________________________________________________________________________________________________
norm5 (BatchNormalization) (None, 256) 1024 dense4[0][0]
__________________________________________________________________________________________________
dense5 (Dense) (None, 256) 65792 norm5[0][0]
__________________________________________________________________________________________________
average (Average) (None, 256) 0 dense1[0][0]
dense5[0][0]
__________________________________________________________________________________________________
norm6 (BatchNormalization) (None, 256) 1024 average[0][0]
__________________________________________________________________________________________________
dense6 (Dense) (None, 256) 65792 norm6[0][0]
__________________________________________________________________________________________________
norm7 (BatchNormalization) (None, 256) 1024 dense6[0][0]
__________________________________________________________________________________________________
dense7 (Dense) (None, 206) 52942 norm7[0][0]
__________________________________________________________________________________________________
norm8 (BatchNormalization) (None, 206) 824 dense7[0][0]
__________________________________________________________________________________________________
output (Dense) (None, 206) 42642 norm8[0][0]
==================================================================================================
Total params: 1,384,292
Trainable params: 1,376,866
Non-trainable params: 7,426
__________________________________________________________________________________________________
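The parameter counts in the summary above can be checked by hand. A Dense layer has `inputs * units + units` parameters, and a BatchNormalization layer has four per feature: two trainable (scale and offset) and two non-trainable (moving mean and variance). The following is a small sketch of that arithmetic; the helper names are illustrative and not part of the original notebook.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units


def batchnorm_params(n_features: int) -> int:
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n_features


# (inputs, units) for dense0 .. dense7 and the output layer, read off the summary
dense_layers = [(60, 512), (512, 256), (1143, 512), (512, 512), (512, 256),
                (256, 256), (256, 256), (256, 206), (206, 206)]
# feature widths for norm0 .. norm8
bn_widths = [60, 512, 1143, 512, 512, 256, 256, 256, 206]

total = (sum(dense_params(i, u) for i, u in dense_layers)
         + sum(batchnorm_params(w) for w in bn_widths))
non_trainable = sum(2 * w for w in bn_widths)
trainable = total - non_trainable

print(total, trainable, non_trainable)
```

The three printed numbers should match the "Total", "Trainable" and "Non-trainable" lines of the summary.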
- Inputs:
The model has two input heads.
- The first head takes 60 PCA components as input: 50 from the gene expression features and 10 from the cell viability features.
- The second head takes all 872 gene expression and cell viability features plus the 15 row-wise statistical features discussed in the earlier ML part, 887 features in total.
- Training:
- The model was trained for 50 epochs with a batch size of 128 samples, using 7-fold cross-validation.
- See training plots
- Prediction:
- The output layer has 206 neurons, one for each target label. The final prediction is the average of the predictions of the 7 cross-validation models.
- The loss is binary cross-entropy with a label smoothing of 0.001.
- Conclusion:
The submission and score of the NN model are as shown here.
Click here to view the submission notebook
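As an illustration of the two input heads described under "Inputs", the sketch below builds arrays with the expected shapes from random data. It is only a sketch under several assumptions: PCA is done with a plain NumPy SVD as a stand-in for whatever PCA implementation the notebook used, the 872 features are assumed to split into 772 gene expression and 100 cell viability columns, and the 15 row-wise statistics are passed in as a ready-made array because their definitions are not given in this post. All function names are hypothetical.

```python
import numpy as np


def pca_project(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of x onto its top principal components (SVD-based PCA)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def build_inputs(gene: np.ndarray, cell: np.ndarray, row_stats: np.ndarray):
    """Return (input1, input2) for the two heads of the NN model."""
    # Head 1: 50 gene PCA components + 10 cell PCA components = 60 columns
    input1 = np.hstack([pca_project(gene, 50), pca_project(cell, 10)])
    # Head 2: all raw features + 15 row-wise statistics = 887 columns
    input2 = np.hstack([gene, cell, row_stats])
    return input1, input2


rng = np.random.default_rng(0)
n_samples = 128
gene = rng.normal(size=(n_samples, 772))      # assumed number of gene expression columns
cell = rng.normal(size=(n_samples, 100))      # assumed number of cell viability columns
row_stats = rng.normal(size=(n_samples, 15))  # placeholder for the 15 statistics

input1, input2 = build_inputs(gene, cell, row_stats)
print(input1.shape, input2.shape)
```

In a real pipeline the PCA projection would be fitted on the training data and then reused for the test data, rather than recomputed per dataset as in this toy example.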

The neural network approach looks promising. Let's try a 1D CNN next.
1D CNN model
Code for 1D CNN model
# Architecture of the 1D CNN model, which works on gene expression and cell viability features
from tensorflow.keras import initializers
from tensorflow.keras.layers import Conv1D, Dense, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def cnn_1d_model():
    # Each sample is a sequence of 872 features with a single channel
    model_input = Input(shape=(872, 1), batch_size=None, name='input')
    x = Conv1D(filters=32, kernel_size=3, kernel_initializer=initializers.glorot_normal(),
               activation='relu', name='conv_1d_1')(model_input)
    x = Conv1D(filters=16, kernel_size=3, kernel_initializer=initializers.glorot_normal(),
               activation='relu', name='conv_1d_2')(x)
    x = Conv1D(filters=8, kernel_size=3, kernel_initializer=initializers.glorot_normal(),
               activation='relu', name='conv_1d_3')(x)
    # Flatten the convolution output and map it to the 206 target labels
    x = Flatten(name='flat1')(x)
    x = Dense(2048, kernel_initializer=initializers.glorot_normal(seed=3), activation='relu', name='dense1')(x)
    x = Dense(1048, kernel_initializer=initializers.glorot_normal(seed=3), activation='relu', name='dense2')(x)
    model_output = Dense(206, kernel_initializer=initializers.glorot_normal(seed=3), activation='sigmoid', name='output')(x)
    model = Model(inputs=model_input, outputs=model_output)
    model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.0001), metrics=['accuracy'])
    return model

Summary of 1D CNN model
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 872, 1)] 0
_________________________________________________________________
conv_1d_1 (Conv1D) (None, 870, 32) 128
_________________________________________________________________
conv_1d_2 (Conv1D) (None, 868, 16) 1552
_________________________________________________________________
conv_1d_3 (Conv1D) (None, 866, 8) 392
_________________________________________________________________
flat1 (Flatten) (None, 6928) 0
_________________________________________________________________
dense1 (Dense) (None, 2048) 14190592
_________________________________________________________________
dense2 (Dense) (None, 1048) 2147352
_________________________________________________________________
output (Dense) (None, 206) 216094
=================================================================
Total params: 16,556,110
Trainable params: 16,556,110
Non-trainable params: 0
_________________________________________________________________
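The output shapes in the summary above follow from the Keras `Conv1D` defaults (stride 1, `'valid'` padding): each layer with kernel size 3 shortens the sequence by 2. The sketch below reproduces that arithmetic together with the parameter counts; the helper names are illustrative only.

```python
def conv1d_out_len(length: int, kernel_size: int) -> int:
    """Output length for stride 1 and 'valid' padding."""
    return length - kernel_size + 1


def conv1d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    """One kernel of size (kernel_size, in_channels) per filter, plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


length, channels = 872, 1
conv_shapes = []
for filters in (32, 16, 8):
    params = conv1d_params(3, channels, filters)
    length, channels = conv1d_out_len(length, 3), filters
    conv_shapes.append((length, channels, params))
    print(f"Conv1D -> ({length}, {channels}), {params} params")

flat_size = length * channels
dense1_params = flat_size * 2048 + 2048
print("Flatten ->", flat_size, "| dense1 params:", dense1_params)
```

Most of the 16.5 million parameters sit in `dense1`, because the flattened convolution output is still 6,928 values wide.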
- Inputs:
The model has a single input head.
- Only the 872 numerical gene expression and cell viability features were passed as input to the model, shaped as (872, 1).
- Training:
- The model was trained for 20 epochs with a batch size of 300 samples, using 6-fold cross-validation.
- See training plots
- Prediction:
- The output layer has 206 neurons, one for each target label. The final prediction is the average of the predictions of the 6 cross-validation models.
- The loss is binary cross-entropy.
- Conclusion:
The submission and score of the 1D CNN model are as shown here.
Click here to view the submission notebook
It performs slightly better than the NN model (private score 0.01727 vs 0.01768). Now let's try the TABNET model.
TABNET model
- Features:
Many features were passed as input to the model. They are listed below.
- 875 dataset features.
- 600 PCA components of gene expression features.
- 50 PCA components of cell viability features.
- Cluster label obtained by clustering the gene expression features.
- Cluster label obtained by clustering the cell viability features.
- Cluster label obtained by clustering all PCA components.
- 15 row wise statistical features.
- Training:
- The model was trained for 200 epochs with a batch size of 1024 samples, using 13-fold cross-validation.
- See training plots
- Prediction:
- The output layer has 206 neurons, one for each target label. The final prediction is the average of the predictions of the 13 cross-validation models.
- The loss is binary cross-entropy.
- Conclusion:
The submission and score of the TABNET model are as shown here.
Click here to view the submission notebook
Of all the models tried so far, TABNET performs best.
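All three models above average the predictions of their cross-validation models (7, 6 and 13 folds respectively). The sketch below shows one way to do this with NumPy. It is not the notebook's code: `fit_predict` is a hypothetical stand-in that simply predicts each label's training frequency, where a real run would train the NN, 1D CNN or TABNET model on the fold's training split.

```python
import numpy as np


def kfold_indices(n_samples: int, n_folds: int, seed: int = 0):
    """Shuffle the sample indices and split them into n_folds validation sets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def fit_predict(x_train, y_train, x_test):
    """Hypothetical stand-in model: predicts each label's frequency in the training split."""
    rates = y_train.mean(axis=0)
    return np.tile(rates, (len(x_test), 1))


def cv_average_predictions(x, y, x_test, n_folds):
    """Train one model per fold and average their test-set predictions."""
    test_pred = np.zeros((len(x_test), y.shape[1]))
    for val_idx in kfold_indices(len(x), n_folds):
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False  # hold this fold out of training
        test_pred += fit_predict(x[train_mask], y[train_mask], x_test)
    return test_pred / n_folds


rng = np.random.default_rng(1)
x = rng.normal(size=(260, 20))                        # toy training features
y = (rng.random((260, 206)) < 0.05).astype(float)     # toy multi-label targets
x_test = rng.normal(size=(40, 20))

pred = cv_average_predictions(x, y, x_test, n_folds=13)
print(pred.shape)
```

Averaging over folds tends to smooth out the variance of the individual fold models, which is the usual motivation for this kind of ensemble.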
Performance Comparison:
| Model | PRIVATE SCORE | PUBLIC SCORE |
|---|---|---|
| Baseline Model | 0.02154 | 0.02398 |
| LR model | 0.0335 | 0.04085 |
| SVM model | 0.09797 | 0.10802 |
| XGBOOST model | 0.01739 | 0.01961 |
| NN model | 0.01768 | 0.01991 |
| 1d CNN model | 0.01727 | 0.01987 |
| TABNET | 0.01627 | 0.0185 |
- Conclusion:
TABNET achieves the lowest private (0.01627) and public (0.0185) scores of all the models tried so far.
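The scores in the table are loss values, so lower is better. Assuming the metric is the binary log loss averaged over all 206 targets (which is what the custom `logloss` metric name in the NN code suggests, although its definition is not shown in this post), it could be computed as in the following sketch. The function name and the clipping constant are assumptions.

```python
import numpy as np


def mean_columnwise_log_loss(y_true, y_pred, eps=1e-15):
    """Binary log loss per target column, averaged over samples and then over columns."""
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return losses.mean(axis=0).mean()


y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
uninformed = np.full_like(y_true, 0.5)   # predicts 0.5 everywhere
confident = np.where(y_true == 1.0, 0.99, 0.01)

print(mean_columnwise_log_loss(y_true, uninformed))
print(mean_columnwise_log_loss(y_true, confident))
```

A model that always predicts 0.5 scores ln 2 (about 0.693), so the values around 0.016 to 0.02 in the table reflect both confident predictions and the fact that most targets are zero for most samples.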
| Blog part |
|---|
| 1. MoA problem definition link |
| 2. EDA on LISH MoA dataset |
| 3. Feature Engineering and Baseline model for MoA |
| 4. ML techniques on MoA dataset |