Deep Learning customization models to enhance the efficiency of Cancer Diagnosis
by PHAM Tri Cong
The dissertation considers two cancer diagnosis problems based on two types of open data sources: gene expression and medical images. The first problem, breast cancer subtype prognosis based on gene expression, is addressed by combining data collected from multiple sources with machine learning algorithms. The second problem, skin cancer diagnosis, is addressed by combining data collected from the largest existing data sources with deep learning methods that address class imbalance.
Our objectives are to optimize algorithms that improve the efficiency of breast cancer subtype prognosis using gene expression, and to propose customized deep learning models that improve the efficiency of skin cancer diagnosis on an imbalanced dataset. The dissertation makes four main contributions:
- Proposing an optimized algorithm that improves accuracy and suggests related bio-markers for the breast cancer subtype problem based on gene expression. We apply the recursive feature elimination method to filter out unimportant genes. A support vector machine classifier with grid search hyper-parameter optimization is then trained on the genes that remain after the elimination. On the same dataset as the state-of-the-art model, we achieve an accuracy of 89.40%, an improvement of 5.44%. In addition, our model suggests 16 bio-markers associated with cancer that have supporting evidence in the literature and 11 new genes with potential for future research (see [PTC4]).
- Proposing a best model selection method for imbalanced datasets in skin cancer diagnosis. To address the imbalance between sensitivity and specificity in binary melanoma classification, we propose an optimization for deep CNN combined with a change in the best model selection. The proposed best model selection method increases YI on both the Test-10 and MClass-D datasets and outperforms traditional methods (see [PTC3]).
- Proposing an optimization method for training deep convolutional neural networks on an imbalanced dataset. The deep convolutional neural networks are designed to detect melanoma as a binary classification problem. The method involves three key features: customized batch logic, a customized loss function, and reformed fully connected layers. With the default prediction threshold of 0.5, the method achieved state-of-the-art performance on the MClass-D dataset of 100 dermoscopic images, with an AUC of 94.4%, a sensitivity of 85.0%, and a specificity of 95.0%. Moreover, at a threshold of 0.40858, it gave the most balanced measure compared to other studies, with a sensitivity of 90.0% and a specificity of 93.8%, which makes it promising for medical diagnosis (see [PTC2]).
- Proposing a hybrid method that combines an algorithm-level method (a newly designed loss function) with a data-level method (balanced mini-batch logic integrated with real-time image augmentation). The method is effective for network optimization on the imbalanced dataset because it helps the networks learn the minority classes faster. Compared to the original methods, our proposed method not only improves the mean recall by 4.65% (86.13% vs 81.48%) but also reduces the standard deviation of the recalls by 4.24% (from ±11.84% to ±7.60%) (see [PTC1]).
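The trade-off between sensitivity and specificity reported above can be illustrated with a small sketch. It assumes that YI denotes Youden's index, J = sensitivity + specificity - 1, which the text does not spell out. The function names and the toy labels and scores are hypothetical and are not taken from the dissertation or its datasets.

```python
def youden_index(sensitivity, specificity):
    """Youden's index J = sensitivity + specificity - 1 (assumed meaning of YI)."""
    return sensitivity + specificity - 1.0


def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold.

    Assumes labels are 0/1 and that both classes are present.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(labels, scores):
    """Return the observed score that maximizes Youden's index (illustrative only)."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: youden_index(*sens_spec(labels, scores, t)))


# Operating points quoted in the abstract for the MClass-D dataset.
yi_default = youden_index(0.85, 0.95)   # threshold 0.5
yi_tuned = youden_index(0.90, 0.938)    # threshold 0.40858

# Hypothetical toy data, only to show the threshold search running.
toy_labels = [0, 0, 0, 1, 1]
toy_scores = [0.1, 0.3, 0.45, 0.42, 0.9]
toy_best = best_threshold(toy_labels, toy_scores)
```

Under this assumption, the default threshold of 0.5 gives a YI of about 0.800 and the threshold of 0.40858 gives about 0.838, which is consistent with the second operating point being described as the more balanced one.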
References.
[PTC1] T. C. Pham, A. Doucet, C. M. Luong, C. T. Tran, and V. D. Hoang, “Improving Skin-Disease Classification Based on Customized Loss Function Combined With Balanced Mini-Batch Logic and Real-Time Image Augmentation,” IEEE Access, vol. 8, pp. 150725–150737, 2020.
[PTC2] T. C. Pham, C. M. Luong, V. D. Hoang and A. Doucet, “AI outperformed every dermatologist in dermoscopic melanoma diagnosis, using an optimized deep-CNN architecture with custom mini-batch logic and loss function,” Scientific Reports, 2021 (Accepted).
[PTC3] T. C. Pham, V. D. Hoang, C. T. Tran, M. S. K. Luu, D. A. Mai, A. Doucet, and C. M. Luong, “Improving binary skin cancer classification based on best model selection method combined with optimizing full connected layers of Deep CNN,” in 3rd International Conference on Multimedia Analysis and Pattern Recognition, 2020.
[PTC4] T. C. Pham, A. Doucet, T. T. Bui, M. S. K. Luu, D. A. Mai, C. M. Luong, and V. D. Hoang, “A new feature selection and classification approach for optimizing breast cancer subtyping based on gene expression,” in IIHMSP/FITAT-2020.