In collaboration with Payame Noor University and the Iranian Society of Instrumentation and Control Engineers

Document Type : Research Article

Author

Department of Computer Engineering and Information Technology, Payame Noor University, Tehran, Iran.

DOI: 10.30473/coam.2025.72710.1269

Abstract

Gene expression signatures reflect the response of cells and tissues to diseases, genetic disorders, and drug treatments, and they contain hidden patterns that can provide valuable insights for biological research and cancer diagnostics. This study proposes a hybrid deep learning approach that combines convolutional neural networks (CNNs) and support vector machines (SVMs) to classify cancer types from unstructured gene expression data. We applied three hybrid CNN-SVM models to a dataset of 10,340 samples spanning 33 cancer types from The Cancer Genome Atlas. The CNN component extracted latent features from the gene expression data, while the SVM replaced the softmax layer to make classification more robust. Of the three models, the Hybrid-CNN-SVM model achieved the highest prediction accuracy. This study highlights the potential of hybrid deep learning frameworks for cancer type prediction and their applicability to high-dimensional genomic datasets.

Highlights

  • Developed a Hybrid-CNN-SVM model for classifying high-dimensional gene expression data.
  • CNNs extract high-level features from 33 cancer types and 23 normal tissue datasets.
  • SVMs classify these features into cancerous or normal tissues with high accuracy.
  • The approach reduces CNN limitations, such as noise sensitivity and slow training.
  • Results demonstrate improved accuracy and robustness in cancer tissue analysis.
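
The division of labour described above (convolutional layers produce features, and an SVM rather than a softmax layer makes the final decision) can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the data are synthetic, the three kernels in `MOTIFS` are hand-set stand-ins for trained convolutional filters, and every name and hyperparameter (`make_sample`, `conv_features`, `train_linear_svm`, the learning rate and regularisation strength) is an assumption made for illustration only.

```python
import random

# Hypothetical hand-set kernels standing in for trained convolutional filters.
MOTIFS = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]

def make_sample(label, rng, length=20, noise=0.1):
    """Synthetic 'expression profile': Gaussian noise plus one class motif at a random offset."""
    x = [rng.gauss(0.0, noise) for _ in range(length)]
    start = rng.randrange(length - len(MOTIFS[label]) + 1)
    for j, m in enumerate(MOTIFS[label]):
        x[start + j] += m
    return x

def conv_features(x, kernels):
    """1D valid convolution, ReLU, then global max pooling: one feature per kernel."""
    feats = []
    for w in kernels:
        k = len(w)
        acts = [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]
        feats.append(max(0.0, max(acts)))
    return feats

def train_linear_svm(X, y, n_classes, epochs=200, lr=0.01, lam=0.01):
    """One-vs-rest linear SVM: subgradient descent on the L2-regularised hinge loss."""
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            for c in range(n_classes):
                t = 1.0 if yi == c else -1.0
                margin = t * (sum(w * v for w, v in zip(W[c], xi)) + b[c])
                violated = margin < 1.0  # hinge loss is active for this sample
                for j in range(d):
                    grad = lam * W[c][j] - (t * xi[j] if violated else 0.0)
                    W[c][j] -= lr * grad
                if violated:
                    b[c] += lr * t
    return W, b

def predict(W, b, x):
    """Pick the class whose one-vs-rest decision score is largest."""
    scores = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(W))]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
labels = [i % 3 for i in range(150)]
feats = [conv_features(make_sample(c, rng), MOTIFS) for c in labels]
X_train, y_train = feats[:90], labels[:90]
X_test, y_test = feats[90:], labels[90:]

W, b = train_linear_svm(X_train, y_train, n_classes=3)
accuracy = sum(predict(W, b, x) == y for x, y in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In a real pipeline the kernels would be learned by training the CNN, and the pooled activations of its last hidden layer would be passed to the SVM in place of the softmax output. The sketch only shows where the hand-off between the two components happens.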
