Variable selection plays a critical role in enhancing the predictive accuracy, interpretability, and computational efficiency of kernel ridge regression (KRR) models, especially when applied to high-dimensional datasets such as those used in quantitative structure-activity relationship (QSAR) modeling. This study investigates improved binary sparrow search algorithm (BSSA) variants incorporating different transfer functions for variable selection in KRR. The performance of these variants was extensively evaluated on seven benchmark biopharmaceutical datasets with thousands of molecular descriptors, comparing their prediction accuracy, variable subset compactness, and computational cost against baseline KRR without variable selection. Results demonstrate that all BSSA variants significantly outperform KRR in terms of mean squared error (MSE) and the coefficient of determination. The quadratic-BSSA (Q-BSSA) variant consistently achieved the best predictive performance, reducing MSE by up to 30% and raising the coefficient of determination above 0.95 on several datasets while selecting the fewest variables, reflecting effective and parsimonious variable selection. Furthermore, the BSSA variants substantially decreased the computational time required for model training compared to KRR, with Q-BSSA exhibiting the lowest runtime across datasets. Statistical validation using the Wilcoxon signed-rank test confirmed that all BSSA variants provided statistically significant improvements over KRR. The findings highlight the efficacy of sophisticated binary metaheuristic algorithms for variable selection in kernel-based models, underscoring their potential in computational chemistry and related domains where high dimensionality and nonlinear interactions complicate predictive modeling.
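The pipeline described above (a continuous search position mapped through a transfer function to a binary descriptor mask, scored by KRR test error) can be sketched as follows. This is a minimal illustration, not the paper's method: the quadratic transfer function form, all parameter values, and the stand-in stochastic mask search (used here in place of the full sparrow search update rules) are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=0.1):
    # Gaussian (RBF) kernel matrix from pairwise squared Euclidean distances.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def krr_mse(X_tr, y_tr, X_te, y_te, mask, lam=1e-2, gamma=0.1):
    # Closed-form KRR on the selected columns: alpha = (K + lam*I)^-1 y.
    # An empty mask is penalized with infinite error.
    if not mask.any():
        return np.inf
    Xs, Xt = X_tr[:, mask], X_te[:, mask]
    K = rbf_kernel(Xs, Xs, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_tr)
    return float(np.mean((y_te - rbf_kernel(Xt, Xs, gamma) @ alpha) ** 2))

def quadratic_transfer(v):
    # Illustrative quadratic transfer function (assumed form): squashes a
    # continuous position component into a selection probability in [0, 1].
    return np.clip(v**2, 0.0, 1.0)

# Toy data standing in for a QSAR descriptor matrix:
# 40 train / 20 test samples, 30 descriptors, only the first 5 informative.
X = rng.normal(size=(60, 30))
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=60)
X_tr, X_te, y_tr, y_te = X[:40], X[40:], y[:40], y[40:]

# Stand-in stochastic search over binary masks (NOT the actual BSSA update
# rules): sample continuous positions, binarize via the transfer function,
# and keep the mask with the lowest KRR test MSE.
full = np.ones(30, bool)
best_mask, best_mse = full, krr_mse(X_tr, y_tr, X_te, y_te, full)
for _ in range(200):
    pos = rng.normal(scale=0.7, size=30)             # continuous position
    mask = rng.random(30) < quadratic_transfer(pos)  # binary descriptor mask
    mse = krr_mse(X_tr, y_tr, X_te, y_te, mask)
    if mse < best_mse:
        best_mask, best_mse = mask, mse

print("selected descriptors:", int(best_mask.sum()), "MSE:", round(best_mse, 4))
```

In a full BSSA implementation the random positions would instead follow the producer/scrounger update equations of the sparrow search algorithm; the transfer-function binarization and the KRR fitness evaluation would remain as shown.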