Open Access Journal
Multicollinearity is a violation of the assumptions of multiple linear regression that arises when the independent variables are highly correlated. It also affects variants of the multiple linear regression model, such as the Geographically Weighted Regression (GWR) model. Multicollinearity makes parameter estimates obtained with the Least Squares Method (LSM) unstable and gives them a large variance. What is desired instead is an estimator with small variance, even at the cost of some bias. One way to overcome multicollinearity is therefore to use biased estimators such as Ridge Regression (RR), the Least Absolute Shrinkage and Selection Operator (LASSO), and the Elastic Net (EN). In RR, the LSM coefficients are shrunk toward zero but never exactly to zero, so RR cannot select independent variables. The parameter estimates obtained from RR are biased, but the variance of the resulting regression coefficients is relatively small. In addition, an RR model becomes increasingly difficult to interpret when a very large number of independent variables is used. LASSO, by contrast, is a computational method that uses quadratic programming; it follows the same shrinkage principle as RR and can shrink LSM coefficients exactly to zero, thereby performing variable selection. The LASSO method became widely known after the Least-Angle Regression (LARS) algorithm was introduced. LASSO also has weaknesses, which motivates the use of EN. In this article, the performance of the three methods is compared from a mathematical perspective. The findings are as follows: RR is helpful for grouping effects, where collinear features are retained together; LASSO is suitable for feature selection when the dataset contains features with poor predictive power; and EN combines LASSO and RR, which has the potential to yield simple and predictive models.
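The contrast between the three estimators can be illustrated with a minimal sketch, which is not the article's own procedure. It assumes synthetic data with two nearly identical predictors and one irrelevant predictor, and it fits all three penalties with a plain coordinate-descent loop rather than the quadratic programming or LARS algorithms mentioned above. The function name `elastic_net_cd` and the parameter names `alpha` and `l1_ratio` are illustrative choices; the objective assumed here is (1/(2n))·||y − Xβ||² + alpha·l1_ratio·||β||₁ + 0.5·alpha·(1 − l1_ratio)·||β||², so `l1_ratio=0` gives RR and `l1_ratio=1` gives LASSO.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; return zero when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_cd(X, y, alpha, l1_ratio, n_iter=500):
    """Coordinate descent for the penalized least-squares objective above.

    Illustrative helper (an assumption of this sketch, not a library API).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j added back
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n
            beta[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
    return beta


# Synthetic data (assumed for illustration): x1 and x2 are almost collinear,
# x3 carries no signal.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
x1 = z + 0.01 * rng.normal(size=n)
x2 = z + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the predictors
y = 3.0 * z + rng.normal(size=n)
y = y - y.mean()  # center the response so no intercept is needed

alpha = 0.5
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalized LSM fit
beta_ridge = elastic_net_cd(X, y, alpha, l1_ratio=0.0)
beta_lasso = elastic_net_cd(X, y, alpha, l1_ratio=1.0)
beta_enet = elastic_net_cd(X, y, alpha, l1_ratio=0.5)

for name, b in [("LSM", beta_ols), ("RR", beta_ridge),
                ("LASSO", beta_lasso), ("EN", beta_enet)]:
    print(f"{name:6s}", np.round(b, 3))
```

Under these assumptions, the RR fit keeps every coefficient non-zero and assigns nearly equal weights to the two collinear predictors, while the LASSO fit sets the coefficient of the irrelevant predictor exactly to zero. The LSM coefficients on the collinear pair may differ widely from run to run, which is the instability described above.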