Performance of variables selection in logistic regression: A comparison of LASSO, PLS, Information criterion and Significance base procedures
Keywords:
Data Analysis, Data Science, RMSE, Bias
Abstract
Selecting relevant variables from a large number of covariates in order to make accurate predictions is becoming common practice among statisticians and practitioners in general. Variable selection is increasingly applied in logistic regression because of the model's regular use in a wide range of studies. In this study, we evaluated the performance of Lasso, PLS, information-criterion and significance-based variable selection methods, which are frequently used with generalized linear models. A simulation study allowed us to explore the performance of the methods in terms of both variable selection and prediction. We considered low-, medium- and high-dimensional configurations, different degrees of multicollinearity between covariates, and different sample sizes. A real application using medium-dimensional agronomic data is also included. The Stepwise and AIC methods tended to select more variables than Lasso and PLS. With a large number of covariates, the PLS method selected variables better than the other methods. In summary, the PLS method would be effective when the sample size is equal or close to the number of covariates and when the correlation between covariates is high. The Lasso, AIC and Stepwise methods are suitable when the correlation is medium or low.
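To make the kind of simulation described in the abstract concrete, the following is a minimal sketch of one replicate: covariates with a chosen pairwise correlation, a binary response drawn from a logistic model, and an L1-penalized (Lasso) logistic fit whose non-zero coefficients are taken as the selected variables. It is not the authors' code. The function names, the equicorrelated design, the coefficient vector, the penalty value and the proximal-gradient solver are all assumptions chosen for illustration, and only the Lasso arm of the comparison is shown.

```python
import math
import random


def simulate(n, p, rho, beta, seed=0):
    """Draw n rows of p equicorrelated N(0, 1) covariates and a logistic response.

    Assumed design: every pair of covariates has correlation rho.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor that induces the correlation
        row = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
               for _ in range(p)]
        eta = sum(b * x for b, x in zip(beta, row))
        prob = 1.0 / (1.0 + math.exp(-eta))
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(X, y, lam, step=0.1, iters=300):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Minimizes mean logistic loss + lam * sum(|beta_j|); the intercept is unpenalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            eta = b0 + sum(b * x for b, x in zip(beta, row))
            resid = 1.0 / (1.0 + math.exp(-eta)) - yi  # fitted probability minus response
            g0 += resid
            for j in range(p):
                g[j] += resid * row[j]
        b0 -= step * g0 / n
        beta = [soft_threshold(b - step * gj / n, step * lam)
                for b, gj in zip(beta, g)]
    return b0, beta


# Hypothetical configuration: 8 covariates, of which the first 3 are relevant.
true_beta = [1.5, -1.5, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
X, y = simulate(n=200, p=8, rho=0.5, beta=true_beta, seed=1)
_, beta_hat = lasso_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("selected covariates:", selected)
```

Repeating such a replicate over several values of `rho`, `n` and `p`, and comparing `selected` with the indices of the non-zero entries of `true_beta`, gives selection counts of the kind the study compares across methods.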
License
Copyright (c) 2023 Azonvidé Hubert DOSSA, Dossou Seblodo Judes Charlemagne Gbemavo, Judicael Laly
This work is licensed under a Creative Commons Attribution 4.0 International License.