Performance of variables selection in logistic regression: A comparison of LASSO, PLS, Information criterion and Significance base procedures

Authors

  • Azonvidé Hubert DOSSA, Biostatistics and Modeling Unit/National University of Technical Sciences, Engineering and Mathematics of Abomey
  • Dossou Seblodo Judes Charlemagne Gbemavo, Ecole Nationale Supérieure des Biosciences et Biotechnologies Appliquées, Université Nationale des Sciences, Technologies, Ingénierie et Mathématiques, Abomey, Benin
  • Judicael Laly

Keywords:

Data Analysis, Data Science, RMSE, Bias

Abstract

Selecting relevant variables from a large number of covariates to make accurate predictions is becoming common practice among statisticians and practitioners in general. Variable selection in logistic regression has been expanding because the model is regularly used in various studies. In this study, we evaluated the performance of variable selection methods frequently used in generalized linear models: Lasso, PLS, information criteria and significance-based procedures. A simulation study allowed us to explore the performance of the methods in terms of variable selection and prediction. We considered low-, medium- and high-dimensional configurations, different levels of multicollinearity between covariates, and different sample sizes. A real application using medium-dimensional agronomic data is included in this study. The Stepwise and AIC methods tended to select more variables than Lasso and PLS. With a large number of covariates, the PLS method selected variables better. In summary, the PLS method is effective when the sample size is equal to or close to the number of covariates and when the correlation is high. The Lasso and AIC methods, like the Stepwise method, are suitable when the correlation is medium or low.
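
The kind of simulation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the equicorrelated covariate structure, the coefficient values, and the function names `simulate_logistic` and `selection_metrics` are all assumptions made for illustration. The sketch draws correlated covariates with a binary outcome from a logistic model, then scores a candidate set of selected variables against the known true support.

```python
import math
import random


def simulate_logistic(n, p, rho, beta, seed=0):
    """Draw n rows of p equicorrelated N(0, 1) covariates and a binary outcome.

    Assumption: every pair of covariates has the same correlation rho,
    obtained by mixing one shared factor with independent noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing correlation rho
        row = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ]
        eta = sum(b * x for b, x in zip(beta, row))  # linear predictor
        prob = 1.0 / (1.0 + math.exp(-eta))          # logistic link
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


def selection_metrics(selected, true_support, p):
    """Return (sensitivity, specificity) of a selected variable set.

    Assumes at least one true variable and at least one noise variable.
    """
    selected, true_support = set(selected), set(true_support)
    noise = set(range(p)) - true_support
    sensitivity = len(selected & true_support) / len(true_support)
    specificity = len(noise - selected) / len(noise)
    return sensitivity, specificity


# Hypothetical scenario: 3 informative covariates out of 10, high correlation.
beta = [1.0, -1.0, 0.5] + [0.0] * 7
X, y = simulate_logistic(n=200, p=10, rho=0.8, beta=beta, seed=1)

# Score a hypothetical selection {0, 1, 5} against the true support {0, 1, 2}.
sens, spec = selection_metrics(selected=[0, 1, 5], true_support=[0, 1, 2], p=10)
```

In a full study, the selected set would come from each method under comparison (Lasso, PLS, AIC, Stepwise) fitted to `X` and `y`, and the metrics would be averaged over many replications for each combination of sample size, dimension, and correlation level.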

Published

2024-04-03

How to Cite

DOSSA, A. H., Gbemavo, D. S. J. C., & Laly, J. (2024). Performance of variables selection in logistic regression: A comparison of LASSO, PLS, Information criterion and Significance base procedures. Data Science and Artificial Intelligence. Retrieved from https://conferences.kabarak.ac.ke/index.php/dsai/article/view/166