Logistic v. Linear: Choosing the right model for your study
Behind most important scientific theories, major discoveries, and breakthroughs, there are data and statistical analyses. Researchers are frequently responsible for deciding which of many statistical analyses to use on their data. If you pick the wrong analysis, or set your model up incorrectly, you have a big problem: you are in some way misrepresenting your data. In “Logistic or Linear?”, Gomila (2019) discusses the misuse of non-linear models (specifically, logistic models) in psychology and a proposed alternative, linear models.
The analysis that researchers use depends on the form of the variable being measured (i.e., the dependent variable). If the outcome is measured on a scale (e.g., confidence in verdict), a linear model is used. However, if there are only two possible outcomes (i.e., a dichotomous or binary variable; e.g., verdict: guilty or not guilty), logistic models are typically used. Logistic models are preferred for three reasons: their usefulness for prediction, the potential for biased predictor estimates in linear models, and the heteroscedasticity (unequal variance of error terms) of binary outcomes, which violates the assumptions of linear models. Each of these advantages reflects something that linear models lack.
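The heteroscedasticity point can be made concrete with a short derivation. This is a standard textbook result rather than something taken from Gomila's paper, and the symbols Y, X, and p(X) are notation assumed here for illustration.

```latex
% Assumed notation: Y is a binary (0/1) outcome and p(X) = P(Y = 1 | X).
% Because Y only takes the values 0 and 1, Y^2 = Y.
\begin{align*}
\mathbb{E}[Y \mid X] &= p(X), \\
\operatorname{Var}(Y \mid X)
  &= \mathbb{E}[Y^2 \mid X] - \bigl(\mathbb{E}[Y \mid X]\bigr)^2 \\
  &= p(X) - p(X)^2 \\
  &= p(X)\,\bigl(1 - p(X)\bigr).
\end{align*}
```

Because the variance depends on p(X), it changes with the predictors whenever the predictors change the probability of the outcome. It is largest (0.25) when p(X) = 0.5 and shrinks toward 0 as p(X) approaches 0 or 1, which is why a binary outcome cannot satisfy the constant-variance assumption of a linear model.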
Although logistic models have many advantages, they are complicated, and their results are often misinterpreted. In mathematical terms, a linear model is based on a relatively simple formula (Y = β0 + β1X + ε). Linear models provide clear probabilities and effect sizes: with a binary outcome, each coefficient can be read directly as a change in the probability of the outcome. Logistic models, on the other hand, are more complicated. Logistic regression coefficients are expressed in terms of odds (on the scale of log[p / (1 − p)], where p is the probability of the outcome), which are more difficult to interpret and require additional analyses to transform into probabilities. It is particularly difficult to interpret interaction effects in logistic models. Fixed effects are also problematic for logistic models, because the analysis removes observations that do not vary in the outcome variable; linear regression better handles the nested data that are often used in psychological research.
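To illustrate why logistic coefficients need an extra step before they can be read as probabilities, here is a minimal Python sketch. The coefficient and intercept values are hypothetical and chosen only for illustration, and the helper names `inv_logit` and `probability_change` are not from the paper.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Convert a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_change(intercept: float, coef: float) -> float:
    """Change in predicted probability when a 0/1 predictor moves from 0 to 1."""
    return inv_logit(intercept + coef) - inv_logit(intercept)


# Hypothetical logistic coefficient for a binary predictor (a log odds ratio).
coef = 1.0
odds_ratio = math.exp(coef)
print(f"odds ratio = {odds_ratio:.2f}")

# The same coefficient implies different probability changes,
# depending on the baseline probability set by the intercept.
for intercept in (0.0, -3.0):
    baseline = inv_logit(intercept)
    change = probability_change(intercept, coef)
    print(f"baseline p = {baseline:.3f}, change in p = {change:+.3f}")
```

With these assumed values, the same coefficient corresponds to a change of about 0.23 in probability when the baseline is 0.5, but only about 0.07 when the baseline is near 0.05. A linear model's coefficient, by contrast, is a single change in probability that does not depend on the baseline.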
Psychologists are often interested in binary outcomes but frequently misinterpret logistic analyses. Thus, Gomila (2019) assessed whether linear models can serve as a substitute for logistic models even with binary data. Gomila analyzed both real and simulated binary data using linear and logistic models and found that the p-values and average causal effects were similar in both. However, the results from the linear models are much easier to interpret.
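The simplest version of this comparison can be sketched in a few lines of Python. This is not Gomila's simulation: it is a toy example that assumes a single binary treatment, made-up true probabilities, and only the standard library. In this special case the logistic model's maximum-likelihood fit reproduces the two group proportions, so its coefficients have a closed form and no fitting routine is needed.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical experiment: binary treatment, binary outcome.
P_CONTROL, P_TREATED = 0.30, 0.45  # assumed true probabilities
n = 2000
treatment = [i % 2 for i in range(n)]  # alternate control (0) and treated (1)
outcome = [int(random.random() < (P_TREATED if t else P_CONTROL)) for t in treatment]


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def group_rate(x, y, level):
    """Proportion of outcomes equal to 1 among observations with x == level."""
    values = [yi for xi, yi in zip(x, y) if xi == level]
    return sum(values) / len(values)


def inv_logit(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


p0 = group_rate(treatment, outcome, 0)
p1 = group_rate(treatment, outcome, 1)

# Linear model: the slope is the estimated change in probability.
linear_effect = ols_slope(treatment, outcome)

# Logistic model with one binary predictor: closed-form maximum-likelihood fit.
logit_intercept = math.log(p0 / (1 - p0))
logit_coef = math.log(p1 / (1 - p1)) - logit_intercept  # log odds ratio

# Extra step needed to express the logistic result as a change in probability.
logistic_effect = inv_logit(logit_intercept + logit_coef) - inv_logit(logit_intercept)

print(f"linear effect (probability scale):   {linear_effect:.4f}")
print(f"logistic coefficient (log odds):     {logit_coef:.4f}")
print(f"logistic effect (probability scale): {logistic_effect:.4f}")
```

Here the linear slope and the back-transformed logistic effect are identical up to rounding error, while the raw logistic coefficient is on a different scale and cannot be read as a probability. With continuous predictors or covariates the two estimates generally differ somewhat, which is the situation the real and simulated data in the paper address.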
Gomila argues that, even with binary data, psychology researchers should opt for linear models. This argument concerns some statisticians, particularly because binary data violate many of the statistical assumptions of linear models. Under this view, it is not appropriate to switch analyses; instead, researchers should be better trained to interpret logistic results. One solution might be for researchers to analyze their data using both methods. If the results are similar, they can report the results from the more easily interpreted linear model (and provide the logistic model results in supplementary materials). If the results differ, they can use the statistically preferred logistic model, while making sure that they interpret the results properly.
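The "run both and compare" workflow could be written down as a small decision rule. The sketch below is only one way to do so: the function name `choose_model_to_report` and the `tolerance` threshold are hypothetical, and nothing in the article defines how similar the results must be. Both effects are assumed to be on the probability scale already, which means the logistic estimate has been converted from log odds first.

```python
def choose_model_to_report(
    linear_effect: float,
    logistic_effect: float,
    tolerance: float = 0.01,
) -> str:
    """Return which model to report in the main text.

    Both effects must be expressed as changes in probability.
    `tolerance` is a hypothetical cutoff for "similar"; researchers would
    need to justify their own value.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    if abs(linear_effect - logistic_effect) <= tolerance:
        # Similar results: report the linear model, and put the
        # logistic results in supplementary materials.
        return "linear"
    # Divergent results: fall back on the logistic model.
    return "logistic"


print(choose_model_to_report(0.150, 0.148))  # estimates differ by 0.002
print(choose_model_to_report(0.150, 0.090))  # estimates differ by 0.060
```

A fuller comparison would also check whether the two models lead to the same conclusion about statistical significance, not just whether the effect estimates are close.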