Logistic Regression Analysis Study Notes 4
Confounding and Collinearity for Logistic Regression in SAS
Some statisticians may have experienced some strange logistic output: if we run a logistic regression against a single predictor:
Proc logistic Data=A;
Model Y=X1;
Run;
we get a significant positive coefficient for X1.
However, if we run a few more predictors together, using either stepwise or backward subset selection, somehow we get a significant negative coefficient:
Proc logistic Data=A;
Model Y=X1 X2 X3 X4 X5 X6 X7 X8 / selection=stepwise;
Run;
Is that weird?! How do we explain this strange behavior?
First of all, it is very easy to get significance when you have a large sample size, so don't be surprised when you find two significant results, one positive and one negative! Even more, you can actually force some coefficients in a logistic regression to always come out positive or negative.
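A sign flip like this is the classic signature of a confounder (Simpson's paradox). The sketch below is a minimal pure-Python illustration, not part of the original notes: the cell counts are hypothetical and chosen so that X2 is associated with both X1 and Y. Fitting Y on X1 alone gives a positive coefficient, while adjusting for X2 flips it negative.

```python
import math

# Hypothetical cell counts (x1, x2, y, count): within each level of X2,
# X1 lowers P(Y=1); but X1=1 occurs mostly where X2=1, and X2=1 raises P(Y=1).
cells = [
    (0, 0, 1, 12), (0, 0, 0, 28),   # X2=0, X1=0: P(Y=1)=0.30
    (1, 0, 1, 1),  (1, 0, 0, 9),    # X2=0, X1=1: P(Y=1)=0.10
    (0, 1, 1, 9),  (0, 1, 0, 1),    # X2=1, X1=0: P(Y=1)=0.90
    (1, 1, 1, 28), (1, 1, 0, 12),   # X2=1, X1=1: P(Y=1)=0.70
]

def fit_logistic(cells, use_x2, lr=0.5, iters=20000):
    """Maximum-likelihood logistic regression via plain gradient ascent."""
    k = 3 if use_x2 else 2                      # intercept, b1[, b2]
    b = [0.0] * k
    n = sum(w for *_, w in cells)
    for _ in range(iters):
        g = [0.0] * k
        for x1, x2, y, w in cells:
            x = [1.0, x1] + ([x2] if use_x2 else [])
            p = 1.0 / (1.0 + math.exp(-sum(bi * xi for bi, xi in zip(b, x))))
            for j in range(k):
                g[j] += w * (y - p) * x[j]      # weighted score contribution
        b = [bi + lr * gj / n for bi, gj in zip(b, g)]
    return b

b_simple = fit_logistic(cells, use_x2=False)
b_adjusted = fit_logistic(cells, use_x2=True)
print(b_simple[1])    # X1 alone: positive (about +0.65)
print(b_adjusted[1])  # X1 adjusted for X2: negative (about -1.35)
```

In SAS terms this is exactly the contrast between `Model Y=X1;` and `Model Y=X1 X2;` on such data: both coefficients are "real", they just answer different questions (marginal versus adjusted for the confounder).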
This may relate to confounding. Here are the steps for systematically checking for confounding in the model:
1. Look at various descriptive statistics to get a feel for the data. For logistic regression, this usually includes looking at descriptive statistics within the "outcome = yes = 1" versus "outcome = no = 0" groups.
2. The above "by outcome group" descriptive statistics are often sufficient for discrete covariates, but you may want to prepare some graphics (e.g., boxplots by outcome group) for continuous covariates.
3. For all continuous variables being considered, calculate a correlation matrix of each variable against each other variable. This allows one to begin to investigate possible confounding and collinearity.
4. Similarly, for each categorical/continuous independent variable pair, look at the values of the continuous variable in each category of the other variable.
5. Finally, create tables for all categorical/categorical independent variable pairs.
6. Perform a simple logistic regression for each independent variable; this provides an initial "unadjusted" view of the importance of each variable, and is a first step in investigating confounding (we will see this in more detail next class).
7. Think about any "interaction terms" that you may want to try in the model.
8. Perform some sort of model selection technique or, often much better, avoid any strict model selection (stepwise, backward, etc.) by finding a set of models that each seem to have something to contribute to the overall conclusions.
9. Based on all work done, draw some inferences and conclusions. Carefully interpret
each estimated parameter, perform "model criticism", possibly repeating
some of the above steps (for example, run further models), as needed.
10. In ordinary regression, e.g., Proc Reg, we can generate Variance Inflation Factors (VIFs) to check for multicollinearity. How can we do this in logistic regression? In fact, you can check for collinearity in logistic regression the same way as you would for linear regression, i.e., just run a linear regression with the same predictors and dependent variable you are using for the logistic model:
proc logistic data=model_data descending;
  model resp_var = var1 var2 var3;
  output out=preddat pred=pred;
run;

proc reg data=preddat(where=(split=1)) plots=ResidualByPredicted;
  model pred = var1 var2 var3 / vif tol;
  output out=preddat1 p=predicted student=stdres;
run;
You are just running it to get the collinearity statistics (VIF and tolerance), and you interpret them the same way as in linear regression.
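The VIF that PROC REG reports for predictor j is 1/(1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors. To make that concrete, here is a small pure-Python sketch (simulated data with illustrative variable names, not SAS output) that computes the same quantity from scratch:

```python
import random

# Simulated predictors: x2 is nearly a copy of x1 (collinear), x3 is independent.
random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [xi + random.gauss(0, 0.5) for xi in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    k = len(m)
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[piv] = m[piv], m[i]
        for r in range(k):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [u - f * v for u, v in zip(m[r], m[i])]
    return [m[i][k] / m[i][i] for i in range(k)]

def vif(target, others):
    """Regress `target` on `others` (with intercept); return 1/(1 - R^2)."""
    X = [[1.0] + [col[i] for col in others] for i in range(len(target))]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, target)) for i in range(p)]
    beta = solve(xtx, xty)
    yhat = [sum(b * xi for b, xi in zip(beta, r)) for r in X]
    ybar = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, yhat))
    ss_tot = sum((y - ybar) ** 2 for y in target)
    return 1.0 / (1.0 - (1 - ss_res / ss_tot))

print(vif(x1, [x2, x3]))  # large: x1 is nearly determined by x2
print(vif(x3, [x1, x2]))  # near 1: x3 is independent of the others
```

A common rule of thumb is to investigate any predictor whose VIF exceeds about 10 (equivalently, tolerance below 0.1), though the threshold is a judgment call.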
Acknowledgement: This tutorial is based on lecture notes from the epidemiology professor Joseph.