Confounding and Collinearity for Logistic Regression in SAS

Many statisticians have run into strange logistic regression output. Suppose we run a logistic regression against a single predictor:

proc logistic data=A;
  model Y = X1;
run;

and get a significantly positive coefficient for X1. However, when we run a few more predictors together, using either stepwise or backward subset selection:

proc logistic data=A;
  model Y = X1 X2 X3 X4 X5 X6 X7 X8 / selection=stepwise;
run;

somehow we now get a significantly negative coefficient for X1. Is that weird?! How do we explain this strange behavior?

First of all, it is very easy to obtain statistical significance with a large sample size, so don't be surprised to find two significant results, one positive and one negative. What's more, by choosing which other variables to include, you can actually drive a coefficient in a logistic regression to be positive or negative.
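To see how including a correlated covariate can flip a sign, here is a minimal simulated sketch (all variable names and numbers are made up for illustration): x2 is positively associated with both y and x1, while the true adjusted effect of x1 on y is negative, so the unadjusted model tends to show a positive x1 coefficient and the adjusted model a negative one.

```sas
data sim;
  call streaminit(20240101);
  do i = 1 to 5000;
    x2 = rand("normal");
    /* x1 is strongly positively correlated with the confounder x2 */
    x1 = 0.9*x2 + 0.44*rand("normal");
    /* true model: x1 lowers the odds of y=1, x2 raises them */
    p = 1 / (1 + exp(-(-1*x1 + 2*x2)));
    y = rand("bernoulli", p);
    output;
  end;
  drop i p;
run;

/* Unadjusted: the x1 coefficient typically comes out positive */
proc logistic data=sim descending;
  model y = x1;
run;

/* Adjusted for the confounder x2: the x1 coefficient is negative */
proc logistic data=sim descending;
  model y = x1 x2;
run;
```

Because x2 carries a strong positive effect and is highly correlated with x1, the marginal association of x1 with y has the opposite sign of its conditional effect.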

This behavior often relates to confounding. Here is a way to systematically check for confounding in the model:

1. Look at various descriptive statistics to get a feel for the data.

2. The above "univariate" summaries can be supplemented with graphical displays (histograms, box plots) for each variable.

3. For all continuous variables being considered, calculate a correlation matrix.
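In SAS, pairwise correlations among the continuous candidate predictors can be computed with Proc Corr (the dataset and variable names here are hypothetical):

```sas
/* Correlation matrix for the continuous candidate predictors */
proc corr data=A;
  var x1 x2 x3;
run;
```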

4. Similarly, for each categorical/continuous independent variable pair, look at the distribution of the continuous variable within each category of the other variable.
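A sketch of this categorical/continuous check, using a Class statement in Proc Means (dataset and variable names are hypothetical):

```sas
/* Summarize the continuous variable x1 within each level of group_var */
proc means data=A n mean std min max;
  class group_var;
  var x1;
run;
```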

5. Finally, create tables for all categorical/categorical independent variable pairs.
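The categorical/categorical pairs can be checked with cross-tabulations in Proc Freq (names hypothetical):

```sas
/* Cross-tabulation for a pair of categorical predictors */
proc freq data=A;
  tables cat1*cat2;
run;
```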

6. Run a simple ("unadjusted") model of the outcome on each independent variable, one variable at a time.

This begins to investigate confounding (we will see this in more detail next class), as well as providing an initial "unadjusted" view of the importance of each variable by itself.
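The unadjusted look can be sketched as one single-predictor logistic model per variable, repeated for each candidate (names hypothetical):

```sas
/* One unadjusted logistic model per candidate predictor */
proc logistic data=A descending;
  model Y = X1;
run;

proc logistic data=A descending;
  model Y = X2;
run;
```

Comparing each unadjusted coefficient to its adjusted counterpart in the full model is a quick flag for confounding.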

7. Think about any "interaction terms" that you may want to try in the model.
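For example, an interaction between two predictors can be specified directly on the model statement (names hypothetical):

```sas
/* Main effects plus an X1-by-X2 interaction term */
proc logistic data=A descending;
  model Y = X1 X2 X1*X2;
run;
```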

8. Perform some sort of model selection technique or, often much better, avoid any strict model selection (stepwise, backward, etc.) by finding a set of models that each seem to have something to contribute to the overall conclusions.

9. Based on all of the work done, draw inferences and conclusions. Carefully interpret each estimated parameter and perform "model criticism", possibly repeating some of the above steps (for example, running further models) as needed.

10. In ordinary linear regression, e.g. Proc Reg, we can generate variance inflation factors (VIFs) to check for multicollinearity. How can we do this in logistic regression? In fact, you can check for collinearity in a logistic regression the same way as you would for a linear regression, i.e.

proc logistic data=model_data descending;
  class var1;
  model resp_var = var1 var2 var3 / outroc=ROC1;
  weight split;
  output out=preddat pred=pred;
run;

proc reg data=preddat(where=(split=1)) plots=ResidualByPredicted noprint;
  model pred = var1 var2 var3 / selection=forward vif;
  output out=preddat1 p=predicted student=stdres;
run;
quit;

You are just running Proc Reg to get the collinearity statistics (the VIFs) and then interpreting them the same way as in linear regression.

Acknowledgement: this tutorial is based on the lecture notes of the well-known epidemiology professor Joseph.