Logistic Regression Analysis Study Notes 4
Confounding and Collinearity for Logistic Regression in SAS


Some statisticians may have encountered strange logistic output: if we run a logistic regression against a single predictor:
```
proc logistic data=A;
  model Y = X1;
run;
```
We get a significant positive coefficient. However, if we run a few more predictors together, using either stepwise or backward subset selection, we somehow get a significant negative coefficient:
```
proc logistic data=A;
  model Y = X1 X2 X3 X4 X5 X6 X7 X8 / selection=stepwise;
run;
```
Is that weird?! How do we explain this strange behavior?

First of all, it is very easy to get significance when you have a large sample size, so don't be surprised when you find two significant results, one positive and one negative. What's more, you can actually constrain a coefficient in logistic regression to always be positive or negative.
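The sign flip itself can be reproduced with a small simulation. The sketch below is hypothetical (all data made up), and it uses ordinary least squares rather than logistic regression purely to keep the code short; the confounding mechanism that flips the sign is the same in both settings.

```python
import numpy as np

# Hypothetical illustration of a confounded sign flip: X1 alone looks
# positively related to Y, but after adjusting for the confounder Z,
# the coefficient on X1 turns negative.
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                 # confounder
x1 = z + 0.5 * rng.normal(size=n)      # X1 is correlated with Z
y = -1.0 * x1 + 2.0 * z + rng.normal(size=n)

# Simple regression: Y on X1 only (intercept + X1)
X_simple = np.column_stack([np.ones(n), x1])
b_simple = np.linalg.lstsq(X_simple, y, rcond=None)[0]

# Adjusted regression: Y on X1 and Z
X_adj = np.column_stack([np.ones(n), x1, z])
b_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0]

print(b_simple[1])   # positive (roughly +0.6): X1 proxies for Z
print(b_adj[1])      # negative (roughly -1.0): X1's own effect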

This behavior often relates to confounding. Here are the steps for systematically checking for confounding in the model:

1. Look at various descriptive statistics to get a feel for the data. For logistic regression, this usually includes looking at descriptive statistics within "outcome = yes = 1" versus "outcome = no = 0" groups.

2. The above "by outcome group" descriptive statistics are often sufficient for discrete covariates, but you may want to prepare some graphics for continuous variables.

3. For all continuous variables being considered, calculate a correlation matrix of each variable against each other variable. This allows one to begin to investigate possible confounding and collinearity.

4. Similarly, for each categorical/continuous independent variable pair, look at the values of the continuous variable in each category of the other variable.

5. Finally, create tables for all categorical/categorical independent variable pairs.

6. Perform a simple logistic regression for each independent variable.
This begins to investigate confounding (we will see in more detail next class), as well as providing an initial "unadjusted" view of the importance of each variable, by itself.

7. Think about any "interaction terms" that you may want to try in the model.

8. Perform some sort of model selection technique, or, often much better, avoid any strict model selection (stepwise, backward, etc.) by finding a set of models that each seem to have something to contribute to the overall conclusions.

9. Based on all work done, draw some inferences and conclusions. Carefully interpret each estimated parameter, perform "model criticism", possibly repeating some of the above steps (for example, run further models), as needed.

10. In regular regression, e.g. Proc Reg, we can generate Variance Inflation Factors (VIFs) to check for multicollinearity. How can we do this in logistic regression? In fact, you can check for collinearity in logistic regression the same way you would for linear regression, i.e. just run a linear regression with the same predictors and dependent variable you are using for the logistic model.
```
/* Fit the logistic model and save the predicted probabilities */
proc logistic data=model_data descending;
  class var1;
  model resp_var = var1 var2 var3
    / outroc=ROC1;
  weight split;
  output out=preddat pred=pred;
run;

/* Throwaway linear regression with the same predictors, run only to
   obtain the VIFs; no selection option, so the VIFs are computed on
   the full predictor set */
proc reg data=preddat(where=(split=1)) plots=ResidualByPredicted;
  model pred = var1 var2 var3
    / vif;
  output out=preddat1 p=predicted student=stdres;
run; quit;
```
You are just running the linear regression to get the collinearity statistics (the VIFs depend only on the predictors, not on the response) and then interpreting them the same way.
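For readers outside SAS, the VIF computation is simple enough to write by hand. The sketch below, with made-up data and variable names, computes VIF_j = 1 / (1 - R²_j) by regressing each predictor on all the others, which is the quantity the VIF option reports.

```python
import numpy as np

# Hypothetical data: var2 is nearly collinear with var1, var3 is independent
rng = np.random.default_rng(1)
n = 5_000
var1 = rng.normal(size=n)
var2 = var1 + 0.1 * rng.normal(size=n)
var3 = rng.normal(size=n)
X = np.column_stack([var1, var2, var3])

def vif(X):
    """VIF for each column of X: regress it on the remaining columns
    (plus an intercept) and return 1 / (1 - R^2)."""
    n, p = X.shape
    out = []
    for j in range(p):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
        ss_res = np.sum((xj - fitted) ** 2)
        ss_tot = np.sum((xj - xj.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

print(vif(X))  # var1 and var2 show large VIFs; var3 stays near 1
```

A common rule of thumb is to start worrying about collinearity when a VIF exceeds roughly 10, though the threshold is a judgment call.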

Acknowledgement: This tutorial is based on the lecture notes of the well-known epidemiology professor Joseph.
