EZ Study


Study Notes 5: Proc StdRate for Descriptive Analytics
A great procedure to check confounding in regression

In this tutorial, we present a classical example of descriptive analytics that helps identify potential confounding issues in regression. Consider the example we mentioned previously: the overall mortality rate in Florida is higher than in Alaska.

Before concluding that Florida is a riskier place to live, one needs to consider confounding factors such as age. Florida has a higher proportion of people of retirement age and older than Alaska does, and older people are more likely to die in any given interval of time. Therefore, one must "adjust" for age before drawing any conclusions.
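
One common way to make that adjustment is direct standardization, which is what the method=direct option used later in these notes requests. As a sketch, in notation that is ours rather than the original notes': let D_{sj} and T_{sj} be the deaths and person-years in stratum j (a sex-by-age cell) of state s, and let T_{rj} be the person-years in the same stratum of the reference population.

```latex
\hat{\lambda}_{sj} = \frac{D_{sj}}{T_{sj}},
\qquad
\hat{\lambda}^{\mathrm{std}}_{s}
  = \frac{\sum_{j} T_{rj}\,\hat{\lambda}_{sj}}{\sum_{j} T_{rj}}
```

Each state's stratum-specific rates are weighted by the same reference population, so a difference between the standardized rates is not driven by the states' different sex and age mixes.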

Title "Proc StdRate: Descriptive Analytics about Confounding";
data Alaska;
   State='Alaska';
   /* The colon modifier reads the comma-formatted value up to the next blank */
   input Sex $ Age $ Death PYear :comma9. @@;
   datalines;
Male    00-14   37   81,205    Male    15-34   68   93,662
Male    35-54  206  108,615    Male    55-74  369   35,139
Male    75+    556    5,491    Female  00-14   78   77,203
Female  15-34  181   85,412    Female  35-54  395  100,386
Female  55-74  555   32,118    Female  75+    479    7,701
;run;

data Florida;
   State='Florida';
   /* The colon modifier reads each comma-formatted value up to the next blank */
   input Sex $ Age $ Death :comma8. PYear :comma11. @@;
   datalines;
Male    00-14   1,189  1,505,889  Male    15-34   2,962  1,972,157
Male    35-54  10,279  2,197,912  Male    55-74  26,354  1,383,533
Male    75+    42,443    554,632  Female  00-14    906  1,445,831
Female  15-34   1,234  1,870,430  Female  35-54   5,630  2,246,737
Female  55-74  18,309  1,612,270  Female  75+    53,489    868,838
;run;

data US;
   input Sex $ Age $ PYear :comma12. @@;
   datalines;
Male    00-14  30,854,207  Male    15-34  40,199,647
Male    35-54  40,945,028  Male    55-74  19,948,630
Male    75+     6,106,351  Female  00-14  29,399,168
Female  15-34  38,876,268  Female  35-54  41,881,451
Female  55-74  22,717,040  Female  75+    10,494,416
;run;

data TwoStates;
   length State $ 7;
   set Alaska Florida;
run;

* Unadjusted model: State is the only predictor ;
proc logistic data=TwoStates;
   class state age sex;
   model Death/PYear = state / outroc=rocout lackfit rsq;
run; quit;

* You will get a negative estimate for the state Alaska, ;
* which is essentially a z-test comparing the two crude group rates. ;
* This implies that Florida has the higher unadjusted mortality rate. ;
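
The unadjusted comparison can also be checked directly, without any model, by computing each state's crude death rate. The PROC SQL step below is a minimal sketch of that check and is not part of the original notes; the column names Deaths, PYears and CrudeRate are ours.

```sas
/* Crude (unadjusted) death rate per 1,000 person-years, by state */
proc sql;
   select State,
          sum(Death) as Deaths format=comma12.,
          sum(PYear) as PYears format=comma12.,
          1000 * sum(Death) / sum(PYear) as CrudeRate format=8.3
   from TwoStates
   group by State;
quit;
```

Assuming the data steps read every value as listed, this gives roughly 4.7 deaths per 1,000 person-years for Alaska and 10.4 for Florida, which agrees with the sign of the estimate from the unadjusted PROC LOGISTIC run.
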

proc logistic data=TwoStates desc;
   class state age sex;
   model Death/PYear = sex state age sex*state state*age
                       sex*age state*sex*age / outroc=rocout lackfit rsq;
run; quit;

* You will get a positive estimate for the state Alaska. ;
* This implies that, once sex and age are in the model, Florida actually has the lower mortality rate. ;
* Which result is correct? ;

ods graphics on;
proc stdrate data=TwoStates
             refdata=US
             method=direct
             stat=rate(mult=1000)
             effect
             plots(only)=(dist effect);
   population group=State event=Death total=PYear;
   reference  total=PYear;
   strata Sex Age / effect;
run;
ods graphics off;

From the PROC STDRATE output, we can conclude that, once both states are standardized to the US sex and age distribution, Florida actually has a lower mortality rate than Alaska.
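
As a cross-check on the procedure, the directly standardized rate can be sketched by hand: compute each state's stratum-specific rate, weight it by the US person-years in the same sex-by-age stratum, and divide by the total US person-years. The PROC SQL step below is an illustrative sketch under that definition, not the original author's code; the aliases t and u and the column name StdRate are ours.

```sas
/* Directly standardized death rate per 1,000 person-years, by state */
/* t = state-level strata, u = US reference strata (weights)         */
proc sql;
   select t.State,
          1000 * sum(u.PYear * t.Death / t.PYear) / sum(u.PYear)
             as StdRate format=8.3
   from TwoStates as t
        inner join US as u
        on t.Sex = u.Sex and t.Age = u.Age
   group by t.State;
quit;
```

Assuming the data steps read every value as listed, this works out to roughly 8.1 deaths per 1,000 person-years for Alaska and 7.7 for Florida, so the ordering of the crude rates (about 4.7 versus 10.4) is reversed once both states share the same sex and age mix. These figures should agree with the standardized rates that PROC STDRATE reports.
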
