EZ Study

Study Notes 6a: Effective Variable Selection
in Logistic Regression Based on Random Sampling

In this tutorial, we present a useful technique for variable selection in logistic regression based on random sampling. The idea is simple: keep drawing samples and run a logistic regression on each one. Taken together, the results can tell you a more interesting story about variable importance.

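%let slent = 0.2;  /* entry threshold; 0.2 and 0.1 are the values used in the macro call further down */
%let slst  = 0.1;  /* stay threshold */

proc logistic data=datain descending namelen=100;
    model dep_var = var1-var10 / selection=stepwise
        slentry=&slent   /* significance level for a variable to enter the model */
        slstay=&slst;    /* significance level for a variable to stay in the model */
    weight split;        /* weight each observation by the variable split */
    output out=dataout pred=pred;  /* save the predicted probabilities */
run;
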
In logistic regression, you can simply run stepwise selection on all the data you have to get the variables you want. In general, the result should tell you almost everything. With a large dataset, however, running logistic regression over the whole dataset may take quite some time; instead, you can run it on a sample. The question then is: how can we make sure the sample is representative? You actually don't need to worry about that, since you can draw samples repeatedly and run the regression on each one. A variable that keeps being selected across many samples is unlikely to owe its selection to one unrepresentative draw.

%let varlist = var1-var10;

%macro var_filt(input, dep_var, nboots, bootsize, slent, slst, out);
    %local i;

    /* run the stepwise logistic regression &nboots times */
    %do i = 1 %to &nboots;

        /* draw a simple random sample of &bootsize observations for this run */
        proc surveyselect data=&input method=srs sampsize=&bootsize
                          out=boot&i noprint;
        run;

        /* outest= keeps the estimates of the final stepwise model */
        proc logistic data=boot&i desc noprint outest=log_var_filt_&i;
            model &dep_var = &varlist / selection=stepwise
                slentry=&slent   /* threshold for entering the model */
                slstay=&slst;    /* threshold for staying in the model */
        run;

        /* append this run's estimates to the combined output table */
        proc datasets nolist;
            append base=&out data=log_var_filt_&i force;
        run;
        quit;

    %end;

%mend var_filt;

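options mprint mlogic spool;

/* drop results left over from an earlier call, because the macro appends to this table */
proc sql; drop table dataout1; quit;

/* 20 runs of 30,000 observations each; entry threshold 0.2, stay threshold 0.1 */
%var_filt(Data1, tag, 20, 30000, 0.2, 0.1, dataout1);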

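ods html file='var_selection.html';
title 'Variable Selection Logistic';

/* summarize the estimates across runs; the N row counts the runs with a non-missing estimate for each variable */
proc means data=dataout1 noprint;
    output out=subset;
run;

proc print data=subset; run;

ods html close;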
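
The summary can be turned into a ranking of how often each variable was selected. The following is a minimal sketch, not part of the original notes. It assumes that dataout1 holds one row of OUTEST= estimates per run, that a variable dropped by stepwise selection has a missing estimate in that row, and that the candidate variables are var1-var10 with 20 runs, as in the call above. The names nruns, sel_count, sel_freq, n_selected and sel_rate are hypothetical and chosen only for illustration.

```sas
%let nruns = 20;  /* assumed: number of runs, matching the %var_filt call */

/* Count, for each candidate variable, the runs with a non-missing estimate,
   i.e. the runs in which stepwise selection kept the variable. */
proc means data=dataout1 noprint;
    var var1-var10;
    output out=sel_count(drop=_type_ _freq_) n=;
run;

/* Turn the single row of counts into one row per variable. */
proc transpose data=sel_count
               out=sel_freq(rename=(col1=n_selected))
               name=variable;
    var var1-var10;
run;

/* Express each count as a share of all runs. */
data sel_freq;
    set sel_freq;
    sel_rate = n_selected / &nruns;
run;

/* List the most frequently selected variables first. */
proc sort data=sel_freq;
    by descending n_selected;
run;

proc print data=sel_freq noobs;
run;
```

Variables near the top of this list were retained in most of the samples, which is one way to read the "variable importance" story mentioned at the start. Where to draw the cutoff is a judgment call that this sketch does not settle.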