EZ Study



Logistic Regression Analysis Study Notes 2-II
KS Interpretation/Calculation in SAS Logistic regression

From the previous tutorial, we know how to use the ROC curve as one useful metric for comparing models. Note, however, that the ROC curve does not tell us which cut-off point best separates 0 from 1 in the response variable. To choose that cut-off point, we use the K-S statistic.

The Kolmogorov-Smirnov (K-S) Statistic

The Kolmogorov-Smirnov statistic, when used to measure the discriminatory power of a scorecard, looks at how the distribution of the score differs between goods and bads. It measures the maximum separation between the cumulative distribution functions (CDFs) of the two distributions. See also the paper "Why AUC is a better measure?"

The graph above shows the cumulative distributions of the observed goods and bads. The K-S statistic is the maximum separation between these CDFs.
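
Written as a formula, the definition above can be sketched as follows. The symbols F_bad and F_good (the empirical CDFs of the score among bads and goods) and s (a score value) are notation introduced here for illustration, not taken from the original notes.

```latex
\mathrm{KS} \;=\; \max_{s}\,\bigl|\,F_{\text{bad}}(s) - F_{\text{good}}(s)\,\bigr|
```

The score s at which this maximum is reached is the natural candidate for the cut-off point. The macro further down accumulates the percentages from the highest score downward and reports the gap in percentage points (0 to 100) rather than as a proportion; the size of the gap at a given score is the same either way.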

Here is some sample code for the KS calculation:
 ...(from previous tutorial)
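proc logistic data=Data1 descending
  namelen=100  plots=roc
  /* namelen can also be used in other regression procedures, e.g. proc glm */
  outest=Cov_betas covout;
  model dep_var (event='1') = &vars.
    / ctable pprob=(0 to 1 by .10);
  /* rows with split=0 get zero weight, so they are left out of the fit
     but still receive a predicted value; the KS macro below uses those rows */
  weight split;
  output out=KSdata0 predicted=p_hat xbeta=xbeta;
run;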
 
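%macro ks(indata,score,resp,split,outdat,print=Y);
  /* indata: scored data set; score: predicted probability of resp=1    */
  /* resp: 0/1 response (1 = bad); split: flag, only split=0 rows used  */
  /* outdat: output table; print=Y prints that table                    */
  %global ks;

  /* Count goods (resp=0) and bads (resp=1) among the split=0 rows */
  proc sql;
    create table ksdata1 as
    select &resp., count(*) as count
    from &indata where &split.=0
    group by &resp.;
  quit;

  proc sql;
    select count into :NUMGOOD from ksdata1 where &resp.=0;
    select count into :NUMBAD  from ksdata1 where &resp.=1;
  quit;

  /* Sort the split=0 rows from highest to lowest score */
  proc sql;
    create table ksdata2 as
    select *
    from &indata where &split.=0
    order by &score. desc;
  quit;

  /* Accumulate running counts and track the largest bad-minus-good gap */
  data ksdata3;
    set ksdata2;
    retain numbad ks 0;
    num=_n_;                           /* rows seen so far               */
    numbad+&resp;                      /* cumulative bads                */
    numgood=num-numbad;                /* cumulative goods               */
    perc=100*num/(&NUMGOOD+&NUMBAD);   /* percent of sample seen so far  */
    percgood=100*numgood/(&NUMGOOD);   /* cumulative percent of goods    */
    percbad=100*numbad/(&NUMBAD);      /* cumulative percent of bads     */
    probbad=100*numbad/num;            /* bad rate among rows so far     */
    ks=max(ks,percbad-percgood);       /* running maximum gap            */
    call symputx("KS",ks);             /* last row leaves the final KS   */
    per=ceil(perc/.1)*.1;              /* population bin, 0.1 wide       */
    keep &score num numbad numgood perc percbad percgood per probbad;
  run;

  /* Keep the last row of each population bin */
  proc sort data=ksdata3;
    by per;
  run;

  data &outdat;
    set ksdata3;
    by per;
    if last.per;
  run;

  title8 "ks=&ks";
  /* Copy of the output table sorted by ascending score */
  proc sort data=&outdat out=ksdata4;
    by &score;
  run;

  %if "&PRINT"="Y" %then %do;
    proc print data=&outdat;
    run;
  %end;
%mend ks;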
		
%ks(KSdata0,p_hat,dep_var,split,outdat);
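
As a rough cross-check of the macro, and to read off the cut-off it implies, something like the following sketch may help. It assumes the score and response are named p_hat and dep_var, as in the PROC LOGISTIC step above, and that the macro wrote its table to outdat; adjust the names if your call differs. The first step uses the EDF option of PROC NPAR1WAY, which reports the two-sample Kolmogorov-Smirnov statistic D. The second step is only an illustration of how the cut-off could be looked up and is not part of the original macro.

```sas
/* Assumed names: p_hat (score) and dep_var (0/1 response) from PROC LOGISTIC */

/* Two-sample K-S test on the same split=0 rows the macro uses */
proc npar1way data=KSdata0 edf;
  where split = 0;
  class dep_var;
  var p_hat;
run;

/* Row of the macro's output table with the widest bad-minus-good gap;
   its score is a candidate cut-off point */
proc sql;
  select p_hat as cutoff,
         percbad - percgood as separation
  from outdat
  having percbad - percgood = max(percbad - percgood);
quit;
```

D is reported as a proportion (0 to 1), while the macro's KS is in percentage points (0 to 100). The two may differ slightly when many rows share the same score, because the macro updates its running maximum row by row within a group of tied scores. The output table keeps only one row per 0.1 percent of the population, so the cut-off found this way is approximate.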
