EZ Study

Logistic Regression Analysis Study Notes 2-II
KS Interpretation/Calculation in SAS Logistic regression

The previous tutorial showed how to use the ROC curve as one useful metric for comparing models. The ROC curve, however, does not tell us where the best cut-off point is for separating 0 from 1 in the response variable. To choose that cut-off point, we use the K-S statistic.

### The Kolmogorov-Smirnov (K-S) Statistic

The Kolmogorov-Smirnov statistic, when used to measure the discriminatory power of a scorecard, looks at how the distribution of the score differs between goods and bads. It measures the maximum separation between the cumulative distribution functions (CDFs) of the two distributions. See also the paper "Why AUC is a better measure?".

The graph above shows the cumulative distributions of the observed goods and bads. The K-S statistic is the maximum separation between these CDFs.
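
In symbols, the definition can be sketched as follows. The notation $F_G$, $F_B$, $N_G$ and $N_B$ is introduced here for illustration and does not appear in the original tutorial: $F_G$ and $F_B$ are the empirical CDFs of the score among goods and bads, and $N_G$ and $N_B$ are the numbers of goods and bads.

$$
\mathrm{KS} = \max_{s}\,\bigl| F_B(s) - F_G(s) \bigr|,
\qquad
F_G(s) = \frac{\#\{\text{goods with score} \le s\}}{N_G},
\quad
F_B(s) = \frac{\#\{\text{bads with score} \le s\}}{N_B}
$$

The score $s^{*}$ at which this maximum is attained is the natural candidate for the cut-off point.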

Here is some sample code for the KS calculation:
```
/* ... (from previous tutorial) */
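/* Fit the model and write the predicted probabilities (p_hat) to KSdata0 */
proc logistic data=Data1 descending
    namelen=100 plots=roc
    /* namelen can also be applied in other regression procedures, e.g. proc glm */
    outest=Cov_betas covout;
  model dep_var (event='1') = &vars.
    / ctable pprob=(0 to 1 by .10);

  /* split is used as the weight; the ks macro below works on rows where split=0 */
  weight split;
  output out=KSdata0 predicted=p_hat xbeta=xbeta;
run;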

%macro ks(indata,score,resp,split,outdat,print=Y);
%global ks;
/* Count goods (&resp=0) and bads (&resp=1) among the rows where &split=0 */
proc sql;
  create table ksdata1 as
  select distinct &resp., count(*) as count
  from &indata. where &split.=0
  group by &resp.;
quit;

/* Store the two counts in macro variables */
proc sql;
  select count into :NUMGOOD from ksdata1 where &resp.=0;
  select count into :NUMBAD  from ksdata1 where &resp.=1;
quit;

/* Sort the same rows from the highest to the lowest score */
proc sql;
  create table ksdata2 as
  select *
  from &indata. where &split.=0
  order by &score. desc;
quit;

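data ksdata3;
  set ksdata2 end=eof;
  /* NOTE: the running counts and the percbad, ks and perc formulas below are a
     reconstruction; the listing used numgood, ks and perc without defining them */
  retain numgood 0 numbad 0 ks 0;
  num=_n_;
  if &resp.=0 then numgood+1;
  else if &resp.=1 then numbad+1;
  percgood=100*numgood/(&NUMGOOD);
  percbad=100*numbad/(&NUMBAD);
  ks=max(ks, abs(percbad-percgood));     /* running maximum separation, in percent */
  perc=100*num/(&NUMGOOD + &NUMBAD);     /* cumulative percent of rows seen so far */
  if eof then call symputx("KS", put(ks, 8.2));
  per=ceil(perc/.1)*.1;
run;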
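proc sort data=ksdata3;
  by per;
run;

/* Keep only the last row within each value of per */
data &outdat;
  set ksdata3;
  by per;
  if last.per;
run;
title8 "ks=&ks";
proc sort data=&outdat out=ksdata4;
  by &score;
run;
/* Print the summary table unless print= is set to something other than Y */
%if "&PRINT"="Y" %then %do;
  proc print data=&outdat;
  run;
%end;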
%mend ks;
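
/* Usage sketch (an assumed example, not part of the original listing):
   run the macro on the KSdata0 output of the PROC LOGISTIC step above.
   ks_out is a hypothetical name for the output data set. */
%ks(KSdata0, p_hat, dep_var, split, ks_out);
%put KS = &ks;

/* Optional cross-check, assuming SAS/STAT is available: with the EDF option,
   PROC NPAR1WAY reports the two-sample Kolmogorov-Smirnov statistic D on a
   0-1 scale. When the scores have few ties, D*100 should be close to the
   value the macro stores in &ks. */
proc npar1way data=KSdata0 edf;
  where split=0;
  class dep_var;
  var p_hat;
run;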