EZ Study


Tutorial: Mahalanobis distance for outlier detection -1
How do you calculate Mahalanobis distance in SAS?

Mahalanobis distance: To get the Mahalanobis distance from each observation to the mean, first run PROC PRINCOMP with the STD option. STD scales the principal component scores in the OUT= data set to unit variance, and because principal components are uncorrelated, these scores have an identity covariance matrix. Hence, for these scores the Mahalanobis distance and the Euclidean distance are the same. Then use a DATA step with a statement such as Mah_dist=sqrt(uss(of PRIN1-PRINn)); to compute the required distance, where n is the number of variables (all components must be kept).
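Why the STD scores give the Mahalanobis distance can be seen from the spectral decomposition of the sample covariance matrix. The derivation below is written for the covariance matrix S with eigenvalues lambda_k, unit eigenvectors v_k, and p variables (notation introduced here, not taken from PROC PRINCOMP output). PROC PRINCOMP analyzes the correlation matrix by default, but Mahalanobis distance does not change when the variables are rescaled, so the conclusion is the same.

```latex
\begin{aligned}
S &= V \Lambda V^{\top} = \sum_{k=1}^{p} \lambda_k\, v_k v_k^{\top},
  \qquad S^{-1} = \sum_{k=1}^{p} \frac{1}{\lambda_k}\, v_k v_k^{\top},\\
d^2(x,\bar{x}) &= (x-\bar{x})^{\top} S^{-1} (x-\bar{x})
  = \sum_{k=1}^{p} \frac{\bigl(v_k^{\top}(x-\bar{x})\bigr)^2}{\lambda_k}
  = \sum_{k=1}^{p} \mathrm{Prin}_k^2,
  \qquad \mathrm{Prin}_k = \frac{v_k^{\top}(x-\bar{x})}{\sqrt{\lambda_k}}.
\end{aligned}
```

Each Prin_k is the k-th component score scaled to unit variance, so the square root of their uncorrected sum of squares is d(x, mean) only when all p components are kept.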
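    data points;
       drop i;
       do i=1 to 10;
          id=left(put(i, 9.));   /* i is numeric, so use a numeric format */
          x=rannor(34343);
          y=rannor(12345);
          z=rannor(54321);
          output;
       end;
    run;

    title2 'Find Mahalanobis distance from each point to the mean';
    /* STD gives the component scores unit variance: their covariance is the identity */
    proc princomp data=points std out=out outstat=outstat noprint;
       var x y z;
    run;

    data mahalanobis_to_mean;
       set out;
       /* Euclidean length of the standardized score vector = Mahalanobis distance */
       mahalanobis_distance_to_mean = sqrt(uss(of prin:));
    run;

    proc print data=mahalanobis_to_mean uniform noobs;
       id id;
    run;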
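The steps above can be checked numerically outside SAS. The following minimal Python sketch (standard library only; all function names are hypothetical) whitens 10 random 3-variable points and compares the Euclidean length of each whitened score vector with the Mahalanobis distance computed directly from the inverse covariance matrix. It uses a Cholesky factor of the covariance matrix instead of principal components. That is a different whitening transform, but it also produces scores with an identity covariance matrix, which is the property the SAS program relies on.

```python
import math
import random


def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z, where L is lower triangular."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p + 1):
                m[r][k] -= f * m[c][k]
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (m[i][p] - sum(m[i][k] * w[k] for k in range(i + 1, p))) / m[i][i]
    return w


# Hypothetical data: 10 points in 3 variables, like the POINTS data set.
rng = random.Random(34343)
points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]
mu = mean_vector(points)
S = covariance(points)
L = cholesky(S)

# Whitened scores z = L^{-1}(x - mean) play the role of the STD component scores.
scores = [forward_solve(L, [xi - mi for xi, mi in zip(x, mu)]) for x in points]
# Analogue of sqrt(uss(of prin:)).
dist_to_mean = [math.sqrt(sum(s * s for s in z)) for z in scores]

# Direct definition for comparison: sqrt((x - mean)' S^{-1} (x - mean)).
direct = []
for x in points:
    d = [xi - mi for xi, mi in zip(x, mu)]
    direct.append(math.sqrt(sum(di * wi for di, wi in zip(d, solve(S, d)))))

for i, (a, b) in enumerate(zip(dist_to_mean, direct), start=1):
    print(f"{i:2d}  whitened={a:.6f}  direct={b:.6f}")
```

A useful sanity check: with the n - 1 divisor, the squared distances to the mean always sum to (n - 1) * p, here 9 * 3 = 27.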
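To get the Mahalanobis distance from each observation to a specific point, compute the principal component scores for that point with the original scoring coefficients (PROC SCORE with the OUTSTAT= data set from PRINCOMP). Then compute the Euclidean distance from each observation's scores to the reference point's scores. One easy way to do this is PROC FASTCLUS with MAXC=1, REPLACE=NONE, and MAXITER=0: the reference point, placed first in the input data, becomes the only cluster seed, and the DISTANCE variable in the OUT= data set holds each observation's distance to that seed.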
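    title2 'Find Mahalanobis distance from each point to a reference';
    data reference_point;
       x=-1; y=-2; z=-2.5;
       id='reference';
    run;

    /* Score the reference point with the PRINCOMP coefficients in OUTSTAT */
    proc score data=reference_point score=outstat out=reference_point;
       var x y z;
    run;

    /* Append the scored observations so that the reference point comes first */
    proc append data=out base=reference_point;
    run;

    /* One cluster seeded by the first observation; MAXITER=0 keeps the seed fixed */
    proc fastclus data=reference_point maxc=1 replace=none
                  maxiter=0 noprint out=mahalanobis_to_point(drop=cluster);
       var prin:;
    run;

    proc print data=mahalanobis_to_point uniform noobs;
       id id;
    run;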
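The reference-point procedure can be sketched the same way, again in standard-library Python with hypothetical helper names and Cholesky whitening standing in for PROC PRINCOMP. The coefficients (mean vector and Cholesky factor) are estimated from the data points only, the reference point is scored with those fixed coefficients, and plain Euclidean distances in score space then give Mahalanobis distances to the reference.

```python
import math
import random


def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z, where L is lower triangular."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


# Hypothetical data standing in for POINTS; the reference point matches the SAS example.
rng = random.Random(34343)
points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]
reference = [-1.0, -2.0, -2.5]

# Coefficients are estimated from POINTS only (the role of OUTSTAT=).
mu = mean_vector(points)
L = cholesky(covariance(points))


def score(x):
    """Score any point with the fixed coefficients (the role of PROC SCORE)."""
    return forward_solve(L, [xi - mi for xi, mi in zip(x, mu)])


ref_score = score(reference)
# Euclidean distance in score space (the role of PROC FASTCLUS with a fixed seed).
dist_to_ref = [math.dist(score(x), ref_score) for x in points]

for i, d in enumerate(dist_to_ref, start=1):
    print(f"{i:2d}  {d:.6f}")
```

Because the mean cancels when two scored points are subtracted, each distance depends only on the covariance estimate from the original data, which is why the reference point is scored with the original coefficients.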
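To get Mahalanobis distances between all possible pairs of points, run PROC DISTANCE on the OUT= data set created by PROC PRINCOMP in the steps above. Because those scores have an identity covariance matrix, the Euclidean distances that PROC DISTANCE computes for interval variables by default are Mahalanobis distances.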
    title2 'Find Mahalanobis distances between all pairs of points';
    proc distance data=out out=distance_matrix;
       var interval(prin:);
       id id;
    run;

    proc print data=distance_matrix uniform noobs;
    run;
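A minimal Python sketch of the pairwise step, under the same assumptions as before (standard library only, hypothetical helper names, Cholesky whitening in place of principal components): once the scores have an identity covariance matrix, an ordinary Euclidean distance matrix on the scores is the Mahalanobis distance matrix.

```python
import math
import random


def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z, where L is lower triangular."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


# Hypothetical data standing in for the OUT= scores' source data.
rng = random.Random(34343)
points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]
mu = mean_vector(points)
L = cholesky(covariance(points))
scores = [forward_solve(L, [xi - mi for xi, mi in zip(x, mu)]) for x in points]

# Full matrix of Euclidean distances between score vectors.
n = len(scores)
dist = [[math.dist(scores[i], scores[j]) for j in range(n)] for i in range(n)]

# Print the lower triangle, like a triangular distance matrix.
for i in range(n):
    print(" ".join(f"{dist[i][j]:7.3f}" for j in range(i + 1)))
```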
      

Mahalanobis distances on a heat map

The following program draws a heat map of the Mahalanobis distances between all pairs of students in SASHELP.CLASS, based on AGE, WEIGHT, and HEIGHT.
    /* 1 -- Find pairwise Mahalanobis distances */
    proc princomp data=sashelp.class std out=_1 noprint;
       var age weight height;
    run;

    proc distance data=_1 out=_2;
       var interval(prin:);
       id name;
    run;

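    /* 2 -- Restructure data: one row per (x, y, distance) cell */
    data _3(where=(missing(distance)=0));   /* drop the empty upper triangle */
       set _2;
       array a[*] _numeric_;                 /* the distance columns, one per ID */
       do i = 1 to dim(a);
          x = name;
          y = vlabel(a[i]);                  /* column label (or name) = the other ID */
          distance = a[i];
          output;
       end;
       keep x y distance;
    run;

    /* Add the mirror image so that both triangles of the map are filled */
    data _4;
       set _3 _3(rename=(x=y y=x));
    run;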

    /* 3 -- Draw Mahalanobis distances on a heat map */
    proc template;
       define statgraph heatmapparm;
          begingraph;
             layout overlay /
                xaxisopts=(label=" ") yaxisopts=(label=" " reverse=true);
                heatmapparm x=x y=y colorresponse=distance / name="heatmap";
                continuouslegend "heatmap" / orient=vertical location=outside;
             endlayout;
          endgraph;
       end;
    run;

    ods html style=money;
    proc sgrender data=_4 template=heatmapparm;
    run;
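The two restructuring DATA steps (_3 and _4) turn the triangular distance matrix into the long x/y/distance form that HEATMAPPARM plots, then add a mirrored copy. The Python sketch below mimics that logic on a hypothetical 3 x 3 matrix with made-up values and placeholder IDs (A, B, C); the function names are illustrative, and it only shows the reshaping, not PROC DISTANCE or the graph itself.

```python
# Hypothetical 3 x 3 lower-triangular distance matrix with made-up values;
# None marks the missing upper triangle.
names = ["A", "B", "C"]
matrix = [
    [0.0, None, None],
    [1.5, 0.0, None],
    [2.5, 1.0, 0.0],
]


def to_long(names, matrix):
    """Like DATA step _3: one (x, y, distance) row per non-missing cell."""
    rows = []
    for row_name, row in zip(names, matrix):
        for col_name, value in zip(names, row):
            if value is not None:  # where=(missing(distance)=0)
                rows.append((row_name, col_name, value))
    return rows


def mirror(rows):
    """Like DATA step _4: append a copy with x and y swapped."""
    return rows + [(y, x, d) for x, y, d in rows]


long_rows = to_long(names, matrix)
heat_rows = mirror(long_rows)
for row in heat_rows:
    print(row)
```

As in the SAS version, each diagonal cell appears twice, both times with the same zero distance.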
Acknowledgement: The heat map example comes from SAS blogger Charlie.
