Modeling Visualization Macros - SmartData Collective

I was creating some scoring models and decided to look for some macros.


1) Here is a nice SAS macro from Wensui’s blog at http://statcompute.spaces.live.com/blog/

It's particularly useful for modelling chaps. I saw a version of this macro some time back that also plotted the curves, but this one is quite nice too.

2) I then found some ROC macros for SAS, and one document for SPSS.

3) I found an R package for ROC curves.

I then compared all three.


SAS MACRO TO CALCULATE GAINS CHART WITH KS

%macro ks(data = , score = , y = );

options nocenter mprint nodate;

/* keep rows with a score and a 0/1 target; random breaks ties in the sort */
data _tmp1;
set &data;
where &score ~= . and &y in (1, 0);
random = ranuni(1);
keep &score &y random;
run;

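/* highest scores first */
proc sort data = _tmp1 sortsize = max;
by descending &score random;
run;

/* i = row position after sorting */
data _tmp2;
set _tmp1;
by descending &score random;
i + 1;
run;

/* proc rank replaces i with its group number 0-9 (ten roughly equal bins) */
proc rank data = _tmp2 out = _tmp3 groups = 10;
var i;
run;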

proc sql noprint;
create table
_tmp4 as
select
i + 1      as decile,
count(*)    as cnt,
sum(&y)    as bad_cnt,
min(&score) as min_scr format = 8.2,
max(&score) as max_scr format = 8.2
from
_tmp3
group by
i;

select
sum(cnt) into :cnt
from
_tmp4;

select
sum(bad_cnt) into :bad_cnt
from
_tmp4;
quit;

/* cumulative counts and rates by decile; ks = |cum_bpct - cum_gpct| * 100 */
data _tmp5;
set _tmp4;
retain cum_cnt cum_bcnt cum_gcnt;
cum_cnt + cnt;
cum_bcnt + bad_cnt;
cum_gcnt + (cnt - bad_cnt);
cum_pct = cum_cnt / &cnt;
cum_bpct = cum_bcnt / &bad_cnt;
cum_gpct = cum_gcnt / (&cnt - &bad_cnt);
ks       = (max(cum_bpct, cum_gpct) - min(cum_bpct, cum_gpct)) * 100;

format cum_bpct percent9.2 cum_gpct percent9.2
         ks      6.2;

label decile    = 'DECILE'
        cnt       = '#FREQ'
        bad_cnt   = '#BAD'
        min_scr   = 'MIN SCORE'
        max_scr   = 'MAX SCORE'
        cum_gpct  = 'CUM GOOD%'
        cum_bpct  = 'CUM BAD%'
        ks        = 'KS';
run;

title "%upcase(&score) KS";
proc print data = _tmp5 label noobs;
var decile cnt bad_cnt min_scr max_scr cum_bpct cum_gpct ks;
run;
title;

proc datasets library = work nolist;
delete _: / memtype = data;
run;
quit;

%mend ks;

data test;
do i = 1 to 1000;
    score = ranuni(1);
    if score * 2 + rannor(1) * 0.3 > 1.5 then y = 1;
    else y = 0;
    output;
end;
run;

%ks(data = test, score = score, y = y);

/*
SCORE KS

                       MIN    MAX
DECILE  #FREQ  #BAD  SCORE  SCORE  CUM BAD%  CUM GOOD%     KS
     1    100    87   0.91   1.00    34.25%      1.74%  32.51
     2    100    78   0.80   0.91    64.96%      4.69%  60.27
     3    100    49   0.69   0.80    84.25%     11.53%  72.72
     4    100    25   0.61   0.69    94.09%     21.58%  72.51
     5    100    11   0.51   0.60    98.43%     33.51%  64.91
     6    100     3   0.40   0.51    99.61%     46.51%  53.09
     7    100     1   0.32   0.40   100.00%     59.79%  40.21
     8    100     0   0.20   0.31   100.00%     73.19%  26.81
     9    100     0   0.11   0.19   100.00%     86.60%  13.40
    10    100     0   0.00   0.10   100.00%    100.00%   0.00
*/
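For readers without SAS, the same decile table can be sketched in Python. This is a hypothetical translation, not Wensui's code (the function name `ks_table` and its arguments are my own). It follows the macro's steps. First, it drops rows with a missing score (None here) or a target other than 0/1. Next, it sorts by descending score with a random tie-breaker. It then bins row positions with the formula SAS documents for PROC RANK GROUPS=k, which is floor(rank * k / (n + 1)). Finally, it accumulates the bad and good percentages, with KS = |CUM BAD% - CUM GOOD%| * 100. It assumes both classes are present. Python's random numbers differ from RANUNI/RANNOR, so the simulated run will not reproduce the table above.

```python
import random
from collections import defaultdict


def ks_table(scores, ys, groups=10, seed=1):
    """Decile gains table with KS, following the steps of the %ks macro (a sketch)."""
    rng = random.Random(seed)
    # Keep rows with a non-missing (not None) score and a 0/1 target; add a random tie-breaker.
    rows = [(s, y, rng.random()) for s, y in zip(scores, ys)
            if s is not None and y in (0, 1)]
    # Highest score first, ties ordered by the random key (BY DESCENDING score random).
    rows.sort(key=lambda r: (-r[0], r[2]))
    n = len(rows)
    total_bad = sum(r[1] for r in rows)
    total_good = n - total_bad  # assumed > 0, as is total_bad

    # PROC RANK GROUPS=k puts rank r (1-based) in group floor(r * k / (n + 1)).
    buckets = defaultdict(list)
    for rank, (s, y, _) in enumerate(rows, start=1):
        buckets[rank * groups // (n + 1)].append((s, y))

    table = []
    cum_cnt = cum_bad = cum_good = 0
    for g in sorted(buckets):
        chunk = buckets[g]
        cnt = len(chunk)
        bad = sum(y for _, y in chunk)
        cum_cnt += cnt
        cum_bad += bad
        cum_good += cnt - bad
        cum_bpct = cum_bad / total_bad
        cum_gpct = cum_good / total_good
        table.append({
            "decile": g + 1,
            "cnt": cnt,
            "bad_cnt": bad,
            "min_scr": min(s for s, _ in chunk),
            "max_scr": max(s for s, _ in chunk),
            "cum_pct": cum_cnt / n,
            "cum_bpct": cum_bpct,
            "cum_gpct": cum_gpct,
            "ks": abs(cum_bpct - cum_gpct) * 100,
        })
    return table


if __name__ == "__main__":
    # Simulated data in the spirit of the SAS test step (different random numbers).
    rng = random.Random(1)
    scores = [rng.random() for _ in range(1000)]
    ys = [1 if s * 2 + rng.gauss(0, 1) * 0.3 > 1.5 else 0 for s in scores]
    for row in ks_table(scores, ys):
        print(f"{row['decile']:>6} {row['cnt']:>5} {row['bad_cnt']:>4} "
              f"{row['cum_bpct']:>8.2%} {row['cum_gpct']:>9.2%} {row['ks']:>6.2f}")
```

The reported KS for a model is the largest value in the ks column, which is decile 3 in the SAS run above.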


Here is another example, a SAS macro for ROC curves, taken from Appendix A of http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER219.PDF. The input data set needs a numeric score OD and a variable STATE coded "P" or "N".
/************************************************************/;
/* MACRO PURPOSE: CREATE AN ROC DATASET AND PLOT            */;
/*                                                          */;
/* VARIABLES  INTERPRETATION                                */;
/*                                                          */;
/* DATAIN     INPUT SAS DATA SET                            */;
/* LOWLIM     MACRO VARIABLE LOWER LIMIT FOR CUTOFF         */;
/* UPLIM      MACRO VARIABLE UPPER LIMIT FOR CUTOFF         */;
/* NINC       MACRO VARIABLE NUMBER OF INCREMENTS           */;
/* I          LOOP INDEX                                    */;
/* OD         OPTICAL DENSITY                               */;
/* CUTOFF     CUTOFF FOR TEST                               */;
/* STATE      STATE OF NATURE                               */;
/* TEST       QUALITATIVE RESULT WITH CUTOFF                */;
/*                                                          */;
/* DATE       WRITTEN BY                                    */;
/*                                                          */;
/* 09-25-96   A. STEAD                                      */;
/************************************************************/;
%MACRO ROC(DATAIN,LOWLIM,UPLIM,NINC=20);
/* MLOGIC AND MPRINT TRACE MACRO EXECUTION IN THE LOG */
OPTIONS MLOGIC MPRINT;
/* ONE ROW PER OBSERVATION PER CUTOFF; DATAIN NEEDS OD AND STATE ('P'/'N') */
DATA ROC;
SET &DATAIN;
LOWLIM = &LOWLIM; UPLIM = &UPLIM; NINC = &NINC;
DO I = 1 TO NINC+1;
CUTOFF = LOWLIM + (I-1)*((UPLIM-LOWLIM)/NINC);
IF OD > CUTOFF THEN TEST="R"; ELSE TEST="N";
OUTPUT;
END;
DROP I;
RUN;
PROC PRINT;
RUN;
PROC SORT; BY CUTOFF;
RUN;
/* PCT_COL = PERCENT OF EACH STATE COLUMN WITH A GIVEN TEST RESULT */
PROC FREQ; BY CUTOFF;
TABLE TEST*STATE / OUT=PCTS1 OUTPCT NOPRINT;
RUN;
/* TRUE POSITIVE RATE: STATE P CLASSIFIED R */
DATA TRUEPOS; SET PCTS1; IF STATE="P" AND TEST="R";
TP_RATE = PCT_COL; KEEP CUTOFF TP_RATE;
RUN;
/* FALSE POSITIVE RATE: STATE N CLASSIFIED R */
DATA FALSEPOS; SET PCTS1; IF STATE="N" AND TEST="R";
FP_RATE = PCT_COL; KEEP CUTOFF FP_RATE;
RUN;
/* A CUTOFF MISSING FROM ONE OF THE TWO DATA SETS GETS A RATE OF 0 */
DATA ROC; MERGE TRUEPOS FALSEPOS; BY CUTOFF;
IF TP_RATE = . THEN TP_RATE=0.0;
IF FP_RATE = . THEN FP_RATE=0.0;
RUN;
PROC PRINT;
RUN;
PROC GPLOT DATA=ROC;
PLOT TP_RATE*FP_RATE=CUTOFF;
RUN;
%MEND ROC;
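The core of this macro is a sweep over NINC+1 evenly spaced cutoffs between LOWLIM and UPLIM. At each cutoff, a result is called "R" when OD exceeds the cutoff. TP_RATE and FP_RATE are then read as column percentages within STATE. A minimal Python sketch of that logic follows, assuming STATE is coded "P"/"N" as in the macro and both states are present. The function name `roc_grid` and the toy input values are hypothetical.

```python
def roc_grid(od, state, lowlim, uplim, ninc=20):
    """(cutoff, TP_RATE, FP_RATE) in percent at ninc + 1 evenly spaced cutoffs."""
    n_pos = sum(1 for s in state if s == "P")
    n_neg = sum(1 for s in state if s == "N")
    points = []
    for i in range(ninc + 1):
        # Same grid as the macro: LOWLIM + (I-1) * ((UPLIM - LOWLIM) / NINC).
        cutoff = lowlim + i * ((uplim - lowlim) / ninc)
        # TEST = "R" when OD is strictly greater than the cutoff.
        tp = sum(1 for x, s in zip(od, state) if s == "P" and x > cutoff)
        fp = sum(1 for x, s in zip(od, state) if s == "N" and x > cutoff)
        # Column percentages within each STATE, like PCT_COL from PROC FREQ.
        points.append((cutoff, 100.0 * tp / n_pos, 100.0 * fp / n_neg))
    return points


if __name__ == "__main__":
    od = [0.12, 0.35, 0.48, 0.52, 0.61, 0.77, 0.83, 0.95]   # toy values
    state = ["N", "N", "N", "P", "N", "P", "P", "P"]
    for cutoff, tp_rate, fp_rate in roc_grid(od, state, 0.0, 1.0, ninc=4):
        print(f"{cutoff:.2f}  TP_RATE={tp_rate:5.1f}  FP_RATE={fp_rate:5.1f}")
```

A fixed grid keeps the output small, but it can miss a better cutoff that falls between grid points. The alternative is to use every distinct score as a cutoff.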

Version 9.2 of SAS has a macro called %ROCPLOT: http://support.sas.com/kb/25/018.html

SPSS can also produce ROC curves, and there is a nice document on that here:

http://www.childrensmercy.org/stats/ask/roc.asp

Here are some examples in R using the ROCR package, available from

http://rocr.bioinf.mpi-sb.mpg.de/

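Using ROCR’s three commands to produce a simple ROC plot (after loading the package and its bundled ROCR.simple example data):
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col = rainbow(10))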
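To connect this with the SAS output, here is a rough Python sketch of the quantities behind such a plot. It produces one (FPR, TPR) point per distinct score, treating score >= cutoff as positive (label 1, the "bad" class in the %ks macro). It also computes the trapezoid-rule area under the curve and the KS statistic, taken as the largest TPR - FPR gap. KS is essentially the same quantity the %ks macro reports, except that %ks evaluates it only at decile boundaries. This illustrates the idea only. It is not ROCR's implementation, and the function names and toy data are hypothetical.

```python
def roc_curve(scores, labels):
    """ROC points (fpr, tpr) using each distinct score as a cutoff (score >= cutoff -> positive).

    labels are 0/1 with 1 = positive ("bad"); both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]  # cutoff above every score: nothing called positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Tied scores cross the cutoff together, so they add a single point.
        while i < len(pairs) and pairs[i][0] == cutoff:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


def ks_stat(fpr, tpr):
    """KS statistic in percent: the largest TPR - FPR gap over all cutoffs."""
    return 100 * max(t - f for f, t in zip(fpr, tpr))


if __name__ == "__main__":
    scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]  # toy data
    labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
    fpr, tpr = roc_curve(scores, labels)
    print(f"AUC = {auc(fpr, tpr):.3f}, KS = {ks_stat(fpr, tpr):.1f}")
```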

The graphics in the R package are outstanding.

Citation:

Tobias Sing, Oliver Sander, Niko Beerenwinkel, Thomas Lengauer.
ROCR: visualizing classifier performance in R.
Bioinformatics 21(20):3940-3941 (2005).
