I was creating some scoring models and decided to look for some macros.
1) Here is a nice SAS Macro from Wensui’s blog at http://statcompute.spaces.live.com/blog/
It's particularly useful for modelling chaps. I saw a version of this macro some time back that also plotted the curves, but this one is quite nice too.
2) I then found some ROC macros for SAS, and one document for SPSS
3) I found an R package for ROC curves
and compared all three.
SAS MACRO TO CALCULATE GAINS CHART WITH KS
%macro ks(data = , score = , y = );
options nocenter mprint nodate;
data _tmp1;
set &data;
where &score ~= . and &y in (1, 0);
random = ranuni(1);
keep &score &y random;
run;
proc sort data = _tmp1 sortsize = max;
by descending &score random;
run;
data _tmp2;
set _tmp1;
by descending &score random;
i + 1;
run;
proc rank data = _tmp2 out = _tmp3 groups = 10;
var i;
run;
proc sql noprint;
create table
_tmp4 as
select
i + 1 as decile,
count(*) as cnt,
sum(&y) as bad_cnt,
min(&score) as min_scr format = 8.2,
max(&score) as max_scr format = 8.2
from
_tmp3
group by
i;
select
sum(cnt) into :cnt
from
_tmp4;
select
sum(bad_cnt) into :bad_cnt
from
_tmp4;
quit;
data _tmp5;
set _tmp4;
retain cum_cnt cum_bcnt cum_gcnt;
cum_cnt + cnt;
cum_bcnt + bad_cnt;
cum_gcnt + (cnt - bad_cnt);
cum_pct = cum_cnt / &cnt;
cum_bpct = cum_bcnt / &bad_cnt;
cum_gpct = cum_gcnt / (&cnt - &bad_cnt);
ks = (max(cum_bpct, cum_gpct) - min(cum_bpct, cum_gpct)) * 100;
format cum_bpct percent9.2 cum_gpct percent9.2
ks 6.2;
label decile = 'DECILE'
cnt = '#FREQ'
bad_cnt = '#BAD'
min_scr = 'MIN SCORE'
max_scr = 'MAX SCORE'
cum_gpct = 'CUM GOOD%'
cum_bpct = 'CUM BAD%'
ks = 'KS';
run;
title "%upcase(&score) KS";
proc print data = _tmp5 label noobs;
var decile cnt bad_cnt min_scr max_scr cum_bpct cum_gpct ks;
run;
title;
proc datasets library = work nolist;
delete _: / memtype = data;
run;
quit;
%mend ks;
data test;
do i = 1 to 1000;
score = ranuni(1);
if score * 2 + rannor(1) * 0.3 > 1.5 then y = 1;
else y = 0;
output;
end;
run;
%ks(data = test, score = score, y = y);
/*
SCORE KS

                        MIN     MAX
DECILE  #FREQ  #BAD   SCORE   SCORE  CUM BAD%  CUM GOOD%     KS
     1    100    87    0.91    1.00    34.25%      1.74%  32.51
     2    100    78    0.80    0.91    64.96%      4.69%  60.27
     3    100    49    0.69    0.80    84.25%     11.53%  72.72
     4    100    25    0.61    0.69    94.09%     21.58%  72.51
     5    100    11    0.51    0.60    98.43%     33.51%  64.91
     6    100     3    0.40    0.51    99.61%     46.51%  53.09
     7    100     1    0.32    0.40   100.00%     59.79%  40.21
     8    100     0    0.20    0.31   100.00%     73.19%  26.81
     9    100     0    0.11    0.19   100.00%     86.60%  13.40
    10    100     0    0.00    0.10   100.00%    100.00%   0.00
*/
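For comparison with the R side of things, the same decile logic can be sketched in base R. This is only a rough equivalent written for illustration, not a port by the macro's author: the function name ks_table and its arguments are made up here, it assumes y is coded 0/1 with 1 meaning "bad", and the simulated data will not reproduce the SAS numbers above because the random number generators differ.

```r
# Hypothetical helper: decile gains table with KS, loosely mirroring the %ks macro.
# Assumes `score` is numeric and `y` is 0/1 with 1 = bad.
ks_table <- function(score, y, groups = 10) {
  # Drop missing scores and any outcome that is not 0 or 1
  keep <- !is.na(score) & y %in% c(0, 1)
  score <- score[keep]
  y <- y[keep]
  # Sort by descending score, breaking ties at random
  ord <- order(-score, runif(length(score)))
  score <- score[ord]
  y <- y[ord]
  n <- length(score)
  # Split the sorted rows into (roughly) equal-sized groups
  decile <- ceiling(seq_len(n) * groups / n)
  out <- data.frame(
    decile  = sort(unique(decile)),
    cnt     = as.vector(tapply(y, decile, length)),
    bad_cnt = as.vector(tapply(y, decile, sum)),
    min_scr = as.vector(tapply(score, decile, min)),
    max_scr = as.vector(tapply(score, decile, max))
  )
  # Cumulative share of bads and goods captured down to each decile
  out$cum_bpct <- cumsum(out$bad_cnt) / sum(out$bad_cnt)
  out$cum_gpct <- cumsum(out$cnt - out$bad_cnt) / sum(out$cnt - out$bad_cnt)
  # KS per decile is the gap between the two cumulative curves, in points
  out$ks <- abs(out$cum_bpct - out$cum_gpct) * 100
  out
}

# Simulated data in the spirit of the SAS test data set
set.seed(1)
score <- runif(1000)
y <- as.integer(score * 2 + rnorm(1000) * 0.3 > 1.5)
gains <- ks_table(score, y)
print(gains, digits = 3)
max(gains$ks)  # the KS statistic is the largest gap across deciles
```

The KS figure usually quoted for a scorecard is the maximum of the ks column, which is why the last line takes max().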
Here is another example, a SAS macro for ROC curves, which comes from http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER219.PDF
APPENDIX A
Macro
/***************************************************************/;
/* MACRO PURPOSE: CREATE AN ROC DATASET AND PLOT */;
/* */;
/* VARIABLES INTERPRETATION */;
/* */;
/* DATAIN INPUT SAS DATA SET */;
/* LOWLIM MACRO VARIABLE LOWER LIMIT FOR CUTOFF */;
/* UPLIM MACRO VARIABLE UPPER LIMIT FOR CUTOFF */;
/* NINC MACRO VARIABLE NUMBER OF INCREMENTS */;
/* I LOOP INDEX */;
/* OD OPTICAL DENSITY */;
/* CUTOFF CUTOFF FOR TEST */;
/* STATE STATE OF NATURE */;
/* TEST QUALITATIVE RESULT WITH CUTOFF */;
/* */;
/* DATE WRITTEN BY */;
/* */;
/* 09-25-96 A. STEAD */;
/***************************************************************/;
%MACRO ROC(DATAIN,LOWLIM,UPLIM,NINC=20);
OPTIONS MTRACE MPRINT;
DATA ROC;
SET &DATAIN;
LOWLIM = &LOWLIM; UPLIM = &UPLIM; NINC = &NINC;
DO I = 1 TO NINC+1;
CUTOFF = LOWLIM + (I-1)*((UPLIM-LOWLIM)/NINC);
IF OD > CUTOFF THEN TEST="R"; ELSE TEST="N";
OUTPUT;
END;
DROP I;
RUN;
PROC PRINT;
RUN;
PROC SORT; BY CUTOFF;
RUN;
PROC FREQ; BY CUTOFF;
TABLE TEST*STATE / OUT=PCTS1 OUTPCT NOPRINT;
RUN;
DATA TRUEPOS; SET PCTS1; IF STATE="P" AND TEST="R";
TP_RATE = PCT_COL; DROP PCT_COL;
RUN;
DATA FALSEPOS; SET PCTS1; IF STATE="N" AND TEST="R";
FP_RATE = PCT_COL; DROP PCT_COL;
RUN;
DATA ROC; MERGE TRUEPOS FALSEPOS; BY CUTOFF;
IF TP_RATE = . THEN TP_RATE=0.0;
IF FP_RATE = . THEN FP_RATE=0.0;
RUN;
PROC PRINT;
RUN;
PROC GPLOT DATA=ROC;
PLOT TP_RATE*FP_RATE=CUTOFF;
RUN;
%MEND;
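The idea behind this macro is a simple cutoff sweep: step the cutoff from the lower to the upper limit, call a result reactive when OD exceeds the cutoff, and record the true positive and false positive rates at each step. A minimal base R sketch of that sweep is below. The function name roc_sweep and the simulated data are assumptions made for illustration, and rates are returned as proportions (0 to 1) rather than the percentages PROC FREQ writes to PCT_COL.

```r
# Hypothetical helper: cutoff sweep in the spirit of the %ROC macro.
# `od` is the numeric test value, `state` is "P" (positive) or "N" (negative).
roc_sweep <- function(od, state, lowlim, uplim, ninc = 20) {
  # ninc + 1 equally spaced cutoffs from lowlim to uplim
  cutoff <- lowlim + (0:ninc) * (uplim - lowlim) / ninc
  # Share of true positives / true negatives flagged reactive at each cutoff
  tp_rate <- sapply(cutoff, function(cut) mean(od[state == "P"] > cut))
  fp_rate <- sapply(cutoff, function(cut) mean(od[state == "N"] > cut))
  data.frame(cutoff, tp_rate, fp_rate)
}

# Simulated optical densities: positives tend to read higher than negatives
set.seed(1)
state <- rep(c("P", "N"), each = 50)
od <- c(rnorm(50, mean = 1.5), rnorm(50, mean = 0.5))

roc <- roc_sweep(od, state, lowlim = -2, uplim = 4)
print(roc)

# Plot the ROC curve with the chance diagonal for reference
plot(roc$fp_rate, roc$tp_rate, type = "b",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)
```

A cutoff with no reactive results simply gets a rate of 0 here, which is what the macro achieves by replacing missing TP_RATE and FP_RATE values with 0.0 after the merge.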
SAS version 9.2 has a macro called %ROCPLOT; see http://support.sas.com/kb/25/018.html
SPSS also produces ROC curves, and there is a nice document on that here:
http://www.childrensmercy.org/stats/ask/roc.asp
Here are some examples from R with the package ROCR from
http://rocr.bioinf.mpi-sb.mpg.de/
Using ROCR’s 3 commands to produce a simple ROC plot:
pred <- prediction(predictions, labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col=rainbow(10))
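The three commands above assume that predictions and labels already exist. A self-contained version, assuming the ROCR package is installed and using the ROCR.simple example data that ships with it, might look like this; the AUC line is an extra that is not part of the three-command recipe.

```r
# Assumes the ROCR package is installed: install.packages("ROCR")
library(ROCR)

# Example data bundled with ROCR: a list with $predictions and $labels
data(ROCR.simple)

# Step 1: wrap scores and true labels in a prediction object
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# Step 2: true positive rate against false positive rate gives the ROC curve
perf <- performance(pred, measure = "tpr", x.measure = "fpr")

# Step 3: plot, colouring the curve by the cutoff value
plot(perf, colorize = TRUE)
abline(0, 1, lty = 2)  # chance diagonal for reference

# Extra: area under the ROC curve as a single summary number
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc
```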
The graphics produced by the R package are outstanding.
Citation:
Tobias Sing, Oliver Sander, Niko Beerenwinkel, Thomas Lengauer.
ROCR: visualizing classifier performance in R.
Bioinformatics 21(20):3940-3941 (2005).