SQL Server ROCTable Function

Updated 2023-11-02 12:19:09.947000

Description

Use the table-valued function ROCTABLE to show the calculation of the area under the ROC curve. The function accepts either raw or grouped data as input.

The function takes a single input parameter: a SQL SELECT statement, passed as a string, which returns a resultant table whose first column contains the predicted probabilities. For raw data, the statement returns one additional column of zeroes and ones indicating the absence (0) or presence (1) of the characteristic of interest. You may also think of these as indicating the failure (0) or success (1) of the observation.

For grouped data, the statement returns two additional columns: the count of failures followed by the count of successes for each predicted probability.

The function returns a table (described below), sorted by descending predicted probability, which contains the True Positive Rate, the False Positive Rate, and the area under the ROC curve (AUROC) for each row. The final cumulative AUROC is the same value returned by the LOGIT and LOGITSUM functions.

Syntax

SELECT * FROM [westclintech].[wct].[ROCTable](
   <@Matrix_RangeQuery, nvarchar(max),>)

Arguments

@Matrix_RangeQuery

the SELECT statement, as a string, which, when executed, creates the resultant table of predicted probabilities and Y values.

Return Type

table

colName	colDatatype	colDesc
idx	int	a unique identifier for the row identifying its position in the resultant table
ppred	float	predicted probability
failure	int	for raw data, the count of the number of rows for the predicted probability having a value of 0. For grouped data, the sum of the second column passed into the function grouped by predicted probability
success	int	for raw data, the sum of the second column grouped by predicted probability. For grouped data, the sum of the third column passed into the function grouped by the predicted probability
cumfailure	int	the sum of failure for the current row and all preceding rows
cumsuccess	int	the sum of success for the current row and all preceding rows
FalsePositiveRate	float	cumfailure(idx) /cumfailure(idxmax)
TruePositiveRate	float	cumsuccess(idx) /cumsuccess(idxmax)
AUROC	float	[FalsePositiveRate(idx+1) – FalsePositiveRate(idx)] * TruePositiveRate(idx)
cumAUROC	float	the sum of AUROC for the current row and all preceding rows
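
The column definitions above can be sketched in plain T-SQL with window functions. This is only an illustration of the arithmetic, not the function's actual implementation; the table variable @raw and its six rows are made up, and the descending sort order and zero-based idx are inferred from the example output in the Examples section.

```sql
--Minimal sketch: rebuild the ROCTable columns for raw data with window functions.
--@raw and its contents are hypothetical illustration data.
DECLARE @raw TABLE (ppred float, y int);
INSERT INTO @raw (ppred, y)
VALUES (0.9, 1), (0.8, 1), (0.8, 0), (0.6, 0), (0.4, 1), (0.2, 0);

WITH grp AS (
    --one row per distinct predicted probability
    SELECT ppred,
           SUM(1 - y) AS failure,
           SUM(y) AS success
    FROM @raw
    GROUP BY ppred
),
cum AS (
    --running totals from the highest probability downwards
    SELECT ROW_NUMBER() OVER (ORDER BY ppred DESC) - 1 AS idx,
           ppred, failure, success,
           SUM(failure) OVER (ORDER BY ppred DESC ROWS UNBOUNDED PRECEDING) AS cumfailure,
           SUM(success) OVER (ORDER BY ppred DESC ROWS UNBOUNDED PRECEDING) AS cumsuccess,
           SUM(failure) OVER () AS totfailure,
           SUM(success) OVER () AS totsuccess
    FROM grp
),
rates AS (
    SELECT idx, ppred, failure, success, cumfailure, cumsuccess,
           cumfailure / CAST(totfailure AS float) AS FalsePositiveRate,
           cumsuccess / CAST(totsuccess AS float) AS TruePositiveRate
    FROM cum
),
area AS (
    --[FalsePositiveRate(idx+1) - FalsePositiveRate(idx)] * TruePositiveRate(idx);
    --the last row has no successor, so its increment is 0
    SELECT *,
           (LEAD(FalsePositiveRate, 1, FalsePositiveRate) OVER (ORDER BY idx)
            - FalsePositiveRate) * TruePositiveRate AS AUROC
    FROM rates
)
SELECT *,
       SUM(AUROC) OVER (ORDER BY idx ROWS UNBOUNDED PRECEDING) AS cumAUROC
FROM area
ORDER BY idx;
```

For these six rows the final cumAUROC is 6/9 (about 0.667): of the 3 x 3 success/failure pairs, six have the success with the strictly higher probability, and the tied pair at 0.8 contributes nothing.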

Remarks

The first column returned by @Matrix_RangeQuery should contain the predicted probabilities where 0 <= predicted probability <= 1.

The resultant table returned by @Matrix_RangeQuery should contain either 2 columns or 3 columns.

When the resultant table contains 2 columns the function assumes that the second column contains binary responses consisting of zero (0) or one (1); the use of other values will produce unreliable results.

When the resultant table contains 3 columns the function assumes that the second column contains a count of the failures or absences and the third column contains a count of the successes or presence.
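
A quick pre-check along these lines can catch input that violates the assumptions above before it reaches the function. It is a sketch for the raw (2-column) case and assumes a temporary table #t with columns [p predicted] and y, as built in the Examples section; any rows it returns should be corrected or excluded first.

```sql
--Sketch: list raw input rows that fall outside the documented assumptions
SELECT *
FROM #t
WHERE [p predicted] IS NULL
      OR [p predicted] < 0
      OR [p predicted] > 1
      OR y IS NULL
      OR y NOT IN ( 0, 1 );
```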

Examples

In this example we use the same data as in the LOGIT documentation: the Coronary Heart Disease data from Applied Logistic Regression, Third Edition by David W. Hosmer, Jr., Stanley Lemeshow, and Rodney X. Sturdivant. The data consist of a single independent variable (age) and an outcome (chd) which indicates the absence (0) or presence (1) of coronary heart disease. We will put the data into a temporary table, run the logistic regression, use the coefficients from the regression to create the predicted probabilities, and then produce the ROC table. Note that the AUROC value is already returned by the LOGIT function; this example simply explains the calculation.

--Put the Hosmer data into the #chd table
SELECT *
INTO   #chd
  FROM (   VALUES (20, 0),
                  (23, 0),
                  (24, 0),
                  (25, 0),
                  (25, 1),
                  (26, 0),
                  (26, 0),
                  (28, 0),
                  (28, 0),
                  (29, 0),
                  (30, 0),
                  (30, 0),
                  (30, 0),
                  (30, 0),
                  (30, 0),
                  (30, 1),
                  (32, 0),
                  (32, 0),
                  (33, 0),
                  (33, 0),
                  (34, 0),
                  (34, 0),
                  (34, 1),
                  (34, 0),
                  (34, 0),
                  (35, 0),
                  (35, 0),
                  (36, 0),
                  (36, 1),
                  (36, 0),
                  (37, 0),
                  (37, 1),
                  (37, 0),
                  (38, 0),
                  (38, 0),
                  (39, 0),
                  (39, 1),
                  (40, 0),
                  (40, 1),
                  (41, 0),
                  (41, 0),
                  (42, 0),
                  (42, 0),
                  (42, 0),
                  (42, 1),
                  (43, 0),
                  (43, 0),
                  (43, 1),
                  (44, 0),
                  (44, 0),
                  (44, 1),
                  (44, 1),
                  (45, 0),
                  (45, 1),
                  (46, 0),
                  (46, 1),
                  (47, 0),
                  (47, 0),
                  (47, 1),
                  (48, 0),
                  (48, 1),
                  (48, 1),
                  (49, 0),
                  (49, 0),
                  (49, 1),
                  (50, 0),
                  (50, 1),
                  (51, 0),
                  (52, 0),
                  (52, 1),
                  (53, 1),
                  (53, 1),
                  (54, 1),
                  (55, 0),
                  (55, 1),
                  (55, 1),
                  (56, 1),
                  (56, 1),
                  (56, 1),
                  (57, 0),
                  (57, 0),
                  (57, 1),
                  (57, 1),
                  (57, 1),
                  (57, 1),
                  (58, 0),
                  (58, 1),
                  (58, 1),
                  (59, 1),
                  (59, 1),
                  (60, 0),
                  (60, 1),
                  (61, 1),
                  (62, 1),
                  (62, 1),
                  (63, 1),
                  (64, 0),
                  (64, 1),
                  (65, 1),
                  (69, 1)) n (age, chd);

--Run LOGIT and store the results in #mylogit
SELECT *
INTO   #mylogit
  FROM wct.LOGIT('SELECT age,chd FROM #chd', 2);

--Calculate the predicted probabilities for each row in #chd and store the
--predicted probability and the chd value in #t
SELECT wct.LOGITPRED('SELECT stat_val FROM #mylogit where stat_name = ''b'' ORDER BY idx', cast(age as varchar(max))) as [p predicted],
       chd as y
INTO   #t
  FROM #chd;

--Run the ROCTable function
SELECT *
  FROM wct.ROCTable('SELECT * FROM #t');

This produces the following result.

idx	ppred	failure	success	cumfailure	cumsuccess	FalsePositiveRate	TruePositiveRate	AUROC	cumAUROC
0	0.912464554564153	0	1	0	1	0	0.0232558139534884	0	0
1	0.869939152344419	0	1	0	2	0	0.0465116279069767	0.000815993472052223	0.000815993472052223
2	0.856865930676536	1	1	1	3	0.0175438596491228	0.0697674418604651	0	0.000815993472052223
3	0.842716220602683	0	1	1	4	0.0175438596491228	0.0930232558139535	0	0.000815993472052223
4	0.827449401763914	0	2	1	6	0.0175438596491228	0.13953488372093	0	0.000815993472052223
5	0.811032992880968	0	1	1	7	0.0175438596491228	0.162790697674419	0.00285597715218278	0.00367197062423501
6	0.793444615655287	1	1	2	8	0.0350877192982456	0.186046511627907	0	0.00367197062423501
7	0.774673993551717	0	2	2	10	0.0350877192982456	0.232558139534884	0.00407996736026112	0.00775193798449612
8	0.754724899971724	1	2	3	12	0.0526315789473684	0.27906976744186	0.00979192166462668	0.0175438596491228
9	0.733616953220639	2	4	5	16	0.087719298245614	0.372093023255814	0	0.0175438596491228
10	0.711387142595015	0	3	5	19	0.087719298245614	0.441860465116279	0.00775193798449612	0.0252957976336189
11	0.688090963392313	1	2	6	21	0.105263157894737	0.488372093023256	0	0.0252957976336189
12	0.663803041111905	0	1	6	22	0.105263157894737	0.511627906976744	0	0.0252957976336189
13	0.638617138505235	0	2	6	24	0.105263157894737	0.558139534883721	0.00979192166462668	0.0350877192982456
14	0.61264546440856	1	1	7	25	0.12280701754386	0.581395348837209	0.0101999184006528	0.0452876376988984
15	0.586017240033851	1	0	8	25	0.140350877192982	0.581395348837209	0.0101999184006528	0.0554875560995512
16	0.558876524531328	1	1	9	26	0.157894736842105	0.604651162790698	0.0212158302733578	0.076703386372909
17	0.531379353436951	2	1	11	27	0.192982456140351	0.627906976744186	0.011015911872705	0.087719298245614
18	0.503690295993513	1	2	12	29	0.210526315789474	0.674418604651163	0.0236638106895145	0.111383108935129
19	0.475978584473281	2	1	14	30	0.245614035087719	0.697674418604651	0.0122399020807834	0.123623011015912
20	0.448414004860464	1	1	15	31	0.263157894736842	0.720930232558139	0.0126478988168095	0.136270909832721
21	0.421162758975344	1	1	16	32	0.280701754385965	0.744186046511628	0.0261117911056712	0.162382700938392
22	0.394383510626178	2	2	18	34	0.315789473684211	0.790697674418605	0.0277437780497756	0.190126478988168
23	0.368223812328276	2	1	20	35	0.350877192982456	0.813953488372093	0.0428396572827417	0.23296613627091
24	0.342817076642784	3	1	23	36	0.403508771929825	0.837209302325581	0.0293757649938801	0.26234190126479
25	0.318280211425752	2	0	25	36	0.43859649122807	0.837209302325581	0.01468788249694	0.27702978376173
26	0.294711986717842	1	1	26	37	0.456140350877193	0.86046511627907	0.0150958792329661	0.292125662994696
27	0.272192148511754	1	1	27	38	0.473684210526316	0.883720930232558	0.0310077519379845	0.323133414932681
28	0.250781246560969	2	0	29	38	0.508771929824561	0.883720930232558	0.0310077519379845	0.354141166870665
29	0.230521103877386	2	1	31	39	0.543859649122807	0.906976744186046	0.0318237454100367	0.385964912280702
30	0.211435827131904	2	1	33	40	0.578947368421053	0.930232558139535	0.0326397388820889	0.418604651162791
31	0.193533240663126	2	0	35	40	0.614035087719298	0.930232558139535	0.0652794777641779	0.483884128926969
32	0.176806621582586	4	1	39	41	0.684210526315789	0.953488372093023	0.0334557323541412	0.51733986128111
33	0.161236617821071	2	0	41	41	0.719298245614035	0.953488372093023	0.0334557323541412	0.550795593635251
34	0.146793242543317	2	0	43	41	0.754385964912281	0.953488372093023	0.0836393308853529	0.634434924520604
35	0.121125053503268	5	1	48	42	0.842105263157895	0.976744186046512	0.0171358629130967	0.651570787433701
36	0.10980443546362	1	0	49	42	0.859649122807018	0.976744186046512	0.0342717258261934	0.685842513259894
37	0.0994221764013863	2	0	51	42	0.894736842105263	0.976744186046512	0.0342717258261934	0.720114239086087
38	0.0812484736598618	2	0	53	42	0.929824561403509	0.976744186046512	0.0171358629130966	0.737250101999184
39	0.0733437884100028	1	1	54	43	0.947368421052632	1	0.0175438596491229	0.754793961648307
40	0.0661527783012159	1	0	55	43	0.964912280701754	1	0.0175438596491228	0.77233782129743
41	0.0596214497281155	1	0	56	43	0.982456140350877	1	0.0175438596491229	0.789881680946553
42	0.0434787567488236	1	0	57	43	1	1	0	0.789881680946553

You can see from the table that the cumulative AUROC value is 0.789881680946553. This is the same as the value returned by LOGIT.

--Get the AUROC value from #mylogit
SELECT stat_val
  FROM #mylogit
 WHERE stat_name = 'AUROC';

This produces the following result.

stat_val
0.789881680946553

However, ROCTABLE also returns the False Positive Rate and the True Positive Rate, which can be graphed using SSRS, Excel, or any tool that you prefer. In this example, I have simply copied the FalsePositiveRate and TruePositiveRate columns from ROCTABLE, pasted them into Excel, and plotted the ROC curve.
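
If only the plotting points are needed, the column list can be narrowed. This sketch uses only the column names listed under Return Type and the #t table built above; plot FalsePositiveRate on the x-axis and TruePositiveRate on the y-axis.

```sql
--Sketch: return only the points needed to plot the ROC curve
SELECT idx,
       FalsePositiveRate,
       TruePositiveRate
  FROM wct.ROCTable('SELECT * FROM #t')
 ORDER BY idx;
```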

It is worth noting that our input data consisted of 100 rows, yet ROCTABLE only returned 43 rows of data from the temporary table #t, even though we generated a predicted probability for all 100 rows. This is because there were not 100 unique predicted probabilities. We can get the number of unique predicted probabilities using the following SQL.

SELECT COUNT(DISTINCT [p predicted]) as [COUNT p predicted]
  FROM #t;

This produces the following result, which matches what was returned by ROCTable.

COUNT p predicted
43
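
The grouping itself can be reproduced directly from #t, following the definitions of the failure and success columns under Return Type. This is a sketch of the idea rather than the function's internals; the aliases ppred, failure, and success are chosen to mirror the ROCTable column names.

```sql
--Sketch: collapse the 100 raw rows in #t to one row per distinct predicted probability
SELECT [p predicted] as ppred,
       SUM(1 - y) as failure,
       SUM(y) as success
  FROM #t
 GROUP BY [p predicted]
 ORDER BY [p predicted] DESC;
```

The rows returned should line up with the ppred, failure, and success columns in the ROCTable output shown earlier.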

As Hosmer points out in section 5.4.2, "let n₁ denote the number of subjects with y = 1 and n₀ denote the number of subjects with y = 0. We can then create n₁ x n₀ pairs; each subject with y = 1, paired with each subject with y = 0. Of these n₁ x n₀ pairs, we determine the proportion of the pairs where the subject with y = 1 had the higher of the two probabilities. This proportion may be shown to be equal to the area under the ROC Curve."

The technique that he is suggesting lends itself quite well to SQL and we can use it to check the AUROC calculation in both LOGIT and ROCTable. We would not recommend this calculation as a practical matter as it requires a Cartesian product; in this case 57 x 43 combinations.

--Calculate the area under the ROC Curve using a Cartesian product
;with mycte
as (SELECT n1.y as y1,
           n1.[p predicted] as p1,
           n0.y as y0,
           n0.[p predicted] as p0
    FROM #t n1,
         #t n0
    WHERE n1.y = 1
          AND n0.y = 0)
SELECT COUNT(m1.y1) / cast(n.pairs as float) As AUROC,
       n.pairs
FROM
(SELECT SUM(y1)FROM mycte) n(pairs) ,
mycte m1
WHERE m1.p1 > m1.p0
GROUP BY n.pairs;

This produces the following result.

AUROC	pairs
0.789881680946552	2451
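
Because the query above compares with a strict p1 > p0, pairs whose two probabilities are equal are not counted as wins. A breakdown like the following makes that visible; it is a sketch against the same #t table, and the aliases higher_pairs, tied_pairs, and lower_pairs are made up for illustration.

```sql
--Sketch: split the success/failure pairs by how the two probabilities compare
SELECT SUM(CASE WHEN n1.[p predicted] > n0.[p predicted] THEN 1 ELSE 0 END) as higher_pairs,
       SUM(CASE WHEN n1.[p predicted] = n0.[p predicted] THEN 1 ELSE 0 END) as tied_pairs,
       SUM(CASE WHEN n1.[p predicted] < n0.[p predicted] THEN 1 ELSE 0 END) as lower_pairs
FROM #t n1
    CROSS JOIN #t n0
WHERE n1.y = 1
      AND n0.y = 0;
```

Dividing higher_pairs by the sum of the three counts gives the same proportion as the AUROC column above.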

Now, let's look at an example using grouped data. The data consist of 3 independent variables (x1, x2, and x3) and 2 additional columns: the number of successes and the number of observations for each combination of independent variables.

--Put grouped data into a temporary table #x
SELECT *
INTO #x
FROM
(
    VALUES
        (100, 1, 10, 28, 156),
        (150, 1, 10, 33, 144),
        (200, 1, 10, 44, 171),
        (250, 1, 10, 56, 196),
        (300, 1, 10, 55, 158),
        (350, 1, 10, 44, 100),
        (400, 1, 10, 57, 126),
        (450, 1, 10, 77, 166),
        (500, 1, 10, 84, 166),
        (100, 2, 10, 23, 153),
        (150, 2, 10, 31, 165),
        (200, 2, 10, 40, 179),
        (250, 2, 10, 42, 152),
        (300, 2, 10, 55, 181),
        (350, 2, 10, 68, 200),
        (400, 2, 10, 59, 148),
        (450, 2, 10, 69, 156),
        (500, 2, 10, 75, 157),
        (100, 1, 11, 19, 164),
        (150, 1, 11, 23, 147),
        (200, 1, 11, 35, 182),
        (250, 1, 11, 46, 196),
        (300, 1, 11, 41, 143),
        (350, 1, 11, 60, 189),
        (400, 1, 11, 59, 162),
        (450, 1, 11, 75, 187),
        (500, 1, 11, 59, 129),
        (100, 2, 11, 9, 105),
        (150, 2, 11, 22, 179),
        (200, 2, 11, 30, 182),
        (250, 2, 11, 32, 155),
        (300, 2, 11, 41, 164),
        (350, 2, 11, 58, 200),
        (400, 2, 11, 60, 181),
        (450, 2, 11, 75, 199),
        (500, 2, 11, 59, 141)
) n (x1, x2, x3, success, N);
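
--This example reuses the names #mylogit and #t; if they still exist from the
--previous example, drop them first or run this example in a new session
--Run LOGITSUM and store the results in #mylogit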
SELECT *
INTO #mylogit
FROM wct.LOGITSUM('SELECT x1,x2,x3,success,n-success from #x', 4, 5);

--Calculate the predicted probabilities using LOGITPROB for each row in #x and store the
--predicted probability and the group totals in #t
SELECT wct.LOGITPROB(n.x, m.stat_val) as [p predicted],
       n - Success as failure,
       success as success
INTO #t
FROM #x
    CROSS APPLY
(
    VALUES
        (0, 1),
        (1, x1),
        (2, x2),
        (3, x3)
) n (idx, x)
    INNER JOIN #mylogit m
        ON m.idx = n.idx
WHERE m.stat_name = 'b'
GROUP BY n - Success,
         success;

--Run the ROCTable function
SELECT *
FROM wct.ROCTable('SELECT * FROM #t');

This produces the following result.

idx	ppred	failure	success	cumfailure	cumsuccess	FalsePositiveRate	TruePositiveRate	AUROC	cumAUROC
0	0.540977960616935	82	84	82	84	0.019825918762089	0.0481927710843374	0.000955465964438023	0.000955465964438023
1	0.49876262766249	82	75	164	159	0.0396518375241779	0.0912220309810671	0.00196294989296784	0.00291841585740586
2	0.488565636437005	89	77	253	236	0.0611702127659574	0.135398737808376	0.00229156471145705	0.00520998056886291
3	0.460157630044192	70	59	323	295	0.0780947775628627	0.16924842226047	0.0035601094624422	0.00877009003130511
4	0.446462214823615	87	69	410	364	0.0991295938104449	0.208835341365462	0.00348395516301181	0.0122540451943169
5	0.436403529774778	69	57	479	421	0.115812379110251	0.241537578886976	0.00478870441700485	0.0170427496113218
6	0.418499071132844	82	59	561	480	0.13563829787234	0.275387263339071	0.00745729533219921	0.024500044943521
7	0.40860533932183	112	75	673	555	0.162717601547389	0.3184165232358	0.00685180623017075	0.0313518511736917
8	0.3953206788673	89	59	762	614	0.184235976789168	0.352266207687894	0.00476956180621907	0.0361214129799108
9	0.385611537599448	56	44	818	658	0.197775628626692	0.377510040160643	0.0113179992698065	0.0474394122497173
10	0.368428698069553	124	75	942	733	0.227756286266925	0.420539300057372	0.0104728113892431	0.0579122236389604
11	0.358987917507059	103	59	1045	792	0.252659574468085	0.454388984509466	0.0145017761013659	0.0724139997403263
12	0.346371616586007	132	68	1177	860	0.284574468085106	0.493402180149168	0.0122873366913357	0.084701336431662
13	0.337194277421765	103	55	1280	915	0.309477756286267	0.524956970740103	0.0153577837184605	0.100059120150122
14	0.32104154787904	121	60	1401	975	0.338733075435203	0.559380378657487	0.0174468251563868	0.117505945306509
15	0.312214767445495	129	60	1530	1035	0.369922630560928	0.593803786574871	0.0180897671925613	0.135595712499071
16	0.300471741826253	126	55	1656	1090	0.400386847195358	0.625358577165806	0.0211678435210863	0.156763556020157
17	0.291967312452626	140	56	1796	1146	0.434235976789168	0.657487091222031	0.022573299553561	0.179336855573718
18	0.277075424928281	142	58	1938	1204	0.468568665377176	0.690763052208835	0.0170352590244926	0.19637211459821
19	0.268978570102441	102	41	2040	1245	0.493230174081238	0.714285714285714	0.0189969604863222	0.215369075084533
20	0.258251135511103	110	42	2150	1287	0.519825918762089	0.738382099827883	0.0226727579009045	0.238041832985437
21	0.250513751624832	127	44	2277	1331	0.550531914893617	0.763625932300631	0.0227093785476252	0.260751211533062
22	0.237028393827286	123	41	2400	1372	0.58027079303675	0.78714859437751	0.0285474586935751	0.289298670226637
23	0.229729927067	150	46	2550	1418	0.616537717601547	0.813539873780838	0.0273409193557873	0.316639589582425
24	0.220096523923864	139	40	2689	1458	0.650145067698259	0.836488812392427	0.0224492887271662	0.339088878309591
25	0.213173748215327	111	33	2800	1491	0.676982591876209	0.855421686746988	0.0254392813031624	0.364528159612753
26	0.201158946047794	123	32	2923	1523	0.706721470019342	0.873780837636259	0.0310555568502249	0.395583716462978
27	0.194683131790574	147	35	3070	1558	0.742263056092843	0.8938611589214	0.0289597183983239	0.424543434861302
28	0.186164174671769	134	31	3204	1589	0.774661508704062	0.911646586345382	0.0282134340068203	0.452756868868122
29	0.180062272857074	128	28	3332	1617	0.805609284332689	0.927710843373494	0.0340938220968982	0.486850690965021
30	0.169511632771358	152	30	3484	1647	0.842359767891683	0.944922547332186	0.0283293993881023	0.515180090353123
31	0.163845663830216	124	23	3608	1670	0.872340425531915	0.95811818703385	0.0301149333448744	0.545295023697997
32	0.156414007523259	130	23	3738	1693	0.903771760154739	0.971313826735513	0.0368704716628325	0.58216549536083
33	0.141958450405941	157	22	3895	1715	0.941731141199226	0.983935742971888	0.0344948459214033	0.616660341282233
34	0.137061458881927	145	19	4040	1734	0.97678916827853	0.994836488812392	0.0230909823322025	0.639751323614436
35	0.118246213552031	96	9	4136	1743	1	1	0	0.639751323614436

You can see from the table that the cumulative AUROC value is 0.639751323614436. This is the same as the value returned by LOGITSUM.

--Get the AUROC value from #mylogit
SELECT stat_val
FROM #mylogit
WHERE stat_name = 'AUROC';

This produces the following result.

stat_val
0.639751323614436

We can modify our SQL slightly from the previous example in order to verify the calculation of the area under the ROC curve using the Cartesian product.

--Calculate the area under the ROC Curve using a Cartesian product
SELECT SUM(n.y0 * n.y1) / cast(p.pairs as float) as AUROC,
       p.pairs
FROM
(
    SELECT t1.success as y1,
           t1.[p predicted] as p1,
           t2.failure as y0,
           t2.[p predicted] as p0
    FROM #t t1,
         #t t2
) n ,
(SELECT SUM(success) * SUM(failure)FROM #t) p(pairs)
WHERE p1 > p0
GROUP BY p.pairs;

This produces the following result.

AUROC	pairs
0.639751323614436	7209048
