Logo

SQL Server CORRM Function

Updated 2024-03-04 16:47:05.203000

Description

Use the table-valued function CORRM to calculate a correlation matrix. Given a matrix Am,n and a covariance matrix Cn,n = COVM(Am,n), then each element in correlation matrix Rn,n is calculated as:

R_{i,j}=\frac{C_{i,j}}{\sqrt{C_{i,i}}\times\sqrt{C_{j,j}}

For example:

A=\begin{bmatrix}1&-10&120&20\\2&20&30&-40\\3&-40&90&80\end{bmatrix}
C=\begin{bmatrix}1&-15&-15&30\\-15&900&-900&-1800\\-15&-900&2100&1800\\30&-1800&1800&3600\end{bmatrix}
R=\begin{bmatrix}\frac{1}{\sqrt{1}\times\sqrt{1}}&\frac{-15}{\sqrt{1}\times\sqrt{900}}&\frac{-15}{\sqrt{1}\times\sqrt{2100}}&\frac{30}{\sqrt{1}\times\sqrt{3600}}\\\\\frac{-15}{\sqrt{900}\times\sqrt{1}}&\frac{900}{\sqrt{900}\times\sqrt{900}}&\frac{-900}{\sqrt{900}\times\sqrt{2100}}&\frac{-1800}{\sqrt{900}\times\sqrt{3600}}\\\\\frac{-15}{\sqrt{2100}\times\sqrt{1}}&\frac{-900}{\sqrt{2100}\times\sqrt{2100}}&\frac{2100}{\sqrt{2100}\times\sqrt{2100}}&\frac{1800}{\sqrt{2100}\times\sqrt{3600}}\\\\\frac{30}{\sqrt{3600}\times\sqrt{3600}}&\frac{-1800}{\sqrt{3600}\times\sqrt{900}}&\frac{1800}{\sqrt{3600}\times\sqrt{2100}}&\frac{3600}{\sqrt{3600}\times\sqrt{3600}}\end{bmatrix}

Syntax

SELECT * FROM [westclintech].[wct].[CORRM](
  <@Matrix_RangeQuery, nvarchar(max),>
 ,<@Is3N, bit,>)

Arguments

@Matrix_RangeQuery

the SELECT statement, as text, used to return the input matrix for this function. The SELECT statement specifies the column names from the table or view or can be used to enter the matrix values directly. Data returned from the @Matrix_RangeQuery select must be of the type float or of a type that implicitly converts to float.

@Is3N

a bit value identifying the form for the resultant table returned by @Matrix_RangeQuery. Enter 'True' for a resultant table in 3rd normal form. Enter 'False' for a de-normalized table in 'spreadsheet' form.

Return Type

table

colNamecolDatatypecolDesc
RowNumintThe zero-based row index for the matrix
ColNumintThe zero-based column index for the matrix
ItemValuefloatThe value at RowNum, ColNum

Remarks

If @Is3N is NULL then @Is3N = 'False'.

Use MCORR for a matrix stored as a string.

If @Is3N is'True' then the result table should be returned as row, column, and value.

If the array contains NULL, then NULL will be returned.

The function returns an error if the array contains a non-numeric value.

If the supplied input is a vector, then result will be the (sample) variance.

Examples

In this example @Matrix_RangeQuery returns the matrix from a derived table embodied in the statement. The matrix is in spreadsheet form.

SELECT *
FROM wct.CORRM(
                  'SELECT
          x1,x2,x3,x4,x5,x6,x7,x8
        FROM (VALUES
           (1,1,1,1,1,1,1,1)
          ,(2,3,4,5,6,7,8,9)
          ,(4,9,16,25,36,49,64,81)
        )n(x1,x2,x3,x4,x5,x6,x7,x8)',
                  'False'
              );

This produces the following result.

RowNumColNumItemValue
001
010.995870594885822
020.989743318610787
030.98432413828809
040.9798637100972
050.97622103992743
060.973222701448379
070.970725343394151
100.995870594885822
111
120.998625428903524
130.996270962773436
140.993944095928862
150.991869783800371
160.990071896570823
170.988522467870286
200.989743318610787
210.998625428903524
221
230.999423797128766
240.998337488459583
250.997176464952738
260.996078416265654
270.995082098645899
300.98432413828809
310.996270962773436
320.999423797128766
331
340.999718640088218
350.999150742946594
360.998507503106759
370.997870827960502
400.9798637100972
410.993944095928862
420.998337488459583
430.999718640088218
441
450.999846989517886
460.999522026579879
470.999137118134003
500.97622103992743
510.991869783800371
520.997176464952738
530.999150742946594
540.999846989517886
551
560.999909873371905
570.999710773674355
600.973222701448379
610.990071896570823
620.996078416265654
630.998507503106759
640.999522026579879
650.999909873371905
661
670.99994354800767
700.970725343394151
710.988522467870286
720.995082098645899
730.997870827960502
740.999137118134003
750.999710773674355
760.99994354800767
771

Note that the results are returned in third-normal form. If we wanted to a more traditional (de-normalized) presentation of the results, we can us the PIVOT function. Also note that we can use * to select all the columns. We have rounded the results to 4 decimal place for ease of viewing.

SELECT ROUND([0], 4) as [0],
       ROUND([1], 4) as [1],
       ROUND([2], 4) as [2],
       ROUND([3], 4) as [3],
       ROUND([4], 4) as [4],
       ROUND([5], 4) as [5],
       ROUND([6], 4) as [6],
       ROUND([7], 4) as [7]
FROM
(
    SELECT *
    FROM wct.CORRM(
                      'SELECT
          *
        FROM (VALUES
           (1,1,1,1,1,1,1,1)
          ,(2,3,4,5,6,7,8,9)
          ,(4,9,16,25,36,49,64,81)
        )n(x1,x2,x3,x4,x5,x6,x7,x8)',
                      'False'
                  )
) d
PIVOT
(
    SUM(ItemValue)
    for ColNum in ([0], [1], [2], [3], [4], [5], [6], [7])
) as P;

This produces the following result.

01234567
10.99590.98970.98430.97990.97620.97320.9707
0.995910.99860.99630.99390.99190.99010.9885
0.98970.998610.99940.99830.99720.99610.9951
0.98430.99630.999410.99970.99920.99850.9979
0.97990.99390.99830.999710.99980.99950.9991
0.97620.99190.99720.99920.999810.99990.9997
0.97320.99010.99610.99850.99950.999910.9999
0.97070.98850.99510.99790.99910.99970.99991

Let's put several matrices into a table and calculate the correlation matrix for each. We will use CROSS APPLY to calculate to the correlation matrix for each matrix. Note that we have to convert the matrix identifier (which is defined as int) to a varchar to include it in the WHERE clause of @Matrix_RangeQuery.

CREATE TABLE #c
(
    Matrix int,
    rn int,
    x1 float,
    x2 float,
    x3 float,
    PRIMARY KEY (
                    Matrix,
                    rn
                )
);
INSERT INTO #c
VALUES
(100, 1, -11, -41, 36);
INSERT INTO #c
VALUES
(100, 2, -31, 41, -47);
INSERT INTO #c
VALUES
(100, 3, 48, -38, 33);
INSERT INTO #c
VALUES
(100, 4, 8, 44, -10);
INSERT INTO #c
VALUES
(101, 1, 39, 6, -7);
INSERT INTO #c
VALUES
(101, 2, 33, -49, 16);
INSERT INTO #c
VALUES
(101, 3, 14, 29, 13);
INSERT INTO #c
VALUES
(101, 4, 35, -38, -50);
INSERT INTO #c
VALUES
(101, 5, 9, -32, -25);
INSERT INTO #c
VALUES
(102, 1, 29, 49, -17);
INSERT INTO #c
VALUES
(102, 2, 35, 28, 28);
INSERT INTO #c
VALUES
(102, 3, -34, -29, -49);
INSERT INTO #c
VALUES
(102, 4, 0, -5, 0);
INSERT INTO #c
VALUES
(102, 5, -17, 14, 24);
INSERT INTO #c
VALUES
(102, 6, 44, 3, -23);
SELECT n.MATRIX,
       k.*
FROM
(SELECT DISTINCT MATRIX FROM #c) n
    CROSS APPLY wct.CORRM('SELECT
          x1,x2,x3
        FROM
          #c
       WHERE MATRIX = ' + cast(n.Matrix as varchar(max)) + ' ORDER by rn', 'False')
                 k;

This produces the following result.

MATRIXRowNumColNumItemValue
100001
10001-0.483356996796483
100020.633890426031688
10010-0.483356996796483
100111
10012-0.913359160170441
100200.633890426031688
10021-0.913359160170441
100221
101001
10101-0.230728768997873
10102-0.107306924177497
10110-0.230728768997873
101111
101120.374980979068184
10120-0.107306924177497
101210.374980979068184
101221
102001
102010.640154472423516
102020.23474788750615
102100.640154472423516
102111
102120.503721652740323
102200.23474788750615
102210.503721652740323
102221

In this example we calculate the correlation matrix from a derived table in 3rd normal form.

SELECT *
FROM wct.CORRM(
                  'SELECT
          rownum,colnum,itemvalue
        FROM (VALUES
               (0,0,1),(0,1,1),(0,2,1),(0,3,1),(0,4,1),(0,5,1),(0,6,1),(0,7,1)
              ,(1,0,2),(1,1,3),(1,2,4),(1,3,5),(1,4,6),(1,5,7),(1,6,8),(1,7,9)
              ,(2,0,4),(2,1,9),(2,2,16),(2,3,25),(2,4,36),(2,5,49),(2,6,64),(2,7,
                        81)
        )n(rownum,colnum,itemvalue)',
                  'True'
              );

This produces the following result.

RowNumColNumItemValue
001
010.995870594885822
020.989743318610787
030.98432413828809
040.9798637100972
050.97622103992743
060.973222701448379
070.970725343394151
100.995870594885822
111
120.998625428903524
130.996270962773436
140.993944095928862
150.991869783800371
160.990071896570823
170.988522467870286
200.989743318610787
210.998625428903524
221
230.999423797128766
240.998337488459583
250.997176464952738
260.996078416265654
270.995082098645899
300.98432413828809
310.996270962773436
320.999423797128766
331
340.999718640088218
350.999150742946594
360.998507503106759
370.997870827960502
400.9798637100972
410.993944095928862
420.998337488459583
430.999718640088218
441
450.999846989517886
460.999522026579879
470.999137118134003
500.97622103992743
510.991869783800371
520.997176464952738
530.999150742946594
540.999846989517886
551
560.999909873371905
570.999710773674355
600.973222701448379
610.990071896570823
620.996078416265654
630.998507503106759
640.999522026579879
650.999909873371905
661
670.99994354800767
700.970725343394151
710.988522467870286
720.995082098645899
730.997870827960502
740.999137118134003
750.999710773674355
760.99994354800767
771

See Also

MCORR - Correlation matrix using a formatted matrix as input

COVM - Table-valued function to calculate the covariance matrix

CORREL - Aggregate function to calculate the correlation coefficient

PEARSON - Aggregate function to calculate the correlation coefficient