Logo

SQL Server COVARIANCE_S Function

Updated 2023-10-23 19:33:31.550000

Description

Use the aggregate function COVARIANCE_S to calculate the sample covariance for a set of x- and y-values. The formula for the sample covariance is

s_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}

Syntax

SELECT [westclintech].[wct].[COVARIANCE_S] (
  <@known_y, float,>
 ,<@known_x, float,>)

Arguments

@known_y

the y-values to be used in the calculation. @Known_y is an expression of type float or of a type that can be implicitly converted to float.

@known_x

the x-values to be used in the calculation. @Known_x is an expression of type float or of a type that can be implicitly converted to float.

Return Type

float

Remarks

COVARIANCE_S is an AGGREGATE function and follows the same conventions as all other AGGREGATE functions in SQL Server.

Use COVARIANCE_P to calculate the population covariance.

Examples

In this example, we calculate the sample covariance for a single set of x- and y-values

SELECT wct.COVARIANCE_S(y, x) as COVARIANCE_S
FROM
(
    SELECT 0.75,
           1
    UNION ALL
    SELECT 2.5,
           2
    UNION ALL
    SELECT 6.75,
           3
    UNION ALL
    SELECT 10,
           4
) n(x, y);

This produces the following result

COVARIANCE_S
5.33333333333333

In this example, we will populate some temporary table with some historical financial information and then calculate the sample covariance. First, create the table and put some data in it:

CREATE TABLE #c
(
    SYM NVARCHAR(5),
    YE BIGINT,
    REV FLOAT,
    GPROF FLOAT,
    OPINC FLOAT,
    NETINC FLOAT
);
INSERT INTO #c
VALUES
('YHOO', 2009, 6460.32, 3588.57, 386.69, 597.99);
INSERT INTO #c
VALUES
('YHOO', 2008, 72.5, 4185.14, 12.96, 418.92);
INSERT INTO #c
VALUES
('YHOO', 2007, 6969.27, 4130.52, 695.41, 639.16);
INSERT INTO #c
VALUES
('YHOO', 2006, 6425.68, 3749.96, 940.97, 751.39);
INSERT INTO #c
VALUES
('YHOO', 2005, 5257.67, 3161.47, 1107.73, 1896.23);
INSERT INTO #c
VALUES
('GOOG', 2009, 23650.56, 14806.45, 8312.19, 6520.45);
INSERT INTO #c
VALUES
('GOOG', 2008, 21795.55, 13174.04, 5537.21, 4226.86);
INSERT INTO #c
VALUES
('GOOG', 2007, 16593.99, 9944.9, 54.44, 4203.72);
INSERT INTO #c
VALUES
('GOOG', 2006, 10604.92, 6379.89, 3550, 3077.45);
INSERT INTO #c
VALUES
('GOOG', 2005, 6138.56, 3561.47, 2017.28, 1465.4);
INSERT INTO #c
VALUES
('MSFT', 2010, 62484, 509, 24167, 18760);
INSERT INTO #c
VALUES
('MSFT', 2009, 58437, 46282, 21225, 14569);
INSERT INTO #c
VALUES
('MSFT', 2008, 60420, 48822, 22271, 17681);
INSERT INTO #c
VALUES
('MSFT', 2007, 51122, 40429, 18438, 14065);
INSERT INTO #c
VALUES
('MSFT', 2006, 44282, 36632, 16064, 12599);
INSERT INTO #c
VALUES
('ORCL', 2010, 26820, 21056, 9062, 6135);
INSERT INTO #c
VALUES
('ORCL', 2009, 23252, 18458, 8321, 5593);
INSERT INTO #c
VALUES
('ORCL', 2008, 22430, 17449, 7844, 5521);
INSERT INTO #c
VALUES
('ORCL', 2007, 17996, 13805, 5974, 4274);
INSERT INTO #c
VALUES
('ORCL', 2006, 14380, 11145, 4736, 3381);
INSERT INTO #c
VALUES
('SAP', 2009, 10672, 6980, 2588, 1748);
INSERT INTO #c
VALUES
('SAP', 2008, 11575, 7370, 2701, 1847);
INSERT INTO #c
VALUES
('SAP', 2007, 10256, 6631, 2698, 1906);
INSERT INTO #c
VALUES
('SAP', 2006, 9393, 6064, 2578, 1871);
INSERT INTO #c
VALUES
('SAP', 2005, 8509, 5460, 2337, 1496);

Now, calculate the population covariance of the revenue (REV) againt the year (YE) for each company (SYM)

SELECT #c.SYM
,wct.COVARIANCE_S(REV,YE) as COVARIANCE_S
FROM #c
GROUP BY SYM;

This produces the following result.

SYM             COVARIANCE_S
----- ----------------------
GOOG              11553.6575
MSFT                10929.75
ORCL                    7534
SAP                     1627
YHOO                 -986.97

Let's say we wanted to perform the same analysis as above, but we only want to return the results where the covariance is positive.

SELECT #c.SYM,
       wct.COVARIANCE_S(REV, YE) as COVARIANCE_S
FROM #c
GROUP BY SYM
HAVING wct.COVARIANCE_S(REV, YE) > 0;

This produces the following result.

SYM             COVARIANCE_S
----- ----------------------
GOOG              11553.6575
MSFT                10929.75
ORCL                    7534
SAP                     1627

In this example, we will calculate the correlation of the operating income (OPINC) against the revenue (REV)

SELECT #c.SYM,
       wct.COVARIANCE_S(OPINC, REV) as COVARIANCE_S
FROM #c
GROUP BY SYM;

This produces the following result.

SYMCOVARIANCE_S
GOOG15351903.708095
MSFT23798218
ORCL8568824.7
SAP147381
YHOO845158.97983