Logo

SQL Server POLYFIT Function

Updated 2024-03-06 21:30:48.917000

Description

Use the table-valued function POLYFIT for calculating the coefficients of a polynomial p(x) of degree n that fits the x- and y-values supplied to the function. The result is a two-column table having n+1 rows, containing the polynomial coefficients in descending powers:

y=p_1x^n+p_2x^{n-1}+\dots+p_nx^1+p_{n+1}x^0

Syntax

SELECT * FROM [westclintech].[wct].[POLYFIT](
  <@TableName, nvarchar(max),>
 ,<@x_ColumnName, nvarchar(4000),>
 ,<@y_ColumnName, nvarchar(4000),>
 ,<@GroupedColumnName, nvarchar(4000),>
 ,<@Matrix_GroupedColumnValue, sql_variant,>
 ,<@n, int,>)

Arguments

@x_ColumnName

the name, as text, of the column in the table or view specified by @TableName that contains the x-values to be used in the POLYFIT calculation. Data returned from the @x_ColumnName must be of the type float or of a type that implicitly converts to float.

@y_ColumnName

the name, as text, of the column in the table or view specified by @TableName that contains the y-values to be used in the POLYFIT calculation. Data returned from the @y_ColumnName must be of the type float or of a type that implicitly converts to float.

@GroupedColumnName

the name, as text, of the column in the table or view specified by @TableName which will be used for grouping the results.

@n

an integer specifying the degree of the polynomial.

Return Type

table

colNamecolDatatypecolDesc
coe_numintThe index-number of the coeeficient
coe_valintThe value of the coefficient

Remarks

The x- and y-values are passed to the function as pairs

If x is NULL or y is NULL, the pair is not used in the calculation.

@n must be less than the number of rows in @TableName for the @GroupedColumnValue.

Use the POLYFIT_q function for more complicated queries.

Use the POLYVAL function to evaluate the polynomial for an x-value.

Examples

In this example, we will use the SeriesFloat function to generate a series of x-values equally spaced in the interval [0, 2.5] and then evaluate the error function, ERF, at those points. We will specify an approximating polynomial of 6 degrees.

SET NOCOUNT ON;
SELECT SeriesValue as x,
       wct.ERF(SeriesValue) as y
INTO #erf
FROM wct.SeriesFloat(0, 2.5, 0.1, NULL, NULL);
SELECT *
FROM wct.POLYFIT(   '#erf', --@TableName
                    'x',    --@x_ColumnName
                    'y',    --@y_ColumnName
                    '',     --@GroupedColumnName
                    NULL,   --@GroupedColumnValue
                    6       --@n
                );
DROP TABLE #erf;

This produces the following result.

coe_numcoe_val
70.000441173961986879
61.10644604462551
50.147104056632884
4-0.743462849129463
30.421736169817827
2-0.0982995753366962
10.00841937177922449

This means that there are 7 coefficients. We could run the following SQL to see the structure of the approximating polynomial.

SET NOCOUNT ON;
SELECT SeriesValue as x,
       wct.ERF(SeriesValue) as y
INTO #erf
FROM wct.SeriesFloat(0, 2.5, 0.1, NULL, NULL);
SELECT *,
       ROW_NUMBER() OVER (ORDER BY coe_num DESC) - 1 as pow
FROM wct.POLYFIT(   '#erf', --@TableName
                    'x',    --@x_ColumnName
                    'y',    --@y_ColumnName
                    '',     --@GroupedColumnName
                    NULL,   --@GroupedColumnValue
                    6       --@n
                );
DROP TABLE #erf;

This produces the following result.

coe_numcoe_valpow
70.0004411739619868790
61.106446044625511
50.1471040566328842
4-0.7434628491294633
30.4217361698178274
2-0.09829957533669625
10.008419371779224496

Thus, the approximating polynomial would be:

0.00841937x 6 - 0.0982996x5 + 0.421736x4 - 0.743463x3 + 0.147104x2 + 1.10645x + 0.000441174

We can inspect the fit of the approximating polynomial by creating a table of the x- and y-values and evaluating the approximating polynomial for each x and comparing it to the actual y-values. While we could do this using the output of the POLYFIT_q table-valued function, it is much simpler to just use the POLYVAL function.

SET NOCOUNT ON;
SELECT SeriesValue as x,
       wct.ERF(SeriesValue) as y
INTO #erf
FROM wct.SeriesFloat(0, 2.5, 0.1, NULL, NULL);
SELECT a.x,
       ROUND(a.y, 8) as y,
       ROUND(wct.POLYVAL(b.x, b.y, 6, a.x), 8) as f,
       ROUND(wct.POLYVAL(b.x, b.y, 6, a.x) - a.y, 8) as [f - y]
FROM #erf a,
     #erf b
GROUP BY a.x,
         a.y
ORDER BY 1;
DROP TABLE #erf;

This produces the following result.

xyff - y
000.000441170.00044117
0.10.112462920.11185456-0.00060836
0.20.222702590.2223107-0.00039189
0.30.328626760.328724199.743E-05
0.40.428392360.428798960.00040661
0.50.520499880.520925560.00042568
0.60.603856090.604084330.00022824
0.70.677801190.67775481-4.638E-05
0.80.742100960.74183105-0.00026992
0.90.796908210.79654307-0.00036515
10.842700790.84238439-0.0003164
1.10.880205070.88004559-0.00015948
1.20.910313980.91035393.992E-05
1.30.934007940.934218940.000211
1.40.952285120.952584450.00029933
1.50.966105150.966386120.00028097
1.60.976348380.976515430.00016704
1.70.983790460.98378963-8.3E-07
1.80.98909050.98892772-0.00016278
1.90.992790430.99253252-0.00025791
20.995322270.9950788-0.00024347
2.10.997020530.99690744-0.0001131
2.20.998137150.99822578.855E-05
2.30.998856820.999113550.00025673
2.40.999311490.999535990.00022451
2.50.999593050.99936154-0.00023151

It looks like the approximating polynomial is accurate to about 3 or 4 decimal places. In fact if we graphed the results, they would look like this:

http://westclintech.com/Portals/0/images/doc_math_POLYFIT_img2.jpg

We can see that in this range, [0, 2.5] the fit between the y-values and the f-values is quite good. However, what if we extend the range of the interval from 2.5 to 6? We will simply change SeriesFloat to stop at 6.0 rather than 2.5 and then graph the output.

SELECT SeriesValue as x,
       wct.ERF(SeriesValue) as y
INTO #erf
FROM wct.SeriesFloat(0, 6.0, 0.1, NULL, NULL);
SELECT a.x,
       a.y,
       wct.POLYVAL(b.x, b.y, 6, a.x) as f,
       wct.POLYVAL(b.x, b.y, 6, a.x) - a.y as [f - y]
FROM #erf a,
     #erf b
GROUP BY a.x,
         a.y
ORDER BY 1;

This produces the following graph.

http://westclintech.com/Portals/0/images/doc_math_POLYFIT_img3.jpg

It's important to note that coefficients in the interval [0, 6] are different than the coefficients in the interval [0, 2.5] because there were more x- and y-values passed to the function. What if we wanted to use the coefficients from the [0, 2.5] interval to predict the values in the [0, 6] interval?

SELECT SeriesValue as x,
       wct.ERF(SeriesValue) as y
INTO #erf
FROM wct.SeriesFloat(0, 2.5, 0.1, NULL, NULL);
SELECT SeriesValue as x,
       wct.ERF(SeriesValue) as y,
       wct.POLYVAL(b.x, b.y, 6, a.SeriesValue) as f,
       wct.POLYVAL(b.x, b.y, 6, a.SeriesValue) - wct.ERF(SeriesValue) as [f - y]
FROM wct.SeriesFloat(0, 6.0, 0.1, NULL, NULL) a ,
     #erf b
GROUP BY a.SeriesValue,
         wct.ERF(SeriesValue)
ORDER BY 1;
DROP TABLE #erf;

This produces the following graph.

http://westclintech.com/Portals/0/images/doc_math_POLYFIT_img4.jpg

As you can see, the polynomial approximation quickly diverges from the function in the region above 2.5.

Let's look at a different example. In this example we will look at some interest rate values over a time-horizon and calculate the 3rd degree polynomial approximation. The x-values represent the number of years and the y-values are the decimal values of the interest rate (.01 = 1%).

SET NOCOUNT ON;
SELECT *
INTO #y
from
(
    VALUES
        (1, 0.0028),
        (2, 0.0056),
        (3, 0.0085),
        (5, 0.0164),
        (7, 0.0235),
        (10, 0.0299),
        (20, 0.0382),
        (30, 0.0406)
) n (x, y);
SELECT *
FROM wct.POLYFIT('#y', 'x', 'y', '', NULL, 3);
DROP TABLE #y;

This produces the following result.

coe_numcoe_val
4-0.00325347316656217
30.0049703228323151
2-0.000198230933179599
12.70683303153525E-06

While we would always recommend using the POLYVAL function to calculate the polynomial approximation for any x-value, in this example we will use the results of the POLYFIT_q TVF to calculate the rates for each year from 1 through 30 out to 4 decimal places.

SET NOCOUNT ON;
SELECT *
INTO #y
from
(
    VALUES
        (1, 0.0028),
        (2, 0.0056),
        (3, 0.0085),
        (5, 0.0164),
        (7, 0.0235),
        (10, 0.0299),
        (20, 0.0382),
        (30, 0.0406)
) n (x, y);
SELECT *,
       ROW_NUMBER() OVER (order by coe_num DESC) - 1 as pow
into #coe
FROM wct.POLYFIT('#y', 'x', 'y', '', NULL, 3);
SELECT SeriesValue as y,
       ROUND(SUM(POWER(SeriesValue, pow) * coe_val), 4) as r
FROM wct.SeriesInt(1, 30, NULL, NULL, NULL) ,
     #coe
GROUP BY SeriesValue
ORDER BY 1;
DROP TABLE #coe;
DROP TABLE #y;

This produces the following result.

yr
10.0015
20.0059
30.0099
40.0136
50.017
60.02
70.0228
80.0252
90.0274
100.0293
110.031
120.0325
130.0338
140.0349
150.0358
160.0366
170.0373
180.0378
190.0382
200.0385
210.0388
220.039
230.0391
240.0393
250.0394
260.0395
270.0397
280.0399
290.0402
300.0405