Logo

SQL Server HYPGEOM_DIST Function

Updated 2023-11-03 14:17:48.713000

Description

Use the scalar function HYPGEOM_DIST to calculate the probability mass function or the cumulative distributive function of the hypergeometric distribution. The hypergeometric distribution is generally described using colored balls in an urn. Given an urn containing N balls of which K are white (and N-K are black), the hypergeometric distribution calculates the probability of drawing k white balls from a sample of n without replacement (meaning that once a ball is removed from the urn it is not put back).

Syntax

SELECT [westclintech].[wct].[HYPGEOM_DIST](
  <@Sample_s, float,>
 ,<@Number_sample, float,>
 ,<@Population_s, float,>
 ,<@Number_population, float,>
 ,<@Cumulative, bit,>)

Arguments

@Sample_s

the number of successes in the sample. @Sample_s must be of a type float or of type that intrinsically converts to float.

@Number_sample

the size of the sample. @Number_sample must be of a type float or of type that intrinsically converts to float.

@Population_s

the number of successes in the population. @Population_s must be of a type float or of type that intrinsically converts to float.

@Number_population

the size of the population. @Number_population must be of a type float or of a type that intrinsically converts to float.

@Cumulative

a bit value identifying whether the probability mass function ( 'False') or the cumulative distribution function ( 'True') is to be returned. @Cumulative is of a type bit or a type that implicitly converts to bit.

Return Type

float

Remarks

@Sample_s, @Number_sample, @Population_s, and @Number_population are truncated; only the integer part is used.

0 ≤ @Sample_s

0 ≤ @Population_s

0 < @Number_sample

0 < @Number_population

@Population_s ≤ @Number_population.

Examples

In an urn with 1000 balls of which 300 are white we want to calculate the probability of having exactly 40 white balls when we randomly draw 150 from the urn.

SELECT wct.HYPGEOM_DIST(   40,   --@Sample_s
                           150,  --@Number_sample
                           300,  --@Population_s
                           1000, --@Number_population
                           'False'
                       ) as pmf;

This produces the following result.

pmf
0.0492662218155091

If we wanted to calculate the probability of having up to 40 balls we would use the following SQL.

SELECT wct.HYPGEOM_DIST(   40,   --@Sample_s
                           150,  --@Number_sample
                           300,  --@Population_s
                           1000, --@Number_population
                           'True'
                       ) as cdf;

This produces the following result.

                    cdf 
 ---------------------- 
      0.192865931527232 

One way to look at the cumulative distribution function is as the sum of probability mass functions, as shown in this example.

SELECT *,
       SUM(hgeom) OVER (ORDER BY sample_s) as cumulative
FROM
(
    SELECT seq - 1 as sample_s,
           wct.HYPGEOM_DIST(SeriesValue, 150, 300, 1000, 'False') hgeom
    FROM wct.SeriesInt(0, 40, NULL, NULL, NULL)
) n;

This produces the following result.

sample_shgeomcumulative
02.46777221211099E-262.46777221211099E-26
12.01542195181487E-242.04009967393598E-24
28.13306733471617E-238.33707730210977E-23
32.16215070361945E-212.24552147664055E-21
44.25980765204366E-204.48435979970771E-20
56.63394045011636E-197.08237643008713E-19
68.50619912331109E-189.2144367663198E-18
79.23617455975085E-171.01576182363828E-16
88.66905192381243E-169.68481374745071E-16
97.14477378192844E-158.11325515667351E-15
105.2349502329707E-146.04627574863805E-14
113.44415782626197E-134.04878540112577E-13
122.05153119867592E-122.4564097387885E-12
131.11403273616817E-111.35967371004702E-11
145.54744847434262E-116.90712218438964E-11
152.54600065909309E-103.23671287753205E-10
161.08168479857148E-091.40535608632469E-09
174.27062939043597E-095.67598547676066E-09
181.57220480855479E-082.13980335623086E-08
195.4133523072449E-087.55315566347576E-08
201.74799045422862E-072.5033060205762E-07
215.30621737419239E-077.80952339476859E-07
221.51761021546781E-062.29856255494467E-06
234.09762818933358E-066.39619074427825E-06
241.04638837160682E-051.68600744603464E-05
252.53142274859012E-054.21743019462477E-05
265.81047389040933E-050.000100279040850341
270.0001267197258492720.000226998766699613
280.0002629215073091710.000489920274008785
290.0005196109402986060.00100953121430739
300.0009792277208814220.00198875893518881
310.001761533404948860.00375029234013768
320.003027730123810430.00677802246394811
330.004976814669961380.0117548371339095
340.007829911033183690.0195847481670932
350.01179974284932880.031384491016422
360.0170457510112420.0484302420276639
370.02362028281386130.0720505248415252
380.03141654067960450.10346706552113
390.04013264419064070.14359970971177
400.04926622181550910.192865931527279

See Also

HYPGEOM_INV - Inverse of the hypergeometric distribution

NEGBINOM_DIST - Negative binomial distribution