Normal-inverse-gamma distribution

The probability density function of the normal-inverse-gamma distribution \(\mathcal{N}\Gamma^{-1}(\mu, \lambda, \alpha, \beta)\) with location parameter \(\mu\), variance scale parameter \(\lambda > 0\), shape parameter \(\alpha > 0\) and scale parameter \(\beta > 0\), for \(x \in \mathbb{R}\) and \(\sigma^2 \in \mathbb{R}^+\), is given by

\[f(x,\sigma^2; \mu,\lambda,\alpha,\beta) = \frac {\sqrt{\lambda}} {\sigma\sqrt{2\pi} } \, \frac{\beta^\alpha}{\Gamma(\alpha)} \, \left( \frac{1}{\sigma^2} \right)^{\alpha + 1} \exp \left( -\frac { 2\beta + \lambda(x - \mu)^2} {2\sigma^2} \right),\]
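As a minimal sketch, the density above can be evaluated directly with the standard library. The function names `nig_pdf` and `normal_times_inverse_gamma` are hypothetical and are not part of cprior; the second function only illustrates that the same density factors into a normal density for \(x\) with variance \(\sigma^2/\lambda\) and an inverse-gamma density for \(\sigma^2\).

```python
import math


def nig_pdf(x, sig2, mu, lam, alpha, beta):
    """Normal-inverse-gamma density at (x, sig2), computed in log space."""
    if sig2 <= 0 or lam <= 0 or alpha <= 0 or beta <= 0:
        raise ValueError("sig2, lam, alpha and beta must be positive")
    log_pdf = (
        0.5 * math.log(lam)
        - 0.5 * math.log(2.0 * math.pi * sig2)
        + alpha * math.log(beta)
        - math.lgamma(alpha)
        - (alpha + 1.0) * math.log(sig2)
        - (2.0 * beta + lam * (x - mu) ** 2) / (2.0 * sig2)
    )
    return math.exp(log_pdf)


def normal_times_inverse_gamma(x, sig2, mu, lam, alpha, beta):
    """Same density written as N(x | mu, sig2 / lam) * InvGamma(sig2 | alpha, beta)."""
    var = sig2 / lam
    normal = math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    inv_gamma = (
        beta ** alpha / math.gamma(alpha) * sig2 ** (-alpha - 1.0) * math.exp(-beta / sig2)
    )
    return normal * inv_gamma
```

Working in log space avoids overflow in \(\beta^\alpha\) and \(\Gamma(\alpha)\) when the shape parameter is large.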

and the cumulative distribution function is

\[F(x,\sigma^2; \mu,\lambda,\alpha,\beta) = \frac{e^{-\frac{\beta}{\sigma^2}} \left(\frac{\beta }{\sigma ^2}\right)^\alpha \left(\operatorname{erf}\left(\frac{\sqrt{\lambda} (x-\mu )}{\sqrt{2} \sigma }\right)+1\right)}{2 \sigma^2 \Gamma (\alpha)}.\]
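The expression above can be read as the inverse-gamma density of \(\sigma^2\) multiplied by the normal cumulative distribution function of \(x\). The sketch below evaluates it as written; `nig_cdf_expression` is a hypothetical helper name, not a cprior function.

```python
import math


def nig_cdf_expression(x, sig2, mu, lam, alpha, beta):
    """Evaluate the expression F(x, sig2; mu, lam, alpha, beta) given above."""
    if sig2 <= 0 or lam <= 0 or alpha <= 0 or beta <= 0:
        raise ValueError("sig2, lam, alpha and beta must be positive")
    ratio = beta / sig2
    # log of exp(-beta/sig2) * (beta/sig2)^alpha / (2 * sig2 * Gamma(alpha))
    log_front = -ratio + alpha * math.log(ratio) - math.lgamma(alpha) - math.log(2.0 * sig2)
    z = math.sqrt(lam) * (x - mu) / math.sqrt(2.0 * sig2)
    return math.exp(log_front) * (math.erf(z) + 1.0)
```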

The expected value and variance are as follows:

\[ \begin{align}\begin{aligned}\mathrm{E}[x] &= \mu, \quad \mathrm{E}[\sigma^2] = \frac{\beta}{\alpha-1}, \; \alpha > 1.\\\mathrm{Var}[x] &= \frac{\beta}{(\alpha - 1)\lambda}, \; \alpha > 1, \quad \mathrm{Var}[\sigma^2] = \frac{\beta^2}{(\alpha-1)^2(\alpha - 2)}, \; \alpha > 2.\end{aligned}\end{align} \]
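The closed-form moments translate into a few lines of code. This is an illustrative sketch with hypothetical function names (`nig_mean`, `nig_var`); it enforces the conditions \(\alpha > 1\) and \(\alpha > 2\) stated above.

```python
def nig_mean(mu, lam, alpha, beta):
    """Return (E[x], E[sig2]); requires alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean of sig2 requires alpha > 1")
    return mu, beta / (alpha - 1.0)


def nig_var(mu, lam, alpha, beta):
    """Return (Var[x], Var[sig2]); requires alpha > 2."""
    if alpha <= 2:
        raise ValueError("the variance of sig2 requires alpha > 2")
    var_x = beta / ((alpha - 1.0) * lam)
    var_sig2 = beta ** 2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    return var_x, var_sig2
```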

The normal-inverse-gamma distribution is used as a conjugate prior distribution for the normal distribution with unknown mean and variance.
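Conjugacy means that observing normally distributed data turns a normal-inverse-gamma prior into a normal-inverse-gamma posterior with updated parameters. The sketch below uses the standard textbook update rules; the function name `nig_update` is hypothetical, and this is not necessarily how cprior implements its update methods.

```python
def nig_update(mu0, lam0, alpha0, beta0, data):
    """Textbook conjugate update of (mu, lambda, alpha, beta) given samples."""
    n = len(data)
    if n == 0:
        return mu0, lam0, alpha0, beta0
    xbar = sum(data) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - xbar) ** 2 for x in data)
    lam_n = lam0 + n
    mu_n = (lam0 * mu0 + n * xbar) / lam_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + 0.5 * lam0 * n * (xbar - mu0) ** 2 / lam_n
    return mu_n, lam_n, alpha_n, beta_n
```

Updating with two batches in sequence gives the same posterior parameters as a single update with all samples, which is what makes incremental updates possible.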

class cprior.cdist.NormalInverseGammaModel(name='', loc=0.001, variance_scale=0.001, shape=0.001, scale=0.001)

Bases: cprior.cdist.base.BayesModel

Normal-inverse-gamma prior distribution model.

Parameters:	name (str, optional (default="")) – Model name. loc (float, optional (default=0.001)) – Prior parameter location. variance_scale (float, optional (default=0.001)) – Prior parameter variance scale. shape (float, optional (default=0.001)) – Prior parameter shape. scale (float, optional (default=0.001)) – Prior parameter scale.

cdf(x, sig2)

Cumulative distribution function of the posterior distribution.

Parameters:	x (array-like) – Quantiles. sig2 (array-like) – Quantiles.
Returns:	cdf – Cumulative distribution function evaluated at (x, sig2).
Return type:	numpy.ndarray

credible_interval(interval_length)

Credible interval of the posterior distribution.

Parameters:	interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields a 90% credible interval.
Returns:	interval – Lower and upper credible interval limits.
Return type:	tuple

loc_posterior

Posterior parameter mu (location).

Returns:	mu
Return type:	float

mean()

Mean of the posterior distribution.

Returns:	mean
Return type:	tuple of floats

pdf(x, sig2)

Probability density function of the posterior distribution.

Parameters:	x (array-like) – Quantiles. sig2 (array-like) – Quantiles.
Returns:	pdf – Probability density function evaluated at (x, sig2).
Return type:	numpy.ndarray

ppf(q)

Percent point function (quantile) of the posterior distribution.

Parameters:	q (array-like) – Lower tail probability.
Returns:	ppf – Quantile corresponding to the lower tail probability q.
Return type:	tuple of numpy.ndarray

rvs(size=1, random_state=None)

Random variates of the posterior distribution.

Parameters:	size (int (default=1)) – Number of random variates. random_state (int or None (default=None)) – The seed used by the random number generator.
Returns:	rvs – Random variates, an array of shape (size, 2).
Return type:	numpy.ndarray
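One common way to draw from a normal-inverse-gamma distribution is to sample \(\sigma^2\) from an inverse-gamma distribution and then sample \(x\) from a normal distribution with variance \(\sigma^2/\lambda\). The sketch below follows that two-step scheme with the standard library only; `nig_rvs` and its `seed` argument are hypothetical, and the source does not state how cprior generates its variates.

```python
import random


def nig_rvs(mu, lam, alpha, beta, size=1, seed=None):
    """Draw `size` pairs (x, sig2) from a normal-inverse-gamma distribution."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        # Gamma(shape=alpha, rate=beta) is gammavariate(alpha, scale=1/beta);
        # its reciprocal follows InvGamma(alpha, beta).
        sig2 = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
        x = rng.gauss(mu, (sig2 / lam) ** 0.5)
        draws.append((x, sig2))
    return draws
```

Each returned pair corresponds to one row of an array of shape (size, 2).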

scale_posterior

Posterior parameter beta (scale).

Returns:	beta
Return type:	float

shape_posterior

Posterior parameter alpha (shape).

Returns:	alpha
Return type:	float

std()

Standard deviation of the posterior distribution.

Returns:	std
Return type:	tuple of floats

var()

Variance of the posterior distribution.

Returns:	var
Return type:	tuple of floats

variance_scale_posterior

Posterior parameter lambda (variance_scale).

Returns:	lambda
Return type:	float

class cprior.cdist.NormalInverseGammaABTest(modelA, modelB, simulations=1000000, random_state=None)

Bases: cprior.cdist.base.BayesABTest

Bayesian A/B testing with prior normal-inverse-gamma distribution.

Parameters:	modelA (object) – The normal-inverse-gamma model for variant A. modelB (object) – The normal-inverse-gamma model for variant B. simulations (int or None (default=1000000)) – Number of Monte Carlo simulations. random_state (int or None (default=None)) – The seed used by the random number generator.

expected_loss(method='exact', variant='A', lift=0)

Compute the expected loss. This is the expected uplift lost by choosing a given variant.

If variant == "A", \(\mathrm{E}[\max(B - A - lift, 0)]\)
If variant == "B", \(\mathrm{E}[\max(A - B - lift, 0)]\)
If variant == "all", both.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. lift (float (default=0.0)) – The amount of uplift.
Returns:	expected_loss
Return type:	tuple of floats

Notes

Method “exact” uses the normal approximation of the Student’s t-distribution for the expected loss of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.
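To make the definition \(\mathrm{E}[\max(B - A - lift, 0)]\) concrete, here is a minimal Monte Carlo sketch for variant A. It assumes that A and B denote the posterior means, each drawn from its own normal-inverse-gamma posterior given as a tuple `(mu, lam, alpha, beta)`. The helper names are hypothetical and this is not cprior's implementation.

```python
import random


def sample_x(rng, mu, lam, alpha, beta):
    """Draw the mean component x from a normal-inverse-gamma distribution."""
    sig2 = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    return rng.gauss(mu, (sig2 / lam) ** 0.5)


def expected_loss_mc(params_a, params_b, lift=0.0, simulations=100000, seed=None):
    """Monte Carlo estimate of E[max(B - A - lift, 0)], the loss of choosing A."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = sample_x(rng, *params_a)
        b = sample_x(rng, *params_b)
        total += max(b - a - lift, 0.0)
    return total / simulations
```

Swapping the two parameter tuples gives the loss of choosing B, which corresponds to the `variant="B"` case.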

expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of \(Z = B-A\) and/or \(Z = A-B\).

If variant == "A", \(Z = B - A\)
If variant == "B", \(Z = A - B\)
If variant == "all", both.

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields a 90% credible interval. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (`ci_method="HDI"`) and equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_ci
Return type:	tuple of floats
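The two interval types differ in how they place the interval: an equal-tailed interval cuts the same probability mass from each tail, whereas a highest density interval is the shortest interval containing the requested mass. The following sketch computes both from Monte Carlo samples of \(Z\); the function names are hypothetical and the index conventions are one simple choice among several.

```python
import math


def equal_tailed_interval(samples, interval_length=0.9):
    """Interval between the lower and upper tail quantiles of the samples."""
    s = sorted(samples)
    n = len(s)
    tail = (1.0 - interval_length) / 2.0
    lower = s[int(math.floor(tail * (n - 1)))]
    upper = s[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


def highest_density_interval(samples, interval_length=0.9):
    """Shortest window of sorted samples covering `interval_length` of them."""
    s = sorted(samples)
    n = len(s)
    window = min(n, max(1, int(math.ceil(interval_length * n))))
    # Slide a fixed-size window over the sorted samples and keep the narrowest.
    start = min(range(n - window + 1), key=lambda i: s[i + window - 1] - s[i])
    return s[start], s[start + window - 1]
```

For a symmetric, unimodal distribution the two intervals nearly coincide; for a skewed one the highest density interval shifts toward the mode and is shorter.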

expected_loss_relative(method='exact', variant='A')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.

If variant == "A", \(\mathrm{E}[(B - A) / A]\)
If variant == "B", \(\mathrm{E}[(A - B) / B]\)
If variant == "all", both.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
Returns:	expected_loss_relative
Return type:	tuple of floats

Notes

Method “exact” uses an approximation of \(E[1/X]\) where \(X\) follows a Student’s t-distribution.
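The source does not say which approximation of \(E[1/X]\) is used. As an illustration only, the sketch below uses a second-order Taylor (delta-method) approximation, \(E[1/X] \approx 1/m + v/m^3\) for mean \(m\) and variance \(v\), and combines it with the independence of A and B so that \(\mathrm{E}[(B - A)/A] = \mathrm{E}[B]\,\mathrm{E}[1/A] - 1\). All names are hypothetical, and the approximation is only reasonable when the mean of A is far from zero relative to its standard deviation.

```python
def inverse_mean_approx(mean, var):
    """Second-order Taylor approximation of E[1/X]."""
    return 1.0 / mean + var / mean ** 3


def expected_loss_relative_approx(params_a, params_b):
    """Approximate E[(B - A) / A] for independent posterior means A and B.

    Each params tuple is (mu, lam, alpha, beta) with alpha > 1.
    """
    mu_a, lam_a, alpha_a, beta_a = params_a
    mu_b = params_b[0]
    # Marginal variance of the mean component of A.
    var_a = beta_a / ((alpha_a - 1.0) * lam_a)
    return mu_b * inverse_mean_approx(mu_a, var_a) - 1.0
```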

expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of \(Z = (B-A)/A\) and/or \(Z = (A-B)/B\).

If variant == "A", \(Z = (B-A)/A\)
If variant == "B", \(Z = (A-B)/B\)
If variant == "all", both.

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields a 90% credible interval. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (`ci_method="HDI"`) and equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_relative_ci
Return type:	tuple of floats

Notes

Method “exact” uses the normal approximation of the Student’s t-distribution for the expected loss of the mean.

probability(method='exact', variant='A', lift=0)

Compute the error probability or chance to beat control.

If variant == "A", \(P[A > B + lift]\)
If variant == "B", \(P[B > A + lift]\)
If variant == "all", both.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. lift (float (default=0.0)) – The amount of uplift.
Returns:	probability
Return type:	tuple of floats

Notes

Method “exact” uses the normal approximation of the Student’s t-distribution for the error probability of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.
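A rough sketch of the large-degrees-of-freedom case for zero lift: if each posterior mean is treated as approximately normal with mean \(\mu\) and variance \(\beta / ((\alpha - 1)\lambda)\), then \(P[A > B]\) reduces to a single normal CDF evaluation. The function name is hypothetical, the tuples are `(mu, lam, alpha, beta)`, and the exact thresholds and formulas used by cprior are not given in the source.

```python
from statistics import NormalDist


def probability_normal_approx(params_a, params_b):
    """Approximate P[A > B] by treating both posterior means as normal.

    Requires alpha > 1 in both parameter tuples.
    """
    mu_a, lam_a, alpha_a, beta_a = params_a
    mu_b, lam_b, alpha_b, beta_b = params_b
    var_a = beta_a / ((alpha_a - 1.0) * lam_a)
    var_b = beta_b / ((alpha_b - 1.0) * lam_b)
    # A - B is approximately normal with the summed variance.
    z = (mu_a - mu_b) / (var_a + var_b) ** 0.5
    return NormalDist().cdf(z)
```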

update_A(data)

Update posterior parameters for variant A with new data samples.

Parameters:	data (array-like, shape = (n_samples)) –

update_B(data)

Update posterior parameters for variant B with new data samples.

Parameters:	data (array-like, shape = (n_samples)) –

class cprior.cdist.NormalInverseGammaMVTest(models, simulations=1000000, random_state=None, n_jobs=None)

Bases: cprior.cdist.base.BayesMVTest

Bayesian multivariate testing with prior normal-inverse-gamma distribution.

Parameters:	models (object) – The normal-inverse-gamma models. simulations (int or None (default=1000000)) – Number of Monte Carlo simulations. random_state (int or None (default=None)) – The seed used by the random number generator.

expected_loss(method='exact', control='A', variant='B', lift=0)

Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., \(\mathrm{E}[\max(control - variant - lift, 0)]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. lift (float (default=0.0)) – The amount of uplift.
Returns:	expected_loss
Return type:	tuple of floats

Notes

Method “exact” uses the normal approximation of the Student’s t-distribution for the expected loss of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.

expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of \(Z = control-variant\).

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields a 90% credible interval. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (`ci_method="HDI"`) and equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_ci
Return type:	tuple of floats

expected_loss_relative(method='exact', control='A', variant='B')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., \(\mathrm{E}[(control - variant) / variant]\).

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant.
Returns:	expected_loss_relative
Return type:	tuple of floats

Notes

Method “exact” uses an approximation of \(E[1/X]\) where \(X\) follows a Student’s t-distribution.

expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of \(Z = (control - variant) / variant\).

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields a 90% credible interval. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (`ci_method="HDI"`) and equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_relative_ci
Return type:	tuple of floats

expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)

Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[(\max(A, C, D) - B) / B]\).

Parameters:	method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration). variant (str (default="B")) – The chosen variant. mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:	expected_loss_relative_vs_all
Return type:	tuple of floats

expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration). variant (str (default="B")) – The chosen variant. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:	expected_loss_vs_all
Return type:	tuple of floats

probability(method='exact', control='A', variant='B', lift=0)

Compute the error probability or chance to beat control, i.e., \(P[variant > control + lift]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. lift (float (default=0.0)) – The amount of uplift.
Returns:	probability
Return type:	tuple of floats

Notes

Method “exact” uses the normal approximation of the Student’s t-distribution for the error probability of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.

probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(P[B > \max(A, C, D) + lift]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration). variant (str (default="B")) – The chosen variant. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:	probability_vs_all
Return type:	tuple of floats
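The "vs all" quantities compare one variant against the best of the remaining ones in each simulated draw. A minimal Monte Carlo sketch, assuming a hypothetical `models` dictionary that maps variant names to `(mu, lam, alpha, beta)` posterior parameters, could look as follows. It estimates \(P[B > \max(A, C, D) + lift]\) and \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\) together; none of the names below belong to cprior.

```python
import random


def sample_mean_component(rng, mu, lam, alpha, beta):
    """Draw the mean component x from a normal-inverse-gamma distribution."""
    sig2 = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    return rng.gauss(mu, (sig2 / lam) ** 0.5)


def vs_all_mc(models, variant, lift=0.0, simulations=100000, seed=None):
    """Return (probability to beat all, expected loss vs all) for `variant`."""
    rng = random.Random(seed)
    others = [name for name in models if name != variant]
    wins = 0
    loss = 0.0
    for _ in range(simulations):
        chosen = sample_mean_component(rng, *models[variant])
        best_other = max(sample_mean_component(rng, *models[name]) for name in others)
        if chosen > best_other + lift:
            wins += 1
        loss += max(best_other - chosen, 0.0)
    return wins / simulations, loss / simulations
```

With three identical posteriors, each variant should beat the other two in roughly one third of the draws, which is a quick sanity check for the estimator.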

update(data, variant)

Update posterior parameters for a given variant with new data samples.

Parameters:	data (array-like, shape = (n_samples)) – variant (str) –