Normal distribution
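The normal distribution, or Gaussian distribution, is a continuous probability distribution. The probability density function of a normal distribution with mean \(\mu\) and standard deviation \(\sigma\) for \(x \in \mathbb{R}\) is
\[f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),\]
and the cumulative distribution function is
\[F(x; \mu, \sigma) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right].\]
The expected value and variance are
\[\mathrm{E}[X] = \mu, \quad \mathrm{Var}[X] = \sigma^2.\]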
The normal distribution is used to model or approximate symmetric distributions concentrated around their mean.
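The density and cumulative distribution of \(\mathcal{N}(\mu, \sigma^2)\) can be evaluated with Python's standard library alone. This is a minimal sketch: `normal_pdf` and `normal_cdf` are illustrative names, not part of cprior, and the results are cross-checked against `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Distribution function 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Compare with the standard-library reference implementation.
ref = NormalDist(mu=1.5, sigma=2.0)
print(normal_pdf(0.0, 1.5, 2.0), ref.pdf(0.0))
print(normal_cdf(0.0, 1.5, 2.0), ref.cdf(0.0))
print(ref.mean, ref.variance)  # E[X] = mu, Var[X] = sigma^2
```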
class cprior.models.NormalModel(name='', loc=0.001, variance_scale=0.001, shape=0.001, scale=0.001)
Bases: cprior.cdist.normal_inverse_gamma.NormalInverseGammaModel
Bayesian model with a normal likelihood and a normal-inverse-gamma prior distribution.
Given data samples \(\mathbf{x} = (x_1, \ldots, x_n)\) from a normal distribution with mean \(\mu\) and variance \(\sigma^2\), the posterior distribution is
\[\mu, \sigma^2 | \mathbf{x} \sim \mathcal{N}\Gamma^{-1}\left(\mu_n, \lambda_n, \alpha_n, \beta_n\right),\]
where
\[ \begin{align}\begin{aligned}\mu_n &= \frac{\lambda \mu_0 + n \bar{x}}{\lambda + n},\\\lambda_n &= \lambda + n,\\\alpha_n &= \alpha + \frac{n}{2},\\\beta_n &= \beta + \frac{1}{2} \left(\sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n \lambda (\bar{x} - \mu_0)^2}{\lambda + n} \right),\end{aligned}\end{align} \]
with prior parameters \(\mu_0\) (loc), \(\lambda\) (variance_scale), \(\alpha\) (shape) and \(\beta\) (scale), and sample mean \(\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\).
Parameters: - name (str (default="")) – Model name.
 - loc (float (default=0.001)) – Prior parameter loc (\(\mu_0\)).
 - variance_scale (float (default=0.001)) – Prior parameter variance_scale (\(\lambda\)).
 - shape (float (default=0.001)) – Prior parameter shape (\(\alpha\)).
 - scale (float (default=0.001)) – Prior parameter scale (\(\beta\)).
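The conjugate update for \(\mu_n\), \(\lambda_n\), \(\alpha_n\) and \(\beta_n\) given in the class description can be sketched in plain Python. `NIGParams` and `nig_update` are hypothetical names used only for illustration; they are not cprior's internals, which may compute the same quantities differently.

```python
from typing import NamedTuple, Sequence


class NIGParams(NamedTuple):
    loc: float             # mu_0 (prior) or mu_n (posterior)
    variance_scale: float  # lambda or lambda_n
    shape: float           # alpha or alpha_n
    scale: float           # beta or beta_n


def nig_update(prior: NIGParams, data: Sequence[float]) -> NIGParams:
    """Normal-inverse-gamma conjugate update for normally distributed data."""
    n = len(data)
    if n == 0:
        return prior
    mu0, lam, alpha, beta = prior
    xbar = sum(data) / n                     # sample mean
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    return NIGParams(
        loc=(lam * mu0 + n * xbar) / (lam + n),
        variance_scale=lam + n,
        shape=alpha + n / 2,
        scale=beta + 0.5 * (ss + n * lam * (xbar - mu0) ** 2 / (lam + n)),
    )


# The documented defaults of NormalModel use 0.001 for all four prior parameters.
prior = NIGParams(0.001, 0.001, 0.001, 0.001)
print(nig_update(prior, [9.8, 10.4, 10.1, 9.7, 10.0]))
```

Because the update is conjugate, applying it to two batches in sequence yields the same posterior as applying it once to the combined data.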
 
n_samples_
Number of samples.
Type: int
cdf(x, sig2)
Cumulative distribution function of the posterior distribution.
Parameters: - x (array-like) – Quantiles of the mean parameter \(\mu\).
 - sig2 (array-like) – Quantiles of the variance parameter \(\sigma^2\).
Returns: cdf – Cumulative distribution function evaluated at (x, sig2).
Return type: numpy.ndarray
credible_interval(interval_length)
Credible interval of the posterior distribution.
Parameters: interval_length (float (default=0.9)) – Probability mass of the credible interval, a value in [0, 1]; for example, 0.9 gives a 90% credible interval.
Returns: interval – Lower and upper credible interval limits.
Return type: tuple
loc_posterior
Posterior parameter mu (location).
Returns: mu
Return type: float
mean()
Mean of the posterior distribution.
Returns: mean
Return type: tuple of floats
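For a normal-inverse-gamma distribution with parameters \(\mu_n, \lambda_n, \alpha_n, \beta_n\), the standard closed-form means are \(\mathrm{E}[\mu] = \mu_n\) and \(\mathrm{E}[\sigma^2] = \beta_n / (\alpha_n - 1)\) for \(\alpha_n > 1\). The sketch below assumes that mean() reports these two values in the order (mean of \(\mu\), mean of \(\sigma^2\)); that ordering and the helper name `nig_mean` are assumptions for illustration.

```python
from typing import Tuple


def nig_mean(mu_n: float, alpha_n: float, beta_n: float) -> Tuple[float, float]:
    """Return (E[mu], E[sigma^2]) under a normal-inverse-gamma posterior."""
    if alpha_n <= 1:
        # The inverse-gamma mean of sigma^2 diverges for alpha_n <= 1.
        raise ValueError("E[sigma^2] is finite only for alpha_n > 1")
    # lambda_n does not enter either mean.
    return mu_n, beta_n / (alpha_n - 1)


print(nig_mean(mu_n=1.5, alpha_n=2.5, beta_n=3.5))
```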
pdf(x, sig2)
Probability density function of the posterior distribution.
Parameters: - x (array-like) – Quantiles of the mean parameter \(\mu\).
 - sig2 (array-like) – Quantiles of the variance parameter \(\sigma^2\).
Returns: pdf – Probability density function evaluated at (x, sig2).
Return type: numpy.ndarray
ppf(q)
Percent point function (quantile function) of the posterior distribution.
Parameters: q (array-like) – Lower tail probability.
Returns: ppf – Quantile corresponding to the lower tail probability q.
Return type: tuple of numpy.ndarray
ppmean()
Posterior predictive mean.
If \(X\) follows a normal distribution with parameters \(\mu\) and \(\sigma^2\), then the posterior predictive expected value is given by
\[\mathrm{E}[X] = \mu_0,\]
where \(\mu_0\) is the posterior value of the location parameter.
Returns: mean
Return type: float
pppdf(x)
Posterior predictive probability density function.
If \(X\) follows a normal distribution with parameters \(\mu\) and \(\sigma^2\), then the posterior predictive probability density function is that of the Student’s t-distribution
\[t_{2 \alpha}\left(\mu_0, \frac{\beta (1 + \lambda^{-1})}{\alpha}\right),\]
with \(2\alpha\) degrees of freedom, location \(\mu_0\) and squared scale \(\beta (1 + \lambda^{-1}) / \alpha\), where \(\mu_0\), \(\lambda\), \(\alpha\) and \(\beta\) are the posterior values of the parameters.
Parameters: x (array-like) – Quantiles.
Returns: pdf – Probability density function evaluated at x.
Return type: float
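The predictive density can be evaluated directly from the Student's t formula, working in log space with `math.lgamma` for numerical stability. This is a sketch of the distribution named in pppdf(), reading its second parameter as the squared scale; `pp_pdf` is a hypothetical helper, not cprior's implementation.

```python
import math


def pp_pdf(x: float, mu_n: float, lam_n: float, alpha_n: float, beta_n: float) -> float:
    """Student's t density with nu = 2 * alpha_n, location mu_n and
    squared scale beta_n * (1 + 1 / lam_n) / alpha_n."""
    nu = 2.0 * alpha_n
    s = math.sqrt(beta_n * (1.0 + 1.0 / lam_n) / alpha_n)  # scale, not squared
    z = (x - mu_n) / s
    # log of Gamma((nu + 1) / 2) / (Gamma(nu / 2) * sqrt(nu * pi) * s)
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi) - math.log(s))
    return math.exp(log_norm - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


print(pp_pdf(1.5, mu_n=1.5, lam_n=4.0, alpha_n=2.5, beta_n=3.5))
```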
ppvar()
Posterior predictive variance.
If \(X\) follows a normal distribution with parameters \(\mu\) and \(\sigma^2\), then the posterior predictive variance is given by
\[\mathrm{Var}[X] = \frac{\beta(1 + \lambda^{-1})}{\alpha - 1},\]
where \(\lambda\), \(\alpha\) and \(\beta\) are the posterior values of the parameters. This is the variance of the Student’s t posterior predictive distribution and is finite only for \(\alpha > 1\).
Returns: var
Return type: float
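A predictive-variance formula is easy to sanity-check by simulation. The sketch below takes the variance of the Student's t predictive described under pppdf(), \(\nu/(\nu-2)\) times the squared scale with \(\nu = 2\alpha\), which simplifies to \(\beta(1 + \lambda^{-1})/(\alpha - 1)\) for \(\alpha > 1\). It compares that value with the sample variance of draws generated hierarchically: \(\sigma^2 \sim \Gamma^{-1}(\alpha, \beta)\), then \(\mu \mid \sigma^2 \sim \mathcal{N}(\mu_0, \sigma^2/\lambda)\), then \(X \mid \mu, \sigma^2 \sim \mathcal{N}(\mu, \sigma^2)\). The helper names are hypothetical.

```python
import math
import random


def pp_var(lam_n: float, alpha_n: float, beta_n: float) -> float:
    """Closed-form predictive variance beta_n * (1 + 1/lam_n) / (alpha_n - 1)."""
    if alpha_n <= 1:
        raise ValueError("the predictive variance is finite only for alpha_n > 1")
    return beta_n * (1.0 + 1.0 / lam_n) / (alpha_n - 1.0)


def pp_var_mc(mu_n: float, lam_n: float, alpha_n: float, beta_n: float,
              size: int = 100_000, seed: int = 0) -> float:
    """Sample variance of X drawn from the posterior predictive distribution."""
    rng = random.Random(seed)
    xs = []
    for _ in range(size):
        # 1 / Gamma(shape=alpha_n, rate=beta_n) ~ InvGamma(alpha_n, beta_n);
        # random.gammavariate takes a scale argument, hence 1 / beta_n.
        sig2 = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)
        mu = rng.gauss(mu_n, math.sqrt(sig2 / lam_n))
        xs.append(rng.gauss(mu, math.sqrt(sig2)))
    mean = sum(xs) / size
    return sum((x - mean) ** 2 for x in xs) / (size - 1)


print(pp_var(3.0, 5.0, 2.0), pp_var_mc(0.0, 3.0, 5.0, 2.0))
```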
rvs(size=1, random_state=None)
Random variates of the posterior distribution.
Parameters: - size (int (default=1)) – Number of random variates.
 - random_state (int or None (default=None)) – The seed used by the random number generator.
Returns: rvs – Random variates of the posterior distribution, an array of shape (size, 2).
Return type: numpy.ndarray
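The joint posterior over \((\mu, \sigma^2)\) can be sampled in two steps, first \(\sigma^2\) from its inverse-gamma marginal and then \(\mu\) given \(\sigma^2\), which is one natural way to produce the (size, 2) output of rvs(). This pure-Python sketch returns a list of (mu, sig2) tuples rather than a NumPy array; the column order and the helper name `nig_rvs` are assumptions, and cprior may sample differently.

```python
import math
import random
from typing import List, Optional, Tuple


def nig_rvs(mu_n: float, lam_n: float, alpha_n: float, beta_n: float,
            size: int = 1, random_state: Optional[int] = None) -> List[Tuple[float, float]]:
    """Draw `size` (mu, sigma^2) pairs from a normal-inverse-gamma distribution."""
    rng = random.Random(random_state)
    draws = []
    for _ in range(size):
        sig2 = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)  # sigma^2 ~ InvGamma(alpha_n, beta_n)
        mu = rng.gauss(mu_n, math.sqrt(sig2 / lam_n))         # mu | sigma^2 ~ N(mu_n, sigma^2 / lam_n)
        draws.append((mu, sig2))
    return draws


print(nig_rvs(1.0, 2.0, 5.0, 8.0, size=5, random_state=42))
```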
scale_posterior
Posterior parameter beta (scale).
Returns: beta
Return type: float
shape_posterior
Posterior parameter alpha (shape).
Returns: alpha
Return type: float
std()
Standard deviation of the posterior distribution.
Returns: std
Return type: tuple of floats
update(data)
Update posterior parameters with new data.
Parameters: data (array-like, shape = (n_samples)) – Data samples from a normal distribution.
var()
Variance of the posterior distribution.
Returns: var
Return type: tuple of floats
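The posterior variances also have standard closed forms: \(\mathrm{Var}[\mu] = \beta_n / ((\alpha_n - 1)\lambda_n)\), the variance of the Student's t marginal of \(\mu\), and \(\mathrm{Var}[\sigma^2] = \beta_n^2 / ((\alpha_n - 1)^2 (\alpha_n - 2))\), the inverse-gamma variance. The sketch below assumes that var() and std() report per-parameter values in the order (\(\mu\), \(\sigma^2\)); the ordering and the helper names are assumptions for illustration.

```python
import math
from typing import Tuple


def nig_var(lam_n: float, alpha_n: float, beta_n: float) -> Tuple[float, float]:
    """Return (Var[mu], Var[sigma^2]) under a normal-inverse-gamma posterior."""
    if alpha_n <= 2:
        # Var[mu] needs alpha_n > 1 and Var[sigma^2] needs alpha_n > 2.
        raise ValueError("both variances are finite only for alpha_n > 2")
    var_mu = beta_n / ((alpha_n - 1) * lam_n)
    var_sig2 = beta_n ** 2 / ((alpha_n - 1) ** 2 * (alpha_n - 2))
    return var_mu, var_sig2


def nig_std(lam_n: float, alpha_n: float, beta_n: float) -> Tuple[float, float]:
    """Standard deviations, i.e. square roots of nig_var."""
    var_mu, var_sig2 = nig_var(lam_n, alpha_n, beta_n)
    return math.sqrt(var_mu), math.sqrt(var_sig2)


print(nig_var(4.0, 3.0, 2.0), nig_std(4.0, 3.0, 2.0))
```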
variance_scale_posterior
Posterior parameter lambda (variance_scale).
Returns: lambda
Return type: float
class cprior.models.NormalABTest(modelA, modelB, simulations=1000000, random_state=None)
Bases: cprior.cdist.normal_inverse_gamma.NormalInverseGammaABTest
Normal A/B test.
Parameters: - modelA (object) – The control model.
 - modelB (object) – The variation model.
 - simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
 - random_state (int or None (default=None)) – The seed used by the random number generator.
 
expected_loss(method='exact', variant='A', lift=0)
Compute the expected loss. This is the expected uplift lost by choosing a given variant.
- If variant == "A", \(\mathrm{E}[\max(B - A - lift, 0)]\).
- If variant == "B", \(\mathrm{E}[\max(A - B - lift, 0)]\).
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
 - variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
 - lift (float (default=0.0)) – The amount of uplift.
Returns: expected_loss
Return type: tuple of floats
Notes
Method “exact” uses the normal approximation of the Student’s t-distribution for the expected loss of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.
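The Monte Carlo route can be pictured as follows: draw the mean parameter \(\mu\) of each variant from its normal-inverse-gamma posterior and average the loss over the draws. This is a pure-Python sketch, not cprior's implementation; the helper names and the posterior parameter tuples are hypothetical, and only the Monte Carlo computation is shown, not the “exact” method.

```python
import math
import random
from typing import List, Tuple

Posterior = Tuple[float, float, float, float]  # (mu_n, lam_n, alpha_n, beta_n)


def draw_mu(post: Posterior, size: int, rng: random.Random) -> List[float]:
    """Draw mu from its normal-inverse-gamma posterior (sigma^2 first, then mu)."""
    mu_n, lam_n, alpha_n, beta_n = post
    out = []
    for _ in range(size):
        sig2 = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)
        out.append(rng.gauss(mu_n, math.sqrt(sig2 / lam_n)))
    return out


def expected_loss_mc(post_a: Posterior, post_b: Posterior, lift: float = 0.0,
                     size: int = 50_000, seed: int = 0) -> Tuple[float, float]:
    """Return (E[max(B - A - lift, 0)], E[max(A - B - lift, 0)])."""
    rng = random.Random(seed)
    a = draw_mu(post_a, size, rng)
    b = draw_mu(post_b, size, rng)
    loss_a = sum(max(bi - ai - lift, 0.0) for ai, bi in zip(a, b)) / size
    loss_b = sum(max(ai - bi - lift, 0.0) for ai, bi in zip(a, b)) / size
    return loss_a, loss_b


post_a = (10.0, 101.0, 51.0, 100.0)  # hypothetical posterior parameters for A
post_b = (10.5, 101.0, 51.0, 100.0)  # hypothetical posterior parameters for B
print(expected_loss_mc(post_a, post_b))
```

With lift=0, the two losses differ by exactly the sample mean of \(B - A\), since \(\max(d, 0) - \max(-d, 0) = d\) for every draw.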
expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = B-A\) and/or \(Z = A-B\).
- If variant == "A", \(Z = B - A\).
- If variant == "B", \(Z = A - B\).
- If variant == "all", both.
Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
 - variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
 - interval_length (float (default=0.9)) – Probability mass of the credible interval, a value in [0, 1]; for example, 0.9 gives a 90% credible interval.
 - ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: tuple of floats
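Given Monte Carlo draws of \(Z\), the two ci_method options can be sketched as follows: an equal-tailed interval cuts the same probability from each tail, while a highest density interval is the narrowest window that holds the requested mass, which is why it is naturally computed from samples. The helpers below are hypothetical, use simple empirical order statistics, and are not cprior's code.

```python
import math
import random
from typing import List, Sequence, Tuple


def eti(samples: Sequence[float], interval_length: float = 0.9) -> Tuple[float, float]:
    """Equal-tailed interval: drop about (1 - interval_length) / 2 of the draws from each tail."""
    if not 0.0 < interval_length <= 1.0:
        raise ValueError("interval_length must be in (0, 1]")
    s = sorted(samples)
    n = len(s)
    lo = math.floor((1.0 - interval_length) / 2.0 * (n - 1))
    hi = math.ceil((1.0 + interval_length) / 2.0 * (n - 1))
    return s[lo], s[hi]


def hdi(samples: Sequence[float], interval_length: float = 0.9) -> Tuple[float, float]:
    """Highest density interval: the narrowest window holding interval_length of the draws."""
    if not 0.0 < interval_length <= 1.0:
        raise ValueError("interval_length must be in (0, 1]")
    s = sorted(samples)
    n = len(s)
    k = math.ceil(interval_length * n)  # number of draws the window must contain
    widths = [s[i + k - 1] - s[i] for i in range(n - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return s[i], s[i + k - 1]


def draw_mu(post: Tuple[float, float, float, float], size: int,
            rng: random.Random) -> List[float]:
    """Draw mu from a normal-inverse-gamma posterior (mu_n, lam_n, alpha_n, beta_n)."""
    mu_n, lam_n, alpha_n, beta_n = post
    out = []
    for _ in range(size):
        sig2 = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)
        out.append(rng.gauss(mu_n, math.sqrt(sig2 / lam_n)))
    return out


rng = random.Random(0)
a = draw_mu((10.0, 101.0, 51.0, 100.0), 20_000, rng)  # hypothetical posterior for A
b = draw_mu((10.5, 101.0, 51.0, 100.0), 20_000, rng)  # hypothetical posterior for B
z = [bi - ai for ai, bi in zip(a, b)]  # Z = B - A, i.e. variant="A"
print(eti(z), hdi(z))
```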
expected_loss_relative(method='exact', variant='A')
Compute the expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.
- If variant == "A", \(\mathrm{E}[(B - A) / A]\).
- If variant == "B", \(\mathrm{E}[(A - B) / B]\).
- If variant == "all", both.
Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
 - variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
Returns: expected_loss_relative
Return type: tuple of floats
Notes
Method “exact” uses an approximation of \(E[1/X]\) where \(X\) follows a Student’s t-distribution.
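A Monte Carlo estimate of the relative loss simply averages the ratio over posterior draws of the two means. This sketch is hypothetical (helper names and posterior tuples are illustrative, and it is not cprior's code); the ratio is only well behaved when the posterior of the denominator stays far from zero, as it does here.

```python
import math
import random
from typing import List, Tuple

Posterior = Tuple[float, float, float, float]  # (mu_n, lam_n, alpha_n, beta_n)


def draw_mu(post: Posterior, size: int, rng: random.Random) -> List[float]:
    """Draw mu from its normal-inverse-gamma posterior (sigma^2 first, then mu)."""
    mu_n, lam_n, alpha_n, beta_n = post
    out = []
    for _ in range(size):
        sig2 = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)
        out.append(rng.gauss(mu_n, math.sqrt(sig2 / lam_n)))
    return out


def expected_loss_relative_mc(post_a: Posterior, post_b: Posterior,
                              size: int = 50_000, seed: int = 0) -> Tuple[float, float]:
    """Return (E[(B - A) / A], E[(A - B) / B]) from posterior draws of the means."""
    rng = random.Random(seed)
    a = draw_mu(post_a, size, rng)
    b = draw_mu(post_b, size, rng)
    rel_a = sum((bi - ai) / ai for ai, bi in zip(a, b)) / size
    rel_b = sum((ai - bi) / bi for ai, bi in zip(a, b)) / size
    return rel_a, rel_b


print(expected_loss_relative_mc((10.0, 101.0, 51.0, 100.0), (10.5, 101.0, 51.0, 100.0)))
```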
expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (B-A)/A\) and/or \(Z = (A-B)/B\).
- If variant == "A", \(Z = (B-A)/A\).
- If variant == "B", \(Z = (A-B)/B\).
- If variant == "all", both.
Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
 - variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
 - interval_length (float (default=0.9)) – Probability mass of the credible interval, a value in [0, 1]; for example, 0.9 gives a 90% credible interval.
 - ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: tuple of floats
Notes
Method “exact” uses the normal approximation of the Student’s t-distribution for the expected loss of the mean.
probability(method='exact', variant='A', lift=0)
Compute the error probability or chance to beat control.
- If variant == "A", \(P[A > B + lift]\).
- If variant == "B", \(P[B > A + lift]\).
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
 - variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
 - lift (float (default=0.0)) – The amount of uplift.
Returns: probability
Return type: tuple of floats
Notes
Method “exact” uses the normal approximation of the Student’s t-distribution for the error probability of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.
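With posterior draws of both means, the probabilities above are just the fraction of draws satisfying each inequality. This is a hypothetical Monte Carlo sketch (helper names and posterior tuples are illustrative, not cprior's API); without lift and with continuous draws, the two probabilities sum to one.

```python
import math
import random
from typing import List, Tuple

Posterior = Tuple[float, float, float, float]  # (mu_n, lam_n, alpha_n, beta_n)


def draw_mu(post: Posterior, size: int, rng: random.Random) -> List[float]:
    """Draw mu from its normal-inverse-gamma posterior (sigma^2 first, then mu)."""
    mu_n, lam_n, alpha_n, beta_n = post
    out = []
    for _ in range(size):
        sig2 = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)
        out.append(rng.gauss(mu_n, math.sqrt(sig2 / lam_n)))
    return out


def probability_mc(post_a: Posterior, post_b: Posterior, lift: float = 0.0,
                   size: int = 50_000, seed: int = 0) -> Tuple[float, float]:
    """Return (P[A > B + lift], P[B > A + lift]) as fractions of posterior draws."""
    rng = random.Random(seed)
    a = draw_mu(post_a, size, rng)
    b = draw_mu(post_b, size, rng)
    p_a = sum(ai > bi + lift for ai, bi in zip(a, b)) / size
    p_b = sum(bi > ai + lift for ai, bi in zip(a, b)) / size
    return p_a, p_b


post_a = (10.0, 101.0, 51.0, 100.0)  # hypothetical posterior parameters for A
post_b = (10.5, 101.0, 51.0, 100.0)  # hypothetical posterior parameters for B
print(probability_mc(post_a, post_b))
```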
update_A(data)
Update posterior parameters for variant A with new data samples.
Parameters: data (array-like, shape = (n_samples)) – Data samples from a normal distribution.
update_B(data)
Update posterior parameters for variant B with new data samples.
Parameters: data (array-like, shape = (n_samples)) – Data samples from a normal distribution.
class cprior.models.NormalMVTest(models, simulations=1000000, random_state=None, n_jobs=None)
Bases: cprior.cdist.normal_inverse_gamma.NormalInverseGammaMVTest
Normal multivariate test.
Parameters: - models (dict) – The control and variations models.
 - simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
 - random_state (int or None (default=None)) – The seed used by the random number generator.
 
expected_loss(method='exact', control='A', variant='B', lift=0)
Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., \(\mathrm{E}[\max(control - variant - lift, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
 - control (str (default="A")) – The control variant.
 - variant (str (default="B")) – The tested variant.
 - lift (float (default=0.0)) – The amount of uplift.
Returns: expected_loss
Return type: tuple of floats
Notes
Method “exact” uses the normal approximation of the Student’s t-distribution for the expected loss of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.
expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = control-variant\).
Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
 - control (str (default="A")) – The control variant.
 - variant (str (default="B")) – The tested variant.
 - interval_length (float (default=0.9)) – Probability mass of the credible interval, a value in [0, 1]; for example, 0.9 gives a 90% credible interval.
 - ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: tuple of floats
expected_loss_relative(method='exact', control='A', variant='B')
Compute the expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., \(\mathrm{E}[(control - variant) / variant]\).
Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
 - control (str (default="A")) – The control variant.
 - variant (str (default="B")) – The tested variant.
 
Returns: expected_loss_relative
Return type: tuple of floats
Notes
Method “exact” uses an approximation of \(E[1/X]\) where \(X\) follows a Student’s t-distribution.
expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (control - variant) / variant\).
Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
 - control (str (default="A")) – The control variant.
 - variant (str (default="B")) – The tested variant.
 - interval_length (float (default=0.9)) – Probability mass of the credible interval, a value in [0, 1]; for example, 0.9 gives a 90% credible interval.
 - ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: tuple of floats
expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)
Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[(\max(A, C, D) - B) / B]\).
Parameters: - method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
 - variant (str (default="B")) – The chosen variant.
 - mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
 
Returns: expected_loss_relative_vs_all
Return type: tuple of floats
expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters: - method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
 - variant (str (default="B")) – The chosen variant.
 - lift (float (default=0.0)) – The amount of uplift.
 - mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
 
Returns: expected_loss_vs_all
Return type: tuple of floats
probability(method='exact', control='A', variant='B', lift=0)
Compute the error probability or chance to beat control, i.e., \(P[variant > control + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
 - control (str (default="A")) – The control variant.
 - variant (str (default="B")) – The tested variant.
 - lift (float (default=0.0)) – The amount of uplift.
 
Returns: probability
Return type: tuple of floats
Notes
Method “exact” uses the normal approximation of the Student’s t-distribution for the error probability of the mean when the number of degrees of freedom is large. For a small number of degrees of freedom, numerical integration is used.
probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(P[B > \max(A, C, D) + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters: - method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
 - variant (str (default="B")) – The chosen variant.
 - lift (float (default=0.0)) – The amount of uplift.
 - mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
 
Returns: probability_vs_all
Return type: tuple of floats
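Both vs-all quantities reduce to comparing the chosen variant with the per-draw maximum of the other variants. The Monte Carlo sketch below follows the formulas stated for expected_loss_vs_all() and probability_vs_all(); like those formulas as written, it applies lift only to the probability. The helper names and posterior tuples are hypothetical, and the “MLHS” and “quad” methods are not shown.

```python
import math
import random
from typing import Dict, List, Tuple

Posterior = Tuple[float, float, float, float]  # (mu_n, lam_n, alpha_n, beta_n)


def draw_mu(post: Posterior, size: int, rng: random.Random) -> List[float]:
    """Draw mu from its normal-inverse-gamma posterior (sigma^2 first, then mu)."""
    mu_n, lam_n, alpha_n, beta_n = post
    out = []
    for _ in range(size):
        sig2 = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)
        out.append(rng.gauss(mu_n, math.sqrt(sig2 / lam_n)))
    return out


def vs_all_mc(posteriors: Dict[str, Posterior], variant: str, lift: float = 0.0,
              size: int = 20_000, seed: int = 0) -> Tuple[float, float]:
    """Return (P[V > max(others) + lift], E[max(max(others) - V, 0)]) for V = variant."""
    rng = random.Random(seed)
    # Draw every variant in insertion order so that calls with the same seed
    # reuse identical samples regardless of which variant is evaluated.
    draws = {name: draw_mu(post, size, rng) for name, post in posteriors.items()}
    others = [draws[name] for name in posteriors if name != variant]
    best_other = [max(col) for col in zip(*others)]
    v = draws[variant]
    prob = sum(vi > mi + lift for vi, mi in zip(v, best_other)) / size
    loss = sum(max(mi - vi, 0.0) for vi, mi in zip(v, best_other)) / size
    return prob, loss


posteriors = {  # hypothetical posterior parameters for four variants
    "A": (10.0, 101.0, 51.0, 100.0),
    "B": (10.5, 101.0, 51.0, 100.0),
    "C": (10.2, 101.0, 51.0, 100.0),
    "D": (10.1, 101.0, 51.0, 100.0),
}
print(vs_all_mc(posteriors, "B"))
```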
update(data, variant)
Update posterior parameters for a given variant with new data samples.
Parameters: - data (array-like, shape = (n_samples)) – Data samples from a normal distribution.
 - variant (str) – Name of the variant to update.