# Binomial distribution

The binomial distribution is the discrete probability distribution of the number of successes in a sequence of $$m$$ independent trials, each with a boolean-valued outcome and probability of success $$p$$. The probability mass function for $$k \in \{0, 1, \ldots, m\}$$ is

$f(k; m, p) = \binom{m}{k} p^k (1-p)^{m-k},$

and the cumulative distribution function is

$F(k; m, p) = I_{1-p}(m - k, 1 + k),$

where $$I_x(a, b)$$ is the regularized incomplete beta function. The expected value and variance are

$\mathrm{E}[X] = mp, \quad \mathrm{Var}[X] = mp(1-p).$
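The formulas above can be checked numerically. The following is a minimal sketch in plain Python, independent of the library; the helper names `binomial_pmf` and `binomial_cdf` and the values of `m` and `p` are illustrative assumptions. The CDF is obtained by summing the mass function instead of evaluating the regularized incomplete beta function.

```python
from math import comb


def binomial_pmf(k: int, m: int, p: float) -> float:
    """Probability of exactly k successes in m independent trials."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def binomial_cdf(k: int, m: int, p: float) -> float:
    """P[X <= k], computed by summing the pmf up to k."""
    return sum(binomial_pmf(j, m, p) for j in range(k + 1))


m, p = 10, 0.3  # illustrative values, not taken from the library
pmf = [binomial_pmf(k, m, p) for k in range(m + 1)]

# Moments computed directly from the mass function.
mean = sum(k * f for k, f in enumerate(pmf))
var = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))

print(f"sum of pmf = {sum(pmf):.6f}")
print(f"mean = {mean:.6f}, mp = {m * p:.6f}")
print(f"variance = {var:.6f}, mp(1-p) = {m * p * (1 - p):.6f}")
```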

The binomial distribution is suitable for binary-outcome tests, for example, CRO (conversion rate optimization) or CTR (click-through rate) testing.

class cprior.models.BinomialModel(m, name='', alpha=1, beta=1)

Bases: cprior.cdist.beta.BetaModel

Bayesian model with a binomial likelihood and a beta prior distribution.

Given data samples $$\mathbf{x} = (x_1, \ldots, x_n)$$ from a binomial distribution with parameters $$m$$ and $$p$$, the posterior distribution is

$p | \mathbf{x} \sim \mathcal{B}\left(\alpha + \sum_{i=1}^n x_i, \beta + mn - \sum_{i=1}^n x_i\right),$

with prior parameters $$\alpha$$ and $$\beta$$.
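The conjugate update above amounts to simple counting. The sketch below reproduces it in plain Python; the function `beta_binomial_update` is a hypothetical helper written for illustration, not part of the library, and the data are made up.

```python
from typing import Sequence, Tuple


def beta_binomial_update(
    data: Sequence[int], m: int, alpha: float = 1.0, beta: float = 1.0
) -> Tuple[float, float]:
    """Return the posterior (alpha, beta) given success counts per sample."""
    if any(x < 0 or x > m for x in data):
        raise ValueError("each sample must lie in {0, ..., m}")
    n = len(data)
    successes = sum(data)
    # alpha gains the successes, beta gains the failures (m * n - successes).
    return alpha + successes, beta + m * n - successes


# Three samples, each counting successes out of m = 20 trials (made-up data).
alpha_post, beta_post = beta_binomial_update([4, 7, 5], m=20)
print(alpha_post, beta_post)  # 17.0 45.0

# Updating in two batches gives the same posterior as a single update.
a1, b1 = beta_binomial_update([4], m=20)
a2, b2 = beta_binomial_update([7, 5], m=20, alpha=a1, beta=b1)
print((a2, b2) == (alpha_post, beta_post))  # True
```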

Parameters: m (int) – Number of trials. name (str (default="")) – Model name. alpha (int or float (default=1)) – Prior parameter alpha. beta (int or float (default=1)) – Prior parameter beta.
n_samples_

Number of samples.

Type: int
alpha_posterior

Posterior parameter alpha.

Returns: alpha float
beta_posterior

Posterior parameter beta.

Returns: beta float
cdf(x)

Cumulative distribution function of the posterior distribution.

Parameters: x (array-like) – Quantiles. Returns: cdf (numpy.ndarray) – Cumulative distribution function evaluated at x.
credible_interval(interval_length)

Credible interval of the posterior distribution.

Parameters: interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1]. Returns: interval (tuple) – Lower and upper credible interval limits.
mean()

Mean of the posterior distribution.

Returns: mean float
pdf(x)

Probability density function of the posterior distribution.

Parameters: x (array-like) – Quantiles. Returns: pdf (numpy.ndarray) – Probability density function evaluated at x.
ppf(q)

Percent point function (quantile) of the posterior distribution.

Parameters: q (array-like) – Lower tail probability. Returns: ppf (numpy.ndarray) – Quantile corresponding to the lower tail probability q.
ppmean()

Posterior predictive mean.

If $$X$$ follows a binomial distribution with parameters $$m$$ and $$p$$, then the posterior predictive expected value is given by

$\mathrm{E}[X] = m \frac{\alpha}{\alpha + \beta},$

where $$\alpha$$ and $$\beta$$ are the posterior values of the parameters.

Returns: mean float
pppdf(x)

Posterior predictive probability mass function.

If $$X$$ follows a binomial distribution with parameters $$m$$ and $$p$$, then the posterior predictive probability mass function is given by

$f(x; m, \alpha, \beta) = \binom{m}{x} \frac{B(\alpha + x, \beta + m - x)}{B(\alpha, \beta)},$

where $$\alpha$$ and $$\beta$$ are the posterior values of the parameters.

Parameters: x (array-like) – Quantiles. Returns: pdf (float) – Probability density function evaluated at x.
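The posterior predictive formula above is the beta-binomial mass function. The sketch below evaluates it in plain Python, using `math.lgamma` for the beta functions to avoid overflow; the helper names and the posterior parameters are illustrative assumptions, not the library's implementation.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x: int, m: int, alpha: float, beta: float) -> float:
    """Posterior predictive probability of x successes in m future trials."""
    log_ratio = log_beta(alpha + x, beta + m - x) - log_beta(alpha, beta)
    return comb(m, x) * exp(log_ratio)


m, alpha, beta = 20, 17.0, 45.0  # illustrative posterior parameters
pmf = [beta_binomial_pmf(x, m, alpha, beta) for x in range(m + 1)]

# The mass function sums to one, and its mean matches m * alpha / (alpha + beta).
pp_mean = sum(x * f for x, f in enumerate(pmf))
print(f"sum of pmf = {sum(pmf):.6f}")
print(f"mean = {pp_mean:.6f}, closed form = {m * alpha / (alpha + beta):.6f}")
```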
ppvar()

Posterior predictive variance.

If $$X$$ follows a binomial distribution with parameters $$m$$ and $$p$$, then the posterior predictive variance is given by

$\mathrm{Var}[X] = \frac{m \alpha \beta (m + \alpha + \beta)} {(\alpha + \beta)^2 (\alpha + \beta + 1)},$

where $$\alpha$$ and $$\beta$$ are the posterior values of the parameters.

Returns: var float
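
The posterior predictive variance factors into the variance of a binomial distribution evaluated at the posterior mean $$\bar{p} = \alpha / (\alpha + \beta)$$, multiplied by an overdispersion term that reflects the remaining uncertainty about $$p$$. This is a rearrangement of the same expression, not an additional result from the library:

$\mathrm{Var}[X] = m \bar{p} (1 - \bar{p}) \cdot \frac{\alpha + \beta + m}{\alpha + \beta + 1}.$

For instance, with the illustrative values $$m = 20$$, $$\alpha = 17$$ and $$\beta = 45$$, the binomial part is approximately 3.98, the overdispersion factor is $$82 / 63 \approx 1.30$$, and the posterior predictive variance is approximately 5.18. For $$m = 1$$ the factor equals 1, and the expression reduces to the Bernoulli variance $$\bar{p} (1 - \bar{p})$$.
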
rvs(size=1, random_state=None)

Random variates of the posterior distribution.

Parameters: size (int (default=1)) – Number of random variates. random_state (int or None (default=None)) – The seed used by the random number generator. Returns: rvs (numpy.ndarray or scalar) – Random variates of given size.
std()

Standard deviation of the posterior distribution.

Returns: std float
update(data)

Update posterior parameters with new data samples.

Parameters: data (array-like, shape = (n_samples)) – Data samples from a binomial distribution.
var()

Variance of the posterior distribution.

Returns: var float
class cprior.models.BinomialABTest(modelA, modelB, simulations=1000000, random_state=None)

Bases: cprior.cdist.beta.BetaABTest

Binomial A/B test.

Parameters: modelA (object) – The control model. modelB (object) – The variation model. simulations (int or None (default=1000000)) – Number of Monte Carlo simulations. random_state (int or None (default=None)) – The seed used by the random number generator.
expected_loss(method='exact', variant='A', lift=0, mlhs_samples=10000)

Compute the expected loss. This is the expected uplift lost by choosing a given variant.

• If variant == "A", $$\mathrm{E}[\max(B - A - lift, 0)]$$
• If variant == "B", $$\mathrm{E}[\max(A - B - lift, 0)]$$
• If variant == "all", both.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters: method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling). variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=10000)) – Number of samples for MLHS method. Returns: expected_loss float or tuple of floats
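
The expected loss defined above can be approximated by plain Monte Carlo sampling from the two beta posteriors. The sketch below uses only the standard library; `expected_loss_mc` is a hypothetical helper, the posterior parameters are made up, and nothing here is meant to reproduce the library's "exact" or "MLHS" methods.

```python
import random


def expected_loss_mc(post_a, post_b, lift=0.0, simulations=50_000, seed=0):
    """Monte Carlo estimate of E[max(B - A - lift, 0)].

    post_a and post_b are (alpha, beta) tuples of the beta posteriors.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = rng.betavariate(*post_a)
        b = rng.betavariate(*post_b)
        total += max(b - a - lift, 0.0)
    return total / simulations


post_a, post_b = (21, 81), (31, 71)  # made-up (alpha, beta) posteriors

loss_a = expected_loss_mc(post_a, post_b)  # corresponds to variant == "A"
loss_b = expected_loss_mc(post_b, post_a)  # corresponds to variant == "B"
loss_a_lift = expected_loss_mc(post_a, post_b, lift=0.02)

print(f"loss choosing A: {loss_a:.4f}")
print(f"loss choosing B: {loss_b:.4f}")
print(f"loss choosing A with lift=0.02: {loss_a_lift:.4f}")
```

Since $$\max(z, 0) - \max(-z, 0) = z$$, the two losses differ by the difference of the posterior means, which gives a quick sanity check on the estimates.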
expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of $$Z = B-A$$ and/or $$Z = A-B$$.

• If variant == "A", $$Z = B - A$$
• If variant == "B", $$Z = A - B$$
• If variant == "all", both.
Parameters: method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1]. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC". Returns: expected_loss_ci np.ndarray or tuple of np.ndarray
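
To illustrate the difference between the two interval types, the sketch below computes an equal-tailed interval and a highest density interval from Monte Carlo samples of $$Z = B - A$$. The functions `eti` and `hdi` are hypothetical helpers using simple sample-based estimators, which are an assumption and may differ from what the library does internally.

```python
import random


def eti(samples, interval_length=0.9):
    """Equal-tailed interval: cut (1 - interval_length) / 2 from each tail."""
    xs = sorted(samples)
    tail = (1.0 - interval_length) / 2.0
    lower = xs[int(tail * (len(xs) - 1))]
    upper = xs[int((1.0 - tail) * (len(xs) - 1))]
    return lower, upper


def hdi(samples, interval_length=0.9):
    """Highest density interval: narrowest window holding the requested mass."""
    xs = sorted(samples)
    window = max(1, int(interval_length * len(xs)))
    starts = range(len(xs) - window + 1)
    best = min(starts, key=lambda i: xs[i + window - 1] - xs[i])
    return xs[best], xs[best + window - 1]


rng = random.Random(0)
# Samples of Z = B - A for made-up posteriors A ~ Beta(21, 81), B ~ Beta(31, 71).
z = [rng.betavariate(31, 71) - rng.betavariate(21, 81) for _ in range(20_000)]

eti_lo, eti_hi = eti(z)
hdi_lo, hdi_hi = hdi(z)
print(f"ETI: ({eti_lo:.4f}, {eti_hi:.4f})")
print(f"HDI: ({hdi_lo:.4f}, {hdi_hi:.4f})")
```

For a roughly symmetric distribution such as this one the two intervals nearly coincide; they diverge for skewed distributions, where the highest density interval is the shorter of the two.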
expected_loss_relative(method='exact', variant='A')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.

• If variant == "A", $$\mathrm{E}[(B - A) / A]$$
• If variant == "B", $$\mathrm{E}[(A - B) / B]$$
• If variant == "all", both.
Parameters: method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. Returns: expected_loss_relative float or tuple of floats
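
For independent beta posteriors, $$\mathrm{E}[(B - A) / A] = \mathrm{E}[B] \, \mathrm{E}[1 / A] - 1$$, and $$\mathrm{E}[1 / A] = (\alpha_A + \beta_A - 1) / (\alpha_A - 1)$$ whenever $$\alpha_A > 1$$. The sketch below evaluates this closed form and compares it with a Monte Carlo estimate. The function `expected_relative_loss` is a hypothetical helper based on this standard identity; whether the library's "exact" method uses the same expression is an assumption.

```python
import random


def expected_relative_loss(post_a, post_b):
    """Closed form of E[(B - A) / A] for independent beta posteriors."""
    alpha_a, beta_a = post_a
    alpha_b, beta_b = post_b
    if alpha_a <= 1:
        raise ValueError("E[1 / A] is infinite unless alpha_A > 1")
    mean_b = alpha_b / (alpha_b + beta_b)
    mean_inv_a = (alpha_a + beta_a - 1) / (alpha_a - 1)
    return mean_b * mean_inv_a - 1.0


post_a, post_b = (21, 81), (31, 71)  # made-up (alpha, beta) posteriors
exact = expected_relative_loss(post_a, post_b)

# Monte Carlo estimate of the same expectation as a cross-check.
rng = random.Random(0)
n = 50_000
mc = 0.0
for _ in range(n):
    a = rng.betavariate(*post_a)
    b = rng.betavariate(*post_b)
    mc += (b - a) / a
mc /= n

print(f"closed form: {exact:.4f}, Monte Carlo: {mc:.4f}")
```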
expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of $$Z = (B-A)/A$$ and/or $$Z = (A-B)/B$$.

• If variant == "A", $$Z = (B-A)/A$$
• If variant == "B", $$Z = (A-B)/B$$
• If variant == "all", both.
Parameters: method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”. variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1]. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC". Returns: expected_loss_relative_ci np.ndarray or tuple of np.ndarray
probability(method='exact', variant='A', lift=0, mlhs_samples=10000)

Compute the error probability or chance to beat control.

• If variant == "A", $$P[A > B + lift]$$
• If variant == "B", $$P[B > A + lift]$$
• If variant == "all", both.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters: method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling). variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=10000)) – Number of samples for MLHS method. Returns: probability float or tuple of floats
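
When lift is zero and the posterior parameter $$\alpha_B$$ is a positive integer, $$P[B > A]$$ for independent beta posteriors has a well-known finite-sum closed form. The sketch below implements that sum in plain Python as an illustration; `prob_b_beats_a` is a hypothetical helper, and it is an assumption that the library's "exact" method is related to this formula.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b):
    """P[B > A] for A ~ Beta(alpha_a, beta_a), B ~ Beta(alpha_b, beta_b).

    Requires alpha_b to be a positive integer.
    """
    if alpha_b < 1 or alpha_b != int(alpha_b):
        raise ValueError("alpha_b must be a positive integer")
    total = 0.0
    for i in range(int(alpha_b)):
        # Each term is evaluated in log space for numerical stability.
        total += exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


# Made-up posteriors: A ~ Beta(21, 81), B ~ Beta(31, 71).
p_b = prob_b_beats_a(21, 81, 31, 71)  # corresponds to variant == "B"
p_a = prob_b_beats_a(31, 71, 21, 81)  # corresponds to variant == "A"
print(f"P[B > A] = {p_b:.4f}, P[A > B] = {p_a:.4f}")
```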
update_A(data)

Update posterior parameters for variant A with new data samples.

Parameters: data (array-like, shape = (n_samples)) – Data samples from a binomial distribution.
update_B(data)

Update posterior parameters for variant B with new data samples.

Parameters: data (array-like, shape = (n_samples)) – Data samples from a binomial distribution.
class cprior.models.BinomialMVTest(models, simulations=1000000, random_state=None, n_jobs=None)

Bases: cprior.cdist.beta.BetaMVTest

Binomial Multivariate test.

Parameters: models (dict) – The control and variations models. simulations (int or None (default=1000000)) – Number of Monte Carlo simulations. random_state (int or None (default=None)) – The seed used by the random number generator.
expected_loss(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)

Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., $$\mathrm{E}[\max(control - variant - lift, 0)]$$.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters: method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling). control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=10000)) – Number of samples for MLHS method. Returns: expected_loss float
expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of $$Z = control-variant$$.

Parameters: method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1]. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC". Returns: expected_loss_ci np.ndarray or tuple of np.ndarray
expected_loss_relative(method='exact', control='A', variant='B')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., $$\mathrm{E}[(control - variant) / variant]$$.

Parameters: method (str (default="exact")) – The method of computation. Options are “exact” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. Returns: expected_loss_relative float
expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of $$Z = (control - variant) / variant$$.

Parameters: method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”. control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1]. ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC". Returns: expected_loss_relative_ci np.ndarray or tuple of np.ndarray
expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)

Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute $$\mathrm{E}[(\max(A, C, D) - B) / B]$$.

Parameters: method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration). variant (str (default="B")) – The chosen variant. mlhs_samples (int (default=1000)) – Number of samples for MLHS method. Returns: expected_loss_relative_vs_all float
expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute $$\mathrm{E}[\max(\max(A, C, D) - B, 0)]$$.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters: method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration). variant (str (default="B")) – The chosen variant. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=1000)) – Number of samples for MLHS method. Returns: expected_loss_vs_all float
probability(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)

Compute the error probability or chance to beat control, i.e., $$P[variant > control + lift]$$.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters: method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling). control (str (default="A")) – The control variant. variant (str (default="B")) – The tested variant. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=10000)) – Number of samples for MLHS method. Returns: probability float
probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute $$P[B > \max(A, C, D) + lift]$$.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters: method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration). variant (str (default="B")) – The chosen variant. lift (float (default=0.0)) – The amount of uplift. mlhs_samples (int (default=1000)) – Number of samples for MLHS method. Returns: probability_vs_all float
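
The "vs all" quantities compare one variant against the maximum of all the others. The sketch below estimates both the chance to beat all variations and the expected loss against all variations by plain Monte Carlo sampling. The helper `vs_all_mc` and the posterior parameters are illustrative assumptions; the library's default "quad" method uses numerical integration instead.

```python
import random


def vs_all_mc(posteriors, variant, lift=0.0, simulations=20_000, seed=0):
    """Estimate P[v > max(others) + lift] and E[max(max(others) - v, 0)].

    posteriors maps variant names to (alpha, beta) tuples.
    """
    rng = random.Random(seed)
    wins, loss = 0, 0.0
    for _ in range(simulations):
        draws = {k: rng.betavariate(a, b) for k, (a, b) in posteriors.items()}
        chosen = draws.pop(variant)
        best_other = max(draws.values())
        wins += chosen > best_other + lift
        loss += max(best_other - chosen, 0.0)
    return wins / simulations, loss / simulations


# Made-up (alpha, beta) posteriors for four variants.
posteriors = {"A": (21, 81), "B": (31, 71), "C": (25, 77), "D": (18, 84)}

results = {v: vs_all_mc(posteriors, v) for v in posteriors}
for name, (prob, loss) in results.items():
    print(f"{name}: P[beats all] = {prob:.4f}, expected loss = {loss:.4f}")
```

Because every call reuses the same seed and therefore the same draws, the four probabilities sum to one when lift is zero.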
update(data, variant)

Update posterior parameters for a given variant with new data samples.

Parameters: data (array-like, shape = (n_samples)) – Data samples from a binomial distribution. variant (str) – The variant to update.