Binomial distribution¶
The binomial distribution is the discrete probability distribution of the number of successes in a sequence of \(m\) independent trials, each with a boolean-valued outcome and probability of success \(p\). The probability mass function for \(k \in \{0, 1, \ldots, m\}\) is
\[f(k; m, p) = \binom{m}{k} p^k (1 - p)^{m - k},\]
and the cumulative distribution function is
\[F(k; m, p) = I_{1-p}(m - k, 1 + k),\]
where \(I_x(a, b)\) is the regularized incomplete beta function. The expected value and variance are as follows
\[\mathrm{E}[X] = mp, \quad \mathrm{Var}[X] = mp(1 - p).\]
The binomial distribution is suitable for binary-outcome tests, for example, CRO (conversion rate optimization) or CTR (click-through rate) testing.
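As a quick numerical check of the definitions above, the following minimal sketch evaluates the standard binomial probability mass function with the Python standard library and compares the mean and variance computed from it against \(mp\) and \(mp(1-p)\). The helper name `binomial_pmf` and the values of `m` and `p` are illustrative assumptions, not part of cprior.

```python
from math import comb

def binomial_pmf(k, m, p):
    """Probability of exactly k successes in m independent trials."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)

m, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(k, m, p) for k in range(m + 1)]

total = sum(pmf)                                            # sums to 1 up to rounding
mean = sum(k * f for k, f in enumerate(pmf))                # compare with m * p
var = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))   # compare with m * p * (1 - p)
```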
- 
class cprior.models.BinomialModel(m, name='', alpha=1, beta=1)¶
- Bases: - cprior.cdist.beta.BetaModel - Bayesian model with a binomial likelihood and a beta prior distribution. - Given data samples \(\mathbf{x} = (x_1, \ldots, x_n)\) from a binomial distribution with parameters \(m\) and \(p\), the posterior distribution is \[p | \mathbf{x} \sim \mathcal{B}\left(\alpha + \sum_{i=1}^n x_i, \beta + mn - \sum_{i=1}^n x_i\right),\] - with prior parameters \(\alpha\) and \(\beta\). - Parameters: - m (int) – Number of trials.
- name (str (default="")) – Model name.
- alpha (int or float (default=1)) – Prior parameter alpha.
- beta (int or float (default=1)) – Prior parameter beta.
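The conjugate update stated in the class description can be sketched in a few lines. This is a standalone illustration of the formula, not the library's implementation; `beta_binomial_update` is a hypothetical helper and the data are made up.

```python
def beta_binomial_update(alpha, beta, m, data):
    """Return posterior (alpha, beta) after observing binomial counts.

    Each entry of `data` is a number of successes out of m trials.
    """
    n = len(data)
    successes = sum(data)
    return alpha + successes, beta + m * n - successes

# Uniform prior Beta(1, 1), m = 20 trials per sample, three samples.
alpha_post, beta_post = beta_binomial_update(1, 1, m=20, data=[4, 7, 5])

# Mean of the Beta(alpha_post, beta_post) posterior on p.
posterior_mean = alpha_post / (alpha_post + beta_post)
```

With these inputs there are 16 successes in 60 trials, so the posterior is \(\mathcal{B}(17, 45)\).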
 - 
n_samples_¶
- Number of samples. - Type: - int 
 - 
alpha_posterior¶
- Posterior parameter alpha. - Returns: - alpha - Return type: - float 
 - 
beta_posterior¶
- Posterior parameter beta. - Returns: - beta - Return type: - float 
 - 
cdf(x)¶
- Cumulative distribution function of the posterior distribution. - Parameters: - x (array-like) – Quantiles. - Returns: - cdf – Cumulative distribution function evaluated at x. - Return type: - numpy.ndarray 
 - 
credible_interval(interval_length)¶
- Credible interval of the posterior distribution. - Parameters: - interval_length (float (default=0.9)) – Compute the interval_length% credible interval. This is a value in [0, 1]. - Returns: - interval – Lower and upper credible interval limits. - Return type: - tuple 
 - 
mean()¶
- Mean of the posterior distribution. - Returns: - mean - Return type: - float 
 - 
pdf(x)¶
- Probability density function of the posterior distribution. - Parameters: - x (array-like) – Quantiles. - Returns: - pdf – Probability density function evaluated at x. - Return type: - numpy.ndarray 
 - 
ppf(q)¶
- Percent point function (quantile) of the posterior distribution. - Parameters: - q (array-like) – Lower tail probability. - Returns: - ppf – Quantile corresponding to the lower tail probability q. - Return type: - numpy.ndarray 
 - 
ppmean()¶
- Posterior predictive mean. - If \(X\) follows a binomial distribution with parameters \(m\) and \(p\), then the posterior predictive expected value is given by \[\mathrm{E}[X] = m \frac{\alpha}{\alpha + \beta},\]- where \(\alpha\) and \(\beta\) are the posterior values of the parameters. - Returns: - mean - Return type: - float 
 - 
pppdf(x)¶
- Posterior predictive probability density function. - If \(X\) follows a binomial distribution with parameters \(m\) and \(p\), then the posterior predictive probability density function is given by \[f(x; m, \alpha, \beta) = \binom{m}{x} \frac{B(\alpha + x, \beta + m - x)}{B(\alpha, \beta)},\]- where \(\alpha\) and \(\beta\) are the posterior values of the parameters. - Parameters: - x (array-like) – Quantiles. - Returns: - pdf – Probability density function evaluated at x. - Return type: - float 
 - 
ppvar()¶
- Posterior predictive variance. - If \(X\) follows a binomial distribution with parameters \(m\) and \(p\), then the posterior predictive variance is given by \[\mathrm{Var}[X] = \frac{m \alpha \beta (m + \alpha + \beta)} {(\alpha + \beta)^2 (\alpha + \beta + 1)}\]- where \(\alpha\) and \(\beta\) are the posterior values of the parameters. - Returns: - var - Return type: - float 
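The posterior predictive formulas for ppmean, pppdf and ppvar can be cross-checked against each other. The sketch below, which uses only the standard library and hypothetical helper names, evaluates the beta-binomial mass function from pppdf over \(x = 0, \ldots, m\) and compares its mean and variance with the closed forms; the posterior parameters are illustrative.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def predictive_pmf(x, m, alpha, beta):
    """Beta-binomial probability of x successes in m future trials."""
    log_ratio = log_beta(alpha + x, beta + m - x) - log_beta(alpha, beta)
    return comb(m, x) * exp(log_ratio)

m, alpha, beta = 20, 17.0, 45.0  # illustrative posterior parameters
pmf = [predictive_pmf(x, m, alpha, beta) for x in range(m + 1)]

# Moments computed directly from the mass function.
pp_mean = sum(x * f for x, f in enumerate(pmf))
pp_var = sum((x - pp_mean) ** 2 * f for x, f in enumerate(pmf))

# Closed forms as given for ppmean and ppvar.
closed_mean = m * alpha / (alpha + beta)
closed_var = (m * alpha * beta * (m + alpha + beta)
              / ((alpha + beta) ** 2 * (alpha + beta + 1)))
```

Working with `lgamma` rather than the beta function itself avoids overflow when the posterior parameters are large.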
 - 
rvs(size=1, random_state=None)¶
- Random variates of the posterior distribution. - Parameters: - size (int (default=1)) – Number of random variates.
- random_state (int or None (default=None)) – The seed used by the random number generator.
 - Returns: - rvs – Random variates of given size. - Return type: - numpy.ndarray or scalar 
 - 
std()¶
- Standard deviation of the posterior distribution. - Returns: - std - Return type: - float 
 - 
update(data)¶
- Update posterior parameters with new data samples. - Parameters: - data (array-like, shape = (n_samples)) – Data samples from a binomial distribution. 
 - 
var()¶
- Variance of the posterior distribution. - Returns: - var - Return type: - float 
 
- 
class cprior.models.BinomialABTest(modelA, modelB, simulations=1000000, random_state=None)¶
- Bases: - cprior.cdist.beta.BetaABTest - Binomial A/B test. - Parameters: - modelA (object) – The control model.
- modelB (object) – The variation model.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
 - 
expected_loss(method='exact', variant='A', lift=0, mlhs_samples=10000)¶
- Compute the expected loss. This is the expected uplift lost by choosing a given variant. - If variant == "A", \(\mathrm{E}[\max(B - A - lift, 0)]\)
- If variant == "B", \(\mathrm{E}[\max(A - B - lift, 0)]\)
- If variant == "all", both.
 - If lift is a positive value, the computation method must be Monte Carlo sampling. - Parameters: - method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
 - Returns: - expected_loss - Return type: - float or tuple of floats 
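To make the expected loss definition concrete, here is a minimal Monte Carlo sketch of \(\mathrm{E}[\max(B - A - lift, 0)]\) that draws from two independent beta posteriors with the standard library. It illustrates the quantity only and is not cprior's "MC" implementation; the function name, posterior parameters and simulation count are assumptions.

```python
import random

def expected_loss_mc(post_a, post_b, lift=0.0, simulations=100_000, seed=0):
    """Monte Carlo estimate of E[max(B - A - lift, 0)].

    post_a and post_b are (alpha, beta) posterior parameters.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = rng.betavariate(*post_a)
        b = rng.betavariate(*post_b)
        total += max(b - a - lift, 0.0)
    return total / simulations

post_a, post_b = (17, 45), (25, 37)  # illustrative posteriors

# Loss from choosing A: how much B is expected to beat A by.
loss_a = expected_loss_mc(post_a, post_b)
# Loss from choosing B: swap the roles of the two posteriors.
loss_b = expected_loss_mc(post_b, post_a)
```

Because B has the higher posterior mean in this example, the loss from choosing A is much larger than the loss from choosing B.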
 - 
expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')¶
- Compute credible intervals on the difference distribution of \(Z = B-A\) and/or \(Z = A-B\). - If variant == "A", \(Z = B - A\)
- If variant == "B", \(Z = A - B\)
- If variant == "all", both.
 - Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
 - Returns: - expected_loss_ci - Return type: - np.ndarray or tuple of np.ndarray 
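An equal-tailed interval (ETI) on \(Z = B - A\) can be approximated from sorted Monte Carlo draws, as in the sketch below. It assumes independent beta posteriors and uses a simple empirical quantile rule; the helper name and parameters are hypothetical and the library may compute the interval differently.

```python
import random

def difference_eti(post_a, post_b, interval_length=0.9,
                   simulations=50_000, seed=0):
    """Equal-tailed credible interval for Z = B - A from Monte Carlo draws."""
    rng = random.Random(seed)
    z = sorted(rng.betavariate(*post_b) - rng.betavariate(*post_a)
               for _ in range(simulations))
    # Cut the same probability mass from each tail of the sorted draws.
    tail = (1.0 - interval_length) / 2.0
    lower = z[int(tail * simulations)]
    upper = z[int((1.0 - tail) * simulations) - 1]
    return lower, upper

lower, upper = difference_eti((17, 45), (25, 37))
wide_lower, wide_upper = difference_eti((17, 45), (25, 37), interval_length=0.99)
```

Since both calls share a seed, they sort the same draws, so the 99% interval necessarily contains the 90% interval.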
 - 
expected_loss_relative(method='exact', variant='A')¶
- Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift. - If variant == "A", \(\mathrm{E}[(B - A) / A]\)
- If variant == "B", \(\mathrm{E}[(A - B) / B]\)
- If variant == "all", both.
 - Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
 - Returns: - expected_loss_relative - Return type: - float or tuple of floats 
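The relative loss \(\mathrm{E}[(B - A) / A]\) can be estimated by sampling and, for independent beta posteriors, compared with a closed form. The sketch below relies on the standard identity \(\mathrm{E}[1/A] = (\alpha + \beta - 1)/(\alpha - 1)\) for \(A \sim \mathcal{B}(\alpha, \beta)\) with \(\alpha > 1\). It does not reproduce cprior's "exact" method, and all names and values are illustrative.

```python
import random

def expected_relative_loss_mc(post_a, post_b, simulations=100_000, seed=0):
    """Monte Carlo estimate of E[(B - A) / A] for independent beta posteriors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = rng.betavariate(*post_a)
        b = rng.betavariate(*post_b)
        total += (b - a) / a
    return total / simulations

post_a, post_b = (17, 45), (25, 37)  # illustrative posteriors
estimate = expected_relative_loss_mc(post_a, post_b)

# Cross-check: by independence E[B / A] = E[B] * E[1 / A], and for
# A ~ Beta(alpha, beta) with alpha > 1, E[1 / A] = (alpha + beta - 1) / (alpha - 1).
mean_b = post_b[0] / (post_b[0] + post_b[1])
mean_inv_a = (post_a[0] + post_a[1] - 1) / (post_a[0] - 1)
closed_form = mean_b * mean_inv_a - 1
```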
 - 
expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')¶
- Compute credible intervals on the relative difference distribution of \(Z = (B-A)/A\) and/or \(Z = (A-B)/B\). - If variant == "A", \(Z = (B-A)/A\)
- If variant == "B", \(Z = (A-B)/B\)
- If variant == "all", both.
 - Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
 - Returns: - expected_loss_relative_ci - Return type: - np.ndarray or tuple of np.ndarray 
 - 
probability(method='exact', variant='A', lift=0, mlhs_samples=10000)¶
- Compute the error probability or chance to beat control. - If variant == "A", \(P[A > B + lift]\)
- If variant == "B", \(P[B > A + lift]\)
- If variant == "all", both.
 - If lift is a positive value, the computation method must be Monte Carlo sampling. - Parameters: - method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
 - Returns: - probability - Return type: - float or tuple of floats 
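The probability \(P[B > A + lift]\) is the fraction of joint posterior draws in which B exceeds A by more than the lift. The following sketch estimates it by plain Monte Carlo with the standard library, assuming independent beta posteriors; the helper name and inputs are hypothetical and this is not the library's code.

```python
import random

def prob_b_beats_a(post_a, post_b, lift=0.0, simulations=100_000, seed=0):
    """Monte Carlo estimate of P[B > A + lift]."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a) + lift
        for _ in range(simulations)
    )
    return wins / simulations

post_a, post_b = (17, 45), (25, 37)  # illustrative posteriors
p_no_lift = prob_b_beats_a(post_a, post_b)
p_with_lift = prob_b_beats_a(post_a, post_b, lift=0.1)
```

Requiring a positive lift can only lower the probability, which is why `p_with_lift` comes out smaller than `p_no_lift`.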
 - 
update_A(data)¶
- Update posterior parameters for variant A with new data samples. - Parameters: - data (array-like, shape = (n_samples)) – Data samples from a binomial distribution. 
 - 
update_B(data)¶
- Update posterior parameters for variant B with new data samples. - Parameters: - data (array-like, shape = (n_samples)) – Data samples from a binomial distribution. 
 
- 
class cprior.models.BinomialMVTest(models, simulations=1000000, random_state=None, n_jobs=None)¶
- Bases: - cprior.cdist.beta.BetaMVTest - Binomial multivariate test. - Parameters: - models (dict) – The control and variations models.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
 - 
expected_loss(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)¶
- Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., \(\mathrm{E}[\max(control - variant - lift, 0)]\). - If lift is a positive value, the computation method must be Monte Carlo sampling. - Parameters: - method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
 - Returns: - expected_loss - Return type: - float 
 - 
expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')¶
- Compute credible intervals on the difference distribution of \(Z = control-variant\). - Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
 - Returns: - expected_loss_ci - Return type: - np.ndarray or tuple of np.ndarray 
 - 
expected_loss_relative(method='exact', control='A', variant='B')¶
- Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., \(\mathrm{E}[(control - variant) / variant]\). - Parameters: - method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
 - Returns: - expected_loss_relative - Return type: - float 
 - 
expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')¶
- Compute credible intervals on the relative difference distribution of \(Z = (control - variant) / variant\). - Parameters: - method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
 - Returns: - expected_loss_relative_ci - Return type: - np.ndarray or tuple of np.ndarray 
 - 
expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)¶
- Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant=”B”, we compute \(\mathrm{E}[(\max(A, C, D) - B) / B]\). - Parameters: - method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
 - Returns: - expected_loss_relative_vs_all - Return type: - float 
 - 
expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)¶
- Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\). - If lift is a positive value, the computation method must be Monte Carlo sampling. - Parameters: - method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
 - Returns: - expected_loss_vs_all - Return type: - float 
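For the multi-variant expected loss \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\), a Monte Carlo estimate compares each draw of the chosen variant with the best draw among the others. The sketch below assumes independent beta posteriors stored in a plain dict; it is illustrative only, omits the lift term as the formula above does, and does not reproduce the "quad" or "MLHS" methods.

```python
import random

def expected_loss_vs_all_mc(posteriors, variant, simulations=20_000, seed=0):
    """Monte Carlo estimate of E[max(max(others) - variant, 0)]."""
    rng = random.Random(seed)
    others = [name for name in posteriors if name != variant]
    total = 0.0
    for _ in range(simulations):
        chosen = rng.betavariate(*posteriors[variant])
        best_other = max(rng.betavariate(*posteriors[name]) for name in others)
        total += max(best_other - chosen, 0.0)
    return total / simulations

# Illustrative (alpha, beta) posteriors for four variants.
posteriors = {"A": (17, 45), "B": (25, 37), "C": (20, 42), "D": (15, 47)}
losses = {name: expected_loss_vs_all_mc(posteriors, name) for name in posteriors}
best = min(losses, key=losses.get)  # variant with the smallest expected loss
```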
 - 
probability(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)¶
- Compute the error probability or chance to beat control, i.e., \(P[variant > control + lift]\). - If lift is a positive value, the computation method must be Monte Carlo sampling. - Parameters: - method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
 - Returns: - probability - Return type: - float 
 - 
probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)¶
- Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(P[B > \max(A, C, D) + lift]\). - If lift is a positive value, the computation method must be Monte Carlo sampling. - Parameters: - method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
 - Returns: - probability_vs_all - Return type: - float 
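The chance to beat all variations, \(P[B > \max(A, C, D) + lift]\), can be approximated by counting the joint posterior draws in which the chosen variant exceeds every other one. This minimal sketch assumes independent beta posteriors held in a dict and uses hypothetical names; it stands in for neither the "quad" nor the "MLHS" method.

```python
import random

def probability_vs_all_mc(posteriors, variant, lift=0.0,
                          simulations=20_000, seed=0):
    """Monte Carlo estimate of P[variant > max(others) + lift]."""
    rng = random.Random(seed)
    others = [name for name in posteriors if name != variant]
    wins = 0
    for _ in range(simulations):
        chosen = rng.betavariate(*posteriors[variant])
        best_other = max(rng.betavariate(*posteriors[name]) for name in others)
        wins += chosen > best_other + lift
    return wins / simulations

# Illustrative (alpha, beta) posteriors for four variants.
posteriors = {"A": (17, 45), "B": (25, 37), "C": (20, 42), "D": (15, 47)}
probs = {name: probability_vs_all_mc(posteriors, name) for name in posteriors}
```

With zero lift exactly one variant wins each joint draw, so the four probabilities should sum to roughly 1. The sum is not exact here because each call draws its own samples.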
 - 
update(data, variant)¶
- Update posterior parameters for a given variant with new data samples. - Parameters: - data (array-like, shape = (n_samples)) – Data samples from a binomial distribution.
- variant (str) – The variant to update.