Bernoulli distribution

The Bernoulli distribution is a discrete distribution with a binary outcome: 1 indicates success, which occurs with probability $p$, and 0 indicates failure, which occurs with probability $q = 1 - p$, where $p \in [0, 1]$. The probability mass function for $k \in \{0, 1\}$ is

\[\begin{split}f(k; p) = p^k (1-p)^{1-k} = \begin{cases} 1-p & \text{if } k = 0\\ p & \text{if }k = 1, \end{cases}\end{split}\]

and the cumulative distribution function is

\[\begin{split}F(k; p) = \begin{cases} 1-p & \text{if } k = 0\\ 1 & \text{if }k = 1. \end{cases}\end{split}\]

The expected value and variance are given by

\[\mathrm{E}[X] = p, \quad \mathrm{Var}[X]= p(1-p).\]
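The formulas above can be checked with a short, dependency-free sketch. The function names `bernoulli_pmf`, `bernoulli_cdf` and `bernoulli_moments` are illustrative helpers, not part of cprior; the last lines compare the mean and variance with simulated trials.

```python
import random
from typing import Tuple


def bernoulli_pmf(k: int, p: float) -> float:
    """f(k; p) = p^k (1 - p)^(1 - k) for k in {0, 1}."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return p**k * (1 - p) ** (1 - k)


def bernoulli_cdf(k: int, p: float) -> float:
    """F(k; p) = 1 - p if k = 0 and 1 if k = 1."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return 1 - p if k == 0 else 1.0


def bernoulli_moments(p: float) -> Tuple[float, float]:
    """Return (E[X], Var[X]) = (p, p(1 - p))."""
    return p, p * (1 - p)


p = 0.25
print(bernoulli_pmf(0, p), bernoulli_pmf(1, p))  # 0.75 0.25
print(bernoulli_moments(p))  # (0.25, 0.1875)

# Empirical check with simulated trials (fixed seed for reproducibility).
rng = random.Random(0)
trials = [1 if rng.random() < p else 0 for _ in range(100_000)]
sample_mean = sum(trials) / len(trials)
sample_var = sum((x - sample_mean) ** 2 for x in trials) / len(trials)
print(sample_mean, sample_var)
```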

The Bernoulli distribution is suitable for tests with a binary outcome, such as conversion rate optimization (CRO) or click-through rate (CTR) testing.

class cprior.models.BernoulliModel(name='', alpha=1, beta=1)

Bases: cprior.cdist.beta.BetaModel

Bayesian model with a Bernoulli likelihood and a beta prior distribution.

Given data samples $\mathbf{x} = (x_1, \ldots, x_n)$ from a Bernoulli distribution with parameter $p$, the posterior distribution is

\[p | \mathbf{x} \sim \mathcal{B}\left(\alpha + \sum_{i=1}^n x_i, \beta + n - \sum_{i=1}^n x_i\right),\]

with prior parameters $\alpha$ and $\beta$.
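The conjugate update amounts to adding the number of successes to $\alpha$ and the number of failures to $\beta$. A minimal sketch, assuming the data are coded as 0/1 integers; the helper `beta_bernoulli_posterior` is hypothetical and only mirrors the quantities that `alpha_posterior` and `beta_posterior` are documented to report.

```python
from typing import Iterable, Tuple


def beta_bernoulli_posterior(
    data: Iterable[int], alpha: float = 1.0, beta: float = 1.0
) -> Tuple[float, float]:
    """Posterior Beta parameters after observing Bernoulli samples.

    alpha_post = alpha + sum(x_i), beta_post = beta + n - sum(x_i).
    """
    xs = list(data)
    if any(x not in (0, 1) for x in xs):
        raise ValueError("Bernoulli samples must be 0 or 1")
    n_success = sum(xs)  # successes
    n_samples = len(xs)  # trials
    return alpha + n_success, beta + n_samples - n_success


# 7 conversions out of 10 visitors under a uniform Beta(1, 1) prior.
visits = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
print(beta_bernoulli_posterior(visits))  # (8.0, 4.0)
```

Because the update only accumulates counts, updating in two batches (using the first posterior as the prior for the second batch) gives the same result as a single update with all the data.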

Parameters:	name (str (default="")) – Model name.
	alpha (int or float (default=1)) – Prior parameter alpha.
	beta (int or float (default=1)) – Prior parameter beta.

n_success_

Number of successes.

Type:	int

n_samples_

Number of samples.

Type:	int

alpha_posterior

Posterior parameter alpha.

Returns:	alpha
Return type:	float

beta_posterior

Posterior parameter beta.

Returns:	beta
Return type:	float

cdf(x)

Cumulative distribution function of the posterior distribution.

Parameters:	x (array-like) – Quantiles.
Returns:	cdf – Cumulative distribution function evaluated at x.
Return type:	numpy.ndarray
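For a posterior with positive integer parameters (for example, a Beta(1, 1) prior updated with integer counts), the beta CDF has a closed form as a binomial tail sum: $I_x(a, b) = P[Y \ge a]$ with $Y \sim \text{Binomial}(a + b - 1, x)$. The sketch below uses that identity; `beta_cdf_int` is a hypothetical helper, and non-integer parameters would need the regularized incomplete beta function (for example `scipy.special.betainc`), which is not shown.

```python
from math import comb


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses I_x(a, b) = P[Y >= a] with Y ~ Binomial(a + b - 1, x).
    """
    if a < 1 or b < 1 or int(a) != a or int(b) != b:
        raise ValueError("a and b must be positive integers")
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = int(a + b - 1)
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(int(a), n + 1))


# Posterior Beta(8, 4): 7 successes in 10 trials with a Beta(1, 1) prior.
print(beta_cdf_int(0.5, 8, 4))  # 0.11328125, i.e. P[p < 0.5]
```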

credible_interval(interval_length)

Credible interval of the posterior distribution.

Parameters:	interval_length (float) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
Returns:	interval – Lower and upper credible interval limits.
Return type:	tuple
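One simple way to approximate an equal-tailed credible interval is to draw from the posterior and take empirical quantiles. This Monte Carlo sketch uses only the standard library (`random.betavariate`); it is not necessarily how `credible_interval` computes its result, and the helper name is hypothetical.

```python
import random
from typing import Tuple


def eti_credible_interval_mc(
    alpha: float,
    beta: float,
    interval_length: float = 0.9,
    n_draws: int = 100_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Equal-tailed credible interval of Beta(alpha, beta) from posterior draws."""
    if not 0.0 < interval_length < 1.0:
        raise ValueError("interval_length must lie in (0, 1)")
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))
    tail = (1.0 - interval_length) / 2.0
    # Empirical quantiles at tail and 1 - tail (index truncation, no interpolation).
    lower = draws[int(tail * (n_draws - 1))]
    upper = draws[int((1.0 - tail) * (n_draws - 1))]
    return lower, upper


# 90% interval for the posterior Beta(8, 4).
print(eti_credible_interval_mc(8, 4, interval_length=0.9))
```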

mean()

Mean of the posterior distribution.

Returns:	mean
Return type:	float

pdf(x)

Probability density function of the posterior distribution.

Parameters:	x (array-like) – Quantiles.
Returns:	pdf – Probability density function evaluated at x.
Return type:	numpy.ndarray
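For reference, the posterior density is the beta density $x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha, \beta)$. The sketch below evaluates it in log space, which avoids overflow of the gamma functions for large counts; `beta_pdf` is a hypothetical helper, not the cprior implementation, and it only accepts points strictly inside (0, 1).

```python
from math import exp, lgamma, log


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at a point x strictly inside (0, 1)."""
    if not 0.0 < x < 1.0:
        raise ValueError("this sketch only evaluates x in the open interval (0, 1)")
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)  # equals -log B(a, b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


# The density integrates to 1: midpoint rule on a fine grid for Beta(8, 4).
n = 10_000
area = sum(beta_pdf((i + 0.5) / n, 8, 4) for i in range(n)) / n
print(area)
```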

ppf(q)

Percent point function (quantile) of the posterior distribution.

Parameters:	q (array-like) – Lower tail probability.
Returns:	ppf – Quantile corresponding to the lower tail probability q.
Return type:	numpy.ndarray
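Because the CDF is monotone, the percent point function can be obtained by inverting it numerically. The sketch below bisects a beta CDF computed with the binomial-tail identity $I_x(a, b) = P[Y \ge a]$, $Y \sim \text{Binomial}(a + b - 1, x)$, which holds only for positive integer parameters. Both helper names are hypothetical; a production implementation would typically rely on a dedicated inverse such as `scipy.stats.beta.ppf`.

```python
from math import comb


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) for positive integers a, b via a binomial tail sum."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_ppf_int(q: float, a: int, b: int, tol: float = 1e-12) -> float:
    """Quantile of Beta(a, b): the x with CDF(x) = q, found by bisection."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return 0.5 * (lo + hi)


# 90% equal-tailed interval (up to tol) for the posterior Beta(8, 4).
print(beta_ppf_int(0.05, 8, 4), beta_ppf_int(0.95, 8, 4))
```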

ppmean()

Posterior predictive mean.

If $X$ is a Bernoulli trial with parameter $p \sim \mathcal{B}(\alpha, \beta)$, then the posterior predictive expected value is given by

\[\mathrm{E}[X] = \frac{\alpha}{\alpha + \beta},\]

where $\alpha$ and $\beta$ are the posterior values of the parameters.
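The formula follows from the law of total expectation: given $p$, a Bernoulli trial has mean $p$, and the mean of $\mathcal{B}(\alpha, \beta)$ is $\alpha / (\alpha + \beta)$:

\[\mathrm{E}[X] = \mathrm{E}\big[\mathrm{E}[X \mid p]\big] = \mathrm{E}[p] = \frac{\alpha}{\alpha + \beta}.\]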

Returns:	mean
Return type:	float

pppdf(x)

Posterior predictive probability density function.

If $X$ is a Bernoulli trial with parameter $p \sim \mathcal{B}(\alpha, \beta)$, then the posterior predictive probability density function is given by

\[\begin{split}f(x; \alpha, \beta) = \begin{cases} \frac{\beta}{\alpha+ \beta} & \text{if } x = 0\\ \frac{\alpha}{\alpha+ \beta} & \text{if } x = 1, \end{cases}\end{split}\]

where $\alpha$ and $\beta$ are the posterior values of the parameters.

Parameters:	x (array-like) – Quantiles.
Returns:	pdf – Probability density function evaluated at x.
Return type:	float

ppvar()

Posterior predictive variance.

If $X$ is a Bernoulli trial with parameter $p \sim \mathcal{B}(\alpha, \beta)$, then the posterior predictive variance is given by

\[\mathrm{Var}[X] = \frac{\alpha \beta}{(\alpha + \beta)^2},\]

where $\alpha$ and $\beta$ are the posterior values of the parameters.
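Since $X \in \{0, 1\}$, we have $X^2 = X$ and hence $\mathrm{E}[X^2] = \mathrm{E}[X] = \alpha/(\alpha + \beta)$. The variance then follows from the posterior predictive mean:

\[\mathrm{Var}[X] = \mathrm{E}[X^2] - \mathrm{E}[X]^2 = \frac{\alpha}{\alpha + \beta} - \frac{\alpha^2}{(\alpha + \beta)^2} = \frac{\alpha(\alpha + \beta) - \alpha^2}{(\alpha + \beta)^2} = \frac{\alpha \beta}{(\alpha + \beta)^2}.\]

In other words, the posterior predictive variance is the Bernoulli variance $m(1 - m)$ evaluated at the posterior predictive mean $m = \alpha/(\alpha + \beta)$.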

Returns:	var
Return type:	float

rvs(size=1, random_state=None)

Random variates of the posterior distribution.

Parameters:	size (int (default=1)) – Number of random variates.
	random_state (int or None (default=None)) – The seed used by the random number generator.
Returns:	rvs – Random variates of given size.
Return type:	numpy.ndarray or scalar

std()

Standard deviation of the posterior distribution.

Returns:	std
Return type:	float

update(data)

Update posterior parameters with new data samples.

Parameters:	data (array-like, shape = (n_samples)) – Data samples from a Bernoulli distribution.

var()

Variance of the posterior distribution.

Returns:	var
Return type:	float
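The posterior moments are not written out above; for a $\mathcal{B}(\alpha, \beta)$ posterior they are the standard beta moments $\alpha/(\alpha+\beta)$ and $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$. The sketch below computes them and cross-checks against posterior draws, assuming `mean()`, `var()` and `std()` report these values for the posterior parameters; the helper names are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pvariance


def beta_mean(a: float, b: float) -> float:
    return a / (a + b)


def beta_var(a: float, b: float) -> float:
    return a * b / ((a + b) ** 2 * (a + b + 1))


def beta_std(a: float, b: float) -> float:
    return sqrt(beta_var(a, b))


a, b = 8, 4  # posterior after 7 successes in 10 trials, Beta(1, 1) prior
print(beta_mean(a, b), beta_var(a, b), beta_std(a, b))

# Cross-check against draws from the posterior.
rng = random.Random(0)
draws = [rng.betavariate(a, b) for _ in range(100_000)]
print(fmean(draws), pvariance(draws))
```

Compared with the posterior predictive variance $\alpha\beta/(\alpha+\beta)^2$, the posterior variance carries the extra factor $1/(\alpha+\beta+1)$: it measures uncertainty about $p$ itself and shrinks as data accumulate, whereas the predictive variance describes a single new trial and does not vanish.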

class cprior.models.BernoulliABTest(modelA, modelB, simulations=1000000, random_state=None)

Bases: cprior.cdist.beta.BetaABTest

Bernoulli A/B test.

Parameters:	modelA (object) – The control model.
	modelB (object) – The variation model.
	simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
	random_state (int or None (default=None)) – The seed used by the random number generator.

expected_loss(method='exact', variant='A', lift=0, mlhs_samples=10000)

Compute the expected loss. This is the expected uplift lost by choosing a given variant.

If variant == "A", $\mathrm{E}[\max(B - A - lift, 0)]$
If variant == "B", $\mathrm{E}[\max(A - B - lift, 0)]$
If variant == "all", both.

If lift is positive, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
	variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
	lift (float (default=0.0)) – The amount of uplift.
	mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:	expected_loss
Return type:	float or tuple of floats
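A plain Monte Carlo sketch of the expected loss with independent beta posteriors for A and B, close in spirit to `method="MC"`. The function name, sample size and example posteriors are assumptions for illustration; the exact and MLHS methods are not reproduced.

```python
import random
from typing import Tuple


def expected_loss_mc(
    alpha_a: float, beta_a: float,
    alpha_b: float, beta_b: float,
    lift: float = 0.0, n: int = 50_000, seed: int = 0,
) -> Tuple[float, float]:
    """Return (E[max(B - A - lift, 0)], E[max(A - B - lift, 0)]).

    The first value is the loss of choosing A, the second of choosing B.
    """
    rng = random.Random(seed)
    loss_a = loss_b = 0.0
    for _ in range(n):
        a = rng.betavariate(alpha_a, beta_a)
        b = rng.betavariate(alpha_b, beta_b)
        loss_a += max(b - a - lift, 0.0)
        loss_b += max(a - b - lift, 0.0)
    return loss_a / n, loss_b / n


# A: 40/100 conversions, B: 50/100 conversions, both with a Beta(1, 1) prior.
loss_a, loss_b = expected_loss_mc(41, 61, 51, 51)
print(loss_a, loss_b)
```

With `lift=0` the two losses differ by $\mathrm{E}[B] - \mathrm{E}[A]$, because $\max(z, 0) - \max(-z, 0) = z$; a small loss for a variant means little is expected to be given up by choosing it.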

expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of $Z = B-A$ and/or $Z = A-B$.

If variant == "A", $Z = B - A$
If variant == "B", $Z = A - B$
If variant == "all", both.

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
	variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
	interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
	ci_method (str (default="ETI")) – Method used to compute the credible interval. Supported methods are the highest density interval (`ci_method="HDI"`) and the equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_ci
Return type:	np.ndarray or tuple of np.ndarray
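The two `ci_method` options differ in how the interval is chosen from samples of $Z$: ETI cuts equal probability from each tail, whereas HDI is the shortest interval containing the requested mass. A standard-library sketch of both on Monte Carlo samples of $Z = B - A$; the helper names and example posteriors are hypothetical, and this sample-based HDI is one common construction, not necessarily the one cprior uses.

```python
import math
import random
from typing import List, Tuple


def eti(samples: List[float], interval_length: float) -> Tuple[float, float]:
    """Equal-tailed interval: empirical quantiles at (1 - L)/2 and (1 + L)/2."""
    xs = sorted(samples)
    n = len(xs)
    tail = (1.0 - interval_length) / 2.0
    return xs[int(tail * (n - 1))], xs[int((1.0 - tail) * (n - 1))]


def hdi(samples: List[float], interval_length: float) -> Tuple[float, float]:
    """Highest density interval: narrowest window holding L * n sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(interval_length * n))  # samples per candidate window
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Samples of Z = B - A with B ~ Beta(51, 51) and A ~ Beta(41, 61).
rng = random.Random(0)
z = [rng.betavariate(51, 51) - rng.betavariate(41, 61) for _ in range(50_000)]
print("ETI", eti(z, 0.9))
print("HDI", hdi(z, 0.9))
```

For a nearly symmetric distribution such as this one the two intervals almost coincide; they can differ noticeably when the distribution is skewed.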

expected_loss_relative(method='exact', variant='A')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.

If variant == "A", $\mathrm{E}[(B - A) / A]$
If variant == "B", $\mathrm{E}[(A - B) / B]$
If variant == "all", both.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
	variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
Returns:	expected_loss_relative
Return type:	float or tuple of floats
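When $A$ and $B$ are independent beta posteriors, the relative loss for variant A has a closed form, $\mathrm{E}[(B - A)/A] = \mathrm{E}[B]\,\mathrm{E}[1/A] - 1$, with $\mathrm{E}[1/A] = (\alpha_A + \beta_A - 1)/(\alpha_A - 1)$ for $\alpha_A > 1$. The sketch compares this with a Monte Carlo estimate; it illustrates one route to an exact answer and is not claimed to be cprior's implementation (helper names and posteriors are hypothetical). For the example below the exact value is 0.2625.

```python
import random


def relative_loss_a_exact(alpha_a, beta_a, alpha_b, beta_b):
    """E[(B - A) / A] = E[B] * E[1/A] - 1 for independent beta posteriors."""
    if alpha_a <= 1:
        raise ValueError("E[1/A] is infinite unless alpha_a > 1")
    mean_b = alpha_b / (alpha_b + beta_b)
    mean_inv_a = (alpha_a + beta_a - 1) / (alpha_a - 1)
    return mean_b * mean_inv_a - 1


def relative_loss_a_mc(alpha_a, beta_a, alpha_b, beta_b, n=50_000, seed=0):
    """Monte Carlo estimate of E[(B - A) / A]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a = rng.betavariate(alpha_a, beta_a)
        b = rng.betavariate(alpha_b, beta_b)
        total += (b - a) / a
    return total / n


exact = relative_loss_a_exact(41, 61, 51, 51)
mc = relative_loss_a_mc(41, 61, 51, 51)
print(exact, mc)
```

The closed form needs $\alpha_A > 1$; under a Beta(1, 1) prior this means variant A must have at least one observed success, otherwise $\mathrm{E}[1/A]$ diverges.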

expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of $Z = (B-A)/A$ and/or $Z = (A-B)/B$.

If variant == "A", $Z = (B-A)/A$
If variant == "B", $Z = (A-B)/B$
If variant == "all", both.

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
	variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
	interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
	ci_method (str (default="ETI")) – Method used to compute the credible interval. Supported methods are the highest density interval (`ci_method="HDI"`) and the equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_relative_ci
Return type:	np.ndarray or tuple of np.ndarray

probability(method='exact', variant='A', lift=0, mlhs_samples=10000)

Compute the error probability or chance to beat control.

If variant == "A", $P[A > B + lift]$
If variant == "B", $P[B > A + lift]$
If variant == "all", both.

If lift is positive, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
	variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
	lift (float (default=0.0)) – The amount of uplift.
	mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:	probability
Return type:	float or tuple of floats
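With `lift=0` and independent beta posteriors whose parameter $\alpha_B$ is a positive integer, a known closed form for the chance that B beats A is $P[B > A] = \sum_{i=0}^{\alpha_B - 1} \frac{B(\alpha_A + i, \beta_A + \beta_B)}{(\beta_B + i)\, B(1 + i, \beta_B)\, B(\alpha_A, \beta_A)}$, where $B(\cdot, \cdot)$ is the beta function. The sketch evaluates it in log space and checks it against Monte Carlo; whether `method="exact"` uses this particular formula is not stated here, and the helper names are hypothetical. For variant A, $P[A > B] = 1 - P[B > A]$.

```python
import random
from math import exp, lgamma, log


def log_beta_fn(x: float, y: float) -> float:
    """log B(x, y) via log-gamma functions."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def prob_b_beats_a(alpha_a: float, beta_a: float, alpha_b: int, beta_b: float) -> float:
    """P[B > A] for independent A ~ Beta(alpha_a, beta_a), B ~ Beta(alpha_b, beta_b).

    Requires alpha_b to be a positive integer (the sum runs over i < alpha_b).
    """
    if alpha_b < 1 or int(alpha_b) != alpha_b:
        raise ValueError("alpha_b must be a positive integer")
    total = 0.0
    for i in range(int(alpha_b)):
        total += exp(
            log_beta_fn(alpha_a + i, beta_a + beta_b)
            - log(beta_b + i)
            - log_beta_fn(1 + i, beta_b)
            - log_beta_fn(alpha_a, beta_a)
        )
    return total


def prob_b_beats_a_mc(alpha_a, beta_a, alpha_b, beta_b, n=50_000, seed=0):
    """Monte Carlo estimate of P[B > A]."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(alpha_b, beta_b) > rng.betavariate(alpha_a, beta_a)
        for _ in range(n)
    )
    return wins / n


p_exact = prob_b_beats_a(41, 61, 51, 51)
p_mc = prob_b_beats_a_mc(41, 61, 51, 51)
print(p_exact, p_mc)
```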

update_A(data)

Update posterior parameters for variant A with new data samples.

Parameters:	data (array-like, shape = (n_samples)) – Data samples from a Bernoulli distribution.

update_B(data)

Update posterior parameters for variant B with new data samples.

Parameters:	data (array-like, shape = (n_samples)) – Data samples from a Bernoulli distribution.

class cprior.models.BernoulliMVTest(models, simulations=1000000, random_state=None, n_jobs=None)

Bases: cprior.cdist.beta.BetaMVTest

Bernoulli Multivariate test.

Parameters:	models (dict) – The control and variation models.
	simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
	random_state (int or None (default=None)) – The seed used by the random number generator.

expected_loss(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)

Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., $\mathrm{E}[\max(control - variant - lift, 0)]$.

If lift is positive, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
	control (str (default="A")) – The control variant.
	variant (str (default="B")) – The tested variant.
	lift (float (default=0.0)) – The amount of uplift.
	mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:	expected_loss
Return type:	float

expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of $Z = control-variant$.

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
	control (str (default="A")) – The control variant.
	variant (str (default="B")) – The tested variant.
	interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
	ci_method (str (default="ETI")) – Method used to compute the credible interval. Supported methods are the highest density interval (`ci_method="HDI"`) and the equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_ci
Return type:	np.ndarray or tuple of np.ndarray

expected_loss_relative(method='exact', control='A', variant='B')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., $\mathrm{E}[(control - variant) / variant]$.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
	control (str (default="A")) – The control variant.
	variant (str (default="B")) – The tested variant.
Returns:	expected_loss_relative
Return type:	float

expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of $Z = (control - variant) / variant$.

Parameters:	method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
	control (str (default="A")) – The control variant.
	variant (str (default="B")) – The tested variant.
	interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
	ci_method (str (default="ETI")) – Method used to compute the credible interval. Supported methods are the highest density interval (`ci_method="HDI"`) and the equal-tailed interval (`ci_method="ETI"`). Currently, `ci_method="HDI"` is only available for `method="MC"`.
Returns:	expected_loss_relative_ci
Return type:	np.ndarray or tuple of np.ndarray

expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)

Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute $\mathrm{E}[(\max(A, C, D) - B) / B]$.

Parameters:	method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
	variant (str (default="B")) – The chosen variant.
	mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:	expected_loss_relative_vs_all
Return type:	float

expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute $\mathrm{E}[\max(\max(A, C, D) - B, 0)]$.

If lift is positive, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
	variant (str (default="B")) – The chosen variant.
	lift (float (default=0.0)) – The amount of uplift.
	mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:	expected_loss_vs_all
Return type:	float
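A Monte Carlo sketch of the loss against all variations for independent beta posteriors, storing each posterior as an `(alpha, beta)` pair keyed by variant name. The function name, data layout and example counts are assumptions for illustration; the `"quad"` and `"MLHS"` methods are not reproduced.

```python
import random
from typing import Dict, Tuple


def expected_loss_vs_all_mc(
    posteriors: Dict[str, Tuple[float, float]],
    variant: str,
    lift: float = 0.0,
    n: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate E[max(max(others) - variant - lift, 0)]."""
    if variant not in posteriors or len(posteriors) < 2:
        raise ValueError("need the chosen variant and at least one other")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # One joint draw from every variant's posterior.
        draws = {k: rng.betavariate(a, b) for k, (a, b) in posteriors.items()}
        best_other = max(v for k, v in draws.items() if k != variant)
        total += max(best_other - draws[variant] - lift, 0.0)
    return total / n


# Conversions out of 100 visitors per variant, Beta(1, 1) prior.
posteriors = {"A": (41, 61), "B": (51, 51), "C": (46, 56), "D": (43, 59)}
losses = {name: expected_loss_vs_all_mc(posteriors, name) for name in posteriors}
print(losses)
```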

probability(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)

Compute the error probability or chance to beat control, i.e., $P[variant > control + lift]$.

If lift is positive, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
	control (str (default="A")) – The control variant.
	variant (str (default="B")) – The tested variant.
	lift (float (default=0.0)) – The amount of uplift.
	mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:	probability
Return type:	float

probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute $P[B > \max(A, C, D) + lift]$.

If lift is positive, the computation method must be Monte Carlo sampling.

Parameters:	method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
	variant (str (default="B")) – The chosen variant.
	lift (float (default=0.0)) – The amount of uplift.
	mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:	probability_vs_all
Return type:	float
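A Monte Carlo sketch of the chance to beat all variations, again with independent beta posteriors stored as `(alpha, beta)` pairs; the function name and example counts are hypothetical, and only plain Monte Carlo is shown.

```python
import random
from typing import Dict, Tuple


def probability_vs_all_mc(
    posteriors: Dict[str, Tuple[float, float]],
    variant: str,
    lift: float = 0.0,
    n: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate P[variant > max(others) + lift]."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        # One joint draw from every variant's posterior.
        draws = {k: rng.betavariate(a, b) for k, (a, b) in posteriors.items()}
        best_other = max(v for k, v in draws.items() if k != variant)
        if draws[variant] > best_other + lift:
            wins += 1
    return wins / n


# Conversions out of 100 visitors per variant, Beta(1, 1) prior.
posteriors = {"A": (41, 61), "B": (51, 51), "C": (46, 56), "D": (43, 59)}
probs = {name: probability_vs_all_mc(posteriors, name) for name in posteriors}
print(probs)
```

Reusing the same seed for every variant means all estimates come from the same joint draws, so with `lift=0` each draw has exactly one winner and the probabilities sum to one; this is a convenience of the sketch, not a documented property of the library.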

update(data, variant)

Update posterior parameters for a given variant with new data samples.

Parameters:	data (array-like, shape = (n_samples)) – Data samples from a Bernoulli distribution.
	variant (str) – The variant to update.