Geometric distribution

The geometric distribution is a discrete probability distribution with parameter \(p \in (0, 1)\). It can be defined as the number of Bernoulli trials, each with probability of success \(p\), required to obtain the first success. The probability mass function for \(k \ge 1\) is

\[f(k; p) = (1 - p)^{k - 1} p,\]

and the cumulative distribution function is

\[F(k; p) = 1 - (1 - p)^k.\]

The expected value and variance are

\[\mathrm{E}[X] = \frac{1}{p}, \quad \mathrm{Var}[X] = \frac{1 - p}{p^2}.\]

With this parameterization, the geometric distribution models the number of trials up to and including the first success, that is, one more than the number of failures preceding it.
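The formulas above can be checked numerically with a minimal pure-Python sketch. The function names `geom_pmf` and `geom_cdf` are illustrative and not part of cprior; the support is assumed to be \(k = 1, 2, \ldots\), as in the definition above.

```python
def geom_pmf(k, p):
    """P[X = k] for the number of trials up to the first success (k >= 1)."""
    return (1.0 - p) ** (k - 1) * p


def geom_cdf(k, p):
    """P[X <= k] for k >= 1."""
    return 1.0 - (1.0 - p) ** k


p = 0.3
ks = range(1, 400)  # the tail beyond k = 400 is negligible for p = 0.3

# Partial sums of the pmf reproduce the cdf.
partial = sum(geom_pmf(k, p) for k in range(1, 6))

# Moments computed from the truncated series.
mean = sum(k * geom_pmf(k, p) for k in ks)
var = sum((k - mean) ** 2 * geom_pmf(k, p) for k in ks)

print(partial, geom_cdf(5, p))
print(mean, 1 / p)
print(var, (1 - p) / p ** 2)
```

Each printed pair should agree up to floating-point rounding.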

class cprior.models.GeometricModel(name='', alpha=1, beta=1)

Bases: cprior.cdist.beta.BetaModel

Bayesian model with geometric likelihood and a beta prior distribution.

Given data samples \(\mathbf{x} = (x_1, \ldots, x_n)\) from a geometric distribution with parameter \(p\), the posterior distribution is

\[p | \mathbf{x} \sim \mathcal{B}\left(\alpha + n, \beta + \sum_{i=1}^n x_i - n \right),\]

with prior parameters \(\alpha\) and \(\beta\).
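As an illustration of the conjugate update, the following sketch applies the posterior formula directly. The helper `geometric_beta_update` is hypothetical and is not the library's implementation; it assumes every observation is a trial count \(x_i \ge 1\).

```python
def geometric_beta_update(alpha, beta, data):
    """Return the Beta posterior parameters after observing geometric data.

    Each observation is assumed to be a trial count x_i >= 1.
    """
    n = len(data)
    if any(x < 1 for x in data):
        raise ValueError("observations must be trial counts >= 1")
    return alpha + n, beta + sum(data) - n


# Uniform prior Beta(1, 1) and three observations (3 + 1 + 5 = 9 trials).
alpha_post, beta_post = geometric_beta_update(1, 1, [3, 1, 5])
print(alpha_post, beta_post)  # 4 7

# Updating in two batches gives the same posterior as a single update.
step = geometric_beta_update(1, 1, [3, 1])
print(geometric_beta_update(*step, [5]))  # (4, 7)
```

Each observation contributes one success to \(\alpha\) and \(x_i - 1\) failures to \(\beta\), which is why batch and sequential updates agree.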

Parameters:
  • name (str (default="")) – Model name.
  • alpha (int or float (default=1)) – Prior parameter alpha.
  • beta (int or float (default=1)) – Prior parameter beta.
n_samples_

Number of samples.

Type:int
alpha_posterior

Posterior parameter alpha.

Returns:alpha
Return type:float
beta_posterior

Posterior parameter beta.

Returns:beta
Return type:float
cdf(x)

Cumulative distribution function of the posterior distribution.

Parameters:x (array-like) – Quantiles.
Returns:cdf – Cumulative distribution function evaluated at x.
Return type:numpy.ndarray
credible_interval(interval_length)

Credible interval of the posterior distribution.

Parameters:interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
Returns:interval – Lower and upper credible interval limits.
Return type:tuple
mean()

Mean of the posterior distribution.

Returns:mean
Return type:float
pdf(x)

Probability density function of the posterior distribution.

Parameters:x (array-like) – Quantiles.
Returns:pdf – Probability density function evaluated at x.
Return type:numpy.ndarray
ppf(q)

Percent point function (quantile) of the posterior distribution.

Parameters:q (array-like) – Lower tail probability.
Returns:ppf – Quantile corresponding to the lower tail probability q.
Return type:numpy.ndarray
ppmean()

Posterior predictive mean.

If \(X\) follows a geometric distribution with parameter \(p \sim \mathcal{B}(\alpha, \beta)\), then the posterior predictive expected value is given by

\[\mathrm{E}[X] = \frac{\alpha + \beta - 1}{\alpha - 1},\]

where \(\alpha\) and \(\beta\) are the posterior values of the parameters. The expected value is finite only if \(\alpha > 1\).

Returns:mean
Return type:float
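The closed form for the posterior predictive mean can be cross-checked by simulation. The sketch below is an assumption-laden illustration, not cprior code: `pp_mean` is a hypothetical helper, and the posterior parameters (5, 8) are arbitrary example values. It relies on \(\mathrm{E}[X \mid p] = 1/p\) and on `random.betavariate` from the Python standard library.

```python
import random


def pp_mean(alpha, beta):
    """Closed-form posterior predictive mean; requires alpha > 1."""
    if alpha <= 1:
        raise ValueError("the predictive mean is finite only for alpha > 1")
    return (alpha + beta - 1) / (alpha - 1)


alpha, beta = 5.0, 8.0
rng = random.Random(0)

# E[X | p] = 1 / p, so averaging 1 / p over posterior draws estimates E[X].
n_draws = 200_000
mc_mean = sum(1.0 / rng.betavariate(alpha, beta) for _ in range(n_draws)) / n_draws

print(pp_mean(alpha, beta))  # 3.0
print(mc_mean)               # Monte Carlo estimate, close to the value above
```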
pppdf(x)

Posterior predictive probability density function.

If \(X\) follows a geometric distribution with parameter \(p \sim \mathcal{B}(\alpha, \beta)\), then the posterior predictive probability density function is given by

\[f(x; \alpha, \beta) = \frac{B(\alpha + 1, \beta + x - 1)}{B( \alpha, \beta)},\]

where \(\alpha\) and \(\beta\) are the posterior values of the parameters.

Parameters:x (array-like) – Quantiles.
Returns:pdf – Probability density function evaluated at x.
Return type:float
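A minimal sketch of the predictive formula, evaluated on the log scale with `math.lgamma` to avoid overflow in the beta functions. The names `log_beta` and `pp_pmf` are illustrative and not part of cprior, and the posterior parameters (4, 7) are arbitrary example values.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def pp_pmf(x, alpha, beta):
    """Posterior predictive probability of needing exactly x trials (x >= 1)."""
    return math.exp(log_beta(alpha + 1, beta + x - 1) - log_beta(alpha, beta))


alpha, beta = 4.0, 7.0
probs = [pp_pmf(x, alpha, beta) for x in range(1, 20001)]

total = sum(probs)  # close to 1; the remaining tail mass is tiny
first = probs[0]    # equals the posterior mean alpha / (alpha + beta)
print(total, first)
```

The probability of success on the first trial reduces to \(\alpha / (\alpha + \beta)\), the mean of the Beta posterior, which gives a quick sanity check on any implementation.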
ppvar()

Posterior predictive variance.

If \(X\) follows a geometric distribution with parameter \(p \sim \mathcal{B}(\alpha, \beta)\), then the posterior predictive variance is given by

\[\mathrm{Var}[X] = \frac{\alpha \beta (\alpha + \beta - 1)}{ (\alpha - 1)^2 (\alpha - 2)},\]

where \(\alpha\) and \(\beta\) are the posterior values of the parameters. The variance is finite only if \(\alpha > 2\).

Returns:var
Return type:float
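One way to sanity-check a posterior predictive variance is to compute it in two independent ways: by summing the predictive probabilities directly, and through the law of total variance, \(\mathrm{Var}[X] = \mathrm{E}[\mathrm{Var}(X \mid p)] + \mathrm{Var}(\mathrm{E}[X \mid p])\), using the Beta moments \(\mathrm{E}[1/p]\) and \(\mathrm{E}[1/p^2]\). The sketch below does both; the function names and the example parameters (8, 5) are assumptions for illustration, not cprior code.

```python
import math


def pp_pmf(x, alpha, beta):
    """Predictive pmf B(alpha + 1, beta + x - 1) / B(alpha, beta), x >= 1."""
    log_num = (math.lgamma(alpha + 1) + math.lgamma(beta + x - 1)
               - math.lgamma(alpha + beta + x))
    log_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(log_num - log_den)


def pp_var_total_variance(alpha, beta):
    """Var[X] = E[Var(X | p)] + Var(E[X | p]); requires alpha > 2."""
    m1 = (alpha + beta - 1) / (alpha - 1)       # E[1 / p]
    m2 = m1 * (alpha + beta - 2) / (alpha - 2)  # E[1 / p^2]
    # E[(1 - p) / p^2] = m2 - m1 and Var(1 / p) = m2 - m1^2.
    return (m2 - m1) + (m2 - m1 ** 2)


alpha, beta = 8.0, 5.0
xs = range(1, 5001)  # truncation point; the neglected tail is negligible here
series_mean = sum(x * pp_pmf(x, alpha, beta) for x in xs)
series_var = sum((x - series_mean) ** 2 * pp_pmf(x, alpha, beta) for x in xs)

total_var = pp_var_total_variance(alpha, beta)
print(series_var, total_var)
```

Both numbers should agree closely; for these parameters the exact value is 80/49.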
rvs(size=1, random_state=None)

Random variates of the posterior distribution.

Parameters:
  • size (int (default=1)) – Number of random variates.
  • random_state (int or None (default=None)) – The seed used by the random number generator.
Returns:

rvs – Random variates of given size.

Return type:

numpy.ndarray or scalar

std()

Standard deviation of the posterior distribution.

Returns:std
Return type:float
update(data)

Update posterior parameters with new data samples.

Parameters:data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
var()

Variance of the posterior distribution.

Returns:var
Return type:float
class cprior.models.GeometricABTest(modelA, modelB, simulations=1000000, random_state=None)

Bases: cprior.cdist.beta.BetaABTest

Geometric A/B test.

Parameters:
  • modelA (object) – The control model.
  • modelB (object) – The variation model.
  • simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
  • random_state (int or None (default=None)) – The seed used by the random number generator.
expected_loss(method='exact', variant='A', lift=0, mlhs_samples=10000)

Compute the expected loss. This is the expected uplift lost by choosing a given variant.

  • If variant == "A", \(\mathrm{E}[\max(B - A - lift, 0)]\)
  • If variant == "B", \(\mathrm{E}[\max(A - B - lift, 0)]\)
  • If variant == "all", both.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:
  • method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
  • variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
  • lift (float (default=0.0)) – The amount of uplift.
  • mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:

expected_loss

Return type:

float or tuple of floats
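To make the definition of the expected loss concrete, the following Monte Carlo sketch estimates \(\mathrm{E}[\max(B - A - lift, 0)]\). It assumes that \(A\) and \(B\) denote independent draws from the two Beta posteriors; the function `expected_loss_mc` and the posterior parameters are hypothetical, and the sketch does not reproduce how cprior computes the "exact" or "MLHS" variants.

```python
import random


def expected_loss_mc(post_a, post_b, lift=0.0, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[max(B - A - lift, 0)] for variant "A"."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        a = rng.betavariate(*post_a)
        b = rng.betavariate(*post_b)
        total += max(b - a - lift, 0.0)
    return total / n_draws


# Hypothetical posterior parameters (alpha, beta) for each variant.
post_a = (40.0, 160.0)
post_b = (50.0, 150.0)

loss_a = expected_loss_mc(post_a, post_b)  # loss from choosing A
loss_b = expected_loss_mc(post_b, post_a)  # loss from choosing B
print(loss_a, loss_b)
```

With lift = 0, the difference `loss_a - loss_b` estimates \(\mathrm{E}[B] - \mathrm{E}[A]\), which is 0.05 for these example posteriors.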

expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of \(Z = B-A\) and/or \(Z = A-B\).

  • If variant == "A", \(Z = B - A\)
  • If variant == "B", \(Z = A - B\)
  • If variant == "all", both.
Parameters:
  • method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
  • variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
  • interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
  • ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns:

expected_loss_ci

Return type:

np.ndarray or tuple of np.ndarray
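An equal-tailed interval for \(Z = B - A\) can be sketched from Monte Carlo draws by taking empirical quantiles. This is an illustrative approximation under the assumption that \(A\) and \(B\) are independent Beta posterior draws; `eti_difference` and the example parameters are hypothetical and do not reflect the library's internals.

```python
import random


def eti_difference(post_a, post_b, interval_length=0.9, n_draws=100_000, seed=0):
    """Equal-tailed interval for Z = B - A from Monte Carlo draws."""
    rng = random.Random(seed)
    z = sorted(
        rng.betavariate(*post_b) - rng.betavariate(*post_a)
        for _ in range(n_draws)
    )
    tail = (1.0 - interval_length) / 2.0
    # Empirical quantiles at probabilities tail and 1 - tail.
    lower = z[int(tail * (n_draws - 1))]
    upper = z[int((1.0 - tail) * (n_draws - 1))]
    return lower, upper


lower, upper = eti_difference((40.0, 160.0), (50.0, 150.0))
print(lower, upper)
```

An interval that contains zero, as this one is expected to, indicates that the sign of the difference is not settled at the chosen credibility level.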

expected_loss_relative(method='exact', variant='A')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.

  • If variant == "A", \(\mathrm{E}[(B - A) / A]\)
  • If variant == "B", \(\mathrm{E}[(A - B) / B]\)
  • If variant == "all", both.
Parameters:
  • method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
  • variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
Returns:

expected_loss_relative

Return type:

float or tuple of floats

expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of \(Z = (B-A)/A\) and/or \(Z = (A-B)/B\).

  • If variant == "A", \(Z = (B-A)/A\)
  • If variant == "B", \(Z = (A-B)/B\)
  • If variant == "all", both.
Parameters:
  • method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
  • variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
  • interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
  • ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns:

expected_loss_relative_ci

Return type:

np.ndarray or tuple of np.ndarray

probability(method='exact', variant='A', lift=0, mlhs_samples=10000)

Compute the error probability or chance to beat control.

  • If variant == "A", \(P[A > B + lift]\)
  • If variant == "B", \(P[B > A + lift]\)
  • If variant == "all", both.

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:
  • method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
  • variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
  • lift (float (default=0.0)) – The amount of uplift.
  • mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:

probability

Return type:

float or tuple of floats
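The probability \(P[A > B + lift]\) can be approximated by counting how often a posterior draw for A exceeds a draw for B by more than the lift. The sketch assumes independent Beta posteriors; `prob_a_beats_b` and the parameter values are illustrative, not part of cprior.

```python
import random


def prob_a_beats_b(post_a, post_b, lift=0.0, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P[A > B + lift]."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_a) > rng.betavariate(*post_b) + lift
        for _ in range(n_draws)
    )
    return wins / n_draws


post_a = (40.0, 160.0)
post_b = (50.0, 150.0)

p_a = prob_a_beats_b(post_a, post_b)  # corresponds to variant == "A"
p_b = prob_a_beats_b(post_b, post_a)  # corresponds to variant == "B"
print(p_a, p_b)
```

With lift = 0 the two estimates should sum to approximately one, since ties have probability zero for continuous posteriors.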

update_A(data)

Update posterior parameters for variant A with new data samples.

Parameters:data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
update_B(data)

Update posterior parameters for variant B with new data samples.

Parameters:data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
class cprior.models.GeometricMVTest(models, simulations=1000000, random_state=None, n_jobs=None)

Bases: cprior.cdist.beta.BetaMVTest

Geometric Multivariate test.

Parameters:
  • models (dict) – The control and variation models.
  • simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
  • random_state (int or None (default=None)) – The seed used by the random number generator.
expected_loss(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)

Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., \(\mathrm{E}[\max(control - variant - lift, 0)]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:
  • method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
  • control (str (default="A")) – The control variant.
  • variant (str (default="B")) – The tested variant.
  • lift (float (default=0.0)) – The amount of uplift.
  • mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:

expected_loss

Return type:

float

expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the difference distribution of \(Z = control-variant\).

Parameters:
  • method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
  • control (str (default="A")) – The control variant.
  • variant (str (default="B")) – The tested variant.
  • interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
  • ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns:

expected_loss_ci

Return type:

np.ndarray or tuple of np.ndarray

expected_loss_relative(method='exact', control='A', variant='B')

Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., \(\mathrm{E}[(control - variant) / variant]\).

Parameters:
  • method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
  • control (str (default="A")) – The control variant.
  • variant (str (default="B")) – The tested variant.
Returns:

expected_loss_relative

Return type:

float

expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')

Compute credible intervals on the relative difference distribution of \(Z = (control - variant) / variant\).

Parameters:
  • method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
  • control (str (default="A")) – The control variant.
  • variant (str (default="B")) – The tested variant.
  • interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1].
  • ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are highest density interval (ci_method="HDI") and equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns:

expected_loss_relative_ci

Return type:

np.ndarray or tuple of np.ndarray

expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)

Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[(\max(A, C, D) - B) / B]\).

Parameters:
  • method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
  • variant (str (default="B")) – The chosen variant.
  • mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:

expected_loss_relative_vs_all

Return type:

float

expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:
  • method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
  • variant (str (default="B")) – The chosen variant.
  • lift (float (default=0.0)) – The amount of uplift.
  • mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:

expected_loss_vs_all

Return type:

float

probability(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)

Compute the error probability or chance to beat control, i.e., \(P[variant > control + lift]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:
  • method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
  • control (str (default="A")) – The control variant.
  • variant (str (default="B")) – The tested variant.
  • lift (float (default=0.0)) – The amount of uplift.
  • mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns:

probability

Return type:

float

probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)

Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(P[B > \max(A, C, D) + lift]\).

If lift is a positive value, the computation method must be Monte Carlo sampling.

Parameters:
  • method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
  • variant (str (default="B")) – The chosen variant.
  • lift (float (default=0.0)) – The amount of uplift.
  • mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns:

probability_vs_all

Return type:

float
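The "versus all" probability can be illustrated with a small Monte Carlo sketch that compares one variant against the maximum of the remaining ones. It assumes independent Beta posteriors keyed by variant name; `prob_beats_all` and the parameter values are hypothetical, and the sketch does not reproduce the "quad" or "MLHS" methods.

```python
import random


def prob_beats_all(posteriors, variant, lift=0.0, n_draws=50_000, seed=0):
    """Monte Carlo estimate of P[variant > max(others) + lift]."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        # One joint draw: a sample from every variant's Beta posterior.
        draws = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
        best_other = max(v for name, v in draws.items() if name != variant)
        wins += draws[variant] > best_other + lift
    return wins / n_draws


# Hypothetical posterior parameters (alpha, beta) for three variants.
posteriors = {"A": (40.0, 160.0), "B": (50.0, 150.0), "C": (45.0, 155.0)}
probs = {name: prob_beats_all(posteriors, name) for name in posteriors}
print(probs)
```

Because every call reuses the same seed and therefore the same joint draws, exactly one variant wins each draw and the three estimates sum to one when lift = 0.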

update(data, variant)

Update posterior parameters for a given variant with new data samples.

Parameters:
  • data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
  • variant (str) – The variant whose posterior parameters are updated.