Geometric distribution
The geometric distribution is a discrete probability distribution with parameter \(p \in (0, 1)\). It can be defined as the number of Bernoulli trials, with probability of success \(p\), required to obtain the first success. The probability mass function for \(k \ge 1\) is
\[f(k; p) = p (1 - p)^{k - 1},\]
and the cumulative distribution function is
\[F(k; p) = 1 - (1 - p)^k.\]
The expected value and variance are as follows:
\[\mathrm{E}[X] = \frac{1}{p}, \quad \mathrm{Var}[X] = \frac{1 - p}{p^2}.\]
Under this parametrization, the geometric distribution models the number of trials needed to obtain the first success, that is, one plus the number of failures before the first success.
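A quick numerical check of these definitions, written as a plain-Python sketch (the helper names are illustrative, not part of cprior): it evaluates \(f(k; p) = p(1-p)^{k-1}\) and \(F(k; p) = 1 - (1-p)^k\), confirms that the CDF is the running sum of the PMF, and compares truncated sums with the mean \(1/p\) and variance \((1-p)/p^2\).

```python
def geometric_pmf(k: int, p: float) -> float:
    """P[X = k] = p (1 - p)^(k - 1), where X counts trials up to the first success."""
    if k < 1:
        return 0.0
    return p * (1.0 - p) ** (k - 1)


def geometric_cdf(k: int, p: float) -> float:
    """P[X <= k] = 1 - (1 - p)^k for k >= 1."""
    if k < 1:
        return 0.0
    return 1.0 - (1.0 - p) ** k


p = 0.3
ks = range(1, 2001)  # the tail beyond k = 2000 is negligible for p = 0.3

# The CDF is the running sum of the PMF.
partial = sum(geometric_pmf(k, p) for k in range(1, 6))

# Truncated sums approximate E[X] = 1 / p and Var[X] = (1 - p) / p^2.
mean = sum(k * geometric_pmf(k, p) for k in ks)
second_moment = sum(k * k * geometric_pmf(k, p) for k in ks)
variance = second_moment - mean ** 2

print(partial, geometric_cdf(5, p))
print(mean, 1 / p)
print(variance, (1 - p) / p ** 2)
```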
class cprior.models.GeometricModel(name='', alpha=1, beta=1)
Bases: cprior.cdist.beta.BetaModel
Bayesian model with geometric likelihood and a beta prior distribution.
Given data samples \(\mathbf{x} = (x_1, \ldots, x_n)\) from a geometric distribution with parameter \(p\), the posterior distribution is
\[p | \mathbf{x} \sim \mathcal{B}\left(\alpha + n, \beta + \sum_{i=1}^n x_i - n \right),\]
with prior parameters \(\alpha\) and \(\beta\).
Parameters:
- name (str (default="")) – Model name.
- alpha (int or float (default=1)) – Prior parameter alpha.
- beta (int or float (default=1)) – Prior parameter beta.
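The posterior update is simple bookkeeping: each observation \(x_i \ge 1\) contributes one success and \(x_i - 1\) failures. A minimal sketch of that update, using a hypothetical helper rather than cprior's own code:

```python
from typing import Iterable, Tuple


def geometric_beta_update(
    alpha: float, beta: float, data: Iterable[int]
) -> Tuple[float, float]:
    """Return the Beta posterior parameters after observing geometric samples.

    Each sample x_i >= 1 counts the trials up to and including the first
    success, so it adds one success to alpha and x_i - 1 failures to beta.
    """
    xs = list(data)
    if any(x < 1 for x in xs):
        raise ValueError("geometric samples must be >= 1")
    n = len(xs)
    return alpha + n, beta + sum(xs) - n


# Batch update from a uniform Beta(1, 1) prior.
batch = geometric_beta_update(1, 1, [3, 1, 4, 2])

# Updating in two steps gives the same posterior as one batch update.
step = geometric_beta_update(1, 1, [3, 1])
step = geometric_beta_update(*step, [4, 2])
print(batch, step)
```

Because the update only adds counts, splitting the data into chunks and updating repeatedly yields the same posterior as a single batch update.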
n_samples_
Number of samples.
Type: int
alpha_posterior
Posterior parameter alpha.
Returns: alpha
Return type: float
beta_posterior
Posterior parameter beta.
Returns: beta
Return type: float
cdf(x)
Cumulative distribution function of the posterior distribution.
Parameters: x (array-like) – Quantiles.
Returns: cdf – Cumulative distribution function evaluated at x.
Return type: numpy.ndarray
credible_interval(interval_length)
Credible interval of the posterior distribution.
Parameters: interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
Returns: interval – Lower and upper credible interval limits.
Return type: tuple
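A credible interval of the Beta posterior can be approximated by sampling when a Beta quantile function is not at hand. This sketch (hypothetical helper, standard library only; cprior itself may compute the interval differently) draws from \(\mathcal{B}(\alpha, \beta)\) with `random.betavariate` and reads off equal-tailed limits from the sorted draws:

```python
import random
from typing import Tuple


def beta_credible_interval_mc(
    alpha: float,
    beta: float,
    interval_length: float = 0.9,
    n_samples: int = 50_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Equal-tailed credible interval of Beta(alpha, beta), estimated by Monte Carlo."""
    if not 0.0 <= interval_length <= 1.0:
        raise ValueError("interval_length must be in [0, 1]")
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(n_samples))
    tail = (1.0 - interval_length) / 2.0  # probability mass left out on each side
    lower = samples[int(tail * (n_samples - 1))]
    upper = samples[int((1.0 - tail) * (n_samples - 1))]
    return lower, upper


# Beta(1, 1) is uniform on [0, 1], so the 90% interval should be close to (0.05, 0.95).
print(beta_credible_interval_mc(1, 1))
```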
mean()
Mean of the posterior distribution.
Returns: mean
Return type: float
pdf(x)
Probability density function of the posterior distribution.
Parameters: x (array-like) – Quantiles.
Returns: pdf – Probability density function evaluated at x.
Return type: numpy.ndarray
ppf(q)
Percent point function (quantile) of the posterior distribution.
Parameters: q (array-like) – Lower tail probability.
Returns: ppf – Quantile corresponding to the lower tail probability q.
Return type: numpy.ndarray
ppmean()
Posterior predictive mean.
If \(X\) follows a geometric distribution with parameter \(p \sim \mathcal{B}(\alpha, \beta)\), then the posterior predictive expected value is given by
\[\mathrm{E}[X] = \frac{\alpha + \beta - 1}{\alpha - 1},\]
where \(\alpha\) and \(\beta\) are the posterior values of the parameters. The mean is finite only for \(\alpha > 1\).
Returns: mean
Return type: float
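The predictive mean can be checked by simulating the two-stage model directly: draw \(p\) from the posterior, then draw \(X\) from a geometric distribution with that \(p\). A standard-library sketch under those assumptions (helper names and the parameters \(\alpha = 12\), \(\beta = 5\) are illustrative); the geometric draw uses the inverse-CDF method:

```python
import math
import random


def sample_geometric(rng: random.Random, p: float) -> int:
    """Draw the number of trials up to and including the first success.

    Uses inversion: P[X > k] = (1 - p)^k, so X = floor(log(U) / log(1 - p)) + 1
    with U uniform on (0, 1].
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return math.floor(math.log(u) / math.log1p(-p)) + 1


def ppmean_closed_form(alpha: float, beta: float) -> float:
    """E[X] = (alpha + beta - 1) / (alpha - 1); finite only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the posterior predictive mean requires alpha > 1")
    return (alpha + beta - 1) / (alpha - 1)


def ppmean_mc(alpha: float, beta: float, n: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[X] with p ~ Beta(alpha, beta), X | p ~ Geometric(p)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        p = rng.betavariate(alpha, beta)
        total += sample_geometric(rng, p)
    return total / n


print(ppmean_mc(12, 5), ppmean_closed_form(12, 5))
```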
pppdf(x)
Posterior predictive probability density function.
If \(X\) follows a geometric distribution with parameter \(p \sim \mathcal{B}(\alpha, \beta)\), then the posterior predictive probability density function is given by
\[f(x; \alpha, \beta) = \frac{B(\alpha + 1, \beta + x - 1)}{B(\alpha, \beta)},\]
where \(B\) is the beta function and \(\alpha\) and \(\beta\) are the posterior values of the parameters.
Parameters: x (array-like) – Quantiles.
Returns: pdf – Probability density function evaluated at x.
Return type: float
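The predictive probabilities only need the beta function, which can be evaluated stably through log-gamma. A standard-library sketch (the helper `beta_geometric_pmf` is illustrative, not cprior's implementation) that evaluates \(B(\alpha + 1, \beta + x - 1) / B(\alpha, \beta)\) and checks two sanity properties: the probabilities sum to one, and \(P[X = 1] = \alpha / (\alpha + \beta)\), the posterior mean of \(p\).

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_geometric_pmf(x: int, alpha: float, beta: float) -> float:
    """Posterior predictive probability P[X = x] for x >= 1, and 0 otherwise."""
    if x < 1:
        return 0.0
    return math.exp(log_beta(alpha + 1, beta + x - 1) - log_beta(alpha, beta))


alpha, beta = 12.0, 5.0
# The tail decays like x^-(alpha + 1), so 5000 terms are plenty here.
total = sum(beta_geometric_pmf(x, alpha, beta) for x in range(1, 5001))
print(total, beta_geometric_pmf(1, alpha, beta), alpha / (alpha + beta))
```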
ppvar()
Posterior predictive variance.
If \(X\) follows a geometric distribution with parameter \(p \sim \mathcal{B}(\alpha, \beta)\), then the posterior predictive variance is given by
\[\mathrm{Var}[X] = \frac{\alpha \beta (\alpha + \beta - 1)}{(\alpha - 1)^2 (\alpha - 2)},\]
where \(\alpha\) and \(\beta\) are the posterior values of the parameters. The variance is finite only for \(\alpha > 2\).
Returns: var
Return type: float
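The predictive variance can be derived from the law of total variance with \(X \mid p \sim \text{Geometric}(p)\) and \(p \sim \mathcal{B}(\alpha, \beta)\): \(\mathrm{Var}[X] = \mathrm{E}[(1-p)/p^2] + \mathrm{Var}[1/p] = 2\,\mathrm{E}[1/p^2] - \mathrm{E}[1/p] - \mathrm{E}[1/p]^2\), using the Beta moments \(\mathrm{E}[1/p] = (\alpha+\beta-1)/(\alpha-1)\) and \(\mathrm{E}[1/p^2] = (\alpha+\beta-1)(\alpha+\beta-2)/((\alpha-1)(\alpha-2))\). The sketch below checks, in exact rational arithmetic, that this equals \(\alpha\beta(\alpha+\beta-1)/((\alpha-1)^2(\alpha-2))\) on a grid of integer parameters; it is a verification aid with illustrative helper names, not cprior code.

```python
from fractions import Fraction


def ppvar_closed_form(alpha, beta):
    """alpha * beta * (alpha + beta - 1) / ((alpha - 1)^2 (alpha - 2)); needs alpha > 2."""
    if alpha <= 2:
        raise ValueError("the posterior predictive variance requires alpha > 2")
    return alpha * beta * (alpha + beta - 1) / ((alpha - 1) ** 2 * (alpha - 2))


def ppvar_total_variance(alpha, beta):
    """Same quantity via Var[X] = 2 E[1/p^2] - E[1/p] - E[1/p]^2."""
    e_inv = (alpha + beta - 1) / (alpha - 1)
    e_inv2 = (alpha + beta - 1) * (alpha + beta - 2) / ((alpha - 1) * (alpha - 2))
    return 2 * e_inv2 - e_inv - e_inv ** 2


# Fractions keep every step exact, so the two expressions can be compared with ==.
for a in range(3, 12):
    for b in range(1, 12):
        assert ppvar_closed_form(Fraction(a), Fraction(b)) == ppvar_total_variance(
            Fraction(a), Fraction(b)
        )

print(ppvar_closed_form(Fraction(12), Fraction(5)))
```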
rvs(size=1, random_state=None)
Random variates of the posterior distribution.
Parameters:
- size (int (default=1)) – Number of random variates.
- random_state (int or None (default=None)) – The seed used by the random number generator.
Returns: rvs – Random variates of given size.
Return type: numpy.ndarray or scalar
std()
Standard deviation of the posterior distribution.
Returns: std
Return type: float
update(data)
Update posterior parameters with new data samples.
Parameters: data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
var()
Variance of the posterior distribution.
Returns: var
Return type: float
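Since the posterior of \(p\) is a Beta distribution, its mean, variance and standard deviation have closed forms: \(\alpha/(\alpha+\beta)\) and \(\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))\), with the standard deviation as the square root of the variance. A small sketch with illustrative helper names, evaluated on the posterior obtained from a \(\mathcal{B}(1, 1)\) prior and the samples 3, 1, 4, 2 (which give \(\alpha = 5\), \(\beta = 7\)):

```python
import math
from fractions import Fraction


def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)


def beta_var(alpha, beta):
    """Variance of Beta(alpha, beta)."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1))


def beta_std(alpha, beta):
    """Standard deviation of Beta(alpha, beta)."""
    return math.sqrt(beta_var(alpha, beta))


# Posterior after a Beta(1, 1) prior and data [3, 1, 4, 2]: alpha = 1 + 4, beta = 1 + 10 - 4.
a, b = Fraction(5), Fraction(7)
print(beta_mean(a, b), beta_var(a, b), beta_std(5, 7))
```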
class cprior.models.GeometricABTest(modelA, modelB, simulations=1000000, random_state=None)
Bases: cprior.cdist.beta.BetaABTest
Geometric A/B test.
Parameters:
- modelA (object) – The control model.
- modelB (object) – The variation model.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
expected_loss(method='exact', variant='A', lift=0, mlhs_samples=10000)
Compute the expected loss. This is the expected uplift lost by choosing a given variant.
- If variant == "A", \(\mathrm{E}[\max(B - A - lift, 0)]\)
- If variant == "B", \(\mathrm{E}[\max(A - B - lift, 0)]\)
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: expected_loss
Return type: float or tuple of floats
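A Monte Carlo estimate of the expected loss only needs independent draws from the two Beta posteriors. This sketch (standard library only; the helper name and posterior parameters are illustrative, and cprior's "exact" and "MLHS" methods work differently) estimates both \(\mathrm{E}[\max(B - A - lift, 0)]\) and \(\mathrm{E}[\max(A - B - lift, 0)]\):

```python
import random
from typing import Tuple


def expected_loss_mc(
    alpha_a: float, beta_a: float,
    alpha_b: float, beta_b: float,
    lift: float = 0.0, n: int = 50_000, seed: int = 0,
) -> Tuple[float, float]:
    """Return (loss of choosing A, loss of choosing B) for Beta posteriors of A and B."""
    rng = random.Random(seed)
    loss_a = loss_b = 0.0
    for _ in range(n):
        a = rng.betavariate(alpha_a, beta_a)
        b = rng.betavariate(alpha_b, beta_b)
        loss_a += max(b - a - lift, 0.0)  # uplift lost if A is chosen
        loss_b += max(a - b - lift, 0.0)  # uplift lost if B is chosen
    return loss_a / n, loss_b / n


# Hypothetical posteriors: A centred near 0.20, B near 0.25.
loss_a, loss_b = expected_loss_mc(20, 80, 25, 75)
print(loss_a, loss_b)
```

Because \(\max(d, 0) - \max(-d, 0) = d\), with lift = 0 the difference of the two losses estimates \(\mathrm{E}[B] - \mathrm{E}[A]\), which makes a convenient sanity check.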
expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = B-A\) and/or \(Z = A-B\).
- If variant == "A", \(Z = B - A\)
- If variant == "B", \(Z = A - B\)
- If variant == "all", both.
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: np.ndarray or tuple of np.ndarray
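The two ci_method options differ in how they cut the sampled distribution of \(Z\): the equal-tailed interval leaves the same probability in each tail, while the highest density interval is the shortest interval containing the requested mass. A sketch of both on Monte Carlo draws of \(Z = B - A\) (hypothetical helpers and posterior parameters; a shortest-window search over sorted samples is one common way to approximate an HDI, not necessarily the one cprior uses):

```python
import random
from typing import List, Tuple


def eti(samples: List[float], interval_length: float) -> Tuple[float, float]:
    """Equal-tailed interval from sorted samples."""
    n = len(samples)
    tail = (1.0 - interval_length) / 2.0
    return samples[int(tail * (n - 1))], samples[int((1.0 - tail) * (n - 1))]


def hdi(samples: List[float], interval_length: float) -> Tuple[float, float]:
    """Shortest interval containing about interval_length of the sorted samples."""
    n = len(samples)
    k = max(1, int(round(interval_length * n)))  # number of samples inside the window
    widths = [samples[i + k - 1] - samples[i] for i in range(n - k + 1)]
    i_best = min(range(len(widths)), key=widths.__getitem__)
    return samples[i_best], samples[i_best + k - 1]


rng = random.Random(0)
z = sorted(
    rng.betavariate(25, 75) - rng.betavariate(20, 80)  # Z = B - A
    for _ in range(50_000)
)
print(eti(z, 0.9), hdi(z, 0.9))
```

For a roughly symmetric \(Z\) the two intervals nearly coincide; they diverge for skewed distributions such as relative differences.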
expected_loss_relative(method='exact', variant='A')
Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.
- If variant == "A", \(\mathrm{E}[(B - A) / A]\)
- If variant == "B", \(\mathrm{E}[(A - B) / B]\)
- If variant == "all", both.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
Returns: expected_loss_relative
Return type: float or tuple of floats
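For independent posteriors the relative quantity has a closed form, since \(\mathrm{E}[(B - A)/A] = \mathrm{E}[B]\,\mathrm{E}[1/A] - 1\) with \(\mathrm{E}[B] = \alpha_B/(\alpha_B + \beta_B)\) and \(\mathrm{E}[1/A] = (\alpha_A + \beta_A - 1)/(\alpha_A - 1)\) for \(\alpha_A > 1\). The sketch compares that expression with a Monte Carlo estimate; it assumes independent Beta posteriors, uses illustrative parameters, and does not claim to reproduce cprior's "exact" method.

```python
import random


def expected_relative_mc(alpha_a, beta_a, alpha_b, beta_b, n=50_000, seed=0):
    """Monte Carlo estimate of E[(B - A) / A]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a = rng.betavariate(alpha_a, beta_a)
        b = rng.betavariate(alpha_b, beta_b)
        total += (b - a) / a
    return total / n


def expected_relative_closed_form(alpha_a, beta_a, alpha_b, beta_b):
    """E[B] * E[1/A] - 1, valid for independent A and B with alpha_a > 1."""
    if alpha_a <= 1:
        raise ValueError("E[1/A] is finite only for alpha_a > 1")
    mean_b = alpha_b / (alpha_b + beta_b)
    mean_inv_a = (alpha_a + beta_a - 1) / (alpha_a - 1)
    return mean_b * mean_inv_a - 1


print(expected_relative_mc(20, 80, 25, 75), expected_relative_closed_form(20, 80, 25, 75))
```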
expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (B-A)/A\) and/or \(Z = (A-B)/B\).
- If variant == "A", \(Z = (B-A)/A\)
- If variant == "B", \(Z = (A-B)/B\)
- If variant == "all", both.
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: np.ndarray or tuple of np.ndarray
probability(method='exact', variant='A', lift=0, mlhs_samples=10000)
Compute the error probability or chance to beat control.
- If variant == "A", \(P[A > B + lift]\)
- If variant == "B", \(P[B > A + lift]\)
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: probability
Return type: float or tuple of floats
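The chance to beat control can likewise be estimated by sampling both posteriors. A sketch with illustrative parameters (standard library only; not cprior's "exact" or "MLHS" computation) estimating \(P[A > B + lift]\) and \(P[B > A + lift]\) from the same draws:

```python
import random
from typing import Tuple


def probability_mc(
    alpha_a: float, beta_a: float,
    alpha_b: float, beta_b: float,
    lift: float = 0.0, n: int = 50_000, seed: int = 0,
) -> Tuple[float, float]:
    """Return (P[A > B + lift], P[B > A + lift]) estimated by Monte Carlo."""
    rng = random.Random(seed)
    a_wins = b_wins = 0
    for _ in range(n):
        a = rng.betavariate(alpha_a, beta_a)
        b = rng.betavariate(alpha_b, beta_b)
        if a > b + lift:
            a_wins += 1
        if b > a + lift:
            b_wins += 1
    return a_wins / n, b_wins / n


p_a, p_b = probability_mc(20, 80, 25, 75)
print(p_a, p_b)
```

With lift = 0 the two estimates sum to one (ties have probability zero); a positive lift removes the draws where neither variant clears the margin.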
update_A(data)
Update posterior parameters for variant A with new data samples.
Parameters: data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
update_B(data)
Update posterior parameters for variant B with new data samples.
Parameters: data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
class cprior.models.GeometricMVTest(models, simulations=1000000, random_state=None, n_jobs=None)
Bases: cprior.cdist.beta.BetaMVTest
Geometric multivariate test.
Parameters:
- models (dict) – The control and variations models.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
expected_loss(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)
Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., \(\mathrm{E}[\max(control - variant - lift, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: expected_loss
Return type: float
expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = control-variant\).
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: np.ndarray or tuple of np.ndarray
expected_loss_relative(method='exact', control='A', variant='B')
Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., \(\mathrm{E}[(control - variant) / variant]\).
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
Returns: expected_loss_relative
Return type: float
expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (control - variant) / variant\).
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Length of the credible interval, a value in [0, 1]; for example, 0.9 computes a 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density Interval (ci_method="HDI") and Equal-Tailed Interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: np.ndarray or tuple of np.ndarray
expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)
Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[(\max(A, C, D) - B) / B]\).
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: expected_loss_relative_vs_all
Return type: float
expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: expected_loss_vs_all
Return type: float
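With more than two variants, the loss of choosing one variant is measured against the best of the others. A Monte Carlo sketch (standard library only; helper names and posterior parameters are illustrative, lift is assumed to be subtracted inside the maximum by analogy with the two-variant case, and the "quad" numerical-integration method is not reproduced) of \(\mathrm{E}[\max(\max(A, C, D) - B - lift, 0)]\) for each choice of variant:

```python
import random
from typing import Dict, Tuple


def expected_loss_vs_all_mc(
    posteriors: Dict[str, Tuple[float, float]],
    variant: str,
    lift: float = 0.0,
    n: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate E[max(max(others) - variant - lift, 0)] for Beta posteriors."""
    if variant not in posteriors or len(posteriors) < 2:
        raise ValueError("need the chosen variant plus at least one other")
    rng = random.Random(seed)
    names = sorted(posteriors)  # fixed draw order, so equal seeds give equal draws
    total = 0.0
    for _ in range(n):
        draw = {k: rng.betavariate(*posteriors[k]) for k in names}
        best_other = max(v for k, v in draw.items() if k != variant)
        total += max(best_other - draw[variant] - lift, 0.0)
    return total / n


# Hypothetical (alpha, beta) posterior parameters per variant.
posteriors = {"A": (20, 80), "B": (25, 75), "C": (22, 78), "D": (18, 82)}
losses = {v: expected_loss_vs_all_mc(posteriors, v) for v in posteriors}
print(losses)
```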
probability(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)
Compute the error probability or chance to beat control, i.e., \(P[variant > control + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: probability
Return type: float
probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(P[B > \max(A, C, D) + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: probability_vs_all
Return type: float
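The chance that one variant beats all others can be estimated from joint draws of every posterior. A standard-library sketch with illustrative names and parameters (it does not reproduce the "quad" or "MLHS" methods) of \(P[B > \max(A, C, D) + lift]\) for each choice of variant:

```python
import random
from typing import Dict, Tuple


def probability_vs_all_mc(
    posteriors: Dict[str, Tuple[float, float]],
    variant: str,
    lift: float = 0.0,
    n: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate P[variant > max(others) + lift] for Beta posteriors."""
    if variant not in posteriors or len(posteriors) < 2:
        raise ValueError("need the chosen variant plus at least one other")
    rng = random.Random(seed)
    names = sorted(posteriors)  # fixed draw order, so equal seeds give equal draws
    wins = 0
    for _ in range(n):
        draw = {k: rng.betavariate(*posteriors[k]) for k in names}
        best_other = max(v for k, v in draw.items() if k != variant)
        if draw[variant] > best_other + lift:
            wins += 1
    return wins / n


# Hypothetical (alpha, beta) posterior parameters per variant.
posteriors = {"A": (20, 80), "B": (25, 75), "C": (22, 78), "D": (18, 82)}
probs = {v: probability_vs_all_mc(posteriors, v) for v in posteriors}
print(probs)
```

Reusing the same seed for every variant means all estimates come from the same joint draws, so with lift = 0 exactly one variant wins each draw and the probabilities sum to one.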
update(data, variant)
Update posterior parameters for a given variant with new data samples.
Parameters:
- data (array-like, shape = (n_samples)) – Data samples from a geometric distribution.
- variant (str) – The variant whose posterior parameters are updated.