Beta distribution
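The probability density function of the beta distribution \(\mathcal{B}(\alpha, \beta)\) with two shape parameters \(\alpha, \beta > 0\), for \(x \in [0, 1]\), is defined by
\[f(x; \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)},\]
and the cumulative distribution function is
\[F(x; \alpha, \beta) = I_x(\alpha, \beta),\]
where \(B(\alpha, \beta)\) is the beta function and \(I_x(\alpha, \beta)\) is the regularized incomplete beta function. The expected value and variance are
\[\mathrm{E}[X] = \frac{\alpha}{\alpha + \beta}, \quad \mathrm{Var}[X] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.\]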
In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the parameters of the Bernoulli, binomial, negative binomial and geometric distributions. The beta distribution is a suitable model for the random behaviour of percentages and proportions.
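The density and moments of the beta distribution, together with conjugacy for Bernoulli data, can be sketched with the Python standard library alone. This is an illustration, not the cprior implementation: the helper names (`beta_function`, `beta_pdf`, `beta_mean`, `beta_var`) and the success/failure counts are assumptions made up for the example.

```python
import math


def beta_function(a, b):
    """Beta function B(a, b), computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(x, a, b):
    """Density x^(a-1) (1-x)^(b-1) / B(a, b), for 0 < x < 1."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)


def beta_mean(a, b):
    """Expected value a / (a + b)."""
    return a / (a + b)


def beta_var(a, b):
    """Variance a b / ((a + b)^2 (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Conjugate update under a Bernoulli likelihood (assumed data): a Beta(1, 1)
# prior with 7 successes and 3 failures gives the posterior Beta(8, 4).
prior_alpha, prior_beta = 1, 1
successes, failures = 7, 3
post_alpha = prior_alpha + successes
post_beta = prior_beta + failures

print("posterior mean:", beta_mean(post_alpha, post_beta))
print("posterior variance:", beta_var(post_alpha, post_beta))
print("Beta(2, 2) density at 0.5:", beta_pdf(0.5, 2, 2))
```

The update only adds counts to the prior parameters, which is why a conjugate prior needs no numerical integration to obtain the posterior.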
-
class cprior.cdist.BetaModel(name='', alpha=1, beta=1)
Bases: cprior.cdist.base.BayesModel
Beta conjugate prior distribution model.
Parameters:
- alpha (int or float (default=1)) – Prior parameter alpha.
- beta (int or float (default=1)) – Prior parameter beta.
-
alpha_posterior
Posterior parameter alpha.
Returns: alpha
Return type: float
-
beta_posterior
Posterior parameter beta.
Returns: beta
Return type: float
-
cdf(x)
Cumulative distribution function of the posterior distribution.
Parameters: x (array-like) – Quantiles.
Returns: cdf – Cumulative distribution function evaluated at x.
Return type: numpy.ndarray
-
credible_interval(interval_length)
Credible interval of the posterior distribution.
Parameters: interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields the 90% credible interval.
Returns: interval – Lower and upper credible interval limits.
Return type: tuple
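As a rough illustration of what a credible interval of a beta posterior looks like, the sketch below approximates an equal-tailed interval from sorted random draws. It is only an approximation built on assumptions: the function `credible_interval_mc`, its `n_draws` and `seed` arguments, and the choice of an equal-tailed interval are hypothetical and do not describe how `credible_interval` is computed in the library.

```python
import random


def credible_interval_mc(alpha, beta, interval_length=0.9,
                         n_draws=100_000, seed=0):
    """Approximate equal-tailed credible interval of Beta(alpha, beta)."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))
    # Cut the same probability mass from each tail.
    tail = (1.0 - interval_length) / 2.0
    lower = draws[int(tail * n_draws)]
    upper = draws[min(int((1.0 - tail) * n_draws), n_draws - 1)]
    return lower, upper


lower, upper = credible_interval_mc(8, 4, interval_length=0.9)
print("90% credible interval for Beta(8, 4):", (lower, upper))
```

An exact interval would instead evaluate the quantile function (`ppf`) at the two tail probabilities.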
-
mean()
Mean of the posterior distribution.
Returns: mean
Return type: float
-
pdf(x)
Probability density function of the posterior distribution.
Parameters: x (array-like) – Quantiles.
Returns: pdf – Probability density function evaluated at x.
Return type: numpy.ndarray
-
ppf(q)
Percent point function (quantile) of the posterior distribution.
Parameters: q (array-like) – Lower tail probability.
Returns: ppf – Quantile corresponding to the lower tail probability q.
Return type: numpy.ndarray
-
rvs(size=1, random_state=None)
Random variates of the posterior distribution.
Parameters:
- size (int (default=1)) – Number of random variates.
- random_state (int or None (default=None)) – The seed used by the random number generator.
Returns: rvs – Random variates of given size.
Return type: numpy.ndarray or scalar
-
std()
Standard deviation of the posterior distribution.
Returns: std
Return type: float
-
var()
Variance of the posterior distribution.
Returns: var
Return type: float
-
class cprior.cdist.BetaABTest(modelA, modelB, simulations=None, random_state=None)
Bases: cprior.cdist.base.BayesABTest
Bayesian A/B testing with prior beta distribution.
Parameters:
- modelA (object) – The beta model for variant A.
- modelB (object) – The beta model for variant B.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
-
expected_loss(method='exact', variant='A', lift=0, mlhs_samples=10000)
Compute the expected loss. This is the expected uplift lost by choosing a given variant.
- If variant == "A", \(\mathrm{E}[\max(B - A - lift, 0)]\)
- If variant == "B", \(\mathrm{E}[\max(A - B - lift, 0)]\)
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: expected_loss
Return type: float or tuple of floats
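The expected loss defined above can be approximated by plain Monte Carlo sampling from the two posteriors. The following is a minimal sketch under stated assumptions: `expected_loss_mc`, the `(alpha, beta)` tuples and the `seed` argument are hypothetical names for this example, and the code does not reproduce the library's "exact" or "MLHS" methods.

```python
import random


def expected_loss_mc(a_params, b_params, variant="A", lift=0.0,
                     simulations=100_000, seed=42):
    """Monte Carlo estimate of E[max(B - A - lift, 0)] for variant "A",
    or E[max(A - B - lift, 0)] for variant "B".

    a_params and b_params are (alpha, beta) posterior parameters.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = rng.betavariate(*a_params)
        b = rng.betavariate(*b_params)
        diff = (b - a) if variant == "A" else (a - b)
        total += max(diff - lift, 0.0)
    return total / simulations


# Hypothetical posteriors: A ~ Beta(40, 60), B ~ Beta(50, 50).
print("loss if A is chosen:", expected_loss_mc((40, 60), (50, 50), "A"))
print("loss if B is chosen:", expected_loss_mc((40, 60), (50, 50), "B"))
```

A small expected loss for a variant means little uplift is given up, on average, by choosing it.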
-
expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = B-A\) and/or \(Z = A-B\).
- If variant == "A", \(Z = B - A\)
- If variant == "B", \(Z = A - B\)
- If variant == "all", both.
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: np.ndarray or tuple of np.ndarray
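To make the difference between the two interval types concrete, the sketch below estimates both an equal-tailed interval (ETI) and a highest density interval (HDI) for \(Z = B - A\) from Monte Carlo draws. The function `difference_intervals_mc` and its arguments are hypothetical; this is a generic sample-based construction, not necessarily the one used by the library.

```python
import random


def difference_intervals_mc(a_params, b_params, interval_length=0.9,
                            simulations=50_000, seed=1):
    """Return (ETI, HDI) for Z = B - A, estimated from sorted draws."""
    rng = random.Random(seed)
    z = sorted(
        rng.betavariate(*b_params) - rng.betavariate(*a_params)
        for _ in range(simulations)
    )
    n = len(z)

    # Equal-tailed interval: cut the same probability mass from each tail.
    tail = (1.0 - interval_length) / 2.0
    eti = (z[int(tail * n)], z[min(int((1.0 - tail) * n), n - 1)])

    # Highest density interval: the narrowest window of consecutive sorted
    # draws that holds the requested share of the samples.
    k = max(1, int(round(interval_length * n)))
    start = min(range(n - k + 1), key=lambda i: z[i + k - 1] - z[i])
    hdi = (z[start], z[start + k - 1])
    return eti, hdi


eti, hdi = difference_intervals_mc((40, 60), (50, 50))
print("ETI:", eti)
print("HDI:", hdi)
```

For a symmetric, unimodal difference distribution the two intervals nearly coincide; they diverge when the distribution is skewed.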
-
expected_loss_relative(method='exact', variant='A')
Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.
- If variant == "A", \(\mathrm{E}[(B - A) / A]\)
- If variant == "B", \(\mathrm{E}[(A - B) / B]\)
- If variant == "all", both.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
Returns: expected_loss_relative
Return type: float or tuple of floats
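For independent beta posteriors, the expected relative loss for variant "A" has a simple closed form: \(\mathrm{E}[(B - A)/A] = \mathrm{E}[B]\,\mathrm{E}[1/A] - 1\), with \(\mathrm{E}[1/A] = (\alpha_A + \beta_A - 1)/(\alpha_A - 1)\) when \(\alpha_A > 1\). The sketch below checks this identity against a Monte Carlo estimate. The function names and parameter values are assumptions for illustration, and the closed form is a standard result that may differ from how the library's "exact" method is written.

```python
import random


def expected_relative_loss_exact(a_params, b_params):
    """E[(B - A) / A] for independent A ~ Beta(*a_params), B ~ Beta(*b_params)."""
    alpha_a, beta_a = a_params
    alpha_b, beta_b = b_params
    if alpha_a <= 1:
        raise ValueError("E[1 / A] is infinite unless alpha_a > 1")
    mean_b = alpha_b / (alpha_b + beta_b)
    mean_inv_a = (alpha_a + beta_a - 1) / (alpha_a - 1)
    return mean_b * mean_inv_a - 1.0


def expected_relative_loss_mc(a_params, b_params,
                              simulations=100_000, seed=7):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = rng.betavariate(*a_params)
        b = rng.betavariate(*b_params)
        total += (b - a) / a
    return total / simulations


model_a = (40, 60)  # hypothetical posterior parameters for A
model_b = (50, 50)  # hypothetical posterior parameters for B
exact = expected_relative_loss_exact(model_a, model_b)
approx = expected_relative_loss_mc(model_a, model_b)
print("closed form:", exact)
print("Monte Carlo:", approx)
```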
-
expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (B-A)/A\) and/or \(Z = (A-B)/B\).
- If variant == "A", \(Z = (B-A)/A\)
- If variant == "B", \(Z = (A-B)/B\)
- If variant == "all", both.
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: np.ndarray or tuple of np.ndarray
-
probability(method='exact', variant='A', lift=0, mlhs_samples=10000)
Compute the error probability or chance to beat control.
- If variant == "A", \(P[A > B + lift]\)
- If variant == "B", \(P[B > A + lift]\)
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: probability
Return type: float or tuple of floats
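With lift equal to zero, the probability that one beta variable exceeds another has a well-known closed form when the shape parameter \(\alpha_B\) is a positive integer: \(P[B > A] = \sum_{i=0}^{\alpha_B - 1} \frac{B(\alpha_A + i, \beta_A + \beta_B)}{(\beta_B + i)\, B(1 + i, \beta_B)\, B(\alpha_A, \beta_A)}\). The sketch below evaluates it in log space. The names `log_beta` and `prob_b_beats_a` are hypothetical, and this formula is offered as one possible exact computation, not as the one the library necessarily uses.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b):
    """P[B > A] for independent A ~ Beta(alpha_a, beta_a) and
    B ~ Beta(alpha_b, beta_b); alpha_b must be a positive integer."""
    total = 0.0
    for i in range(alpha_b):
        total += math.exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - math.log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


# Hypothetical posteriors: A ~ Beta(40, 60), B ~ Beta(50, 50).
p_b = prob_b_beats_a(40, 60, 50, 50)
print("P[B > A]:", p_b)
print("P[A > B]:", 1.0 - p_b)
```

A positive lift shifts the comparison threshold, which is why the text above requires Monte Carlo sampling in that case.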
-
update_A(data)
Update posterior parameters for variant A with new data samples.
Parameters: data (array-like, shape = (n_samples))
-
update_B(data)
Update posterior parameters for variant B with new data samples.
Parameters: data (array-like, shape = (n_samples))
-
class cprior.cdist.BetaMVTest(models, simulations=None, random_state=None, n_jobs=None)
Bases: cprior.cdist.base.BayesMVTest
Bayesian multivariate testing with prior beta distribution.
Parameters:
- models (object) – The beta models.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
-
expected_loss(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)
Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., \(\mathrm{E}[\max(control - variant - lift, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: expected_loss
Return type: float
-
expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = control-variant\).
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: np.ndarray or tuple of np.ndarray
-
expected_loss_relative(method='exact', control='A', variant='B')
Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., \(\mathrm{E}[(control - variant) / variant]\).
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
Returns: expected_loss_relative
Return type: float
-
expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (control - variant) / variant\).
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a fraction in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: np.ndarray or tuple of np.ndarray
-
expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)
Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[(\max(A, C, D) - B) / B]\).
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: expected_loss_relative_vs_all
Return type: float
-
expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: expected_loss_vs_all
Return type: float
-
probability(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)
Compute the error probability or chance to beat control, i.e., \(P[variant > control + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: probability
Return type: float
-
probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(P[B > \max(A, C, D) + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: probability_vs_all
Return type: float
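The "vs all" quantities compare one variant against the best of the remaining ones. A minimal Monte Carlo sketch, assuming a hypothetical `vs_all_mc` helper and a plain dictionary of `(alpha, beta)` posterior parameters, is shown here. Subtracting `lift` inside the expected loss is an assumption made by analogy with the pairwise expected loss; the "quad" and "MLHS" methods are not reproduced.

```python
import random


def vs_all_mc(models, variant="B", lift=0.0, simulations=100_000, seed=123):
    """Monte Carlo estimates of P[variant > max(others) + lift] and
    E[max(max(others) - variant - lift, 0)].

    models maps a variant name to its (alpha, beta) posterior parameters.
    """
    rng = random.Random(seed)
    others = [name for name in models if name != variant]
    wins = 0
    loss = 0.0
    for _ in range(simulations):
        x = rng.betavariate(*models[variant])
        best_other = max(rng.betavariate(*models[name]) for name in others)
        if x > best_other + lift:
            wins += 1
        # Assumption: lift is subtracted as in the pairwise expected loss.
        loss += max(best_other - x - lift, 0.0)
    return wins / simulations, loss / simulations


# Hypothetical posteriors for four variants.
models = {"A": (40, 60), "B": (50, 50), "C": (45, 55), "D": (42, 58)}
prob_best, exp_loss = vs_all_mc(models, variant="B")
print("P[B beats all]:", prob_best)
print("expected loss of choosing B:", exp_loss)
```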
-
update(data, variant)
Update posterior parameters for a given variant with new data samples.
Parameters:
- data (array-like, shape = (n_samples))
- variant (str)