Negative binomial distribution
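The negative binomial is a discrete probability distribution of the number of successes in a sequence of independent Bernoulli trials before \(r\) failures occur, where \(p\) is the probability of the event that is counted \(r\) times (the parameterization implied by the posterior update \(\mathcal{B}(\alpha + rn, \beta + \sum_i x_i)\) given for the model below). The probability mass function for \(k \in \{0, 1, 2, \ldots\}\) is
\[f(k; r, p) = \binom{k + r - 1}{k} p^r (1 - p)^k,\]
and the cumulative distribution function is
\[F(k; r, p) = I_p(r, k + 1),\]
where \(I_x(a, b)\) is the regularized incomplete beta function. The expected value and variance are
\[\mathrm{E}[X] = \frac{r (1 - p)}{p}, \quad \mathrm{Var}[X] = \frac{r (1 - p)}{p^2}.\]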
The negative binomial distribution can be used to model how many clicks are made before a user clicks a particular button of interest or stops clicking.
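As a quick numerical check, the sketch below evaluates the probability mass function with the standard library only and compares truncated sums against the closed-form mean and variance. It assumes the convention \(f(k; r, p) = \binom{k + r - 1}{k} p^r (1 - p)^k\), which is the one implied by the posterior update \(\mathcal{B}(\alpha + rn, \beta + \sum_i x_i)\). The helper `nbinom_pmf` is illustrative and not part of cprior.

```python
from math import comb


def nbinom_pmf(k, r, p):
    """P(X = k) under the assumed convention C(k + r - 1, k) * p**r * (1 - p)**k."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


r, p = 3, 0.4
ks = range(400)  # truncated support; the tail beyond 400 is negligible here

total = sum(nbinom_pmf(k, r, p) for k in ks)
mean = sum(k * nbinom_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * nbinom_pmf(k, r, p) for k in ks)

# Closed forms under the same convention.
mean_closed = r * (1 - p) / p
var_closed = r * (1 - p) / p**2

print(total, mean, mean_closed, var, var_closed)
```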
-
class cprior.models.NegativeBinomialModel(r, name='', alpha=1, beta=1)
Bases: cprior.cdist.beta.BetaModel
Bayesian model with a negative binomial likelihood and a beta prior distribution.
Given data samples \(\mathbf{x} = (x_1, \ldots, x_n)\) from a negative binomial distribution with parameters \(r\) and \(p\), the posterior distribution is
\[p | \mathbf{x} \sim \mathcal{B}\left(\alpha + rn, \beta + \sum_{i=1}^n x_i\right),\]with prior parameters \(\alpha\) and \(\beta\).
Parameters:
- r (int) – Number of failures.
- name (str (default="")) – Model name.
- alpha (int or float (default=1)) – Prior parameter alpha.
- beta (int or float (default=1)) – Prior parameter beta.
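The conjugate update stated above needs only the number of samples and their sum. The following is a minimal sketch of that arithmetic in plain Python; `update_beta_posterior` is a hypothetical helper written for illustration and is not the cprior implementation.

```python
def update_beta_posterior(alpha, beta, r, data):
    """Return the posterior (alpha, beta) after observing `data` with known r."""
    n = len(data)
    return alpha + r * n, beta + sum(data)


# Flat prior Beta(1, 1), r = 2, four observed counts.
alpha_post, beta_post = update_beta_posterior(1, 1, r=2, data=[3, 0, 5, 1])
posterior_mean = alpha_post / (alpha_post + beta_post)

# Updating in two batches gives the same posterior as a single batch.
step = update_beta_posterior(1, 1, r=2, data=[3, 0])
two_step = update_beta_posterior(*step, r=2, data=[5, 1])

print(alpha_post, beta_post, posterior_mean, two_step)
```

Because the update is additive, data can be fed incrementally without storing earlier samples.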
-
n_samples_
Number of samples.
Type: int
-
alpha_posterior
Posterior parameter alpha.
Returns: alpha
Return type: float
-
beta_posterior
Posterior parameter beta.
Returns: beta
Return type: float
-
cdf(x)
Cumulative distribution function of the posterior distribution.
Parameters: x (array-like) – Quantiles.
Returns: cdf – Cumulative distribution function evaluated at x.
Return type: numpy.ndarray
-
credible_interval(interval_length)
Credible interval of the posterior distribution.
Parameters: interval_length (float (default=0.9)) – Length of the credible interval, given as a value in [0, 1]; for example, 0.9 yields the 90% credible interval.
Returns: interval – Lower and upper credible interval limits.
Return type: tuple
-
mean()
Mean of the posterior distribution.
Returns: mean
Return type: float
-
pdf(x)
Probability density function of the posterior distribution.
Parameters: x (array-like) – Quantiles.
Returns: pdf – Probability density function evaluated at x.
Return type: numpy.ndarray
-
ppf(q)
Percent point function (quantile) of the posterior distribution.
Parameters: q (array-like) – Lower tail probability.
Returns: ppf – Quantile corresponding to the lower tail probability q.
Return type: numpy.ndarray
-
ppmean()
Posterior predictive mean.
If \(X\) follows a negative binomial distribution with parameters \(r\) and \(p\), then the posterior predictive expected value is given by
\[\mathrm{E}[X] = r \frac{\beta}{\alpha - 1},\]
where \(\alpha\) and \(\beta\) are the posterior values of the parameters.
Returns: mean
Return type: float
-
pppdf(x)
Posterior predictive probability density function.
If \(X\) follows a negative binomial distribution with parameters \(r\) and \(p\), then the posterior predictive probability density function is given by
\[f(x; r, \alpha, \beta) = \binom{x + r - 1}{r - 1} \frac{B(\alpha + r, \beta + x)}{B(\alpha, \beta)},\]
where \(\alpha\) and \(\beta\) are the posterior values of the parameters.
Parameters: x (array-like) – Quantiles.
Returns: pdf – Probability density function evaluated at x.
Return type: float
-
ppvar()
Posterior predictive variance.
If \(X\) follows a negative binomial distribution with parameters \(r\) and \(p\), then the posterior predictive variance is given by
\[\mathrm{Var}[X] = \frac{r \beta (\alpha + r - 1)( \alpha + \beta - 1)}{(\alpha - 1)^2 (\alpha - 2)},\]
where \(\alpha\) and \(\beta\) are the posterior values of the parameters.
Returns: var
Return type: float
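The three posterior predictive formulas above can be cross-checked numerically: summing the predictive function over a long truncated support should give 1, and its first two moments should match the closed forms. The sketch below does this with the standard library. The function names mirror the documented methods for readability only; they are standalone illustrations, not calls into cprior, and the posterior values \(\alpha = 30\), \(\beta = 10\) are made up.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def pppdf(x, r, alpha, beta):
    """C(x + r - 1, r - 1) * B(alpha + r, beta + x) / B(alpha, beta)."""
    log_binom = lgamma(x + r) - lgamma(r) - lgamma(x + 1)
    return exp(log_binom + log_beta(alpha + r, beta + x) - log_beta(alpha, beta))


def ppmean(r, alpha, beta):
    return r * beta / (alpha - 1)  # requires alpha > 1


def ppvar(r, alpha, beta):
    # requires alpha > 2
    return r * beta * (alpha + r - 1) * (alpha + beta - 1) / ((alpha - 1) ** 2 * (alpha - 2))


r, alpha, beta = 2, 30.0, 10.0  # hypothetical posterior parameters
xs = range(2000)                # truncated support; the tail is negligible for alpha = 30

total = sum(pppdf(x, r, alpha, beta) for x in xs)
mean_num = sum(x * pppdf(x, r, alpha, beta) for x in xs)
var_num = sum((x - mean_num) ** 2 * pppdf(x, r, alpha, beta) for x in xs)

print(total, mean_num, ppmean(r, alpha, beta), var_num, ppvar(r, alpha, beta))
```

Note that the mean exists only for \(\alpha > 1\) and the variance only for \(\alpha > 2\), so with small posterior \(\alpha\) the truncated sums would not converge to the closed forms.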
-
rvs(size=1, random_state=None)
Random variates of the posterior distribution.
Parameters:
- size (int (default=1)) – Number of random variates.
- random_state (int or None (default=None)) – The seed used by the random number generator.
Returns: rvs – Random variates of given size.
Return type: numpy.ndarray or scalar
-
std()
Standard deviation of the posterior distribution.
Returns: std
Return type: float
-
update(data)
Update posterior parameters with new data samples.
Parameters: data (array-like, shape = (n_samples)) – Data samples from a negative binomial distribution.
-
var()
Variance of the posterior distribution.
Returns: var
Return type: float
-
class cprior.models.NegativeBinomialABTest(modelA, modelB, simulations=1000000, random_state=None)
Bases: cprior.cdist.beta.BetaABTest
Negative binomial A/B test.
Parameters:
- modelA (object) – The control model.
- modelB (object) – The variation model.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
-
expected_loss(method='exact', variant='A', lift=0, mlhs_samples=10000)
Compute the expected loss. This is the expected uplift lost by choosing a given variant.
- If variant == "A", \(\mathrm{E}[\max(B - A - lift, 0)]\)
- If variant == "B", \(\mathrm{E}[\max(A - B - lift, 0)]\)
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: expected_loss
Return type: float or tuple of floats
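To make the expected loss definition concrete, here is a minimal Monte Carlo sketch using only the standard library. It assumes that \(A\) and \(B\) denote independent draws from the two Beta posteriors, each given as a hypothetical `(alpha, beta)` tuple. `expected_loss_mc` is an illustrative name and does not reproduce cprior's "exact", "MC" or "MLHS" code paths.

```python
import random


def expected_loss_mc(post_a, post_b, variant="A", lift=0.0, simulations=50_000, seed=0):
    """Estimate E[max(B - A - lift, 0)] for variant "A", or E[max(A - B - lift, 0)] for "B"."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = rng.betavariate(*post_a)
        b = rng.betavariate(*post_b)
        diff = (b - a) if variant == "A" else (a - b)
        total += max(diff - lift, 0.0)
    return total / simulations


post_a, post_b = (9, 10), (20, 5)  # hypothetical posterior parameters
loss_a = expected_loss_mc(post_a, post_b, variant="A")
loss_b = expected_loss_mc(post_a, post_b, variant="B")
print(loss_a, loss_b)
```

With `lift=0`, the two losses differ by the mean of \(B - A\), since \(\max(z, 0) - \max(-z, 0) = z\); this gives a cheap sanity check on any estimate.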
-
expected_loss_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = B-A\) and/or \(Z = A-B\).
- If variant == "A", \(Z = B - A\)
- If variant == "B", \(Z = A - B\)
- If variant == "all", both.
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a value in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: np.ndarray or tuple of np.ndarray
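The difference between the two interval types is easiest to see on simulated draws of \(Z = B - A\). The sketch below, under the assumption that \(A\) and \(B\) are independent Beta posterior draws with made-up parameters, computes an equal-tailed interval from sample quantiles and a highest density interval as the shortest window containing the requested fraction of sorted samples. The helpers `eti` and `hdi` are illustrative and are not cprior functions.

```python
import random


def difference_samples(post_a, post_b, simulations=50_000, seed=0):
    """Sorted Monte Carlo draws of Z = B - A for independent Beta posteriors."""
    rng = random.Random(seed)
    return sorted(
        rng.betavariate(*post_b) - rng.betavariate(*post_a) for _ in range(simulations)
    )


def eti(sorted_z, interval_length=0.9):
    """Equal-tailed interval: cut the same probability mass from each tail."""
    n = len(sorted_z)
    tail = (1 - interval_length) / 2
    return sorted_z[int(tail * n)], sorted_z[int((1 - tail) * n) - 1]


def hdi(sorted_z, interval_length=0.9):
    """Highest density interval: shortest window holding the requested share of samples."""
    n = len(sorted_z)
    width = int(interval_length * n)
    start = min(range(n - width), key=lambda i: sorted_z[i + width] - sorted_z[i])
    return sorted_z[start], sorted_z[start + width]


z = difference_samples((9, 10), (20, 5))  # hypothetical posterior parameters
eti_interval = eti(z)
hdi_interval = hdi(z)
print(eti_interval, hdi_interval)
```

For a symmetric, unimodal \(Z\) the two intervals nearly coincide; they diverge when the difference distribution is skewed.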
-
expected_loss_relative(method='exact', variant='A')
Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift.
- If variant == "A", \(\mathrm{E}[(B - A) / A]\)
- If variant == "B", \(\mathrm{E}[(A - B) / B]\)
- If variant == "all", both.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
Returns: expected_loss_relative
Return type: float or tuple of floats
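For independent Beta posteriors the relative loss for variant "A" has a simple closed form: \(\mathrm{E}[(B - A)/A] = \mathrm{E}[B]\,\mathrm{E}[1/A] - 1\), with \(\mathrm{E}[1/A] = (\alpha_A + \beta_A - 1)/(\alpha_A - 1)\) for \(\alpha_A > 1\). The sketch below evaluates this identity and compares it with a Monte Carlo estimate. It is a worked example under the independence assumption, with made-up posterior parameters; whether cprior's "exact" method uses this exact expression is not confirmed here.

```python
import random


def relative_loss_closed_form(post_a, post_b):
    """E[(B - A) / A] for independent Beta posteriors; requires alpha_A > 1."""
    (a_a, b_a), (a_b, b_b) = post_a, post_b
    mean_b = a_b / (a_b + b_b)
    mean_inv_a = (a_a + b_a - 1) / (a_a - 1)
    return mean_b * mean_inv_a - 1


def relative_loss_mc(post_a, post_b, simulations=50_000, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(simulations):
        a = rng.betavariate(*post_a)
        b = rng.betavariate(*post_b)
        total += (b - a) / a
    return total / simulations


post_a, post_b = (9, 10), (20, 5)  # hypothetical posterior parameters
closed = relative_loss_closed_form(post_a, post_b)
approx = relative_loss_mc(post_a, post_b)
print(closed, approx)
```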
-
expected_loss_relative_ci(method='MC', variant='A', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (B-A)/A\) and/or \(Z = (A-B)/B\).
- If variant == "A", \(Z = (B-A)/A\)
- If variant == "B", \(Z = (A-B)/B\)
- If variant == "all", both.
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a value in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: np.ndarray or tuple of np.ndarray
-
probability(method='exact', variant='A', lift=0, mlhs_samples=10000)
Compute the error probability or chance to beat control.
- If variant == "A", \(P[A > B + lift]\)
- If variant == "B", \(P[B > A + lift]\)
- If variant == "all", both.
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- variant (str (default="A")) – The chosen variant. Options are “A”, “B”, “all”.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: probability
Return type: float or tuple of floats
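The probability \(P[B > A + lift]\) (the variant == "B" case) can be approximated by sampling, and for `lift = 0` with an integer \(\alpha_B\) it can also be written as a finite sum of beta functions. The sketch below implements both and compares them. The closed form is a known identity for independent Beta variables; it is shown here as a cross-check and is not claimed to be what cprior's "exact" method computes. Function names and posterior values are hypothetical.

```python
import random
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_b_beats_a_exact(post_a, post_b):
    """P[B > A] for independent Beta posteriors; requires an integer alpha for B."""
    (a_a, b_a), (a_b, b_b) = post_a, post_b
    total = 0.0
    for i in range(int(a_b)):
        log_term = log_beta(a_a + i, b_a + b_b) - log_beta(1 + i, b_b) - log_beta(a_a, b_a)
        total += exp(log_term) / (b_b + i)
    return total


def prob_b_beats_a_mc(post_a, post_b, lift=0.0, simulations=50_000, seed=0):
    """Monte Carlo estimate of P[B > A + lift]."""
    rng = random.Random(seed)
    hits = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a) + lift
        for _ in range(simulations)
    )
    return hits / simulations


post_a, post_b = (9, 10), (20, 5)  # hypothetical posterior parameters
exact = prob_b_beats_a_exact(post_a, post_b)
approx = prob_b_beats_a_mc(post_a, post_b)
print(exact, approx)
```

A positive `lift` breaks the closed form, which is consistent with the requirement above that Monte Carlo sampling be used in that case.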
-
update_A(data)
Update posterior parameters for variant A with new data samples.
Parameters: data (array-like, shape = (n_samples)) – New data samples for variant A.
-
update_B(data)
Update posterior parameters for variant B with new data samples.
Parameters: data (array-like, shape = (n_samples)) – New data samples for variant B.
-
class cprior.models.NegativeBinomialMVTest(models, simulations=1000000, random_state=None, n_jobs=None)
Bases: cprior.cdist.beta.BetaMVTest
Negative binomial multivariate test.
Parameters:
- models (dict) – The control and variations models.
- simulations (int or None (default=1000000)) – Number of Monte Carlo simulations.
- random_state (int or None (default=None)) – The seed used by the random number generator.
-
expected_loss(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)
Compute the expected loss. This is the expected uplift lost by choosing a given variant, i.e., \(\mathrm{E}[\max(control - variant - lift, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: expected_loss
Return type: float
-
expected_loss_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the difference distribution of \(Z = control-variant\).
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a value in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_ci
Return type: np.ndarray or tuple of np.ndarray
-
expected_loss_relative(method='exact', control='A', variant='B')
Compute expected relative loss for choosing a variant. This can be seen as the negative expected relative improvement or uplift, i.e., \(\mathrm{E}[(control - variant) / variant]\).
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
Returns: expected_loss_relative
Return type: float
-
expected_loss_relative_ci(method='MC', control='A', variant='B', interval_length=0.9, ci_method='ETI')
Compute credible intervals on the relative difference distribution of \(Z = (control - variant) / variant\).
Parameters:
- method (str (default="MC")) – The method of computation. Options are “asymptotic”, “exact” and “MC”.
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- interval_length (float (default=0.9)) – Length of the credible interval, given as a value in [0, 1]; for example, 0.9 yields the 90% credible interval.
- ci_method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (ci_method="HDI") and Equal-tailed interval (ci_method="ETI"). Currently, ci_method="HDI" is only available for method="MC".
Returns: expected_loss_relative_ci
Return type: np.ndarray or tuple of np.ndarray
-
expected_loss_relative_vs_all(method='quad', control='A', variant='B', mlhs_samples=1000)
Compute the expected relative loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[(\max(A, C, D) - B) / B]\).
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: expected_loss_relative_vs_all
Return type: float
-
expected_loss_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the expected loss against all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: expected_loss_vs_all
Return type: float
-
probability(method='exact', control='A', variant='B', lift=0, mlhs_samples=10000)
Compute the error probability or chance to beat control, i.e., \(P[variant > control + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="exact")) – The method of computation. Options are “exact”, “MC” (Monte Carlo) and “MLHS” (Monte Carlo + Median Latin Hypercube Sampling).
- control (str (default="A")) – The control variant.
- variant (str (default="B")) – The tested variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=10000)) – Number of samples for MLHS method.
Returns: probability
Return type: float
-
probability_vs_all(method='quad', variant='B', lift=0, mlhs_samples=1000)
Compute the error probability or chance to beat all variations. For example, given variants “A”, “B”, “C” and “D”, and choosing variant="B", we compute \(P[B > \max(A, C, D) + lift]\).
If lift is a positive value, the computation method must be Monte Carlo sampling.
Parameters:
- method (str (default="quad")) – The method of computation. Options are “MC” (Monte Carlo), “MLHS” (Monte Carlo + Median Latin Hypercube Sampling) and “quad” (numerical integration).
- variant (str (default="B")) – The chosen variant.
- lift (float (default=0.0)) – The amount of uplift.
- mlhs_samples (int (default=1000)) – Number of samples for MLHS method.
Returns: probability_vs_all
Return type: float
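The "vs all" quantities compare one variant with the best of the remaining ones in each simulated draw. The sketch below estimates \(P[B > \max(A, C, D) + lift]\) and \(\mathrm{E}[\max(\max(A, C, D) - B, 0)]\) by plain Monte Carlo, assuming independent Beta posteriors given as a dict of made-up `(alpha, beta)` tuples. `vs_all_mc` is an illustrative helper; it does not reproduce cprior's "quad" or "MLHS" methods.

```python
import random


def vs_all_mc(posteriors, variant="B", lift=0.0, simulations=20_000, seed=0):
    """Return (P[variant > max(others) + lift], E[max(max(others) - variant, 0)])."""
    rng = random.Random(seed)
    wins, loss = 0, 0.0
    for _ in range(simulations):
        # One joint draw from every posterior, in dict order.
        draw = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
        best_other = max(value for name, value in draw.items() if name != variant)
        wins += draw[variant] > best_other + lift
        loss += max(best_other - draw[variant], 0.0)
    return wins / simulations, loss / simulations


# Hypothetical posterior parameters for four variants.
posteriors = {"A": (9, 10), "B": (20, 5), "C": (5, 15), "D": (10, 10)}
results = {name: vs_all_mc(posteriors, variant=name) for name in posteriors}
prob_b, loss_b = results["B"]
print(results)
```

With `lift=0` and a shared seed, exactly one variant wins each draw, so the estimated probabilities across variants sum to 1.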
-
update(data, variant)
Update posterior parameters for a given variant with new data samples.
Parameters:
- data (array-like, shape = (n_samples)) – New data samples.
- variant (str) – The variant to update.