# Utilities for Bayesian models and distributions¶

## Confidence/credible intervals¶

Credible intervals quantify the uncertainty of a parameter by providing the range of values containing the true parameter value with a given probability. Credible intervals can be calculated using the equal-tailed quantile method (ETI) or the highest posterior density method (HDI).

The HDI method calculates the interval such that $$P(l < z < u) = 1 - \delta$$, where $$\delta$$ denotes the significant level, and $$l$$ and $$u$$ denote the lower and upper bound of the interval, respectively. The HDI computes the narrowest interval by solving the minimization problem, see [CS99]

$\underset{l < u}{\text{min}}\left(|f(u) - f(l)| + |F(u) - F(l) - (1 -\delta)| \right).$

We reformulate the problem by removing absolute values and adding the narrowest interval $$u - l$$ on the objective function,

$\begin{split}\underset{u,l, t, w}{\text{min}} &\quad t + w + u - l\\ \text{s.t.} &\quad -t + f(u) - f(l) \ge 0\\ &\quad t + f(u) - f(l) \ge 0\\ &\quad -w + F(u) - F(l) - (1 -\delta) \ge 0\\ &\quad w + F(u) - F(l) - (1 -\delta) \ge 0\\ &\quad u - l - \epsilon \ge 0\\ &\quad l \in [l_{\min}, l_{\max}]]\\ &\quad u \in [u_{\min}, u_{\max}]\end{split}$

where $$\epsilon > 0$$, say $$\epsilon = 0.000001$$. Parameters $$l_{\min}$$, $$l_{\max}$$, $$u_{\min}$$ and $$u_{\max}$$ denote the bounds for the interval limits $$l$$ and $$u$$, there are dependent on the statistical distribution support.

cprior.cdist.ci_interval(x, interval_length=0.9, method='ETI')

Compute confidence/credible intervals (CI).

Parameters: x (array-like, shape = (n_samples)) – interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1]. method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (method="HDI") and Equal-tailed interval (method="ETI"). ci_interval – The lower and upper limit of the interval. numpy.ndarray

Example

>>> from scipy import stats
>>> from cprior.cdist import ci_interval
>>> x = stats.norm.rvs(size=int(1e6), random_state=42)
>>> ci_interval(x=x, interval_length=0.95, method="ETI")
array([-1.96315029,  1.95842544])
>>> ci_interval(x=x, interval_length=0.95, method="HDI")
array([-1.95282024,  1.9679026 ])

cprior.cdist.ci_interval_exact(dist, interval_length=0.9, method='ETI', bounds=None)

Compute exact confidence/credible intervals (CI).

Parameters: dist (object) – A class representing a probability distribution with methods pdf, cdf and ppf. interval_length (float (default=0.9)) – Compute interval_length% credible interval. This is a value in [0, 1]. method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (method="HDI") and Equal-tailed interval (method="ETI"). bounds (list or None (default=None)) – Sequence of (min, max) pairs for lower and upper limit of the interval. None is used to specify no bound. ci_interval – The lower and upper limit of the interval. numpy.ndarray

Example

>>> from scipy import stats
>>> from cprior.cdist import ci_interval
>>> dist = stats.beta(4, 10)
>>> ci_interval_exact(dist=dist, interval_length=0.9, method="ETI")
array([0.11266578, 0.49464973])
>>> bounds = [(0, 1), (0, 1)]
>>> ci_interval_exact(dist=dist, interval_length=0.9, method="HDI", bounds=bounds)
array([0.09439576, 0.46944915])


## References¶

 [CS99] Mi. Chen and Q. Shao. Monte Carlo Estimation of Bayesian Credible and HPD Intervals. Journal of Computational and Graphical Statistics, 8(1):69–92, 1999.