This project lists open-form narratives and the closed-form distributions that approximate them. Its intent is to help you build estimable statistical models on a sound micro-level foundation.

Here is a simple example of going from a real-world situation to an estimable mathematical model:

*Narrative*: Make a large series of independent, identically distributed (iid) draws from a source.
Take the mean of those draws.

*Distribution*: The distribution of repeated means will be a Normal distribution.

If you wanted to write a simulation in which individual agents each experience some iid shock, and the mean level across agents is measured and reported, this wonderful piece of mathematics just saved you the trouble.

Now you can focus your energies on the more novel parts of the storyline. Of course, those too may have closed-form shortcuts that save you the trouble of writing down an open-form simulation. Your final model may wind up being a combination of closed-form submodels.

None of this is novel, and every narrative-to-distribution mapping should have a reference to an existing work (including Wikipedia, because this is uncontroversial, textbook stuff). However, it is presented in what seems to be a novel way, to facilitate the development of detailed micro-level narratives using known building blocks where they are available.

The fact that we are relying on so many existing sources means that we don't need to provide proofs here, unless they are useful for elucidating the transformation.

Also, estimation is often not a trivial matter. Some of these examples may break for small $N$. We may add these notes later, but at this stage it would be nice to just get down as many narratives as possible, and leave the estimation details to the references.

$\def\Re{{\mathbb R}} \def\datas{{\mathbb D}} \def\params{{\mathbb P}} \def\models{{\mathbb M}} \def\mod#1{M_{#1}}$

**Q: Where's the $\chi^2$ distribution?** Given a series of $n$ Normally distributed
variables, the sum of their squares has a $\chi^2_n$ Distribution. So one could
conceivably describe a micro-level narrative that produces a $\chi^2$ outcome based
on the Normal narrative, but it is a rather convoluted storyline: start with $n$
iid sets, make distinct pools of draws from each of them, take their separate means,
take the square of each mean, then sum the squares. Here is Kmenta on
the implausibility of this storyline: “There are no noted parent populations whose
distributions could be described by the chi-squared distribution” (kmenta). The process by which a
$\chi^2$ Distribution is generated is therefore best described as a transformation of
existing distributions, not as a micro-level narrative.

This list will not cover methods by which one distribution can be transformed into another. Stats textbooks and Wikipedia do a fine job on transforming distributions to other distributions, so we leave that level of work out of scope.

*Narrative*: The coin-flip: one hit occurs with probability $p$.

*Distribution*: This defines the Bernoulli Distribution, which takes the value one with probability $p$ and zero with
probability $1-p$.

*Notes*: The variance of a Bernoulli Distribution is $p(1-p)$.

*Narrative*: Make $N$ draws, each of which hits with probability $p$.

*Distribution*: The hit count $x$ has a Binomial$(N, p, x)$ Distribution.

*Notes*: The mean of $x$ is $Np$ and the variance is $Np(1-p)$.
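
As a sketch of the check one might run (the parameter values here are arbitrary), a simulation of the narrative recovers the closed-form mean and variance:

```python
# Simulate the binomial narrative: each trial counts hits among N
# Bernoulli(p) draws; compare to the closed forms Np and Np(1-p).
import random
import statistics

random.seed(7)
N, p, trials = 50, 0.3, 20_000

counts = [sum(random.random() < p for _ in range(N)) for _ in range(trials)]

mean_hat = statistics.mean(counts)      # near Np = 15
var_hat = statistics.pvariance(counts)  # near Np(1-p) = 10.5
```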

*Narrative*: The die-roll: each observation is from a list of possible outcomes, each with its own
probability of occurring, ${\bf p} = [p_1, p_2, ..., p_k]$. Exactly one event happens each time, so
$\sum_{i=1}^k p_i = 1$. We make $n$ draws. What is the $k$-dimensional vector of observed
outcomes?

*Distribution*: Multinomial$(n, {\bf p}, k)$.

*Narrative*: Start with a pool of $h$ hits and $m$ misses, so $N=h+m$, and the Bernoulli $p=h/N$.
What are the odds that we get $x$ hits from $n$ draws without replacement?

*Distribution*: Hypergeometric$(h, m, n, x)$
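
A minimal sketch of the narrative (arbitrary parameter values), comparing simulated draws without replacement to the closed-form probability ${h \choose x}{m \choose n-x}/{h+m \choose n}$:

```python
# Draws without replacement from a pool of h hits and m misses,
# checked against the closed-form pmf.
import math
import random

random.seed(7)
h, m, n, x = 6, 14, 5, 2
pool = [1] * h + [0] * m

trials = 50_000
empirical = sum(sum(random.sample(pool, n)) == x for _ in range(trials)) / trials

# Closed form: C(h,x) C(m,n-x) / C(h+m,n)
exact = math.comb(h, x) * math.comb(m, n - x) / math.comb(h + m, n)
```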

*Narrative*: [The Pólya urn scheme] Start with a pool of $\alpha$ red and $\beta$ black balls. Draw a ball, note its color,
then return it and a duplicate to the urn (so if you draw a red ball, put that and
another red ball back in the urn). Repeat $n$ times; report the count of red balls.

*Distribution*: Beta-binomial$(n, \alpha, \beta, x)$

*Notes*: This is the distribution of a Binomial count whose probability $p$ is itself a draw from a Beta$(\alpha, \beta)$ Distribution; equivalently, it is the predictive distribution from a Beta prior paired with a Binomial likelihood.
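
A sketch of the urn itself (arbitrary parameter values), checked against the Beta-binomial mean $n\alpha/(\alpha+\beta)$:

```python
# Simulate the Pólya urn: draw a ball, return it plus a duplicate;
# the count of red balls over n draws is Beta-binomial.
import random
import statistics

random.seed(7)
alpha, beta, n, trials = 3, 5, 10, 20_000

def polya_reds(alpha, beta, n):
    """Run one urn for n draws; return how many reds were drawn."""
    red, black, reds_seen = alpha, beta, 0
    for _ in range(n):
        if random.random() < red / (red + black):
            red += 1
            reds_seen += 1
        else:
            black += 1
    return reds_seen

counts = [polya_reds(alpha, beta, n) for _ in range(trials)]
mean_hat = statistics.mean(counts)  # near n*alpha/(alpha+beta) = 3.75
```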

*Narrative*: Make a large series of independent, identically distributed (iid) draws from a source.
Report the mean of those draws.

*Distribution*: As $N\to\infty$, the distribution of repeated means approaches a Normal Distribution whose mean is the source's mean, estimable as $\mu=\sum x/N$, and whose variance is the source's variance divided by $N$, estimable as $\sigma^2 = \sum (x-\mu)^2/N^2$.
(klemens:modeling)
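
A sketch of the narrative with a decidedly non-Normal source, Uniform$[0,1]$ (an arbitrary choice): the means still land in a Normal shape, with the closed-form mean and spread:

```python
# Means of N iid Uniform[0,1] draws: the sampling distribution should be
# roughly Normal with mean 0.5 and sd sqrt(1/12)/sqrt(N).
import random
import statistics

random.seed(7)
N, trials = 100, 20_000

means = [statistics.mean(random.random() for _ in range(N))
         for _ in range(trials)]

mean_hat = statistics.mean(means)  # near the source mean, 0.5
sd_hat = statistics.pstdev(means)  # near sqrt(1/12)/10, about 0.0289
```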

*Narrative*: Begin with a value $x$.
Make a large series of independent, identically distributed (iid) draws from a source.
Report the product, $x\cdot d_1 \cdot d_2 ...$.

*Distribution*: As $N\to\infty$, the log of the product, $\ln(x) + \sum_i \ln(d_i)$, is a sum of iid
terms and so approaches a Normal Distribution; the product itself therefore approaches a
Lognormal Distribution, whose $\mu$ and $\sigma$ parameters are the mean and standard
deviation of that log. (klemens:modeling)
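
A sketch of the narrative (the Uniform$(0.5, 1.5)$ source and other values are arbitrary choices): the log of the simulated product centers on $N\cdot E[\ln d]$, which for this source is about $-2.26$:

```python
# Running product of N iid positive draws; its log should center on
# N * E[ln d]. For Uniform(0.5, 1.5), E[ln d] is about -0.0452.
import math
import random
import statistics

random.seed(7)
x0, N, trials = 1.0, 50, 20_000

def product_walk():
    prod = x0
    for _ in range(N):
        prod *= random.uniform(0.5, 1.5)
    return prod

logs = [math.log(product_walk()) for _ in range(trials)]
mean_hat = statistics.mean(logs)  # near 50 * (-0.0452) ~ -2.26
```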

*Narrative*: On one axis, we have a series of iid draws, which generates a ${\cal N}(0, \sigma)$
distribution. On an orthogonal axis, another series of iid draws also generates a ${\cal N}(0, \sigma)$ distribution.
The observed scalar output $x$ is the magnitude of the resulting vector ($x=\sqrt{d_1^2+d_2^2}$).

*Distribution*: Rayleigh$(\sigma, x)$ distribution
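
A sketch of the narrative (the value of $\sigma$ is arbitrary), checked against the Rayleigh mean $\sigma\sqrt{\pi/2}$:

```python
# Magnitude of a 2-D vector whose components are iid Normal(0, sigma).
import math
import random
import statistics

random.seed(7)
sigma, trials = 2.0, 20_000

mags = [math.hypot(random.gauss(0, sigma), random.gauss(0, sigma))
        for _ in range(trials)]

mean_hat = statistics.mean(mags)  # near sigma*sqrt(pi/2), about 2.507
```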

*Narrative*: One Bernoulli (coin-flip) draw per period. What is the likelihood that the
first hit will occur at period $x$? (goswami:rao, p 9)

*Distribution*: Geometric$(p, x)$
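
A sketch of the narrative (the value of $p$ is arbitrary), checked against the Geometric mean $1/p$:

```python
# Flip a p-coin once per period; record the period of the first hit.
import random
import statistics

random.seed(7)
p, trials = 0.2, 20_000

def first_hit(p):
    period = 1
    while random.random() >= p:
        period += 1
    return period

waits = [first_hit(p) for _ in range(trials)]
mean_hat = statistics.mean(waits)  # near 1/p = 5
```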

*Narrative*: One Bernoulli (coin-flip) draw per period. What is the likelihood that we will have to wait
$x$ draws before we observe $k$ hits?

*Distribution*: Negative binomial$(k, p, x)$

*Narrative*: A system is represented by a column vector of states ${\bf S}$, which
changes states according to a Markov matrix $P$. What is the likelihood that it
will take $t$ steps to reach a steady state, where ${\bf S}_{t+1} = {\bf S}_t\cdot P = {\bf S}_t$?

*Distribution*: The Markov Geometric Distribution,
$$MGD(R, Q, t)={\bf 1}'(I-tR)^{-1}(tQ){\bf 1},$$
where $Q$ is the diagonal matrix matching the diagonal of $P$, $R$ is the
off-diagonal of $P$ (so $R=P-Q$), ${\bf 1}$ is an appropriately-sized column vector of ones,
and $I$ is an appropriately-sized identity matrix.
(gani:jerwood, eqn 2.9), notation via (goswami:rao, p 197).

*Narrative*: Independent events (rainy day, landmine, bad data) occur at the
mean rate of $\lambda$ events per span (of time, space, et cetera). What is the
probability that there will be $t$ events in a single span?

*Distribution*: Poisson$(\lambda, t)$
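
One way to sketch this narrative (all values arbitrary): chop the span into many small slices, each with an independent small chance $\lambda/\hbox{slices}$ of an event. The count per span then approximates a Poisson, whose mean and variance should both be near $\lambda$:

```python
# Approximate a Poisson process by many tiny independent Bernoulli slices.
import random
import statistics

random.seed(7)
lam, slices, trials = 3.0, 500, 5_000

counts = [sum(random.random() < lam / slices for _ in range(slices))
          for _ in range(trials)]

mean_hat = statistics.mean(counts)      # near lambda = 3
var_hat = statistics.pvariance(counts)  # also near lambda = 3
```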

*Narrative*: Events occur via a Poisson process ($\lambda$ events per time/space span).
What is the likelihood that the first event will occur within $x$ periods?

*Distribution*: Exponential$(\lambda, x)$

*Notes*: The distribution is memoryless: after the first event, the time to the next event is also Exponentially distributed.
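
A sketch of the memoryless property (the values of $\lambda$ and the cutoff are arbitrary): among waits longer than a cutoff, the excess beyond the cutoff has the same Exponential mean $1/\lambda$ as the full set of waits:

```python
# Memorylessness: excess wait beyond a cutoff is again Exponential(lambda).
import random
import statistics

random.seed(7)
lam, cutoff, trials = 1.5, 1.0, 100_000

waits = [random.expovariate(lam) for _ in range(trials)]
excess = [w - cutoff for w in waits if w > cutoff]

mean_all = statistics.mean(waits)      # near 1/lambda, about 0.667
mean_excess = statistics.mean(excess)  # also near 1/lambda
```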

*Narrative*: Events are a Poisson process: $\lambda$ events per time period. What is the
likelihood that we will wait $x$ time units to observe $k$ events?

Spatial version: $\lambda$ events per spatial unit. What is the likelihood that we will cover a distance or area of $x$ units before observing the $k$th event?

*Distribution*: Gamma$(k, \lambda, x)$
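
A sketch (arbitrary parameter values): the wait for the $k$th event of a Poisson process is a sum of $k$ Exponential inter-event gaps, which we can check against the Gamma mean $k/\lambda$:

```python
# Wait for the k-th event = sum of k Exponential(lambda) gaps.
import random
import statistics

random.seed(7)
k, lam, trials = 4, 2.0, 20_000

waits = [sum(random.expovariate(lam) for _ in range(k))
         for _ in range(trials)]
mean_hat = statistics.mean(waits)  # near k/lambda = 2
```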

*Narrative*: Events occur via a Poisson-like process ($\lambda$ events per time/space span),
so the likelihood of observing an event within $x$ periods is Exponential$(\lambda, x)$.
But the time scale is distorted, so $y=x^{1/\gamma}$. [Note that an additive or
multiplicative distortion would still give us an Exponential Distribution.]

Another way to describe this is that the event rate is changing with time. If $\gamma > 1$, events are more likely after some time. If $\gamma < 1$, events are more likely to occur early. (casella:berger, p 103)

*Distribution*: Weibull$(\gamma, \lambda, x)$.
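
A sketch of the distortion narrative, fixing $\lambda=1$ to keep the closed form simple (that and $\gamma=2$ are arbitrary choices): raising Exponential waits to the power $1/\gamma$ gives draws whose mean matches the Weibull mean $\Gamma(1+1/\gamma)$:

```python
# Exponential(1) waits with the time scale distorted: y = s**(1/gamma).
import math
import random
import statistics

random.seed(7)
gamma_, trials = 2.0, 50_000

draws = [random.expovariate(1.0) ** (1 / gamma_) for _ in range(trials)]

mean_hat = statistics.mean(draws)
exact = math.gamma(1 + 1 / gamma_)  # sqrt(pi)/2, about 0.8862
```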

*Narrative*: The first *order statistic* of a set of numbers $x$ is the smallest number in the
set; the second is the next-to-smallest, up to the largest order statistic, which
is $\max(x)$.

The $\alpha+\beta-1$ elements of $x$ are drawn from a Uniform$[0, 1]$ Distribution.

*Distribution*: Let $a$ be the $\alpha$th order statistic; then $a$ has a Beta$(\alpha, \beta, a)$ Distribution.
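
A sketch (arbitrary $\alpha$ and $\beta$): take the $\alpha$th-smallest of $\alpha+\beta-1$ Uniform$[0,1]$ draws and check it against the Beta mean $\alpha/(\alpha+\beta)$:

```python
# The alpha-th order statistic of alpha+beta-1 Uniform[0,1] draws.
import random
import statistics

random.seed(7)
alpha, beta, trials = 3, 4, 20_000

order_stats = [sorted(random.random() for _ in range(alpha + beta - 1))[alpha - 1]
               for _ in range(trials)]

mean_hat = statistics.mean(order_stats)  # near alpha/(alpha+beta) = 3/7
```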

*Narrative*: Events occur via a Poisson process, so the lengths of spans without events ($s$) are Exponentially distributed, with parameter $\lambda$.
What is the distribution of $\exp(s)$?
Because $s\geq 0$, $\exp(s)$ will be $\geq 1$, so if $x_m$ periods have already passed,
we will need to rescale so that the distribution is bounded below by $x_m$.

*Distribution*: Pareto$(x_m, \lambda)$

*Notes*: As with any continuous distribution with a bounded lower support, there exists some
pivot $x$ such that the top $x$ share of draws is expected to hold $(1-x)$ of the total density
(e.g., 1% of the population holds 99% of the wealth). The Pareto is self-similar in this pivot:
the top $x^y$ share holds $(1-x)^y$ of the total, for all $y>0$. This property makes it popular for modeling income distributions.

The Pareto inherits the memoryless property from the Exponential. Given a Pareto$(x_1, \lambda)$ distribution, the portion greater than $x_2$ has a Pareto$(x_2, \lambda)$ distribution.
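
A sketch of the construction (arbitrary values of $\lambda$ and $x_m$): take $x_m \exp(s)$ for Exponential $s$, and check the Pareto survival function $P(X > x) = (x_m/x)^\lambda$:

```python
# exp of an Exponential wait, rescaled to lower bound x_m, is Pareto.
import math
import random

random.seed(7)
lam, x_m, trials = 2.0, 1.5, 50_000

draws = [x_m * math.exp(random.expovariate(lam)) for _ in range(trials)]

x = 3.0
empirical = sum(d > x for d in draws) / trials
exact = (x_m / x) ** lam  # (1.5/3)^2 = 0.25
```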

*Narrative*: We know the upper and lower bounds are $u$ and $l$, but believe that any draw within that
range is as likely as any other draw within that range.

*Distribution*: Uniform$(l, u)$

[casella:berger] George Casella and Roger L. Berger. Statistical Inference. Duxbury Press, 1990.

[gani:jerwood] J. Gani and D. Jerwood. Markov chain methods in chain binomial epidemic models. Biometrics, 27(3): 591--603, 1971.

[goswami:rao] A. Goswami and B. V. Rao. A Course in Applied Stochastic Processes. Number 40 in Texts and Readings in Mathematics. Hindustan Book Agency, 2006.

[klemens:modeling] Ben Klemens. Modeling with Data: Tools and Techniques for Scientific Computing. Princeton University Press, 2008.

[kmenta] Jan Kmenta. Elements of Econometrics. Macmillan Publishing Company, 2nd edition, 1986.