Distributions

Discrete

Binomial distribution

Discrete probability distribution of the number of successes in a sequence of $n$ independent experiments, each with the same success probability $p$. A single trial ($n=1$) is called a Bernoulli trial, and its distribution is the Bernoulli distribution.

The probability of getting $k$ successes in $n$ trials is:

Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}

The beta distribution is the conjugate prior for the success probability $p$ of the binomial distribution.
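As a sanity check, the pmf above can be evaluated directly. A minimal sketch in plain Python; the function name is illustrative:

```python
from math import comb

# Probability of k successes in n Bernoulli trials with success probability p,
# computed directly from the binomial pmf above.
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. probability of exactly 3 heads in 10 fair coin flips
print(round(binom_pmf(3, 10, 0.5), 4))  # → 0.1172
```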

Multinomial distribution

The multinomial distribution is a generalization of the binomial distribution. Instead of two outcomes (success/failure), the multinomial distribution considers $k$ outcomes (e.g., a $k$-sided die).

A single trial ($n=1$) follows the categorical distribution.
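The multinomial pmf generalizes the binomial coefficient to a multinomial coefficient. A small sketch in plain Python (the function name is illustrative):

```python
from math import factorial

# Multinomial pmf: probability of observing counts (x_1, ..., x_k) in
# n = sum(counts) trials, where probs (p_1, ..., p_k) sum to 1.
def multinomial_pmf(counts, probs):
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)  # multinomial coefficient n! / (x_1! ... x_k!)
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p ** x
    return coeff * prob

# e.g. a fair six-sided die rolled 6 times, each face appearing exactly once
print(multinomial_pmf([1] * 6, [1 / 6] * 6))
```

With $k=2$ this reduces to the binomial pmf above.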

Poisson distribution

  • Discrete probability distribution
  • Models the number of events occurring in a fixed interval of time/space

Given the average number of events per interval $\lambda$, the probability of observing $k$ events in a given interval is:

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
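The pmf above is straightforward to evaluate. A minimal sketch in plain Python (the rate value is illustrative):

```python
from math import exp, factorial

# Poisson pmf: probability of k events given rate lam (average events per interval)
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

# e.g. with an average of 2 events per interval, probability of exactly 0, 1, 2 events
for k in range(3):
    print(k, round(poisson_pmf(k, 2.0), 4))
```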

Continuous

Gaussian distribution

Definition of the Gaussian distribution (for scalar $x \in \mathbb{R}$):

p(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} \exp \left( - \frac{(x - \mu)^2}{2 \sigma^2} \right)
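The density above translates directly to code. A minimal sketch in plain Python (the function name is illustrative):

```python
from math import exp, pi, sqrt

# Gaussian pdf for scalar x with mean mu and standard deviation sigma
def gaussian_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# density of the standard normal at its mean is 1 / sqrt(2*pi)
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # → 0.3989
```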

Gaussian noise model

In the context of (scalar) prediction, the Gaussian noise model is

y = f(\bm{x}; \theta) + \nu, \quad \nu \sim \mathcal{N}(0, \sigma^2)
p(y | \bm{x}; \theta, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} \exp \left( - \frac{(y - f(\bm{x}; \theta))^2}{2 \sigma^2} \right)

Let's think about maximizing the log-likelihood of the data $\mathcal{D} = \{(\bm{x}_i, y_i)\}_{i=1}^N$. Assuming the data is i.i.d.,

p(\bm{y} | X; \theta, \sigma) = \prod_{i=1}^N p(y_i | \bm{x}_i; \theta, \sigma)

\begin{align*}
\frac{\partial}{\partial \theta} \log p(\bm{y} | X; \theta, \sigma)
&= \frac{\partial}{\partial \theta} \sum_{i=1}^N \log p(y_i | \bm{x}_i; \theta, \sigma) \\
&= \frac{\partial}{\partial \theta} \sum_{i=1}^N \left[ \log \frac{1}{\sigma \sqrt{2 \pi}} - \frac{(y_i - f(\bm{x}_i; \theta))^2}{2 \sigma^2} \right] \\
&= \frac{\partial}{\partial \theta} \sum_{i=1}^N \left[ - \frac{(y_i - f(\bm{x}_i; \theta))^2}{2 \sigma^2} \right] \\
&= - \frac{1}{2 \sigma^2} \frac{\partial}{\partial \theta} \sum_{i=1}^N (y_i - f(\bm{x}_i; \theta))^2
\end{align*}

The normalization term does not depend on $\theta$, so maximizing the likelihood under Gaussian noise is equivalent to minimizing the sum of squared errors $\sum_{i=1}^N (y_i - f(\bm{x}_i; \theta))^2$.
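The equivalence between maximum likelihood and least squares can be checked numerically. A small sketch, assuming a one-parameter linear model $f(x; \theta) = \theta x$ with fixed $\sigma$ (the data values are made up for illustration):

```python
import math

# Toy data and fixed noise scale (values are illustrative)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
sigma = 1.0

# Gaussian log-likelihood of the data for model f(x; theta) = theta * x
def log_likelihood(theta):
    return sum(
        -math.log(sigma * math.sqrt(2 * math.pi))
        - (y - theta * x) ** 2 / (2 * sigma**2)
        for x, y in zip(xs, ys)
    )

# Least-squares estimate (closed form for this model): sum(x*y) / sum(x^2)
theta_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# The log-likelihood is maximized exactly at the least-squares solution
print(all(log_likelihood(theta_ls) >= log_likelihood(theta_ls + d)
          for d in (-0.1, -0.01, 0.01, 0.1)))  # → True
```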

Others

Thompson sampling

A heuristic for the multi-armed bandit problem that addresses the exploration-exploitation dilemma by choosing each action according to its posterior probability of being optimal.
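A minimal sketch of Thompson sampling for a two-armed Bernoulli bandit, using the beta prior mentioned above with Beta(1, 1) priors per arm (the true arm probabilities and round count are made up for illustration):

```python
import random

random.seed(0)
true_probs = [0.3, 0.7]  # illustrative, unknown to the algorithm
alpha = [1, 1]  # Beta posterior parameters: 1 + observed successes per arm
beta = [1, 1]   # 1 + observed failures per arm

for _ in range(2000):
    # Sample a success probability for each arm from its Beta posterior,
    # then play the arm with the highest sample (exploration via sampling)
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(2)]
    arm = samples.index(max(samples))
    reward = 1 if random.random() < true_probs[arm] else 0
    alpha[arm] += reward
    beta[arm] += 1 - reward

# After enough rounds, the better arm accumulates most of the pulls
pulls = [alpha[i] + beta[i] - 2 for i in range(2)]
print(pulls)
```

Note the posterior update is just conjugate beta-binomial counting, so no explicit exploration parameter (unlike epsilon-greedy) is needed.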