Discrete probability distribution of the number of successes in a sequence of n independent experiments, each succeeding with probability p.
Each single trial (the case n=1) is called a Bernoulli trial and follows a Bernoulli distribution.
The probability of getting k successes in n trials is:
$$\Pr(X=k)=\binom{n}{k}\,p^{k}(1-p)^{n-k}$$
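As a quick check of the formula above, here is a minimal sketch in plain Python. The function name `binomial_pmf` and the coin-flip example are illustrative choices, not part of the original notes.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0  # impossible counts have zero probability
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: 10 flips of a fair coin.
probs = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print(probs[5])    # probability of exactly 5 heads: 252/1024 = 0.24609375
print(sum(probs))  # the probabilities over k = 0..n sum to 1 (up to rounding)
```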
The Beta distribution is the conjugate prior for the success probability p of the binomial distribution.
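To make the role of the Beta prior concrete: with a Beta(a, b) prior on p and k successes observed in n trials, the posterior is Beta(a + k, b + n − k). The sketch below only tracks these two parameters; the helper names and the example numbers are hypothetical.

```python
def beta_posterior(a: float, b: float, k: int, n: int) -> tuple[float, float]:
    """Update a Beta(a, b) prior after seeing k successes in n binomial trials."""
    return a + k, b + (n - k)


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Uniform prior Beta(1, 1), then 7 successes in 10 trials.
a_post, b_post = beta_posterior(1.0, 1.0, k=7, n=10)
print(a_post, b_post)              # 8.0 4.0
print(beta_mean(a_post, b_post))   # posterior mean of p, 8/12
```

The posterior mean 8/12 sits between the prior mean 1/2 and the observed frequency 7/10, which is the usual effect of a prior acting like extra pseudo-counts.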
Multinomial distribution
The multinomial distribution is a generalization of the binomial distribution.
Instead of two outcomes (success/failure), the multinomial distribution considers k possible outcomes per trial (e.g., a k-sided die).
A single trial (the case n=1) follows a categorical distribution.
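The notes do not write out the multinomial pmf, so the sketch below assumes the standard form: for counts x_1, ..., x_k summing to n and outcome probabilities p_1, ..., p_k, the probability is n!/(x_1! ... x_k!) times the product of p_i^{x_i}. The function name `multinomial_pmf` is illustrative.

```python
from math import factorial, prod


def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """Probability of seeing counts[i] occurrences of outcome i in sum(counts) trials."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # exact integer multinomial coefficient
    return coeff * prod(p**c for p, c in zip(probs, counts))


# With two outcomes this reduces to the binomial case (5 heads in 10 fair flips).
print(multinomial_pmf([5, 5], [0.5, 0.5]))

# Six rolls of a fair die, each face appearing exactly once: 6!/6^6.
print(multinomial_pmf([1] * 6, [1 / 6] * 6))
```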
Poisson distribution
Discrete probability distribution
Models the number of events occurring in a fixed interval of time/space
Given the average number of events per interval λ,
the probability of observing k events in a given interval is:
$$P(X=k)=\frac{\lambda^{k}e^{-\lambda}}{k!}$$
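The Poisson formula can be evaluated directly. This is a minimal sketch; `poisson_pmf` and the rate of 3 events per interval are illustrative choices.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing k events when the average per interval is lam."""
    return lam**k * exp(-lam) / factorial(k)


# Average of 3 events per interval.
lam = 3.0
print(poisson_pmf(0, lam))                          # probability of no events: exp(-3)
print(sum(poisson_pmf(k, lam) for k in range(51)))  # close to 1; the tail beyond k=50 is negligible
```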
Continuous
Gaussian distribution
Definition of Gaussian distribution (for scalar x∈R):
$$p(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$
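A small sketch of the scalar density above, using only the standard library. The name `gaussian_pdf` and the integration grid are assumptions made for illustration.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a scalar Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# Peak of the standard normal: 1 / sqrt(2 * pi).
print(gaussian_pdf(0.0, 0.0, 1.0))

# Crude Riemann sum over [-8, 8]; the density should integrate to about 1.
step = 0.01
area = sum(gaussian_pdf(-8.0 + i * step, 0.0, 1.0) * step for i in range(1601))
print(area)
```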
Gaussian noise model
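In the context of (scalar) prediction, the Gaussian noise model is
$$y=f(x;\theta)+\nu,\qquad \nu\sim\mathcal{N}(0,\sigma^{2})$$
$$p(y\mid x;\theta,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y-f(x;\theta))^{2}}{2\sigma^{2}}\right)$$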
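The noise model can be simulated to see what it implies. The sketch below assumes a hypothetical linear predictor f(x; θ) = θ₀x + θ₁, which the notes do not specify; the function names, parameter values, and random seed are likewise illustrative.

```python
import random
from math import log, pi


def f(x: float, theta: tuple[float, float]) -> float:
    """Hypothetical predictor: a line with slope theta[0] and intercept theta[1]."""
    return theta[0] * x + theta[1]


def sample_y(x: float, theta, sigma: float, rng: random.Random) -> float:
    """Draw y = f(x; theta) + nu with nu ~ N(0, sigma^2)."""
    return f(x, theta) + rng.gauss(0.0, sigma)


def log_density(y: float, x: float, theta, sigma: float) -> float:
    """log p(y | x; theta, sigma) under the Gaussian noise model."""
    resid = y - f(x, theta)
    return -log(sigma) - 0.5 * log(2 * pi) - resid**2 / (2 * sigma**2)


rng = random.Random(0)  # fixed seed so the run is repeatable
theta, sigma = (2.0, -1.0), 0.5
xs = [i / 10 for i in range(1000)]
ys = [sample_y(x, theta, sigma, rng) for x in xs]

# Residuals y - f(x; theta) are draws of nu, so their mean should be near 0.
residuals = [y - f(x, theta) for x, y in zip(xs, ys)]
mean_resid = sum(residuals) / len(residuals)
print(mean_resid)
print(log_density(ys[0], xs[0], theta, sigma))
```

Note that the log-density depends on the data only through the squared residual, which is the term that matters when the parameters θ are fit.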
Let's think about maximizing the log-likelihood of the data $\mathcal{D}=\{(x_i,y_i)\}_{i=1}^{N}$. Assuming the data is i.i.d.,