NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Paper: https://arxiv.org/abs/2003.08934
Input / Output
- Input: 3D location $(x, y, z)$ and viewing direction $(\theta, \phi)$
- Output: emitted color $(r, g, b)$ and volume density $\sigma$ (much like opacity)
  - $\sigma$ only depends on the location $(x, y, z)$, to have consistency across viewpoints
Rendering (Basics)
Rendering is straightforward: (classic) volume rendering

$$\hat{C}(\mathbf{r}) = \int_{t_\text{near}}^{t_\text{far}} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt$$

- $\sigma(\mathbf{r}(t))$: volume density at position $\mathbf{r}(t)$
- $\mathbf{c}(\mathbf{r}(t), \mathbf{d})$: the color at position $\mathbf{r}(t)$ viewed from direction $\mathbf{d}$
- $T(t)$: the probability that the ray travels from $t_\text{near}$ to $t$: $T(t) = \exp\left(-\int_{t_\text{near}}^{t} \sigma(\mathbf{r}(s))\, ds\right)$
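As a sanity check, the rendering integral can be approximated numerically. A minimal NumPy sketch along a single ray, using a made-up one-dimensional medium (constant density and scalar "color" between $t=2$ and $t=4$, empty elsewhere):

```python
import numpy as np

# Hypothetical toy medium along one ray: constant density and color
# between t = 2 and t = 4, empty elsewhere.
def sigma(t):
    return np.where((t > 2.0) & (t < 4.0), 1.5, 0.0)

def color(t):
    return np.where((t > 2.0) & (t < 4.0), 0.8, 0.0)  # scalar "color"

t = np.linspace(0.0, 6.0, 10_000)   # dense quadrature grid
dt = t[1] - t[0]
# T(t) = exp(-integral of sigma from t_near to t), via a cumulative sum
T = np.exp(-np.cumsum(sigma(t)) * dt)
# C = integral of T(t) * sigma(t) * c(t) dt
C = np.sum(T * sigma(t) * color(t)) * dt
# Analytically this is 0.8 * (1 - exp(-1.5 * 2)) ~= 0.76
print(C)
```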
Approximating this integral with a sum over discrete uniform bins would suffer from low resolution.
In practice, they use a hierarchical version of stratified sampling.

- stratified sampling: helps to simulate smoother integration than relying on discrete uniform bins
- hierarchical: helps to allocate more samples to the regions that affect rendering (i.e., avoid sampling a lot from empty space!)
Stratified Sampling
It partitions $[t_\text{near}, t_\text{far}]$ into $N$ evenly-spaced bins and draws one sample uniformly at random from within each bin.
Because the sampled positions vary across optimization iterations, it can simulate sampling from the entire continuous space.
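The stratified scheme can be sketched in a few lines of NumPy (the interval bounds below are made-up example values):

```python
import numpy as np

def stratified_sample(t_near, t_far, n_bins, rng):
    """Split [t_near, t_far] into n_bins even bins and draw one
    uniform sample inside each bin (stratified sampling)."""
    edges = np.linspace(t_near, t_far, n_bins + 1)
    lower, upper = edges[:-1], edges[1:]
    u = rng.random(n_bins)               # one uniform draw per bin
    return lower + u * (upper - lower)   # place each sample in its bin

rng = np.random.default_rng(0)
t = stratified_sample(2.0, 6.0, 8, rng)
# Every bin contains exactly one sample, but positions vary per call,
# so repeated calls cover the whole continuous interval.
```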
Key ideas / components
Naively training the network with the above idea doesn't work. The key ideas are:

- Encouraging the representation to be multiview consistent by
  - restricting the network to predict $\sigma$ as a function of only the location $\mathbf{x}$
  - predicting $\mathbf{c}$ as a function of both the location and the viewing direction
- Positional encoding
  - It's common knowledge that (sinusoidal) positional encoding helps NNs fit high-frequency signals
- Hierarchical sampling procedure
  - Details below
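A minimal sketch of the sinusoidal positional encoding $\gamma(p) = (\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p))$, applied elementwise to each coordinate (the paper uses $L=10$ for locations and $L=4$ for viewing directions):

```python
import numpy as np

def positional_encoding(p, n_freqs):
    """Map each coordinate p to (sin(2^0 pi p), cos(2^0 pi p), ...,
    sin(2^(L-1) pi p), cos(2^(L-1) pi p)) for L = n_freqs."""
    p = np.asarray(p, dtype=np.float64)
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi   # 2^k * pi for k = 0..L-1
    angles = p[..., None] * freqs                 # (..., dim, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)         # flatten per point

x = np.array([0.1, -0.3, 0.7])                   # a 3D location
print(positional_encoding(x, n_freqs=10).shape)  # (60,) = 3 coords * 2 * 10
```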
Minor:
- Viewing direction $(\theta, \phi)$ is concatenated to the feature vector in a middle layer of the network
Training NeRF: Hierarchical sampling
Hierarchical sampling allocates more samples to the region that affects final rendering.
They simultaneously optimize two networks: a coarse one and a fine one.

1. Sample a set of $N_c$ locations $r_1 \ldots r_{N_c}$ along the ray $\mathbf{r}$ using stratified sampling


2. Evaluate the coarse network at these locations:
   $r_i \rightarrow \text{NeRF Network (Coarse)} \rightarrow \{c_i, \sigma_i\}$


3. Compute coarse rendering based on the samples:
   $\hat{C}_\text{coarse}(\mathbf{r}) = \sum_{i=1}^{N_c} w_i c_i, \quad w_i = T_i \cdot (1 - \exp(-\sigma_i \delta_i))$
   - $\delta_i$: the distance between adjacent samples
   - $T_i$: the probability that the ray reaches point $i$
   - $\sigma_i$: volume density (i.e., opacity)
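The weight computation in this step can be vectorized; a minimal NumPy sketch with made-up densities and colors (note that $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$ equals the running product of $1 - \alpha_j$, since $1 - \alpha_j = \exp(-\sigma_j \delta_j)$):

```python
import numpy as np

def render_weights(sigmas, deltas):
    """w_i = T_i * (1 - exp(-sigma_i * delta_i)), where
    T_i = exp(-sum_{j<i} sigma_j * delta_j) is the transmittance."""
    alphas = 1.0 - np.exp(-sigmas * deltas)       # per-sample opacity
    # Transmittance: product of (1 - alpha_j) over preceding samples
    T = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return T * alphas

sigmas = np.array([0.0, 0.5, 2.0, 0.1])           # made-up densities
deltas = np.full(4, 0.25)                         # even spacing
w = render_weights(sigmas, deltas)
c_hat = np.sum(w * np.array([0.1, 0.6, 0.9, 0.2]))  # made-up colors
```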


4. Normalize the above weights to form a piecewise-constant PDF along the ray, and sample a second set of $N_f$ locations $r'_1 \ldots r'_{N_f}$ from this distribution
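The normalize-and-resample step can be sketched with inverse-transform sampling from the piecewise-constant PDF (the bin edges and weights below are made-up example values):

```python
import numpy as np

def sample_pdf(bin_edges, weights, n_samples, rng):
    """Draw samples from the piecewise-constant PDF whose (unnormalized)
    mass in bin i is weights[i], via inverse-transform sampling."""
    pdf = weights / np.sum(weights)                  # normalize
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])    # CDF at bin edges
    u = rng.random(n_samples)                        # uniform draws
    idx = np.searchsorted(cdf, u, side="right") - 1  # bin each u falls into
    idx = np.clip(idx, 0, len(weights) - 1)
    # Linearly place each sample inside its bin according to u
    frac = (u - cdf[idx]) / np.maximum(pdf[idx], 1e-12)
    return bin_edges[idx] + frac * (bin_edges[idx + 1] - bin_edges[idx])

rng = np.random.default_rng(0)
edges = np.linspace(2.0, 6.0, 5)          # 4 coarse bins
w = np.array([0.05, 0.05, 0.8, 0.1])      # high weight in bin [4, 5]
t_fine = sample_pdf(edges, w, 64, rng)
# Most fine samples land in [4.0, 5.0], the high-weight bin.
```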


5. Evaluate the fine network at all $N_c + N_f$ locations:
   $r_i \rightarrow \text{NeRF Network (Fine)} \rightarrow \{c_i, \sigma_i\}$
   $r'_i \rightarrow \text{NeRF Network (Fine)} \rightarrow \{c'_i, \sigma'_i\}$


6. Compute the final rendered color using all $N_c + N_f$ samples:
   $\hat{C}_\text{fine}(\mathbf{r}) = \sum_{i=1}^{N_c} w_i c_i + \sum_{i=1}^{N_f} w'_i c'_i$
   - Notice that the second set of $N_f$ samples is biased towards regions with higher $\sigma$
