The largest datasets for many specific problems (e.g., object shape/normal, pose understanding) often contain fewer than 100k samples, i.e., they are $5 \times 10^4$ times smaller than LAION-5B. → Need to deal with overfitting

ControlNet clones the weights of a network block into a "trainable copy" and a "locked copy". The two copies are connected by zero convolutions (i.e., 1×1 convolution layers whose weights and biases are initialized to zero). → Intuition: at the start of training, the zero convolutions keep the trainable branch from adding harmful noise to the deep features → Training is much faster than training from scratch
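The wiring can be illustrated with a small sketch. This is a toy in pure Python, not the actual ControlNet code: a 1×1 convolution at a single pixel is modeled as a linear map over channels, and every name (`block`, `make_zero_conv`, `controlnet_block`, the toy weights) is a hypothetical chosen for illustration. It assumes the form $y_c = F(x; \Theta) + Z(F(x + Z(c); \Theta_c))$, where $F$ is the block, $Z$ a zero convolution, and $c$ the conditioning input.

```python
import copy

def linear(weights, bias, x):
    """Per-pixel linear map over channels: a stand-in for a 1x1 convolution."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def block(params, x):
    """Toy network block: linear map followed by ReLU."""
    return [max(0.0, v) for v in linear(params["w"], params["b"], x)]

def make_zero_conv(channels):
    """Zero convolution: weights AND bias start at zero."""
    return {"w": [[0.0] * channels for _ in range(channels)],
            "b": [0.0] * channels}

def zero_conv(params, x):
    return linear(params["w"], params["b"], x)

# Toy "pretrained" weights (made-up numbers, for illustration only).
pretrained = {"w": [[0.5, -1.0], [2.0, 0.25]], "b": [0.1, -0.2]}

locked = pretrained                     # locked copy: never updated
trainable = copy.deepcopy(pretrained)   # trainable copy: same starting weights
z_in = make_zero_conv(2)                # zero conv applied to the condition c
z_out = make_zero_conv(2)               # zero conv applied to the trainable output

def controlnet_block(x, c):
    """Locked output plus the zero-conv-gated output of the trainable copy."""
    h = [xi + zi for xi, zi in zip(x, zero_conv(z_in, c))]
    residual = zero_conv(z_out, block(trainable, h))
    return [a + r for a, r in zip(block(locked, x), residual)]

x = [1.0, 2.0]    # input features at one pixel
c = [3.0, -4.0]   # conditioning features at the same pixel

y_locked = block(locked, x)
y_control = controlnet_block(x, c)
print(y_control == y_locked)  # the added branch contributes nothing at initialization
```

At initialization the combined block reproduces the locked block exactly, so training starts from the pretrained behavior. The zero convolutions are still trainable: the gradient with respect to their weights depends on their (non-zero) input, so they move away from zero after the first update steps.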