Rough Summary: DNC
- This paper introduces the Differentiable Neural Computer (DNC)
- Conventional neural networks struggle with tasks that require complex data structures because they lack external memory
- The DNC is fully differentiable and can use an external memory matrix, much like the random-access memory in a conventional computer
- The DNC learns to solve tasks not by being explicitly programmed but by being trained, just like other machine learning models
- Because the DNC can read from and write to external memory, it can solve complex, structured tasks such as moving-block puzzles, finding shortest paths, and inferring missing links in a graph
Existing neural networks cannot dynamically allocate new memory when the memory demands of a task grow (e.g., the size of an LSTM's hidden state vector is fixed)
Model: Differentiable Neural Computer (DNC)
First, let's describe the model at a high level.
The model can be decomposed into four components:
- Controller
- Read and write heads
- Memory matrix
- Memory usage and temporal links
As the structure suggests, this is conceptually similar to a Turing machine, which has a controller that moves its tape (memory) left or right, and heads that read from and write to the tape.
In this paper, the controller is a simple variant of an LSTM that emits what the authors call the interface vector.
This vector determines where and what to read and write in memory.
Below, I'll describe at a high level how the DNC writes to and reads from memory using the interface vector.
--- Writing ---
First, the write key (bottom of the green box) is used to look up memory locations whose content is similar to the key.
The model then checks whether those locations are already in use, via the memory usage (d in the figure), to decide where to write (green arrow).
Where to write is expressed as a write weighting.
The equation for the "erase and overwrite" step shown in the paper is pretty straightforward:
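For reference, the erase-and-overwrite update from the paper, where $M_t$ is the memory matrix, $\mathbf{w}^w_t$ the write weighting, $\mathbf{e}_t$ the erase vector, $\mathbf{v}_t$ the write vector, $E$ a matrix of ones, and $\circ$ elementwise multiplication:

$$
M_t = M_{t-1} \circ \left(E - \mathbf{w}^w_t \mathbf{e}_t^{\top}\right) + \mathbf{w}^w_t \mathbf{v}_t^{\top}
$$

Intuitively, each memory row is first partially erased in proportion to how strongly the write weighting attends to it, then the write vector is added there with the same weighting.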
--- Reading ---
As in writing, the model first looks up the memory location whose content is similar to the read key.
Then, instead of simply reading from that location, the model performs a read operation determined by the read mode.
There are three read modes: backward, content, and forward.
In content mode, the model simply reads from the location looked up by the read key.
In forward or backward mode, it ignores the key and jumps to the memory location that was edited after (or before) the current location was edited.
These modes often let the model read from memory in the correct order (reading in the order things were written makes sense in many cases, as with a stack or a queue).
The order in which memory locations were written is retained in the temporal links shown in d.
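A minimal sketch of how a temporal link matrix supports forward and backward reads. The toy numbers are mine, and this skips the paper's full link-matrix update rule; the key idea is that `L[i, j]` close to 1 means location `i` was written right after location `j`:

```python
import numpy as np

# Suppose locations 0 -> 1 -> 2 were written in that order.
L = np.zeros((3, 3))
L[1, 0] = 1.0  # location 1 written right after location 0
L[2, 1] = 1.0  # location 2 written right after location 1

w_prev = np.array([0.0, 1.0, 0.0])  # last read focused on location 1

forward = L @ w_prev     # shifts focus to the location written next
backward = L.T @ w_prev  # shifts focus to the location written before

print(forward.argmax(), backward.argmax())  # 2 0
```

Multiplying the previous read weighting by `L` (or its transpose) is what lets forward/backward mode follow the write order without consulting the key.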
Again, following the process described above, the model calculates the read weightings.
The equation for this is even simpler.
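For reference, the read equation from the paper: the $i$-th read vector is just the memory matrix weighted by its read weighting,

$$
\mathbf{r}^i_t = M_t^{\top} \mathbf{w}^{r,i}_t
$$

i.e., a weighted sum of the memory rows.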
Having multiple read weightings corresponds to having multiple read heads in a Turing machine. The model reads from memory with these weightings to produce read vectors, which are fed back to the controller at the next time step.
A bit more detail
We haven't dug into how the write weighting is actually calculated.
Rather than fully describing how those vectors are computed, I will just explain one of the important concepts used in the paper.
We have seen that, in both reading and writing, the model looks up memory locations whose content is similar to a key.
Content-based addressing is exactly this lookup operation.
Now, suppose we have a key vector.
The equation shown in the paper is:
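The lookup is a softmax over cosine similarities between the key and each memory row, scaled by a sharpness parameter β. A minimal NumPy sketch (the function name and toy memory contents are mine):

```python
import numpy as np

def content_addressing(M, k, beta):
    """Sketch of content-based addressing C(M, k, beta):
    softmax over beta-scaled cosine similarities between
    the key k and every row of the memory matrix M."""
    # Cosine similarity between k and each memory row.
    sims = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
    scores = beta * sims
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])
k = np.array([1.0, 0.0])

w = content_addressing(M, k, beta=10.0)
print(w.argmax())  # 0: the row most similar to the key wins
```

Larger β sharpens the distribution toward the single best-matching location; smaller β spreads the read or write over several similar locations.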