Recurrent Neural Network

Recurrent neural networks (RNNs) are a class of artificial neural networks widely used for sequence learning. A basic deep neural network (DNN) takes an input and produces an output. In image recognition, for example, the network only learns the relationship between an input image and its corresponding label. A basic DNN keeps no memory of earlier inputs: it processes each input independently and just learns the mapping from input to output.

The RNN overcomes this limitation of the basic DNN by taking the "history" as an additional input. The "history" is produced at the previous time step, and at the current time step the RNN updates it for use at the next step. This memory is kept as an internal state called the "hidden state". The hidden state vector is used and updated repeatedly as we feed more input into the network.

[Figure: an RNN cell]
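To make the hidden state concrete, here is a minimal sketch of a single-unit RNN cell in plain Java (no framework). The update rule h_t = tanh(wX * x_t + wH * h_{t-1}), the weight values and the class name are illustrative assumptions, not taken from any particular library. Feeding the same input twice gives different outputs, because the second step also sees the history left by the first.

```java
/** A hypothetical one-unit RNN cell: the hidden state h is the "history". */
class TinyRnnCell {
    private final double wX;   // weight on the current input (illustrative value)
    private final double wH;   // weight on the previous hidden state (illustrative value)
    private double h = 0.0;    // hidden state, starts as an empty history (zero)

    TinyRnnCell(double wX, double wH) {
        this.wX = wX;
        this.wH = wH;
    }

    /** Consumes one input, updates the hidden state, and returns it as the output. */
    double step(double x) {
        h = Math.tanh(wX * x + wH * h);
        return h;
    }

    public static void main(String[] args) {
        TinyRnnCell cell = new TinyRnnCell(0.5, 0.8);
        double first = cell.step(1.0);   // h_1 = tanh(0.5), no history yet
        double second = cell.step(1.0);  // same input, but h_1 now feeds in
        System.out.println(first + " vs " + second);
    }
}
```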

By unfolding the RNN layer over time, we see a sequence of hidden states passed from cell to cell within the layer. In implementation, however, we only store one set of state (the stateMap) per layer. The stateMap is read and updated during the forward pass.
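The unfolded picture and the single stored state compute the same thing. The sketch below assumes a one-unit cell h_t = tanh(0.5 * x_t + 0.8 * h_{t-1}) with made-up weights; the class and method names are hypothetical. `unrolled` keeps every hidden state in a list (the unfolded view), while `rolled` overwrites one variable that stands in for the per-layer stateMap. Both end in the same final state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical comparison of the unfolded view and the single-state implementation. */
class UnfoldDemo {
    static final double W_X = 0.5, W_H = 0.8; // illustrative weights

    /** Unfolded view: h_0 = 0 followed by one hidden state per time step (h_1..h_T). */
    static List<Double> unrolled(double[] xs) {
        List<Double> hs = new ArrayList<>();
        hs.add(0.0); // h_0
        for (double x : xs) {
            double prev = hs.get(hs.size() - 1);
            hs.add(Math.tanh(W_X * x + W_H * prev));
        }
        return hs;
    }

    /** Implementation view: a single state variable, read and overwritten at every step. */
    static double rolled(double[] xs) {
        double state = 0.0; // plays the role of the layer's stored state
        for (double x : xs) {
            state = Math.tanh(W_X * x + W_H * state);
        }
        return state;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, -0.5, 2.0};
        List<Double> hs = unrolled(xs);
        System.out.println("unrolled final: " + hs.get(hs.size() - 1));
        System.out.println("rolled final:   " + rolled(xs));
    }
}
```

In general, keeping every intermediate state is what training with backpropagation through time needs, while step-by-step inference only has to keep the latest state.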


Because of the state-update mechanism, a sequence of inputs must be fed into the RNN layer one element at a time. Given a sequence of inputs, we therefore get a sequence of outputs. When to use the outputs and when to run backpropagation depends on the specific use case. Here we only discuss how to configure a many-to-many RNN layer, which is actually a one-to-one RNN applied over multiple iterations.

[Figure: a many-to-many RNN]
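A minimal sketch of the many-to-many loop, in plain Java with a one-unit tanh cell and made-up weights (the class and method names are hypothetical): each input produces one output, the layer keeps a single state between calls, and clearing the state starts the next, unrelated sequence from an empty history. DL4J offers stateful calls for this on `MultiLayerNetwork` (`rnnTimeStep` and `rnnClearPreviousState`), but this sketch does not use that API.

```java
import java.util.Arrays;

/** Hypothetical many-to-many RNN: one output per input, using a single stored state. */
class ManyToManyDemo {
    private final double wX, wH; // illustrative weights
    private double state = 0.0;  // the only state kept by this "layer"

    ManyToManyDemo(double wX, double wH) { this.wX = wX; this.wH = wH; }

    /** One time step: read the old state, write the new one, emit it as the output. */
    double feed(double x) {
        state = Math.tanh(wX * x + wH * state);
        return state;
    }

    /** Forget the history before processing an unrelated sequence. */
    void clearState() { state = 0.0; }

    /** Feeds a whole sequence element by element and returns the output sequence. */
    double[] run(double[] xs) {
        double[] ys = new double[xs.length];
        for (int t = 0; t < xs.length; t++) {
            ys[t] = feed(xs[t]);
        }
        return ys;
    }

    public static void main(String[] args) {
        ManyToManyDemo rnn = new ManyToManyDemo(0.5, 0.8);
        double[] xs = {1.0, 0.0, -1.0};
        double[] ys = rnn.run(xs);
        System.out.println("many-to-many outputs: " + Arrays.toString(ys));
        // A many-to-one use case would keep only the last output.
        System.out.println("many-to-one output:   " + ys[ys.length - 1]);
        rnn.clearState(); // start the next sequence from an empty history
        System.out.println("after reset:          " + Arrays.toString(rnn.run(xs)));
    }
}
```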

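Given a sequence of inputs \((x_1, …, x_T)\), a standard RNN computes a sequence of outputs \((y_1, …, y_T)\) by iterating the following equations, where \(h_t\) is the hidden state and \(\sigma\) is the logistic sigmoid:

\[ h_t = \sigma(W^{hx} x_t + W^{hh} h_{t-1}) \]
\[ y_t = W^{yh} h_t \]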

The input and output of an RNN layer are vector sequences rather than sequences of single values. The size of the input vector and the size of the output vector do not have to be equal. Typically, when we create a new LSTM layer (a widely used RNN variant), we specify the size of the input vector and the size of the output vector.

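    // dl4j example (org.deeplearning4j.nn.conf.layers.LSTM); sizes are examples:
    // nIn = size of each input vector, nOut = size of each output vector
    LSTM lstm = new LSTM.Builder()
            .nIn(10).nOut(20).build();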
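The sizes can be made explicit with a small vector RNN written with plain arrays (no DL4J). This is a sketch of a simple, non-LSTM RNN under assumed, arbitrary sizes (input 3, hidden 2, output 1) and made-up weights; the class name is hypothetical. It follows the standard recurrence in which the hidden state is the logistic sigmoid of W^{hx} x_t + W^{hh} h_{t-1} and the output is W^{yh} h_t.

```java
import java.util.Arrays;

/** Hypothetical vector RNN: input size 3, hidden size 2, output size 1. */
class VectorRnn {
    // Illustrative weights; shape is [target size][source size].
    static final double[][] W_HX = {{0.2, -0.1, 0.4}, {0.0, 0.3, -0.2}}; // 2 x 3
    static final double[][] W_HH = {{0.5, 0.1}, {-0.3, 0.6}};            // 2 x 2
    static final double[][] W_YH = {{1.0, -1.0}};                         // 1 x 2

    /** Matrix-vector product m * v. */
    static double[] matVec(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** Runs the recurrence over a sequence of 3-dim inputs and returns the 1-dim outputs. */
    static double[][] forward(double[][] xs) {
        double[] h = new double[W_HH.length]; // h_0 = 0
        double[][] ys = new double[xs.length][];
        for (int t = 0; t < xs.length; t++) {
            double[] a = matVec(W_HX, xs[t]);   // W^{hx} x_t
            double[] b = matVec(W_HH, h);       // W^{hh} h_{t-1}
            double[] next = new double[h.length];
            for (int i = 0; i < h.length; i++) {
                next[i] = sigmoid(a[i] + b[i]); // h_t
            }
            h = next;
            ys[t] = matVec(W_YH, h);            // y_t = W^{yh} h_t
        }
        return ys;
    }

    public static void main(String[] args) {
        double[][] xs = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {1, 1, 0}};
        double[][] ys = forward(xs);
        System.out.println("T = " + ys.length + ", output size = " + ys[0].length);
        System.out.println(Arrays.deepToString(ys));
    }
}
```

Only the weight-matrix shapes tie the three sizes together, which is why the input and output sizes are free to differ.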

Usually, we convert the input sequence to a vector sequence using a one-hot representation. What about the length of the input sequence? The length of the input is reflected in the number of iterations (time steps) we loop through the RNN layer.
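A minimal one-hot encoding sketch: each character of a word becomes a vector with a single 1 at that character's index in a vocabulary, and the number of vectors, which equals the word's length, is the number of iterations the RNN runs. The vocabulary string and the class name are illustrative assumptions.

```java
import java.util.Arrays;

/** Hypothetical one-hot encoder for character sequences. */
class OneHot {
    /** Returns one vector per character; each has a single 1 at that character's vocabulary index. */
    static double[][] encode(String text, String vocab) {
        double[][] seq = new double[text.length()][vocab.length()];
        for (int t = 0; t < text.length(); t++) {
            int idx = vocab.indexOf(text.charAt(t));
            if (idx < 0) {
                throw new IllegalArgumentException("character not in vocabulary: " + text.charAt(t));
            }
            seq[t][idx] = 1.0;
        }
        return seq;
    }

    public static void main(String[] args) {
        double[][] seq = encode("hello", "ehlo");
        // 5 time steps, each a 4-dim vector, so the RNN is iterated 5 times
        for (double[] v : seq) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```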

For a dataset whose input sequences have different lengths, each sequence needs a different number of iterations. However, most deep learning frameworks require a batch of input sequences to be aligned in a single matrix (tensor), in other words, to have equal length. To solve this, we pad the shorter sequences with leading or trailing zeros and use an RNN mask to mark which time steps hold real data and which are padding. Refer to RNN masking for more details.
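A sketch of trailing-zero padding plus a mask: each sequence of feature vectors is copied into a fixed-length batch, and a mask records 1 for real time steps and 0 for padding. The `[batch][time][feature]` layout and the class name are assumptions for illustration; frameworks differ in dimension order, so check the RNN masking documentation of the one you use.

```java
import java.util.Arrays;

/** Hypothetical trailing-zero padding with a per-step mask. */
class PadAndMask {
    final double[][][] padded; // [batch][time][feature], zeros after each sequence ends
    final double[][] mask;     // [batch][time], 1 = real step, 0 = padding

    PadAndMask(double[][][] seqs, int featureSize) {
        int maxLen = 0;
        for (double[][] s : seqs) {
            maxLen = Math.max(maxLen, s.length);
        }
        padded = new double[seqs.length][maxLen][featureSize]; // Java zero-initializes arrays
        mask = new double[seqs.length][maxLen];
        for (int b = 0; b < seqs.length; b++) {
            for (int t = 0; t < seqs[b].length; t++) {
                padded[b][t] = Arrays.copyOf(seqs[b][t], featureSize);
                mask[b][t] = 1.0;
            }
        }
    }

    public static void main(String[] args) {
        double[][][] seqs = {
            {{1, 0}, {0, 1}, {1, 1}}, // length 3
            {{0, 1}}                  // length 1, padded with 2 zero steps
        };
        PadAndMask batch = new PadAndMask(seqs, 2);
        System.out.println(Arrays.deepToString(batch.padded));
        System.out.println(Arrays.deepToString(batch.mask));
    }
}
```

During training, the per-step loss would then be multiplied by the mask so that padded steps contribute nothing.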