In this article, "RNN" is used in the general sense and covers LSTM, GRU, and similar recurrent networks.

The default position of the batch size dimension is different in CNNs and RNNs.

  • In a CNN, the batch size is at dimension 0.
  • In an RNN, the batch size is at dimension 1 (by default).
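
As a quick illustration of the two layouts, the sketch below feeds one batch to a convolution layer and one to an RNN. The layer sizes and tensor shapes are arbitrary values chosen for this example and do not come from the article.

```python
import torch
from torch import nn

batch_size = 4  # arbitrary value for illustration

# CNN: input is (batch, channels, height, width), so batch is dimension 0
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
images = torch.randn(batch_size, 3, 32, 32)
conv_out = conv(images)
print(conv_out.shape)  # torch.Size([4, 8, 30, 30])

# RNN with the default batch_first=False: input is (seq_len, batch, feature),
# so batch is dimension 1
rnn = nn.RNN(input_size=16, hidden_size=32)
sequences = torch.randn(10, batch_size, 16)
rnn_out, _ = rnn(sequences)
print(rnn_out.shape)  # torch.Size([10, 4, 32])
```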

Input data format for an RNN:

The simplest RNN can be called in two ways. torch.nn.RNNCell() accepts only a single time step of the sequence as input, and the hidden state has to be passed explicitly from one step to the next (if it is omitted, it defaults to zeros). torch.nn.RNN() accepts a whole sequence as input. By default it starts from an all-zero hidden state, but you can also supply your own initial hidden state.
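
The difference between the two calls can be sketched as follows. This is a minimal example, not the article's own code: the sizes are arbitrary, and the weights of a single-layer nn.RNN are copied into an nn.RNNCell only so that the two results can be compared.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_len, batch_size, input_dim, hidden_dim = 6, 3, 5, 10  # illustrative sizes

rnn = nn.RNN(input_dim, hidden_dim)       # processes a whole sequence per call
cell = nn.RNNCell(input_dim, hidden_dim)  # processes one time step per call

# Copy the single-layer RNN weights into the cell so both compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(seq_len, batch_size, input_dim)

# nn.RNN: one call, the initial hidden state defaults to zeros
out, h_n = rnn(x)

# nn.RNNCell: loop over the time steps and carry the hidden state by hand
h = torch.zeros(batch_size, hidden_dim)
step_outputs = []
for t in range(seq_len):
    h = cell(x[t], h)
    step_outputs.append(h)
cell_out = torch.stack(step_outputs)  # (seq_len, batch_size, hidden_dim)

# Both routes should agree up to floating-point tolerance
print(torch.allclose(out, cell_out, atol=1e-5))
```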

  1. The input is a three-dimensional tensor of shape [seq_len, batch_size, input_dim]
  • input_dim is the dimension of each input vector, for example 128.
  • batch_size is the number of sentences fed to the RNN in one pass, for example 5.
  • seq_len is the maximum sentence length, for example 15.
    Note that the RNN takes whole sequences as input: all sentences in the batch are fed in at once, and the returned output and hidden contain the outputs and hidden states for the entire batch. Both are also three-dimensional.
    In effect there are batch_size independent copies of the RNN running in parallel with shared weights. The input dimension of the RNN is input_dim, and there are seq_len time steps in total, so at each time step the input to the whole RNN module has shape [batch_size, input_dim].
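
The claim that the sentences in a batch are processed independently can be checked with a small sketch. The input sizes follow the example values above; hidden_dim = 64 and the choice of sentence index 2 are assumptions made only for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_len, batch_size, input_dim = 15, 5, 128
hidden_dim = 64  # assumed value, not given in the text

rnn = nn.RNN(input_dim, hidden_dim)
x = torch.randn(seq_len, batch_size, input_dim)

# At each time step the module receives a [batch_size, input_dim] slice
print(x[0].shape)  # torch.Size([5, 128])

# Run the whole batch, then run sentence 2 alone as a batch of size 1
out_batch, _ = rnn(x)
out_single, _ = rnn(x[:, 2:3, :])

# The sentence gets the same outputs either way (up to floating-point
# tolerance), because batch entries do not interact
same = torch.allclose(out_batch[:, 2:3, :], out_single, atol=1e-4)
print(same)
```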
import torch
from torch import nn

# Build the RNN: input dimension 5, hidden dimension 10, 2 layers
rnn_seq = nn.RNN(5, 10, 2)
# Build an input sequence: sentence length 6, batch size 3, each word represented by a vector of length 5
x = torch.randn(6, 3, 5)
# out, ht = rnn_seq(x, h0)
out, ht = rnn_seq(x)  # h0 is optional; it defaults to zeros

Question 1: What are the sizes of out and ht here?
Answer: out is 6 * 3 * 10 and ht is 2 * 3 * 10. out has shape [seq_len, batch_size, output_dim], and ht has shape [num_layers * num_directions, batch, hidden_size]. For a unidirectional, single-layer RNN, each sentence has only one hidden state.
Question 2: Are out[-1] and ht[-1] equal?
Answer: Yes, for a unidirectional RNN. out[-1] is the output at the last time step and ht[-1] is the final hidden state of the last layer. Each output is simply the last layer's hidden state at that time step, so the two are the same.
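
Both answers can be checked directly. The sketch below rebuilds the same network and input as in the example above and prints the shapes and the comparison; only the use of torch.allclose for the comparison is an addition.

```python
import torch
from torch import nn

rnn_seq = nn.RNN(5, 10, 2)  # input dim 5, hidden dim 10, 2 layers
x = torch.randn(6, 3, 5)    # seq_len 6, batch 3, input dim 5
out, ht = rnn_seq(x)

print(out.shape)  # torch.Size([6, 3, 10])
print(ht.shape)   # torch.Size([2, 3, 10])

# Output at the last time step vs. final hidden state of the top layer
last_step_matches = torch.allclose(out[-1], ht[-1])
print(last_step_matches)  # True
```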

  2. Other parameters of RNN
RNN(input_dim, hidden_dim, num_layers, …)
– input_dim is the feature dimension of the input (PyTorch names this argument input_size)
– hidden_dim is the feature dimension of the hidden state (hidden_size in PyTorch); unless something else changes it, such as bidirectional=True, it is also the feature dimension of out
– num_layers is the number of stacked layers in the network
– nonlinearity is the nonlinear activation function to use; the default is 'tanh'
– bias indicates whether bias terms are used; the default is True
– batch_first sets the layout of the input data; the default is False, which means (seq, batch, feature): the sequence length comes first and the batch comes second
– dropout is the dropout probability applied to the output of each layer except the last; the default is 0
– bidirectional indicates whether to use a bidirectional RNN; the default is False
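
To make the effect of batch_first and bidirectional concrete, here is a small sketch that reuses the sizes from the earlier example (input dimension 5, hidden dimension 10, 2 layers). The variable names are chosen for this illustration only.

```python
import torch
from torch import nn

# batch_first=True: input and output are laid out as (batch, seq, feature)
rnn_bf = nn.RNN(input_size=5, hidden_size=10, num_layers=2, batch_first=True)
x_bf = torch.randn(3, 6, 5)  # batch 3, seq_len 6, feature 5
out_bf, h_bf = rnn_bf(x_bf)
print(out_bf.shape)  # torch.Size([3, 6, 10])
print(h_bf.shape)    # torch.Size([2, 3, 10]), the hidden state keeps batch at dimension 1

# bidirectional=True: the output feature size doubles, and the hidden state
# has one entry per layer and direction
rnn_bi = nn.RNN(input_size=5, hidden_size=10, num_layers=2, bidirectional=True)
x = torch.randn(6, 3, 5)  # seq_len 6, batch 3, feature 5
out_bi, h_bi = rnn_bi(x)
print(out_bi.shape)  # torch.Size([6, 3, 20])
print(h_bi.shape)    # torch.Size([4, 3, 10])
```

Note that batch_first changes only the layout of the input and output tensors; the hidden state is still [num_layers * num_directions, batch, hidden_size].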

LSTM returns one more output: the memory cell state.

# Input dimension 50, hidden dimension 100, two layers
lstm_seq = nn.LSTM(50, 100, num_layers=2)
# Input sequence: seq_len = 10, batch = 3, input dimension = 50
lstm_input = torch.randn(10, 3, 50)
out, (h, c) = lstm_seq(lstm_input)  # uses the default all-zero initial states

Question 1: What are the sizes of out and (h, c)?
Answer: out is (10 * 3 * 100); h and c are both (2 * 3 * 100).
Question 2: Are out[-1,:,:] and h[-1,:,:] equal?
Answer: Yes.
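
The same check for the LSTM, this time passing the initial states explicitly instead of relying on the default. This is a sketch using the sizes from the example above; the names h0 and c0 are introduced here for illustration.

```python
import torch
from torch import nn

lstm_seq = nn.LSTM(50, 100, num_layers=2)
lstm_input = torch.randn(10, 3, 50)

# Explicit initial states, each of shape (num_layers, batch, hidden_dim)
h0 = torch.zeros(2, 3, 100)
c0 = torch.zeros(2, 3, 100)
out, (h, c) = lstm_seq(lstm_input, (h0, c0))

print(out.shape)  # torch.Size([10, 3, 100])
print(h.shape)    # torch.Size([2, 3, 100])
print(c.shape)    # torch.Size([2, 3, 100])

# Output at the last time step vs. final hidden state of the top layer
last_step_matches = torch.allclose(out[-1, :, :], h[-1, :, :])
print(last_step_matches)  # True
```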

GRU is closer to the traditional RNN: it returns only an output and a single hidden state.

gru_seq = nn.GRU(10, 20, 2)  # x_dim, h_dim, layer_num
gru_input = torch.randn(3, 32, 10) # seq,batch,x_dim
out, h = gru_seq(gru_input)
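
For completeness, the shapes returned by the GRU example can be inspected in the same way as for the RNN and LSTM. This sketch repeats the lines above and adds only the printing and the comparison.

```python
import torch
from torch import nn

gru_seq = nn.GRU(10, 20, 2)         # x_dim, h_dim, layer_num
gru_input = torch.randn(3, 32, 10)  # seq, batch, x_dim
out, h = gru_seq(gru_input)

print(out.shape)  # torch.Size([3, 32, 20])
print(h.shape)    # torch.Size([2, 32, 20])

# As with the plain RNN, the last output equals the top layer's final hidden state
last_step_matches = torch.allclose(out[-1], h[-1])
print(last_step_matches)  # True
```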


