Sequence to Sequence Learning with Keras
Papers:
- Sequence to Sequence Learning with Neural Networks
- A Neural Conversational Model
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
Notes:
- The LSTM Encoder encodes an input sequence into a single vector.
- The LSTM Decoder, when given a hidden state and a vector, generates a sequence.
- In the `Seq2seq` model, the output vector of the LSTM Encoder is the input for the LSTM Decoder, and the hidden state of the LSTM Encoder is copied to the hidden state of the LSTM Decoder (a sketch of how these pieces map onto the layer's arguments follows these notes).
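A minimal sketch of how the notes above map onto this library's `Seq2seq` layer. The constructor arguments are the same ones used in the Example section further down; the role attributed to each argument here is an interpretation based on the notes and the argument names, not documented API detail.

```python
from seq2seq.seq2seq import Seq2seq

# Encoder side: reads input_length steps of input_dim-sized vectors and
# compresses them into a single hidden_dim-sized vector.
# Decoder side: takes that vector (plus the copied hidden state) and expands it
# back into output_length steps of output_dim-sized vectors.
seq2seq = Seq2seq(input_length=100, input_dim=200,    # encoder input: 100 steps of size 200
                  hidden_dim=500,                      # size of the encoded vector / hidden state
                  output_length=100, output_dim=200,   # decoder output: 100 steps of size 200
                  batch_size=10, depth=4)              # depth: assumed to stack LSTMs on each side
```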
Continuous vs Discrete sequence pairs:
- When training on continuous sequence pairs, such as long conversations, use the `Conversational` model instead of the `Seq2seq` model, with the argument `context_sensitive=True`. This is important if you want context-sensitive conversational models, so that you can avoid scenarios like the one below (it will only work if there are a lot of exchanges in each conversation in your training data):

      Human: what is your job ?
      Machine: i ’m a lawyer .
      Human: what do you do ?
      Machine: i ’m a doctor

  Source: A Neural Conversational Model
- When `context_sensitive=True`, do not forget to clear the hidden state of the `Conversational` layer after every conversation (not after every exchange), or after a fixed number of batches, using `reset_hidden_state()` during training and testing. You could use the `ResetState` callback for this purpose (a sketch of this reset pattern follows this list).
- You will also have to clear the hidden state of the `Seq2seq` layer after a fixed number of batches when it is used with `remember_state=True`.
- In the case of discrete sequence pairs (e.g., machine translation), use the `Seq2seq` layer with the `remember_state` argument set to `False`.
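A rough sketch of the reset pattern described above. This README names the `Conversational` model, the `context_sensitive` argument, `reset_hidden_state()`, and the `ResetState` callback, but not their exact signatures; the import path, constructor arguments, loss, and dummy data below are assumptions that simply mirror the `Seq2seq` example further down.

```python
import numpy as np
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from seq2seq.seq2seq import Conversational   # assumed import path

vocab_size, maxlen, embedding_dim, hidden_dim = 20000, 100, 200, 500

conv = Conversational(input_length=maxlen, input_dim=embedding_dim,
                      hidden_dim=hidden_dim, output_dim=embedding_dim,
                      output_length=maxlen, batch_size=10, depth=4,
                      context_sensitive=True)   # keep hidden state across exchanges

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(conv)
model.compile(loss='mean_squared_error', optimizer='rmsprop')

# Dummy stand-in for one conversation of 10 padded exchanges
# (word indices in, target word embeddings out):
X_conversation = np.random.randint(0, vocab_size, size=(10, maxlen))
Y_conversation = np.random.random((10, maxlen, embedding_dim))

model.train_on_batch(X_conversation, Y_conversation)
conv.reset_hidden_state()   # clear state once the whole conversation is over
```

The same pattern applies to `Seq2seq` with `remember_state=True`, except that the state only needs clearing after a fixed number of batches rather than per conversation.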
Example:

```python
import keras
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from seq2seq.seq2seq import Seq2seq
from keras.preprocessing import sequence

vocab_size = 20000     # number of words in the vocabulary
maxlen = 100           # length of the input and output sequences
embedding_dim = 200    # word embedding size
hidden_dim = 500       # memory size of the seq2seq model

embedding = Embedding(vocab_size, embedding_dim, input_length=maxlen)
seq2seq = Seq2seq(input_length=maxlen, input_dim=embedding_dim,
                  hidden_dim=hidden_dim, output_dim=embedding_dim,
                  output_length=maxlen, batch_size=10, depth=4)

model = Sequential()
model.add(embedding)
model.add(seq2seq)
```
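To actually train the model built above, you would compile and fit it like any other Keras `Sequential` model. The loss, optimizer, and dummy data below are illustrative choices only (targets are assumed to be sequences of word embeddings, matching `output_dim=embedding_dim`), not something prescribed by this library.

```python
import numpy as np

model.compile(loss='mean_squared_error', optimizer='rmsprop')

# Dummy data: input sentences as word indices, targets as sequences of
# word embeddings (shapes follow the variables defined above).
X = np.random.randint(0, vocab_size, size=(1000, maxlen))
Y = np.random.random((1000, maxlen, embedding_dim))

# batch_size matches the batch_size passed to the Seq2seq layer above.
model.fit(X, Y, batch_size=10, nb_epoch=1)
```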
Installation:

```
sudo pip install git+ssh://github.com/farizrahman4u/seq2seq.git
```
Requirements:
- Keras
Working Example:
- Training Seq2seq with movie subtitles - Thanks to Nicolas Ivanov
