Porting my PyTorch name generator to TensorFlowJS
(originally posted February 3rd, 2019)
I started out by learning how Keras works and looking for the basic LSTM layer, since I'm not quite ready to build my own LSTM from scratch. Installing TensorFlow was more involved than installing PyTorch: I'm using Anaconda, whose base environment had a Python version too new for TensorFlow, so I had to create a new environment with an older Python version that TensorFlow supports. Anaconda is great for creating a separate Python environment for each project, so that wasn't such a big deal. I then looked at the RNN tutorials for Keras on the TensorFlow site and was intimidated by how complicated they were. Luckily, I found a tutorial that showed a simple way to use LSTM layers. I like learning by finding the most basic way to get something working and then building on it one piece at a time, so that was a great tutorial to find. From there I used the Keras reference to learn the rest of what I needed to get a functioning RNN model.
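A basic model in this spirit might look like the sketch below. The layer width and vocabulary size here are my own illustrative assumptions, not values from the project; the fixed sequence length of 15 matches the padded name length described later.

```python
# A minimal character-level LSTM model in Keras, in the spirit of the
# post. Layer width (128) and vocabulary size are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 15   # padded name length (longest name in the training set)
VOCAB = 27     # assumed: 26 lowercase letters plus space for padding

model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, VOCAB)),
    layers.LSTM(128, return_sequences=True),    # one output per time step
    layers.Dense(VOCAB, activation="softmax"),  # next-letter probabilities
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

With `return_sequences=True`, the model predicts a next-letter distribution at every position of the sequence, which is what a character-level generator needs.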
I then needed to figure out how to build a data set for training. Keras works with plain NumPy arrays, so I wrote a function to convert the text file of names into a pair of input and target arrays for training. Keras's training method for models turned out to be flexible: I can pass in the whole data set and then tell it a batch size and whether to shuffle the data each epoch. It even has a setting for splitting the data into training and validation sets, so I didn't need to code any of that myself. I was worried TensorFlow would be harder to learn than PyTorch, but with Keras it's actually more intuitive.
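A sketch of that data-set construction might look like this: each name is padded with spaces to the fixed length, one-hot encoded, and paired with a target that is the same name shifted left by one character. The alphabet and the sample names are stand-ins I've assumed, not the post's actual data.

```python
# Sketch: build next-letter training arrays from a list of names.
# ALPHABET and the sample names are illustrative assumptions.
import numpy as np

SEQ_LEN = 15
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # space doubles as padding
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}

def encode(name):
    """One-hot encode a name padded with spaces to SEQ_LEN."""
    padded = name.lower().ljust(SEQ_LEN)[:SEQ_LEN]
    arr = np.zeros((SEQ_LEN, len(ALPHABET)), dtype=np.float32)
    for i, ch in enumerate(padded):
        arr[i, CHAR_TO_IDX[ch]] = 1.0
    return arr

def build_dataset(names):
    """Inputs are the names; targets are the names shifted one step."""
    x = np.stack([encode(n) for n in names])
    y = np.stack([encode(n[1:]) for n in names])  # next-letter targets
    return x, y

names = ["oluwatamilore", "emma", "liam"]  # stand-in for the names file
x, y = build_dataset(names)
print(x.shape, y.shape)  # (3, 15, 27) (3, 15, 27)
```

Training then reduces to a single call along the lines of `model.fit(x, y, batch_size=32, epochs=50, shuffle=True, validation_split=0.1)`, with the hyperparameters here assumed rather than taken from the project.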
There were a couple of things that were less flexible in Keras than in PyTorch, though. I had to pick a fixed sequence length for training, which meant padding each training name with spaces to the length of the longest name, which was 15 letters. That in turn meant building the LSTM layer with a fixed input length of 15, so it always takes sequences of exactly that length. On the plus side, I was able to train on multiple names at a time, which I wasn't able to do in PyTorch (at least not yet). The other thing the fixed sequence size made harder was generating a name. To generate a name, I need to first feed in the first letter and get the prediction of the second letter, then feed the first two letters and get the prediction of the third letter, and so on. Since I could only input and output sequences of length 15, I start with a first letter padded with spaces and get a full output sequence of length 15, of which I use only the first position. Then I keep feeding in the gradually growing name with fewer and fewer padding spaces, each time using only the one predicted letter I need to append to the name in progress. This is inefficient and kind of a hack, but it works, and the inefficiency isn't a big deal when generating only small numbers of names at a time.
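The loop described above can be sketched roughly as follows. The `encode` helper and alphabet are my assumptions (the same one-hot scheme sketched for training), and the greedy `argmax` pick matches the deterministic generation the next paragraph describes.

```python
# Sketch of fixed-length generation: feed the padded name-so-far through
# the model, keep only the predicted letter at the next position, append
# it, and repeat. encode() and ALPHABET are assumed helpers.
import numpy as np

SEQ_LEN = 15
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}
IDX_TO_CHAR = dict(enumerate(ALPHABET))

def encode(name):
    """One-hot encode a name padded with spaces to SEQ_LEN."""
    padded = name.lower().ljust(SEQ_LEN)[:SEQ_LEN]
    arr = np.zeros((SEQ_LEN, len(ALPHABET)), dtype=np.float32)
    for i, ch in enumerate(padded):
        arr[i, CHAR_TO_IDX[ch]] = 1.0
    return arr

def generate(model, first_letter, max_len=SEQ_LEN):
    name = first_letter
    while len(name) < max_len:
        # Predict over the whole padded sequence every iteration...
        preds = model.predict(encode(name)[np.newaxis], verbose=0)
        # ...but use only the prediction at the next position.
        next_idx = int(np.argmax(preds[0, len(name) - 1]))
        next_char = IDX_TO_CHAR[next_idx]
        if next_char == " ":  # padding character acts as end-of-name
            break
        name += next_char
    return name
```

Re-running the model over the full padded sequence on every step is exactly the inefficiency mentioned above, but for a handful of names it's negligible.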
The Keras model can generate the most likely name given an initial character, but I haven't yet implemented adding randomness to the generation. This means the trained model will always generate the same name given the same input. One interesting name it generated was Oluwatamilia, which is close to a name in the training set: Oluwatamilore. It's a good example of overfitting at first and then taking a new direction at the end. An interesting name generated by the completely untrained model was Atxxrogoddyceppi, which doesn't seem completely random. It makes me wonder how much training the model actually needs to start producing names that have hints of real names.
You can find the project code here.