Name generator in TensorFlow and JavaScript

Joe Bender
Feb 3, 2021


Porting my PyTorch name generator to TensorFlowJS

(originally posted February 3rd, 2019)

My goal this week was to translate my PyTorch implementation of a name generator into TensorFlow, another of the most popular deep learning frameworks for Python. TensorFlow has a high-level interface called Keras that lets you build models in an intuitive way, layer by layer, so I used that instead of the lower-level building blocks of TensorFlow, which are a bit more complicated. I was able to make a functioning version of the name generator with Keras by the middle of the week, so I moved on and also made a version that runs in the browser with TensorFlowJS, the JavaScript version of TensorFlow that works completely within a browser.

I started out by learning how Keras works and looking for the basic LSTM layer, since I’m not quite ready to build my own LSTM from scratch. Installing TensorFlow was more involved than installing PyTorch; I’m using Anaconda, whose base version of Python was too new for TensorFlow, so I had to create a new environment with an older Python version that TensorFlow supported. Anaconda is great for creating separate Python environments for each project, so that wasn’t such a big deal. I then looked at the RNN tutorials for Keras on the TensorFlow site and was intimidated by how complicated they were. Luckily, I found a tutorial that showed a simple way to use LSTM layers. I like learning by finding the most basic way to get something working and then building on it one piece at a time, so this was a great tutorial to find. I then used the Keras reference to learn the rest of what I needed to know to get a functioning RNN model.
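The post doesn’t include its model code, but the kind of simple Keras LSTM setup described here can be sketched roughly like this — the layer sizes, the 27-character vocabulary (26 letters plus a padding space), and the length-15 sequences are my assumptions, not the original code:

```python
# A minimal sketch of a character-level LSTM in Keras, as described in the
# post. SEQ_LEN and VOCAB are assumptions: names padded with spaces to the
# longest name's length (15), over a 27-character alphabet.
import numpy as np
import tensorflow as tf

SEQ_LEN = 15   # fixed sequence length (longest name, space-padded)
VOCAB = 27     # 'a'-'z' plus the space used for padding

def build_model(hidden_size=128):
    # One LSTM layer over one-hot character sequences, predicting a
    # distribution over the next character at every position.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(SEQ_LEN, VOCAB)),
        tf.keras.layers.LSTM(hidden_size, return_sequences=True),
        tf.keras.layers.Dense(VOCAB, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy")
```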

I then needed to figure out how to build a data set for training. Keras works with plain NumPy arrays, so I wrote a function to convert the text file of names into a pair of input and target arrays for training. I found that Keras has a flexible training method that lets me pass in the whole data set along with a batch size and whether to shuffle the data each epoch. It even has a setting for splitting the data into training and validation sets, so I didn’t need to code any of that myself. I was worried TensorFlow would be harder to learn than PyTorch, but with Keras it was actually more intuitive.
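A data-set builder along these lines could look like the following — a sketch, not the post’s actual code, assuming the space-padding scheme and next-letter targets described here:

```python
# A sketch of the data-set builder described in the post: each name is
# space-padded, and the target sequence is the input shifted left by one
# character, so the model learns next-letter prediction at every position.
import numpy as np

ALPHABET = " abcdefghijklmnopqrstuvwxyz"  # index 0 is the padding space
SEQ_LEN = 15                              # length of the longest name

def one_hot(index):
    vec = np.zeros(len(ALPHABET), dtype=np.float32)
    vec[index] = 1.0
    return vec

def make_dataset(names):
    """Turn a list of names into (inputs, targets) NumPy arrays."""
    inputs, targets = [], []
    for name in names:
        padded = name.lower().ljust(SEQ_LEN + 1)  # one extra for the shift
        indices = [ALPHABET.index(c) for c in padded]
        inputs.append([one_hot(i) for i in indices[:-1]])
        targets.append([one_hot(i) for i in indices[1:]])
    return np.array(inputs), np.array(targets)

X, y = make_dataset(["anna", "oluwatamilore"])
```

With arrays like these, the whole-data-set training call the post mentions would just be something like `model.fit(X, y, batch_size=32, shuffle=True, validation_split=0.1)`.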

A couple of things were less flexible with Keras than with PyTorch, though. I had to pick a fixed sequence length for training, which meant padding each training name with spaces to the length of the longest name, 15 letters. The model therefore always takes input sequences of exactly that length. On the other hand, I was able to train on multiple names at a time, which I hadn’t managed in PyTorch (at least not yet), so that was a benefit. The other tricky part with Keras was generating a name with this fixed sequence size. To generate a name, I first feed in the first letter and get a prediction of the second letter, then feed in the first two letters and get a prediction of the third letter, and so on. Since I could only input and output sequences of length 15, I started with a first letter padded with spaces and got a full output sequence of length 15, of which I only used the first position. Then I kept feeding in the gradually growing name with fewer and fewer padding spaces, while using only the one predicted letter I needed to append to the name in progress. This is inefficient and kind of a hack, but it works, and the inefficiency isn’t a big deal when generating only a few names at a time.
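The fixed-length generation hack can be sketched like this. To keep the sketch runnable without a trained model, a hypothetical stub predictor stands in for the real `model.predict` call:

```python
# A sketch of the fixed-length generation loop described in the post.
# `stub_predict` is a hypothetical stand-in for the trained Keras model;
# with a real model you would call model.predict on the encoded batch.
import numpy as np

ALPHABET = " abcdefghijklmnopqrstuvwxyz"
SEQ_LEN = 15

def encode(text):
    # Pad with spaces to SEQ_LEN and one-hot encode: shape (1, SEQ_LEN, vocab).
    arr = np.zeros((1, SEQ_LEN, len(ALPHABET)), dtype=np.float32)
    for pos, ch in enumerate(text.ljust(SEQ_LEN)):
        arr[0, pos, ALPHABET.index(ch)] = 1.0
    return arr

def stub_predict(batch):
    # Stand-in predictor: always predicts 'a' at every position.
    out = np.zeros_like(batch)
    out[:, :, ALPHABET.index("a")] = 1.0
    return out

def generate(first_letter, predict=stub_predict, max_len=SEQ_LEN):
    name = first_letter
    while len(name) < max_len:
        probs = predict(encode(name))                 # full length-15 output...
        next_index = probs[0, len(name) - 1].argmax() # ...but use one position
        next_char = ALPHABET[next_index]
        if next_char == " ":                          # padding ends the name
            break
        name += next_char
    return name
```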

The Keras model could generate the most likely name given an initial character, but I haven’t yet implemented generating names with randomness added in. This means that the trained model will always generate the same name given the same input. One interesting name it generated was Oluwatamilia, which is close to a name that is in the training set: Oluwatamilore. It’s a good example of overfitting at first and then taking a new direction at the end. An interesting name generated completely randomly with no training of the model was Atxxrogoddyceppi, which doesn’t seem completely random. It makes me wonder how much training the model actually needs to start making names that have hints of real names.
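The post notes that sampling with randomness isn’t implemented yet. A common way to add it — an assumption on my part, not the post’s code — is to draw the next letter from the predicted distribution instead of always taking the most likely one, optionally with a temperature knob:

```python
# A common sampling approach (not the post's implementation): draw the next
# character index from the model's predicted distribution. Lower temperature
# pushes toward the argmax; higher temperature pushes toward uniform.
import numpy as np

def sample_next(probs, temperature=1.0, rng=np.random.default_rng()):
    logits = np.log(probs + 1e-8) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return rng.choice(len(probs), p=scaled)
```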

I also got a version of the model working in the browser by saving the model in Keras and exporting it to a format that the JavaScript version of TensorFlow could load and use. It just outputs names to the browser console for now. One frustration of getting this simple functionality working was that I could only load the model over an HTTP request and not from my file system. I guess it wasn’t programmed to work for people who just want to test it out on their own computer without building a server and hosting platform. Luckily I found Python’s HTTP server, which is already part of the standard library and which I could use from the command line to start a very simple server on my computer. I started it in the directory of my project and was able to get the JavaScript model loaded and working. I thought it was strange that I didn’t find any tutorials about doing it this simple way. Most of them made it sound like I needed to get a full Node server running just to load the model. I guess the theme of this week is my disappointment with many tutorials overly complicating things.
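The command-line form of this is just `python -m http.server` run from the project directory; the same server can also be started from a script with nothing but the standard library (a sketch — the port number here is an arbitrary choice):

```python
# A sketch of the simple local file server described in the post, using only
# Python's standard library. It serves the working directory over HTTP so the
# browser can fetch the exported model, equivalent to `python -m http.server`.
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve_current_directory(port=8000):
    server = HTTPServer(("127.0.0.1", port), SimpleHTTPRequestHandler)
    server.serve_forever()  # blocks; Ctrl-C to stop
```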

To get the JavaScript version working, I also had to code new versions of the helper methods that convert data from strings into tensors the model can use, and then back into strings. I already had these coded in Python (with NumPy arrays) for the Keras version, but I couldn’t reuse that code in JavaScript. Translating from Python to JavaScript made me realize how useful some Python features are, like list comprehensions. In JavaScript I have to write a for loop that takes up more lines and just seems more complicated. I also learned a bit more about asynchronous functions in JavaScript, and how to use the keywords ‘async’ and ‘await’ together.

You can find the project code here.
