Name generator for fantasy characters
Using PyTorch and a list of baby names as data
(originally posted January 6th, 2019)
My first NLP project is a generator of first names for fiction writers, especially for the fantasy genre. Coming up with fantasy names can be hard because a good name should seem unusual but familiar at the same time. A neural network is a great tool for the job because it can learn the patterns of letters in names, like which letter should come after a certain series of other letters. Injecting a bit of randomness into this process can produce names that range from familiar to bizarre. Letting the user adjust the amount of randomness also gives them some input into the creative process, instead of leaving all the work to the neural network. My goal for this project was to create a command line app that generates a single name and can be run as many times as needed.
The first step was to choose a deep learning framework to use. I chose PyTorch, which I already had some experience with. I knew I wanted to use an RNN with LSTM units, and I read about how to use PyTorch’s LSTM module. The hardest part of getting it to work was dealing with tensor dimensions. I knew I needed to encode each letter of each name as a one-hot tensor with one dimension for each letter of the alphabet, but I wanted to be able to train on more than one letter at a time. The LSTM module takes inputs in the shape of (sequence_index, batch_index, features), so I needed to create one of these 3D tensors for each input name. The cross-entropy loss criterion that I used wouldn’t take sequences of batches though, so I had to loop through each sequence index and take the mean of all those losses. I’m curious if there is a more computationally efficient way to do this next time.
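On the question of a more efficient loss computation, here is a minimal sketch rather than the project's actual code. It one-hot encodes a small batch of six-letter names into the (sequence_index, batch_index, features) shape, then computes the cross-entropy loss two ways: looping over sequence positions as described above, and flattening the sequence and batch dimensions so the loss is computed in a single call. The 26-letter lowercase alphabet, the hidden size of 32, the linear output layer, and the example names are all assumptions made for illustration.

```python
import string

import torch
import torch.nn as nn
import torch.nn.functional as F

ALPHABET = string.ascii_lowercase  # assumed vocabulary: 26 lowercase letters
N_LETTERS = len(ALPHABET)


def encode_batch(names):
    """Encode equal-length names as a one-hot tensor plus letter indices.

    Returns one_hot with shape (seq_len, batch, N_LETTERS) and
    idx with shape (seq_len, batch).
    """
    idx = torch.tensor([[ALPHABET.index(c) for c in n.lower()] for n in names])
    one_hot = F.one_hot(idx, N_LETTERS).float()          # (batch, seq_len, 26)
    return one_hot.transpose(0, 1).contiguous(), idx.t().contiguous()


torch.manual_seed(0)
names = ["oliver", "sophia", "daniel"]  # illustrative six-letter names
one_hot, idx = encode_batch(names)      # (6, 3, 26) and (6, 3)

inputs = one_hot[:-1]   # letters 0..4 are the inputs
targets = idx[1:]       # letters 1..5 are the letters to predict

lstm = nn.LSTM(input_size=N_LETTERS, hidden_size=32)  # hidden size is arbitrary
head = nn.Linear(32, N_LETTERS)                       # assumed output layer

out, _ = lstm(inputs)   # (5, 3, 32)
logits = head(out)      # (5, 3, 26)

# Approach described in the text: one loss per sequence index, then the mean.
looped = torch.stack(
    [F.cross_entropy(logits[t], targets[t]) for t in range(logits.size(0))]
).mean()

# Alternative: merge the sequence and batch dimensions and call the loss once.
flat = F.cross_entropy(logits.reshape(-1, N_LETTERS), targets.reshape(-1))
```

Because every sequence position contains the same number of names, the mean of the per-position losses equals the mean over all flattened predictions, so `looped` and `flat` agree up to floating-point error.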
Once I had the training part of the network working and was able to predict an output letter based on a sequence of input letters, I needed to make the actual name generator. This inference step required using the LSTM module differently than in training. I sent a randomly chosen first letter into the network and computed the softmax values on the output. I then picked the letter with the highest probability from those softmax values and sent it back into the network, along with the hidden state output from the previous step. During training I didn’t need to keep track of the hidden state within a sequence because the LSTM module handled that all in one step. This letter-by-letter inference method produced a series of letters that I combined into a name.
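The letter-by-letter loop can be sketched roughly as follows. This is an illustration, not the project's code: the `NameModel` class, the `one_hot_letter` and `generate` helpers, the hidden size, and the fixed length of six are all hypothetical, and the model here is untrained, so the output will be gibberish. The point is how the hidden state returned by one step is passed back in with the next letter.

```python
import random
import string

import torch
import torch.nn as nn

ALPHABET = string.ascii_lowercase  # assumed vocabulary
N_LETTERS = len(ALPHABET)


class NameModel(nn.Module):
    """Hypothetical model: an LSTM followed by a linear layer over letters."""

    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(N_LETTERS, hidden_size)
        self.head = nn.Linear(hidden_size, N_LETTERS)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)  # state is the (h, c) tuple
        return self.head(out), state


def one_hot_letter(letter):
    """Encode one letter as a (seq_len=1, batch=1, N_LETTERS) tensor."""
    x = torch.zeros(1, 1, N_LETTERS)
    x[0, 0, ALPHABET.index(letter)] = 1.0
    return x


@torch.no_grad()
def generate(model, first_letter, length=6):
    """Generate a name one letter at a time, carrying the hidden state."""
    model.eval()
    letters = [first_letter]
    state = None  # the LSTM starts from a zero state when given None
    x = one_hot_letter(first_letter)
    for _ in range(length - 1):
        logits, state = model(x, state)             # reuse the previous state
        probs = torch.softmax(logits[0, 0], dim=0)  # distribution over letters
        next_letter = ALPHABET[int(probs.argmax())] # most likely letter
        letters.append(next_letter)
        x = one_hot_letter(next_letter)             # feed the choice back in
    return "".join(letters).capitalize()


torch.manual_seed(0)
model = NameModel()  # untrained, for illustration only
name = generate(model, random.choice(ALPHABET))
```

Since this version always takes the most likely letter, calling `generate` twice with the same first letter returns the same name.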
Now that I had a working model and a way to generate a name from that model, I needed a large source of input data. I found a baby name database from the Social Security Administration that has first names for many years along with other statistics. I ignored the extra information and only used the actual first names. Since I couldn’t figure out how to batch sequences of varying length into a tensor to send into the LSTM module, I filtered the list of names down to those with exactly six letters. I used Pandas to get a list of the names from the CSV file.
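A minimal sketch of that filtering step with Pandas might look like this. The column names (`name`, `sex`, `count`), the absence of a header row, and the sample rows with made-up counts are assumptions for illustration; the real file layout may differ, and an in-memory string stands in for the actual file.

```python
import io

import pandas as pd

# Stand-in for the real data file: a few made-up rows in an assumed
# name,sex,count layout with no header row.
sample_csv = io.StringIO(
    "Emma,F,10\n"
    "Oliver,M,12\n"
    "Sophia,F,9\n"
    "Liam,M,11\n"
    "Harper,F,7\n"
    "Harper,M,3\n"
)

df = pd.read_csv(sample_csv, names=["name", "sex", "count"])

# Keep only the name column, restricted to names with exactly six letters.
six_letter_names = (
    df.loc[df["name"].str.len() == 6, "name"]
    .str.lower()
    .drop_duplicates()   # the same name can appear once per sex
    .tolist()
)
```

Keeping every name the same length means a batch of names fits in a single rectangular tensor with no padding.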
Training the network for only a few hundred epochs seemed to be enough to generate somewhat usable names. Choosing the most likely letter from the softmax each time was too deterministic and resulted in the same name every time for any given first letter. I added a tuning parameter to the softmax function used in the inference step so that the probabilities for each category were exaggerated, meaning the highest probability became much higher than the other ones, not just slightly higher. Adjusting this parameter resulted in more varied results and more usable names.
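To illustrate the kind of tuning parameter described here, the sketch below scales the network outputs before the softmax and then samples a letter index from the resulting distribution. The post does not spell out exactly how the parameter was applied or how the next letter was drawn, so the `sharpness` name, the multiplication before the softmax, and the sampling step are all assumptions. It uses only the standard library so the arithmetic is easy to follow.

```python
import math
import random


def sharpened_softmax(logits, sharpness=1.0):
    """Softmax with a multiplier applied to the logits first.

    sharpness > 1 exaggerates the gap between the top choice and the rest;
    sharpness < 1 flattens the distribution toward uniform.
    """
    scaled = [sharpness * z for z in logits]
    m = max(scaled)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, sharpness=1.0, rng=random):
    """Draw an index at random, weighted by the sharpened probabilities."""
    probs = sharpened_softmax(logits, sharpness)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.5, 0.5]                  # made-up outputs for three letters
mild = sharpened_softmax(logits, sharpness=1.0)
sharp = sharpened_softmax(logits, sharpness=5.0)
choice = sample_index(logits, sharpness=1.0, rng=random.Random(0))
```

With these made-up values the top probability rises from roughly 0.55 at a sharpness of 1 to roughly 0.92 at a sharpness of 5. Under this sampling assumption, a high setting behaves almost like always picking the most likely letter, while a low setting yields more surprising names.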
Instead of judging the results based on the loss of the network, I think it makes more sense to judge the quality of the generated names. Getting the loss close to zero would probably not be possible, because there are different correct next letters in a name depending on which name is going through the training process at that step. For example, the sequence ‘Em’ could have the following letter of ‘m’ or ‘i’ depending on whether the training example was using ‘Emma’ or ‘Emily’. Looking at the generated names and judging the results from an aesthetic angle seems like the best way to judge the performance of the model.
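The ‘Em’ example can be made concrete with a small calculation. The sketch below assumes a toy training set of just two names and a hypothetical helper, `loss_floor`. It computes the lowest average cross-entropy any model could reach on the letter following a given prefix, which is the entropy of the next-letter distribution in the data.

```python
import math
from collections import Counter


def loss_floor(names, prefix):
    """Lowest achievable mean cross-entropy for the letter after `prefix`.

    The best possible model predicts each next letter with its empirical
    frequency, so the floor is the entropy of those frequencies (in nats).
    """
    counts = Counter(
        n[len(prefix)]
        for n in names
        if n.startswith(prefix) and len(n) > len(prefix)
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


toy_names = ["emma", "emily"]          # toy data set, for illustration only
floor = loss_floor(toy_names, "em")    # 'm' and 'i' are equally likely
```

Here the floor is ln 2, about 0.693: even a perfect model has to split its prediction evenly between ‘m’ and ‘i’, so the loss on that step cannot reach zero.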
Here are some of the generated names that I got:
It’s like I picked them straight out of a Brandon Sanderson novel! His characters’ names aren’t all six letters long, though.
You can find the code for the project here.