Improving my PyTorch fantasy name generator
(originally posted January 20th, 2019)
This week I returned to my fantasy name generation app to make some improvements: adjusting the app so it could generate names of varying lengths, adding tests to make sure all the functions were doing the right thing, and adding comments and documentation to my Python files to make the code easier to return to next time.
To make the app generate names of varying lengths, instead of just six-letter names, I first changed how the training process worked. Instead of training only on the six-letter names in the data set, I wanted to train on all of the names. Five- and six-letter names were the most common, but some names had 15 letters and some had only two. Training on names of multiple lengths meant changing the training process. Before, I was passing a group of names through the network all at once, which required all the names to have the same number of letters, or sequence length. I could have padded the names with null characters at the end so they were all as long as the longest name in the data set, but I thought of a simpler approach: train on one name at a time, so that differences in sequence length wouldn’t be an issue. This might make training slower, but my goal was to get it working and think about optimization later.
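To make the two options concrete, here is a minimal sketch in plain Python. It is not the app’s actual code: the sample names, the `pad_names` and `training_pairs` helpers, and the choice of pairing each letter with the next one are all assumptions made for illustration, and the real app would work with PyTorch tensors rather than strings.

```python
# Hypothetical sample data; the real data set is not shown in this post.
names = ["bo", "elara", "thorin"]

PAD = "\0"  # null character, used only by the padded variant


def pad_names(names):
    """Pad every name to the length of the longest one (the batched option)."""
    longest = max(len(name) for name in names)
    return [name.ljust(longest, PAD) for name in names]


def training_pairs(name):
    """Return (input letter, target letter) pairs for a single name."""
    return list(zip(name, name[1:]))


# Option 1: one batch in which every sequence has the same length.
padded = pad_names(names)

# Option 2 (the approach described above): one name per step, any length.
pairs_per_name = [training_pairs(name) for name in names]
# A real training step would feed each list of pairs through the network.
```

With one name per step, nothing has to be padded, at the cost of more and smaller steps.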
I also introduced a ‘terminal character’, the underscore, to mark the end of a name. This meant the one-hot tensors needed one more entry along the character dimension to represent it. Name generation could then use this character to know when to stop. The loop that generates letters for a name used to have a fixed number of iterations, but now it loops until the network predicts the terminal character. That way, the network decides how long each name should be instead of having a fixed length. If I wanted a name of a certain length and stopped the generation loop early, I might end up with a strange name, because the network didn’t realize that was going to be the final letter. Choosing the length of the name beforehand is a feature I’d like to add, but I’m not sure how to do it while making the names end well.
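The idea can be sketched without the network itself. In the sketch below, `predict_next` is a hypothetical stand-in for the trained model, `toy_predictor` is a fake model that just spells a fixed word, and `max_length` is a safety cap I am assuming here so the loop always ends. None of these names come from the app.

```python
import string

TERMINAL = "_"
# 26 lowercase letters plus the terminal character (alphabet assumed).
ALPHABET = string.ascii_lowercase + TERMINAL


def one_hot(char):
    """Return a list with a 1 at the character's index and 0 elsewhere."""
    vector = [0] * len(ALPHABET)
    vector[ALPHABET.index(char)] = 1
    return vector


def generate_name(predict_next, first_letter, max_length=20):
    """Append predicted letters until the terminal character appears.

    predict_next takes the letters so far and returns the next character.
    max_length is a safety cap, an assumption added for this sketch.
    """
    name = first_letter
    while len(name) < max_length:
        next_char = predict_next(name)
        if next_char == TERMINAL:
            break  # the model says the name is finished
        name += next_char
    return name


def toy_predictor(prefix):
    """Stand-in model that spells out a fixed word, then stops."""
    target = "mira" + TERMINAL
    return target[len(prefix)]


name = generate_name(toy_predictor, "m")
```

The terminal character never appears in the returned name. It only controls when the loop stops.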
I also wanted to add tests to the app this week. I learned about pytest, a testing framework that seemed to be the most popular for Python. I created some extra Python files and filled them with tests to make sure my functions were doing the right thing. Testing after I got the app working isn’t the order I would use in the future, but writing this app was mostly a learning process. Now I know it makes more sense to test while I’m writing the functions, or even beforehand as a way of planning. I also realized that some things were easier to check without pytest, using assert statements inside the functions. These statements made sure the input and output were of the right type and even that the contents made sense (for example, that a generated name had only letters and not numbers). It was more work to write these verification steps as pytest tests, but for efficiency reasons it might help to keep them separate from the functions being tested. Running a set of assert statements might slow down a function every time it is called, while pytest tests only run when I start them from the command line, not while the app is actually running.
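Here is a small sketch of the two styles side by side. The `make_name` function is a hypothetical stand-in that picks random letters instead of using the network, and the test file name mentioned in the comment is also an assumption.

```python
import random
import string


def make_name(length, rng=random):
    """Hypothetical stand-in for the generator: random lowercase letters."""
    # In-function checks: these run on every call while the app is running.
    assert isinstance(length, int) and length > 0, "length must be a positive int"
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    assert name.isalpha(), "generated name should contain only letters"
    return name


# The same checks as a pytest-style test. It would live in a separate file,
# for example test_names.py (name assumed), and only run when pytest is invoked.
def test_make_name_contains_only_letters():
    name = make_name(6)
    assert isinstance(name, str)
    assert len(name) == 6
    assert name.isalpha()
```

One more difference worth knowing: Python skips assert statements entirely when it is started with the `-O` flag, so in-function asserts are best treated as development checks rather than guaranteed validation.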
Then I added docstrings and comments. I read the Python style guide and saw how docstrings were meant to be used at the top of function bodies to document the purpose of the functions. I did my best to describe the purpose of every function and the inputs it should take. The style guide didn’t say anything about describing the outputs of functions, but maybe I will add that in the future. For group projects it will just depend on the agreed style for the group. I also added comments wherever it wasn’t completely obvious what the code was doing. I tried to describe the higher purpose and reasoning behind each piece of code instead of just restating what it does. Some lines of code seemed so obvious that I didn’t add comments to them. I think I will have a much easier time coming back to this project next time because I will have clear explanations of all the code to read. Writing the docstrings and comments took about as long as writing the actual code. Maybe I overthought the documentation and tried to get it perfect, but it does seem like a huge part of writing an app, and one I hadn’t planned on spending so much time on. I even tend to spend a lot of time trying to name my variables and functions perfectly.
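As an illustration of the kind of docstring and comment I mean, here is a sketch. The `name_to_indices` function and its default alphabet are made up for this example, and the Args/Returns layout is just one common convention, not necessarily the one used in the project.

```python
def name_to_indices(name, alphabet="abcdefghijklmnopqrstuvwxyz_"):
    """Convert a name into a list of character indices.

    Args:
        name: The name to encode, as a lowercase string.
        alphabet: The characters the network knows about, in index order.

    Returns:
        A list of ints, one per character in name.
    """
    # Look up each letter's position so the network sees numbers, not text.
    return [alphabet.index(char) for char in name]


indices = name_to_indices("ab_")
```

The comment explains why the lookup happens rather than repeating what the line already says.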
Next time I work on this project, I’d like to treat it more like a real classification task by splitting my data into training, validation, and test sets, training only on the training set, and then measuring how accurately the network predicts next letters in the validation and test sets. I’d also like to see how low I can get the training loss by tuning hyperparameters and adding things like dropout and regularization. I could also make the training more efficient and have it use the GPU. For now, though, I’m happy with the name results I’ve been getting, like:
You can find the project code here.