Tuning, adjusting, trial and error
How tuning a neural network is like tuning a synthesizer
(originally posted April 6th, 2019)
When I’m making decisions about hyperparameter settings for a neural network (batch size, learning rate, number of layers, size of each layer), I’m reminded of what it was like when I started making electronic music in Apple’s Logic Pro many years ago. When I opened up a new synthesizer for the first time to make some cool sounds, I was met with this:
[Image: the interface of Logic Pro’s Sculpture synthesizer]
That is quite a lot of settings to work with, and I had no idea what any of them did. Some of them changed the tone of the sound, but others didn’t seem to be doing much at all. Some sliders had strange interactions at certain positions but didn’t seem to affect each other at other positions. Some buttons seemed to just break the sound completely by making hardly any noise come out.
It was like the feeling of starting to tune neural network hyperparameters without knowing anything about how a neural network works. But I read the manual and watched YouTube videos and eventually had a good idea of what each knob, slider and button did. I realized that some of the ones that seemed to not do anything were actually “hyperparameters” of other settings, affecting how those settings would work in a secondary way. Some affected the sound itself, while others affected the timing of the other settings.
The difference with neural networks is that even after taking classes, reading books and spending months learning about their inner workings and exactly what each of the hyperparameters does, I still have no idea where to set them. It’s still a whole lot of trial and error, just like when I first opened that synthesizer years ago. If anything, it’s more challenging, because with larger networks I have to wait so long to see the results.
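For what it's worth, that trial and error can be written down as a loop. Below is a minimal sketch of a random search over the four settings mentioned earlier, not a recipe I'd claim is the right way to tune anything. The value ranges, the function names (`sample_config`, `toy_score`, `random_search`) and the scoring function are all made up for illustration; in practice `toy_score` would be replaced by actually training a network and measuring validation performance, which is the slow part.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical hyperparameter setting (ranges are assumptions)."""
    return {
        "batch_size": rng.choice([16, 32, 64, 128]),
        # Sample the learning rate on a log scale between 1e-5 and 1e-1.
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "num_layers": rng.randint(1, 6),
        "layer_size": rng.choice([64, 128, 256, 512]),
    }


def toy_score(config):
    """Stand-in for 'train the network and validate it'; higher is better.

    This toy function simply prefers a learning rate near 1e-3 and about
    three layers. It is not a real training run.
    """
    lr_penalty = (math.log10(config["learning_rate"]) + 3) ** 2
    depth_penalty = 0.1 * abs(config["num_layers"] - 3)
    return -(lr_penalty + depth_penalty)


def random_search(score_fn, trials=20, seed=0):
    """Try `trials` random settings and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = sample_config(rng)
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(toy_score)
print(best_config, best_score)
```

With a real network, every pass through that loop is one more wait, which is why the number of trials ends up being limited by patience as much as anything.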
The waiting reminds me of my minimal experience with Blender, an open-source 3D graphics program. Once you set up your scene and want to export it as a flat image, you “render” it by telling the program to calculate how all the virtual light particles bounce off the virtual materials to create a final image. This can take a long time depending on the complexity of the scene, so you have to wait a while to see the results of the settings you chose.
When people say hyperparameter tuning is an art, I think back to my experiences with music and graphics. It’s comforting to make these connections because it makes experimenting with settings seem like a familiar task, but it’s frightening because I realize how hard it can be to get things working.
All this makes me think: how cool would it be to adjust hyperparameters with an interface that looks like that Sculpture synthesizer? I hope deep learning tools eventually have an intuitive graphical interface that looks as cool as a synthesizer’s. But for now, I’ll keep using Python.