Description

The goal of this work is to test whether spectrograms are a reasonable way to represent music visually as image input to deep learning networks. We implemented music classification from the songs' spectrograms and then used the trained network to modify spectrograms, similar to Google's Deep Dream method. We also attempted to modify the spectrograms through neural style transfer and GANs. For every modification method, the spectrograms were converted back to audio and assessed qualitatively. Classification achieves good accuracy; music modification may require tuning and post-processing, but it does produce promising soundtracks. For more details, refer to the paper here. Below are the audio samples for each of the music modification experiments.

Deep Dream

We performed two kinds of Deep Dream experiments: one in which we do not try to control the modification process (Dream), and one in which we control it with a guide song/spectrogram (Dream Control). We applied both approaches at the 13th convolutional layer of the trained CNNs, using 6 octaves with an octave scale of 1.4. At each octave, gradient ascent was run for 20 iterations.
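The octave schedule described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code used in the experiments: `grad_fn` is a hypothetical callable standing in for the gradient of the layer-13 activation objective with respect to the spectrogram, and the nearest-neighbour resizing, the gradient normalization, and the step size are all assumptions.

```python
import numpy as np


def resize_nearest(img, shape):
    """Nearest-neighbour resize of a 2-D array (assumed interpolation)."""
    rows = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[np.ix_(rows, cols)]


def octave_shapes(base_shape, n_octaves=6, octave_scale=1.4):
    """Image shapes from the smallest octave up to the full resolution."""
    shapes = []
    for k in range(n_octaves - 1, -1, -1):
        factor = octave_scale ** k
        shapes.append(tuple(max(1, int(round(s / factor))) for s in base_shape))
    return shapes


def deep_dream(img, grad_fn, n_octaves=6, octave_scale=1.4, n_iter=20, step=0.01):
    """Run n_iter gradient-ascent steps at each octave, coarse to fine.

    grad_fn(x) is a hypothetical callable returning d(objective)/dx
    for an array x of any shape.
    """
    shapes = octave_shapes(img.shape, n_octaves, octave_scale)
    dreamed = resize_nearest(img, shapes[0])
    for i, shape in enumerate(shapes):
        if i > 0:
            # Upscale the result so far and restore the detail that
            # the original image lost at the previous, coarser octave.
            coarse = resize_nearest(resize_nearest(img, shapes[i - 1]), shape)
            lost_detail = resize_nearest(img, shape) - coarse
            dreamed = resize_nearest(dreamed, shape) + lost_detail
        for _ in range(n_iter):
            grad = grad_fn(dreamed)
            # Normalize so the step size is comparable across octaves.
            dreamed = dreamed + step * grad / (np.abs(grad).mean() + 1e-8)
    return dreamed


# Toy usage: maximize 0.5 * sum(x**2), whose gradient is simply x.
spectrogram = np.random.default_rng(0).random((64, 64)) + 1.0
result = deep_dream(spectrogram, grad_fn=lambda x: x)
```

In a real setup, `grad_fn` would be supplied by the automatic differentiation of whichever deep learning framework holds the trained CNN. For the guided variant (Dream Control), the objective would additionally depend on the guide spectrogram's features.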

Original Hip-Hop Song



Deep Dream



Original Classical Song



Deep Dream (Control with the Classical Song)



Neural Style Transfer

Applying image style transfer to the spectrograms, we tried to transfer the style of one genre to songs of a different genre. Following the original work, we used a pretrained VGG-19 network to extract features. To transfer the style, we create a new image that matches the content of the original track's spectrogram and the style of the spectrogram of a song from a different genre.
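The content and style objectives can be illustrated with a small NumPy sketch. It assumes the feature maps have already been extracted (for example from VGG-19 layers) as arrays of shape `(channels, height, width)`, and uses the Gram-matrix style loss from the original style transfer formulation. The function names and the weights `alpha` and `beta` are illustrative assumptions, not values from these experiments.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def content_loss(generated, content):
    """Mean squared difference between feature maps (position-sensitive)."""
    return np.mean((generated - content) ** 2)


def style_loss(generated, style):
    """Mean squared difference between Gram matrices (position-agnostic)."""
    return np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2)


def total_loss(generated, content, style, alpha=1.0, beta=1e3):
    """Weighted sum; alpha and beta are assumed, illustrative weights."""
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)


# Toy usage with random arrays standing in for extracted features.
rng = np.random.default_rng(0)
content_feats = rng.standard_normal((8, 16, 16))
style_feats = rng.standard_normal((8, 16, 16))
loss = total_loss(content_feats, content_feats, style_feats)
```

Because the Gram matrix sums over all spatial positions, the style term ignores where a texture occurs in the spectrogram, while the content term penalizes any shift in time or frequency. In practice the generated image is optimized by gradient descent on this total loss, usually summed over several network layers.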

Original Classical Song



Original Rock Song



Style Transfer from the Rock Song to the Classical Song



CycleGAN

We also tried converting classical pieces into rock songs using a CycleGAN model. The training set consisted of 5397 rock and 1074 classical spectrograms. Due to time limitations, we were only able to train CycleGAN for 13 epochs. To speed up training, the spectrogram images were resized from 512x512 to 256x256 beforehand.
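Two pieces of this setup can be sketched in NumPy: the 2x downsizing of the spectrogram images and the cycle-consistency term that CycleGAN adds to its adversarial losses. The resizing method used in the experiments is not stated, so 2x2 average pooling is an assumption here. The generators `g_c2r` (classical to rock) and `g_r2c` (rock to classical) are hypothetical callables standing in for the trained networks.

```python
import numpy as np


def downsample_2x(img):
    """Halve height and width by 2x2 average pooling (assumed method)."""
    h, w = img.shape[:2]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    blocks = img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:])
    return blocks.mean(axis=(1, 3))


def cycle_consistency_loss(x_classical, y_rock, g_c2r, g_r2c):
    """L1 error after mapping each domain to the other and back.

    g_c2r and g_r2c are hypothetical generator callables.
    """
    forward = np.abs(g_r2c(g_c2r(x_classical)) - x_classical).mean()
    backward = np.abs(g_c2r(g_r2c(y_rock)) - y_rock).mean()
    return forward + backward


# Toy usage: identity "generators" reconstruct their input exactly.
rng = np.random.default_rng(0)
classical = downsample_2x(rng.random((512, 512)))
rock = downsample_2x(rng.random((512, 512)))
loss = cycle_consistency_loss(classical, rock, lambda x: x, lambda y: y)
```

The cycle term is what lets CycleGAN train on unpaired data such as the rock and classical sets above: no classical piece needs a matching rock version, only the requirement that a round trip returns something close to the input.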

Original Classical Song



CycleGAN Modified Song