I wanted to push the boundaries of my admittedly alchemical understanding of this technology to maintain coherent visions over time. I chose Kubrick’s The Shining as source material for its symmetrical secret spaces. Stripped of the supernatural, the story is about the geometry of memory. The same can be said about the neural network that made the pictures.
I’m using the GoogLeNet Neural Network trained on the MIT Places database, which contains 2,448,873 images from some 205 scene categories. The deep dream algorithm is very sensitive to different inputs. Indeed, my computer holds Kubrick’s cinematography in high regard. So we also have that in common.
The deep dream animator tool I found on GitHub is a Python script that runs each video frame through the neural net. Neural networks of this sort are designed for feature recognition. Instead of extracting meaning from images, deepdream makes images resemble the features it recognizes. This opens up a huge design space. I curate dreams by choosing the “personality” of the neural network. This one knows a lot about campus life. Another thinks about dogs a lot.
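The core move can be sketched with a toy example: gradient ascent on the image so that a layer’s activations grow stronger. Here the “layer” is just a linear map standing in for a convolutional layer, and the real pipeline backpropagates through GoogLeNet in Caffe, so treat this as an illustration of the idea rather than the actual deepdream code:

```python
def dream_step(x, W, lr=0.5):
    # Toy "layer": activations a = W x (a stand-in for a conv layer).
    a = [sum(w[j] * x[j] for j in range(len(x))) for w in W]
    # Gradient of the objective 0.5 * ||a||^2 with respect to x is W^T a.
    grad = [sum(W[i][j] * a[i] for i in range(len(W))) for j in range(len(x))]
    # Nudge the "image" toward whatever the layer responds to.
    return [xi + lr * gi for xi, gi in zip(x, grad)]
```

Iterating this step is what makes the picture drift toward the features the network recognizes.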
This creepy Victorian house was used as a guide image. It’s a template for skewing the weighting of features the neural net wants to identify on its own. It’s a useful method for generating a wide variety of images from a single neural net without using a different training set. For my purposes, I was keen to amplify house-like features.
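Guided dreaming swaps the objective: instead of amplifying whatever is already loudest, each activation from the current frame is pushed toward the guide activation it best matches by dot product. A minimal sketch of that matching step, with plain lists of numbers standing in for feature maps (the function name is mine, not from the animator script):

```python
def guided_targets(acts, guide_acts):
    # For each activation vector from the current frame, pick the guide
    # vector it most resembles (largest dot product). Using these as the
    # gradient targets skews the dream toward the guide's features.
    targets = []
    for a in acts:
        best = max(guide_acts, key=lambda g: sum(x * y for x, y in zip(a, g)))
        targets.append(best)
    return targets
```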
Limiting the neural network’s understanding of Kubrick’s The Shining to a single picture of a creepy house made it obsessive. It sees houses everywhere. The images shift around, but remain remarkably consistent. Is this mechanical understanding so different from my own? It is striking to see house-ness, the idea of a house, emerge and persist from such simple choices.
Getting the expressive results I wanted took some trial and error, and opened up a world of machine learning previously unknown to me. Processing hundreds of HD frames is impractical on the CPU; it would take days. Fortunately, the Caffe Deep Learning Framework supports Nvidia’s CUDA parallel processing architecture, and I was able to process 1000 frames in 12-16 hours on a GTX 970.
The eye is sensitive to motion and we seem to easily substitute temporal resolution for spatial resolution. Smaller features tended to average together. When in motion, this just appeared as noise on top of a blurry movie. The size of the dreamed features needed to be proportional to the size of the 1080p image source.
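One way to control feature size is the octave scheme from the original deepdream recipe: dream at a series of scales from coarse to fine, so features grown at the coarse scales stay large relative to the final 1080p frame. A sketch of how those scales might be computed (the parameter names are my assumptions):

```python
def octave_sizes(height, width, n_octaves=4, octave_scale=1.4):
    # Sizes from coarsest to finest. Dreaming at coarse scales first
    # produces larger features relative to the full-resolution frame.
    sizes = []
    for i in range(n_octaves):
        factor = octave_scale ** (n_octaves - 1 - i)
        sizes.append((int(round(height / factor)), int(round(width / factor))))
    return sizes
```

Fewer octaves, or a larger scale factor, biases the dream toward bigger features, which held up better under motion.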
The color was going to be a problem. Stills looked interesting, but the deep dream algorithm builds images using primary RGB values – pure red, green and blue. This yields a characteristic rainbow-fringed appearance that can be exaggerated or suppressed depending on many other factors.
I used After Effects to composite the rendered frames over the source frames with the luminance transfer mode so that all color information was derived from the source.
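The compositing step amounts to taking luma from the dreamed pixel and chroma from the source pixel. I did this in After Effects, but the per-pixel math can be sketched with a Rec. 601 YCbCr conversion (an assumption on my part; After Effects’ luminance transfer mode may use different coefficients):

```python
def luminance_transfer(dream_rgb, source_rgb):
    # Rec. 601 RGB -> YCbCr: Y is luma, Cb/Cr carry the color.
    def to_ycbcr(r, g, b):
        y  =  0.299 * r + 0.587 * g + 0.114 * b
        cb = -0.168736 * r - 0.331264 * g + 0.5 * b
        cr =  0.5 * r - 0.418688 * g - 0.081312 * b
        return y, cb, cr

    def to_rgb(y, cb, cr):
        return (y + 1.402 * cr,
                y - 0.344136 * cb - 0.714136 * cr,
                y + 1.772 * cb)

    # Luma from the dreamed frame, chroma from the source frame.
    y_dream, _, _ = to_ycbcr(*dream_rgb)
    _, cb_src, cr_src = to_ycbcr(*source_rgb)
    return to_rgb(y_dream, cb_src, cr_src)
```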
The neural network used for deep dreaming is organized into layers. Lower levels activate in response to simple geometric features and contours. The upper layers of these networks contain what might be called semantic information. In the case of the bvlc_googlenet model I was using, there seem to be a lot of dogs and eyes. I like creepy eyeballs as much as anyone, but I’m trying to remix art up in here so I chose a layer that was alive with geometric hallucinations and only the occasional eyeball.
You may want to train your own neural network; the character of the rendered output can be quite different. The bvlc_googlenet model (codenamed Inception) is trained on data from the ImageNet project, which contained 14,197,122 images at the time of writing.
Encoding for YouTube posed interesting compression vs. bandwidth challenges. This 1080p clip includes a lot of high frequency detail with every pixel in motion. I used the Pixel Motion Blur feature in After Effects to calculate motion blur for all moving pixels. I used the Posterize Time feature to show unique (now motion blurred) frames at 10 fps while keeping the movie frame rate consistent at 24 fps. This yielded a sequence in which each unique frame repeats for two or three output frames, each run having fewer high frequency pixel transitions.
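The Posterize Time step reduces to a simple index mapping from 24 fps output frames back to the 10 fps unique frames they display. A sketch:

```python
def posterize_time(n_frames, movie_fps=24, effective_fps=10):
    # For each output frame at movie_fps, the index of the unique
    # (10 fps) frame it repeats. Runs of 2-3 identical indices emerge
    # because 24 / 10 = 2.4.
    return [int(i * effective_fps / movie_fps) for i in range(n_frames)]
```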
Finally, the native audio was too clean, and I wanted it to seem as though it was being generated with the same process as the images. Neural networks have been used to process audio for decades, but not here. Instead the result emerged as a happy accident when I was experimenting with the frame rate and forgot to conform the original audio to match the timebase of the output.
This experiment began as a CSS hack to remove branding and UI from the Google Maps application for an AR product demo. Stripped of any context and zoomed in tight, the landscapes began to resemble paintings.
The API made it easy to manage viewport scaling and fire events from different phases of that interaction. I needed to constrain the scale of the map to keep features unrecognizable, you see. The design prevents zooming out beyond a threshold, although in some cases it’s immediately obvious where you are. I added a randomization control to instantly switch the viewport between a collection of pleasant coordinates I’d noticed along the way.
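The zoom constraint itself is just clamping the requested level. The thresholds below are made-up placeholders, and in the actual piece this lived in the Google Maps configuration rather than hand-rolled code; this is only the logic:

```python
def clamp_zoom(requested, min_zoom=14, max_zoom=21):
    # Hypothetical thresholds: below min_zoom, landmarks start to
    # become recognizable, so zooming out is capped there.
    return max(min_zoom, min(max_zoom, requested))
```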
A friend thought it would be relaxing to hear the sound of the wind when you zoomed out. I liked that idea and wanted to generalize it further. I implemented an HTML5 audio player and made it bindable to any discrete map zoom level. I attached an audio loop to each and wrote a script to crossfade between the audio cues with the scroll wheel.
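The crossfade logic is small enough to sketch. The player itself was HTML5/JavaScript; here is the gain math in Python, assuming an equal-power curve (my choice for illustration, not necessarily what the script used), where each integer zoom level owns a loop and the fractional zoom blends the two adjacent ones:

```python
import math

def crossfade_gains(zoom):
    # Fractional position between the two adjacent integer zoom levels.
    t = zoom - math.floor(zoom)
    # Equal-power crossfade: the outgoing loop fades by cos, the incoming
    # by sin, keeping perceived loudness roughly constant through the blend.
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)
```

At an exact zoom level only one loop plays; halfway between levels, both play at about 71% gain.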