Update - 13/07/2015
Images in this blog post are licensed by Google Inc. under a Creative Commons Attribution 4.0 International License. However, images based on places by MIT Computer Science and AI Laboratory require additional permissions from MIT for use.
Artificial Neural Networks have spurred remarkable recent progress in image classification and speech recognition. But even though these are very useful tools based on well-known mathematical methods, we actually understand surprisingly little of why certain models work and others dont. So lets take a look at some simple techniques for peeking inside these networks.
We train an artificial neural network by showing it millions of training examples and gradually adjusting the network parameters until it gives the classifications we want. The network typically consists of 10-30 stacked layers of artificial neurons. Each image is fed into the input layer, which then talks to the next layer, until eventually the output layer is reached. The networks answer comes from this final output layer.
One of the challenges of neural networks is understanding what exactly goes on at each layer. We know that after training, each layer progressively extracts higher and higher-level features of the image, until the final layer essentially makes a decision on what the image shows. For example, the first layer maybe looks for edges or corners. Intermediate layers interpret the basic features to look for overall shapes or components, like a door or a leaf. The final few layers assemble those into complete interpretationsthese neurons activate in response to very complex things such as entire buildings or trees.
One way to visualize what goes on is to turn the network upside down and ask it to enhance an input image in such a way as to elicit a particular interpretation. Say you want to know what sort of image would result in Banana. Start with an image full of random noise, then gradually tweak the image towards what the neural net considers a banana (see related work in [1], [2], [3], [4]). By itself, that doesnt work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.
data:image/s3,"s3://crabby-images/b75bc/b75bce310700316b702abbb81f1919a376f41ea9" alt=""
data:image/s3,"s3://crabby-images/08c4a/08c4abde77caa1458a895a284db700effb23f9c9" alt=""
Indeed, in some cases, this reveals that the neural net isnt quite looking for the thing we thought it was. For example, heres what one neural net we designed thought dumbbells looked like:
data:image/s3,"s3://crabby-images/0b65d/0b65dfa0902ed441b8a56134e4c76189f2ef3662" alt=""
Instead of exactly prescribing which feature we want the network to amplify, we can also let the network make that decision. In this case we simply feed the network an arbitrary image or photo and let the network analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of features we generate depends on which layer we choose to enhance. For example, lower layers tend to produce strokes or simple ornament-like patterns, because those layers are sensitive to basic features such as edges and their orientations.
![]() |
Left: Original photo by Zachi Evenor. Right: processed by Günther Noack, Software Engineer |
![]() |
Left: Original painting by Georges Seurat. Right: processed images by Matthew McNaughton, Software Engineer |
data:image/s3,"s3://crabby-images/8a8a4/8a8a4e4da08566f4b7304a00089aa1b64ea13447" alt=""
data:image/s3,"s3://crabby-images/d0cb8/d0cb86ef7d15b84743edcd8cdfa0a917659f985b" alt=""
![]() |
The original image influences what kind of objects form in the processed image. |
We must go deeper: Iterations
If we apply the algorithm iteratively on its own outputs and apply some zooming after each iteration, we get an endless stream of new impressions, exploring the set of things the network knows about. We can even start this process from a random-noise image, so that the result becomes purely the result of the neural network, as seen in the following images:
![]() |
Neural net dreams generated purely from random noise, using a network trained on places by MIT Computer Science and AI Laboratory. See our Inceptionism gallery for hi-res versions of the images above and more (Images marked Places205-GoogLeNet were made using this network). |
0 comments:
Post a Comment