Monday, May 4, 2015

On The Evolution of Machine Learning

DB: Why is a neural network so powerful compared to the traditional linear and non-linear methods that have existed up until now?

RZ: When you have a linear model, every feature is either going to hurt or help whatever you are trying to score. That’s the assumption inherent in linear models. So the model might determine that if the feature is large, then it’s indicative of class 1; but if it’s small, it’s indicative of class 2. Even if you go all the way up to very large values of the feature, or down to very small values of the feature, you will never have a situation where you say, “In this interval, the feature is indicative of class 1; but in another interval it’s indicative of class 2.”

That’s too limited. Say you are analyzing images, looking for pictures of dogs. It might be that only a certain subset of a feature’s values indicate whether it is a picture of a dog, and the rest of the values for that pixel, or for that patch of an image, indicate another class. You can’t draw a line to define such a complex set of relationships. Non-linear models are much more powerful, but at the same time they’re much more difficult to train. Once again, you run into those hard problems from optimization theory. That’s why for a long while we thought that neural networks weren’t good enough, because they would over-fit, or they were too powerful. We couldn’t do precise, guaranteed optimization on them. That’s why they (temporarily) vanished from the scene.
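The interval argument above can be illustrated with a small sketch. It is not from the interview: the data points, the weight grid, and the hand-picked network weights are all assumptions chosen for illustration. Class 1 occupies the middle of a single feature's range and class 2 sits on both sides, so no linear score of the form w*x + b can label every point correctly, while a tiny network with two ReLU units can.

```python
def relu(z):
    return max(0.0, z)

def linear_predict(x, w, b):
    # A linear score is monotonic in x, so it can change class at most once.
    return 1 if w * x + b > 0 else 2

def network_predict(x):
    # Hand-picked (not trained) weights: score = 1 - |x|, built from two ReLUs.
    score = 1.0 - relu(x) - relu(-x)
    return 1 if score > 0 else 2

# Illustrative data: class 1 only in the middle interval of the feature.
points = [-3.0, -2.0, 0.0, 2.0, 3.0]
labels = [2, 2, 1, 2, 2]

def accuracy(predict):
    hits = sum(1 for x, y in zip(points, labels) if predict(x) == y)
    return hits / len(points)

# Try a coarse grid of linear models and keep the best accuracy found.
grid = [i / 4 for i in range(-20, 21)]
best_linear = 0.0
for w in grid:
    for b in grid:
        best_linear = max(best_linear, accuracy(lambda x: linear_predict(x, w, b)))

network_accuracy = accuracy(network_predict)
print("best linear accuracy:", best_linear)
print("tiny network accuracy:", network_accuracy)
```

The grid search is only a demonstration. The underlying reason is algebraic: labeling -2 and 2 as class 2 requires b <= 0, while labeling 0 as class 1 requires b > 0.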

DB: Within neural network theory, there are multiple branches and approaches to computer learning. Can you summarize some of the key approaches?

RZ: By far the most successful approach has been a supervised approach where an older algorithm, called backpropagation, is used to build a neural network that has many different outputs.

Let’s look at a neural network construction that has become very popular, called Convolutional Neural Networks. The idea is that the machine learning researcher builds a model constructed of several layers, each of which handles connections from the previous layer in a different way.

In the first layer, you have a window that slides a patch across an image, which becomes the input for that layer. This is called a convolutional layer because the patch “convolves”: it overlaps with itself. Then several different types of layers follow. Each has different properties, and pretty much all of them introduce non-linearities.
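As a rough sketch of the sliding window described above (not code from the interview), the function below moves a small patch of weights across a toy image one pixel at a time and applies a ReLU non-linearity to each result. The image, the kernel values, the stride of 1, and the name `slide_patch` are all assumptions made for illustration. Strictly speaking this computes a cross-correlation, which is what most deep learning libraries implement under the name "convolution".

```python
def slide_patch(image, kernel):
    """Slide `kernel` over `image` (lists of rows), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Weighted sum of the pixels currently under the window.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            # ReLU: the non-linearity applied after the weighted sum.
            row.append(max(0.0, total))
        output.append(row)
    return output

# Toy 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = slide_patch(image, kernel)
print(feature_map)
```

Neighboring window positions share pixels, which is the overlap mentioned above. In this toy case the only non-zero responses appear where the window straddles the edge.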

The last layer has 10,000 potential neuron outputs; each one of those activations corresponds to a particular label that identifies the image. The first class might be a cat; the second class might be a car; and so on for all the 10,000 classes that ImageNet has. If the first neuron is firing the most out of the 10,000, then the input is identified as belonging to the first class, a cat.
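The "firing the most" rule above amounts to picking the index of the largest output activation. A minimal sketch follows, using three made-up labels and activation values in place of the 10,000 outputs; the function name `predict_label` is hypothetical.

```python
def predict_label(activations, labels):
    # Index of the output neuron with the largest activation.
    best = max(range(len(activations)), key=lambda i: activations[i])
    return labels[best]

# Illustrative values only; a real network would produce one per class.
labels = ["cat", "car", "zoo"]
activations = [4.1, 0.3, 1.7]

predicted = predict_label(activations, labels)
print(predicted)
```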

The drawback of the supervised approach is that you must apply labels to images while training. This is a car. This is a zoo. Etc.


- Interview with Reza Zadeh
