No Free Lunch

A fundamental topic, and yet one often left until late in one’s studies, is the so-called "No Free Lunch" theorem. If I were asked to summarize the lesson of this theorem in one line, it would simply be:

Machine learning is not magic

And this is a very important lesson indeed. Perhaps the best intuition behind the theorem is gained from the following simple example. Assume we see a sequence of numbers,

\[x = \{ 1, 3, 9, \ldots\}\]

and we are asked to predict the next number in the sequence. Most of us would probably predict 27, giving \(x = \{ 1, 3, 9, 27, \ldots\}\), on the assumption that each term of the sequence is generated by \(x_t = 3 x_{t-1}\).

However, nothing stops us from instead believing the hypothesis that this sequence is simply the output of a random number generator, in which case the next item of the sequence cannot easily be predicted. Even if this seems very unlikely (and Occam’s razor is a good argument here), we cannot completely rule out this hypothesis without seeing all the data points.
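The same point can be made without appealing to randomness at all. As a minimal sketch, the two hypotheses below (the function names `geometric` and `quadratic` are made up for this illustration, and the quadratic rule is my own example rather than one from the text above) both reproduce the observed values 1, 3, 9 exactly, yet they disagree about what comes next:

```python
# Two hypothetical rules that agree on every observed point.
observed = [1, 3, 9]

def geometric(t):
    """x_t = 3 * x_{t-1} with x_0 = 1, i.e. x_t = 3**t."""
    return 3 ** t

def quadratic(t):
    """An alternative rule, x_t = 2*t**2 + 1, chosen to fit the same data."""
    return 2 * t * t + 1

hypotheses = (geometric, quadratic)

# Both hypotheses are perfectly consistent with the data seen so far.
for h in hypotheses:
    assert [h(t) for t in range(len(observed))] == observed

# They still disagree on the next, unseen term.
predictions = {h.__name__: h(len(observed)) for h in hypotheses}
print(predictions)
```

The data alone cannot pick between the two rules; preferring the first is an assumption about which kinds of rules are plausible.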

"Everything, but the data, is an assumption"

Zoubin Ghahramani, MSR AI Summer School 2017

The only reason machine learning works at all is the assumptions we make about the problem. We call these assumptions the model. Whatever assumptions we make in our model will, of course, only help prediction on the types of problems where those assumptions hold, while hindering prediction on other types of problems. Wolpert et al. show that, as a consequence, when performance is averaged over all possible input data distributions, we cannot expect any model to do better than random guessing.
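A toy version of this averaging argument can be checked by brute force. The sketch below is only an illustration under strong simplifying assumptions (binary labels, eight possible inputs, every labelling of those inputs treated as equally likely); it is not the construction used in the theorem's proof, and the learner names are hypothetical. It trains two very different learners on five inputs and scores them on the three held-out inputs, averaged over all 256 possible target functions:

```python
from itertools import product

# All 8 possible 3-bit inputs; 5 are seen in training, 3 are held out.
inputs = list(product([0, 1], repeat=3))
train_x, test_x = inputs[:5], inputs[5:]

def majority_learner(train):
    """Predict the most common training label everywhere (ties go to 1)."""
    ones = sum(y for _, y in train)
    guess = int(2 * ones >= len(train))
    return lambda x: guess

def always_zero_learner(train):
    """Ignore the data and always predict 0."""
    return lambda x: 0

def average_test_accuracy(learner):
    """Mean held-out accuracy over every possible labelling of the inputs."""
    targets = list(product([0, 1], repeat=len(inputs)))  # 2**8 = 256 functions
    correct = 0
    for labels in targets:
        target = dict(zip(inputs, labels))
        model = learner([(x, target[x]) for x in train_x])
        correct += sum(model(x) == target[x] for x in test_x)
    return correct / (len(targets) * len(test_x))

results = {
    learner.__name__: average_test_accuracy(learner)
    for learner in (majority_learner, always_zero_learner)
}
print(results)
```

Both learners score exactly 0.5 on the held-out inputs. For any fixed training set, the unseen labels are equally likely to be 0 or 1 under this uniform average, so whatever a learner predicts there is right half of the time.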

This is often cited as the nail in the coffin for the idea of a universal learning algorithm, an algorithm that can learn any problem. Theoretically this is certainly true; however, we are interested only in learning real-world problems, whose data lies in a specific subset of all possible distributions.
