Notebook 2: Minimizing a function

Here, we will study how to minimize a function in practice. Most machine learning algorithms, and many other AI algorithms, can be reduced to the problem of minimizing a certain function.

Importing the libraries

We start by importing NumPy and Matplotlib, as we did in the previous session.

In [1]:
import numpy as np

%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [15, 6]

Minimizing a function

Minimizing a function means finding the argument for which it takes its minimal value.
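
In mathematical notation, this is often written $$x^\star = \operatorname{arg\,min}_x f(x),$$ where $x^\star$ denotes the argument at which $f$ reaches its smallest value $f(x^\star) = \min_x f(x)$.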

Let us start by considering a simple function: $$f(x) = (x-2)^2 + 1$$

In [2]:
def f(x):
    return (x-2)*(x-2)+1
In [3]:
x_values = np.linspace(-4,4)
plt.plot(x_values, f(x_values))
Out[3]:
[<matplotlib.lines.Line2D at 0x117b29160>]

High-school style minimization

You might have learned in high school that, to minimize a function, you first compute its derivative, then find the values for which the derivative is zero.

You then compute the second derivative (i.e. the derivative of the derivative) to check whether such a value is indeed a minimum: if the second derivative is positive, it is a minimum; if it is negative, it is a maximum; if it is zero, we cannot conclude.

In our case, the derivative is: $$\frac{d f(x)}{dx} = 2\cdot(x-2)$$

And the second derivative is: $$\frac{d^2 f(x)}{dx^2} = 2$$

We can already see that the second derivative is positive everywhere (a function with this property is called "convex"). This means that if the derivative is zero at some value of x, then f necessarily attains a minimum there.

Here, it is easy to see that the derivative is 0 for x=2. Therefore, the minimum of the function is f(2)=1, attained at x=2.
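
As a quick sanity check, we can redo this computation symbolically. The sketch below uses SymPy, which is assumed to be installed in your environment (it is not used anywhere else in this notebook):

In [ ]:
# Symbolic check of the high-school computation (assumes SymPy is installed)
import sympy as sp

x_sym = sp.symbols('x')
f_sym = (x_sym - 2)**2 + 1

d1 = sp.diff(f_sym, x_sym)     # first derivative: 2*x - 4
d2 = sp.diff(f_sym, x_sym, 2)  # second derivative: 2

print("f'(x) =", sp.expand(d1))
print("f''(x) =", d2)
print("critical points:", sp.solve(d1, x_sym))   # [2]
print("minimal value:", f_sym.subs(x_sym, 2))    # 1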

Gradient Descent

This method for finding a minimum works fine for simple cases. But in practice we will encounter extremely complicated functions, for which this approach is generally not practical.

Instead, we will now see an iterative method called Gradient Descent.

Let us define a function corresponding to the derivative of f:

In [4]:
def deriv_f(x):
    return 2*(x-2)
In [5]:
x_values = np.linspace(-4,4)
plt.plot(x_values, f(x_values))
plt.plot(x_values, deriv_f(x_values), "--")
Out[5]:
[<matplotlib.lines.Line2D at 0x117b75630>]
In [6]:
deriv_f(0)
Out[6]:
-4

Let us choose a starting value for x:

In [7]:
x0 = 0

Now let us update x0 by moving in the direction opposite to the derivative (the gradient): since the derivative at x0 = 0 is negative, f is decreasing there, so we must move to the right. We take a learning rate of 0.2.
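
In formula form, each step of gradient descent applies the update $$x \leftarrow x - \eta \cdot \frac{d f(x)}{dx},$$ where $\eta$ denotes the learning rate (here $\eta = 0.2$).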

In [8]:
print("x0:", x0)
print("f(x0):", f(x0))
print("f'(x0):", deriv_f(x0))

x0 = x0 - 0.2 * deriv_f(x0)
x0: 0
f(x0): 5
f'(x0): -4

You can execute the cell above several times with Ctrl+Enter to see the evolution of x0 and f(x0).
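
Instead of re-executing the cell by hand, we can also wrap the update in a loop. The following sketch (the number of iterations is an arbitrary choice) shows x0 converging to the minimum at x=2:

In [ ]:
# Gradient descent on f, run as a loop instead of repeated cell executions
x0 = 0
learning_rate = 0.2
for step in range(1, 11):
    x0 = x0 - learning_rate * deriv_f(x0)
    print("step", step, ": x0 =", round(x0, 6), ", f(x0) =", round(f(x0), 6))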

Let us now do the same thing, but more graphically:

In [9]:
x0 = 0
In [10]:
print("x0:", x0)
print("f(x0):", f(x0))
print("f'(x0):", deriv_f(x0))

plt.plot(x_values, f(x_values))
plt.plot([x0], [f(x0)], "ro")

x0 = x0 - 0.2 * deriv_f(x0)

# Press Ctrl+Enter in this cell several times to see the evolution
x0: 0
f(x0): 5
f'(x0): -4

More problematic cases

What happens if there is no minimum? What happens if there is more than one minimum?

Case of an unbounded function

In [11]:
def f2(x):
    return np.cos(x)**2 + 1 - x
In [12]:
def deriv_f2(x):
    return -2*np.cos(x)*np.sin(x) - 1
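
For reference, the derivative coded above follows from the chain rule applied to $f_2(x) = \cos^2(x) + 1 - x$: $$\frac{d f_2(x)}{dx} = -2\cos(x)\sin(x) - 1$$
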
In [13]:
x_values = np.linspace(-5,5)
In [14]:
plt.plot(x_values, f2(x_values))
Out[14]:
[<matplotlib.lines.Line2D at 0x117d59cf8>]
In [15]:
x0 = 0
In [16]:
print("x0:", x0)
print("f(x0):", f2(x0))
print("f'(x0):", deriv_f2(x0))

plt.plot(x_values, f2(x_values))
plt.plot([x0], [f2(x0)], "ro")

x0 = x0 - 0.2 * deriv_f2(x0)
x0: 0
f(x0): 2.0
f'(x0): -1.0
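
If we iterate this update, the iterates keep moving to the right: the derivative $-2\cos(x)\sin(x) - 1 = -\sin(2x) - 1$ is never positive. But f2 decreases without bound, so there is no minimum to reach. The loop below is a sketch with an arbitrary step count; note that the iterates can also slow down and stall near points where the derivative happens to vanish ($\sin(2x) = -1$), even though those points are not minima:

In [ ]:
# Gradient descent on the unbounded function f2: there is no minimum to find
x0 = 0
for step in range(1, 51):
    x0 = x0 - 0.2 * deriv_f2(x0)
    if step % 10 == 0:
        print("step", step, ": x0 =", round(x0, 4), ", f2(x0) =", round(f2(x0), 4))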

Case of a function with several minima

In [17]:
def f3(x):
    return 2*np.sin(x-2)**2 + np.cos(x)
In [18]:
x_values = np.linspace(-5,5)
plt.plot(x_values, f3(x_values))
Out[18]:
[<matplotlib.lines.Line2D at 0x117f24a58>]
In [19]:
def deriv_f3(x):
    return 4*np.sin(x-2)*np.cos(x-2) - np.sin(x)
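
Again for reference, the derivative coded above comes from the chain rule applied to $f_3(x) = 2\sin^2(x-2) + \cos(x)$: $$\frac{d f_3(x)}{dx} = 4\sin(x-2)\cos(x-2) - \sin(x)$$
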
In [20]:
x0 = 0
In [21]:
print("x0:", x0)
print("f(x0):", f3(x0))
print("f'(x0):", deriv_f3(x0))

plt.plot(x_values, f3(x_values))
plt.plot([x0], [f3(x0)], "ro")

x0 = x0 - 0.2 * deriv_f3(x0)
x0: 0
f(x0): 2.653643620863612
f'(x0): 1.5136049906158566
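
Because f3 has several local minima, where gradient descent ends up depends on where it starts. The sketch below (the starting points and the iteration count are arbitrary choices) runs the same update from several values of x0; different starting points will typically settle into different local minima:

In [ ]:
# Gradient descent on f3 from several starting points
for start in [-4.0, 0.0, 4.0]:
    x0 = start
    for _ in range(100):
        x0 = x0 - 0.2 * deriv_f3(x0)
    print("start", start, "-> x0 =", round(x0, 4), ", f3(x0) =", round(f3(x0), 4))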