Here, we will study how to minimize a function in practice. Most machine learning algorithms, and many other AI algorithms, can be reduced to the problem of minimizing a certain function.
We start by importing NumPy and Matplotlib, as we did in the previous session.
import numpy as np
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [15, 6]
Minimizing a function means finding the argument for which it takes its smallest value.
Let us start by considering a simple function: $$f(x) = (x-2)^2 + 1$$
def f(x):
    return (x - 2)**2 + 1
x_values = np.linspace(-4,4)
plt.plot(x_values, f(x_values))
You might have learned in high school that, to minimize a function, you first compute its derivative and then find the values for which the derivative is zero.
You then compute the second derivative (i.e. the derivative of the derivative) to check whether it is a minimum: if the second derivative is positive, it is a minimum; if it is negative, it is a maximum; if it is zero, we cannot conclude.
In our case, the derivative is: $$\frac{d f(x)}{dx} = 2\cdot(x-2)$$
And the second derivative is: $$\frac{d^2 f(x)}{dx^2} = 2$$
We can already see that the second derivative is positive everywhere (in such a case, the function is called "convex"). This means that if the derivative is zero for some x value, then it is necessarily a minimum.
Here, it is easy to see that the derivative is 0 for x=2. Therefore, the function reaches its minimum value f(2)=1 at x=2.
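As a quick sanity check, here is a minimal sketch of the same computation done symbolically, assuming the sympy package is available (the variable names are just for illustration):
import sympy
xs = sympy.symbols('x')
expr = (xs - 2)**2 + 1
critical_points = sympy.solve(sympy.diff(expr, xs), xs)    # derivative 2*(x - 2) is zero at x = 2
print(critical_points, expr.subs(xs, critical_points[0]))  # -> [2] 1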
This method for finding a minimum works fine for simple cases. But we will encounter extremely complicated functions in practice. For such functions, this approach is in general not practical.
Instead, we will now see an iterative method called Gradient Descent.
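The idea is to start from some value of x and repeatedly take a small step in the direction opposite to the derivative: $$x \leftarrow x - \eta \cdot \frac{d f(x)}{dx}$$ where $\eta$ is a small positive number called the learning rate.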
Let us define a function corresponding to the derivative of f:
def deriv_f(x):
    return 2 * (x - 2)
x_values = np.linspace(-4,4)
plt.plot(x_values, f(x_values))
plt.plot(x_values, deriv_f(x_values), "--")
deriv_f(0)
Let us choose a starting value for x:
x0 = 0
Now let us update x0 by taking a step in the direction opposite to the gradient, using a learning rate of 0.2:
print("x0:", x0)
print("f(x0):", f(x0))
print("f'(x0):", deriv_f(x0))
x0 = x0 - 0.2 * deriv_f(x0)
You can execute the cell above several times with Ctrl+Enter to see the evolution of x0 and f(x0).
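Instead of re-running the cell by hand, we can also put the update in a loop. The sketch below is one way to do it (the number of iterations is an arbitrary choice):
x0 = 0
for i in range(20):
    x0 = x0 - 0.2 * deriv_f(x0)   # gradient descent step with learning rate 0.2
    print(i, x0, f(x0))
# x0 approaches 2 and f(x0) approaches 1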
Let us now do the same thing, but more graphically:
x0 = 0
print("x0:", x0)
print("f(x0):", f(x0))
print("f'(x0):", deriv_f(x0))
plt.plot(x_values, f(x_values))
plt.plot([x0], [f(x0)], "ro")
x0 = x0 - 0.2 * deriv_f(x0)
#Press Ctrl+Enter in this cell several times to see the evolution
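The same loop can also record every intermediate point and plot the whole trajectory at once. This is only a sketch; the number of steps is an arbitrary choice:
x0 = 0
trajectory = [x0]
for i in range(15):
    x0 = x0 - 0.2 * deriv_f(x0)
    trajectory.append(x0)
trajectory = np.array(trajectory)
plt.plot(x_values, f(x_values))
plt.plot(trajectory, f(trajectory), "ro-")   # successive values of x0 on the curve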
Let us now try gradient descent on a more complicated function:
def f2(x):
    return np.cos(x)**2 + 1 - x
def deriv_f2(x):
    return -2 * np.cos(x) * np.sin(x) - 1
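To make sure we did not make a mistake in the derivative, we can compare deriv_f2 to a numerical finite-difference approximation. This is only a quick sketch, with an arbitrary test point and step size h:
h = 1e-6
x_test = 1.3
print(deriv_f2(x_test))
print((f2(x_test + h) - f2(x_test - h)) / (2 * h))   # should be very close to the line above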
x_values = np.linspace(-5,5)
plt.plot(x_values, f2(x_values))
x0 = 0
print("x0:", x0)
print("f(x0):", f2(x0))
print("f'(x0):", deriv_f2(x0))
plt.plot(x_values, f2(x_values))
plt.plot([x0], [f2(x0)], "ro")
x0 = x0 - 0.2 * deriv_f2(x0)
Here is yet another function to experiment with:
def f3(x):
    return 2 * np.sin(x - 2)**2 + np.cos(x)
x_values = np.linspace(-5,5)
plt.plot(x_values, f3(x_values))
def deriv_f3(x):
    return 4 * np.sin(x - 2) * np.cos(x - 2) - np.sin(x)
x0 = 0
print("x0:", x0)
print("f(x0):", f3(x0))
print("f'(x0):", deriv_f3(x0))
plt.plot(x_values, f3(x_values))
plt.plot([x0], [f3(x0)], "ro")
x0 = x0 - 0.2 * deriv_f3(x0)
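This function has several local minima, so the result of gradient descent can depend on where we start. The sketch below simply runs the same update in a loop from a few different starting values (the starting points and iteration count are arbitrary choices):
for start in [-4.0, 0.0, 4.0]:
    x0 = start
    for i in range(100):
        x0 = x0 - 0.2 * deriv_f3(x0)   # gradient descent step with learning rate 0.2
    print("start:", start, "-> x0:", x0, "f3(x0):", f3(x0))
# different starting points may end up in different local minima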