As usual, we start by importing the libraries we will use:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d
import pandas as pd
We define some data we are going to use for training a model. It represents information about 30 people: how many cigarettes they smoked per day, their bmi, their biological sex and the age at which they died. (NB: this is artificial data, not from a real study.)
Note that to encode categorical variables, such as biological sex, we use a binary variable equal to 1 if the person belongs to a given category. We could have used a variable is_female instead of is_male. However, it is not useful to use both is_male and is_female, because there is a simple linear relation between the two features (is_female = 1 - is_male).
daily_cig_smoked = np.array([24, 37, 11, 33, 27, 27, 6, 31, 28, 15, 33, 39, 39, 30, 26, 8, 10,
0, 8, 33, 14, 32, 24, 10, 19, 13, 24, 38, 5, 0])
bmi = np.array([44.4, 47.7, 14.1, 49.3, 20.4, 38.2, 22.2, 17.1, 23.3, 31.4, 25.9,
26. , 40.4, 31.8, 40.8, 49.8, 46.7, 24.8, 35.9, 20.5, 38.9, 14.4,
29.6, 25.2, 39.4, 44.3, 47.2, 42.3, 30.8, 33.9])
is_male = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
1, 0, 0, 0, 0, 1, 1, 0])
age_of_death = np.array([43.4, 25.2, 73.1, 25.5, 70.5, 52.1, 78.6, 64.6, 71.3, 66.8, 63.6,
66.9, 42.6, 57.5, 48.6, 47.9, 46.6, 81.3, 75.6, 66.7, 58.4, 61.3,
62.6, 81.8, 63.9, 57.4, 43.8, 38.6, 75.9, 81.4])
# We put all variables in one big array:
cig_bmi_male_age = np.vstack((daily_cig_smoked, bmi, is_male, age_of_death))
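As a quick sanity check (an addition, not in the original), we can confirm that the stacked array has one row per variable and one column per person:
print(cig_bmi_male_age.shape)  # expected: (4, 30)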
We plot our data, with daily_cig_smoked on one axis, bmi on the other and age_of_death on the vertical axis. The is_male feature is represented by a color code (blue if male, red if female):
fig = plt.figure(figsize=(7,5))
ax = fig.add_subplot(111, projection='3d')
cmap, norm = matplotlib.colors.from_levels_and_colors([-1,0.5,2], ["red", "blue"])
ax.scatter3D(daily_cig_smoked, bmi, age_of_death, c=is_male, cmap=cmap, norm=norm)
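To make the plot easier to read, we could also label the axes (an optional addition to the cell above):
ax.set_xlabel("cigarettes per day")
ax.set_ylabel("bmi")
ax.set_zlabel("age of death")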
We use the pandas library to display the example values nicely:
pd.DataFrame(cig_bmi_male_age.T,
             columns=("daily cigarettes", "bmi", "is male", "age of death"))
We start by defining our model, with four parameters $\theta$: $$age = \theta_0 + \theta_1\times cig + \theta_2\times bmi + \theta_3\times ismale$$
def model(cig, bmi, is_male, thetas):
    return thetas[0] + thetas[1]*cig + thetas[2]*bmi + thetas[3]*is_male
As before, we define the loss, i.e. the Mean Squared Error: $$loss(\vec{\theta}) = \frac{1}{N}\cdot\sum_i (model(\vec{\theta}, x_i)-y_i)^2$$
def loss(thetas, cig, bmi, is_male, age_of_death):
    return np.mean((model(cig, bmi, is_male, thetas) - age_of_death)**2)
We can now define the gradient. Writing the error on example $i$ as $$error_i=\theta_0+ \theta_1 \times cig_i + \theta_2\times bmi_i + \theta_3\times ismale_i - age_i$$ the four components of the gradient are: $$\frac{\partial loss}{\partial\theta_0}=\frac{2}{N}\cdot\sum_i error_i \qquad \frac{\partial loss}{\partial\theta_1}=\frac{2}{N}\cdot\sum_i cig_i\times error_i$$ $$\frac{\partial loss}{\partial\theta_2}=\frac{2}{N}\cdot\sum_i bmi_i\times error_i \qquad \frac{\partial loss}{\partial\theta_3}=\frac{2}{N}\cdot\sum_i ismale_i\times error_i$$
def grad_loss(thetas, cig, bmi, is_male, age_of_death):
    error = model(cig, bmi, is_male, thetas) - age_of_death
    return np.array([np.mean(2*error), np.mean(2*error*cig),
                     np.mean(2*error*bmi), np.mean(2*error*is_male)])
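To gain confidence in this analytical gradient, we can compare it against a numerical finite-difference approximation (a minimal sketch; numerical_grad and test_thetas are new names introduced here):
def numerical_grad(thetas, eps=1e-6):
    # Central finite differences of the loss, one parameter at a time
    grad = np.zeros_like(thetas)
    for j in range(len(thetas)):
        step = np.zeros_like(thetas)
        step[j] = eps
        grad[j] = (loss(thetas + step, daily_cig_smoked, bmi, is_male, age_of_death)
                   - loss(thetas - step, daily_cig_smoked, bmi, is_male, age_of_death)) / (2*eps)
    return grad

test_thetas = np.array([50.0, -1.0, -0.5, 2.0])  # arbitrary test point
print(grad_loss(test_thetas, daily_cig_smoked, bmi, is_male, age_of_death))
print(numerical_grad(test_thetas))  # should be nearly identical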
Next, we initialize some thetas randomly:
thetas = np.random.randn(4)
print ("The initial loss is:", loss(thetas, daily_cig_smoked, bmi, is_male, age_of_death))
We now run gradient descent. Note that with these unscaled features we need a small learning rate and many iterations to converge:
for numiter in range(20000):
thetas = thetas - 0.0001*grad_loss(thetas, daily_cig_smoked, bmi, is_male, age_of_death)
print(f"loss: {loss(thetas, daily_cig_smoked, bmi, is_male, age_of_death)}")
print(thetas)
print(grad_loss(thetas, daily_cig_smoked, bmi, is_male, age_of_death))
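To visualize how slowly this converges, we could also record the loss during training and plot it (a minimal sketch; thetas_hist and losses are new names introduced here):
# Re-run the descent from a fresh random start, recording the loss every 100 iterations
thetas_hist = np.random.randn(4)
losses = []
for numiter in range(20000):
    thetas_hist = thetas_hist - 0.0001*grad_loss(thetas_hist, daily_cig_smoked, bmi, is_male, age_of_death)
    if numiter % 100 == 0:
        losses.append(loss(thetas_hist, daily_cig_smoked, bmi, is_male, age_of_death))
plt.plot(losses)
plt.xlabel("iteration / 100")
plt.ylabel("loss")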
Now, let us make a prediction:
print("If I smoke 10 cigarets per days, have a bmi of 22 and am a woman, the model predict I will die at",
model(10, 22, 0, thetas))
As we saw, features with very different ranges make learning slower. The following code scales each feature to have mean 0 and a spread (max minus min) of 1, so all scaled values fall between -1 and 1.
def scale_feature(feature):
min_val = np.min(feature)
max_val = np.max(feature)
mean_feat = np.mean(feature)
def scaler(x):
return (x - mean_feat)/(max_val - min_val)
return (feature - mean_feat)/(max_val - min_val), scaler
daily_cig_smoked_scaled, daily_cig_smoked_scaler = scale_feature(daily_cig_smoked)
bmi_scaled, bmi_scaler = scale_feature(bmi)
is_male_scaled, is_male_scaler = scale_feature(is_male)
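We can verify that each scaled feature indeed has mean 0 and a spread of 1 (a quick check, not in the original):
for name, feat in [("cig", daily_cig_smoked_scaled), ("bmi", bmi_scaled), ("is male", is_male_scaled)]:
    print(name, feat.mean().round(3), (feat.max() - feat.min()).round(3))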
The scaled data now looks like this:
pd.DataFrame(np.vstack((daily_cig_smoked_scaled, bmi_scaled, is_male_scaled, age_of_death)).T,
             columns=("daily cigarettes", "bmi", "is male", "age of death"))
Let us redo the training. Note that we can now use a much larger learning rate and thus minimize the loss much faster.
thetas = np.random.randn(4)
for numiter in range(100):
thetas = thetas - 0.1*grad_loss(thetas, daily_cig_smoked_scaled, bmi_scaled, is_male_scaled, age_of_death)
print(f"loss: {loss(thetas, daily_cig_smoked_scaled, bmi_scaled, is_male_scaled, age_of_death)}")
print(thetas)
print(grad_loss(thetas, daily_cig_smoked_scaled, bmi_scaled, is_male_scaled, age_of_death))
print("If I smoke 10 cigarets per days, have a bmi of 22 and am a woman, the model predict I will die at",
model(daily_cig_smoked_scaler(10), bmi_scaler(22), is_male_scaler(1), thetas))
Let us go back to the case of one feature/variable. This time, we consider the relation between life expectancy and BMI.
plt.plot(bmi, age_of_death, "o")
We can see that the relation is not really linear. Maybe we can use a polynomial model instead of a linear model. Fortunately, we can re-use our multi-variable linear model by replacing the other features with powers of the bmi, as the equation below shows.
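Concretely, writing $b$ for the scaled bmi, the model we fit is: $$age = \theta_0 + \theta_1\times b + \theta_2\times b^2 + \theta_3\times b^3$$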
thetas_bmi = np.random.randn(4)
We then do the gradient descent as before:
for numiter in range(1000):
thetas_bmi = thetas_bmi - 0.1*grad_loss(thetas_bmi, bmi_scaler(bmi),
bmi_scaler(bmi)**2,
bmi_scaler(bmi)**3, age_of_death)
print(f"loss: {loss(thetas_bmi, bmi_scaler(bmi), bmi_scaler(bmi)**2, bmi_scaler(bmi)**3, age_of_death)}")
print(thetas_bmi)
print(grad_loss(thetas_bmi, bmi_scaler(bmi), bmi_scaler(bmi)**2,
bmi_scaler(bmi)**3, age_of_death))
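As a cross-check (an addition, not part of the original workflow), NumPy's closed-form polynomial fit should yield similar coefficients. Note that np.polyfit returns them from the highest degree down, so we reverse the order to match thetas_bmi:
# Closed-form cubic fit on the scaled bmi; reversed to match [theta_0, theta_1, theta_2, theta_3]
print(np.polyfit(bmi_scaler(bmi), age_of_death, 3)[::-1])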
bmi_x = np.linspace(12, 50)
plt.plot(bmi, age_of_death, "o")
plt.plot(bmi_x, model(bmi_scaler(bmi_x), bmi_scaler(bmi_x)**2,
bmi_scaler(bmi_x)**3, thetas_bmi))
We can see that the model now fits the data much better than a simple linear regression would.