Modeling Non-linear Relationships

In this notebook, we will use basic feature transformations (feature engineering) to model non-linear relationships using linear models.

Toy Data set

To make the model-fitting process easy to visualize, we will use a simple synthetic data set.
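The data-generating code is not reproduced here, so the following is only a hypothetical stand-in with the same flavor: two inputs and one response that has a periodic component and some curvature. The name `make_toy_data`, the coefficients, and the noise level are all assumptions made for illustration.

```python
import numpy as np

def make_toy_data(n=200, seed=0):
    """Hypothetical toy data: two inputs and one noisy non-linear response."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-2.0, 2.0, size=(n, 2))   # inputs x1 and x2
    y = (
        np.sin(2 * np.pi * 0.5 * X[:, 0])     # periodic structure in x1
        + 0.5 * X[:, 0] * X[:, 1]             # curvature from an interaction
        + 0.3 * X[:, 1] ** 2                  # curvature in x2
        + rng.normal(scale=0.1, size=n)       # observation noise
    )
    return X, y

X, y = make_toy_data()
```

With two inputs and one response, each observation is a point in three dimensions, which is what makes a 3D scatter plot a natural way to view it.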

We can visualize the data in three dimensions:

Basic Linear Model

We normally start with a basic linear model with an intercept term.
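As a minimal sketch of such a model, assuming ordinary least squares as the fitting procedure, we can prepend a column of ones to the inputs and solve with NumPy. The helper names `add_intercept` and `fit_least_squares` are hypothetical, and the data below is an exactly linear example rather than the notebook's data.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so that theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_least_squares(Phi, y):
    """Return the theta that minimizes the squared error of Phi @ theta vs y."""
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

# Hypothetical data that is exactly linear in the two inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

theta = fit_least_squares(add_intercept(X), y)
```

Because this example data is noise-free and linear, the fit recovers the intercept and both slopes; on data with curvature or periodic structure the same model leaves systematic residuals.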

To track the performance of our models, we use the following plotting functions.

Examining our latest model:

Examining the above data we see that there is some periodic structure as well as some curvature. Can we fit this data with a linear model?

What Does It Mean to Be a Linear Model?

Linear models are linear combinations of features. These models are therefore linear in the parameters, but not necessarily in the underlying data. We can encode non-linearity in our data through the use of feature functions:

$$ f_\theta\left( x \right) = \phi(x)^T \theta = \sum_{j=0}^{p} \phi(x)_j \theta_j $$

where $\phi$ is an arbitrary function from $x\in \mathbb{R}^d$ to $\phi(x) \in \mathbb{R}^{p+1}$. We could also write $\phi$ as a collection of separate functions $\phi_j$, each mapping $x\in \mathbb{R}^d$ to $\phi_j(x) \in \mathbb{R}$:

$$ \phi(x) = \left[\phi_0(x), \phi_1(x), \ldots, \phi_p(x) \right] $$

We often refer to these $\phi_j$ as feature functions. Their design plays a critical role both in how we capture prior knowledge and in our ability to fit complicated data.
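To make the definition concrete, here is a small sketch with a hypothetical feature map from $\mathbb{R}^2$ to $\mathbb{R}^4$ (so $p = 3$). The particular features chosen are an assumption for illustration. The check at the end shows what "linear in the parameters" means: scaling and adding parameter vectors scales and adds the predictions, even though the features include a non-linear product term.

```python
import numpy as np

def phi(X):
    """Hypothetical feature map from R^2 to R^4: [1, x1, x2, x1 * x2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2])

def f(theta, X):
    """Model prediction phi(x)^T theta for every row of X."""
    return phi(X) @ theta

X = np.array([[1.0, 2.0], [3.0, 4.0]])
theta_a = np.array([1.0, 0.0, 0.0, 1.0])
theta_b = np.array([0.0, 1.0, -1.0, 0.0])

# Linearity in theta: both sides should agree for any weights.
lhs = f(2 * theta_a + 3 * theta_b, X)
rhs = 2 * f(theta_a, X) + 3 * f(theta_b, X)
```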

Introducing Non-linear Features

In the following, we will add a few different sine functions at different frequencies and offsets.

$$ \sin\left(2 \pi \cdot \textbf{frequency} \cdot X + \textbf{phase}\right) $$

Note that for this to remain a linear model, we cannot make the frequency or phase of the sine function a model parameter, because the prediction would then no longer be a linear function of the parameters. Instead, these are hyperparameters of the model that need to be tuned using either domain knowledge or a search procedure.
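One way to sketch this, assuming the sine is applied elementwise to every input column, is shown below. The `(frequency, phase)` pairs in `SETTINGS` are hypothetical hyperparameter choices fixed before fitting, not values taken from this notebook, and the target is a made-up periodic signal.

```python
import numpy as np

def sine_features(X, frequency, phase=0.0):
    """sin(2*pi*frequency*X + phase), applied elementwise to every column."""
    return np.sin(2 * np.pi * frequency * X + phase)

# Hypothetical hyperparameters: (frequency, phase) pairs fixed before fitting.
SETTINGS = [(0.5, 0.0), (1.0, 0.0), (1.0, np.pi / 2)]

def phi_periodic(X):
    """Intercept, raw inputs, and one block of sine features per setting."""
    blocks = [np.ones((X.shape[0], 1)), X]
    blocks += [sine_features(X, freq, ph) for freq, ph in SETTINGS]
    return np.hstack(blocks)

# Made-up target that is periodic in the first input.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(100, 2))
y = 2.0 * np.sin(2 * np.pi * 1.0 * X[:, 0])

Phi = phi_periodic(X)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

Only `theta` is learned here. The fit succeeds because one of the fixed settings happens to match the frequency of the target; with a mismatched frequency the linear solver has no way to adjust it.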

Notice that to make predictions, we must first apply the same feature function $\Phi$ to the new data.
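A minimal sketch of that prediction step, using a hypothetical one-input feature function named `phi` and made-up training data, might look like this:

```python
import numpy as np

def phi(X):
    """Hypothetical feature function: intercept, raw input, one sine term."""
    x = X[:, 0]
    return np.column_stack([np.ones(len(X)), x, np.sin(2 * np.pi * x)])

# Made-up, noise-free training data.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(100, 1))
y_train = 0.5 * X_train[:, 0] + np.sin(2 * np.pi * X_train[:, 0])

# Fit on the transformed training data.
theta, *_ = np.linalg.lstsq(phi(X_train), y_train, rcond=None)

# Predict by transforming the new points first; X_new @ theta would not
# even have matching shapes, since theta lives in feature space.
X_new = np.array([[0.25], [0.5]])
y_pred = phi(X_new) @ theta
```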

There is still some curvature. We can introduce additional polynomial terms to try to improve the fit of our model.

Can you guess the new number of features?
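The exact count depends on which terms are added, but one general rule is useful for checking a guess: the number of monomials of degree at most $k$ in $d$ inputs, including the constant, is $\binom{d+k}{k}$. The sketch below builds those monomials with a hypothetical helper `poly_features`; it is not necessarily the set of terms used in this notebook.

```python
import itertools
import math
import numpy as np

def poly_features(X, degree=2):
    """All monomials of the columns of X up to `degree`, constant included."""
    n, d = X.shape
    cols = []
    for k in range(degree + 1):
        # Each index tuple picks the columns multiplied together, e.g. (0, 1) -> x1*x2.
        for idx in itertools.combinations_with_replacement(range(d), k):
            picked = X[:, np.array(idx, dtype=int)]
            cols.append(np.prod(picked, axis=1))  # empty product gives the constant 1
    return np.column_stack(cols)

X = np.array([[2.0, 3.0]])
P = poly_features(X, degree=2)
n_expected = math.comb(2 + 2, 2)  # monomial count for d=2, degree=2
```

For two inputs and degree 2, the columns are ordered as $1, x_1, x_2, x_1^2, x_1 x_2, x_2^2$.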

Let's build a feature function that combines our features so far:
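One possible way to do this, sketched under the assumption that each family of features is its own function returning a block of columns, is to stack the blocks side by side. All function names, the sine frequency, and the target below are hypothetical.

```python
import numpy as np

def phi_linear(X):
    """Intercept plus the raw inputs."""
    return np.column_stack([np.ones(len(X)), X])

def phi_sine(X, frequency=1.0, phase=0.0):
    """Elementwise sine features at one fixed frequency and phase."""
    return np.sin(2 * np.pi * frequency * X + phase)

def phi_quadratic(X):
    """Second-order terms for two inputs: x1^2, x1*x2, x2^2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, x1 * x2, x2 ** 2])

def combine(*feature_fns):
    """Build one feature function that concatenates the outputs of several."""
    def phi(X):
        return np.hstack([fn(X) for fn in feature_fns])
    return phi

phi_all = combine(phi_linear, phi_sine, phi_quadratic)

# Made-up target with an offset, periodic structure, and curvature.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = (1.0 + np.sin(2 * np.pi * X[:, 0])
     + X[:, 0] * X[:, 1] + 0.5 * X[:, 1] ** 2)

Phi = phi_all(X)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

Keeping each family in its own function makes it easy to add or drop a block and refit, and the combined `phi_all` can be reused unchanged at prediction time.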

Success!