**by Joseph Gonzalez (Spring 2020)**

In this notebook we will explore a key part of data science, **feature engineering**: *the process of transforming the representation of model inputs to enable better model approximation.* Feature engineering enables you to:

- **encode** non-numeric features to be used as inputs to common numeric models
- capture **domain knowledge** (e.g., the perceived loudness of sound is the log of the intensity)
- **transform** complex relationships into simple linear relationships
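As a rough illustration of the first two points, the sketch below one-hot encodes a categorical column and log-transforms an intensity column. The `genre` and `intensity` arrays are hypothetical toy data, not part of the notebook's dataset.

```python
import numpy as np

# Hypothetical toy data: a categorical feature and a raw sound intensity.
genre = np.array(["rock", "jazz", "rock", "pop"])
intensity = np.array([1.0, 10.0, 100.0, 1000.0])

# Encode: one indicator column per distinct category (np.unique sorts them).
categories = np.unique(genre)
one_hot = (genre[:, None] == categories[None, :]).astype(float)

# Domain knowledge: perceived loudness is modeled as the log of intensity.
log_intensity = np.log10(intensity)

# Stack the engineered columns into a purely numeric feature matrix.
Phi = np.column_stack([one_hot, log_intensity])
print(Phi.shape)  # (4, 4)
```

Every row of `one_hot` contains exactly one 1, so a numeric model can now consume the categorical feature directly.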

In the past few lectures we have been exploring various models for **regression**. These are models that map from some domain to a continuous quantity.

So far we have been interested in modeling relationships from some numerical **domain** to a continuous quantitative **range**:

In this class we will focus on **Multiple Regression** in which we consider mappings from potentially high-dimensional input spaces onto the real line (i.e., $y \in \mathbb{R}$):

It is worth noting that this is distinct from **Multivariate Regression** in which we are predicting multiple (confusing?) response values (e.g., $y \in \mathbb{R}^q$).
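The distinction shows up in the shape of the response and of the fitted parameters. The sketch below uses `np.linalg.lstsq` as a stand-in for a regression fit; the sizes `n`, `d`, `q` and the random data are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q = 50, 3, 2                      # hypothetical sizes
X = rng.normal(size=(n, d))

# Multiple regression: many inputs, one response per row (y in R).
y = X @ np.array([1.0, -2.0, 0.5])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Multivariate regression: many inputs, q responses per row (y in R^q).
Y_multi = X @ rng.normal(size=(d, q))
Theta, *_ = np.linalg.lstsq(X, Y_multi, rcond=None)

print(theta.shape, Theta.shape)  # (3,) (3, 2)
```

In the multiple regression case the parameters form a single vector; in the multivariate case there is one parameter column per response.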

As usual, we will import a standard set of functions.

In [1]:

```
import numpy as np
import pandas as pd
```

In [2]:

```
import plotly.offline as py
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import cufflinks as cf
cf.set_config_file(offline=True, sharing=False, theme='ggplot');
```

In [3]:

```
from sklearn.linear_model import LinearRegression
```

Linear models are **linear combinations** of features. These models are therefore linear in the **parameters** but not necessarily in the underlying data. We can encode non-linearity in our data through the use of feature functions:

$$ f_\theta(x) = \phi(x)^T \theta = \sum_{j=0}^{p} \phi_j(x) \theta_j $$

where $\phi$ is an *arbitrary function* from $x\in \mathbb{R}^d$ to $\phi(x) \in \mathbb{R}^{p+1}$. Notationally, we might write these as a collection of separate feature functions $\phi_j$ from $x\in \mathbb{R}^d$ to $\phi_j(x) \in \mathbb{R}$:

$$ \phi(x) = \left[\phi_0(x), \phi_1(x), \ldots, \phi_p(x)\right] $$

We often refer to these $\phi_j$ as **feature functions** and their design plays a critical role in both how we capture prior knowledge and our ability to fit complicated data.
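To make "linear in the parameters but not in the data" concrete, here is a minimal sketch with a hypothetical feature map `phi_quadratic` (not one used in this notebook). A plain least-squares fit on the transformed features recovers a quadratic relationship exactly.

```python
import numpy as np

def phi_quadratic(x):
    """Hypothetical feature map: x -> [1, x, x^2]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2])

# Noise-free quadratic data: y = 3 - x + 0.5 x^2.
x = np.linspace(-2, 2, 41)
y = 3.0 - 1.0 * x + 0.5 * x ** 2

# The model is still linear in theta; only the inputs were transformed.
Phi = phi_quadratic(x)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(theta, 6))
```

The fitted `theta` matches the coefficients used to generate `y`, even though `y` is not a linear function of `x`.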

To demonstrate the power of feature engineering let's return to our earlier synthetic dataset.

In [4]:

```
synth_data = pd.read_csv("data/synth_data.csv.zip")
synth_data.head()
```


This dataset is simple enough that we can easily visualize it.

In [5]:

```
fig = go.Figure()
data_scatter = go.Scatter3d(x=synth_data["X1"], y=synth_data["X2"], z=synth_data["Y"],
                            mode="markers",
                            marker=dict(size=2))
fig.add_trace(data_scatter)
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0),
                  height=600)
fig
```

**Is the relationship between $y$ and $x_1$ and $x_2$ linear?**

Previously we fit a linear model to the data using scikit-learn.

In [6]:

```
model = LinearRegression()
model.fit(synth_data[["X1", "X2"]], synth_data[["Y"]])
```


Visualizing the model we obtained:

In [7]:

```
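def plot_plane(f, X, grid_points = 30):
    # Build an evenly spaced grid spanning the range of each input column.
    u = np.linspace(X[:,0].min(), X[:,0].max(), grid_points)
    v = np.linspace(X[:,1].min(), X[:,1].max(), grid_points)
    xu, xv = np.meshgrid(u, v)
    # Flatten the grid into an (n, 2) matrix of points and evaluate f on it.
    X = np.vstack((xu.flatten(), xv.flatten())).transpose()
    z = f(X)
    # Reshape the predictions back to the grid shape for the surface plot.
    return go.Surface(x=xu, y=xv, z=z.reshape(xu.shape), opacity=0.8)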
```
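The grid handling in `plot_plane` can be checked in isolation. The sketch below repeats the meshgrid, flatten, and reshape steps on a tiny grid, with a hypothetical function `f` standing in for `model.predict`.

```python
import numpy as np

u = np.linspace(0, 1, 3)
v = np.linspace(10, 20, 4)
xu, xv = np.meshgrid(u, v)              # both arrays have shape (4, 3)

# One row per grid point, one column per input dimension.
grid = np.vstack((xu.flatten(), xv.flatten())).transpose()

# Hypothetical stand-in for model.predict: f(x1, x2) = x1 + x2.
f = lambda X: X[:, 0] + X[:, 1]
z = f(grid).reshape(xu.shape)

print(grid.shape, z.shape)  # (12, 2) (4, 3)
```

Because `flatten` and `reshape` use the same row-major order, each entry of `z` lines up with the matching entries of `xu` and `xv`.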

In [8]:

```
fig = go.Figure()
fig.add_trace(data_scatter)
fig.add_trace(plot_plane(model.predict, synth_data[["X1", "X2"]].to_numpy(), grid_points=5))
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0),
                  height=600)
```

This wasn't a bad fit, but there is clearly more structure in the data than a plane can capture.

Examining the above data we see that there is some **periodic** structure. Let's define a feature function that might try to capture this periodic structure. In the following we will add a few different sine functions at different frequencies and offsets. Note that for this to remain a linear model, I cannot make the frequency or phase of the sine function a model parameter. Recall that in previous lectures we actually made the frequency and phase parameters of the model, and we were then required to use gradient descent to compute the loss-minimizing parameter values.

In [9]:

```
def phi_periodic(X):
    return np.hstack([
        X,
        np.sin(X),
        np.sin(10*X),
        np.sin(20*X),
        np.sin(X + 1),
        np.sin(10*X + 1),
        np.sin(20*X + 1)
    ])
```
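One way to see why fixed offsets are useful: at a fixed frequency, a sine with any phase is a linear combination of two sines with different fixed offsets, so the linear model can still fit an unknown phase. A wrong frequency, by contrast, cannot be repaired by reweighting. The sketch below checks both claims numerically; the targets (phase 0.3, frequency 13) are hypothetical.

```python
import numpy as np

x = np.linspace(0, 1, 200)

# Two fixed features at frequency 10, with offsets 0 and 1.
Phi = np.column_stack([np.sin(10 * x), np.sin(10 * x + 1)])

# Target with the same frequency but an unseen phase (0.3).
y_phase = 2.0 * np.sin(10 * x + 0.3)
theta, *_ = np.linalg.lstsq(Phi, y_phase, rcond=None)
err_phase = np.max(np.abs(Phi @ theta - y_phase))

# Target with a different frequency (13) that the features do not contain.
y_freq = np.sin(13 * x)
theta_freq, *_ = np.linalg.lstsq(Phi, y_freq, rcond=None)
err_freq = np.max(np.abs(Phi @ theta_freq - y_freq))

print(err_phase < 1e-8, err_freq > 1e-3)  # True True
```

This is why `phi_periodic` includes several fixed frequencies: each one has to be chosen ahead of time rather than learned.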

Creating the original $\mathbb{X}$ and $\mathbb{Y}$ matrices:

In [10]:

```
X = synth_data[["X1", "X2"]].to_numpy()
Y = synth_data[["Y"]].to_numpy()
```

Constructing the $\Phi$ matrix:

In [11]:

```
Phi = phi_periodic(X)
```

In [12]:

```
Phi.shape
```


Fitting the **linear model** to the transformed features:

In [13]:

```
model_phi = LinearRegression()
model_phi.fit(Phi, Y)
```


In [14]:

```
def predict_phi(X):
    return model_phi.predict(phi_periodic(X))
```

In [15]:

```
fig = go.Figure()
fig.add_trace(data_scatter)
fig.add_trace(plot_plane(predict_phi, X, grid_points=100))
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0),
                  height=600)
```
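The visual improvement can also be quantified by comparing mean squared error with and without the periodic features. The sketch below is self-contained and runs on hypothetical generated data rather than `synth_data`. It uses `np.linalg.lstsq` with an explicit intercept column as a stand-in for `LinearRegression`, and `fit_mse` is a helper introduced only for this example.

```python
import numpy as np

def phi_periodic(X):
    # Same feature map as above: raw inputs plus sines at fixed frequencies/offsets.
    return np.hstack([X, np.sin(X), np.sin(10*X), np.sin(20*X),
                      np.sin(X + 1), np.sin(10*X + 1), np.sin(20*X + 1)])

def fit_mse(features, y):
    # Hypothetical helper: least-squares fit with an intercept; returns training MSE.
    A = np.hstack([np.ones((features.shape[0], 1)), features])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((A @ theta - y) ** 2)

# Hypothetical data: a linear trend plus a periodic term and a little noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(500, 2))
y = (2 * X[:, 0] - X[:, 1] + np.sin(10 * X[:, 0] + 0.5)
     + 0.1 * rng.normal(size=500))

mse_linear = fit_mse(X, y)
mse_periodic = fit_mse(phi_periodic(X), y)
print(phi_periodic(X).shape)  # (500, 14)
print(mse_periodic < mse_linear)  # True
```

With two input columns and seven blocks in `phi_periodic`, the transformed matrix has 14 columns. On the real dataset the same comparison could be run with `model` and `model_phi`.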