In [1]:

```
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.offline as py
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
# import cufflinks as cf
# cf.set_config_file(offline=True, sharing=False, theme='ggplot');
from scipy.optimize import minimize
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
```

In [2]:

```
# Formatting options
# Big font helper
def adjust_fontsize(size=None):
    SMALL_SIZE = 8
    MEDIUM_SIZE = 10
    BIGGER_SIZE = 12
    if size is not None:
        SMALL_SIZE = MEDIUM_SIZE = BIGGER_SIZE = size
    plt.rcParams['font.size'] = SMALL_SIZE
    plt.rcParams['axes.titlesize'] = SMALL_SIZE
    plt.rcParams['axes.labelsize'] = MEDIUM_SIZE
    plt.rcParams['xtick.labelsize'] = SMALL_SIZE
    plt.rcParams['ytick.labelsize'] = SMALL_SIZE
    plt.rcParams['legend.fontsize'] = SMALL_SIZE
    plt.rcParams['figure.titlesize'] = BIGGER_SIZE

def savefig(fname):
    """Save the current figure to images/<fname>.png when SAVE_FIGURES_FLAG is set."""
    if not SAVE_FIGURES_FLAG:
        # Avoid memory overload
        return
    os.makedirs("images", exist_ok=True)
    fig = plt.gcf()
    fig.patch.set_alpha(0.0)  # transparent background
    plt.savefig(f"images/{fname}.png", bbox_inches='tight')

plt.rcParams['lines.linewidth'] = 3
plt.style.use('fivethirtyeight')
sns.set_context("talk")
sns.set_theme()
adjust_fontsize(20)
SAVE_FIGURES_FLAG = False
```

In [3]:

```
toy_df = pd.DataFrame({"x": [-1, -.75, -.5, -.25, .3, .4, 1, 1.2, 3],
                       "y": [0, 0, 0, 0, 1, 1, 1, 1, 1]})
toy_df.sort_values("x")
```

Out[3]:

|   | x | y |
|---|---|---|
| 0 | -1.00 | 0 |
| 1 | -0.75 | 0 |
| 2 | -0.50 | 0 |
| 3 | -0.25 | 0 |
| 4 | 0.30 | 1 |
| 5 | 0.40 | 1 |
| 6 | 1.00 | 1 |
| 7 | 1.20 | 1 |
| 8 | 3.00 | 1 |

In [4]:

```
plt.scatter(data=toy_df, x='x', y='y', s=100);
plt.xlabel('Data')
plt.ylabel('Class');
```

The classification task involves the following steps:

- Obtain training data and select features for the classification task.
- Compute the activation value $z$ as a weighted combination of the input features, $z = f_\theta(x) = \theta \cdot x$, where the weights $\theta$ are learned from the training data.
- Combine the activation with a decision rule to classify datapoints.
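The three steps can be traced on the toy data with a few lines of NumPy. This is a minimal sketch, not the notes' training procedure: the weight $\theta = 1$ is hand-picked rather than learned, there is no intercept term, and the decision rule is assumed to be a step at threshold 0.

```python
import numpy as np

# Step 1: training data (the toy dataset above) with a single feature x.
x = np.array([-1, -0.75, -0.5, -0.25, 0.3, 0.4, 1, 1.2, 3])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
X = x.reshape(-1, 1)  # feature matrix with one column, no intercept (assumption)

# Step 2: activation z = theta . x for a hypothetical, hand-picked weight.
theta = np.array([1.0])
z = X @ theta

# Step 3: decision rule, a step at threshold 0 mapping z to class 0 or 1.
def classify(z, threshold=0.0):
    return (z >= threshold).astype(int)

y_hat = classify(z)
print(y_hat)
print("accuracy:", np.mean(y_hat == y))
```

With this particular $\theta$, every toy point falls on the correct side of the threshold; with real data the weights would be fit rather than chosen by hand.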

If we use a step function with a decision threshold of 0 (a 0/1 version of the sign function), we obtain the following classification rule:

$$ \text{classify}(x) = \begin{cases} 1, &\quad\text{if}\ \ f_\theta(x) \geq 0 \\ 0, &\quad\text{if}\ \ f_\theta(x) < 0 \end{cases}$$

In [5]:

```
# A point is correctly classified if it lies on the correct side of the threshold x = 0
correct = (((toy_df['x'] < 0) & (toy_df['y'] == 0)) |
           ((toy_df['x'] >= 0) & (toy_df['y'] == 1)))
color = np.where(correct, 'g', 'r')  # green = correct, red = misclassified
plt.scatter(data=toy_df, x='x', y='y', s=100, c=color)
plt.xlabel('Data')
plt.ylabel('Class')
# Step function: 0 for x < 0, 1 for x >= 0
plt.step([-3, 0, 4], [0, 1, 1], where='post', color='b')
plt.title("Binary Classification with Sign Function Decision Rule");
```

What if, instead of a 0/1 output, we predict the probability that each datapoint belongs to class 1 and classify with a 50% probability threshold? The classifier in this probabilistic model is:

$$ \text{classify}(x) = \begin{cases} 1, &\quad\text{if}\ \ P_\theta(Y = 1 | x) \geq 0.5 \\ 0, &\quad\text{if}\ \ P_\theta(Y = 1 | x) < 0.5 \end{cases}$$

This approach lets us evaluate the quality of the model based on:

- how close the predicted probabilities of class 1 datapoints are to 100%, and
- how close the predicted probabilities of class 0 datapoints are to 0%.
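To make the two criteria concrete, the sketch below applies the 50% rule to a handful of predicted probabilities and measures how far each one is from its ideal value. The labels and probabilities are made up for illustration; no model produced them.

```python
import numpy as np

# Hypothetical labels and predicted probabilities P(Y=1|x), invented for illustration.
y = np.array([0, 0, 1, 1])
p = np.array([0.10, 0.40, 0.55, 0.95])

# 50% decision rule
y_hat = (p >= 0.5).astype(int)

# Distance of each prediction from its ideal value:
gap_class1 = 1 - p[y == 1]  # class 1 points should be close to 100%
gap_class0 = p[y == 0]      # class 0 points should be close to 0%

print("predictions:", y_hat)
print("class 1 gaps:", gap_class1)
print("class 0 gaps:", gap_class0)
```

All four points are classified correctly, yet the gaps show that the prediction of 0.55 is far less confident than 0.95. This is the extra information a probability carries that a bare 0/1 output does not.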

Our goal in a probabilistic model is to train weights $\hat{\theta}$ for a model that returns probability values $P_{\hat{\theta}}(Y = 1 | x)$. This is a conditional probability function: given the datapoint $x$, it outputs a probability value in the range $[0,1]$.

We follow similar steps as before:

- Obtain training data and select features for the classification task.
- Compute the activation value $z$ as a weighted combination of the input features, $z = f_\theta(x) = \theta \cdot x$.
- Use a probabilistic function $P_\theta(Y = 1 | x) = P(Y = 1 | z)$ to map $z$ to a probability between 0 and 1.
- Apply the 50% decision rule to classify datapoints:
  - classify into class 0 if $P_\theta(Y = 1 | x) = P(Y = 1 | z) < 0.5$, and
  - classify into class 1 if $P_\theta(Y = 1 | x) = P(Y = 1 | z) \geq 0.5$.
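The pipeline above can be written so that the function mapping $z$ to a probability is a pluggable argument. In this sketch, `predict_proba`, `classify` and `clipped_linear` are hypothetical helper names, and `clipped_linear` is a deliberately crude placeholder that merely keeps outputs inside $[0,1]$; it is not the function the notes are looking for.

```python
import numpy as np

def predict_proba(X, theta, prob_fn):
    """Return P(Y=1|x) for each row of X, given prob_fn mapping z to [0, 1]."""
    z = X @ theta  # activation z = theta . x
    return prob_fn(z)

def classify(X, theta, prob_fn, threshold=0.5):
    """Apply the 50% decision rule to the predicted probabilities."""
    return (predict_proba(X, theta, prob_fn) >= threshold).astype(int)

def clipped_linear(z):
    """Placeholder only: shift and scale z, then clip the result into [0, 1]."""
    return np.clip(0.5 + z / 2, 0.0, 1.0)

X = np.array([[-1.0], [-0.25], [0.3], [3.0]])
theta = np.array([1.0])  # hypothetical weights, not learned

p = predict_proba(X, theta, clipped_linear)
labels = classify(X, theta, clipped_linear)
print(p, labels)
```

Clipping keeps the outputs in range, but it flattens every $z$ beyond $\pm 1$ to exactly 0 or 1. Which function maps $(-\infty, +\infty)$ to $[0,1]$ in a better way is the question the notes take up next.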

The function we want to find takes a single input $z = \theta \cdot x$ and outputs a probability. Its domain is therefore $(-\infty,+\infty)$ and its range is $[0,1]$. Which function has this property? Let's find out.

The odds of an event are the probability that it happens divided by the probability that it does not happen. If $p$ is the probability of the event happening, then:

$$\text{odds} = \frac{p}{1-p}$$

The odds function takes a probability as its input, so its domain is $[0,1)$ (at $p = 1$ the denominator is zero), and its range is $[0,+\infty)$. Let's look at the shape of the odds function.
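Evaluating the formula at a few probabilities shows how quickly the odds grow: even odds (1) at $p = 0.5$, 4 at $p = 0.8$, 99 at $p = 0.99$, and no finite value at $p = 1$. The sketch below adds an explicit domain check, which is an assumption of this example rather than part of the notes' `odds` helper.

```python
import numpy as np

def odds(p):
    """Odds p / (1 - p), defined only for probabilities 0 <= p < 1."""
    p = np.asarray(p, dtype=float)
    if np.any((p < 0) | (p >= 1)):
        raise ValueError("odds are only defined for 0 <= p < 1")
    return p / (1 - p)

probs = np.array([0.0, 0.2, 0.5, 0.8, 0.99])
vals = odds(probs)
for prob, o in zip(probs, vals):
    print(f"p = {prob:.2f}  ->  odds = {o:.2f}")
```

Probabilities below 0.5 are squeezed into $[0, 1)$, while probabilities above 0.5 are stretched over $(1, +\infty)$, which is the asymmetry visible in the plot.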

In [6]:

```
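def odds(p):
    """Odds of an event with probability p (defined for 0 <= p < 1)."""
    return p / (1 - p)

# Stop just short of p = 1, where the odds diverge
p = np.linspace(0, 0.99, 1000)
odds_p = odds(p)
fig = px.line(x=p, y=odds_p, title="Odds")
fig.update_xaxes(title="Probability")
fig.update_yaxes(title="Odds")
fig.show()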
```