import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(23) #kallisti
plt.rcParams['figure.figsize'] = (4, 4)
plt.rcParams['figure.dpi'] = 150
sns.set()
rectangle = pd.read_csv("data/rectangle_data.csv")
rectangle.tail(5)
| | width | height | area | perimeter |
|---|---|---|---|---|
| 95 | 8 | 5 | 40 | 26 |
| 96 | 8 | 7 | 56 | 30 |
| 97 | 1 | 4 | 4 | 10 |
| 98 | 1 | 6 | 6 | 14 |
| 99 | 2 | 6 | 12 | 16 |
# center data
X = rectangle - np.mean(rectangle, axis=0)
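Centering subtracts each column's mean, so every column of `X` averages to zero. As a minimal sketch of that step, the block below builds a small synthetic table as a hypothetical stand-in for `data/rectangle_data.csv` (the random integer widths and heights are our assumption, not the course data) and checks that the centered columns have zero mean:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/rectangle_data.csv: random integer
# widths and heights (1 to 8), with area and perimeter derived from them.
rng = np.random.default_rng(0)
width = rng.integers(1, 9, size=100)
height = rng.integers(1, 9, size=100)
rect = pd.DataFrame({
    "width": width,
    "height": height,
    "area": width * height,
    "perimeter": 2 * (width + height),
})

# rect.mean() averages down the rows, giving one mean per column;
# subtracting the Series broadcasts it across every row by column label.
X_demo = rect - rect.mean()

# Every centered column now has (numerically) zero mean.
print(X_demo.mean().abs().max() < 1e-12)   # True
```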
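Singular value decomposition (SVD) is a numerical technique that automatically decomposes a matrix into a product of matrices. Given an input matrix $X$, SVD returns $U$, $\Sigma$, and $V^T$ such that $X = U \Sigma V^T$; grouping the first two factors, we can treat this as a decomposition of $X$ into two matrices, $U\Sigma$ and $V^T$. (`np.linalg.svd` documentation)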
U, S, Vt = np.linalg.svd(X, full_matrices = False)
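The SVD routine returns $U$ and $\Sigma$ as two separate variables. `S` holds only the diagonal of $\Sigma$, i.e. the singular values as a 1-D array in descending order, so we use `np.diag(S)` whenever we need $\Sigma$ as a matrix.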
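To make the shapes concrete, here is a sketch on a random 100 × 4 matrix (an assumption standing in for the centered rectangle data). With `full_matrices=False`, `np.linalg.svd` returns the reduced factors, and `np.diag` rebuilds $\Sigma$ from the 1-D array of singular values:

```python
import numpy as np

# Hypothetical centered 100 x 4 data matrix (random, not the rectangle CSV).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
X_demo = X_demo - X_demo.mean(axis=0)

U_demo, S_demo, Vt_demo = np.linalg.svd(X_demo, full_matrices=False)

# Reduced SVD of an n x d matrix with n > d:
# U is n x d, S is a 1-D array of d singular values, Vt is d x d.
print(U_demo.shape, S_demo.shape, Vt_demo.shape)   # (100, 4) (4,) (4, 4)

# S holds only the diagonal of Sigma; np.diag builds the d x d matrix.
Sigma = np.diag(S_demo)
print(Sigma.shape)                                 # (4, 4)

# Singular values are non-negative and sorted in descending order.
print(np.all(S_demo >= 0), np.all(np.diff(S_demo) <= 0))   # True True
```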
pd.DataFrame(U) # nicer printing with DataFrames
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | -0.133910 | 0.005930 | 0.034734 | -0.296836 |
| 1 | 0.086354 | -0.079515 | 0.014948 | 0.711478 |
| 2 | 0.117766 | -0.128963 | 0.085774 | -0.065318 |
| 3 | -0.027274 | 0.183177 | 0.010895 | -0.031055 |
| 4 | -0.258806 | -0.094295 | 0.090270 | -0.032818 |
| ... | ... | ... | ... | ... |
| 95 | -0.092321 | 0.052007 | 0.029907 | -0.065218 |
| 96 | -0.175499 | -0.040147 | 0.039560 | -0.056327 |
| 97 | 0.109202 | -0.109114 | 0.013259 | -0.051000 |
| 98 | 0.092073 | -0.069417 | -0.131771 | -0.048640 |
| 99 | 0.059790 | -0.058653 | -0.107984 | -0.074241 |
100 rows × 4 columns
S
array([1.97388075e+02, 2.74346257e+01, 2.32626119e+01, 9.22425467e-15])
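The last singular value above is about $10^{-14}$, which is zero up to floating-point round-off. That is expected for rectangles: perimeter is exactly $2 \cdot \text{width} + 2 \cdot \text{height}$, a linear combination of two other columns, so the centered matrix has rank 3. A sketch that checks this on synthetic rectangles (randomly generated here as an assumption, not read from the course CSV):

```python
import numpy as np

# Hypothetical rectangles: perimeter is exactly 2*width + 2*height,
# a linear combination of the first two columns.
rng = np.random.default_rng(0)
width = rng.integers(1, 9, size=100).astype(float)
height = rng.integers(1, 9, size=100).astype(float)
data = np.column_stack([width, height, width * height, 2 * (width + height)])
X_demo = data - data.mean(axis=0)

# compute_uv=False returns only the singular values.
S_demo = np.linalg.svd(X_demo, compute_uv=False)

# The smallest singular value is negligible relative to the largest...
print(S_demo[-1] / S_demo[0] < 1e-12)   # True

# ...so the centered matrix has rank 3, not 4.
print(np.linalg.matrix_rank(X_demo))    # 3
```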
pd.DataFrame(Vt)
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | -0.098631 | -0.072956 | -9.312257e-01 | -0.343173 |
| 1 | 0.668460 | -0.374186 | -2.583754e-01 | 0.588548 |
| 2 | 0.314625 | -0.640483 | 2.570230e-01 | -0.651715 |
| 3 | 0.666667 | 0.666667 | 1.110223e-16 | -0.333333 |
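Two properties of these factors are easy to check numerically: the rows of $V^T$ are orthonormal, and so are the columns of $U$. The last row of `Vt` above, $(2/3, 2/3, 0, -1/3)$, is also the unit vector along $(2, 2, 0, -1)$, the direction in which $2 \cdot \text{width} + 2 \cdot \text{height} - \text{perimeter} = 0$. The sketch below verifies both facts on synthetic rectangle data (an assumption; the sign of each singular vector is arbitrary, so the comparison uses absolute values):

```python
import numpy as np

# Hypothetical rectangle data (random, not the course CSV).
rng = np.random.default_rng(0)
width = rng.integers(1, 9, size=100).astype(float)
height = rng.integers(1, 9, size=100).astype(float)
data = np.column_stack([width, height, width * height, 2 * (width + height)])
X_demo = data - data.mean(axis=0)

U_demo, S_demo, Vt_demo = np.linalg.svd(X_demo, full_matrices=False)

# Rows of Vt are orthonormal: Vt @ Vt.T is the 4 x 4 identity.
print(np.allclose(Vt_demo @ Vt_demo.T, np.eye(4)))   # True
# Columns of U are orthonormal: U.T @ U is the 4 x 4 identity.
print(np.allclose(U_demo.T @ U_demo, np.eye(4)))     # True

# The last row of Vt is (2, 2, 0, -1) / 3 up to sign, because
# 2*width + 2*height - perimeter = 0 for every centered row.
print(np.allclose(np.abs(Vt_demo[-1]), [2/3, 2/3, 0, 1/3]))   # True
print(np.allclose(X_demo @ Vt_demo[-1], 0))                    # True
```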
The two key pieces of the decomposition are $U\Sigma$ and $V^T$, which we can think of for now as analogous to our 'data' and 'transformation operation' from our manual decomposition earlier.
As we did before with our manual decomposition, we can recover our original rectangle data by multiplying the left matrix $U\Sigma$ by the right matrix $V^T$.
pd.DataFrame(U @ np.diag(S) @ Vt)
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 2.97 | 1.35 | 24.78 | 8.64 |
| 1 | -3.03 | -0.65 | -15.22 | -7.36 |
| 2 | -4.03 | -1.65 | -20.22 | -11.36 |
| 3 | 3.97 | -1.65 | 3.78 | 4.64 |
| 4 | 3.97 | 3.35 | 48.78 | 14.64 |
| ... | ... | ... | ... | ... |
| 95 | 2.97 | 0.35 | 16.78 | 6.64 |
| 96 | 2.97 | 2.35 | 32.78 | 10.64 |
| 97 | -4.03 | -0.65 | -19.22 | -9.36 |
| 98 | -4.03 | 1.35 | -17.22 | -5.36 |
| 99 | -3.03 | 1.35 | -11.22 | -3.36 |
100 rows × 4 columns
Original data for reference:
X
| | width | height | area | perimeter |
|---|---|---|---|---|
| 0 | 2.97 | 1.35 | 24.78 | 8.64 |
| 1 | -3.03 | -0.65 | -15.22 | -7.36 |
| 2 | -4.03 | -1.65 | -20.22 | -11.36 |
| 3 | 3.97 | -1.65 | 3.78 | 4.64 |
| 4 | 3.97 | 3.35 | 48.78 | 14.64 |
| ... | ... | ... | ... | ... |
| 95 | 2.97 | 0.35 | 16.78 | 6.64 |
| 96 | 2.97 | 2.35 | 32.78 | 10.64 |
| 97 | -4.03 | -0.65 | -19.22 | -9.36 |
| 98 | -4.03 | 1.35 | -17.22 | -5.36 |
| 99 | -3.03 | 1.35 | -11.22 | -3.36 |
100 rows × 4 columns
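Comparing the two tables by eye only checks the printed digits. A sketch of a numerical check with `np.allclose`, using a random centered matrix as a stand-in for `X` (an assumption); it also shows that multiplying `U` by `S` with broadcasting, which scales each column of $U$, matches `U @ np.diag(S)`:

```python
import numpy as np

# Hypothetical centered data matrix standing in for X.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
X_demo = X_demo - X_demo.mean(axis=0)
U_demo, S_demo, Vt_demo = np.linalg.svd(X_demo, full_matrices=False)

# U @ diag(S) @ Vt rebuilds X up to floating-point round-off.
X_rebuilt = U_demo @ np.diag(S_demo) @ Vt_demo
print(np.allclose(X_rebuilt, X_demo))   # True

# U * S broadcasts S across rows, scaling column i of U by S[i];
# this equals U @ np.diag(S) without building the diagonal matrix.
print(np.allclose((U_demo * S_demo) @ Vt_demo, X_rebuilt))   # True
```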
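Approach 1: the first principal component is the first column of $U$ scaled by the first singular value, i.e. the first column of $U\Sigma$.
pc1 = S[0]*U[:, 0]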
pd.DataFrame(pc1)
| | 0 |
|---|---|
| 0 | -26.432217 |
| 1 | 17.045285 |
| 2 | 23.245695 |
| 3 | -5.383546 |
| 4 | -51.085217 |
| ... | ... |
| 95 | -18.223108 |
| 96 | -34.641325 |
| 97 | 21.555166 |
| 98 | 18.174109 |
| 99 | 11.801777 |
100 rows × 1 columns
Approach 2: project the centered data `X` onto the first row of $V^T$ (the first principal direction).
pd.DataFrame(X @ (Vt[0,:]).T)
| | 0 |
|---|---|
| 0 | -26.432217 |
| 1 | 17.045285 |
| 2 | 23.245695 |
| 3 | -5.383546 |
| 4 | -51.085217 |
| ... | ... |
| 95 | -18.223108 |
| 96 | -34.641325 |
| 97 | 21.555166 |
| 98 | 18.174109 |
| 99 | 11.801777 |
100 rows × 1 columns
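The two approaches agree because the rows of $V^T$ are orthonormal, so $V^T V = I$ and

$$XV = U\Sigma V^T V = U\Sigma.$$

Column $i$ of $XV$ is therefore $\sigma_i u_i$; for $i = 1$, that is `S[0] * U[:, 0]`. A sketch confirming this on a random centered matrix (standing in for `X` as an assumption):

```python
import numpy as np

# Hypothetical centered data matrix standing in for X.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
X_demo = X_demo - X_demo.mean(axis=0)
U_demo, S_demo, Vt_demo = np.linalg.svd(X_demo, full_matrices=False)

# Approach 1: first column of U scaled by the first singular value.
pc1_a = S_demo[0] * U_demo[:, 0]
# Approach 2: project onto the first principal direction.
# Vt[0, :] is already 1-D, so no transpose is needed.
pc1_b = X_demo @ Vt_demo[0, :]
print(np.allclose(pc1_a, pc1_b))   # True

# All principal components at once: X @ V equals U @ diag(S).
print(np.allclose(X_demo @ Vt_demo.T, U_demo * S_demo))   # True
```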
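Keeping only the first column of $U$, the first singular value, and the first row of $V^T$ gives a rank-1 approximation of `X`:
pd.DataFrame(U[:, 0:1] @ np.diag(S[0:1]) @ Vt[0:1,:])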
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 2.607034 | 1.928383 | 24.614360 | 9.070835 |
| 1 | -1.681193 | -1.243552 | -15.873008 | -5.849490 |
| 2 | -2.292745 | -1.695908 | -21.646989 | -7.977306 |
| 3 | 0.530984 | 0.392761 | 5.013297 | 1.847490 |
| 4 | 5.038583 | 3.726962 | 47.571869 | 17.531091 |
| ... | ... | ... | ... | ... |
| 95 | 1.797362 | 1.329481 | 16.969827 | 6.253687 |
| 96 | 3.416707 | 2.527285 | 32.258893 | 11.887984 |
| 97 | -2.126006 | -1.572574 | -20.072725 | -7.397161 |
| 98 | -1.792530 | -1.325906 | -16.924198 | -6.236872 |
| 99 | -1.164020 | -0.861008 | -10.990118 | -4.050057 |
100 rows × 4 columns
Original data for reference:
X
| | width | height | area | perimeter |
|---|---|---|---|---|
| 0 | 2.97 | 1.35 | 24.78 | 8.64 |
| 1 | -3.03 | -0.65 | -15.22 | -7.36 |
| 2 | -4.03 | -1.65 | -20.22 | -11.36 |
| 3 | 3.97 | -1.65 | 3.78 | 4.64 |
| 4 | 3.97 | 3.35 | 48.78 | 14.64 |
| ... | ... | ... | ... | ... |
| 95 | 2.97 | 0.35 | 16.78 | 6.64 |
| 96 | 2.97 | 2.35 | 32.78 | 10.64 |
| 97 | -4.03 | -0.65 | -19.22 | -9.36 |
| 98 | -4.03 | 1.35 | -17.22 | -5.36 |
| 99 | -3.03 | 1.35 | -11.22 | -3.36 |
100 rows × 4 columns
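Keeping only the first singular triplet yields a rank-1 approximation of `X`. A minimal sketch that generalizes this to the first $k$ components, using a hypothetical helper `rank_k_approx` and a random centered matrix in place of `X`. The check relies on the Eckart-Young theorem: the truncated SVD is the best rank-$k$ approximation in Frobenius norm, and its error equals $\sqrt{\sum_{i > k} \sigma_i^2}$.

```python
import numpy as np

def rank_k_approx(X, k):
    """Hypothetical helper: best rank-k approximation of X via the SVD."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the first k columns of U, singular values, and rows of Vt.
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Hypothetical centered data matrix standing in for X.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
X_demo = X_demo - X_demo.mean(axis=0)
S_demo = np.linalg.svd(X_demo, compute_uv=False)

for k in range(1, 5):
    # np.linalg.norm of a 2-D array defaults to the Frobenius norm.
    err = np.linalg.norm(X_demo - rank_k_approx(X_demo, k))
    # Predicted error: root-sum-of-squares of the discarded singular values.
    predicted = np.sqrt(np.sum(S_demo[k:] ** 2))
    print(k, np.isclose(err, predicted))
```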