class: center, middle, inverse, title-slide .title[ # Matrix inversion and decomposition ] .author[ ###
MACS 33000
University of Chicago ]

---

# Learning objectives

* Matrix transposition
* Our pal, the diagonal: trace and determinant
* Define matrix inversion
* Demonstrate how to solve systems of linear equations using matrix inversion
* Define the determinant of a matrix
* Define matrix decomposition
* Explain singular value decomposition and demonstrate the applicability of matrix algebra to real-world problems

---

# Matrix transposition

Transposing a matrix means *flipping* it in a sense -- swapping elements across the diagonal.

`$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \\ \end{pmatrix}$$`

`$$\mathbf{X}' = \begin{pmatrix} x_{11} & x_{21} & \ldots & x_{m1} \\ x_{12} & x_{22} & \ldots & x_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & \ldots & x_{mn}\end{pmatrix}$$`

---

# Diagonal measures: trace and determinant

* **Trace** gives us information about the magnitude of the diagonal of a matrix: you simply add the values along the diagonal.
  * For example, the trace of an identity matrix is equal to the number of rows (e.g., `\(tr(\mathbf{I}_2)=2\)`)
* **Determinant**: summarizes the structure of the entire matrix. Easy for a `\(2\times 2\)` matrix but a pain for larger ones.
* **Submatrix**: akin to a subset: the smaller matrix left over when you delete a row / column from the original
* **Minor**: the determinant of a submatrix (the one you get by deleting a particular row/column)
* **Cofactor**: the corresponding minor with a sign attached: `\((-1)^{i+j}\)` times the minor

---

# Determinant

* Single-number summary of a square matrix
* Summary of structure: helps us understand (loosely speaking) how much space our vectors are taking up (more on this soon!)
* Complicated to calculate: generally, multiply down the diagonal and subtract the products along the opposite diagonal (examples coming!)
* Referred to using single bars, `\(|\mathbf{X}|\)`, or `\(det(\mathbf{X})\)`

---

# `\(2 \times 2\)` matrix

`$$\det(\mathbf{X}) = \mid \mathbf{X} \mid = \left| \begin{matrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{matrix} \right| = x_{11}x_{22} - x_{12}x_{21}$$`

--

`$$\left| \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \right| = (1)(4) - (2)(3) = 4 - 6 = -2$$`

--

`$$\left| \begin{matrix} 10 & \frac{1}{2} \\ 4 & 1 \end{matrix} \right| = (10)(1) - \left( \frac{1}{2} \right)(4) = 10 - 2 = 8$$`

--

`$$\left| \begin{matrix} 2 & 3 \\ 6 & 9 \end{matrix} \right| = (2)(9) - (3)(6) = 18 - 18 = 0$$`

* Pay attention to if/when the determinant is zero!

---

# `\(n \times n\)` matrix: the submatrix

Unfortunately, calculating determinants gets much more involved for square matrices larger than `\(2 \times 2\)`. First we need to define a **submatrix**. A submatrix is simply the matrix that remains after deleting rows and/or columns of the original, leaving the remaining elements in their respective places.
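In R, taking a submatrix is just negative indexing. A quick sketch (using a made-up `\(4 \times 4\)` matrix `M`, not the symbolic `\(\mathbf{X}\)` below):

```r
# a generic 4 x 4 matrix to play with (R fills by column)
M <- matrix(1:16, nrow = 4)

M[-1, -1]   # delete row 1 and column 1: the submatrix M_[11]
M[-2, -4]   # delete row 2 and column 4: the submatrix M_[24]
```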
So for the matrix `\(\mathbf{X}\)`, notice the following submatrices:

`$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \\ x_{31} & x_{32} & x_{33} & x_{34} \\ x_{41} & x_{42} & x_{43} & x_{44} \\ \end{bmatrix}$$`

`$$\mathbf{X}_{[11]} = \begin{bmatrix} x_{22} & x_{23} & x_{24} \\ x_{32} & x_{33} & x_{34} \\ x_{42} & x_{43} & x_{44} \\ \end{bmatrix}, \mathbf{X}_{[24]} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \\ \end{bmatrix}$$`

---

# `\(n \times n\)` matrix

`$$\mid \mathbf{X} \mid = \sum_{j=1}^n (-1)^{i+j} x_{ij} \mid\mathbf{X}_{[ij]}\mid$$`

* NOTE: we can expand along any one row or column (just be consistent)
* `\(ij\)`th **minor** of `\(\mathbf{X}\)` for `\(x_{ij}\)`: `\(\mid\mathbf{X}_{[ij]}\mid\)`
  * Determinant of the `\((n - 1) \times (n - 1)\)` submatrix that results from taking the `\(i\)`th row and `\(j\)`th column out
  * Refer to this resulting submatrix as `\(\mathbf{M_{ij}}\)`
* **Cofactor** of `\(\mathbf{X}\)`
  * Minor signed as `\((-1)^{i+j}\mid\mathbf{X}_{[ij]}\mid\)`
* Cycle recursively through columns

---

# Determinant: `\(3 \times 3\)` matrix

`$$\begin{aligned} \mathbf{X} &= \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ \end{bmatrix} \\ \det(\mathbf{X}) &= (-1)^{1+1}x_{11} \left| \begin{matrix} x_{22} & x_{23} \\ x_{32} & x_{33} \\ \end{matrix} \right| +(-1)^{1+2}x_{12} \left| \begin{matrix} x_{21} & x_{23} \\ x_{31} & x_{33} \\ \end{matrix} \right| + (-1)^{1+3}x_{13} \left| \begin{matrix} x_{21} & x_{22} \\ x_{31} & x_{32} \\ \end{matrix} \right|\end{aligned}$$`

`$$= (-1)^2x_{11}\mathbf{M_{11}}+(-1)^3x_{12}\mathbf{M_{12}}+(-1)^4x_{13}\mathbf{M_{13}}$$`

--

NO YOU DO NOT NEED TO BE ABLE TO DO THIS GENERALLY (just for the homework today!)
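In practice you would hand this arithmetic to base R's `det()`. A quick sketch re-checking the `\(2 \times 2\)` examples from a few slides back (remember that `matrix()` fills by column):

```r
det(matrix(c(1, 3, 2, 4), nrow = 2))      # -2
det(matrix(c(10, 4, 1/2, 1), nrow = 2))   #  8
det(matrix(c(2, 6, 3, 9), nrow = 2))      #  0 -- there's that zero again
```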
---

# Cofactor matrix

You can also create the cofactor matrix using your determinant work from the prior slide:

`$$\mathbf{C}_{ij}=(-1)^{i+j}\mathbf{M}_{ij}$$`

---

# Example: `\(3 \times 3\)` matrix

`$$\begin{aligned}\begin{bmatrix}1 & 2 & 3 \\ 2 & 1 & 3 \\ 4 & 1 & 2\end{bmatrix}\end{aligned}$$`

* Decompose into smaller submatrices
* Find the determinant of each submatrix
* Find the cofactors
* Create the cofactor matrix
* Find the determinant

---

# Answer: `\(3 \times 3\)` matrix

`$$\begin{aligned}\begin{bmatrix}1 & 2 & 3 \\ 2 & 1 & 3 \\ 4& 1 & 2\end{bmatrix}\end{aligned}$$`

* Decompose into smaller submatrices
* Find the determinant of each submatrix

`\(\mathbf{M_{11}} = \left|\begin{bmatrix} 1 & 3 \\ 1 & 2 \end{bmatrix}\right|= 2-3 =-1\)`; `\(\mathbf{M_{12}} = \left|\begin{bmatrix} 2 & 3\\ 4 &2 \end{bmatrix}\right|=4-12=-8\)`; `\(\mathbf{M_{13}} = \left|\begin{bmatrix} 2 &1 \\ 4 &1 \end{bmatrix}\right|=2-4=-2\)`; `\(\mathbf{M_{21}} = \left|\begin{bmatrix}2 &3 \\ 1 &2 \end{bmatrix}\right|=4-3=1\)`; `\(\mathbf{M_{22}} = \left|\begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix}\right|=2-12=-10\)`; `\(\mathbf{M_{23}} = \left|\begin{bmatrix}1 &2 \\ 4 & 1 \end{bmatrix}\right|=1-8=-7\)`; `\(\mathbf{M_{31}} = \left|\begin{bmatrix} 2 & 3 \\ 1 & 3 \end{bmatrix}\right|=6-3=3\)`; `\(\mathbf{M_{32}} = \left|\begin{bmatrix} 1&3 \\ 2 & 3 \end{bmatrix}\right|=3-6=-3\)`; `\(\mathbf{M_{33}} = \left|\begin{bmatrix} 1& 2 \\ 2 & 1 \end{bmatrix}\right|=1-4=-3\)`

---

# Answer: `\(3 \times 3\)` matrix

* Create the cofactor matrix

`$$\begin{aligned}\begin{bmatrix} 1 \mathbf{M_{11}} & -1\mathbf{M_{12}} & 1\mathbf{M_{13}} \\ -1 \mathbf{M_{21}}& 1\mathbf{M_{22}} & -1\mathbf{M_{23}} \\ 1 \mathbf{M_{31}}& -1\mathbf{M_{32}} & 1\mathbf{M_{33}} \end{bmatrix}\end{aligned}$$`

`$$\begin{aligned}\begin{bmatrix} 1 (-1) & -1(-8) & 1 (-2) \\ -1 (1)& 1(-10) & -1 (-7) \\ 1 (3) & -1(-3) & 1 (-3) \end{bmatrix} = \begin{bmatrix} -1 & 8 & -2 \\ -1 & -10 & 7 \\ 3 & 3 & -3 \end{bmatrix}\end{aligned}$$`

---

# Answer: `\(3 \times 3\)` matrix

`$$\mid \mathbf{X} \mid = \sum_{j=1}^n (-1)^{i+j} x_{ij} \mid\mathbf{X}_{[ij]}\mid$$`

Let's live life on the edge and each compute the determinant differently. I will show TWO examples: row two and column two (EXCITEMENT!).

--

Recall the recipe: choose a row or column, multiply each of its entries in the original matrix by the corresponding cofactor, and add up the products.

---

# Example: row (i = 2)

`$$\begin{aligned}\mathbf{A}= \begin{bmatrix}1 & 2 & 3 \\ 2 & 1 & 3 \\ 4 & 1 & 2\end{bmatrix}, \mathbf{C} =\begin{bmatrix} -1 & 8& -2 \\ -1& -10 & 7 \\ 3 & 3 & -3 \end{bmatrix}\end{aligned}$$`

Determinant:

`$$\begin{aligned} |\mathbf{A}|&=a_{21}\mathbf{C_{21}}+ a_{22}\mathbf{C_{22}} + a_{23}\mathbf{C_{23}}\\ &=(2)(-1)+(1)(-10)+(3)(7)\\ &=-2-10+21\\ &=9 \end{aligned}$$`

--

Alternatively, for `\(3 \times 3\)` ONLY: `\(det(\mathbf{A}) = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} +a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}\)`

---

# Example: column (j = 2)

`$$\begin{aligned}\mathbf{A}= \begin{bmatrix}1 & 2 & 3 \\ 2 & 1 & 3 \\ 4 & 1 & 2\end{bmatrix}, \mathbf{C} =\begin{bmatrix} -1 & 8& -2 \\ -1& -10 & 7 \\ 3 & 3 & -3 \end{bmatrix}\end{aligned}$$`

Determinant:

`$$\begin{aligned}|\mathbf{A}| &=a_{12}\mathbf{C_{12}}+ a_{22}\mathbf{C_{22}} + a_{32}\mathbf{C_{32}} \\ &=(2)(8) + (1)(-10)+(1)(3)\\ &=16-10+3 \\ &=9\end{aligned}$$`

---

## Determinants, Minors, Cofactors

These help you break down the matrix, learn about its components, and get a sense of it in space.
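We can also let R check this work. A small sketch confirming the determinant both directly and via the row-2 cofactor expansion (the cofactors `\(C_{21}, C_{22}, C_{23}\)` are the ones computed above):

```r
A <- matrix(c(1, 2, 3,
              2, 1, 3,
              4, 1, 2), nrow = 3, byrow = TRUE)

det(A)                       # 9

# row-2 cofactor expansion: a21*C21 + a22*C22 + a23*C23
sum(A[2, ] * c(-1, -10, 7))  # 9 as well
```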
* Zero determinant is important (ominous foreshadowing)

---

class: middle, inverse, center

# Inverting matrices

---

# Why is this useful?

Matrix inversion is necessary to:

* Solve systems of equations
* Perform linear regression
* Provide intuition about **collinearity**, **fixed effects**, and other relevant design matrices for social scientists.

---

# Matrix inversion

The inverse of a matrix is the matrix analogue of a reciprocal: multiplying by it "undoes" the original matrix (although since these are matrices, it's obviously more complicated than just dividing!)

--

* `\(\mathbf{X}\)` is an `\(n \times n\)` matrix
* Find the matrix `\(\mathbf{X}^{-1}\)` such that

`$$\mathbf{X}^{-1} \mathbf{X} = \mathbf{X} \mathbf{X}^{-1} = \mathbf{I}$$`

where `\(\mathbf{I}\)` is the `\(n \times n\)` identity matrix

--

If `\(\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\)` and `\(ad \neq bc\)`, then `\(\mathbf{A}\)` is invertible and

`$$\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$`

---

# Simple matrix inversion example

For example,

`$$\begin{aligned} \mathbf{A} &= \begin{bmatrix} 9 & 7 \\ 2 & 1 \end{bmatrix} \\ \mathbf{A}^{-1} &= \frac{1}{(-5)} \begin{bmatrix} 1 & -7 \\ -2 & 9 \end{bmatrix} = \begin{bmatrix} -0.2 & 1.4 \\ 0.4 & -1.8 \end{bmatrix} \end{aligned}$$`

We can verify by

`$$\mathbf{A} \mathbf{A}^{-1} = \begin{bmatrix} 9 & 7 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} -0.2 & 1.4 \\ 0.4 & -1.8 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}$$`

---

# Calculating matrix inversions

`$$\begin{aligned} x_{1} + x_{2} + x_{3} &= 0 \\ 0x_{1} + 5x_{2} + 0x_{3} & = 5 \\ 0 x_{1} + 0 x_{2} + 3 x_{3} & = 6 \\ \end{aligned}$$`

--

`$$\begin{aligned} \mathbf{A} &= \begin{bmatrix} 1 & 1 & 1 \\ 0 & 5 & 0 \\ 0 & 0 & 3 \end{bmatrix} \\ \mathbf{x} &= (x_{1} , x_{2}, x_{3} ) \\ \mathbf{b} &= (0, 5, 6) \end{aligned}$$`

* System of equations

`$$\mathbf{A}\mathbf{x} =\mathbf{b}$$`

* `\(\mathbf{A}^{-1}\)` exists **if and only if** `\(\mathbf{A}\mathbf{x} = \mathbf{b}\)` has exactly one solution

---

# Invertible matrices

* Suppose `\(\mathbf{X}\)` is an `\(n \times n\)` matrix
* Call `\(\mathbf{X}^{-1}\)` the **inverse** of `\(\mathbf{X}\)` if

`$$\mathbf{X}^{-1} \mathbf{X} = \mathbf{X} \mathbf{X}^{-1} = \mathbf{I}$$`

* If `\(\mathbf{X}^{-1}\)` exists, then `\(\mathbf{X}\)` is invertible
* If `\(\mathbf{X}^{-1}\)` does not exist, `\(\mathbf{X}\)` is **singular**

---

# Inverting a matrix

```
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 0 5 0
## [3,] 0 0 3
## [1] "inverse of A"
## [,1] [,2] [,3]
## [1,] 1 -0.2 -0.333
## [2,] 0 0.2 0.000
## [3,] 0 0.0 0.333
```

--

* `\(\mathbf{x}\)`

```
## [1] -3 1 2
```

---

# When do inverses exist?

* Linear independence

--

Remember when our determinant was zero?

`$$\left| \begin{matrix} 2 & 3 \\ 6 & 9 \end{matrix} \right| = (2)(9) - (3)(6) = 18 - 18 = 0$$`

Here, we can see that our two vectors are not linearly independent (for example, try `\(x=-3, y=2\)`: `\(-3(2, 6) + 2(3, 9) = (0, 0)\)`).

--

Thus, our determinant is zero. If we don't have linear independence, we can't take an inverse (singular matrix). [THIS IS WHY WE CARE ABOUT COLLINEARITY!]

---

# Inverse matrices examples

### Example 1

`$$\mathbf{v}_{1} = (1, 0, 0), \quad \mathbf{v}_{2} = (0,1,0), \quad \mathbf{v}_{3} = (0,0,1)$$`

These vectors are linearly independent, so we can check whether the matrix they form is invertible (surprise: it is... and the identity matrix is its own inverse!).
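A quick check in R (a sketch): `solve()` with one argument inverts a matrix, and with two arguments it solves `\(\mathbf{A}\mathbf{x} = \mathbf{b}\)` directly -- which is one way to produce output like the earlier slide's.

```r
solve(diag(3))   # the 3 x 3 identity is its own inverse

# the system from a few slides back
A <- matrix(c(1, 1, 1,
              0, 5, 0,
              0, 0, 3), nrow = 3, byrow = TRUE)
b <- c(0, 5, 6)

solve(A)         # the inverse of A
solve(A, b)      # x = (-3, 1, 2)
```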
--

### Example 2

`$$\mathbf{v}_{1} = (1, 0, 0), \quad \mathbf{v}_{2} = (0,1,0), \quad \mathbf{v}_{3} = (0,0,1), \quad \mathbf{v}_{4} = (1, 2, 3)$$`

--

`$$\mathbf{v}_{4} = \mathbf{v}_{1} + 2 \mathbf{v}_{2} + 3\mathbf{v}_{3}$$`

--

If we combine these vectors into a matrix and then want to do transformations, two things go wrong: (a) the matrix is not square, and (b) because the vectors are linearly dependent, the determinant would be zero even if we *could* calculate it (but we can't). Either way, no inverse.

---

# Inverting a `\(2 \times 2\)` matrix

* If `\(\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\)` and `\(ad \neq bc\)`, then `\(\mathbf{A}\)` is invertible and

`$$\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{bmatrix}d & -b \\-c & a\end{bmatrix}$$`

--

`$$\mathbf{A}^{-1} = \frac{1}{det(\mathbf{A})} \begin{bmatrix}d & -b \\-c & a\end{bmatrix}$$`

---

# Inverting a `\(2 \times 2\)` matrix

`$$\mathbf{A} = \begin{bmatrix}9 & 7 \\2 & 1\end{bmatrix} \\\mathbf{A}^{-1} = \frac{1}{(-5)} \begin{bmatrix}1 & -7 \\-2 & 9\end{bmatrix} = \begin{bmatrix}-0.2 & 1.4 \\0.4 & -1.8\end{bmatrix}$$`

--

## Verification

`$$\mathbf{A} \mathbf{A}^{-1} = \begin{bmatrix}9 & 7 \\2 & 1\end{bmatrix} \begin{bmatrix}-0.2 & 1.4 \\0.4 & -1.8\end{bmatrix} = \begin{bmatrix}1 & 0 \\0 & 1\end{bmatrix} = \mathbf{I}$$`

---

# Inverting an `\(n \times n\)` matrix

(this gets much less pleasant)

--

* Gauss-Jordan elimination
* Augmented matrix
* Elementary row operations (the goal is to get the identity on the left)
  1. Exchanging two rows in the matrix
  1. Subtracting a multiple of one row from another row
* Obtain a diagonal matrix, then normalize it to `\(\mathbf{I}\)`

---

# Example of Gauss-Jordan

`$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 2 \\ 3 & 1 & 1 \\ 3 & 1 & 2 \end{bmatrix}$$`

##### Set up the augmented matrix

`$$\left[\begin{array}{rrr|rrr}2 & 1 & 2 & 1 & 0 & 0 \\3 & 1 & 1 & 0 & 1 & 0 \\3 & 1 & 2 & 0 & 0 & 1\end{array}\right]$$`

##### Subtract `\(3/2\)` times the first row from each of the other rows

`$$\left[\begin{array}{rrr|rrr}2 & 1 & 2 & 1 & 0 & 0 \\0 & -1/2 & -2 & -3/2 & 1 & 0 \\0 & -1/2 & -1 & -3/2 & 0 & 1\end{array}\right]$$`

---

# Example of Gauss-Jordan

##### Subtract `\(3/2\)` times the first row from each of the other rows

`$$\left[\begin{array}{rrr|rrr}2 & 1 & 2 & 1 & 0 & 0 \\0 & -1/2 & -2 & -3/2 & 1 & 0 \\0 & -1/2 & -1 & -3/2 & 0 & 1\end{array}\right]$$`

##### Add twice the second row to the first row, and subtract the second row from the third row

`$$\left[\begin{array}{rrr|rrr}2 & 0 & -2 & -2 & 2 & 0 \\0 & -1/2 & -2 & -3/2 & 1 & 0 \\0 & 0 & 1 & 0 & -1 & 1\end{array}\right]$$`

---

# Example of Gauss-Jordan

##### Add twice the second row to the first row, and subtract the second row from the third row

`$$\left[\begin{array}{rrr|rrr}2 & 0 & -2 & -2 & 2 & 0 \\0 & -1/2 & -2 & -3/2 & 1 & 0 \\0 & 0 & 1 & 0 & -1 & 1\end{array}\right]$$`

##### Add twice the third row to the first and second rows

`$$\left[\begin{array}{rrr|rrr}2 & 0 & 0 & -2 & 0 & 2 \\0 & -1/2 & 0 & -3/2 & -1 & 2 \\0 & 0 & 1 & 0 & -1 & 1\end{array}\right]$$`

---

# Example of Gauss-Jordan

##### Add twice the third row to the first and second rows

`$$\left[\begin{array}{rrr|rrr}2 & 0 & 0 & -2 & 0 & 2 \\0 & -1/2 & 0 & -3/2 & -1 & 2 \\0 & 0 & 1 & 0 & -1 & 1\end{array}\right]$$`

--

##### Normalize the diagonal so the left block becomes `\(\mathbf{I}\)`

`$$\left[\begin{array}{rrr|rrr}1 & 0 & 0 & -1 & 0 & 1 \\0 & 1 & 0 & 3 & 2 & -4 \\0 & 0 & 1 & 0 & -1 & 1\end{array}\right]$$`

--

`$$\mathbf{A}^{-1} = \begin{bmatrix}-1 & 0 & 1 \\3 & 2 & -4 \\0 & -1 & 1\end{bmatrix}$$`

---

# Another example
`$$\mathbf{A} = \begin{bmatrix} 1 & 3 & 5 \\ 1 & 7 & 2 \\ 1 & 4 & 5 \end{bmatrix}$$`

--

##### First, set up the augmented matrix

`$$\left[\begin{array}{rrr|rrr}1 & 3 & 5 & 1 & 0 & 0 \\1 & 7 & 2 & 0 & 1 & 0 \\1 & 4 & 5 & 0 & 0 & 1\end{array}\right]$$`

--

##### Subtract row 3 from row 2 and row 1 from row 3:

`$$\left[\begin{array}{rrr|rrr}1 & 3 & 5 & 1 & 0 & 0 \\0 & 3 & -3 & 0 & 1 & -1 \\0 & 1 & 0 & -1 & 0 & 1\end{array}\right]$$`

---

# Another example

##### Subtract row 2 from row 1; divide row 2 by 3; subtract row 2 from row 3

`$$\left[\begin{array}{rrr|rrr}1 & 0 & 8 & 1 & -1 & 1 \\0 & 1 & -1 & 0 & \frac{1}{3} & -\frac{1}{3} \\ 0 & 0 & 1 & -1 & -\frac{1}{3} & \frac{4}{3}\end{array}\right]$$`

##### Subtract 8 times row 3 from row 1; add row 3 to row 2

`$$\left[\begin{array}{rrr|rrr}1 & 0 & 0 & 9 & \frac{5}{3} & -\frac{29}{3} \\ 0 & 1 & 0 & -1 & 0 & 1 \\ 0 & 0 & 1 & -1 & -\frac{1}{3} & \frac{4}{3}\end{array}\right]$$`

--

We found it!

---

# Alternative: Inverse matrix

`$$\frac{1}{det(A)}C^T$$`

Where `\(C^T\)` is the *transpose of the cofactor matrix* (aka the adjoint, or adjugate, matrix)

Note: by hand, a `\(3\times 3\)` matrix is really the absolute limit for finding an inverse, and even that can be quite time consuming. While we show you how to do this, you will likely never need to do it beyond a 'proof of concept' demonstration.

---

# Application: Regression analysis

In some methods/quantitative classes you learn about linear regression. For each `\(i\)` (individual) we observe covariates `\(x_{i1}, x_{i2}, \ldots, x_{ik}\)` and dependent variable `\(Y_{i}\)`. Then,

`$$\begin{aligned} Y_{1} & = \beta_{0} + \beta_{1} x_{11} + \beta_{2} x_{12} + \ldots + \beta_{k} x_{1k} \\ Y_{2} & = \beta_{0} + \beta_{1} x_{21} + \beta_{2} x_{22} + \ldots + \beta_{k} x_{2k} \\ \vdots & \vdots & \vdots \\ Y_{i} & = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \ldots + \beta_{k} x_{ik} \\ \vdots & \vdots & \vdots \\ Y_{n} & = \beta_{0} + \beta_{1} x_{n1} + \beta_{2} x_{n2} + \ldots + \beta_{k} x_{nk} \end{aligned}$$`

* `\(\mathbf{x}_{i} = (1, x_{i1}, x_{i2}, \ldots, x_{ik})\)`

`\(\mathbf{Y} = \begin{bmatrix}Y_{1} \\ Y_{2} \\ \vdots \\ Y_{n}\end{bmatrix}\)`, `\(\boldsymbol{\beta} = \begin{bmatrix}\beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{k} \end{bmatrix}\)`, `\(\mathbf{X} = \begin{bmatrix} \mathbf{x}_{1}\\\mathbf{x}_{2}\\ \vdots \\ \mathbf{x}_{n} \end{bmatrix}\)`

---

# Solve the system of equations

GOAL: find `\(\boldsymbol{\beta}\)`

`$$\begin{aligned} \mathbf{Y} &= \mathbf{X}\boldsymbol{\beta} \\\end{aligned}$$`

* Ok, so, we have an `\(\mathbf{X}\)`. We want to 'undo' it, so we need the inverse. Is it invertible? How do we make sure?

--

* Ah, yes: we premultiply by its transpose. `\(\mathbf{X}'\mathbf{X}\)` will certainly be square (and, as long as the columns of `\(\mathbf{X}\)` are linearly independent, invertible)

`$$\begin{aligned} \mathbf{X}^{'} \mathbf{Y} &= \mathbf{X}^{'} \mathbf{X} \boldsymbol{\beta} \\\end{aligned}$$`

--

* Great! Now, we can 'undo' this (inverse-land)

--

`$$\begin{aligned} (\mathbf{X}^{'}\mathbf{X})^{-1} \mathbf{X}^{'} \mathbf{Y} &= (\mathbf{X}^{'}\mathbf{X})^{-1}\mathbf{X}^{'} \mathbf{X} \boldsymbol{\beta} \\ \end{aligned}$$`

--

`$$\begin{aligned} (\mathbf{X}^{'}\mathbf{X})^{-1} \mathbf{X}^{'} \mathbf{Y} &=\mathbf{I}\boldsymbol{\beta} \end{aligned}$$`

--

`$$\begin{aligned} (\mathbf{X}^{'}\mathbf{X})^{-1} \mathbf{X}^{'} \mathbf{Y} &=\boldsymbol{\beta} \end{aligned}$$`

---

# Relevance of the determinant

Remember how we wanted to invert a `\(2 \times 2\)` matrix previously?

`$$\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$`

`\(ad - bc\)` is the formula for the determinant of a `\(2 \times 2\)` matrix!
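A quick illustration in R (a sketch) of why that zero determinant is bad news -- `solve()` simply refuses to invert a singular matrix:

```r
B <- matrix(c(2, 3,
              6, 9), nrow = 2, byrow = TRUE)

det(B)           # 0: the rows/columns are linearly dependent
try(solve(B))    # error -- the matrix is singular, so no inverse exists
```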
Recall that non-invertible (singular) matrices are square matrices whose columns or rows are linearly dependent. Singular matrices also have the defining property that their determinant is `\(0\)`. This is also important as we move into eigenvectors and diagonalization.

---

class: middle

# matrices: eigenvalues and vectors!!!!

---

# Eigen-wha?: WHAT ARE THEY?!!

I think about **eigenvectors** as the 'essence' of a vector -- applying the matrix to an eigenvector doesn't 'move' it around or shift its direction. All it does is change its length. The **eigenvalue** is the factor by which that length changes.

More formally, suppose we have a matrix `\(\mathbf{X}\)`. If we multiply it by an eigenvector `\(\mathbf{v}\)`, what we get out is simply a rescaled version of `\(\mathbf{v}\)`. How much is it rescaled by? By the eigenvalue, represented by `\(\lambda\)`.

---

# Calculating eigenvalues

An *eigenvector* of a matrix `\(\mathbf{A}\)` is a nonzero vector `\(\mathbf{v}\)` where `\(\mathbf{A}\mathbf{v}= \lambda \mathbf{v}\)` for some scalar `\(\lambda\)`.

An *eigenvalue* of `\(\mathbf{A}\)` is a scalar `\(\lambda\)` where `\(\mathbf{A}\mathbf{v}= \lambda \mathbf{v}\)` has a nontrivial solution.

--

cue: flashbacks from solving systems of equations!

---

# Eigenvalues: cont'd

To find eigenvalues, we can consider our equation, `\(\mathbf{A}\mathbf{v}= \lambda \mathbf{v}\)`. We can move things around to find: `\(0 = \lambda \mathbf{v}-\mathbf{A}\mathbf{v}\)`

We can work in the identity matrix to make things a little nicer: `\(0 = \lambda \mathbf{I} \mathbf{v}-\mathbf{A}\mathbf{v}=(\lambda \mathbf{I} -\mathbf{A})\mathbf{v}\)` (note: we can equivalently solve `\((\mathbf{A}- \lambda \mathbf{I})\mathbf{v} = 0\)`)

--

For `\(\mathbf{v}\)` to be nonzero, the matrix `\(\lambda \mathbf{I} -\mathbf{A}\)` must be singular, so we also know: `\(det(\lambda \mathbf{I}-\mathbf{A})=0\)`

Example: start with `\(\begin{pmatrix} 0 & 4 \\ 1 & 0 \end{pmatrix}\)`.

--

`\(det(\lambda \mathbf{I}-\mathbf{A}) = det(\begin{pmatrix}\lambda - 0 & -4 \\ -1 & \lambda - 0 \end{pmatrix})=0\)`

--

`\((\lambda^2-4) = 0\)`

--

`\(\lambda = \pm 2\)`

--

So, `\(\lambda\)` gives our eigenvalue(s), and we can plug each one into a system of equations to find our eigenvectors!

---

# Eigenvalues: becoming an eigenvector

We can start with our eigenvalue (let's begin with `\(\lambda = -2\)`). We then set up our system of equations using the matrix `\(\lambda \mathbf{I}-\mathbf{A}\)`, knowing that `\((\lambda \mathbf{I}-\mathbf{A})\mathbf{v} =0\)`.

--

`\(\begin{pmatrix}\lambda & -4 \\ -1 & \lambda\end{pmatrix}\begin{pmatrix}v_1 \\ v_2\end{pmatrix} = 0\)` becomes

--

`\(-2v_1 - 4v_2 =0\)`
`\(-v_1 - 2v_2 = 0\)`

--

We can rearrange to find that `\(4v_2 = -2v_1\)`, simplifying to `\(v_1 =-2v_2\)`. Since there are infinitely many solutions, we can choose an arbitrary value for one of the components: suppose `\(v_2\)` is 1 (start small and go from there!). Thus, we end up with `\(v_1=-2\)` and `\(v_2 = 1\)`.

--

By a similar process for `\(\lambda = 2\)`, we can substitute to find that `\(2v_1 - 4v_2 = 0\)`, arriving at `\(v_1 = 2v_2\)`. Again, we suppose `\(v_2=1\)` (for simplicity), and find that `\(v_1=2\)`.

So, our eigenvectors are `\(\begin{pmatrix} 2 \\ 1 \end{pmatrix}\)` and `\(\begin{pmatrix}-2 \\ 1\end{pmatrix}\)`. (How we interpret them depends on what our question is!)

---

# Cool. So what?!

--

So glad you asked!!!
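(First, a quick sanity check of the `\(2 \times 2\)` example we just worked by hand -- a sketch using base R's `eigen()`. R rescales eigenvectors to unit length, so expect scaled versions of ours, possibly with flipped signs.)

```r
A <- matrix(c(0, 4,
              1, 0), nrow = 2, byrow = TRUE)

eigen(A)
# $values : 2 and -2
# $vectors: columns proportional to (2, 1) and (-2, 1),
#           just rescaled to have length 1
```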
We often want to reduce the thing we care about from a whole bunch of variables down to a few core directions / topics (cue PCA and topic models!).

--

Another way to think about eigenvalues and eigenvectors: they tell us which directions a transformation stretches and by how much (e.g., what are the most productive/effective directions to focus on).

---

# Which dimensions matter?

(note: for illustrative purposes only!)

<img src="https://cdn.britannica.com/43/244843-050-67BE1C71/Potemkin-village-building-facades-concept-photo-illustration.jpg" width="60%" style="display: block; margin: auto;" />

--

A way to conceptualize this: we want maximum information for minimum effort. Which dimensions are giving us the most variation here?

---

class: middle

# decomposition and dimension reduction

---

# Matrix decomposition

* Factorization of a matrix into a product of matrices
* The matrix is decomposed into more convenient factors, depending on the calculations to be performed

--

## LU decomposition

`$$\mathbf{A} = \mathbf{L}\mathbf{U}$$`

* Only for square matrices

---

# Dimension reduction

*Dimension reduction* refers to decreasing the number of dimensions in your dataset. There are a couple of reasons you might do this:

1. You want to visualize the data but you have a lot of variables. You could generate something like a scatterplot matrix, but once you have more than a handful of variables even these become difficult to interpret.
1. You want to use the variables in a supervised learning framework, but reduce the total number of predictors to make the estimation more efficient.

In either case, the goal is to reduce the dimensionality of the data by identifying a smaller number of representative variables/vectors/columns that collectively explain most of the variability in the original dataset. There are several methods available for performing such a task. First we will examine an example of applying dimension reduction techniques to summarize roll-call voting in the United States.

---

# DW-NOMINATE

* Studying legislative voting
* Roll-call votes
* Multidimensional scaling techniques
* DW-NOMINATE dimensions
  * Ideology (liberal-conservative)
  * Issue of the day

---

# House: First dimension

<img src="images/house_party_means_1879-2015.png" width="819" style="display: block; margin: auto;" />

---

# House: Second dimension

<img src="images/house_party_means_1879-2015_2nd.png" width="671" style="display: block; margin: auto;" />

---

# Senate: First dimension

<img src="images/senate_party_means_1879-2015.png" width="671" style="display: block; margin: auto;" />

---

# Senate: Second dimension

<img src="images/senate_party_means_1879-2015_2nd.png" width="671" style="display: block; margin: auto;" />

---

# Principal components analysis

* Technique for dimension reduction
* Find a low-dimensional representation of the data that contains as much as possible of the variation
* Dimensions are linear combinations of the variables

---

# Implementing PCA

1. Rescale each column to be mean 0 and standard deviation 1
2. Compute the covariance matrix `\(\mathbf{S}\)`

`$$\mathbf{S} = \dfrac{1}{N} \mathbf{X}' \mathbf{X}$$`

3. Compute the `\(K\)` largest **eigenvectors** of `\(\mathbf{S}\)`
    * Eigenvector - principal directions
    * Values of observations along eigenvector - principal components
    * Eigenvalues - amount of variance along the eigenvector

--

* Not an efficient approach
* Let's use SVD! (see the sketch below)
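Here is a sketch of those steps in R, using the `USArrests` data that appears on the next slides. (Note: `cov()` on the rescaled data divides by `\(n-1\)` rather than the `\(N\)` on the slide, and `prcomp()` uses the SVD under the hood; the principal directions are the same either way.)

```r
X <- scale(USArrests)        # step 1: columns with mean 0, sd 1
S <- cov(X)                  # step 2: covariance matrix of the rescaled data
eig <- eigen(S)              # step 3: eigenvectors = principal directions

eig$values                   # variance along each principal direction
head(X %*% eig$vectors, 3)   # principal components (scores) for the first states

pca <- prcomp(USArrests, scale. = TRUE)
pca$sdev^2                   # same variances, computed via the SVD
```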
--- # Calculating the PCA `$$\mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{*}$$` * `\(\mathbf{V}^{*}\)` = principal directions (eigenvector) * Columns of `\(\mathbf{U} \boldsymbol{\Sigma}\)` = principal components (values along eigenvector) * Values of the diagonal elements of `\(\boldsymbol{\Sigma}\)` `\(\approx\)` eigenvalues --- # `USArrests` <img src="06-matrix-inversion-decomposition_files/figure-html/usarrests-pc-scores-plot-1.png" width="864" style="display: block; margin: auto;" /> --- # `USArrests` <img src="06-matrix-inversion-decomposition_files/figure-html/usarrests-pc-scores-biplot-1.png" width="864" style="display: block; margin: auto;" /> --- # MNIST data set <img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png" style="display: block; margin: auto;" /> --- # MNIST data set <img src="06-matrix-inversion-decomposition_files/figure-html/pixels-pca-pc12-1.png" width="864" style="display: block; margin: auto;" /> --- # MNIST data set <img src="06-matrix-inversion-decomposition_files/figure-html/pixels-pca-pc12-facet-1.png" width="864" style="display: block; margin: auto;" /> --- # Singular value decomposition * Matrix decomposition method * Works with any rectangular matrix * `\(\mathbf{M}\)` is an `\(m \times n\)` matrix. There exists a factorization of the form `$$\mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{*}$$` where * `\(\mathbf{U}\)` is an `\(m \times n\)` matrix * `\(\boldsymbol{\Sigma}\)` is an `\(n \times n\)` diagonal matrix * `\(\mathbf{V}^{*}\)` is the transpose of an `\(n \times n\)` matrix --- # Image compression <img src="06-matrix-inversion-decomposition_files/figure-html/read-image-1.png" width="864" style="display: block; margin: auto;" /> --- # Image compression ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 0.361 0.369 0.381 0.393 0.403 ## [2,] 0.365 0.373 0.385 0.397 0.407 ## [3,] 0.369 0.377 0.389 0.399 0.411 ## [4,] 0.377 0.385 0.395 0.407 0.420 ## [5,] 0.388 0.391 0.403 0.416 0.424 ``` --- # Applying SVD ##### `\(\mathbf{U}\)` ``` ## [,1] [,2] [,3] ## [1,] -0.0398 -0.0291 -0.02032 ## [2,] -0.0405 -0.0150 -0.00198 ## [3,] -0.0396 -0.0186 -0.01972 ``` ##### `\(\boldsymbol{\Sigma}\)` ``` ## [,1] [,2] [,3] ## [1,] 193 0.0 0.0 ## [2,] 0 29.2 0.0 ## [3,] 0 0.0 16.2 ``` ##### `\(\mathbf{V}^{*}\)` ``` ## [,1] [,2] [,3] ## [1,] -0.0556 0.00838 0.0211 ## [2,] -0.0558 0.00848 0.0179 ## [3,] -0.0560 0.00874 0.0138 ``` --- # Recovering the data * Multiply back together `$$\mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{*}$$` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 0.361 0.369 0.381 0.393 0.403 ## [2,] 0.365 0.373 0.385 0.397 0.407 ## [3,] 0.369 0.377 0.389 0.399 0.411 ## [4,] 0.377 0.385 0.395 0.407 0.420 ## [5,] 0.388 0.391 0.403 0.416 0.424 ``` --- # Data reduction * `\(\Sigma\)` ``` ## [1] 193.4417 29.1733 16.1600 14.9806 12.1708 11.3756 10.5788 8.9693 ## [9] 8.3404 7.6359 7.4752 6.8798 6.1244 5.9575 5.5327 5.3978 ## [17] 5.1953 4.8511 4.6521 4.6020 4.2501 4.1820 4.0820 4.0382 ## [25] 3.8938 3.8375 3.7173 3.5563 3.5273 3.4986 3.4396 3.4027 ## [33] 3.3417 3.2681 3.2025 3.1409 3.0671 3.0221 3.0124 2.9543 ## [41] 2.8912 2.8365 2.8076 2.7306 2.6768 2.6547 2.6008 2.5562 ## [49] 2.5353 2.5186 2.4892 2.4669 2.3997 2.3361 2.3274 2.2823 ## [57] 2.2424 2.2378 2.1923 2.1692 2.1122 2.0840 2.0704 2.0510 ## [65] 2.0241 2.0196 1.9849 1.9568 1.9305 1.9237 1.9052 1.8737 ## [73] 1.8433 1.8222 1.8107 1.7891 1.7699 1.7554 1.7195 1.7039 ## [81] 1.6870 1.6695 1.6453 1.6310 1.6101 1.5815 1.5727 1.5373 ## [89] 1.5198 1.5105 1.4861 1.4748 
1.4609 1.4378 1.4321 1.4016 ## [97] 1.4001 1.3788 1.3624 1.3386 1.3301 1.3169 1.3057 1.2704 ## [105] 1.2593 1.2419 1.2376 1.2065 1.1922 1.1825 1.1741 1.1584 ## [113] 1.1405 1.1314 1.1157 1.1003 1.0921 1.0705 1.0602 1.0480 ## [121] 1.0406 1.0314 1.0191 0.9983 0.9939 0.9919 0.9634 0.9500 ## [129] 0.9434 0.9337 0.9213 0.9153 0.9044 0.8910 0.8777 0.8528 ## [137] 0.8458 0.8419 0.8246 0.8196 0.8005 0.7967 0.7924 0.7866 ## [145] 0.7734 0.7591 0.7564 0.7469 0.7365 0.7283 0.7198 0.7159 ## [153] 0.7118 0.7009 0.6926 0.6874 0.6817 0.6634 0.6552 0.6517 ## [161] 0.6493 0.6352 0.6184 0.6127 0.6073 0.6039 0.6014 0.5949 ## [169] 0.5915 0.5810 0.5767 0.5627 0.5547 0.5456 0.5381 0.5351 ## [177] 0.5310 0.5247 0.5211 0.5139 0.5025 0.4998 0.4966 0.4808 ## [185] 0.4763 0.4725 0.4613 0.4552 0.4529 0.4471 0.4411 0.4374 ## [193] 0.4326 0.4309 0.4232 0.4178 0.4152 0.4047 0.4005 0.3970 ## [201] 0.3884 0.3795 0.3790 0.3770 0.3705 0.3690 0.3597 0.3535 ## [209] 0.3506 0.3465 0.3434 0.3387 0.3341 0.3243 0.3201 0.3183 ## [217] 0.3099 0.3073 0.3020 0.2980 0.2972 0.2953 0.2911 0.2826 ## [225] 0.2787 0.2738 0.2705 0.2644 0.2584 0.2542 0.2533 0.2472 ## [233] 0.2424 0.2397 0.2356 0.2320 0.2300 0.2268 0.2205 0.2187 ## [241] 0.2160 0.2096 0.2077 0.1980 0.1961 0.1930 0.1895 0.1891 ## [249] 0.1853 0.1814 0.1798 0.1772 0.1720 0.1704 0.1681 0.1658 ## [257] 0.1650 0.1617 0.1539 0.1523 0.1483 0.1457 0.1436 0.1424 ## [265] 0.1367 0.1360 0.1332 0.1304 0.1276 0.1265 0.1259 0.1232 ## [273] 0.1201 0.1158 0.1119 0.1112 0.1079 0.1069 0.1044 0.1010 ## [281] 0.0993 0.0980 0.0934 0.0905 0.0900 0.0878 0.0868 0.0847 ## [289] 0.0838 0.0796 0.0763 0.0744 0.0733 0.0710 0.0682 0.0674 ## [297] 0.0671 0.0637 0.0612 0.0595 0.0570 0.0556 0.0537 0.0501 ## [305] 0.0485 0.0446 0.0435 0.0426 0.0401 0.0361 0.0354 0.0336 ## [313] 0.0311 0.0295 0.0286 0.0257 0.0248 0.0238 0.0235 0.0233 ## [321] 0.0224 0.0221 0.0218 0.0208 0.0203 0.0200 0.0195 0.0191 ## [329] 0.0184 0.0181 0.0175 0.0174 0.0170 0.0162 0.0157 0.0155 ## [337] 0.0152 ```

---

# Data reduction

<img src="06-matrix-inversion-decomposition_files/figure-html/svd-sigma-prop-1.png" width="864" style="display: block; margin: auto;" />

---

# Draw image using rank 2

<img src="06-matrix-inversion-decomposition_files/figure-html/lions-rank1-1.png" width="864" style="display: block; margin: auto;" />

---

# Different ranks

<img src="06-matrix-inversion-decomposition_files/figure-html/lions-rank-all-1.gif" style="display: block; margin: auto;" />

---

# Recap:

* Matrix **transposition**: flipping elements across the diagonal (results in switching the row/column numbers for each element -- e.g., `\(x_{ij}\)` becomes `\(x_{ji}\)`)
* Matrix **inversion**: the matrix version of division / reversing. The general formula is `\(\frac{1}{det(A)}Adj\)`, where the adjoint (adjugate) matrix is the transposed cofactor matrix
* **Trace** gives us information about the magnitude of the diagonal of a matrix: you simply add the values along the diagonal.
  * For example, the trace of an identity matrix is equal to the number of rows (e.g., `\(tr(\mathbf{I}_2)=2\)`)
* **Determinant**: summarizes the structure of the entire matrix. Easy for a `\(2\times 2\)` matrix but a pain for larger ones.
* **Submatrix**: akin to a subset: the smaller matrix left over when you delete a row / column from the original
* **Minor**: the determinant of a submatrix (the one you get by deleting a particular row/column)
* **Cofactor**: the corresponding minor with a sign attached: `\((-1)^{i+j}\)` times the minor

---

# Determinants, minors and cofactors: `\(n \times n\)` matrix

`$$\mid \mathbf{X} \mid = \sum_{j=1}^n (-1)^{i+j} x_{ij} \mid\mathbf{X}_{[ij]}\mid$$`

* NOTE: we can expand along any one row or column (just be consistent)
* `\(ij\)`th **minor** of `\(\mathbf{X}\)` for `\(x_{ij}\)`: `\(\mid\mathbf{X}_{[ij]}\mid\)`
  * Determinant of the `\((n - 1) \times (n - 1)\)` submatrix that results from taking the `\(i\)`th row and `\(j\)`th column out
  * Refer to this resulting submatrix as `\(\mathbf{M_{ij}}\)`
* **Cofactor** of `\(\mathbf{X}\)`
  * Minor signed as `\((-1)^{i+j}\mid\mathbf{X}_{[ij]}\mid\)`
* Cycle recursively through columns
* Determinants (cofactor expansion; e.g., along row 2 of a `\(3 \times 3\)`): `\(|\mathbf{A}|=a_{21}\mathbf{C_{21}}+ a_{22}\mathbf{C_{22}} + a_{23}\mathbf{C_{23}}\)`
  * `\(2\times2\)`: `\(a_{11}a_{22}-a_{12}a_{21}\)`
  * `\(3\times3\)`: `\(a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} +a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}\)`
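To close the loop, a quick sketch verifying the `\(\frac{1}{det(A)}C^T\)` formula against `solve()`, using the `\(3 \times 3\)` matrix and cofactor matrix we worked out earlier:

```r
A <- matrix(c(1, 2, 3,
              2, 1, 3,
              4, 1, 2), nrow = 3, byrow = TRUE)

# cofactor matrix computed earlier in the deck
C <- matrix(c(-1,   8, -2,
              -1, -10,  7,
               3,   3, -3), nrow = 3, byrow = TRUE)

t(C) / det(A)   # adjugate over determinant...
solve(A)        # ...matches R's inverse
```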