class: center, middle, inverse, title-slide

.title[
# Critical points and approximation
]

.author[
### MACS 33000
University of Chicago
]

---

# Misc

* Gradescope: grading underway -- should be all OK
* Academic Integrity: 8/29 @ 11:30 am
* No homework tomorrow (!!)

---

# Learning objectives

* Define critical points
* Calculate critical points via analytical methods
* Demonstrate optimization using maximum likelihood estimation
* Identify the need for approximation methods for calculating critical points
* Explain and demonstrate root finding procedures using the Newton-Raphson hill climber
* Demonstrate computational optimization using gradient descent

---

# Rolle's theorem

* Suppose `\(f:[a, b] \rightarrow \Re\)` is differentiable on `\((a, b)\)`
* Suppose `\(f\)` has a relative maximum or minimum on `\((a,b)\)`, and call that point `\(c \in (a, b)\)`
* Then `\(f'(c) = 0\)`

--

.pull-left[

<img src="04-critical-points_files/figure-html/rolles-theorem-1.png" width="432" style="display: block; margin: auto;" />

]

.pull-right[

* Rolle's theorem guarantees that, at some point, `\(f^{'}(x) = 0\)`
* Intuition from the proof - what happens as we approach `\(c\)` from the left?
* Intuition from the proof - what happens as we approach `\(c\)` from the right?

]

---

# Higher order derivatives

Conceptually, we are thinking about changes (get excited for all the *moments* you're going to have in stats later!). The first derivative tells us about the rate of change of our function.

--

**BUT!** We can also explore how fast the function is changing... that is, the rate of change of our rate of change. Here, we can understand the acceleration of change.

We can keep going (provided our function is differentiable).

---

# Higher order derivatives

(in other words:)

* Derivatives of derivatives
* First derivative

`$$f'(x), ~~ y', ~~ \frac{d}{dx}f(x), ~~ \frac{dy}{dx}$$`

* Second derivative

`$$f''(x), ~~ y'', ~~ \frac{d^2}{dx^2}f(x), ~~ \frac{d^2y}{dx^2}$$`

* `\(n\)`th derivative

`$$\frac{d^n}{dx^n}f(x), \quad \frac{d^ny}{dx^n}$$`

---

# Higher order derivatives: example

Suppose we start with the function `\(x^3\)` -- let's explore the different derivatives we might take. Try it out!

--

$$
`\begin{aligned}
f(x) &= x^3 \\
f^{\prime}(x) &= 3x^2 \\
f^{\prime\prime}(x) &= 6x \\
f^{\prime\prime\prime}(x) &= 6 \\
f^{\prime\prime\prime\prime}(x) &= 0 \\
\end{aligned}`
$$

* If `\(f(x)\)` is differentiable, it is also continuous
* If `\(f'(x)\)` is differentiable, then `\(f\)` is **continuously differentiable**
* Optimization requires differentiation

---

# Inflection point

Inflection points are where we move from being concave down to concave up (or vice versa).

<img src="04-critical-points_files/figure-html/concave-inf-1.png" width="864" style="display: block; margin: auto;" />

---

# Inflection point

> For a given function `\(y = f(x)\)`, a point `\((x^{*}, y^{*})\)` where the second derivative immediately on one side of the point is signed oppositely to the second derivative immediately on the other side

<img src="04-critical-points_files/figure-html/inflect-1.png" width="864" style="display: block; margin: auto;" />

---

# Inflection point

> For a given function `\(y = f(x)\)`, a point `\((x^{*}, y^{*})\)` where the second derivative immediately on one side of the point is signed oppositely to the second derivative immediately on the other side

Let's start with a basic plot. Here we have the function `\(\frac{x^3}{15} - x^2 + 4x + 2\)`. We can see where the function looks concave up or concave down, and where the inflection point might be.

<img src="04-critical-points_files/figure-html/inflect-1-1.png" width="864" style="display: block; margin: auto;" />
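We can also verify the inflection point analytically. Below is a minimal sketch in Python (assuming `sympy` is available; the code and names are ours, not part of the original slides):

```python
import sympy as sp

x = sp.symbols("x")
f = x**3 / 15 - x**2 + 4 * x + 2

f1 = sp.diff(f, x)   # first derivative: x**2/5 - 2*x + 4
f2 = sp.diff(f1, x)  # second derivative: 2*x/5 - 2

# Candidate inflection point: where the second derivative equals zero
print(sp.solve(sp.Eq(f2, 0), x))     # [5]

# The sign of f'' flips from negative to positive around x = 5,
# confirming the switch from concave down to concave up
print(f2.subs(x, 4), f2.subs(x, 6))  # -2/5 2/5
```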
---

# Inflection point

Let's now consider the derivative of the same function. We are expecting the value of change (a.k.a. the rate of change) to be zero. GREEN: regions where the derivative (the change) is positive or negative. GRAY: the point where we don't see any more change (rate of change = 0).

<img src="04-critical-points_files/figure-html/inflect-deriv0-1.png" width="864" style="display: block; margin: auto;" />

--

### Derivative of original function:

<img src="04-critical-points_files/figure-html/inflect-deriv1-1.png" width="864" style="display: block; margin: auto;" />

---

# Inflection point

> For a given function `\(y = f(x)\)`, a point `\((x^{*}, y^{*})\)` where the second derivative immediately on one side of the point is signed oppositely to the second derivative immediately on the other side

<img src="04-critical-points_files/figure-html/inflect-all-1.png" width="864" style="display: block; margin: auto;" />

---

# Concavity

* Concave up (convex)
* Concave down (concave)
* Verification
    * Graphically
    * Analytically

---

# Concavity

* Where a function is twice differentiable over some interval, it is concave down where `\(f''(x) < 0\)` and concave up where `\(f''(x) > 0\)`

What does this mean?

--

* Suppose we are at a critical point `\((f'(x) = 0)\)`. Let's start there.
* As we move away from that point, the slope is either *increasing* (min - concave up) or *decreasing* (max - concave down).
* If the slope is increasing, the second derivative is positive, `\(f''(x) > 0\)`.
* If the slope is decreasing, the second derivative is negative, `\(f''(x) < 0\)`.

---

# Concavity

* Where a function is twice differentiable over some interval, it is concave down where `\(f''(x) < 0\)` and concave up where `\(f''(x) > 0\)`

<img src="04-critical-points_files/figure-html/concave-1.png" width="864" style="display: block; margin: auto;" />

---

# Exponential function

.pull-left[

<img src="04-critical-points_files/figure-html/strict-e-1.png" width="432" style="display: block; margin: auto;" />

]

.pull-right[

Exploring change:

<font size="2">

$$
`\begin{aligned}
f(x) & = e^{x} \text{ (what is the function doing?)} \\
f^{'}(x) & = e^{x} \text{ rate of change (never zero!)} \\
f^{''}(x) & = e^{x} \text{ rate of change of rate of change} \\
& \text{ (still never zero!)}
\end{aligned}`
$$

</font>

]

---

# Natural logarithm

.pull-left[

<img src="04-critical-points_files/figure-html/strict-log-1.png" width="432" style="display: block; margin: auto;" />

]

.pull-right[

Exploring change:

<font size="2">

$$
`\begin{aligned}
f(x) & = \ln(x) \\
\\
f^{'}(x) & = \frac{1}{x} \\
& \text{ rate of change (always positive)*} \\
\\
f^{''}(x) & = -\frac{1}{x^2} \\
& \text{ rate of change of rate of change}
\end{aligned}`
$$

</font>

\* .tiny[note that `\(\ln(x)\)` is only defined for `\(x > 0\)`]

]

---

# Types of extreme values

* Maximum or minimum
* Local or global

--

*You can find these graphically or analytically*

---

# Types of extreme values: Examples

How would we evaluate the extreme values here?

<img src="04-critical-points_files/figure-html/endpoints-1.png" width="864" style="display: block; margin: auto;" />

--

* Take the derivative of the function: `\(f'(x) = 1\)`
* Set to zero and solve (n/a -- the derivative is never zero)
* Options:
    * Plug values into the original function to find the corresponding `\(f(x)\)` (here, only evaluate at the endpoints, 0 and 5)
    * Find the second derivative to see if it is positive (min) or negative (max)

---

# Types of extreme values: Example

Try with a neighbor:

<img src="04-critical-points_files/figure-html/max-middle-1.png" width="864" style="display: block; margin: auto;" />

--

#### `\(f'(x) = -2x\)`; zero at `\(x = 0\)`, where `\(f(0) = -0^2 + 5 = 5\)`. But is it a max or min?

--

Try an adjacent point (e.g. -1 or 1): `\(f(1) = -(1)^2 + 5 = 4\)`. OR take the second derivative: `\(f''(x) = -2\)`

--

Since `\(f(0) = 5 > f(1) = 4\)`, this is a max at `\(x = 0\)`. Similarly, the second derivative is negative, which also indicates a max (the function is decreasing as we move away from this point).

---

# Types of extreme values

<img src="04-critical-points_files/figure-html/min-middle-1.png" width="864" style="display: block; margin: auto;" />

---

# Types of extreme values

<img src="04-critical-points_files/figure-html/local-all-1.png" width="864" style="display: block; margin: auto;" />

---

# Types of extreme values

<img src="04-critical-points_files/figure-html/inflection-point-1.png" width="864" style="display: block; margin: auto;" />

---

### Function and derivative

<img src="04-critical-points_files/figure-html/inflection-point-2-1.png" width="864" style="display: block; margin: auto;" />

---

# Framework for analytical optimization

1. Find `\(f'(x)\)`
1. Set `\(f'(x)=0\)` and solve for `\(x\)`. Call all `\(x_0\)` such that `\(f'(x_0)=0\)` **critical values**
1. Find `\(f''(x)\)`. Evaluate at each `\(x_0\)`
    * If `\(f''(x) > 0\)`, concave up, and therefore a local minimum
    * If `\(f''(x) < 0\)`, concave down, and therefore a local maximum
    * If it's the global maximum/minimum, it will produce the largest/smallest value for `\(f(x)\)`
    * On a closed range along the domain, **check the endpoints as well**
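To make the framework concrete, here is a minimal `sympy` sketch (our own illustration, not from the original slides) that applies all three steps, plus the endpoint check, to the second example below: `\(f(x) = x^3 - 6x - 2\)` on `\([-3, 3]\)`:

```python
import sympy as sp

x = sp.symbols("x")
f = x**3 - 6 * x - 2

# Step 1: find f'(x)
f1 = sp.diff(f, x)                    # 3*x**2 - 6

# Step 2: set f'(x) = 0 and solve for the critical values
critical = sp.solve(sp.Eq(f1, 0), x)  # [-sqrt(2), sqrt(2)]

# Step 3: evaluate f''(x) at each critical value
f2 = sp.diff(f1, x)                   # 6*x
for c in critical:
    label = "local min" if f2.subs(x, c) > 0 else "local max"
    print(c, label, f.subs(x, c))

# Closed range, so check the endpoints as well
for endpoint in (-3, 3):
    print(endpoint, f.subs(x, endpoint))  # f(-3) = -11, f(3) = 7
```

Comparing all four candidate values of `\(f(x)\)` shows the global minimum sits at the endpoint `\(x = -3\)` and the global maximum at the endpoint `\(x = 3\)`.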
---

### Ex: `\(f(x) = -x^2\)`, `\(x \in [-3, 3]\)`

<img src="04-critical-points_files/figure-html/ex-1-1.png" width="864" style="display: block; margin: auto;" />

---

### Ex: `\(f(x) = x^3 - 6x - 2\)`, `\(x \in [-3, 3]\)`

<img src="04-critical-points_files/figure-html/ex-2-1.png" width="864" style="display: block; margin: auto;" />

---

class: center, middle, inverse

# CSS Application of optimization: MLE

You aren't going to have to calculate this, but you need to understand the big picture

---

# Maximum likelihood estimation

* Likelihood function
    * Distinguish from probability
    * Known data, unknown parameters
* Maximize to find the values located at the **global maximum** of the likelihood function

---

# Maximum likelihood estimation

$$
`\begin{aligned}
f(\mu) & = \prod_{i=1}^{N} \exp( \frac{-(Y_{i} - \mu)^2}{ 2}) \\
& = \exp(- \frac{(Y_{1} - \mu)^2}{ 2}) \times \ldots \times \exp(- \frac{(Y_{N} - \mu)^2}{ 2}) \\
& = \exp( - \frac{\sum_{i=1}^{N} (Y_{i} - \mu)^2} {2})
\end{aligned}`
$$

* Maximizing this directly means working with very, very, very small numbers

---

# Maximum likelihood estimation

* Log-likelihood
    * `\(f:\Re \rightarrow (0, \infty)\)`
    * If `\(x_{0}\)` maximizes `\(f\)`, then `\(x_{0}\)` maximizes `\(\log(f(x))\)`
* Maximize the log-likelihood instead

---

# Maximum likelihood estimation

$$
`\begin{aligned}
\log f(\mu) & = \log \left( \exp( - \frac{\sum_{i=1}^{N} (Y_{i} - \mu)^2} {2}) \right) \\
& = - \frac{\sum_{i=1}^{N} (Y_{i} - \mu)^2} {2} \\
& = -\frac{1}{2} \left(\sum_{i=1}^{N} Y_{i}^2 - 2\mu \sum_{i=1}^{N} Y_{i} + N\times\mu^2 \right) \\
\frac{ \partial \log f(\mu) }{ \partial \mu } & = -\frac{1}{2} \left( - 2\sum_{i=1}^{N} Y_{i} + 2 N \mu \right)
\end{aligned}`
$$

---

# Maximum likelihood estimation

$$
`\begin{aligned}
0 & = -\frac{1}{2} \left( - 2 \sum_{i=1}^{N} Y_{i} + 2 N \mu^{*} \right) \\
0 & = \sum_{i=1}^{N} Y_{i} - N \mu^{*} \\
N \mu^{*} & = \sum_{i=1}^{N}Y_{i} \\
\mu^{*} & = \frac{\sum_{i=1}^{N}Y_{i}}{N} \\
\mu^{*} & = \bar{Y}
\end{aligned}`
$$

---

# Maximum likelihood estimation

* Second derivative test

$$
`\begin{aligned}
f^{'}(\mu ) & = -\frac{1}{2} \left( - 2\sum_{i=1}^{N} Y_{i} + 2 N \mu \right) \\
f^{'}(\mu ) & = \sum_{i=1}^{N} Y_{i} - N \mu \\
f^{''}(\mu ) & = -N
\end{aligned}`
$$

--

* `\(-N < 0 \leadsto \text{concave down}\)`, so `\(\mu^{*} = \bar{Y}\)` is a maximum
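We can sanity-check this result numerically. A minimal sketch with `numpy` (the simulated data and all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.0, size=500)  # fake data with true mu = 2

def log_lik(mu, y):
    # log-likelihood from the slides (constants dropped)
    return -0.5 * np.sum((y - mu) ** 2)

# Evaluate the log-likelihood over a grid of candidate values of mu
grid = np.linspace(0, 4, 4001)
best = grid[np.argmax([log_lik(mu, y) for mu in grid])]

print(best)      # the maximizer over the grid...
print(y.mean())  # ...matches the sample mean, up to grid resolution
```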
---

# Maximum likelihood estimation: MLE in life

Logit and other limited-dependent-variable models: what function creates a curve that best predicts our outcome variable?

* Simplest example: 0s and 1s as the outcome
* Trying to determine which function best predicts the outcome for parameter estimates

---

class: inverse, middle, center

# Optimization in CSS

---

# Computational optimization procedures

* Analytical approaches can be difficult/impossible
* Computational approaches simplify the problem
* Different algorithms available with benefits/drawbacks
    * Newton-Raphson
    * Grid search
    * Gradient descent

---

# Newton-Raphson root finding

* Roots of a function: where `\(f(x) = 0\)`
* Analytical methods are not always available
* Iterative procedures to approximate `\(x^{*}\)`

---

# Tangent lines

In words: like `\(y = mx + b\)`, where the intercept is the function's value at `\(x_0\)` and the slope `\(f'(x_0)\)` is multiplied by how far away we move, `\((x - x_0)\)`

`$$g(x) = f^{'}(x_{0}) (x - x_{0} ) + f(x_{0})$$`

<img src="04-critical-points_files/figure-html/tangent-anim-1.gif" style="display: block; margin: auto;" />

---

# Newton-Raphson method

* Initial guess `\(x_{0}\)`
* Approximate `\(f(x)\)` with the tangent line to generate a new guess:

$$
`\begin{aligned}
g(x) & = f^{'}(x_{0})(x - x_{0} ) + f(x_{0} ) \\
0 & = f^{'}(x_{0}) (x_{1} - x_{0}) + f(x_{0} ) \\
x_{1} & = x_{0} - \frac{f(x_{0}) }{f^{'}(x_{0})}
\end{aligned}`
$$

* Great! Now we will use `\(x_1\)` as our next guess!
* Iterate until successive guesses are (approximately) the same

---

# Example of Newton-Raphson

.pull-left[

<img src="04-critical-points_files/figure-html/base-plot-1.png" width="432" style="display: block; margin: auto;" />

]

.pull-right[

$$
`\begin{aligned}
y &= -x^2 \\
\frac{\partial y}{\partial x} &= -2x \\
\end{aligned}`
$$

]

---

# Example of Newton-Raphson

.pull-left[

<img src="04-critical-points_files/figure-html/optims-1.png" width="432" style="display: block; margin: auto;" />

]

.pull-right[

$$
`\begin{aligned}
y &= -x^2 \\
\frac{\partial y}{\partial x} &= -2x \\
\end{aligned}`
$$

]

---

# Implementing Newton-Raphson:

`$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$`

--

`$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$`

--

For `\(f(x) = -x^2\)`:

`$$x_{n+1} = x_n - \frac{-x_n^2}{-2x_n}$$`

* Stop at `\(x_n\)` if `\(\mid x_n - x_{n-1} \mid < 0.0001\)`
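Putting the update rule and the stopping rule together in code -- a minimal Python sketch (the function names are ours):

```python
def newton_raphson(f, f_prime, x0, tol=1e-4, max_iter=100):
    """Approximate a root of f by iterating x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / f_prime(x)
        if abs(x_new - x) < tol:  # stop once the guesses stop moving
            return x_new
        x = x_new
    raise RuntimeError("did not converge -- try another starting guess")

# Root of f(x) = x^3 - 6x - 2 (the example below), starting from x0 = 5
print(newton_raphson(lambda x: x**3 - 6 * x - 2,
                     lambda x: 3 * x**2 - 6,
                     x0=5))  # about 2.60
```

As the next slides show, the starting guess matters: different values of `\(x_0\)` can converge to different roots.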
---

# Implementing Newton-Raphson

<img src="04-critical-points_files/figure-html/first-func-first-guess-1.gif" style="display: block; margin: auto;" />

---

# Implementing Newton-Raphson

.pull-left[

<img src="04-critical-points_files/figure-html/base-plot-two-points-1.png" width="432" style="display: block; margin: auto;" />

]

.pull-right[

$$
`\begin{aligned}
y &= x^3 - 6x - 2 \\
\frac{\partial y}{\partial x} &= 3x^2 - 6
\end{aligned}`
$$

]

---

# Initial guess `\(x_0 = 5\)`

<img src="04-critical-points_files/figure-html/second-func-first-guess-1.gif" style="display: block; margin: auto;" />

---

# Initial guess `\(x_0 = -6\)`

<img src="04-critical-points_files/figure-html/second-func-second-guess-1.gif" style="display: block; margin: auto;" />

---

# Newton-Raphson: AND BEYOND!

* Just as we can use it to find roots, we can also use it to find a max/min.
* It's similar, but different: apply the same update to `\(f'\)` instead:

`$$x_{n+1} = x_n - \frac{f^{'}(x_n)}{f^{''}(x_n)}$$`

---

# Grid search

*Often used in fine-tuning hyperparameters (e.g. the structure of the model)*

* Exhaustive search algorithm
* Define a specified set of input values `\(x_i\)`
* Calculate `\(f(x_i) \, \forall i\)`
* Compare all resulting values

---

# Grid search

`$$y = -x^2$$`

* Evaluate the function for all `\(x \in \{ -2, -1.99, -1.98, \ldots, 1.98, 1.99, 2 \}\)`

--

<img src="04-critical-points_files/figure-html/grid-search-1.png" width="864" style="display: block; margin: auto;" />

---

# Gradient descent

<img src="04-critical-points_files/figure-html/grad-ex-1.png" width="864" style="display: block; margin: auto;" />

---

# Gradient descent

`$$x_1 = x_0 - \alpha f'(x_0)$$`

* Important components:
    * Gradient `\(f'(x_0)\)`
    * Learning rate `\(\alpha\)`
    * Iterative algorithm
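Here is a minimal Python sketch of the loop (our own illustration, minimizing `\(f(x) = x^2\)` so that `\(f'(x) = 2x\)`):

```python
def gradient_descent(f_prime, x0, alpha, tol=1e-4, max_iter=1000):
    """Minimize f by iterating x_{n+1} = x_n - alpha * f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - alpha * f_prime(x)
        if abs(x_new - x) < tol:  # stop once the steps become tiny
            return x_new
        x = x_new
    return x

def f_prime(x):
    return 2 * x  # gradient of f(x) = x^2, which is minimized at x = 0

print(gradient_descent(f_prime, x0=2.0, alpha=0.6))  # overshoots, then settles near 0
print(gradient_descent(f_prime, x0=2.0, alpha=0.1))  # smaller, steadier steps toward 0
```

The learning rate drives the behavior on the next two slides: a larger `\(\alpha\)` can overshoot and bounce back and forth across the minimum, while a smaller `\(\alpha\)` takes smaller, slower steps toward it.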
---

# `\(\alpha = 0.6\)`

<img src="04-critical-points_files/figure-html/grad-descent-learn-rate-6-1.gif" style="display: block; margin: auto;" />

---

# `\(\alpha = 0.1\)`

<img src="04-critical-points_files/figure-html/grad-descent-learn-rate-2-1.gif" style="display: block; margin: auto;" />

---

class: inverse, center, middle

# STARTING THE PARTY: Matrices

---

## Example matrices

We have two matrices: X and Y.

$$
`\begin{aligned}
\mathbf{X} &= \left[ \begin{array}{rrr} 1 & 2 & 3 \\ 2 & 1 & 4 \\ \end{array} \right] \\
\mathbf{Y} &= \left[ \begin{array}{rr} 1 & 2 \\ 3 & 2 \\ 1 & 4 \\ \end{array} \right]
\end{aligned}`
$$

X has two rows and three columns, so it is `\(2 \times 3\)`. Y has three rows and two columns, so it is `\(3 \times 2\)`.

--

Matrices are incredibly helpful, despite also being a bit challenging to get the hang of. We'll talk more about them tomorrow, but you will want to note the dimensions of a matrix. You can do some operations with them (addition/subtraction), and those work the way you are probably guessing: the matrices need to be the same size. **MULTIPLICATION DOES NOT WORK THAT WAY! WARNING!!!**

---

# INTROS

* Name, pronouns, subfield/research area, where you are currently, something fun/interesting about you and/or your hobbies

---

class: inverse, middle, center

# RECAP

---

# SO FAR:

* Critical points
* Inflection points
* Root finding
* Grid search
* Gradient descent
* Matrices