class: center, middle, inverse, title-slide .title[ # Functions of several variables and optimization with several variables ] .author[ ###
MACS 33000
University of Chicago ] --- # Misc * HELLO and welcome back! * Canvas page -- join, class info, preceptor groups * When submitting class requests -- expect ~1 week and **PLEASE include a syllabus**! * **NO HOMEWORK DUE TONIGHT!** --- # Learning objectives * Revisit derivatives * Define a partial derivative * Identify higher order derivatives and partial derivatives * Define notation for calculus performed on vector and matrix forms * Demonstrate multivariable calculus methods on social scientific research * Calculate critical points, partial derivatives, saddle points --- # Derivatives: Rates of change * Way to understand how a function is changing * (So far) Change relative to one variable * Useful for finding maxima and minima (where the first derivative equals zero) * Can determine where a function shifts from concave down to concave up (convex) (where the second derivative equals zero) --- # Higher order derivatives: the change in the rate of change * Derivatives of derivatives -- * First derivative `$$f'(x), ~~ y', ~~ \frac{d}{dx}f(x), ~~ \frac{dy}{dx}$$` * Second derivative `$$f''(x), ~~ y'', ~~ \frac{d^2}{dx^2}f(x), ~~ \frac{d^2y}{dx^2}$$` * `\(n\)`th derivative `$$\frac{d^n}{dx^n}f(x), \quad \frac{d^ny}{dx^n}$$` --- # Multivariate function Multivariate functions have ... multiple ... variables ... Their graphs no longer fit in a single plane: with two inputs, the graph of the function is a surface rather than a curve. It is easy to imagine how much more challenging optimization becomes. Instead of a line, we are effectively examining a surface, trying to understand the underlying function and how the output does (or does not) change in relation to each of our variables. --- # Multivariate function: Example Suppose we have the following function: `$$f(x_{1}, x_{2}) = x_{1} + x_{2}$$` --
--- # Multivariate function `$$f(x_{1}, x_{2}) = x_{1}^2 + x_{2}^2$$`
--- # Multivariate function `$$f(x_{1}, x_{2}) = \sin(x_1)\cos(x_2)$$`
--- # Multivariate function `$$f(x_{1}, x_{2}) = -(x_{1}-5)^2 - (x_{2}-2)^2$$`
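These example surfaces are easiest to appreciate visually. The original slides presumably rendered plots at this point; as a rough stand-in, here is a minimal base-R sketch (not part of the original deck) that evaluates the last function above on a grid and draws the surface:

```r
# Evaluate f(x1, x2) = -(x1 - 5)^2 - (x2 - 2)^2 on a grid of input values
f <- function(x1, x2) -(x1 - 5)^2 - (x2 - 2)^2

x1 <- seq(0, 10, length.out = 50)
x2 <- seq(-3, 7, length.out = 50)
z  <- outer(x1, x2, f)        # matrix of f values over the grid

persp(x1, x2, z, theta = 30, phi = 25,
      xlab = "x1", ylab = "x2", zlab = "f(x1, x2)")   # 3-D surface
contour(x1, x2, z, xlab = "x1", ylab = "x2")          # the same surface as level curves
```

Swapping in any of the other example functions (e.g. `sin(x1) * cos(x2)`) works the same way.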
--- # Multivariate function evaluation Suppose we have the following function: `$$f(x_{1}, x_{2}, x_{3} ) = x_1 + x_2 + x_3$$` To evaluate it, we add up our three variables at whichever values we're exploring. We could imagine having `\(n\)`-many `\(x\)` variables. (To be clear, each `\(x_i\)` is a different variable, akin to `\(x, y, z\)`, etc.) -- $$ `\begin{aligned} f(\mathbf{x} )&= f(x_{1}, x_{2}, \ldots, x_{N} ) \\ &= x_{1} +x_{2} + \ldots + x_{N} \\ &= \sum_{i=1}^{N} x_{i} \end{aligned}` $$ --- # Multivariate function: definition * `\(f:\Re^{n} \rightarrow \Re^{1}\)` * `\(f\)` is a multivariate function `$$f(\mathbf{x}) = f(x_{1}, x_{2}, x_{3}, \ldots, x_{n} )$$` * `\(\Re^{n} = \Re \underbrace{\times}_{\text{cartesian}} \Re \times \Re \times \ldots \Re\)` * `\(n\)` inputs * `\(1\)` output --- # Evaluating multivariate functions `$$f(x_{1}, x_{2}, x_{3}) = x_1 + x_2 + x_3$$` -- ### Evaluate at `\(\mathbf{x} = (x_{1}, x_{2}, x_{3}) = (2, 3, 2)\)` `$$\begin{aligned}f(2, 3, 2) & = 2 + 3 + 2 \\ & = 7 \end{aligned}$$` --- # Evaluating multivariate functions `$$f(x_{1}, x_{2} ) = x_{1} + x_{2} + x_{1} x_{2}$$` -- ### Evaluate at `\(\mathbf{w} = (w_{1}, w_{2} ) = (1, 2)\)` -- `$$\begin{aligned}f(w_{1}, w_{2}) & = w_{1} + w_{2} + w_{1} w_{2} \\ & = 1 + 2 + 1 \times 2 \\ & = 5 \end{aligned}$$` --- # Preferences for multidimensional policy * Spatial policy model * Policy and political actors * Policy is `\(N\)`-dimensional - or `\(\mathbf{x} \in \Re^{N}\)` * Legislator `\(i\)`'s utility is `\(U:\Re^{N} \rightarrow \Re^{1}\)` `$$\begin{aligned}U(\mathbf{x}) & = U(x_{1}, x_{2}, \ldots, x_{N} ) \\ & = - (x_{1} - \mu_{1} )^2 - (x_{2} - \mu_{2})^2 - \ldots - (x_{N} -\mu_{N})^{2} \\ & = -\sum_{j=1}^{N} (x_{j} - \mu_{j} )^{2}\end{aligned}$$` -- * Suppose: `\(\mathbf{\mu} = (\mu_{1}, \mu_{2}, \ldots, \mu_{N} ) = (0, 0, \ldots, 0)\)` --- # Preferences for multidimensional policy We can take these preferences and put them into our utility function. * Evaluate the legislator's utility for a policy proposal of `\(\mathbf{m} = (1, 1, \ldots, 1)\)` `$$\begin{aligned}U(\mathbf{m} ) & = U(1, 1, \ldots, 1) \\ & = - (1 - 0)^2 - (1- 0) ^2 - \ldots - (1- 0) ^2 \\ & = -\sum_{j=1}^{N} 1 = - N \\\end{aligned}$$` --- # Multivariate function: Policy `$$f(x_{1}, x_{2}) = -(x_{1}-0)^2 - (x_{2}-0)^2$$`
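To make this concrete, here is a small base-R sketch (an added illustration, not from the original slides) of the quadratic spatial utility `\(U(\mathbf{x}) = -\sum_{j=1}^{N} (x_{j} - \mu_{j})^2\)` with the ideal point at the origin, confirming that the proposal `\(\mathbf{m} = (1, 1, \ldots, 1)\)` yields utility `\(-N\)`:

```r
# Quadratic spatial utility: U(x) = -sum_j (x_j - mu_j)^2
U <- function(x, mu) -sum((x - mu)^2)

N  <- 5                # number of policy dimensions (any N gives the same pattern)
mu <- rep(0, N)        # legislator's ideal point at the origin
m  <- rep(1, N)        # proposal (1, 1, ..., 1)

U(m, mu)               # returns -5, i.e. -N
```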
--- # Regression models * Randomized experiments * Voter mobilization - `\(\text{Vote}_{i}\)` indicates whether individual `\(i\)` turns out to vote * `\(T = 1\)` (treated): voter receives mobilization * `\(T = 0\)` (control): voter does not receive mobilization -- ## Regression model * `\(x_{2} =\)` participant's age `$$\begin{aligned}f(T,x_2) & = \Pr(\text{Vote}_{i} = 1 | T, x_{2} ) \\ & = \beta_{0} + \beta_{1} T + \beta_{2} x_{2} \end{aligned}$$` * We can write the effect of the experiment as: `$$\begin{aligned}f(T = 1, x_2) - f(T=0, x_2) & = \beta_{0} + \beta_{1} 1 + \beta_{2} x_{2} - (\beta_{0} + \beta_{1} 0 + \beta_{2} x_{2}) \\& = \beta_{0} - \beta_{0} + \beta_{1}(1 - 0) + \beta_{2}(x_{2} - x_{2} ) \\ & = \beta_{1} \end{aligned}$$` --- # NOW WHAT? We now have a function that describes how these pieces come together. However, we may also want or need to figure out how our *outcome* variable changes relative to each of our input variables. -- **INTRODUCING...** --- class: middle, center ## Multivariate derivatives: AKA partial derivatives --- # Multivariate function: Optimizing To find our max / min, what would you do (think about it realistically and then mathematically)? Idea: slowly move across one axis at a time, looking for a max or min. Metaphorically, this *is* what we'll be doing when we think about partial derivatives. --- # Multivariate derivatives If you have more than one variable changing simultaneously, you will need to evaluate **partial derivatives**. -- * Let `\(f\)` be a function of the variables `\((x_1,\ldots,x_n)\)`. The partial derivative of `\(f\)` with respect to `\(x_i\)` is `$$\frac{\partial f}{\partial x_i} (x_1,\ldots,x_n) = \lim\limits_{h\to 0} \frac{f(x_1,\ldots,x_i+h,\ldots,x_n)-f(x_1,\ldots,x_i,\ldots,x_n)}{h}$$` -- * NOTICE: only the `\(i\)`th variable changes. Same basic concept as derivatives! --- # Multivariate derivatives: example `$$f(x, y) = 2x+3y$$` -- `$$\frac{\partial f}{\partial x} (x,y) = \lim\limits_{h\to 0} \frac{f(x+h,y)-f(x,y)}{h}$$` -- `$$\lim\limits_{h\to 0}\frac{(2(x+h) +3y) - (2x+3y)}{h} = \lim\limits_{h\to 0}\frac{(2x + 2h + 3y)-(2x+3y)}{h}$$` -- We can simplify: `$$\lim\limits_{h\to 0}\frac{2h}{h}=2$$` --- # Multivariate derivatives: strategy / procedure To take a derivative when you have a multivariate function, you will focus on ONE variable of interest at a time. You will then take the derivative treating all other elements / variables as CONSTANTS. Finally, you will repeat this for EACH variable present. -- `$$f(x,y)=x^2+y^2$$` -- Try it out: how many partial derivatives are there? What are they? -- `$$\frac{\partial f}{\partial x}(x,y) = 2x$$` `$$\frac{\partial f}{\partial y}(x,y) = 2y$$` --- ## Second derivatives! Like usual, you can take a second derivative. But wait, there's more! You can also take the derivative with respect to one variable and then the other, e.g. with respect to `\(x\)` and then `\(y\)`. $$\frac{\partial f}{\partial x}(x,y) = 2x, \text{ } \frac{\partial f}{\partial y}(x,y) = 2y $$ -- `$$\frac{\partial^2 f}{\partial x^2}(x,y) = 2, \text{ } \frac{\partial^2 f}{\partial y^2}(x,y) = 2$$` -- `$$\frac{\partial^2 f}{\partial x \partial y}(x,y) = 0, \text{ } \frac{\partial^2 f}{\partial y \partial x}(x,y) = 0$$` --- # Multivariate derivatives: Simple-ish example $$f(x,y) = 2xy^2+3x+4y-yx^3$$ Find all first and second partial derivatives.
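Before the answers on the next build: one way to check partial derivatives like these is base R's symbolic `D()` function. A quick sketch (an aside added here, not part of the original slides):

```r
# f(x, y) = 2xy^2 + 3x + 4y - yx^3 as an R expression
f_expr <- expression(2 * x * y^2 + 3 * x + 4 * y - y * x^3)

D(f_expr, "x")            # partial with respect to x: 2y^2 + 3 - 3yx^2
D(f_expr, "y")            # partial with respect to y: 4xy + 4 - x^3

D(D(f_expr, "x"), "x")    # second partial in x: -6xy
D(D(f_expr, "x"), "y")    # mixed partial: 4y - 3x^2 (same in either order)
```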
-- `$$\frac{\partial f}{\partial x}(x,y) = 2y^2+ 3-3yx^2, \text{ } \frac{\partial f}{\partial y}(x,y) = 4xy+ 4-x^3$$` -- `$$\frac{\partial^2 f}{\partial x^2}(x,y) = -6xy,\text{ } \frac{\partial^2 f}{\partial y^2}(x,y) = 4x$$` -- `$$\frac{\partial^2 f}{\partial x \partial y}(x,y) = 4y-3x^2, \text{ } \frac{\partial^2 f}{\partial y \partial x}(x,y) = 4y-3x^2$$` --- # Interpretations: Rates of change Think about how the function is changing with respect to `\(x\)`, with respect to `\(y\)`, with respect to `\(x\)` and then `\(y\)`, and so on. --- # Multivariate derivatives `$$f(x,y)=x^3 y^4 +e^x -\ln y$$` -- `$$\frac{\partial f}{\partial x}(x,y) = 3x^2y^4 + e^x$$` `$$\frac{\partial f}{\partial y}(x,y) =4x^3y^3 - \frac{1}{y}$$` -- `$$\frac{\partial^2 f}{\partial x^2}(x,y) = 6xy^4 + e^x$$` `$$\frac{\partial^2 f}{\partial x \partial y}(x,y) = 12x^2y^3$$` --- # Rate of change, linear regression * Regress `\(\text{Approval}_{i}\)` rate for Trump in month `\(i\)` on `\(\text{Employ}_{i}\)` and `\(\text{Gas}_{i}\)` `$$\text{Approval}_{i} = 0.8 -0.5 \text{Employ}_{i} -0.25 \text{Gas}_{i}$$` * `\(\text{Approval}_{i} = f(\text{Employ}_{i}, \text{Gas}_{i} )\)` * Partial derivative with respect to employment `$$\frac{\partial f(\text{Employ}_{i}, \text{Gas}_{i} ) }{\partial \text{Employ}_{i} } = -0.5$$` --- class: middle # Optimization approaches --- # Optimizing multivariate functions * Parameters `\(\mathbf{\beta} = (\beta_{1}, \beta_{2}, \ldots, \beta_{n} )\)` such that `\(f(\mathbf{\beta}| \mathbf{X}, \mathbf{Y})\)` is maximized * Policy `\(\mathbf{x} \in \Re^{n}\)` that maximizes `\(U(\mathbf{x})\)` * Weights `\(\mathbf{\pi} = (\pi_{1}, \pi_{2}, \ldots, \pi_{K})\)` such that a weighted average of forecasts `\(\mathbf{f} = (f_{1} , f_{2}, \ldots, f_{K})\)` has minimum loss `$$\min_{\mathbf{\pi}} \left( \sum_{j=1}^{K} \pi_{j} f_{j} - y \right)^{2}$$` -- ## Methods * Analytic approach * Computational approach * Multivariate Newton-Raphson * Grid search * Gradient descent --- # Differences from single variable optimization * Multiple parameters of interest * Use linear algebra to track all the components and optimize over the multidimensional space --- # Neighborhood * Let `\(\mathbf{x} \in \Re^{n}\)` and let `\(\delta >0\)` -- * Define a **neighborhood** of `\(\mathbf{x}\)`, `\(B(\mathbf{x}, \delta)\)`, as the set of points such that, `$$B(\mathbf{x}, \delta) = \{ \mathbf{y} \in \Re^{n} : ||\mathbf{x} - \mathbf{y}||< \delta \}$$` -- * `\(B(\mathbf{x}, \delta)\)` is the set of vectors `\(\mathbf{y}\)` in `\(n\)`-dimensional space whose distance from `\(\mathbf{x}\)` (the norm of `\(\mathbf{x} - \mathbf{y}\)`) is less than `\(\delta\)` --- # Maxima/minima * `\(f:X \rightarrow \Re\)` with `\(X \subset \Re^{n}\)` * **Global maximum** - a vector `\(\mathbf{x}^{*} \in X\)` if, for all other `\(\mathbf{x} \in X\)`, `\(f(\mathbf{x}^{*}) > f(\mathbf{x} )\)` * **Local maximum** - a vector `\(\mathbf{x}^{\text{local}}\)` if there is a neighborhood around `\(\mathbf{x}^{\text{local}}\)`, `\(Q \subset X\)` such that, for all `\(x \in Q\)`, `\(f(\mathbf{x}^{\text{local} }) > f(\mathbf{x} )\)` * Reverse for minima: * **Global min**: a vector `\(\mathbf{x}^{*} \in X\)` if, for all other `\(\mathbf{x} \in X\)`, `\(f(\mathbf{x}^{*}) < f(\mathbf{x} )\)` * **Local min**: a vector `\(\mathbf{x}^{\text{local}}\)` if there is a neighborhood around `\(\mathbf{x}^{\text{local}}\)`, `\(Q \subset X\)` such that, for all `\(x \in Q\)`, `\(f(\mathbf{x}^{\text{local} }) < f(\mathbf{x} )\)` * The values of `\(f:X \rightarrow \Re\)` fall on the real number line * The inputs fall somewhere along
the domain `\(\mathbf{X}\)` (a subset of `\(n\)`-dimensional space) --- # VOCAB ROUNDUP We want to think about how these pieces come together and what they're telling us about the overall picture. We had been working with derivatives already, but now that our functions are more complicated, we're going to have to explore and track things a little differently. -- * **Gradient** `\(\nabla f\)`: a vector containing the first partial derivative of a (scalar-valued) function with respect to each of its variables. * *Jacobian*: a matrix of first derivatives of a vector-valued function (sometimes you'll see reference to this). * **Hessian**: a matrix of all the second derivatives of a function, *including the cross-partials* (the Hessian is the Jacobian of the gradient!). --- # First derivative test: Gradient * `\(f:X \rightarrow \Re^{1}\)` with `\(X \subset \Re^{n}\)` is a differentiable function * Gradient vector of `\(f\)` at `\(\mathbf{x}_{0}\)` `$$\nabla f (\mathbf{x}_{0}) = \left(\frac{\partial f (\mathbf{x}_{0}) }{\partial x_{1} }, \frac{\partial f (\mathbf{x}_{0}) }{\partial x_{2} }, \frac{\partial f (\mathbf{x}_{0}) }{\partial x_{3} }, \ldots, \frac{\partial f (\mathbf{x}_{0}) }{\partial x_{n} } \right)$$` * First partial derivatives for each variable `\(x_i\)` stored in a vector * The gradient points in the direction in which the function is **increasing** fastest (the negative gradient points in the direction of steepest decrease) -- * If `\(\mathbf{a} \in X\)` is a **local** extremum, then, `$$\begin{aligned}\nabla f(\mathbf{a}) &= \mathbf{0} \\ &= (0, 0, \ldots, 0) \end{aligned}$$` --- # Example $$ `\begin{aligned} f(x,y) &= x^2+y^2 \\ \nabla f(x,y) &= (2x, \, 2y) \end{aligned}` $$ -- $$ `\begin{aligned} f(x,y) &= x^3 y^4 +e^x -\log y \\ \nabla f(x,y) &= (3x^2 y^4 + e^x, \, 4x^3y^3 - \frac{1}{y}) \end{aligned}` $$ --- # Jacobian The Jacobian matrix is a matrix of derivatives of a vector-valued function, while the gradient is the vector of derivatives of a scalar-valued function. If we start with a function, `\(f\)`, we can evaluate its gradient. -- ### Gradient * `\(f:X \rightarrow \Re^{1}\)` with `\(X \subset \Re^{n}\)` is a differentiable function * Gradient vector of `\(f\)` at `\(\mathbf{x}_{0}\)` `$$\nabla f (\mathbf{x}_{0}) = \left(\frac{\partial f (\mathbf{x}_{0}) }{\partial x_{1} }, \frac{\partial f (\mathbf{x}_{0}) }{\partial x_{2} }, \frac{\partial f (\mathbf{x}_{0}) }{\partial x_{3} }, \ldots, \frac{\partial f (\mathbf{x}_{0}) }{\partial x_{n} } \right)$$` In contrast, the Jacobian stacks the gradients of a vector-valued function `\(F\)`, one row per component `\(F_{i}\)`. `$$\mathbf{J}_{ij} = \begin{bmatrix}\frac{\partial F_{i} (\mathbf{x}) }{\partial x_{j} } \end{bmatrix}$$` -- All this to say: if we want a matrix of derivatives of our gradient, the **Jacobian of the gradient vector** is the **Hessian**. --- # Hessian The Hessian is the matrix of all second derivatives of our function (as we just stated, it is the Jacobian of the gradient). `$$\mathbf{H}_{ij} = \begin{bmatrix}\frac{\partial^{2} f }{\partial x_{i}\partial x_{j} } \end{bmatrix}$$` --- # Example with functions $$f(x,y) = 2xy^2+3x+4y-yx^3$$ Gradient: `$$\nabla f (x,y) = \left(2y^2+3-3yx^2, 4xy+4-x^3 \right)$$` Jacobian: `$$\begin{bmatrix} 2y^2+3-3yx^2 & 4xy+4-x^3 \end{bmatrix}$$` Hessian: `$$\begin{bmatrix} -6yx & 4y-3x^2 \\ 4y-3x^2 & 4x \end{bmatrix}$$` --- # Critical values Like before, we are also interested in where and how our function changes. We want to get a sense of the maxima, minima, and critical values.
Now, things are a little more complicated: instead of just concave up / concave down, the curvature can differ by direction, and a critical point can be a minimum along one direction but a maximum along another. That new possibility is called a *saddle point*. -- 1. Maximum 1. Minimum 1. Saddle point * Second derivative test --- # Second derivative test: Hessian * `\(f:X \rightarrow \Re^{1}\)`, `\(X \subset \Re^{n}\)`, with `\(f\)` a twice differentiable function * Hessian matrix of second derivatives at `\(\mathbf{x}^{*} \in X\)` `$$\mathbf{H}(f)(\mathbf{x}^{*} ) = \begin{bmatrix} \frac{\partial^{2} f }{\partial x_{1} \partial x_{1} } (\mathbf{x}^{*} ) & \frac{\partial^{2} f }{\partial x_{1} \partial x_{2} } (\mathbf{x}^{*} ) & \ldots & \frac{\partial^{2} f }{\partial x_{1} \partial x_{n} } (\mathbf{x}^{*} ) \\ \frac{\partial^{2} f }{\partial x_{2} \partial x_{1} } (\mathbf{x}^{*} ) & \frac{\partial^{2} f }{\partial x_{2} \partial x_{2} } (\mathbf{x}^{*} ) & \ldots & \frac{\partial^{2} f }{\partial x_{2} \partial x_{n} } (\mathbf{x}^{*} ) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2} f }{\partial x_{n} \partial x_{1} } (\mathbf{x}^{*} ) & \frac{\partial^{2} f }{\partial x_{n} \partial x_{2} } (\mathbf{x}^{*} ) & \ldots & \frac{\partial^{2} f }{\partial x_{n} \partial x_{n} } (\mathbf{x}^{*} ) \\\end{bmatrix}$$` -- * Hessians are symmetric * Describe the **curvature** of the function * Obtained by differentiating the entire gradient with respect to each variable `\(x_i\)` --- # Example Hessians Start with the below function and identify the gradient and Hessian. `$$f(x,y) = x^2+y^2$$` -- `$$\nabla f(x,y) = (2x, \, 2y)$$` -- `$$\mathbf{H}(f)(x,y) = \begin{bmatrix}2 & 0 \\0 & 2\end{bmatrix}$$` --- # Hessian: practice and example Try this equation, finding the gradient and Hessian: `$$f(x,y)= x^3 y^4 +e^x -\log y$$` -- `$$\begin{aligned}f(x,y) &= x^3 y^4 +e^x -\log y \\\nabla f(x,y) &= (3x^2 y^4 + e^x, \, 4x^3y^3 - \frac{1}{y}) \\\mathbf{H}(f)(x,y) &= \begin{bmatrix}6xy^4 + e^x & 12x^2y^3 \\12x^2y^3 & 12x^3y^2 + \frac{1}{y^2}\end{bmatrix}\end{aligned}$$` --- # Definiteness of a matrix Definiteness is important: it tells us about the matrix and indicates how the underlying function is curving. A positive definite matrix has all positive eigenvalues (but those are a pain to calculate!). We can also approach it by looking at our matrix through multiplication - see the quick numeric illustration below.
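As a preview of the quadratic-form definition that follows, here is the promised numeric illustration in R (a sketch added here, not from the original deck), using the Hessian of `\(f(x,y) = x^2 + y^2\)`:

```r
# Hessian of f(x, y) = x^2 + y^2 (the same at every point)
A <- matrix(c(2, 0,
              0, 2), nrow = 2, byrow = TRUE)

eigen(A)$values                     # both eigenvalues are 2 > 0: positive definite

# Quadratic-form check: x' A x should be positive for every nonzero x
set.seed(1)
xs <- replicate(5, rnorm(2))        # a few arbitrary nonzero vectors
apply(xs, 2, function(x) drop(t(x) %*% A %*% x))   # all strictly positive
```

Both checks agree that this Hessian is positive definite, which is why `\((0, 0)\)` is a minimum of `\(x^2 + y^2\)`.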
-- * Consider `\(n \times n\)` matrix `\(\mathbf{A}\)` * If, for all `\(\mathbf{x} \in \Re^{n}\)` where `\(\mathbf{x} \neq \mathbf{0}\)` `$$\begin{aligned}\mathbf{x}^{'} \mathbf{A} \mathbf{x} &> 0, \quad \mathbf{A} \text{ is positive definite} \\ \mathbf{x}^{'} \mathbf{A} \mathbf{x} &< 0, \quad \mathbf{A} \text{ is negative definite } \end{aligned}$$` * If `\(\mathbf{x}^{'} \mathbf{A} \mathbf{x} >0\)` for some `\(\mathbf{x}\)` and `\(\mathbf{x}^{'} \mathbf{A} \mathbf{x}<0\)` for other `\(\mathbf{x}\)`, then we say `\(\mathbf{A}\)` is **indefinite** --- ## Second derivative test * If `\(\mathbf{H}(f)(\mathbf{a})\)` is positive definite then `\(\mathbf{a}\)` is a local minimum * If `\(\mathbf{H}(f)(\mathbf{a})\)` is negative definite then `\(\mathbf{a}\)` is a local maximum * If `\(\mathbf{H}(f)(\mathbf{a})\)` is indefinite then `\(\mathbf{a}\)` is a saddle point --- # Use the determinant * Determinant of the Hessian of `\(f\)` at the critical value `\(\mathbf{a}\)` `$$\mathbf{H}(f)(\mathbf{a}) = \begin{bmatrix} A & B \\ B & C \\ \end{bmatrix}$$` * Formula for the determinant for a `\(2 \times 2\)` matrix `$$AC - B^2$$` -- * `\(AC - B^2> 0\)` and `\(A>0\)` `\(\leadsto\)` positive definite `\(\leadsto\)` `\(\mathbf{a}\)` is a local minimum * `\(AC - B^2> 0\)` and `\(A<0\)` `\(\leadsto\)` negative definite `\(\leadsto\)` `\(\mathbf{a}\)` is a local maximum * `\(AC - B^2<0\)` `\(\leadsto\)` indefinite `\(\leadsto\)` saddle point * `\(AC- B^2 = 0\)` inconclusive --- # Basic procedure summarized 1. Calculate gradient 1. Set equal to zero, solve system of equations 1. Calculate Hessian 1. Assess Hessian at critical values 1. Boundary values? (if relevant) --- # A simple optimization example * `\(f:\Re^{2} \rightarrow \Re\)` with `$$f(x_{1}, x_{2}) = 3(x_1 + 2)^2 + 4(x_{2} + 4)^2$$` -- $$ `\begin{aligned} \nabla f(\mathbf{x}) &= (6 x_{1} + 12 , 8x_{2} + 32 ) \\ \mathbf{0} &= (6 x_{1}^{*} + 12 , 8x_{2}^{*} + 32 ) \end{aligned}` $$ -- `$$x_{1}^{*} = - 2, \quad x_{2}^{*} = -4$$` -- `$$\textbf{H}(f)(\mathbf{x}^{*}) = \begin{bmatrix} 6 & 0 \\ 0 & 8 \\ \end{bmatrix}$$` -- * `\(\det(\textbf{H}(f)(\mathbf{x}^{*}))\)` = 48 and `\(6>0\)` so `\(\textbf{H}(f)(\mathbf{x}^{*})\)` is positive definite * `\(\mathbf{x^{*}}\)` is a **local minimum** --- class: middle # Applications --- # Maximum likelihood estimation $$ `\begin{aligned} Y_{i} &\sim \text{Normal}(\mu, \sigma^2) \\ \mathbf{Y} &= (Y_{1}, Y_{2}, \ldots, Y_{n} ) \end{aligned}` $$ * Obtain likelihood (summary estimator) * Derive maximum likelihood estimators for `\(\mu\)` and `\(\sigma^2\)` -- `$$\begin{aligned}L(\mu, \sigma^2 | \mathbf{Y} ) &\propto \prod_{i=1}^{n} f(Y_{i}|\mu, \sigma^2) \\ &\propto \prod_{i=1}^{n} \frac{\exp[ - \frac{ (Y_{i} - \mu)^2 }{2\sigma^2} ]}{\sqrt{2 \pi \sigma^2}} \\&\propto \frac{\exp[ -\sum_{i=1}^{n} \frac{(Y_{i} - \mu)^2}{2\sigma^2} ]}{ (2\pi)^{n/2} \sigma^{2n/2} } \end{aligned}$$` --- # Maximum likelihood estimation `$$l(\mu, \sigma^2|\mathbf{Y} ) = -\sum_{i=1}^{n} \frac{(Y_{i} - \mu)^2}{2\sigma^2} - \frac{n}{2} \log(2 \pi) - \frac{n}{2} \log (\sigma^2)$$` -- $$ `\begin{aligned} l(\mu, \sigma^2|\mathbf{Y} ) &= -\sum_{i=1}^{n} \frac{(Y_{i} - \mu)^2}{2\sigma^2} - \frac{n}{2} \log (\sigma^2) \\ \frac{\partial l(\mu, \sigma^2|\mathbf{Y} )}{\partial \mu } &= \sum_{i=1}^{n} \frac{2(Y_{i} - \mu)}{2\sigma^2} \\ \frac{\partial l(\mu, \sigma^2|\mathbf{Y})}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_{i} - \mu)^2 \end{aligned}` $$ -- $$ `\begin{aligned} 0 &= -\sum_{i=1}^{n} \frac{2(Y_{i} - 
\widehat{\mu})}{2\widehat{\sigma}^2} \\ 0 &= -\frac{n}{2\widehat{\sigma}^2 } + \frac{1}{2\widehat{\sigma}^4} \sum_{i=1}^{n} (Y_{i} - \widehat{\mu})^2 \end{aligned}` $$ --- # Maximum likelihood estimation `$$\begin{aligned} \widehat{\mu} &= \frac{\sum_{i=1}^{n} Y_{i} }{n} \\ \widehat{\sigma}^{2} &= \frac{1}{n} \sum_{i=1}^{n} (Y_{i} - \overline{Y})^2 \end{aligned}$$` -- `$$\textbf{H}(f)(\widehat{\mu}, \widehat{\sigma}^2) = \begin{bmatrix} \frac{\partial^{2} l(\mu, \sigma^2|\mathbf{Y} )}{\partial \mu^{2}} & \frac{\partial^{2} l(\mu, \sigma^2|\mathbf{Y} )}{\partial \sigma^{2} \partial \mu} \\ \frac{\partial^{2} l(\mu, \sigma^2|\mathbf{Y} )}{\partial \sigma^{2} \partial \mu} & \frac{\partial^{2} l(\mu, \sigma^2|\mathbf{Y} )}{\partial (\sigma^{2})^{2}} \end{bmatrix}$$` -- `$$\textbf{H}(f)(\widehat{\mu}, \widehat{\sigma}^2) = \begin{bmatrix} \frac{-n}{\widehat{\sigma}^2} & 0 \\ 0 & \frac{-n}{2(\widehat{\sigma}^2)^2} \\ \end{bmatrix}$$` -- * `\(\text{det}(\textbf{H}(f)(\widehat{\mu}, \widehat{\sigma}^2)) = \dfrac{n^2}{2(\widehat{\sigma}^2)^3} > 0\)` and `\(A = \dfrac{-n}{\widehat{\sigma}^2} < 0\)` `\(\leadsto\)` maximum * Determinant is greater than 0 and `\(A\)` is less than zero - local maximum --- # Computational optimization procedures * Multivariate analogies * Different algorithms available with benefits/drawbacks * Newton-Raphson * Grid search * Gradient descent --- # Multivariate Newton-Raphson * `\(f:\Re^{n} \rightarrow \Re\)` * Initial guess `\(\mathbf{x}_{t}\)` `$$\mathbf{x}_{t+1} = \mathbf{x}_{t} - [\textbf{H}(f)(\mathbf{x}_{t})]^{-1} \nabla f(\mathbf{x}_{t})$$` * Approximate the gradient with a **tangent plane** * Find the value `\(\mathbf{x}_{t+1}\)` that makes that approximation equal to zero * Update again --- # Grid search * MLE for a normal distribution * Simulate 10,000 realizations from `\(Y_{i} \sim \text{Normal}(0.25, 100)\)` * Evaluate `\(l(\mu, \sigma^2| \mathbf{y} )\)` across a range of values --- # Grid search <img src="07-multivariable-differentiation_files/figure-html/loglik-1.png" width="864" style="display: block; margin: auto;" /> --- # Gradient descent
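The original slide presumably displayed a figure or animation here; as a stand-in, here is a minimal gradient descent sketch in R (an added illustration, not the original content). It repeatedly steps in the direction of the negative gradient of the earlier example `\(f(x_{1}, x_{2}) = 3(x_1 + 2)^2 + 4(x_{2} + 4)^2\)` and converges to the analytic minimum `\((-2, -4)\)`:

```r
# Objective and its gradient, from the simple optimization example above
f      <- function(x) 3 * (x[1] + 2)^2 + 4 * (x[2] + 4)^2
grad_f <- function(x) c(6 * x[1] + 12, 8 * x[2] + 32)

x    <- c(10, 10)    # arbitrary starting guess
step <- 0.05         # fixed step size (learning rate), small enough to converge here

for (i in 1:200) {
  x <- x - step * grad_f(x)   # move a small step in the direction of steepest descent
}

x      # approximately (-2, -4), matching the analytic critical point
f(x)   # approximately 0, the minimum value
```

A smaller step size converges more slowly but more safely; Newton-Raphson, sketched a few slides back, instead rescales the step by the inverse Hessian and would land on the minimum of a quadratic like this in a single update.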
--- # Recap: * Multivariate functions + Second derivatives: work the way we'd guess * Bringing matrices to the party * Gradient vs Hessian * Finding critical values: 1. Calculate gradient 1. Set equal to zero, solve system of equations 1. Calculate Hessian 1. Assess Hessian at critical values 1. Boundary values? (if relevant)