class: center, middle, inverse, title-slide .title[ # Sample space and probability ] .subtitle[ ## Funky Cats and Their Feisty Stats ] .author[ ###
MACS 33000
University of Chicago ] --- # Learning objectives * Review set notation and operations * Define probabilistic models * Describe conditional probability * Define total probability theorem * Implement Bayes' Rule * Define and evaluate independence of events * Identify the importance of counting possible events --- # Sets * Collection of objects $$ `\begin{aligned} A & = \{1, 2, 3\} \nonumber \\ B & = \{4, 5, 6\}\nonumber \\ C & = \{ \text{First year cohort} \} \\ D & = \{ \text{U of Chicago SIPs} \} \end{aligned}` $$ --- # Sets * If `\(A\)` is a set, we say that `\(x\)` is an element of `\(A\)` by writing `\(x \in A\)` * If `\(x\)` is not an element of `\(A\)`, then we write `\(x \notin A\)` -- * `\(1 \in \{ 1, 2, 3\}\)` * `\(4 \in \{4, 5, 6\}\)` * `\(\text{Hanrui} \notin \{ \text{First year cohort} \}\)` * `\(\text{Jean} \in \{ \text{U of Chicago SIPs} \}\)` -- Why do we care about sets? * Sets are necessary for probability theory * Defining a **set** is (usually) equivalent to choosing the population of interest --- # Subsets * If `\(A\)` and `\(B\)` are sets, then we say that `\(A = B\)` if, for all `\(x \in A\)`, `\(x \in B\)`, and for all `\(y \in B\)`, `\(y \in A\)` * Test to determine equality: * Take all elements of `\(A\)`, see if in `\(B\)` * Take all elements of `\(B\)`, see if in `\(A\)` -- ----------- * If `\(A\)` and `\(B\)` are sets, then we say that `\(A \subset B\)` if, for all `\(x \in A\)`, `\(x \in B\)` * What is the difference between the definitions? --- # Set builder notation Some famous sets: * `\(\mathbb{N} = \{1, 2, 3, \ldots \}\)` * `\(\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots \}\)` * `\(\Re = \mbox{Real numbers}\)` -- Use **set builder notation** to identify subsets: * `\([a, b] = \{x: x \in \Re \text{ and } a \leq x \leq b \}\)` * `\((a, b] = \{x: x \in \Re \text{ and } a < x \leq b \}\)` * `\([a, b) = \{x: x \in \Re \text{ and } a \leq x < b \}\)` * `\((a, b) = \{x: x \in \Re \text{ and } a < x < b \}\)` * `\(\emptyset\)` --- # Union * `\(A\)` and `\(B\)` are sets * New set that contains all elements in set `\(A\)` *or* in set `\(B\)` `$$\begin{aligned}C & = A \cup B \\ & = \{x: x \in A \text{ or } x \in B \}\end{aligned}$$` -- * `\(A = \{1, 2, 3\}, B = \{3, 4, 5\}\)`, then `\(C = A \cup B = \{ 1, 2, 3, 4, 5\}\)` * `\(D = \{\text{First Year Cohort} \}, E = \{\text{Me} \}\)`, then `\(F = D \cup E = \{ \text{First Year Cohort, Me} \}\)` --- # Intersection * New set that contains all elements in set `\(A\)` *and* in set `\(B\)` `$$\begin{aligned}C & = A \cap B \\ & = \{x: x \in A \text{ and } x \in B \}\end{aligned}$$` -- * `\(A =\{1, 2, 3\}, B = \{3, 4, 5\}\)`, then `\(C = A \cap B = \{3\}\)` * `\(D = \{\text{First Year Cohort} \}, E = \{\text{Me} \}\)`, then `\(F = D \cap E = \emptyset\)` --- # Some facts about sets 1. `\(A \cap B = B \cap A\)` 1. `\(A \cup B = B \cup A\)` 1. `\((A \cap B) \cap C = A \cap (B \cap C)\)` 1. `\((A \cup B) \cup C = A \cup (B \cup C)\)` 1. `\(A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\)` 1. `\(A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\)` --- # Model of probability 1. Sample space 1. Events 1. Probability --- # Sample space * Set of all things that can occur * Distinct outcomes in set `\(\Omega\)` -- 1. House of Representatives - elections every 2 years * One incumbent: `\(\Omega = \{W, N\}\)` * Two incumbents: `\(\Omega = \{(W,W), (W,N), (N,W), (N,N)\}\)` * 435 incumbents: `\(|\Omega| = 2^{435}\)` possible outcomes (one `\(W\)`/`\(N\)` result per seat) 1. Number of countries signing treaties * `\(\Omega = \{0, 1, 2, \ldots, 194\}\)` 1.
Duration of cabinets * All non-negative real numbers: `\([0, \infty)\)` * `\(\Omega = \{x : 0 \leq x < \infty\}\)` * All possible `\(x\)` such that `\(x\)` is between 0 and infinity -- The sample space must define all possible realizations --- # Events * Subset of the sample space `$$E \subset \Omega$$` * Congressional election example * One incumbent * `\(E = \{W\}\)` * `\(F = \{N\}\)` * Two incumbents * `\(E = \{(W, N), (W, W) \}\)` * `\(F = \{(N, N)\}\)` * 435 incumbents * Outcome of 2016 election - one event * All outcomes where Dems retake control of the House - one event * `\(x\)` is an **element** of a set `\(E\)` `$$x \in E$$` `$$(W, N) \in E$$` --- # Event operations * Perform operations on sets to create new sets * `\(E = \{ (W,W), (W,N) \}\)` * `\(F = \{ (N, N), (W,N) \}\)` * `\(\Omega = \{(W,W), (W,N), (N,W), (N,N) \}\)` -- * Operations determine what lies in the new set `\(E^{\text{new}}\)` --- # Event operations * `\(E = \{ (W,W), (W,N) \}\)` * `\(F = \{ (N, N), (W,N) \}\)` * `\(\Omega = \{(W,W), (W,N), (N,W), (N,N) \}\)` ## Union: `\(\cup\)` * All objects that appear in either set (OR) * `\(E^{\text{new}} = E \cup F = \{(W,W), (W,N), (N,N) \}\)` --- # Event operations * `\(E = \{ (W,W), (W,N) \}\)` * `\(F = \{ (N, N), (W,N) \}\)` * `\(\Omega = \{(W,W), (W,N), (N,W), (N,N) \}\)` ## Intersection: `\(\cap\)` * All objects that appear in both sets (AND) * `\(E^{\text{new}} = E \cap F = \{(W,N)\}\)` --- # Event operations * `\(E = \{ (W,W), (W,N) \}\)` * `\(F = \{ (N, N), (W,N) \}\)` * `\(\Omega = \{(W,W), (W,N), (N,W), (N,N) \}\)` ## Complement of set `\(E\)`: `\(E^{c}\)` * All objects in `\(\Omega\)` that are not in `\(E\)` * `\(E^{c} = \{(N, W) , (N, N) \}\)` * `\(F^{c} = \{(N, W) , (W, W) \}\)` * What is `\(\Omega^{c}\)`? - the **empty set** `\(\emptyset\)` * Suppose `\(E = \{W\}\)`, `\(F = \{N\}\)`. Then `\(E \cap F = \emptyset\)` --- # Mutual exclusivity * `\(E\)` and `\(F\)` are events * If `\(E \cap F = \emptyset\)` then we'll say `\(E\)` and `\(F\)` are **mutually exclusive** -- * Mutual exclusivity `\(\neq\)` independence * `\(E\)` and `\(E^{c}\)` are mutually exclusive events -- ## Examples Suppose `\(\Omega = \{H, T\}\)`, `\(E = \{H\}\)`, and `\(F = \{T\}\)`. Then `\(E \cap F = \emptyset\)` -- Suppose `\(\Omega = \{(H, H), (H,T), (T, H), (T,T) \}\)`. `\(E = \{(H,H)\}\)`, `\(F = \{(H, H), (T,H)\}\)`, and `\(G = \{(H, T), (T, T) \}\)` * `\(E \cap F = \{(H, H)\}\)` * `\(E \cap G = \emptyset\)` * `\(F \cap G = \emptyset\)` -- Suppose `\(\Omega = \Re_{+}\)`. `\(E = \{x: x> 10\}\)` and `\(F = \{x: x < 5\}\)`. Then `\(E \cap F = \emptyset\)`. --- # Subsets of the event space Suppose we have events `\(E_{1}, E_{2}, \ldots, E_{N}\)` `$$\cup_{i=1}^{N} E_{i} = E_{1} \cup E_{2} \cup E_{3} \cup \ldots \cup E_{N}$$` * `\(\cup_{i=1}^{N} E_{i}\)` is the set of outcomes that occur at least once in `\(E_{1} , \ldots, E_{N}\)` -- `$$\cap_{i=1}^{N} E_{i} = E_{1} \cap E_{2} \cap \ldots \cap E_{N}$$` * `\(\cap_{i=1}^{N} E_{i}\)` is the set of outcomes that occur in each `\(E_{i}\)` --- # Probability * Probability is the chance of an event occurring * `\(\Pr\)` is a function * The domain contains all events `\(E\)` --- # Three axioms of probability 1. **Nonnegativity**: For all events `\(E\)`, `\(0 \leq \Pr(E) \leq 1\)` 1. **Normalization**: `\(\Pr(\Omega) = 1\)` 1. **Additivity**: For all sequences of **mutually exclusive events** `\(E_{1}, E_{2}, \ldots,E_{N}\)` (where `\(N\)` can go to infinity): `$$\Pr\left(\cup_{i=1}^{N} E_{i} \right) = \sum_{i=1}^{N} \Pr(E_{i} )$$` --- # Coins and die * Suppose we are flipping a fair coin.
Then `\(\Pr(H) = \Pr(T) = 1/2\)` * Suppose we are rolling a six-sided die. Then `\(\Pr(1) = 1/6\)` * Suppose we are flipping a pair of fair coins. Then `\(\Pr(H, H) = 1/4\)` --- # Congressional incumbents * One candidate example * `\(\Pr(W)\)`: probability incumbent wins * `\(\Pr(N)\)`: probability incumbent loses (does not win) * Two candidate example * `\(\Pr(\{(W,W)\})\)`: probability both incumbents win * `\(\Pr( \{(W,W), (W, N)\} )\)`: probability incumbent `\(1\)` wins * Full House example: * `\(\Pr( \{ \text{All Democrats Win}\} )\)` * We'll use data to infer these things --- # Rolling the dice * Roll a pair of 4-sided dice * Fair dice (equal probability of all 16 possible outcomes) $$ `\begin{aligned} \Omega &= \{(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), \\ &\quad (3,1), (3,2), (3,3), (3,4), (4,1), (4,2), (4,3), (4,4) \} \end{aligned}` $$ -- $$ `\begin{aligned} \Pr (\text{the sum of the rolls is even}) &= 8/16 = 1/2 \\ \Pr (\text{the sum of the rolls is odd}) &= 8/16 = 1/2 \\ \Pr (\text{the first roll is equal to the second}) &= 4/16 = 1/4 \\ \Pr (\text{the first roll is larger than the second}) &= 6/16 = 3/8 \\ \Pr (\text{at least one roll is equal to 4}) &= 7/16 \end{aligned}` $$ --- # Surprising probability facts Helps us to avoid silly reasoning -- ----------------------------- "What are the odds" `\(\leadsto\)` not great, but neither are the odds of all the other coincidences that go unnoticed -- ----------------------------- "There is no way a candidate has an 80% chance of winning when the forecasted vote share is only 55%" `\(\leadsto\)` confuses different events -- ----------------------------- "Group A has a higher rate of some behavior, therefore most of the behavior is from group A" `\(\leadsto\)` confuses two different problems -- ----------------------------- "This is a low probability event, therefore God designed it" `\(\leadsto\)` 1. Even if we stipulate to a low probability event, intelligent design is an assumption 1. Low probability obviously doesn't imply divine intervention. Take 100 balls and let them sort into 2 bins. You'll get a result, but probability of that result `\(= 1/(10^{29} \times \text{Number of Atoms in Universe})\)` --- # Birthday problem * Suppose we have a room full of `\(N\)` people. What is the probability at least 2 people have the same birthday? * Assuming leap year counts, `\(N = 367\)` guarantees at least two people with the same birthday * For `\(N < 367\)`? --- # Birthday problem <img src="09-sample-space-probability_files/figure-html/birth-sim-1.png" width="864" style="display: block; margin: auto;" /> --- # BIRTHDAY: BUT WHYYYYY? * Me and you: only two possible birthdays and only one pair: (Me, You) * Me and you and you there: three possible birthdays BUT THREE possible pairs: (Me, You), (Me, You there), (You, You there) * Fifty of us! Then, we have about a 97% chance that there's a shared birthday: 1225 pairs (!!!) -- To math it even more, think about it: suppose you have one person's birthday. Now, it has to be the case that each other person does not share their birthday. When it's just you and one other person, the other person only has to avoid a single day. However, if you have 50 people, they would all have to land on fifty separate days, which is pretty unlikely (about a 3% chance that no one shares). --- # Combining Birthdays: Passwords and Committees We can figure out how many different groups are possible using combinatorics. They bring the fun!
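Before the formulas, here is a quick sanity check on the birthday numbers from the previous slide (a minimal sketch in base R, not code from the original slides; `p_shared` is just an illustrative helper name):

```r
# Number of distinct pairs among 50 people (a "committee"-style count: order ignored)
choose(50, 2)
#> [1] 1225

# Probability that at least two of n people share a birthday,
# assuming 365 equally likely birthdays and no leap day
p_shared <- function(n) 1 - prod((365 - seq_len(n) + 1) / 365)
p_shared(50)
#> [1] 0.9703736
```

The `choose(50, 2)` call is exactly the committee count defined below.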
If we have a group of `\(n\)` individuals and we want to select a committee of `\(k\)` individuals, we can represent this mathematically as: `$$n \choose k$$` -- This becomes: `$$\frac{n!}{k!(n-k)!}$$` In words: we have `\(n\)` individuals and we're choosing groups of `\(k\)` of them. This is the **committee** version -- you are on the committee or not, so the order of our list is inconsequential. Mathematically, we call this a **combination**. --- # Combining Birthdays: Passwords and Committees In contrast, suppose you are making a TOP SECRET PASSWORD! Here, it needs to be `\(k\)` letters long, from `\(n\)` possible letters. In this case, ORDER DOES MATTER (hello vs elloh). -- If this is the case, we would use the formula: `$$\frac{n!}{(n-k)!}$$` Here, we're allowing **MORE GROUPS** because, for example, (1,2,3) is different from (2,1,3). This is the **password** or **permutation** approach. --- # Conditional probability Social scientists almost always examine **conditional** relationships * Given opposite Party ID, probability of a date * Given low interest rates, probability of high inflation * Given "economic anxiety", probability of voting for a politician -- Intuition: * Some event has occurred: an outcome was realized * And with the knowledge that this outcome has already happened * What is the probability that something in another set happens? --- # Conditional probability Suppose we have two events, `\(E\)` and `\(F\)`, and that `\(\Pr(F)>0\)`. Then, `$$\Pr(E|F) = \frac{\Pr(E\cap F ) } {\Pr(F) }$$` * `\(\Pr(E \cap F)\)`: probability that both `\(E\)` and `\(F\)` occur * `\(\Pr(F)\)`: probability of the event we condition on --- # Elections ### Example 1 * `\(F = \{\text{All Democrats Win} \}\)` * `\(E = \{\text{Tammy Duckworth Wins (D-IL)} \}\)` * If `\(F\)` occurs then `\(E\)` must occur, so `\(\Pr(E|F) = 1\)` -- ### Example 2 * `\(F = \{\text{All Democrats Win} \}\)` * `\(E = \{ \text{Chuck Grassley Wins (R-IA)} \}\)` * `\(F \cap E = \emptyset \Rightarrow \Pr(E|F) = \frac{\Pr(F \cap E) }{\Pr(F)} = \frac{\Pr(\emptyset)}{\Pr(F)} = 0\)` --- # Elections ### Incumbency advantage * `\(I = \{ \text{Candidate is an incumbent} \}\)` * `\(D = \{ \text{Candidate Defeated} \}\)` * `\(\Pr(D|I) = \frac{\Pr(D \cap I)}{\Pr(I) }\)` --- # Difference between `\(\Pr(A|B)\)` and `\(\Pr(B|A)\)` $$ `\begin{aligned} \Pr(A|B) & = \frac{\Pr(A\cap B)}{\Pr(B)} \\ \Pr(B|A) & = \frac{\Pr(A \cap B) } {\Pr(A)} \end{aligned}` $$ -- Think about the type of person who is a Swiftie: `$$\begin{aligned}\Pr(\text{Seeing the Eras tour}| \text{Swiftie}) & = 0.1 \\\Pr(\text{Swiftie}| \text{Seeing the Eras tour}) & \approx 1\end{aligned}$$` --- # Law of total probability Suppose that we have a set of events `\(F_{1}, F_{2}, \ldots, F_{N}\)` such that the events are mutually exclusive and together comprise the entire sample space `\(\cup_{i=1}^{N} F_{i} = \Omega\)`. -- Then, for any event `\(E\)` `$$\Pr(E) = \sum_{i=1}^{N} \Pr(E | F_{i} ) \times \Pr(F_{i})$$` --- # Voter mobilization Infer `\(\Pr(\text{vote})\)` after mobilization campaign * `\(\Pr(\text{vote}|\text{mobilized} ) = 0.75\)` * `\(\Pr(\text{vote}| \text{not mobilized} ) = 0.25\)` * `\(\Pr(\text{mobilized}) = 0.6 ; \Pr(\text{not mobilized} ) = 0.4\)` * What is `\(\Pr(\text{vote})\)`?
-- Sample space (one person) = `\(\{\)` (mobilized, vote), (mobilized, not vote), (not mobilized, vote), (not mobilized, not vote) `\(\}\)` -- * Mobilization partitions the space -- * Apply the law of total probability `$$\begin{aligned}\Pr(\text{vote} ) & = \Pr(\text{vote}| \text{mob.} ) \times \Pr(\text{mob.} ) + \Pr(\text{vote} | \text{not mob.} ) \times \Pr(\text{not mob.}) \\ & = 0.75 \times 0.6 + 0.25 \times 0.4 \\ & = 0.55 \end{aligned}$$` --- # Chess tournament * Enter a chess tournament where your probability of winning a game is * `\(0.3\)` against half the players (type 1) * `\(0.4\)` against a quarter of the players (type 2) * `\(0.5\)` against the remaining quarter of the players (type 3) * What is the probability of winning against a randomly chosen opponent? -- Let `\(A_i\)` be the event of playing with an opponent of type `\(i\)` `$$\Pr (A_1) = 0.5, \quad \Pr (A_2) = 0.25, \quad \Pr (A_3) = 0.25$$` -- Let `\(B\)` be the event of winning `$$\Pr (B | A_1) = 0.3, \quad \Pr (B | A_2) = 0.4, \quad \Pr (B | A_3) = 0.5$$` -- Total probability theorem `$$\begin{aligned}\Pr (B) &= \Pr (A_1) \Pr (B | A_1) + \Pr (A_2) \Pr (B | A_2) + \Pr (A_3) \Pr (B | A_3) \\&= 0.5 \times 0.3 + 0.25 \times 0.4 + 0.25 \times 0.5 \\&= 0.375\end{aligned}$$` --- # Bayes' Rule .center[ ![[Modified Bayes' Theorem](https://xkcd.com/2059/)](https://imgs.xkcd.com/comics/modified_bayes_theorem.png) ] --- # Bayes' Rule * `\(\Pr(B|A)\)` may be easy to obtain * `\(\Pr(A|B)\)` may be harder to determine * Bayes' rule provides a method to move from `\(\Pr(B|A)\)` to `\(\Pr(A|B)\)` -- Bayes' Rule: For two events `\(A\)` and `\(B\)`, $$\Pr(A|B) = \frac{\Pr(A)\times \Pr(B|A)}{\Pr(B)} $$ -- $$ `\begin{aligned} \Pr(A|B) & = \frac{\Pr(A \cap B) }{\Pr(B) } \\ & = \frac{\Pr(B|A)\Pr(A) } {\Pr(B) } \end{aligned}` $$ --- # Chess tournament redux `$$\Pr (A_1) = 0.5, \quad \Pr (A_2) = 0.25, \quad \Pr (A_3) = 0.25$$` `$$\Pr (B | A_1) = 0.3, \quad \Pr (B | A_2) = 0.4, \quad \Pr (B | A_3) = 0.5$$` Suppose that you win. What is the probability `\(\Pr (A_1 | B)\)` that you had an opponent of type 1? -- `$$\begin{aligned}\Pr (A_1 | B) &= \frac{\Pr (A_1) \Pr (B | A_1)}{\Pr (A_1) \Pr (B | A_1) + \Pr (A_2) \Pr (B | A_2) + \Pr (A_3) \Pr (B | A_3)} \\&= \frac{0.5 \times 0.3}{0.5 \times 0.3 + 0.25 \times 0.4 + 0.25 \times 0.5} \\&= \frac{0.15}{0.375} \\&= 0.4\end{aligned}$$` --- # Identifying racial groups by name * Racial distribution of names * `\(\Pr (\text{black}) = 0.126\)` * `\(\Pr (\text{not black}) = 1 - \Pr (\text{black}) = 0.874\)` * `\(\Pr (\text{Washington} | \text{black}) = 0.00378\)` * `\(\Pr (\text{Washington} | \text{not black}) = 0.000060615\)` * What is the probability of being black conditional on having the name "Washington"? -- `$$\begin{aligned}\Pr(\text{black}|\text{Wash} ) & = \frac{\Pr(\text{black}) \Pr(\text{Wash}| \text{black}) }{\Pr(\text{Wash} ) } \\ & = \frac{\Pr(\text{black}) \Pr(\text{Wash}| \text{black}) }{\Pr(\text{black})\Pr(\text{Wash}|\text{black}) + \Pr(\text{nb})\Pr(\text{Wash}| \text{nb}) } \\ & = \frac{0.126 \times 0.00378}{0.126\times 0.00378 + 0.874 \times 0.000060615} \\ & \approx 0.9 \end{aligned}$$` --- # Let's Make a Deal
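*Let's Make a Deal* is the game show behind the Monty Hall problem on the next slides. As a preview, here is a minimal simulation sketch of the always-switch strategy (base R, not code from the original slides; `play_once` is just an illustrative helper name):

```r
# Simulate one Monty Hall game where the contestant always switches
play_once <- function() {
  doors <- 1:3
  car  <- sample(doors, 1)                      # door hiding the car
  pick <- sample(doors, 1)                      # contestant's first pick
  goats <- setdiff(doors, c(pick, car))         # goat doors the host may open
  opened <- goats[sample(length(goats), 1)]     # host opens one goat door
  switched <- setdiff(doors, c(pick, opened))   # the one remaining door
  switched == car                               # TRUE if switching wins
}

set.seed(33000)
mean(replicate(10000, play_once()))             # close to 2/3
```

Switching wins roughly two-thirds of the time, which previews the Bayes' rule calculation on the following slides.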
--- # Monty Hall problem > Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? --- # Monty Hall problem * Suppose we have three doors `\(A, B, C\)` * Contestant guesses `\(A\)` * `\(\Pr(A) = 1/3 \leadsto\)` chance of winning without switch * If `\(C\)` is revealed to not have a car: -- `$$\begin{aligned}\Pr(B| C \text{ revealed} ) & = \frac{\Pr(B)\Pr(C \text{ revealed} | B)}{\Pr(B)\Pr(C \text{ revealed} | B) + \Pr(A) \Pr(C \text{ revealed} | A) } \\& = \frac{1/3 \times 1}{1/3 \times 1 + 1/3 \times 1/2 } = \frac{1/3}{1/2} = \frac{2}{3}\end{aligned}$$` -- `$$\begin{aligned}\Pr(A| C \text{ revealed} ) & = \frac{\Pr(A) \Pr(C \text{ revealed} | A)}{ \Pr(B)\Pr(C \text{ revealed} | B) + \Pr(A) \Pr(C \text{ revealed} | A) } \\& = \frac{1/3 \times 1/2}{1/3 \times 1 + 1/3 \times 1/2} = \frac{1}{3} \end{aligned}$$` --- # Monty Hall Moral: ALWAYS* SWITCH! (*because the reveal is not random: the host always opens a door hiding a goat) -- <img src="https://imgs.xkcd.com/comics/monty_hall_2x.png" style="display: block; margin: auto;" /> --- # Additional Bayes applications: False-positive puzzle -- > A test for a certain rare disease is assumed to be correct 95% of the time: if a person has the disease, the test results are positive with probability `\(0.95\)`, and if the person does not have the disease, the test results are negative with probability `\(0.95\)`. A random person drawn from a certain population has probability `\(0.001\)` of having the disease. Given that the person just tested positive, what is the probability of having the disease? --- # False-positive puzzle * `\(A\)` - event that the person has the disease * `\(B\)` - event that the test results are positive $$ `\begin{aligned} \Pr (A) &= 0.001 \\ \Pr (A^c) &= 0.999 \\ \Pr (B | A) &= 0.95 \\ \Pr (B | A^c) &= 0.05 \end{aligned}` $$ -- `$$\begin{aligned}\Pr (A|B) &= \frac{\Pr (A) \Pr (B|A)}{\Pr (A) \Pr (B|A) + \Pr (A^c) \Pr (B | A^c)} \\&= \frac{0.001 \times 0.95}{0.001 \times 0.95 + 0.999 \times 0.05} \\&\approx 0.0187\end{aligned}$$` -- ### Bonus: what is the probability that someone does NOT have the disease given that they have a positive result? -- *Ans*: `\(\frac{0.999 \times 0.05}{0.001 \times 0.95 + 0.999 \times 0.05}\)` OR `\(1 - 0.0187 = 0.9813\)` --- # Recap: Bayes' Rule * `\(\Pr(B|A)\)` may be easy to obtain * `\(\Pr(A|B)\)` may be harder to determine * Bayes' rule provides a method to move from `\(\Pr(B|A)\)` to `\(\Pr(A|B)\)` > Helps you to understand (estimate) the likelihood of some event, given that some other event has occurred. Typically you can think of it as placing context around something you have observed. Bayes' Rule: For two events `\(A\)` and `\(B\)`, $$\Pr(A|B) = \frac{\Pr(A)\times \Pr(B|A)}{\Pr(B)} $$ --- # Independence of probabilities * Two events `\(E\)` and `\(F\)` are independent if `$$\Pr(E\cap F ) = \Pr(E)\Pr(F)$$` * Independence is symmetric * Suppose `\(E\)` and `\(F\)` are independent. Then, `$$\begin{aligned}\Pr(E|F) & = \frac{\Pr(E \cap F) }{\Pr(F) } \\& = \frac{\Pr(E)\Pr(F)}{\Pr(F)} \\& = \Pr(E) \end{aligned}$$` -- > An alternative way to think about this is re: information... If knowing one variable's value / that it has occurred helps you make a better guess about the other, then they probably are not independent. If it doesn't help you make a better guess, then they are likely independent (or you're a bad guesser).
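Before working through the die examples on the next slides, here is a minimal sketch (base R, not code from the original slides) that checks the product definition of independence on the two-coin sample space:

```r
# Sample space for two fair coin flips: four equally likely outcomes
omega <- expand.grid(first = c("H", "T"), second = c("H", "T"))
p <- rep(1 / nrow(omega), nrow(omega))

A <- omega$first == "H"     # event A: first flip is heads
B <- omega$second == "H"    # event B: second flip is heads

sum(p[A & B])               # Pr(A and B) = 0.25
sum(p[A]) * sum(p[B])       # Pr(A) * Pr(B) = 0.25, so A and B are independent
```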
--- # Rolling a 4-sided die Consider an experiment involving two successive rolls of a 4-sided die in which all 16 possible outcomes are equally likely and have probability `\(1/16\)`. First: try to identify all outcomes -- what is our sample space? -- `$$\Omega = \{(1,1), (1,2), (1,3), (1,4),\\ (2,1), (2,2), (2,3), (2,4), \\ (3,1), (3,2), (3,3), (3,4),\\ (4,1), (4,2), (4,3), (4,4) \}$$` -- What would it mean for two events to be independent? -- For `\(P(A\cap B) = P(A) \times P(B)\)` (equivalently, `\(P(A|B)=P(A)\)` when `\(P(B) > 0\)`) --- # Rolling a 4-sided die Are the events `$$A_i = \{ \text{1st roll results in } i \}, \quad B_j = \{ \text{2nd roll results in } j \}$$` independent? -- $$ `\begin{aligned} \Pr (A_i \cap B_j) &= \Pr (\text{the outcome of the two rolls is } (i,j)) = \frac{1}{16} \\ \Pr (A_i) &= \frac{\text{number of elements in } A_i}{\text{total number of possible outcomes}} = \frac{4}{16} \\ \Pr (B_j) &= \frac{\text{number of elements in } B_j}{\text{total number of possible outcomes}} = \frac{4}{16} \end{aligned}` $$ Since `\(\Pr (A_i \cap B_j) = \frac{1}{16} = \frac{4}{16} \times \frac{4}{16} = \Pr (A_i) \Pr (B_j)\)`, the events are independent --- # Rolling a 4-sided die Consider an experiment involving two successive rolls of a 4-sided die in which all 16 possible outcomes are equally likely and have probability `\(1/16\)` -------------------------------- Are the events below independent? `$$A = \{ \text{1st roll is a 1} \}, \quad B = \{ \text{sum of the two rolls is a 5} \}$$` -- `$$\Pr (A \cap B) = \Pr (\text{the result of the two rolls is } (1,4)) = \frac{1}{16}$$` `$$\Pr (A) = \frac{\text{number of elements of } A}{\text{total number of possible outcomes}} = \frac{4}{16}$$` -- Event `\(B\)` consists of the outcomes `\((1,4), (2,3), (3,2), (4,1)\)` `$$\Pr (B) = \frac{\text{number of elements of } B}{\text{total number of possible outcomes}} = \frac{4}{16}$$` Test: `\(P(A|B) = P(A)\)`? -- `\(P(A|B) = \frac{1/16}{4/16} = \frac{1}{4}\)` and `\(P(A) = \frac{4}{16} = \frac{1}{4}\)`. YES, perhaps surprisingly, these are independent! (also `\(P(A\cap B) = \frac{1}{16} = P(A) \times P(B)\)`) --- # Rolling a 4-sided die Are these events independent? `\(A = \{ \text{1st roll is even} \}, \quad B = \{ \text{2nd roll is even} \}\)` -- `$$\Pr (A \cap B) = \Pr (\text{the result of the two rolls is } \{(2,2), (2,4), (4,2), (4,4)\}) = \frac{4}{16}$$` `$$\begin{aligned}A &= \{(2,1), (2,2),(2,3), (2,4), (4,1), (4,2), (4,3), (4,4)\} \\ \Pr (A) &= \frac{\text{number of elements of } A}{\text{total number of possible outcomes}} = \frac{8}{16}\end{aligned}$$` -- Similarly, event `\(B\)` consists of rolling a 2 or 4 for our second roll: `\(\{(1,2),(1,4),(2,2),(2,4), (3,2), (3,4), (4,2), (4,4)\}\)` -- `$$\Pr (B) = \frac{\text{number of elements of } B}{\text{total number of possible outcomes}} = \frac{8}{16}$$` Test: `\(P(A|B) = P(A)\)`? `\(P(A|B) = \frac{4}{8} = \frac{1}{2}\)` and `\(P(A) = \frac{8}{16} = \frac{1}{2}\)`. YES!! (also `\(P(A\cap B) = P(A) \times P(B)\)`) --- # Rolling a 4-sided die Consider an experiment involving two successive rolls of a 4-sided die in which all 16 possible outcomes are equally likely and have probability `\(1/16\)` -------------------------------- Are these events independent? `$$A = \{ \text{maximum of the two rolls is 2} \}, \quad B = \{ \text{minimum of the two rolls is 2} \}$$` -- `$$\Pr (A \cap B) = \Pr (\text{the result of the two rolls is } (2,2)) = \frac{1}{16}$$` `$$\begin{aligned}\Pr (A) &= \frac{\text{number of elements in } A}{\text{total number of possible outcomes}} = \frac{3}{16} \\\Pr (B) &= \frac{\text{number of elements in } B}{\text{total number of possible outcomes}} = \frac{5}{16}\end{aligned}$$` -- Test: `\(P(A|B) = P(A)\)`?
`\(P(A|B) = \frac{1/16}{5/16} = \frac{1}{5} \neq \frac{3}{16} = P(A)\)`. NO --- class: middle, center, inverse # Independence --- # Independence and causal inference * Selection and observational studies * We often want to infer the effect of some treatment * Incumbency on vote return * College education and job earnings * Observational studies: use what we observe to make inferences * Problem: units select into treatment * Simple example: enroll in job training if I think it will help * `\(\Pr (\text{job} | \text{training in study}) \neq \Pr(\text{job} | \text{forced training})\)` * Background characteristic: difference between treatment and control groups * Experiments: make background characteristics and treatment status independent --- # Independence of a collection of events We say that the events `\(A_1, A_2, \ldots, A_n\)` are independent if `$$\Pr \left( \bigcap_{i \in S} A_i \right) = \prod_{i \in S} \Pr (A_i),\quad \text{for every subset } S \text{ of } \{1,2,\ldots,n \}$$` -- ## Example with three events $$ `\begin{aligned} \Pr (A_1 \cap A_2) &= \Pr (A_1) \Pr (A_2) \\ \Pr (A_1 \cap A_3) &= \Pr (A_1) \Pr (A_3) \\ \Pr (A_2 \cap A_3) &= \Pr (A_2) \Pr (A_3) \\ \Pr (A_1 \cap A_2 \cap A_3) &= \Pr (A_1) \Pr (A_2) \Pr (A_3) \end{aligned}` $$ --- # Independent trials * Sequence of independent trials <img src="../images/bernoulli.png" width="35%" style="display: block; margin: auto;" /> * Bernoulli trials - sequence of independent binary trials * Heads or tails * Success or failure * Rains or does not rain --- # Binomial probabilities * `\(n\)` independent tosses of a coin * Probability of heads is `\(p\)` * Independence means that the events `\(A_1, A_2, \ldots, A_n\)` are independent where `\(A_i = \{ i\text{th toss is a heads} \}\)` --- # Binomial probabilities The binomial distribution describes a series of `\(n\)` independent Bernoulli trials. Each trial has the same two possible outcomes (success or failure), and the probability of success is constant across the trials. -- `$$p(k) = \Pr(k \text{ heads come up in an } n \text{-toss sequence})$$` The probability of any given sequence that contains `\(k\)` heads is `\(p^k (1-p)^{n-k}\)` `$$p(k) = \binom{n}{k} p^k (1-p)^{n-k}$$` `$$\binom{n}{k} = \text{number of distinct } n \text{-toss sequences that contain } k \text{ heads}$$` `$$\binom{n}{k} = \frac{n!}{k! (n-k)!}, \quad k=0,1,\ldots,n$$` `$$i! = 1 \times 2 \times \cdots \times (i-1) \times i$$` --- ## Binomial formula `$$\sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} = 1$$` --- # Counting (AKA EXPECTED VALUE!!!!!!) * Calculate the total number of possible outcomes in a sample space * If the sample space consists of a finite number of equally likely outcomes, each with an already known probability `\(p\)`, then for any event `\(A\)` `$$\Pr (A) = p \times (\text{number of elements of } A)$$` --- # Counting principle Consider a process that consists of `\(r\)` stages. Suppose that: 1. There are `\(n_1\)` possible results at the first stage 1. For every possible result at the first stage, there are `\(n_2\)` possible results at the second stage 1. More generally, for any sequence of possible results at the first `\(i-1\)` stages, there are `\(n_i\)` possible results at the `\(i\)`th stage -- Then, the total number of possible results of the `\(r\)`-stage process is `$$n_1 \times n_2 \times \cdots \times n_r$$` --- # Telephone numbers * Local number - 7-digit sequence not starting with 0 or 1 * How many distinct telephone numbers are there?
-- `$$8 \times 10 \times 10 \times 10 \times 10 \times 10 \times 10 = 8 \times 10^6$$` --- # Permutations * `\(n\)` distinct objects * `\(k\)` some positive integer such that `\(k \leq n\)` * Count the number of different ways that we can pick `\(k\)` out of these `\(n\)` objects and arrange them in a sequence * `\(k\)`**-permutations** - number of distinct `\(k\)`-object sequences `$$\begin{aligned}n(n-1) \cdots (n-k+1) &= \frac{n(n-1) \cdots (n-k+1) (n-k) \cdots 2 \times 1}{(n-k) \cdots 2 \times 1} \\&= \frac{n!}{(n-k)!}\end{aligned}$$` --- # Counting letters * Number of words that consist of four distinct letters * Number of 4-permutations of the 26 letters in the alphabet `$$\frac{n!}{(n-k)!} = \frac{26!}{22!} = 26 \times 25 \times 24 \times 23 = 358,800$$` --- # Combinations * `\(n\)` people * Form a committee of `\(k\)` * How many different committees are possible? -- * Need to count the number of `\(k\)`-element subsets of a given `\(n\)`-element set * Combination - ordering does not matter * 2-permutations of the letters `\(A, B, C, D\)` `$$AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC$$` * Combinations of two out of these four letters are `$$AB, AC, AD, BC, BD, CD$$` * General formula `$$\frac{n!}{k!(n-k)!}$$` --- # Counting letters redux The number of combinations of two out of the four letters `\(A, B, C, D\)` is found by letting `\(n=4\)` and `\(k=2\)` -- `$$\binom{n}{k} = \binom{4}{2} = \frac{4!}{2!2!} = 6$$` --- # RECAP! * Sample spaces * Sets: union, intersection * Thinking about how to count elements * Permutations vs combinations: combinations `\({n \choose k} = \frac{n!}{k!(n-k)!}\)`, permutations `\(\frac{n!}{(n-k)!}\)` (drop the `\(k!\)` from the denominator when order matters) * Conditional probability and independence * Bayes' Rule: `\(\Pr(A|B) = \frac{\Pr(A)\times \Pr(B|A)}{\Pr(B)}\)`
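As a quick check of the counting formulas above (a minimal sketch in base R, not code from the original slides):

```r
# Reproducing the counting examples from these slides
prod(26:23)                                   # 4-permutations of 26 letters: 26 * 25 * 24 * 23 = 358800
choose(4, 2)                                  # committees of 2 from {A, B, C, D}: 6
factorial(4) / (factorial(2) * factorial(2))  # same combination via n! / (k! (n-k)!): 6
```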