8 Activity 5
The first calculation we’ll do is \(P(X < x)\) where \(X\) follows a normal distribution. The code for this is `pnorm(x, mean=μ, sd=σ)` or just `pnorm(x, μ, σ)` for short.
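As a quick illustration of the two call styles, here is a minimal sketch using a made-up distribution. The mean of 50 and standard deviation of 10 are assumptions chosen for illustration and are not part of the exercises; the variable names `mu` and `sigma` stand in for \(\mu\) and \(\sigma\).

```r
# Hypothetical distribution for illustration only: mean 50, sd 10.
mu <- 50
sigma <- 10

# P(X < 60), once with named arguments and once with positional arguments.
p_named <- pnorm(60, mean = mu, sd = sigma)
p_short <- pnorm(60, mu, sigma)

# Both calls return the same probability.
print(p_named)
print(p_short)

# With no mean or sd supplied, pnorm uses the standard normal (mean 0, sd 1).
p_std <- pnorm(0)
print(p_std)  # 0.5, since half the distribution lies below the mean
```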
Exercise 8.1 The IQ scale is defined to have a mean of \(100\) and standard deviation of \(15\). Using the function `pnorm(x, mean=100, sd=15)`, compute the following.
- What percent of people have an IQ below \(100\)? (I.e. `pnorm(100, mean=100, sd=15)`; remember to convert to a percent by moving the decimal point two places to the right.)
- What percent of people have an IQ below \(85\)?
- What percent of people have an IQ above \(150\)? Remember: \(P(X > x) = 1 - P(X < x)\), so use `1 - pnorm` for the calculation.
- You may have seen statements like “so-and-so has an IQ of some large number.” Using \(P(X > x) = 1 - P(X < x)\), compute the probability that someone would have an IQ of \(180\) or more.
In Q4: the `e-08` at the end means “times \(1/10^8\)”, i.e. 1/100,000,000 with 8 zeros. E.g. `3e-08` means “3 in 100 million.” This calculation demonstrates that claims of such extraordinarily high scores are unreliable: there are simply not enough data points to calibrate an IQ test to distinguish a 1 in 10 thousand level from a 1 in a million level.
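To see what this notation means in practice, the following small sketch prints a number written with `e-08` as an ordinary decimal. The value `3e-08` is the example from the text, not the answer to Q4.

```r
# 3e-08 is R's scientific notation for 3 * 10^-8.
x <- 3e-08

# The same number written as a fraction with 10^8 in the denominator.
same_value <- isTRUE(all.equal(x, 3 / 10^8))
print(same_value)

# Print without scientific notation to count the decimal places.
x_decimal <- format(x, scientific = FALSE)
print(x_decimal)
```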
IQ tests will usually have a range that they are calibrated to test. That range might be something like \(\mu \pm 3 \sigma\) (\(= 100 \pm 3 \cdot 15 = 55\) to \(145\)).
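The range \(\mu \pm 3\sigma\) can be computed directly, along with the proportion of a normal distribution that falls inside it. This is a minimal sketch; the variable names are chosen here for illustration.

```r
mu <- 100
sigma <- 15

# Endpoints of the mu +/- 3 sigma range.
lower <- mu - 3 * sigma
upper <- mu + 3 * sigma
print(c(lower, upper))  # 55 145

# Proportion of the distribution between the endpoints:
# P(lower < X < upper) = P(X < upper) - P(X < lower).
within_range <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)
print(within_range)  # about 0.9973
```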
Let us also do a couple of calculations where we change \(\mu\) and \(\sigma\). Remember: use `pnorm` for the probability of “below \(x\)” and `1 - pnorm` for “above \(x\).”
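A short sketch of the “below” and “above” calculations with a different \(\mu\) and \(\sigma\). The numbers (mean 20, standard deviation 4, threshold 26) are made up for illustration and do not come from the exercises. The `lower.tail = FALSE` argument of `pnorm` gives the upper tail directly and should agree with `1 - pnorm`.

```r
# Hypothetical values for illustration only.
mu <- 20
sigma <- 4
x <- 26

# Probability of being below x.
p_below <- pnorm(x, mu, sigma)

# Probability of being above x, computed two equivalent ways.
p_above <- 1 - pnorm(x, mu, sigma)
p_above_alt <- pnorm(x, mu, sigma, lower.tail = FALSE)

print(p_below)
print(p_above)
print(p_above_alt)
```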
Exercise 8.2
- Suppose a math department gives a common final exam for one of their classes. The exam has a mean of \(71\) and a standard deviation of \(6.5\). Approximately what percentage of students scored above \(60\) on this exam?
- You have blood pressure data for a certain patient group (e.g. people taking a certain drug). For that group, the mean systolic pressure is \(127 \; \mathrm{mmHg}\) with a standard deviation of \(4.6 \; \mathrm{mmHg}\).
  - If a patient in this group has a systolic pressure of \(139 \; \mathrm{mmHg}\), what is the \(z\)-score for that measurement (using \(z = (x - \mu)/\sigma\))? Is that measurement unusual based on our \(z > 2\) threshold?
  - Compute the probability of seeing a patient with a systolic pressure of \(139\) or above. Note: you can do this either with `1 - pnorm(z)` or `1 - pnorm(x, μ, σ)`. Do the calculation both ways for practice.
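The two routes to an upper-tail probability (via the \(z\)-score or via \(x, \mu, \sigma\) directly) can be sketched as follows. The values here (mean 50, standard deviation 8, measurement 62) are assumptions for illustration, not the exercise data.

```r
# Hypothetical values for illustration only.
mu <- 50
sigma <- 8
x <- 62

# Standardize: z = (x - mu) / sigma.
z <- (x - mu) / sigma
print(z)  # 1.5

# Flag the measurement as unusual using the z > 2 threshold.
unusual <- z > 2
print(unusual)  # FALSE

# Upper-tail probability computed both ways; the results should agree.
p_via_z <- 1 - pnorm(z)
p_via_x <- 1 - pnorm(x, mu, sigma)
print(p_via_z)
print(p_via_x)
```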
8.1 Inverse normal calculations
As we just saw, `pnorm` answers the question “what is the probability of being above or below this threshold?” The inverse of this is “what threshold corresponds to a given probability?” This is what the `qnorm(p, μ, σ)` function does.
\(p\) is always the proportion that is below the threshold. If you want the threshold for “this percent above”, first compute \(1 - p\) to get the proportion that is below. Alternatively, one can use the `lower.tail=FALSE` parameter.
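The following sketch shows `qnorm` undoing `pnorm`, and the two ways of handling a “percent above” question. The distribution (mean 50, standard deviation 10) and the percentages are made up for illustration.

```r
# Hypothetical distribution for illustration only.
mu <- 50
sigma <- 10

# Threshold with 80% of the distribution below it.
x80 <- qnorm(0.80, mu, sigma)
print(x80)

# Feeding the threshold back into pnorm recovers the proportion.
p_back <- pnorm(x80, mu, sigma)
print(p_back)  # 0.8

# Threshold with 30% above it: convert to "70% below", or use lower.tail.
t_below <- qnorm(1 - 0.30, mu, sigma)
t_upper <- qnorm(0.30, mu, sigma, lower.tail = FALSE)
print(t_below)
print(t_upper)
```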
Exercise 8.3
- For the IQ test with \(\mu = 100, \sigma = 15\), what is the IQ score for which \(98\%\) of people are below? This is `qnorm(0.98, 100, 15)`.
- What is the IQ score for which \(90\%\) of people are above? (First: if \(90\%\) are above, what percent are below?)
- For an exam with a mean of \(71\) and standard deviation of \(6.5\), what are the approximate \(25\)th and \(75\)th percentiles? I.e. which exam score has \(25\%\) of people below it and which has \(75\%\) below it?
- For the group of patients with a systolic pressure mean of \(127\) and standard deviation of \(4.6\), what is the threshold for which only \(5\%\) are below and what is the threshold for which only \(5\%\) are above? (Again, \(5\%\) above means \(??\%\) below?)
8.2 Binomial distributions
Recall the mean and standard deviation of a binomial distribution \(X \sim \operatorname{Bin}(n, p)\) are
\[ \mu = np, \quad \sigma = \sqrt{npq} = \sqrt{np(1 - p)}. \]
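These formulas can be checked numerically against the definition of mean and standard deviation, using `dbinom` for the binomial probabilities. This is a sketch with made-up values (\(n = 20\), \(p = 0.3\)) that do not appear in the exercises.

```r
# Hypothetical parameters for illustration only.
n <- 20
p <- 0.3

# Mean and standard deviation from the formulas.
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# The same quantities from the definition, summing over all outcomes 0..n.
k <- 0:n
mu_exact <- sum(k * dbinom(k, n, p))
sigma_exact <- sqrt(sum((k - mu_exact)^2 * dbinom(k, n, p)))

print(c(mu, mu_exact))
print(c(sigma, sigma_exact))
```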
Exercise 8.4 Suppose you flip \(100\) coins. What is the probability of getting between \(40\) and \(60\) heads?
- Using a normal distribution approximation, we first compute \(\mu, \sigma\). Then we use the formula \(P(40 < X < 60) = P(X < 60) - P(X < 40)\). Run the following cell and report the result.
- Now suppose we flip \(10000\) coins and ask for \(P(4950 < X < 5050)\). In the previous code cell, change \(n\) to \(10000\) and change the \(60\) and \(40\) to \(5050, 4950\) and report the result.
- We can also use the `pbinom` function directly. To how many decimal places does this agree with your answer to Q1?
- Again, change the \(100\)’s to \(10000\)’s and change \(60\) to \(5050\) and \(40\) to \(4950\). To how many decimal places does this agree with Q2?
In general, expect the normal approximation to be more accurate the larger \(n\) is.
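The code cells that Exercise 8.4 refers to are not shown in this text. As a minimal sketch of what such cells might look like, here are two helper functions; the names `approx_between` and `exact_between` are invented for this illustration, and the example values (\(400\) flips, bounds \(180\) and \(220\)) are assumptions rather than the exercise cases. Note that `pbinom(k, n, p)` gives \(P(X \le k)\), so the difference of two `pbinom` values includes the upper endpoint.

```r
# Normal approximation to P(lo < X < hi) for X ~ Bin(n, p).
approx_between <- function(lo, hi, n, p) {
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  pnorm(hi, mu, sigma) - pnorm(lo, mu, sigma)
}

# Binomial probability P(lo < X <= hi), since pbinom(k, n, p) = P(X <= k).
# Pass hi - 1 instead of hi if the upper endpoint should be excluded.
exact_between <- function(lo, hi, n, p) {
  pbinom(hi, n, p) - pbinom(lo, n, p)
}

# Hypothetical example: 400 flips of a fair coin.
p_approx <- approx_between(180, 220, 400, 0.5)
p_exact <- exact_between(180, 220, 400, 0.5)
print(p_approx)
print(p_exact)
```

To try the exercise cases, call the two functions with the exercise's own `lo`, `hi`, `n`, and `p`.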
Exercise 8.5 A multiple choice test has \(n = 100\) questions. Between the questions you know and random guessing, you have a \(p = 0.8\) chance of answering a question correctly.
- Compute \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\).
- Use those values of \(\mu\) and \(\sigma\) and a normal approximation to compute the probability that you score \(85\) or above.
- To compute that with `pbinom`, note that “\(85\) and above” is the opposite of “\(84\) and below.” So you would use `1 - pbinom(84, n, p)`. Run the following and compare with the previous value.
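The cell referred to above is not shown in this text. A minimal sketch of the calculation, using the `1 - pbinom(84, n, p)` expression from the exercise and two cross-checks (the `lower.tail = FALSE` argument and a direct sum of `dbinom` over \(85\) to \(100\)), might look like this:

```r
n <- 100
p <- 0.8

# "85 and above" is the complement of "84 and below".
p_upper <- 1 - pbinom(84, n, p)

# Cross-check 1: ask pbinom for the upper tail P(X > 84) directly.
p_upper_alt <- pbinom(84, n, p, lower.tail = FALSE)

# Cross-check 2: add up the probabilities of 85, 86, ..., 100 correct answers.
p_upper_sum <- sum(dbinom(85:n, n, p))

print(p_upper)
print(p_upper_alt)
print(p_upper_sum)
```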
Lastly, we’ll look at using the `qnorm` and `qbinom` functions.
Exercise 8.6 With the same setup: \(n = 100, p = 0.8\) and your values for \(\mu\) and \(\sigma\) from before.
- Use `qnorm` to compute the threshold for the maximum score you can expect to receive \(95\%\) of the time, i.e. the threshold \(x\) such that your score is \(< x\) \(95\%\) of the time.
- Do the same for `qbinom`, and note that `qbinom` takes \(n=100\) and \(p=0.8\) directly rather than the values you’ve computed for \(\mu\) and \(\sigma\).
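The difference between the two quantile functions can be sketched with made-up parameters (\(n = 50\), \(p = 0.3\); these are assumptions for illustration, not the exercise values). `qnorm` returns a continuous threshold, while `qbinom` returns the smallest whole number \(k\) with \(P(X \le k)\) at least the requested probability.

```r
# Hypothetical parameters for illustration only.
n <- 50
p <- 0.3
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Normal approximation: continuous 95% threshold, using mu and sigma.
x_norm <- qnorm(0.95, mu, sigma)

# Binomial: takes n and p directly and returns a whole number of successes.
x_binom <- qbinom(0.95, n, p)

print(x_norm)
print(x_binom)

# Check the defining property of the binomial quantile.
print(pbinom(x_binom, n, p))      # at least 0.95
print(pbinom(x_binom - 1, n, p))  # below 0.95
```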