3  Activity 3

3.1 Working by hand

R has tools for simulating coin flips, dice rolls and random draws, for choosing random decimal numbers, and more. Before we get to those, let’s practice doing this by hand.

Exercise 3.1  

  1. Draw a 6 by 6 grid and label the rows and columns with the numbers 1 to 6. There should be enough room in each cell to maintain a tally.
  2. Roll a pair of dice 30 times. Decide which of the two dice will represent the row and which will represent the column. Tally the rolls in the table. For example, if you roll a 3 and a 5, make a tally in row 3, column 5.
  3. Draw a border around the part of the table representing the event: both dice have high values (4, 5 or 6). Give your answers as percentages rounded to one decimal place:
    1. What is the relative frequency for this event (number of occurrences out of number of trials)?
    2. What is the theoretical probability for this event (number of cells out of total number of cells)?
  4. Draw a border around the part of the table representing the event: at least one die is a 6.
    1. What is the relative frequency for this event?
    2. What is the theoretical probability for this event?
  5. Let \(A\) be the event “both dice are high values (4, 5 or 6)” and let \(B\) be the event “at least one die is a 6.”
    1. Find the theoretical probability \(P(A \text{ and } B)\), i.e. count the cells common to both regions you drew borders around.
    2. Do we have \(P(A \text{ and } B) \overset{?}= P(A)P(B)\) (using the theoretical probabilities computed in parts 3 and 4)?
    3. Does this mean \(A\) and \(B\) are independent or dependent events?

3.2 Working with sample

In R, to sample whole numbers like dice rolls, we can use the sample() function. Here we sample \(6\) numbers from \(1\) to \(6\).
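A call matching this description, assuming sample()’s default arguments apart from the range and the count, is sketched below:

```r
# Draw 6 whole numbers from 1 to 6, using sample()'s default arguments
rolls = sample(1:6, 6)
rolls
```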

Exercise 3.2 Run this a few times until you can confidently answer: is this sampling with or without replacement?

Tip

Search engines and LLMs can be useful for finding the right options to use with our functions. Here, if we asked an LLM about rolling a die 10 times, it would likely tell us about the replace=TRUE option. You should now see repeated values if you run this a few times:
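A sketch of that call, assuming we want 10 rolls of a single six-sided die:

```r
# Roll one die 10 times; replace=TRUE allows the same face to come up more than once
rolls = sample(1:6, 10, replace=TRUE)
rolls
```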

Recall that in Activity 1 we generated a barplot by rolling dice ourselves. Let’s use the sample() function to speed that up.
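Exercise 3.3 refers to code whose first line creates dice_rolls and whose last line prints count. A minimal sketch consistent with that description might be the following; the middle line, which tallies the rolls with table(), is an assumption:

```r
# Roll one die 1000 times
dice_rolls = sample(1:6, 1000, replace=TRUE)
# Count how many times each face came up
count = table(dice_rolls)
count
```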

Exercise 3.3  

  1. Run the above code and report the table.
  2. To add two dice together, we can add the output of one sample() call to another. Change the first line to read:
dice_rolls = sample(1:6, 1000, replace=TRUE) + sample(1:6, 1000, replace=TRUE)

Then change the last line from count to barplot(count). Draw a rough sketch of the shape you see (you don’t need to faithfully represent the height of each bar, just the rough shape of the graph).
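With both changes applied, and assuming the code keeps the names dice_rolls and count, the modified version might look like this sketch:

```r
# Sum of two dice, rolled 1000 times
dice_rolls = sample(1:6, 1000, replace=TRUE) + sample(1:6, 1000, replace=TRUE)
# Count how many times each sum (from 2 up to 12) came up
count = table(dice_rolls)
# Draw one bar per sum that occurred
barplot(count)
```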

Note

In R, adding two vectors of numbers adds them element by element: first to first, second to second, and so on. For example:

c(1, 3, 2, 5) + c(3, 2, 5, 1)
[1] 4 5 7 6
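The same element-by-element rule is what makes the two-dice sum in Exercise 3.3 work: each position pairs one roll of the first die with one roll of the second. A small illustration with 5 rolls (the names first, second and totals are only for this example):

```r
first = sample(1:6, 5, replace=TRUE)   # five rolls of the first die
second = sample(1:6, 5, replace=TRUE)  # five rolls of the second die
totals = first + second                # totals[i] is first[i] + second[i]
totals
```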

3.3 Using sample to fill our grid

Let’s return to our 6 by 6 grid. This time we want to record each roll of the pair of dice not by its sum but by its row and column.

One way to do this is to:

  1. generate random whole numbers from 1 to 36, one for each cell of the grid
  2. tally them with the table function
    • Note: we wrap the numbers in factor(..., 1:36) so that the table also reports any outcomes that occurred 0 times.
  3. turn that table into a 6 by 6 grid using the matrix function
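These three steps might be put together as in the sketch below, which uses the same calls as the sample experiment in Section 3.4, with n as the number of rolls:

```r
n = 30                                # number of rolls of the pair of dice
dice = sample(1:36, n, replace=TRUE)  # each number from 1 to 36 stands for one cell
tally = table(factor(dice, 1:36))     # factor() keeps all 36 outcomes, even those seen 0 times
matrix(tally, 6, 6)                   # fill a 6 by 6 grid, column by column
```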

Exercise 3.4  

  1. Run this a few times. With only 30 samples and 36 cells, we’re not going to see very large numbers. What is the largest number you see if you run this 5 times or so?
  2. Now set \(n\) to 10000 (i.e. n = 10000) and record that table.
Note

Each outcome has probability \(1/36\) on every roll, and since \(10000/36 \approx 277.8\), we expect each count to be around 278.

  3. Using the numbers from the n = 10000 table, what is the relative frequency of each of the events from Exercise 3.1:
    1. the event where both dice are 4s, 5s or 6s
    2. the event where at least one die is a 6
  4. Are your n = 10000 relative frequencies closer to the theoretical probabilities than your by-hand frequencies were?
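One way to read these relative frequencies off the table in R, rather than adding cells by eye, is to mark each event’s cells with a logical 6 by 6 mask. This is a sketch, not the activity’s own method; the names grid_counts, in_A and in_B are hypothetical:

```r
n = 10000
dice = sample(1:36, n, replace=TRUE)
grid_counts = matrix(table(factor(dice, 1:36)), 6, 6)

# Row and column index of every cell in the grid
rows = row(grid_counts)
cols = col(grid_counts)

# Event A: both dice are high (4, 5 or 6)
in_A = rows >= 4 & cols >= 4
# Event B: at least one die is a 6
in_B = rows == 6 | cols == 6

# Relative frequency: rolls that landed in the event's cells, out of n
sum(grid_counts[in_A]) / n
sum(grid_counts[in_B]) / n

# Theoretical probability: the event's cells out of all 36 cells
mean(in_A)
mean(in_B)
```

The same masks work for any event you can describe in terms of the row and column.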

3.4 Discussion (mutually exclusive/independent)

Recall:

  1. Events are mutually exclusive if they cannot happen at the same time. That is, \(P(A \text{ and } B) = 0\).
  2. Events are independent if one happening does not affect the probability of the other (e.g. separate dice rolls). That is, \(P(A \text{ and } B) = P(A) P(B)\).
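As an illustration of both definitions, with example events chosen only for this sketch, we can compute exact probabilities for two dice by treating each of the 36 cells as equally likely. The first pair of events (each die is even) is independent; the second pair (the sum is 2, the sum is 12) is mutually exclusive:

```r
# Row (first die) and column (second die) of all 36 equally likely outcomes
first = row(matrix(0, 6, 6))
second = col(matrix(0, 6, 6))

even_first = first %% 2 == 0      # first die is even
even_second = second %% 2 == 0    # second die is even
sum2 = first + second == 2        # sum is 2
sum12 = first + second == 12      # sum is 12

# Independent: P(A and B) equals P(A)P(B); both values are 0.25
c(mean(even_first & even_second), mean(even_first) * mean(even_second))
# Mutually exclusive: P(A and B) is 0
mean(sum2 & sum12)
```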

Let’s have a look at a sample experiment:

# this makes the random number generator always generate the same numbers
set.seed(1234)

n = 30
dice = sample(1:36, n, replace=TRUE)
tally = table(factor(dice, 1:36))
matrix(tally, 6, 6)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0    0    1
[2,]    1    1    2    2    1    1
[3,]    2    1    1    1    0    0
[4,]    3    0    2    2    1    1
[5,]    2    0    0    1    1    0
[6,]    1    0    0    0    0    2

Exercise 3.5  

  1. Let \(A\) be the event “the row is 1” and let \(B\) be the event “the column is 1.”
    1. Compute \(P(A)\), \(P(B)\) and \(P(A \text{ and } B)\) as relative frequencies (note: the denominator is \(n = 30\) here).
    2. Are these events empirically independent? That is, do the relative frequencies satisfy the rule \(P(A \text{ and } B) = P(A)P(B)\)?
    3. Are these events theoretically independent? That is, does whether the first die is a 1 have any influence on whether the second die is a 1?
  2. If events \(A\) and \(B\) are both mutually exclusive and independent, explain why either \(P(A) = 0\) or \(P(B) = 0\) (or both). Hint: look at what these say about \(P(A \text{ and } B)\).
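To compute the relative frequencies in part 1 with R instead of by eye, one could rerun the seeded experiment and sum the relevant row and column. A sketch (the names cells, p_A, p_B and p_AB are hypothetical):

```r
set.seed(1234)
n = 30
dice = sample(1:36, n, replace=TRUE)
cells = matrix(table(factor(dice, 1:36)), 6, 6)

p_A = sum(cells[1, ]) / n   # the row is 1
p_B = sum(cells[, 1]) / n   # the column is 1
p_AB = cells[1, 1] / n      # the row is 1 and the column is 1

# Compare P(A and B) with P(A)P(B)
c(p_A, p_B, p_AB, p_A * p_B)
```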
Tip

The takeaway from the last exercise is that the terms “mutually exclusive” and “independent” are, in general, mutually exclusive themselves. That is, if \(A\) and \(B\) are mutually exclusive then they are generally not independent, and vice versa. The only way they can be both is if \(P(A) = 0\) or \(P(B) = 0\).
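The argument behind this tip takes two lines, using only the two definitions recalled in this section:

```latex
\begin{align*}
P(A)\,P(B) &= P(A \text{ and } B) && \text{(independence)}\\
           &= 0                   && \text{(mutually exclusive)}
\end{align*}
```

A product of two numbers is 0 only when at least one factor is 0, so \(P(A) = 0\) or \(P(B) = 0\).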