2 Inclusion/Exclusion
2.1 For two sets
Example 2.1 Suppose there are \(100\) CS students an a school. Of these, \(50\) know Java and \(60\) know Python (everyone knows at least one of these two languages). If we add \(50 + 60\) we get the \(100\) total students plus \(10\) students who know both.
\[ 50 + 60 = 100 + 10. \]
Given two sets \(A\) and \(B\), we can compute \(|A \cup B|\) by first adding everything in \(A\) and adding everything in \(B\). This counts all of \(A \cup B\) except the items in \(A \cap B\) were counted twice. Thus
\[ |A| + |B| = |A \cup B| + |A \cap B|. \]
2.2 For three sets
We can do a similar process for three sets. Start by adding up \(|A| + |B| + |C|\) then subtract all the intersections of pairs then add back in \(|A \cap B \cap C|\). For items in each region of the Venn diagram, think about how many times an item is added/subtracted overall.
E.g. items in \(A\) and \(B\) but not \(C\) will be added in twice in the first step (\(|A| + |B|\)) and subtracted once in the second step (\(|A \cap B|\)) so they are counted once.
Items in \(A \cap B \cap C\) are added in \(3 - 3 + 1 = 1\) times.
2.3 Binomial coefficients and I/E
Suppose we are looking at a union of \(n\) sets: \(A_1 \cup \dots \cup A_n\). Consider an element \(x\) belonging exactly to \(A_1, \dots, A_m\) and no other sets. If we do the same procedure of adding \(|A_1| + \dots + |A_n|\) then we are counting \(x\) a total of \(m\) times. Then, when we subtract all the pairwise intersections, \(|A_i \cap A_j|\), we are subtracting from the count of \(x\) a total of \(\binom{m}{2}\) because there are \(\binom{m}{2}\) ways to choose two indices \(\{i, j\} \subset \{1, \dots, m\}\).
Thus, if we alternate: adding in single sets, subtracting pairs of intersections, adding triples, subtracting quadruples, etc. we are counting \(x\) a total of
\[ \binom{m}1 - \binom{m}2 + \binom{m}3 - \binom{m}4 + \cdots \tag{2.1}\] times.
Let’s have a closer look at this. We know the Binomial Theorem says that \[ (1 + x)^m = \binom{m}0x^0 + \binom{m}1x^1 + \binom{m}2x^2 + \dots + \binom{m}mx^m. \] Substituting \(x = -1\), we get
\[ 0 = \binom{m}0 - \binom{m}1 + \binom{m}2 - \dots + \binom{m}m (-1)^m. \]
And since \(\binom{m}0 = 1\), if we move all the other terms to the other side of the equation, we see that Equation 2.1 evaluates to \(1\).
For this problem we are counting the size of \(A_1 \cup \dots \cup A_n\) so each element is in at least one set (\(m \ge 1\)). This is important so that we can say that \(0^m = 0\). In the context of the binomial theorem, \((1 - 1)^0 = \binom{0}{0} = 1\). This will be important later when we want to count elements not belonging to any set.
The general I/E formula is then
\[\begin{align*} |A_1 \cup \dots \cup A_n| &= \sum_i |A_i| - \sum_{i < j} |A_i \cap A_j| + \sum_{i < j < k} |A_i \cap A_j \cap A_k| - \dots \\ &= \sum_{t = 1}^n (-1)^{t + 1} \sum_{i_1 < \dots < i_t} |A_{i_1} \cap \dots \cap A_{i_t}|. \end{align*}\]
Note: \((-1)^{t + 1}\) takes the values \(1, -1, 1, -1, \dots\) starting at \(t = 1\).
2.4 Avoiding properties
Inclusion/Exclusion appears most frequently in combinatorics not as a means to count \(|A_1 \cup \dots \cup A_n|\) directly but rather as a means to count everything not in \(A_1 \cup \dots \cup A_n\).
For example, let $[m] = {1,,m} and suppose we want to count the number of functions \(f : [m] \to [n]\) which don’t miss anything in the codomain. I.e. if \(A_i\) is the set of functions where \(f(x)\) never equals \(i\), then we want to count every function not in any set \(A_i\).
Let \(X\) be the total set of functions. Then to count the functions avoiding \(A_1 \cup \dots \cup A_n\) we do
\[\begin{align*} |X \setminus (A_1 \cup \dots A_n)| &= |X| - |A_1 \cup \dots A_n| \\ &= |X| - \sum_i |A_i| + \sum_{i < j} |A_i \cap A_j| - \cdots \end{align*}\]
The formula \(|A \setminus B| = |A| - |B|\) works only when \(B\) is contained entirely inside \(A\).
2.5 Simplifying notation.
Let \(\mathscr{P} = \{P_1, \dots, P_n\}\) be a collection of sets representing negative properties—conditions we want to avoid. For a subset \(S \subseteq \{1,\dots,n\}\), let \(N_\ge(S)\) be the number of items satisfying at least those properties in \(S\). I.e. \(\bigcap_{i \in S} P_i\). We will say “\(x\) satisfies \(S\)” for short. Also, if \(S = \{a,b,c\}\), let us write for example, \(N_\ge(a,b,c)\) instead of \(N_\ge(\{a,b,c\})\) for simplicity.
Let \(N_{=}(S)\) be the number of \(x\) satisfying exactly those properties in \(S\) and no properties not in \(S\). When \(\mathscr{P}\) represents properties we wish to avoid, then \(N_{=}(\varnothing)\) is the number of elements satisfying none of the properties.
Theorem 2.1 (Inclusion/Exclusion) \[\begin{align*} N_{=}(\varnothing) &= N_{\ge}(\varnothing) - \sum_i N_\ge(i) + \sum_{i < j} N_\ge(i,j) - \cdots \\ &= \sum_{S} (-1)^{|S|} N_\ge(S). \end{align*}\]
Proof. As in Section 2.3, we will break up the sum focusing on each element. So first, we will write \[ \sum_{S} (-1)^{|S|} N_\ge(S) = \sum_S (-1)^{|S|} \sum_{\substack{x \\ x \text{ satisfies } S}} 1 \]
Here we replace the number \(N_\ge(S)\) by a count of \(1\) for each element \(x\) which satisfies \(S\). This introduces a second sum into the picture and that will allow us to swap the order of summations. Currently, the second sum is over all \(x\) with respect to the relation “\(x\) satisfies \(S\).” When we put the sum over \(x\) outside, we still have the relation “\(x\) satisfies \(S\)” but rather than summing over all \(x\) with this property, we sum over all \(S\):
\[ \sum_S (-1)^{|S|} \sum_{\substack{x \\ x \text{ satisfies } S}} 1 = \sum_x \sum_{\substack{S \\ x \text{ satisfies } S}} (-1)^{|S|}. \]
We also move the \((-1)^|S|\) inside the second sum because that quantity depends on \(|S|\).
Next, just as we did in Section 2.3, let us say that \(x\) satisfies exactly \(P_{i_1}, \dots, P_{i_m}\) (\(m\) will depend on \(x\)) and let \(S_x = \{i_1, \dots, i_m\}\). This set is the largest set that \(x\) satisfies. Every other set that \(x\) satisfies will be a subset of \(S_x\).
We now break up our sum based on the size of those subsets \(S \subseteq S_x\), using the binomial coefficients to count the number of such \(S\):
\[ \sum_x \sum_{\substack{S \\ x \text{ satisfies } S}} (-1)^{|S|} = \sum_x \sum_{k = 0}^m \underbrace{\binom{m}{k} (-1)^k}_{\substack{|S| = k \text{ and } x \text{ satisfies } S \\ \iff |S| = k \text{ and } S \subseteq S_x}}. \]
This, by the Binomial Theorem, is the same as \[ \sum_x (1 - 1)^m. \] Remember here that \(m\) depends on \(x\).
As discussed in Note 2.1, \(0^m = 0\) if \(m \ge 1\) but if \(m = 0\) then \(0^0 = 1\). Saying \(m = 0\) means that \(x\) satisfies exactly \(0\) properties—which is what we are looking for. So the final simplification looks like \[ \sum_x 0^m = \sum_{\substack{x \\ m = 0}} 1 = N_{=}(\varnothing). \]
2.6 Application 1: Surjections
As in Section 2.4, let \(X\) be the set of all functions from \([m]\) to \([n]\) and let \(P_i\) be the set of functions where \(i\) is never an output: \(f(x) \neq i\) for any input \(x\).
Lemma 2.1
- The number of functions from an \(m\) element set to an \(n\) element set is \(n^m\).
- The number of functions from an \(m\) element set to an \(n\) element set that avoid \(k\) outputs is \((n - k)^m\).
Proof.
- There are \(n\) choices for \(f(1)\) and \(n\) choices for \(f(2)\), etc. So there are \(n^m\) choices in total for \(f(1), \dots, f(m)\).
- If we are avoiding \(k\) outputs then there are only \(n - k\) choices for each of \(f(1),\dots,f(m)\) so \((n - k)^m\).
With this in mind, we have \(N_\ge(S) = (n - k)^m\) if \(|S| = k\) (we avoid at least the specified \(k\) outputs). So by inclusion exclusion, the number of functions which avoid no outputs (i.e. surjections) is
\[\begin{align*} &N_\ge(\varnothing) - \sum_i N_\ge(i) + \sum_{i < j} N_\ge(i,j) - \cdots \\ &\hspace{4em} = n^m - \sum_i (n - 1)^m + \sum_{i < j} (n - 2)^m - \cdots. \end{align*}\]
And since there are \(\binom{n}{k}\) ways to choose \(k\) outputs to avoid, we can also write this as
\[ n^m - \binom{n}1 (n - 1)^m + \binom{n}2 (n - 2)^m - \dots = \sum_{k} \binom{n}k (-1)^k (n - k)^m. \]
The textbook calls this number \(S(m,n)\).
In the above application, \(N_\ge(S) = (n - k)^m\) only depended on the size \(k\) of \(S\). This is true in many but not all examples.
2.7 Application 2: Derangements
Let \(X\) be the set of all permutations of \([n]\). We wish to count the permutations with no fixed points. So let \(P_i\) be the property that \(i\) is a fixed point. Then \(N_{=}(\varnothing)\) is the number of permutations with zero fixed points. This number is called \(d_n\) in the textbook.
Lemma 2.2
- The number of permutations of \([n]\) is \(n!\)
- The number of permutations with at least \(k\) specified fixed points is \((n - k)!\)
Proof.
- Discussed in Chapter 2 of the book.
- To say that there are \(k\) specified fixed points means were are permuting the other \(n - k\) items. Similar to (a), there are \((n - k)!\) ways to permute \(n - k\) items.
Applying Inclusion/Exclusion, we thus have \[ d_n = n! - \binom{n}1 (n - 1)! + \binom{n}2 (n - 2)! - \dots = \sum_{k} \binom{n}k (-1)^k (n - k)!. \] Again, there are \(\binom{n}k\) ways to choose a set \(S\) of \(k\) fixed points and each of these has the same number \(N_\ge(S) = (n - k)!\).