2 Trees
Definition 2.1 A graph consists of a set of vertices \(V\) and a set of edges \(E\). Each edge is an unordered pair \(\{v, w\}\) representing a connection between the vertices \(v\) and \(w\).
Consider the graph where the set of vertices are \(V = \{1, 2, 3, 4, 5\}\) and the edges are \(\{1,2\}, \{2,3\}, \{3,4\}, \{4,1\}\) and \(\{1,5\}\). Graphs such of these are more easily depicted as pictures. We try as much as possible to talk about the picture rather than sets of numbers.
Per the definition, a graph is labelled (usually the vertices are \(1, \dots, n\)). Graph theory has a lot of applications and theory has a lot of applications and results in which these labels play no role and so we also have simplified pictures where we do not write down the vertex number (however they still exist in the background).
Definition 2.2 A tree is a special type of graph which satisfies two additional properties: 1. the graph must be connected. We will give a mathematical definition for this later, but for now, this is a picture of what a disconnected graph is:
- the graph cannot have any cycles. Again, we will just use pictures right now:
Trees are common structures in a lot of computer science applications. For instance, filetrees are examples of trees.
Many applications have a hierarchical picture to tree but this visual hierarchy is not part of our base definition (although we can always add a hierarchy on top of the definition). Here is that same tree but without the hierarchy as part of the picture
2.1 Properties of Graphs and Trees
The first property is called the handshaking lemma or degree sum formula.
Definition 2.3 The degree of a vertex, \(\deg(v)\) is the number of other vertices \(w\) for which there is an edge between \(v\) and \(w\). For instance, the vertex a below has a degree of 3 because it is connected to 3 other vertices: b, d, e.
2.1.1 Double Counting
An important idea in combinatorics is double counting where we count the same thing in two different ways. For instance, in the above graph we can count the edges just by looking at the picture (5 edges). Or…we can go vertex by vertex and count the edges at that vertex: a has 3, b has 2, c has 1, d and e both have 2. This gives 10 total. If we think carefully about what is going on here, this second method of vertex-by-vertex counts every edge twice. For instance the edge between a and b contributes both to the count for a and for b. This idea is called the handshaking lemma or degree-sum formula.
Theorem 2.1 If we go vertex by vertex and add up the number of edges at that vertex (i.e. the degree) every edge is counted exactly twice.
\[ \sum \deg(v) = 2|E|. \]
2.2 Leaf Vertices
In developing a developing an understanding of all the properties that trees have, one foundational fact is that every tree with at least two vertices has a leaf vertex where a leaf vertex is a vertex with degree 1.
In a filesystem, any file is a leaf where its unique edge is the edge connecting it to the parent folder. Additionally, any empty folder (that is not the topmost folder) is also a leaf.
I want to take a moment here also to point out that the tree is the data of which vertices are connected and in particular lengths and directions of edges are not part of the data. So we could draw the tree with the vertices placed elsewhere and it will still be the same tree with the same leaves.
2.2.1 Every tree has at least one leaf
It is a fact that every tree with 2 or more vertices has at least one (in fact at least two) leaf vertex. We will discuss the why of this (the proof) in a couple levels of formality.
The reason we say here “2 or more vertices” is because if we have only a single vertex that vertex would have no edges (degree 0) and would not be a leaf because a leaf needs to have exactly one edge.
2.2.1.1 Informal proof
Lets pick any vertex, maybe vertex 1. We have two possibilities 1. this vertex is a leaf (and we have what we were looking for) 2. this vertex has at least two neighbours and in the picture we will call those 2 and 3
Now repeat this reasoning: either 2 or 3 are leaves or they too have a new neighbour other than 1.
All of these new neighbours are distinct (e.g. the new neighbour for 2 is not the same as the new neighbour for 3). If lets say the new neighbour for 2 and 3 were the same well then that would create a cycle (not allowed)
So as long as we never hit a leaf vertex we can always find more vertices. But there are only a finite amount of vertices so we cannot continue this indefinitely. At some point we have to find a leaf vertex (and in fact, one on each side).
2.2.1.2 Induction
This idea of “just keep going and eventually we have to stop because there was only a finite number” has a name. It is known as the principle of mathematical induction. Suppose we show two things: 1. the minimal sized case(s) have a property 2. any big case can be made into a smaller case then we are done because any case can be made smaller and smaller until we eventually find what we are looking for and the process stops because everything is finite.
In terms of trees, the minimal case is a tree consisting of a pair of vertices and a single edge. Each vertex here is a leaf edge.
Now suppose we have a larger, more complex tree. Focus on a single edge somewhere in the tree, let us call its endpoints a and b.
If neither a nor b are leaves then we smush the two vertices together merging the entire edge into single vertex which we will label “ab.” This process is known as edge contraction.
By contracting the edge we create a smaller graph with one fewer vertex and one fewer edge.
This process may be carried out again and again. Choose a new edge, either we find a leaf vertex or we contract. Because we cannot contract indefinitely we say now “by induction” there must be a leaf vertex.
2.2.1.3 A little bit more formal
Going the next step, induction is often thought about as “going up” rather than “going down.” We worded the last argument as creating smaller and smaller graphs but another way to say that is
- Every tree with two vertices (there is only one such tree) has a leaf vertex
- Every tree with three vertices may be smushed (contracted) to a tree with two vertices thus every tree with three vertices has a leaf vertex.
- Every tree with four vertices may be contracted to a tree with three vertices so every tree with four vertices has a leaf vertex
- (Induction is where we observe in formal terms that this pattern extends to all whole numbers)
- Every tree with \(n + 1\) vertices has a leaf vertex because (in the step before) every tree with \(n\) vertices has a leaf vertex (because every tree with \(n - 1\) vertices has a leaf vertex because every tree with \(n - 2\) vertices has a leaf vertex…)
Induction is revisited in Keller and Trotter’s book, chapter 3.
2.2.2 Counting Leaves
Actually, before we go any further we need another fact
Theorem 2.2 For any tree, there is always exactly one more vertex than edge. I.e. \(|V| = |E| + 1\).
Proof. Have a look at what we did in the previous section for induction.
- A tree with 1 vertex has 0 edges
- A tree with 2 vertices has 1 edge
- A tree with 3 vertices has 2 edges
- A tree with \(n + 1\) vertices may be smushed into a tree with \(n\) vertices. This contraction process removes 1 vertex and 1 edge so the property of “\(|V| = |E| + 1\)” is preserved.
So by induction all trees have \(|V| = |E| + 1\).
Where the last few sections were about induction, this one will show some ideas for enumeration (i.e. counting stuff). Going all the way back up to the Handshake Lemma Theorem 2.1 let us break up that formula a bit in a way that lets us count leaf vertices.
To that end, let \(n_d\) be the number of vertices of degree \(d\). So \(n_1\) is the number of vertices of degree 1 (leaf vertices).
A common trick in combinatorics is if we have a function like \(\deg\) and we want to count where \(\deg = 1\) then we also count where \(\deg = 2, \deg = 3, \dots\) and then later isolate \(\deg = 1\).
In this tree, we have \(n_1 = 6, n_2 = 2, n_3 = 2, n_4 = 1\). The degree sum is
\[\begin{align*} \sum \deg(v) &= 1 + 1 + 1 + 1 + 1 + 1 + 2 + 2 + 3 + 3 + 4 \\ &= 1 \cdot 6 + 2 \cdot 2 + 3 \cdot 2 + 4 \cdot 1 \\ &= 1n_1 + 2n_2 + 3n_3 + 4n_4 = 2|E| = 20 \end{align*}\]
So separating the degree sum into \(n_1, n_2, \dots\) we get \[ \sum_v \deg(v) = \sum_d d n_d = 2|E|. \] Spelled out: \[ n_1 + 2n_2 + 3n_3 + 4n_4 + \cdots = 2|E|. \]
Now let us look at a similar sum: \[ n_1 + n_2 + n_3 + n_4 + \cdots \] So that is the number of vertices of degree 1, degree 2, degree 3, … well that is every vertex so this sum is \(|V|\). Or, as we learned in Theorem 2.2, \(|V| = |E| + 1\).
So we have two identities: \[\begin{align*} n_1 + 2n_2 + 3n_3 + 4n_4 + \cdots &= 2|E| \\ n_1 + \phantom{2}n_2 + \phantom{3}n_3 + \phantom{4}n_4 + \cdots &= |E| + 1. \end{align*}\] Subtract twice the second from the first and we have \[ -n_1 + \phantom{n_2} + n_3 + 2n_4 + \dots = -2. \]
Almost done! Lastly move the \(-n_1\) to the right and the \(-2\) to the left and we get our main result for this section.
Theorem 2.3 In any tree (with at least two vertices), the number of leaves is \[ n_1 = 2 + n_3 + 2n_4 + 3n_5 + 4n_6 + \cdots \]
This tells us a few interesting things all at once!
Corollary 2.1 Every tree with at least two vertices has at least two leaves: \(n_1 \ge 2\).
Corollary 2.2 Moreover, for every vertex of degree 3 there is an additional leaf. For every vertex of degree 4 there are 2 additional leaves and so on.
We can look at this last result in another way: take a vertex (let’s say of degree 4) and follow the 4 edges until they lead to leaves. So this degree 4 vertex gives 4 leaves at least (more if there are other branches to follow).