4 Boxplots and Histograms
To create a basic box-and-whisker plot, we use the boxplot() command. As usual, we define our data in a variable and then pass that variable to the boxplot function.
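The data set used for this step is not shown, so the following is only a minimal sketch with made-up values. The variable name scores and its contents are assumptions chosen for illustration.

```r
# Hypothetical data: any numeric vector works here.
scores <- c(52, 61, 64, 68, 70, 73, 75, 79, 84, 91)

# Draw a basic box-and-whisker plot of the data.
boxplot(scores)
```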
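Remember that the lines of the boxplot correspond to the five-number summary (minimum, 25% mark, 50% mark, 75% mark, maximum). By default, R draws any point lying more than 1.5 times the box length beyond the box as a separate outlier, in which case the whisker ends at the most extreme non-outlying point rather than at the minimum or maximum. A related and quite useful function is summary(), which prints these values as numbers for us, along with the mean.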
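Compare the output of this command to where the lines of the boxplot are positioned. The quartiles reported by summary() can differ slightly from the edges of the box, because the two functions compute them with slightly different rules.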
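As a rough sketch of that comparison, the code below prints the summary and also keeps the numbers that boxplot() itself used. The data are the same made-up scores vector as before (an assumption, not the original data set). The boxplot() function returns its statistics invisibly, so they can be stored and inspected.

```r
# Hypothetical data, for illustration only.
scores <- c(52, 61, 64, 68, 70, 73, 75, 79, 84, 91)

# Numeric summary: minimum, quartiles, median, mean, maximum.
s <- summary(scores)
print(s)

# boxplot() draws the plot and invisibly returns the values it plotted.
b <- boxplot(scores)

# Rows of b$stats: lower whisker, lower edge of box, median,
# upper edge of box, upper whisker.
print(b$stats)
```

Setting the two printouts side by side shows which values agree exactly (here the median and the extremes) and where the box edges and the reported quartiles part ways.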
4.1 Histograms
Histograms are created with the hist() function, which takes the data vector as its first argument.
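A minimal sketch of a default histogram, again using a made-up scores vector (an assumption for illustration, not the original data):

```r
# Hypothetical data, for illustration only.
scores <- c(52, 61, 64, 68, 70, 73, 75, 79, 84, 91)

# Draw a histogram, letting R choose the bins.
# The result is stored so the chosen bins can be inspected afterwards.
h <- hist(scores)

# The bin boundaries R picked and the number of points in each bin.
print(h$breaks)
print(h$counts)
```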
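Unlike boxplots, we usually want to configure our histogram, i.e. set the number of bins and/or declare where each bin begins and ends. We can do this by adding the breaks=... argument to our histogram function. The two most common choices are a single number, which sets the number of bins, and a vector of values, which sets the boundaries of those bins. Note that R treats a single number only as a suggestion and may choose a nearby bin count that gives round boundaries, whereas a vector of boundaries is used exactly and must cover the whole range of the data.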
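Both kinds of breaks can be sketched as follows. The data and the particular boundaries (50 to 100 in steps of 10) are assumptions chosen so that the boundaries cover the made-up data.

```r
# Hypothetical data, for illustration only.
scores <- c(52, 61, 64, 68, 70, 73, 75, 79, 84, 91)

# A single number suggests how many bins to use;
# R may adjust it to get round boundaries.
h_count <- hist(scores, breaks = 5)

# A vector sets the bin boundaries exactly.
h_exact <- hist(scores, breaks = c(50, 60, 70, 80, 90, 100))

# seq() is a shorter way to write evenly spaced boundaries.
h_seq <- hist(scores, breaks = seq(50, 100, by = 10))

# Number of data points that landed in each bin.
print(h_exact$counts)
```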
There is one more option we need to discuss: right=FALSE. By default, bins include their right endpoint, so a bin that goes from \(10\) to \(20\) includes every data point equal to \(20\). Adding right=FALSE moves all the \(20\) data points into the next bin over (bins then include their left endpoint but not their right endpoint). The difference shows up when two histograms of the same data are drawn, one with the default setting and one with right=FALSE.
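The data set behind this comparison is not shown, so the sketch below uses a stand-in: a made-up vector x containing six values equal to 20, with assumed bin boundaries at 0, 10, 20 and 30. Both the data and the boundaries are assumptions for illustration.

```r
# Hypothetical data with exactly six values equal to 20.
x <- c(3, 7, 12, 15, 18, rep(20, 6), 22, 25, 28)

# Assumed bin boundaries: three bins, 0-10, 10-20, 20-30.
edges <- c(0, 10, 20, 30)

# First histogram: default, bins include their right endpoint.
h_right <- hist(x, breaks = edges)

# Second histogram: bins include their left endpoint instead.
h_left <- hist(x, breaks = edges, right = FALSE)

# Counts per bin for each version.
print(h_right$counts)
print(h_left$counts)
```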
There are \(6\) data points equal to \(20\). In the first histogram those \(6\) count toward the middle bin; in the second histogram they count toward the right bin.