Introduction to Statistics and Data

Sera Gunn

2026-01-28

Examples of Statistics

  • The average half-life of caffeine in adults is 5 hours
  • Approximately 8% of the US population lacked health insurance in 2023
  • The average person in the US spends 63 minutes a day eating
  • Individuals with autism spectrum disorder (ASD) are 2.5 times more likely to be left-handed than those without ASD

Definitions

  • Data: a collection of observations and measurements about members of a group (e.g. their age, height, favourite hot beverage, …)
  • Population: everyone in a group (e.g. all college students)
  • Parameter: a numerical description of some characteristic of the population (e.g. average time spent sleeping)
  • Sample: the subgroup of the population whose data we collect (e.g. 250 college students)
  • Statistic: a numerical description of some characteristic of the sample (e.g. average time spent sleeping by those 250 students)

The practice of statistics

  • Statistics (the practice) comes in two flavours
    1. Descriptive statistics (collecting, organizing, presenting data)
    2. Inferential statistics (using sample statistics to estimate population parameters and make predictions)

Example

Planned Parenthood wants to know what percent of their patients are on Medicaid. They look through their patient database and find 49% of patients are on Medicaid.

  • This is a census (meaning every patient’s data was collected)
  • Here the sample is the whole population, so this statistic is also a parameter

Example 2

Researchers want to know about sleep habits of college students. They question 600 students across several colleges and find that students are getting an average of 6 to 7 hours of sleep.

  • Population: college students
  • Sample: these 600 students
  • Statistic: 6 to 7 hours average sleep
  • Parameter: unknown (the average sleep of all college students, which the statistic estimates)
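To see the difference between a parameter and a statistic, here is a small Python simulation. The population of 10,000 sleep times is made up (drawn from a normal distribution with mean 6.5 hours and standard deviation 1 hour, an assumption for illustration only), so unlike in the real study we can compute the parameter and compare it with the statistic from a sample of 600.

```python
import random
import statistics

# Hypothetical population: nightly sleep (hours) for 10,000 made-up students.
# These numbers are simulated for illustration, not real survey data.
rng = random.Random(113)
population = [rng.gauss(6.5, 1.0) for _ in range(10_000)]

# Parameter: computed from everyone in the population.
# In real life we usually can't compute this; the simulation lets us peek.
parameter = statistics.mean(population)

# Statistic: computed from a sample of 600 students drawn from the population.
sample = rng.sample(population, 600)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f} hours")
print(f"statistic (sample mean):     {statistic:.2f} hours")
```

The statistic lands close to the parameter but not exactly on it; inferential statistics is about how close we can expect it to be.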

Random samples

A sample is called a random sample if every member of the population has an equal chance of being selected.

  • E.g. put everyone’s name in a hat and randomly pick 5 names
  • E.g. start at a randomly chosen person among the first 10 in a queue, then ask every 10th person after that
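The every-10th-person idea only gives everyone the same 1-in-10 chance if the starting point is random. Here is a minimal Python sketch, assuming we pick the starting position at random among the first 10 people; the function name every_kth and the queue of 100 people are made up for illustration.

```python
import random

def every_kth(queue, k, rng=random):
    """Systematic sample: every k-th person, starting at a random position among the first k."""
    start = rng.randrange(k)  # random starting index 0, 1, ..., k-1
    return queue[start::k]

# Hypothetical queue of 100 people.
queue = [f"person {i}" for i in range(1, 101)]
print(every_kth(queue, 10, random.Random(4)))

# Each person is picked for exactly one of the 10 equally likely starting points,
# so everyone has the same 1-in-10 chance of being selected.
chosen_for = {p: sum(p in queue[s::10] for s in range(10)) for p in queue}
print(set(chosen_for.values()))  # {1}
```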

Not random samples

“Random sample” specifically means every member of the population has the same probability of being selected

  • E.g. if we select 2 random MTH 113 sections and 2 random ENG 151 sections and survey their students
    • maybe there are more ENG 151 sections (so each ENG 151 student is less likely to be picked)
    • maybe some people are in both courses (so they are more likely to be picked)
  • It’s still random but it’s not a “random sample” per this definition
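We can work out the actual chances for the section example. This sketch assumes hypothetical numbers that are not in the notes: MTH 113 has 4 sections, ENG 151 has 8, we pick 2 sections of each course independently, and we survey everyone in the chosen sections.

```python
from fractions import Fraction

# Hypothetical course sizes (not from the notes): MTH 113 has 4 sections,
# ENG 151 has 8 sections, and we pick 2 sections of each at random.
MTH_SECTIONS, ENG_SECTIONS, PICKED = 4, 8, 2

# A student's own section is chosen with probability (picked / total sections).
p_mth = Fraction(PICKED, MTH_SECTIONS)
p_eng = Fraction(PICKED, ENG_SECTIONS)

# A student in both courses is chosen unless BOTH of their sections are missed.
# (The two courses' sections are picked independently.)
p_both = 1 - (1 - p_mth) * (1 - p_eng)

print("MTH 113 only:", p_mth)   # 1/2
print("ENG 151 only:", p_eng)   # 1/4
print("both courses:", p_both)  # 5/8
```

Three different probabilities for three kinds of students, so this is random but not a random sample.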

Simple random sample

The simplest way to sample people randomly is to just put all the names in a hat and pick.
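In Python, the standard library's random.sample draws without replacement, which matches the names-in-a-hat procedure. The sketch below (simple_random_sample is a hypothetical helper name) draws pairs from a tiny class of five students and counts how often each pair comes up; with 10 possible pairs, each should appear about 1/10 of the time.

```python
import random
from collections import Counter

def simple_random_sample(names, k, rng=random):
    """Names in a hat: every group of k names is equally likely to come out."""
    return rng.sample(names, k)

# Quick check with a tiny hypothetical class of 5 students:
# there are 10 possible pairs, and each should come up about 1/10 of the time.
names = ["Alex", "Ignacio", "Jinu", "Paris", "Zahra"]
rng = random.Random(0)
counts = Counter(frozenset(simple_random_sample(names, 2, rng)) for _ in range(100_000))

for pair, n in sorted(counts.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), n)
```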

A simple random sample (SRS) is always a random sample, but a random sample is not necessarily a simple random sample.

If we do anything else, it may be random but it isn’t “simple random sampling”

  • E.g. if we group people by class section it’s not SRS
  • E.g. if we select student IDs ending in 4, or every 10th person in a queue, it’s not SRS
  • E.g. if we sample 10 kids and 10 adults it’s not SRS

Technical definition

A simple random sample is a random sample where every group of the same size has the same chance of being selected.

E.g. if we want 10 kids and 10 adults then some groups of 20 will never be selected (e.g. 20 kids)
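Counting groups makes the definition concrete. Assuming a hypothetical population of 50 kids and 50 adults, this Python sketch uses math.comb to count how many groups of 20 each method can produce. Under the 10 kids + 10 adults plan every person still has a 10-in-50 chance, so it is a random sample, but groups like "20 kids" can never come out, so it is not simple random sampling.

```python
from math import comb

# Hypothetical population: 50 kids and 50 adults (100 people total).
KIDS, ADULTS, SIZE = 50, 50, 20

srs_groups = comb(KIDS + ADULTS, SIZE)                 # any 20 of the 100 people
stratified_groups = comb(KIDS, 10) * comb(ADULTS, 10)  # exactly 10 kids + 10 adults
all_kid_groups = comb(KIDS, SIZE)                      # groups made up of 20 kids

print(f"groups SRS can produce:           {srs_groups:,}")
print(f"groups 10 kids + 10 adults gives: {stratified_groups:,}")
print(f"all-kid groups (never picked by 10 + 10): {all_kid_groups:,}")

# Each individual's chance is still equal under the 10 + 10 plan.
print("chance per kid:", 10 / KIDS, " chance per adult:", 10 / ADULTS)
```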

Once again

Literally anything other than “everyone’s name goes in a hat, 20 names come out” is not simple random sampling.

Example

Here are some ways to select 5 students

  • Select the first 5 alphabetically
    • not random
  • Select the first 5 by where they sit
    • not random
  • Select a random column/row (assume students are seated in groups of 5)
    • random sample but not simple random
  • Number the attendance sheet and pick 5 students by taking random numbers
    • simple random sample
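The random row option can be checked exhaustively. This sketch assumes a hypothetical class of 30 students seated in 6 rows of 5: both the random-row method and a simple random sample give every student a 1-in-6 chance, but the random-row method can only ever produce 6 different groups.

```python
from fractions import Fraction
from math import comb

# Hypothetical class: 30 students numbered 0-29, seated in 6 rows of 5.
students = list(range(30))
rows = [students[i:i + 5] for i in range(0, 30, 5)]

# Random-row method: pick one of the 6 rows, each with probability 1/6.
# Chance that a given student ends up in the sample:
p_row = {s: sum(Fraction(1, len(rows)) for row in rows if s in row) for s in students}

# Simple random sample of 5: each student is in comb(29, 4) of the comb(30, 5) groups.
p_srs = Fraction(comb(29, 4), comb(30, 5))

print("random row: every student has chance", set(p_row.values()))  # {Fraction(1, 6)}
print("SRS:        every student has chance", p_srs)                # 1/6
print("possible groups:", len(rows), "vs", comb(30, 5))
```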

Types of Data

Name     Age  Gender     Caffeinated beverages per day  Hours slept last week
Alex     30   Woman      2                              50
Ignacio  23   Man        3                              52
Jinu     27   Man        1                              54
Paris    26   Nonbinary  1                              60
Zahra    19   Woman      0                              42

Major categorizations

  • Quantitative or Qualitative
    • quantities are numbers, qualities are descriptions
    • however some numbers like student IDs are descriptive (i.e. qualitative)
  • Quantities can be ranked and compared (bigger, smaller)
  • Quantities can be discrete (countable values, such as whole numbers) or continuous (any value in a range, including decimals)
    • e.g. how many kids a person has is a whole number (discrete data)
    • e.g. temperature is continuous data because it can take any value in a range (even if we round it to a whole number)

Levels of measurement

What can we do with measurements/data?

  • If the data is just a label that we can’t order or compare (bigger/smaller): Nominal measurement
    • e.g. student ID
  • If we can order/rank the data from least to greatest but the exact difference between ranks isn’t meaningful: Ordinal measurement
    • e.g. grades (A, B, C, D, F), hotel ratings, pain scale
  • If the differences are meaningful but it doesn’t make sense to say “twice as much” (zero doesn’t mean “none”): Interval measurement
    • e.g. date, temperatures in Celsius/Fahrenheit
  • If differences are meaningful and we can say “twice as much” (zero means “none”): Ratio measurement
    • e.g. time for an activity, prices, weights, lengths, age
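The Celsius example can be checked with a little arithmetic. This Python sketch (the conversion helpers c_to_f and c_to_k are written here for illustration) shows that 20°C is "twice" 10°C only in Celsius: the ratio changes when we switch to Fahrenheit or Kelvin, while differences stay consistent. Kelvin starts at absolute zero, so it behaves like a ratio scale. Weight, a ratio measurement, keeps its "twice as much" in any unit.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert degrees Celsius to kelvins (0 K is absolute zero)."""
    return c + 273.15

warm, cool = 20.0, 10.0  # degrees Celsius

# "Twice as much" depends on the unit: ratios aren't meaningful on an interval scale.
print("ratio in Celsius:   ", warm / cool)                          # 2.0
print("ratio in Fahrenheit:", c_to_f(warm) / c_to_f(cool))          # 68/50 = 1.36
print("ratio in Kelvin:    ", round(c_to_k(warm) / c_to_k(cool), 3))

# Differences are consistent across units, which is what makes it interval data.
print("difference in F:", c_to_f(warm) - c_to_f(cool))              # 18.0

# Weight is ratio data: 20 kg vs 10 kg is "twice as heavy" in pounds too.
KG_TO_LB = 2.20462
print("weight ratio in pounds:", (20 * KG_TO_LB) / (10 * KG_TO_LB))
```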

Examples

  • Age
    • quantity, continuous, ratio
  • Gender
    • quality, nominal
  • Amount of caffeinated beverages
    • quantity, discrete, ratio
  • Hours slept
    • quantity, continuous, ratio

Things to consider when making a survey

  • How the question is asked
    • Do you think the United States should forbid public speeches against democracy? (21% said yes)
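    • Do you think the United States should allow public speeches against democracy? (48% said no)
    • Both questions ask the same thing, but more than twice as many people said no to “allow” as said yes to “forbid”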
  • Sample size
  • Is the sample representative?
  • Correlation vs Causation
  • Conflicts of Interest