Introduction to R: Lecture 8

Topics: Base R, Data Types, and Data Structures

Sabrina Nardin, Summer 2025

Agenda

  1. Define Base R
  2. R Data Types and Data Structures
  3. A Closer Look at Vectors
  4. Subset Data Structures with Base R (extra content, no quiz)

These slides were last updated on July 24, 2025

1. Define Base R

What Is Base R?

When people say “base R”, they mean the core features that ship with the R language itself and that predate the tidyverse (hence the name “base”).

These include:

  • Programming Tools: if-else statements, loops, functions
  • Data Types: numeric, integer, character, logical, factor
  • Data Structures: vectors, matrices, lists, data frames
  • Basic Operations: indexing/subsetting, arithmetic/logical comparisons
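
As a rough illustration of these building blocks, the sketch below uses only functions that ship with R. The names scores and describe_score() are invented for this example.

```r
# a numeric vector (data structure) holding double values (data type)
scores <- c(55, 72, 90)

# a user-defined function that contains an if-else statement
describe_score <- function(score) {
  if (score >= 60) {
    "pass"
  } else {
    "fail"
  }
}

# a for loop that applies the function to each element
results <- character(length(scores))
for (i in seq_along(scores)) {
  results[i] <- describe_score(scores[i])
}
results               # "fail" "pass" "pass"

# indexing combined with a logical comparison
scores[scores > 60]   # 72 90
```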

Base R vs. the Tidyverse

Base R

  • Developed in the early 1990s as part of R’s core language
  • Includes built-in functions such as mean() and length(), and key programming tools such as loops and if-else statements
  • Offers more direct control, but code can be longer or harder to read

 

Tidyverse

  • Formally introduced in 2016 as a set of user-friendly packages
  • Includes packages like ggplot2, dplyr, tidyr, forcats, stringr, etc.
  • Prioritizes readability and consistency, making code cleaner and easier to read

How Base R and the Tidyverse Work Together

  • The tidyverse is built on base R — it doesn’t replace it
  • You can combine base R and tidyverse functions in your code
  • You’ll often see both styles used in real-world scripts and examples


Tip

Use the tidyverse for most of your daily work, but understand enough base R to follow and troubleshoot R code!


2. R Data Types and Data Structures

R Data Types and Data Structures

Data Types and Data Structures are part of base R. They are the fundamentals — the “ABC” of how R stores and organizes data.

  • Data Type — what kind of value it is
  • Data Structure — how values are stored and organized

We look at each and at how they relate.

R Data Types

Data Type: describes the kind of value you’re working with.


R has the following main data types:

  • Numeric – numbers, with or without decimals

    • Double – numbers with decimals (the default): 3.14, 2.0
    • Integer – whole numbers, written with an L suffix: 2L, 100L
  • Character – text or strings (always in quotes): "hello"

  • Logical – TRUE or FALSE values: TRUE, FALSE

  • Factor – categorical values with levels: "low", "medium"
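
One way to check these types is with typeof() and class(), both part of base R. This is a minimal sketch; the object name size is just an example.

```r
# typeof() reports how R stores a value
typeof(3.14)     # "double"
typeof(2L)       # "integer"
typeof("hello")  # "character"
typeof(TRUE)     # "logical"

# numbers typed without the L suffix are doubles, even when they look whole
typeof(2)        # "double"

# a factor is stored as integer codes plus a set of levels
size <- factor(c("low", "medium", "low"))
class(size)      # "factor"
typeof(size)     # "integer"
levels(size)     # "low" "medium"
```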


R Data Structures

Data Structure: like a container that holds one or more values.


R has the following main data structures:

  • Vector – one-dimensional, all elements same type
    v <- c(1, 2, 3)

  • Matrix – two-dimensional, all elements same type
    m <- matrix(1:6, nrow = 2)

  • Dataframe – two-dimensional, columns can have different types
    df <- data.frame(name = c("A", "B"), age = c(25, 30))

  • List – flexible container, holds elements of any type, even mixed (including other structures)
    l <- list(num = 1:3, name = c("Sabrina", "Laura"))

How Data Types and Data Structures Relate

Every single element in R has a data type. That element is stored inside a larger data structure.

Examples:

x <- c(1, 2, 3)      # a vector (structure) of three elements, all numeric values (type) 
y <- list("a", 5)    # a list (structure) of two elements, one character value and one numeric value (type)


The data structure you use affects what data types you can store:

Not all data structures support every combination of data types. Some allow only one type, while others can store many types together.


How Data Types and Data Structures Relate

Structure | Dimensions | Allow Mixed Types? | Notes
Vector    | 1D         | No                 | All elements must be the same data type
Matrix    | 2D         | No                 | All elements must be the same data type
Dataframe | 2D         | Yes (by column)    | Each column is a vector; data types can vary by column
List      | Flexible   | Yes                | Each element can be anything: numbers, text, vectors, even data frames
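
The rules in the table can be seen in a short sketch. The object names mixed_v, mixed_l, and df_types are invented for this example, and stringsAsFactors = FALSE is written out so the result is the same across R versions.

```r
# mixing types in a vector: R converts everything to one common type
mixed_v <- c("hello", 1, 2)
class(mixed_v)            # "character"
mixed_v                   # "hello" "1" "2"

# logical and numeric values together become numeric (TRUE is 1, FALSE is 0)
c(TRUE, FALSE, 3)         # 1 0 3

# a list keeps each element's own type
mixed_l <- list("hello", 1, TRUE)
sapply(mixed_l, class)    # "character" "numeric" "logical"

# a data frame keeps one type per column
df_types <- data.frame(name = c("A", "B"), age = c(25, 30),
                       stringsAsFactors = FALSE)
sapply(df_types, class)   # name: "character", age: "numeric"
```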

💻 Practice

Run the code below to create different R objects. Then use class() and str() (or glimpse()) on each object to explore its data structure.

# tibble() and glimpse() come from the tidyverse
library(tidyverse)

# vector
num_v <- c(1:9)
char_v <- c("hello", "ciao", "hey")
f_v <- factor(c("low", "medium", "high"))
test <- c("hello", 1, 2)   # mixed input: check which type R picks

# matrix
num_m <- matrix(1:15, nrow = 3, ncol = 5, byrow = TRUE)
char_m <- matrix(c("a", "b", "c", "d"), nrow = 2, byrow = TRUE)

# dataframe or tibble
df <- data.frame(
  id = 1:3,
  name = c("Dave", "Ashley", "Rik"),
  age = c(15, 17, 20)) 

# list
l <- list(
  num_v = c(1:3),
  m = matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE),
  another_num_v = c(1,2,4),
  char_v = c("Sabrina", "Zach"),
  d = tibble(var_1 = c(1:4),
              var_2 = c(2:5)))

Why This Matters: Understanding When to Use Each Data Structure

Data Frames first!

So far, we’ve been working mostly with data frames — technically with tibbles, the tidyverse version of them. Data frames are the go-to structure for working with real-world, tabular data in R. Keep using them for most data analysis tasks in this course and beyond!

Strengths of data frames:

  • Can store columns of different data types (e.g., character, numeric, logical)
  • Work easily with dplyr, ggplot2, and other tidyverse packages
  • Make your data easier to manipulate, filter, summarize, and visualize

Why This Matters: Understanding When to Use Each Data Structure

Beyond Data Frames: Know Your Tools

But it’s helpful to understand the full set of R data structures, because each has its own strengths:

  • Vectors: The most basic structure in R; all elements must be the same type.
    Example: you are analyzing survey responses and want to store all participants’ ages in a separate vector for further manipulation.

  • Matrices: Two-dimensional, all elements must be the same type.
    Example: you have numeric data that you want to analyze using matrix algebra or linear models.

  • Lists: Flexible containers that can hold anything, even other lists.
    Example: you are analyzing data and want to save multiple outputs (model results, plots, etc.) in the same data structure.

  • Data Frames: Two-dimensional, columns can have different types.
    Example: you are analyzing survey responses with dplyr, ggplot2, or other tidyverse packages.
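
The list use case can be sketched with cars, a small data set that ships with R. The names fit and analysis, and the choice of what to store, are assumptions made for this example.

```r
# cars has 50 rows: speed and stopping distance
fit <- lm(dist ~ speed, data = cars)

# collect several different kinds of output in one list
analysis <- list(
  model    = fit,            # an lm object
  coefs    = coef(fit),      # a named numeric vector
  n_obs    = nrow(cars),     # a single number
  top_rows = head(cars, 3)   # a data frame
)

# each element keeps its own structure
sapply(analysis, function(el) class(el)[1])
analysis$n_obs    # 50
analysis$coefs    # intercept and slope of the fitted line
```

A vector or matrix could not hold these four objects together, because they are of different types and shapes.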

3. A Closer Look at Vectors

Example of filtering rows in a dataframe using logical vectors

# load libraries and data
library(tidyverse)
library(palmerpenguins)
data(penguins)

# use dplyr filter() to get penguins where body mass is greater than 4000 grams
filtered_penguins <- penguins %>% filter(body_mass_g > 4000)
head(filtered_penguins)

Example of filtering rows in a dataframe using logical vectors

Let’s analyze the previous code to see what’s happening under the hood: R defines a logical vector and applies it to the penguins dataframe:

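# use base R to compare the column to 4000: the result is a logical vector
# (TRUE or FALSE for each penguin, NA where body mass is missing)
filter_vector <- penguins$body_mass_g > 4000

# check its structure
is.vector(filter_vector)
class(filter_vector)

# use this vector to manually filter the dataframe using base R
# df[rows, columns]: rows to keep, all columns
# which() keeps only the TRUE positions, so NA rows are dropped as filter() does
filtered_p <- penguins[which(filter_vector), ]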
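
One detail worth checking when filtering with a logical vector is how missing values behave: dplyr's filter() drops rows where the condition is NA, while plain bracket indexing does not. The sketch below uses a made-up data frame called toy (not the penguins data) to show the difference.

```r
# toy data with one missing body mass (values are invented)
toy <- data.frame(
  id = 1:4,
  body_mass_g = c(3500, 4200, NA, 4800)
)

keep <- toy$body_mass_g > 4000
keep                      # FALSE TRUE NA TRUE

# plain logical indexing turns the NA into an extra row of NAs
nrow(toy[keep, ])         # 3

# which() returns only the positions that are TRUE
which(keep)               # 2 4
nrow(toy[which(keep), ])  # 2
```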

A particular vector: Scalar

In math, a scalar is defined as a single real number. In R, a scalar is simply a vector of length one.

Let’s try this code:

# set up a vector x of length 10
x <- sample(10)
x

# add 100 to x using the long way
x + c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100)

# add 100 to x using the "R" way: vector recycling!
x + 100
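
A quick way to confirm that a single number is a length-one vector, as a minimal sketch (the names a and v10 are invented, and v10 uses fixed values instead of sample() so the result is reproducible):

```r
# a single number is already a vector of length one
a <- 100
is.vector(a)   # TRUE
length(a)      # 1

# so "v10 + 100" is vector + vector, with 100 repeated to the length of v10
v10 <- 1:10
identical(v10 + 100, v10 + rep(100, length(v10)))   # TRUE
```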

Vector Recycling in R

Let’s see another example. We define two numeric vectors x1 and x2:

# x1 is a sequence of numbers from 1 to 2
x1 <- seq(from = 1, to = 2)

# x2 is a sequence of numbers from 1 to 10
x2 <- seq(from = 1, to = 10)

# what happens if we add them?
x1 + x2

What happens? The shorter vector x1 is duplicated five times in order to match the length of the longer vector x2. The same behavior happens for other operations like subtraction, multiplication, logical comparison, etc.

Warning

If the length of the longer vector is not a multiple of the length of the shorter one, R still returns a result but prints a warning message!
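
To see the warning case, here is a small sketch with vectors of length 3 and 10. The names short, long, res, and msg are invented, and the message text assumes an English R session.

```r
short <- c(1, 2, 3)   # length 3
long  <- 1:10         # length 10, which is not a multiple of 3

# R still computes a result: short is reused as 1 2 3 1 2 3 1 2 3 1
res <- suppressWarnings(short + long)
res   # 2 4 6 5 7 9 8 10 12 11

# capture the warning text instead of the result
msg <- tryCatch(short + long, warning = function(w) conditionMessage(w))
msg   # "longer object length is not a multiple of shorter object length"
```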

Vector Recycling in R

Note

This behavior is called Vector Recycling and happens automatically in R: When two vectors of different lengths are used in an operation, R repeats (recycles) the elements of the shorter vector to match the longer vector!

Why It Matters: Vector recycling can cause unexpected results. Check if recycling is what you want R to do — if not, adjust the vector lengths manually, like this:

x1 <- c(1, 2, rep(0, 8))   # 2 + 8 = 10 elements, the same length as x2

x1 + x2
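
If you would rather have R stop than recycle silently, one option is to check the lengths first. The helper add_same_length() below is hypothetical (it is not part of base R or the tidyverse), and allowing length-one inputs is a design choice made for this sketch.

```r
# hypothetical helper: add two vectors only when no recycling is needed
add_same_length <- function(a, b) {
  # a length-one value is allowed, since recycling a scalar is usually intended
  if (length(a) != length(b) && length(a) != 1 && length(b) != 1) {
    stop("lengths differ: ", length(a), " vs ", length(b))
  }
  a + b
}

add_same_length(1:3, c(10, 20, 30))   # 11 22 33
add_same_length(1:3, 100)             # 101 102 103

# lengths 2 and 10 would be recycled without a warning by +; here it is an error
outcome <- tryCatch(add_same_length(1:2, 1:10),
                    error = function(e) conditionMessage(e))
outcome   # "lengths differ: 2 vs 10"
```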

4. Subset Data Structures with Base R

Subsetting Data Structures in R

There are different ways to access parts of data structures in R.

We’ll cover:

  • Vectors
  • Matrices
  • Data Frames
  • Lists

Each structure uses square brackets [ ], but how you use them depends on the structure.

Subsetting Vectors

v <- c(10, 20, 30, 40)

v[2]         # 20 (2nd element)
v[c(1, 3)]   # 10 and 30
v[-1]        # all but the first
v[v > 25]    # elements greater than 25

Vectors are 1D: you just specify positions or logical tests inside [ ].
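
A logical test inside [ ] works because the test itself produces a logical vector. The sketch below spells that out for the same vector v.

```r
v <- c(10, 20, 30, 40)

# the test inside the brackets is itself a logical vector
v > 25                            # FALSE FALSE TRUE TRUE

# so these two lines select the same elements
v[v > 25]                         # 30 40
v[c(FALSE, FALSE, TRUE, TRUE)]    # 30 40

# which() converts the logical vector to positions
which(v > 25)                     # 3 4
v[which(v > 25)]                  # 30 40
```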

Subsetting Matrices

m <- matrix(1:9, nrow = 3, byrow = TRUE)

m[1, 2]           # value at row 1, column 2
m[ , 3]           # entire 3rd column
m[2,  ]           # entire 2nd row
m[1:2, 2:3]       # top-right submatrix (rows 1–2, cols 2–3)
m[ , c(1, 3)]     # columns 1 and 3

Matrices are 2D: use [row, column]
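
One thing to be aware of, shown here as a short sketch: selecting a single row or column returns a plain vector by default. The drop = FALSE argument of [ keeps the result as a matrix. The names col3 and col3_m are invented.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# selecting a single column returns a plain vector by default
col3 <- m[ , 3]
is.matrix(col3)       # FALSE
col3                  # 3 6 9

# drop = FALSE keeps the result as a matrix (3 rows, 1 column)
col3_m <- m[ , 3, drop = FALSE]
is.matrix(col3_m)     # TRUE
dim(col3_m)           # 3 1
```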

Subsetting Data Frames

df <- data.frame(
  name = c("Alex", "Betty", "Chad", "J"),
  age = c(20, 21, 22, 23))

df[1, ]            # first row (all columns)
df[ , 1]           # first column by index
df[ , "name"]      # first column by label
df$name            # same as above
df[["name"]]       # same as above

df[1, 2]           # single value at row 1, column 2
df[2:3, "name"]    # values in rows 2 and 3 of column name

df[df$age > 20, ]            # filter rows where age > 20
df[df$name == "J", ]         # filter rows where name is "J"
df[df$age == max(df$age), ]  # filter rows with highest age

Data frames combine list and matrix behavior.
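
The sketch below shows what "list and matrix behavior" means in practice, using the same df as above.

```r
df <- data.frame(
  name = c("Alex", "Betty", "Chad", "J"),
  age = c(20, 21, 22, 23))

# list-like behavior: a data frame is a list of equal-length columns
is.list(df)          # TRUE
length(df)           # 2 (the number of columns)
class(df["age"])     # "data.frame" (single brackets keep the structure)
class(df[["age"]])   # "numeric"    (double brackets extract the column)

# matrix-like behavior: two dimensions and [row, column] indexing
dim(df)              # 4 2
df[2, "age"]         # 21
```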

Subsetting Lists

my_list <- list(
  a = c(1, 2, 3),
  b = c("hello", "ciao"),
  c = TRUE
)

my_list[1]         # list with one element
my_list[[1]]       # contents of first element: 1 2 3
my_list[[1]][1]    # first value of that vector: 1
my_list["b"]       # list with one element named "b"
my_list$b          # same as my_list[["b"]], only works with names
my_list[["c"]]     # returns TRUE

  • Use [ ] to extract a sublist
  • Use [[ ]] or $ to extract actual elements
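
The difference between [ ] and [[ ]] is easiest to see by checking the class of each result. This is a minimal sketch on the same my_list.

```r
my_list <- list(
  a = c(1, 2, 3),
  b = c("hello", "ciao"),
  c = TRUE
)

class(my_list[1])     # "list"    (still a list, of length one)
class(my_list[[1]])   # "numeric" (the vector stored inside)

# arithmetic works on the extracted vector, not on the sublist
sum(my_list[[1]])     # 6
# sum(my_list[1]) would raise an error, because the input is a list

# single brackets can pull several elements at once
names(my_list[c("a", "c")])   # "a" "c"
```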
