Introduction to R: Lecture 8

Topics: Base R, Data Types, and Data Structures

Sabrina Nardin, Summer 2025

Agenda

Define Base R
R Data Types and Data Structures
A Closer Look at Vectors
Subset Data Structures with Base R (extra content, no quiz)

These slides were last updated on July 24, 2025

1. Define Base R

What Is Base R?

When people say “base R”, they are referring to the core or basic features of the R language, developed before the tidyverse (hence the name as base R).

These include:

Programming Tools: if-else statements, loops, functions
Data Types: numeric, integer, character, logical, factor
Data Structures: vectors, matrices, lists, data frames
Basic Operations: indexing/subsetting, arithmetic/logical comparisons

Base R vs. the Tidyverse

Base R

Developed in the early 1990s as part of R’s core language
Includes built-in functions like mean(), length(), etc. and key programming tools like loops, etc.
Offers more direct control, but code can be longer or harder to read

Tidyverse

Formally introduced after 2016 as a set of user-friendly packages
Includes packages like ggplot2, dplyr, tidyr, forcats, tidyr, stringr, etc.
Prioritizes readability and consistency, making code cleaner and easier to read

How Base R and the Tidyverse Work Together

The tidyverse is built on base R — it doesn’t replace it
You can combine base R and tidyverse functions in your code
You’ll often see both styles used in real-world scripts and examples

Tip

Use the tidyverse for most of your daily work, but understand enough base R to follow and troubleshoot R code!

2. R Data Types and Data Structures

R Data Types and Data Structures

Data Types and Data Structures are part of base R. They are the fundamentals — the “ABC” of how R stores and organizes data.

Data Type — what kind of value it is
Data Structure — how values are stored and organized

We look at each and at how they relate.

R Data Types

Data Type: describes the kind of value you’re working with.

R has the following main data types (for more see here):

Numeric – numbers, either with or without decimals

Double with decimals (default): 3.14, 2.0

Integer whole numbers: 2L, 100L

Character – text or strings (always in quotes): "hello"

Logical – TRUE or FALSE values: TRUE, FALSE

Factor – categorical values with levels: "low", "medium"

R Data Structures

Data Structure: like a container that holds one or more values.

R has the following main data structures (for more see here):

Vector – one-dimensional, all elements same type
v <- c(1, 2, 3)

Matrix – two-dimensional, all elements same type
m <- matrix(1:6, nrow = 2)

Dataframe – two-dimensional, columns can have different types
df <- data.frame(name = c("A", "B"), age = c(25, 30))

List – multi-dimensional, holds elements of any type, even mixed
l <- list(num = 1:3, name = c("Sabrina", "Laura"))

How Data Types and Data Structures Relate

Every single element in R has a data type. That element is stored inside a larger data structure.

Examples:

x <- c(1, 2, 3)      # a vector (structure) of three elements, all numeric values (type) 
y <- list("a", 5)    # a list (structure) of two elements, one character value and one numeric value (type)

The data structure you use affects what data types you can store:

Not all data structures support every combination of data types. Some allow only one type, while others can store many types together.

How Data Types and Data Structures Relate

Structure	Dimensions	Allow Mixed Types?	Notes
Vector	1D	No	All elements must be the same data type
Matrix	2D	No	All elements must be the same data type
Dataframe	2D	Yes (by column)	Each column is a vector; data types can vary by column
List	Flexible	Yes	Each element can be anything: numbers, text, vectors, even data frames

💻 Practice

Run the code below to create different R objects. Then use class() + str() or glimpse() on each object to explore their data structure.

# vector
num_v <- c(1:9)
char_v <- c("hello", "ciao", "hey")
f_v <- factor(c("low", "medium", "high"))
test <- c("hello", 1, 2)

# matrix
num_m <- matrix(1:15, nrow = 3, ncol = 5, byrow = TRUE)
char_m <- matrix(c("a", "b", "c", "d"), nrow = 2, byrow = TRUE)

# dataframe or tibble
df <- data.frame(
  id = 1:3,
  name = c("Dave", "Ashley", "Rik"),
  age = c(15, 17, 20)) 

# list
l <- list(
  num_v = c(1:3),
  m = matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE),
  another_num_v = c(1,2,4),
  char_v = c("Sabrina", "Zach"),
  d = tibble(var_1 = c(1:4),
              var_2 = c(2:5)))

Why This Matters: Understanding When to Use Each Data Structure

Data Frames first!

So far, we’ve been working mostly with data frames — technically with tibbles, the tidyverse version of them. Data frames are the go-to structure for working with real-world, tabular data in R. Keep using them for most data analysis tasks in this course and beyond!

Strengths of data frames:

Can store columns of different data types (e.g., character, numeric, logical)
Work easily with dplyr, ggplot2, and other tidyverse packages
Make your data easier to manipulate, filter, summarize, and visualize

Why This Matters: Understanding When to Use Each Data Structure

Beyond Data Frames: Know Your Tools

But it’s helpful to understand the full set of R data structures, because each has its own strengths:

Vectors: The most basic structure in R; all elements must be the same type.
You are analyzing survey responses and want to store all participants’ ages in a separate vector for further manipulation.
Matrices: Two-dimensional, all elements must be the same type.
You have numeric data that you want to analyze using matrix algebra or linear models.
Lists: Flexible containers that can hold anything — even other lists.
You are analyzing data and want to save multiple outputs (model results, plot, etc.) all in the same data structure.
Data Frames: Two-dimensional, columns can have different types.
You are analyzing survey responses with dplyr, ggplot, or other tidyverse packages.

3. A Closer Look at Vectors

Example of filtering rows in a dataframe using logical vectors

# load libraries and data
library(tidyverse)
library(palmerpenguins)
data(penguins)

# use dplyr filter() to get penguins where body mass is greater than 4000 grams
filtered_penguins <- penguins %>% filter(body_mass_g > 4000)
head(filtered_penguins)

Example of filtering rows in a dataframe using logical vectors

Let’s analyze the previous code to see what’s happening under the hood: R defines a logical vector and applies it to the penguins dataframe:

# use base R to get the column we need: gives a logical vector
filter_vector <- penguins$body_mass_g > 4000

# check its structure
is.vector(filter_vector)
class(filter_vector) 

# use this vector to manually filter the dataframe using base R 
# df[rows, columns]: rows to keep, all columns
filtered_p <- penguins[filter_vector, ]

A particular vector: Scalar

In math a scalar is defined as a single real number but in R, a scalar is simply a vector of length one

Let’s try this code:

# set up a vector x of length 10
x <- sample(10)
x

# add 100 to x using the long way
x + c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100)

# add 100 to x using the "R" way: vector recycling!
x + 100

Vector Recycling in R

Let’s see another example. We define two numeric vectors x1 and x2:

# x1 is sequence of numbers from 1 to 2
x1 <- seq(from = 1, to = 2)

# x2 is a sequence of numbers from 1 to 10
x2 <- seq(from = 1, to = 10)

# what happens if we add them?
x1 + x2

What happens? The shorter vector x1 is duplicated five times in order to match the length of the longer vector x2. The same behavior happens for other operations like subtraction, multiplication, logical comparison, etc.

Warning

If the shorter vector is not a multiple of the longer one, R will print a warning message!

Vector Recycling in R

Note

This behavior is called Vector Recycling and happens automatically in R: When two vectors of different lengths are used in an operation, R repeats (recycles) the elements of the shorter vector to match the longer vector!

Why It Matters: Vector recycling can cause unexpected results. Check if recycling is what you want R to do — if not, adjust the vector lengths manually, like this:

x1 <- c(1, 2, rep(0, 7))

x1 + x2

4. Subset Data Structures with Base R

Subsetting Data Structures in R

There are different ways to access parts of data structures in R.

We’ll cover:

Vectors
Matrices
Data Frames
Lists

Each structure uses square brackets [ ], but how you use them depends on the structure.

Subsetting Vectors

v <- c(10, 20, 30, 40)

v[2]         # 20 (2nd element)
v[c(1, 3)]   # 10 and 30
v[-1]        # all but the first
v[v > 25]    # elements greater than 25

Vectors are 1D — you just specify positions or logical tests in [ ]

Subsetting Matrices

m <- matrix(1:9, nrow = 3, byrow = TRUE)

m[1, 2]           # value at row 1, column 2
m[ , 3]           # entire 3rd column
m[2,  ]           # entire 2nd row
m[1:2, 2:3]       # top-right submatrix (rows 1–2, cols 2–3)
m[ , c(1, 3)]     # columns 1 and 3

Matrices are 2D: use [row, column]

Subsetting Data Frames

df <- data.frame(
  name = c("Alex", "Betty", "Chad", "J"),
  age = c(20, 21, 22, 23))

df[1, ]            # first row (all columns)
df[ , 1]           # first column by index
df[ , "name"]      # first column by label
df$name            # same as above
df[["name"]]       # same as above

df[1, 2]           # single value at row 1, column 2
df[2:3, "name"]    # values in rows 2 and 3 of column name

df[df$age > 20, ]            # filter rows where age > 20
df[df$name == "J", ]         # filter rows where name is "J"
df[df$age == max(df$age), ]  # filter rows with highest age

Data frames combine list and matrix behavior.

Subsetting Lists

my_list <- list(
  a = c(1, 2, 3),
  b = c("hello", "ciao"),
  c = TRUE
)

my_list[1]         # list with one element
my_list[[1]]       # contents of first element: 1 2 3
my_list[[1]][1]    # first value of that vector: 1
my_list["b"]       # list with one element named "b"
my_list$b          # same as my_list[["b"]], only works with names
my_list[["c"]]     # returns TRUE

Use [ ] to extract a sublist
Use [[ ]] or $ to extract actual elements

To print these slides as pdf

Click on the icon bottom-right corner > Tools > PDF Export Mode > Print as a Pdf