Topics: Base R, Data Types, and Data Structures
These slides were last updated on July 24, 2025
When people say “base R”, they mean the core features that ship with the R language itself and that predate the tidyverse (hence the name “base R”).
These include:
- built-in functions such as mean(), length(), etc.
- key programming tools such as loops
The tidyverse, by contrast, is a collection of add-on packages: ggplot2, dplyr, tidyr, forcats, stringr, etc.
Tip
Use the tidyverse for most of your daily work, but understand enough base R to follow and troubleshoot R code!
Data Types and Data Structures are part of base R. They are the fundamentals — the “ABC” of how R stores and organizes data.
We look at each in turn, and then at how they relate.
R has the following main data types (for more see here):
Numeric – numbers, either with or without decimals
- Double, with decimals (default): 3.14, 2.0
- Integer, whole numbers: 2L, 100L
Character – text or strings (always in quotes): "hello"
Logical – TRUE or FALSE values: TRUE, FALSE
Factor – categorical values with levels: "low", "medium"
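To check which type a value has, base R provides typeof() and class(). The sketch below uses illustrative values (the name sizes is made up for this example). Note that a factor reports the class "factor" but is stored internally as integer codes.

```r
# typeof() reports how R stores a value; class() reports how R treats it
typeof(3.14)      # "double"
typeof(2L)        # "integer"
typeof("hello")   # "character"
typeof(TRUE)      # "logical"

# illustrative factor with three levels
sizes <- factor(c("low", "medium", "high"))
class(sizes)      # "factor"
typeof(sizes)     # "integer": factors are stored as integer codes
levels(sizes)     # "high" "low" "medium" (alphabetical by default)
```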
R has the following main data structures (for more see here):
Vector – one-dimensional, all elements same type
v <- c(1, 2, 3)
Matrix – two-dimensional, all elements same type
m <- matrix(1:6, nrow = 2)
Dataframe – two-dimensional, columns can have different types
df <- data.frame(name = c("A", "B"), age = c(25, 30))
List – flexible container, holds elements of any type, even mixed
l <- list(num = 1:3, name = c("Sabrina", "Laura"))
Every single element in R has a data type. That element is stored inside a larger data structure.
Examples:
x <- c(1, 2, 3) # a vector (structure) of three elements, all numeric values (type)
y <- list("a", 5) # a list (structure) of two elements, one character value and one numeric value (type)
The data structure you use affects what data types you can store:
Not all data structures support every combination of data types. Some allow only one type, while others can store many types together.
Structure | Dimensions | Allow Mixed Types? | Notes |
---|---|---|---|
Vector | 1D | No | All elements must be the same data type |
Matrix | 2D | No | All elements must be the same data type |
Dataframe | 2D | Yes (by column) | Each column is a vector; data types can vary by column |
List | Flexible | Yes | Each element can be anything: numbers, text, vectors, even data frames |
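The table can be checked directly. The sketch below (object names are illustrative, and the data frame result assumes R 4.0 or later, where strings are not converted to factors by default) shows that a vector silently converts mixed input to a single type, while a list and a data frame keep the types apart.

```r
# a vector allows one type only, so R silently converts the numbers to text
mixed_v <- c("hello", 1, 2)
class(mixed_v)              # "character"
mixed_v                     # "hello" "1" "2"

# a list keeps each element's own type
mixed_l <- list("hello", 1, 2)
sapply(mixed_l, class)      # "character" "numeric" "numeric"

# a data frame allows one type per column
mixed_df <- data.frame(name = c("A", "B"), age = c(25, 30))
sapply(mixed_df, class)     # name: "character", age: "numeric"
```

This silent conversion is called coercion. It is why a number that ends up in a character vector can no longer be used in arithmetic without converting it back.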
Run the code below to create different R objects. Then use class() and str() (or glimpse()) on each object to explore its data structure.
# load the tidyverse, which provides tibble() and glimpse()
library(tidyverse)
# vector
num_v <- c(1:9)
char_v <- c("hello", "ciao", "hey")
f_v <- factor(c("low", "medium", "high"))
test <- c("hello", 1, 2)
# matrix
num_m <- matrix(1:15, nrow = 3, ncol = 5, byrow = TRUE)
char_m <- matrix(c("a", "b", "c", "d"), nrow = 2, byrow = TRUE)
# dataframe or tibble
df <- data.frame(
  id = 1:3,
  name = c("Dave", "Ashley", "Rik"),
  age = c(15, 17, 20))
# list
l <- list(
  num_v = c(1:3),
  m = matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE),
  another_num_v = c(1, 2, 4),
  char_v = c("Sabrina", "Zach"),
  d = tibble(var_1 = c(1:4),
             var_2 = c(2:5)))
So far, we’ve been working mostly with data frames — technically with tibbles, the tidyverse version of them. Data frames are the go-to structure for working with real-world, tabular data in R. Keep using them for most data analysis tasks in this course and beyond!
Strengths of data frames:
- They work with dplyr, ggplot2, and other tidyverse packages
But it’s helpful to understand the full set of R data structures, because each has its own strengths:
Vectors: The most basic structure in R; all elements must be the same type.
You are analyzing survey responses and want to store all participants’ ages in a separate vector for further manipulation.
Matrices: Two-dimensional, all elements must be the same type.
You have numeric data that you want to analyze using matrix algebra or linear models.
Lists: Flexible containers that can hold anything — even other lists.
You are analyzing data and want to save multiple outputs (model results, plot, etc.) all in the same data structure.
Data Frames: Two-dimensional, columns can have different types.
You are analyzing survey responses with dplyr, ggplot2, or other tidyverse packages.
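As a sketch of the list use case above, the code below stores several outputs of different types in one list. It uses the built-in mtcars data rather than survey data, and the names fit and results are illustrative.

```r
# fit a simple linear model on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# collect several outputs of different types in one list
results <- list(
  model    = fit,              # an lm object
  coefs    = coef(fit),        # a named numeric vector
  n_obs    = nrow(mtcars),     # a single number
  top_rows = head(mtcars, 3)   # a data frame
)

names(results)            # "model" "coefs" "n_obs" "top_rows"
results$n_obs             # 32
class(results$top_rows)   # "data.frame"
```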
Let’s analyze the previous code to see what’s happening under the hood. R defines a logical vector and applies it to the penguins dataframe:
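# assumption: penguins (with a body_mass_g column) comes from the palmerpenguins package
# use base R to pull out the column and compare it to 4000: gives a logical vector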
filter_vector <- penguins$body_mass_g > 4000
# check its structure
is.vector(filter_vector)
class(filter_vector)
# use this vector to manually filter the dataframe using base R
# df[rows, columns]: rows to keep, all columns
filtered_p <- penguins[filter_vector, ]
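One detail is worth checking when filtering with a logical vector in base R. If the column contains missing values, the comparison yields NA, and [ ] returns an all-NA row for each of them. A minimal sketch on a small made-up data frame (toy and its values are hypothetical, not the penguins data):

```r
# toy data frame with one missing body mass (hypothetical values)
toy <- data.frame(id = 1:4, body_mass_g = c(3500, 4200, NA, 5000))

keep <- toy$body_mass_g > 4000
keep                  # FALSE TRUE NA TRUE

# the NA in the logical vector produces a row filled with NA
nrow(toy[keep, ])     # 3

# which() keeps only the positions that are TRUE, so the NA is dropped
nrow(toy[which(keep), ])   # 2
```

dplyr's filter() drops rows where the condition is NA, so the which() version is the closer base R equivalent.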
In math, a scalar is defined as a single real number, but in R a scalar is simply a vector of length one.
Let’s try this code:
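A minimal sketch of this idea, with illustrative values:

```r
x <- 5           # looks like a scalar...
is.vector(x)     # TRUE: it is a vector
length(x)        # 1: of length one

# a length-one vector is reused for every element of a longer vector
c(1, 2, 3) * x   # 5 10 15
```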
Let’s see another example. We define two numeric vectors x1 and x2:
# x1 is a sequence of numbers from 1 to 2
x1 <- seq(from = 1, to = 2)
# x2 is a sequence of numbers from 1 to 10
x2 <- seq(from = 1, to = 10)
# what happens if we add them?
x1 + x2
What happens? The shorter vector x1 is repeated five times in order to match the length of the longer vector x2.
The same behavior happens for other operations like subtraction, multiplication, logical comparison, etc.
Warning
If the length of the longer vector is not a multiple of the length of the shorter one, R still recycles but prints a warning message!
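A small sketch of that warning case, using two illustrative vectors named short and long:

```r
short <- 1:3
long  <- 1:10   # 10 is not a multiple of 3

# R still returns a result, but warns:
# "longer object length is not a multiple of shorter object length"
short + long    # 2 4 6 5 7 9 8 10 12 11
```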
Note
This behavior is called Vector Recycling and happens automatically in R: When two vectors of different lengths are used in an operation, R repeats (recycles) the elements of the shorter vector to match the longer vector!
Why It Matters: Vector recycling can cause unexpected results. Check if recycling is what you want R to do — if not, adjust the vector lengths manually, like this:
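One possible way to do this, sketched with rep() and a length check (the name x1_long is illustrative):

```r
x1 <- seq(from = 1, to = 2)
x2 <- seq(from = 1, to = 10)

# make the repetition explicit so the intent is visible in the code
x1_long <- rep(x1, length.out = length(x2))
x1_long         # 1 2 1 2 1 2 1 2 1 2

# stop early if the lengths do not match, instead of recycling silently
stopifnot(length(x1_long) == length(x2))
x1_long + x2    # 2 4 4 6 6 8 8 10 10 12
```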
There are different ways to access parts of data structures in R.
We’ll cover vectors, matrices, data frames, and lists.
Each structure uses square brackets [ ], but how you use them depends on the structure.
v <- c(10, 20, 30, 40)
v[2] # 20 (2nd element)
v[c(1, 3)] # 10 and 30
v[-1] # all but the first
v[v > 25] # elements greater than 25
Vectors are 1D — you just specify positions or logical tests in [ ]
m <- matrix(1:9, nrow = 3, byrow = TRUE)
m[1, 2] # value at row 1, column 2
m[ , 3] # entire 3rd column
m[2, ] # entire 2nd row
m[1:2, 2:3] # top-right submatrix (rows 1–2, cols 2–3)
m[ , c(1, 3)] # columns 1 and 3
Matrices are 2D: use [row, column]
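When [row, column] selects a single row or column, R simplifies the result to a plain vector by default. A short sketch of that behavior and of the drop = FALSE argument that keeps the result a matrix (col3 and col3_m are illustrative names):

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

col3 <- m[ , 3]                   # simplified to a plain vector
is.matrix(col3)                   # FALSE
col3                              # 3 6 9

col3_m <- m[ , 3, drop = FALSE]   # stays a 3 x 1 matrix
dim(col3_m)                       # 3 1
```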
df <- data.frame(
  name = c("Alex", "Betty", "Chad", "J"),
  age = c(20, 21, 22, 23))
df[1, ] # first row (all columns)
df[ , 1] # first column by index
df[ , "name"] # first column by label
df$name # same as above
df[["name"]] # same as above
df[1, 2] # single value at row 1, column 2
df[2:3, "name"] # values in rows 2 and 3 of column name
df[df$age > 20, ] # filter rows where age > 20
df[df$name == "J", ] # filter rows where name is "J"
df[df$age == max(df$age), ] # filter rows with highest age
Data frames combine list and matrix behavior.
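A brief sketch of the two behaviors side by side, rebuilding the same df so the block runs on its own:

```r
df <- data.frame(
  name = c("Alex", "Betty", "Chad", "J"),
  age  = c(20, 21, 22, 23))

# list-style: a single index selects columns
class(df["age"])     # "data.frame": [ ] keeps the structure
class(df[["age"]])   # "numeric": [[ ]] extracts the column vector

# matrix-style: [row, column]
df[2, "age"]         # 21
dim(df)              # 4 2
```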
my_list <- list(
  a = c(1, 2, 3),
  b = c("hello", "ciao"),
  c = TRUE
)
my_list[1] # list with one element
my_list[[1]] # contents of first element: 1 2 3
my_list[[1]][1] # first value of that vector: 1
my_list["b"] # list with one element named "b"
my_list$b # same as my_list[["b"]], only works with names
my_list[["c"]] # returns TRUE
Use [ ] to extract a sublist
Use [[ ]] or $ to extract actual elements
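The difference matters as soon as you compute on the result. A minimal sketch, rebuilding my_list so the block runs on its own:

```r
my_list <- list(a = c(1, 2, 3), b = c("hello", "ciao"), c = TRUE)

class(my_list["a"])     # "list": still a list, now with one element
class(my_list[["a"]])   # "numeric": the vector itself

mean(my_list[["a"]])    # 2
# mean(my_list["a"]) would return NA with a warning,
# because a list is not numeric
```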