Topics: Writing Loops Part 2
These slides were last updated on July 31, 2025
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
We can take the mean() function and apply it to each column:
penguins %>% summarize(avg_bill_length = mean(bill_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE))
penguins %>% summarize(avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_body_mass = mean(body_mass_g, na.rm = TRUE))
This works — but requires a lot of copy/paste! How can we do the same thing with a for loop?
First, initialize an empty vector to store results. Second, use a for loop to calculate the mean of each (numeric) column of this penguins dataframe.
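A minimal sketch of those two steps. To keep it self-contained, it assumes a small toy data frame, `penguins_small` (a hypothetical name), built from the first five values of each numeric column shown in the output above rather than the full penguins data:

```r
# Toy data: first five values of the numeric columns shown above (an assumption
# made so the example runs without loading the full penguins dataset)
penguins_small <- data.frame(
  bill_length_mm    = c(39.1, 39.5, 40.3, NA, 36.7),
  bill_depth_mm     = c(18.7, 17.4, 18.0, NA, 19.3),
  flipper_length_mm = c(181L, 186L, 195L, NA, 193L),
  body_mass_g       = c(3750L, 3800L, 3250L, NA, 3450L)
)

# Step 1: initialize an empty numeric vector with one slot per column
col_means <- vector(mode = "numeric", length = ncol(penguins_small))
names(col_means) <- names(penguins_small)

# Step 2: loop over the column positions and store each column's mean
for (i in seq_along(penguins_small)) {
  col_means[i] <- mean(penguins_small[[i]], na.rm = TRUE)
}

col_means
```

With the full dataset, the same loop would run over only the numeric columns, for example after selecting them by position.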
R is a vectorized language, so many tasks can and should be done without writing a for loop.
Still, understanding loops is important:
When working with dataframes, R provides the following alternatives that are often better than writing a for loop:
Function Family | Package | Best Used For | Docs / Link |
---|---|---|---|
map_*() | purrr | Applying functions to elements of lists or columns | map docs |
across() | dplyr | Applying the same function to multiple columns inside mutate() or summarize() | across docs |
*apply() | base R | Row/column-wise operations on matrices or data frames | apply docs |
The asterisk * is a placeholder for a family of related functions (e.g., map_dbl(), map_chr(), sapply(), etc.)
The so-called “map functions” come from the purrr package in R: https://purrr.tidyverse.org/reference/map.html
There are several map_*() functions; each creates a different type of output (this is the same idea as in the for loop, when we specify the mode of our output vector):
- map() makes a list
- map_lgl() makes a logical vector
- map_int() makes an integer vector
- map_dbl() makes a double vector
- map_chr() makes a character vector
Let’s see a few examples using the penguins data.
Pick the appropriate map() function (there are several!) and specify at least two main arguments:
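In purrr, those two arguments are `.x` (the list or data frame to iterate over) and `.f` (the function to apply to each element). The sketch below is illustrative only; it assumes a toy data frame, `penguins_small` (a hypothetical name), built from the first five values shown in the output earlier:

```r
library(purrr)

# Toy data: first five values of the numeric columns (an assumption for illustration)
penguins_small <- data.frame(
  bill_length_mm    = c(39.1, 39.5, 40.3, NA, 36.7),
  bill_depth_mm     = c(18.7, 17.4, 18.0, NA, 19.3),
  flipper_length_mm = c(181L, 186L, 195L, NA, 193L),
  body_mass_g       = c(3750L, 3800L, 3250L, NA, 3450L)
)

# map_dbl(): one double per column (the column means)
col_means <- map_dbl(.x = penguins_small, .f = function(col) mean(col, na.rm = TRUE))

# map_chr(): one string per column (the column classes)
col_types <- map_chr(.x = penguins_small, .f = class)

# map_int(): one integer per column (the number of missing values)
n_missing <- map_int(.x = penguins_small, .f = function(col) sum(is.na(col)))

col_means
col_types
n_missing
```

Each call iterates over the columns because a data frame is a list of columns; only the suffix changes to match the type of value that `.f` returns.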
Another popular option to replace a for loop is the across() function from dplyr.
What it does: applies the same operation (e.g., mean) to multiple columns at once
Best use: since it comes from dplyr, it works seamlessly inside dplyr verbs like mutate() and summarize() (these are its favorite verbs to work with!)
We looked at this example earlier (which calculates the mean of several columns in a data frame) and rewrote it using a for loop. Now we rewrite it again, but this time using across():
penguins %>% summarize(avg_bill_length = mean(bill_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE))
penguins %>% summarize(avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_body_mass = mean(body_mass_g, na.rm = TRUE))
Rewrite the previous example using across():
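One way the rewrite might look, as a sketch rather than the exact slide code. It assumes a toy data frame, `penguins_small` (a hypothetical name), with one factor column and the first five values of each numeric column shown earlier:

```r
library(dplyr)

# Toy data (an assumption for illustration): one factor column plus four numeric columns
penguins_small <- data.frame(
  species           = factor(rep("Adelie", 5)),
  bill_length_mm    = c(39.1, 39.5, 40.3, NA, 36.7),
  bill_depth_mm     = c(18.7, 17.4, 18.0, NA, 19.3),
  flipper_length_mm = c(181L, 186L, 195L, NA, 193L),
  body_mass_g       = c(3750L, 3800L, 3250L, NA, 3450L)
)

# One summarize() call replaces four: across() applies the mean to every numeric column
avg <- penguins_small %>%
  summarize(across(.cols = where(is.numeric), .fns = ~ mean(.x, na.rm = TRUE)))

avg
```

Note that on the full penguins data, where(is.numeric) would also pick up the integer column year, so a narrower selection may be preferable there.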
The function across() has two main arguments:
- .cols: the columns to operate on. You can select them by position, name, or type (in this example, by type using where(is.numeric)).
- .fns: the function, or list of functions, to apply to each column (in this example, we use mean(), and .x is a placeholder for the current column being processed).
You can omit the argument names .cols and .fns if you pass the arguments in the correct order.
Add names to the newly computed means:
What does "avg_{.col}" mean?
- "avg_" is a literal prefix that will be added to each new column name.
- {.col} is a placeholder that will be replaced with the original column name.
So if you’re applying mean() to a column named bill_length_mm, the result will be named avg_bill_length_mm.
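A short sketch of the naming pattern in action, again assuming a toy data frame, `penguins_small` (a hypothetical name), built from the first five values shown earlier. Here the first two arguments of across() are passed by position and `.names` is passed by name:

```r
library(dplyr)

# Toy data (an assumption for illustration)
penguins_small <- data.frame(
  bill_length_mm    = c(39.1, 39.5, 40.3, NA, 36.7),
  bill_depth_mm     = c(18.7, 17.4, 18.0, NA, 19.3),
  flipper_length_mm = c(181L, 186L, 195L, NA, 193L),
  body_mass_g       = c(3750L, 3800L, 3250L, NA, 3450L)
)

# .names = "avg_{.col}" prefixes every output column with "avg_"
avg_named <- penguins_small %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE), .names = "avg_{.col}"))

names(avg_named)
```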
Finally, a third option to replace a for loop in R is using one of the apply() functions from base R.
What it does: applies the same operation (e.g., mean) to multiple columns at once
Best use: when working outside tidyverse verbs (this is from base R!); it can be applied across rows or across columns (the most common case)
We looked at this example earlier (which calculates the mean of several columns in a data frame) and rewrote it using a for loop. Now we rewrite it again, but this time using apply():
penguins %>% summarize(avg_bill_length = mean(bill_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE))
penguins %>% summarize(avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_body_mass = mean(body_mass_g, na.rm = TRUE))
Rewrite the previous example using apply():
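A possible version of that rewrite, sketched on a toy data frame, `penguins_small` (a hypothetical name), built from the first five values shown earlier, so that it runs with base R alone:

```r
# Toy data (an assumption for illustration): numeric columns only
penguins_small <- data.frame(
  bill_length_mm    = c(39.1, 39.5, 40.3, NA, 36.7),
  bill_depth_mm     = c(18.7, 17.4, 18.0, NA, 19.3),
  flipper_length_mm = c(181L, 186L, 195L, NA, 193L),
  body_mass_g       = c(3750L, 3800L, 3250L, NA, 3450L)
)

# Second argument 2: apply mean() to each column
col_means <- apply(penguins_small, 2, mean, na.rm = TRUE)

# Second argument 1: apply sum() to each row (bill length + bill depth, for contrast)
row_sums <- apply(penguins_small[, c("bill_length_mm", "bill_depth_mm")], 1, sum)

col_means
row_sums
```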
Note that the 2 stands for MARGIN = 2 (apply the function over the columns). If you write 1, it means rows. Type ?apply in your Console for more info.
apply() belongs to a family of functions, meaning there are several similar apply functions you can use:
Function | Use Case | Input Type | Output Type | Why Use It |
---|---|---|---|---|
apply() | Apply a function to rows or columns | Matrix, data frame | Vector or matrix | For row/column-wise operations |
lapply() | Apply a function to each element | List, data frame | List | Keeps outputs as list (safe, flexible) |
sapply() | Like lapply() but simplifies output | List, data frame | Vector or matrix | Shorter output, easier to read |
vapply() | Like sapply(), but type-safe | List, data frame | Vector or matrix | Safer than sapply(): explicit type check |
# Use numeric columns from penguins dataset
num_data <- penguins[ , 3:6]
# apply(): function across columns (MARGIN = 2)
apply(num_data, 2, mean, na.rm = TRUE)
# lapply(): list output
lapply(num_data, mean, na.rm = TRUE)
# sapply(): simplified vector output
sapply(num_data, mean, na.rm = TRUE)
# vapply(): safe version of sapply()
vapply(num_data, mean, na.rm = TRUE, FUN.VALUE = numeric(1))
- apply() works best with matrix-like structures
- lapply() is list-safe; sapply() is more concise
- vapply() avoids surprises by enforcing output type (here: return a single numeric value)
We focused on “for loops” because they are the most common, but R, like many other programming languages, also supports “while loops”.
How while loops work:
- Evaluate the condition
- If the condition is TRUE, run the loop body
- Repeat until the condition is FALSE (then the while loop is over)
Syntax:
while (condition to be evaluated) {
statement(s)
}
Example:
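A minimal illustration of the syntax above. The variable name `counter` and the limit of 3 are assumptions chosen for this sketch:

```r
# Set the variable used in the condition before the loop starts
counter <- 1

# The body runs as long as the condition is TRUE
while (counter <= 3) {
  print(counter)          # prints 1, then 2, then 3
  counter <- counter + 1  # move toward making the condition FALSE
}
```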
Take the previous code, but this time print counter also at the end:
Why are the results different from the previous code?
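One way to explore this question is sketched below, assuming a loop that counts from 1 while `counter <= 3` (the limit is an assumption, not necessarily the value used on the slide):

```r
counter <- 1

while (counter <= 3) {
  print(counter)          # inside the loop: prints 1, 2, 3
  counter <- counter + 1
}

# After the loop: counter holds the first value that made the condition FALSE
print(counter)            # prints 4
```

The extra value appears because the last pass through the body still increments `counter`, and only then is the condition checked again and found to be FALSE.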
Take the previous code, but this time we do not increment our counter variable.
What is the output of this code?
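Without the increment, the condition never becomes FALSE, so the loop would run forever. The sketch below shows this safely by adding a cap, `max_iterations`, and a helper counter, `iterations`; both are hypothetical additions for illustration and are not part of the loop being discussed:

```r
counter <- 1
iterations <- 0        # hypothetical safety guard, for illustration only
max_iterations <- 5    # hypothetical cap so the example terminates

while (counter <= 3) {
  print(counter)       # always prints 1, because counter never changes
  iterations <- iterations + 1
  if (iterations >= max_iterations) {
    break              # without this guard, the loop would never stop
  }
}
```

In an interactive session, a loop that really is infinite can usually be interrupted with Esc or the Stop button in the Console.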
What is the output of this code?
While loops are useful when you don’t know in advance how many times to iterate: you want the loop to keep running until a stopping condition is met (that is, until the loop condition becomes FALSE).
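As a sketch of this kind of task, consider a question where the number of iterations is the unknown. The scenario and the names `balance` and `years` are hypothetical, chosen only for illustration: how many years does it take a balance of 100 to at least double at 5% growth per year?

```r
balance <- 100
years <- 0

# Keep going until the balance has at least doubled
while (balance < 200) {
  balance <- balance * 1.05  # grow by 5% each year
  years <- years + 1
}

years  # 15
```

A for loop would be awkward here because the stopping point depends on the values computed inside the loop.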
While loops require the variable used in the condition (for example, a “count variable”) to be set before the loop starts, outside the loop.
While loops are important but less common than for loops, especially for the types of tasks we do in this course. For this reason, we don’t cover them in depth.