Introduction to R: Lecture 11

Topics: Writing Loops Part 2

Sabrina Nardin, Summer 2025

Agenda

  1. Using For Loops with Dataframes
  2. Alternatives to For Loops in R
  3. While Loops

These slides were last updated on July 31, 2025

1. Using For Loops with Dataframes

Same Task: With and Without a For Loop

library(tidyverse)
library(palmerpenguins)
data(penguins)
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Calculate the mean value of several columns without a loop

We can take the mean() function, and apply it to each column:

penguins %>% summarize(avg_bill_length = mean(bill_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE))
penguins %>% summarize(avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_body_mass = mean(body_mass_g, na.rm = TRUE))

This works — but requires a lot of copy/paste! How can we do the same thing with a for loop?

Automate the same task with a loop

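First, initialize an empty vector to store the results. Second, use a for loop to calculate the mean of each column of the penguins dataframe. The loop visits all eight columns: for the non-numeric ones (species, island, sex), mean() returns NA with a warning.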

output <- vector(mode = "double", length = ncol(penguins))

for (i in seq_along(penguins)) {
  #print(i)
  #print(penguins[i])
  output[i] <- mean(penguins[[i]], na.rm = TRUE)
}

output
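
The loop above visits every column and returns an unnamed vector. A minimal variation, assuming we only care about the numeric columns, is sketched below; the object name penguins_num and the use of names() to label the results are illustrative choices, not part of the original example.

```r
library(dplyr)
library(palmerpenguins)

# Keep only the numeric columns so mean() is defined for every column
penguins_num <- penguins %>% select(where(is.numeric))

# Pre-allocate a double vector with one slot per column, labeled by column name
output <- vector(mode = "double", length = ncol(penguins_num))
names(output) <- names(penguins_num)

# Fill each slot with the mean of the corresponding column
for (i in seq_along(penguins_num)) {
  output[i] <- mean(penguins_num[[i]], na.rm = TRUE)
}

output
```

Labeling the vector up front makes it easier to tell which mean belongs to which column.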

2. Alternatives to For Loops in R

Why Do We Even Learn Loops in R?

R is a vectorized language, so many tasks can and should be done without writing a for loop.
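
To make "vectorized" concrete, here is a small sketch with a made-up vector x: the loop and the one-line vectorized expression should give the same result.

```r
# Made-up input vector, only for illustration
x <- c(1, 2, 3, 4)

# Loop version: square each element one at a time
squared_loop <- vector(mode = "double", length = length(x))
for (i in seq_along(x)) {
  squared_loop[i] <- x[i]^2
}

# Vectorized version: the ^ operator works on the whole vector at once
squared_vec <- x^2

# Compare the two results
identical(squared_loop, squared_vec)
```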

Still, understanding loops is important:

  • to build foundational logic and control flow skills
  • for non-vectorized tasks, like row-by-row operations
  • to debug more complex workflows

Three Main Alternatives to For Loops in R

When working with dataframes, R provides the following alternatives that are often better than writing a for loop:

Function Family | Package | Best Used For | Docs / Link
map_*() | purrr | Applying functions to elements of lists or columns | map docs
across() | dplyr | Applying the same function to multiple columns inside mutate() or summarize() | across docs
*apply() | base R | Row/column-wise operations on matrices or data frames | apply docs

The asterisk * is a placeholder for a family of related functions (e.g., map_dbl(), map_chr(), sapply(), etc.)

1. Replacing for loops with “map()” functions in the tidyverse

The so-called “map functions” come from the purrr package in R: https://purrr.tidyverse.org/reference/map.html

There are several map_*() functions, and each creates a different type of output (this is the same idea as in the for loop, when we specify the mode of our output vector):

  • map() makes a list
  • map_lgl() makes a logical vector
  • map_int() makes an integer vector
  • map_dbl() makes a double vector
  • map_chr() makes a character vector

Let’s see a few examples using the penguins data.

1. Replacing for loops with “map()” functions in the tidyverse

Pick the appropriate map() function (there are several!) and specify at least two main arguments:

  • what you are iterating over
  • what you are calculating

penguins %>% 
  select(where(is.numeric)) %>%
  map_dbl(mean, na.rm = TRUE)
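
The other map_*() variants follow the same pattern. The sketch below assumes the same penguins data; the functions applied (class(), counting NA values, is.numeric()) are illustrative choices to show each output type.

```r
library(purrr)
library(palmerpenguins)

# map_chr(): one string per column (here, the class of each column)
col_classes <- map_chr(penguins, class)

# map_int(): one integer per column (here, the number of missing values)
n_missing <- map_int(penguins, ~ sum(is.na(.x)))

# map_lgl(): one TRUE/FALSE per column (here, whether the column is numeric)
is_num <- map_lgl(penguins, is.numeric)

col_classes
n_missing
is_num
```

If the function returns something that does not match the requested type, the typed variants throw an error instead of silently converting it.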

2. Replacing for loops with “across()” in the tidyverse

Another popular option to replace a for loop is the across() function from dplyr.

What it does: applies the same operation (e.g., mean) to multiple columns at once

Best use: since it comes from dplyr, it works seamlessly inside dplyr verbs like mutate() and summarize() (these are its favorite verbs to work with!)

2. Replacing for loops with “across()” in the tidyverse

We looked at this example earlier (which calculates the mean of several columns in a data frame) and rewrote it using a for loop. Now we rewrite it again but this time using across():

penguins %>% summarize(avg_bill_length = mean(bill_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE))
penguins %>% summarize(avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_body_mass = mean(body_mass_g, na.rm = TRUE))

2. Replacing for loops with “across()” in the tidyverse

Rewrite previous example using across():

penguins %>% 
  summarize(
    across(
      .cols = where(is.numeric),           # select columns by type
      .fns = ~ mean(.x, na.rm = TRUE)))   # apply mean, skipping NA

The function across() has two main arguments:

  • .cols: the columns to operate on. You can select them by position, name, or type (in this example, by type using where(is.numeric)).
  • .fns: the function, or list of functions, to apply to each column (in this example, we use mean() and .x is a placeholder for the current column being processed)

You can omit the argument names .cols and .fns if you pass the arguments in this order (columns first, function second).
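
As a sketch of both points, the first call below drops the argument names, and the second passes a named list of functions to .fns. The list names avg and sd and the two columns chosen are illustrative assumptions.

```r
library(dplyr)
library(palmerpenguins)

# Same result as before, with unnamed arguments: columns first, function second
penguins %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

# A named list applies several functions to each selected column
multi <- penguins %>%
  summarize(
    across(
      c(bill_length_mm, body_mass_g),
      list(avg = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE))
    )
  )

# By default, new columns are named <column>_<function name>
names(multi)
```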

2. Replacing for loops with “across()” in the tidyverse

Add names to the newly computed means:

penguins %>% 
  summarize(
    across(
      .cols = where(is.numeric),                   
      .fns = ~ mean(.x, na.rm = TRUE),             
      .names = "avg_{.col}"                   
    )
  )

What does "avg_{.col}" mean?

  • "avg_" is a literal prefix that will be added to each new column name.
  • {.col} is a placeholder that will be replaced with the original column name.

So if you’re applying mean() to a column named bill_length_mm, the result will be named avg_bill_length_mm

3. Replacing for loops with “apply()” functions in base R

Finally, a third option to replace a for loop in R is using one of the apply() functions from base R.

What it does: applies the same function (e.g., mean) to every row or every column of a matrix or data frame

Best use: when working outside tidyverse verbs (this is base R!); the function can be applied across rows or, most commonly, across columns

3. Replacing for loops with “apply()” functions in base R

We looked at this example earlier (which calculates the mean of several columns in a data frame) and rewrote it using a for loop. Now we rewrite it again but this time using apply():

penguins %>% summarize(avg_bill_length = mean(bill_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_bill_depth = mean(bill_depth_mm, na.rm = TRUE))
penguins %>% summarize(avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE))
penguins %>% summarize(avg_body_mass = mean(body_mass_g, na.rm = TRUE))

3. Replacing for loops with “apply()” functions in base R

Rewrite previous example using apply():

# manually select the numeric measurement columns using base R syntax
penguins_numeric <- penguins[ , c("bill_length_mm", "bill_depth_mm", 
                                  "flipper_length_mm", "body_mass_g")]

# apply mean to these columns
apply(penguins_numeric, 2, mean, na.rm = TRUE)  

Note that the 2 stands for MARGIN = 2 (apply the function over columns). If you write 1, the function is applied over rows. Type ?apply in your Console for more info.
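
To see the difference between the two margins, here is a small sketch with a made-up 2 x 3 matrix (the object m and its row and column names are hypothetical):

```r
# Made-up matrix, filled column by column: a = 1,2; b = 3,4; c = 5,6
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# MARGIN = 1: one result per row (r1 = 1 + 3 + 5, r2 = 2 + 4 + 6)
row_sums <- apply(m, 1, sum)

# MARGIN = 2: one result per column (a = 1 + 2, b = 3 + 4, c = 5 + 6)
col_sums <- apply(m, 2, sum)

row_sums
col_sums
```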

3. Replacing for loops with “apply()” functions in base R

apply() belongs to a family of functions, meaning there are several similar apply functions you can use:

Function | Use Case | Input Type | Output Type | Why Use It
apply() | Apply a function to rows or columns | Matrix, data frame | Vector or matrix | For row/column-wise operations
lapply() | Apply a function to each element | List, data frame | List | Keeps outputs as list (safe, flexible)
sapply() | Like lapply() but simplifies output | List, data frame | Vector or matrix | Shorter output, easier to read
vapply() | Like sapply(), but type-safe | List, data frame | Vector or matrix | Safer than sapply(): explicit type check

3. Replacing for loops with “apply()” functions in base R

# Use numeric columns from penguins dataset
num_data <- penguins[ , 3:6]

# apply(): function across columns (MARGIN = 2)
apply(num_data, 2, mean, na.rm = TRUE)

# lapply(): list output
lapply(num_data, mean, na.rm = TRUE)

# sapply(): simplified vector output
sapply(num_data, mean, na.rm = TRUE)

# vapply(): safe version of sapply()
vapply(num_data, mean, na.rm = TRUE, FUN.VALUE = numeric(1))

  • apply() works best with matrix-like structures
  • lapply() always returns a list; sapply() simplifies the result to a vector or matrix when it can
  • vapply() avoids surprises by enforcing output type (here: return a single numeric value)
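
A small sketch of what "avoiding surprises" can look like, using a made-up data frame df and range(), which returns two values instead of one. The error message text in the handler is our own wording, not R's.

```r
# Toy data frame (values are made up for illustration)
df <- data.frame(x = c(1, 2, NA), y = c(4, 5, 6))

# range() returns two values per column, so sapply() quietly returns a matrix
simplified <- sapply(df, range, na.rm = TRUE)
is.matrix(simplified)

# vapply() was promised one number per column, so it stops with an error;
# tryCatch() captures the error so the script can continue
checked <- tryCatch(
  vapply(df, range, na.rm = TRUE, FUN.VALUE = numeric(1)),
  error = function(e) "vapply() stopped: unexpected output length"
)
checked
```

With sapply() the change in output shape goes unnoticed; with vapply() it is flagged right away.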

3. While Loops

Definition of While Loops

We focused on “for loops” because they are the most common, but R, like many other programming languages, also supports “while loops”.

How while loops work:

  • Evaluate a condition
  • If the condition is TRUE, run the loop body
  • Start over: re-evaluate the condition
  • Repeat until the condition is FALSE (then the while loop is over)

While Loop Syntax

Syntax:

while (condition to be evaluated) {
  statement(s)
}

Example:

counter <- 1

while(counter <= 4) {
  print(counter)
  counter <- counter + 1
}

While Loop Examples

Take the previous code (now with the condition counter <= 3), but this time also print counter at the end of each iteration:

counter <- 1

while(counter <= 3) {
  print(counter)
  counter <- counter + 1
  print(counter)
}

Why are the results different from the previous code?

While Loop Examples

Take the previous code, but this time we do not increment our counter variable.

counter <- 1
while(counter < 3){
  print(counter)
}

What is the output of this code?
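
Since counter is never updated inside the loop, the condition counter < 3 stays TRUE and the loop keeps running until the code is interrupted manually. One common guard is a cap on the number of iterations; the sketch below assumes hypothetical iterations and max_iterations variables and omits the print() call to keep the output short.

```r
counter <- 1
iterations <- 0
max_iterations <- 5  # hypothetical safety cap, chosen arbitrarily

while (counter < 3) {
  iterations <- iterations + 1
  # counter is never incremented, so without the guard this would never stop
  if (iterations >= max_iterations) {
    break  # leave the loop even though the condition is still TRUE
  }
}

iterations
```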

While Loop Examples

What is the output of this code?

counter <- 1
while(counter < 4){
  print(counter)
  multiply <- counter * 100
  print(multiply)
  counter <- counter + 1
  print(counter)
}

While Loop Uses

While Loops are useful when you don’t know in advance how many times to iterate — you want the loop to continue until a condition is met.

Example use cases:

  • Keep looping until you get three heads in a row from random coin flips
  • Keep accepting user input until you reach a target number of responses
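
The coin-flip case can be sketched as follows. The variable names and the seed are illustrative assumptions; the number of flips needed depends on the random draws.

```r
set.seed(123)  # arbitrary seed, only so the run is reproducible

flips <- 0           # total number of flips so far
heads_in_a_row <- 0  # current streak of consecutive heads

# We cannot know in advance how many flips are needed,
# so we loop until the streak reaches three
while (heads_in_a_row < 3) {
  flip <- sample(c("H", "T"), size = 1)
  flips <- flips + 1
  if (flip == "H") {
    heads_in_a_row <- heads_in_a_row + 1
  } else {
    heads_in_a_row <- 0  # a tail resets the streak
  }
}

flips
```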

While loops rely on a variable in the condition (such as a counter) that must be initialized before the loop and updated inside it; otherwise, the condition never becomes FALSE.

While loops are important but less common than for loops, especially for the types of tasks we do in this course. For this reason, we don’t cover them in depth.
