Introduction to R: Lecture 2

Topics: Git/GitHub with RStudio

Sabrina Nardin, Summer 2025

Agenda

  1. Motivation: Automation & Reproducibility
  2. Introduction to Git & GitHub
  3. Deeper Dive into Git & GitHub

1. Motivation

Use programming for data analysis, and use version control to automatically keep track of changes in your work?

Or use point-and-click software for data analysis, and save multiple copies of your work by hand to track changes?

Two Different Approaches

TASK: Write a report on the relationship between income and crime rates in Chicago.

APPROACH: Jane and Sally approach this task differently…

Jane: GUI Workflow

  1. Searches for data files online
  2. Cleans the data using Excel
  3. Analyzes the data in Excel (or similar)
  4. Writes her report in Google Docs
  5. Saves different versions of her work manually

Sally: Programmatic Workflow

  1. Searches for data files online
  2. Cleans the data using R
  3. Analyzes the data in R
  4. Writes her report in R Markdown
  5. Tracks changes automatically using Git

Two Main Advantages of a Programmatic Workflow

1. Automation

  • Uses programs (e.g., R) to perform tasks systematically
  • Reduces manual effort and human error
  • Enables fast and consistent repetition of analyses

2. Reproducibility

  • Scientific research should share both data and code used for analysis
  • Allows verification and reuse by others
  • Enables exact replication of results, even years later

2. Introduction to Git & GitHub

Often used together, but they are different tools

Git

  • Version control software
  • Runs locally on your computer
  • Tracks changes you make to files in a specific folder that you choose, called a local repository (or repo)

GitHub

  • Cloud-based platform
  • Runs online
  • Stores copies of your local Git repository online, enabling backup and collaboration; you push to and pull from GitHub

Let’s unpack all of this…

What is Version Control?

Version Control

A system that records every change you make to the files in a folder (what changed, who made the change, and when) and lets you undo mistakes or restore previous versions of your files.

Git is software that does version control.


Without Version Control

You have to track changes manually, which can be messy and hard to manage, like:

  • analysis-1.R
  • analysis-2.R
  • analysis-final.R
  • analysis-final-FINAL.R


How Version Control (Git) Works

  • You choose a folder on your computer: this becomes your local repository or local repo
  • A Version Control tool, like Git, records and saves changes you make to the files in that folder
  • Every change is saved with a time stamp, author info, and a message (that you type in)

You can revisit your project’s history and restore earlier versions of a file if needed!
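The steps above can be sketched in a terminal. This is only an illustration under stated assumptions, not part of the course setup (in class we use the RStudio Git tab): the temporary folder, the file name analysis.R, and the author details are placeholders invented for the example.

```shell
set -eu   # stop at the first error

# Placeholder folder for the demo; in practice this is your project folder
repo="$(mktemp -d)"
cd "$repo"

git init --quiet                          # the folder becomes a local repository
git config user.name  "Sally Example"     # placeholder author info, stored with every commit
git config user.email "sally@example.com"

# First change, saved with a message
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of analysis script"

# Second change to the same file
echo 'x <- 2' > analysis.R
git add analysis.R
git commit --quiet -m "Change x to 2"

# Revisit the history: short hash, author, date, and message of each commit
git log --format='%h | %an | %ad | %s'

# Restore the file as it was one commit earlier
git checkout HEAD~1 -- analysis.R
restored="$(cat analysis.R)"
echo "Restored content: $restored"
```

Both versions of the file stay in the history, so restoring the older one does not delete the newer one.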


How Version Control (Git) + GitHub Works

  • You can link your local Git repository to an online GitHub repository
  • So you can push your local changes to GitHub
  • And share your work with others
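As a rough sketch, linking and pushing look like this in a terminal. To keep the example runnable offline, a bare repository in a temporary folder plays the role of the GitHub repository; that stand-in, the folder names, and the author details are assumptions of the example. With a real GitHub repository only the URL would change.

```shell
set -eu   # stop at the first error

work="$(mktemp -d)"

# Demo assumption: a bare repository on disk plays the role of the GitHub repo
git init --quiet --bare "$work/remote.git"

# Local repository with one commit
git init --quiet "$work/local"
cd "$work/local"
git config user.name  "Sally Example"      # placeholder author info
git config user.email "sally@example.com"
echo "# Income and crime in Chicago" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main                         # name the current branch "main"

# Link the local repo to the remote, conventionally named "origin".
# With GitHub, the URL would look like https://github.com/<user>/<repo>.git
git remote add origin "$work/remote.git"

# Push local commits; -u remembers origin/main as the default for later pushes and pulls
git push --quiet -u origin main

# Both sides now point at the same commit
local_head="$(git rev-parse HEAD)"
remote_head="$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)"
echo "local:  $local_head"
echo "remote: $remote_head"
```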


Git & GitHub Comparison

Feature        | Git                               | GitHub
What it is     | Version control software          | Cloud platform for Git repos
Where it runs  | Locally (your computer)           | Online (web-based)
What it does   | Tracks changes in your local repo | Stores & shares remote copies
Collaboration  | Not built-in                      | Pull requests, issues, code reviews
Use in RStudio | Git tab                           | Connect via setup or terminal
Repo           | Local repo = your folder          | Remote repo = copy on GitHub


Note: we use Git and GitHub via RStudio only. They are also used with other tools like GitHub Desktop, terminal, etc.


Okay, that was a lot of info. Let’s put it into practice!

We’ll go through it in two steps:

1. Configure your Setup: Set up Git, GitHub, and RStudio so everything works together

2. Try It Out in RStudio: Git & GitHub tutorial to practice using version control in RStudio

💻 Step 1: Configure your Setup

Complete

https://brinasab.github.io/csp-website/setup/setup-workbench.html

💻 Step 2: Use Git and GitHub in RStudio Tutorial

Once Step 1 is done, complete:

https://brinasab.github.io/csp-website/setup/setup-test.html


Instructions:

  • Work in pairs: one person (the more experienced with programming) reads the instructions and guides the other, who executes the commands.

  • Keep track of your questions as you go and post them in this Google doc

  • Raise your hand if you need help!


Recap: Beginner-Friendly Git & GitHub Workflow

  1. Make & Save Changes Locally
    Edit your files locally (e.g., in RStudio or Workbench) and save them.

  2. Pull from GitHub
    Refresh your local copy to get the latest changes—helps avoid conflicts when collaborating.

  3. Stage Changes
    Select which files you want Git to track in the next snapshot.

  4. Commit with a Message
    Save a snapshot of your staged changes in your local Git repo. Add a short, meaningful message.

  5. Push to GitHub
    Upload your committed changes to the online repository.
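For reference, the five steps above correspond roughly to the terminal commands below (in RStudio, the Git tab buttons do the same job). This is a sketch under assumptions: a bare repository on disk stands in for GitHub so the example runs offline, and the file hw1.R and its content are invented.

```shell
set -eu   # stop at the first error

work="$(mktemp -d)"

# --- Demo setup only: a bare repo on disk stands in for the GitHub repo ---
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/homework" 2>/dev/null
cd "$work/homework"
git config user.name  "Sally Example"      # placeholder author info
git config user.email "sally@example.com"
echo "# Homework" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main
git push --quiet -u origin main

# 1. Make and save changes locally
echo 'summary(cars)' > hw1.R

# 2. Pull from the remote to pick up any newer commits
git pull --quiet

# 3. Stage the files for the next snapshot
git add hw1.R
git status --short

# 4. Commit with a short, meaningful message
git commit --quiet -m "Add summary of cars dataset"

# 5. Push the commit to the remote
git push --quiet

pushed_message="$(git --git-dir="$work/remote.git" log -1 --format=%s main)"
echo "Latest commit on the remote: $pushed_message"
```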

Important

You will complete this workflow for all homework assignments!

Git Reminders

  • Stage and commit often
    Think of commits as snapshots of your work. Save, stage, and commit regularly during your workflow.

  • Write clear, useful commit messages
    Keep messages concise but descriptive. They should explain what changed and why. Search online for “commit message best practices” for more tips.

  • Push regularly, but not necessarily every time you commit
    Some people push every time they commit, while others push multiple commits at once. Experiment and find what works best for you.

  • Work locally, and push to GitHub
    Do not do it the other way around (e.g., do not modify your GitHub repository directly; work on your local copy, and push changes to GitHub).

💻 Accept Homework 1

Accept Homework 1 from the course site and follow the instructions.

Note: this will only work if we have your GitHub username and you accepted the invite to our GitHub organization (which was sent to the email linked to your GitHub account)!

3. Deeper Dive into Git & GitHub


Come back here once you are familiar with the basic workflow! These slides won’t be included in the in-class quiz, but they might be handy if you run into problems in homework assignments.


What to Commit / Not to Commit

✅ What to Commit

  • Code files
  • Markdown or Quarto files
  • Small data files
  • README and documentation

🚫 What Not to Commit

  • Temporary files (e.g., .Rproj.user/)
  • Log or output files
  • Files with private information
  • Files larger than 100 MB


What Not to Commit goes in the “.gitignore” file

  • Tells Git what to ignore from your folder
  • Use a template (search for R-specific template, which works well for most class projects)
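A small sketch of what this looks like in practice. The patterns below are typical of R-specific templates but are chosen here for illustration only (a real template contains more entries), and the file names are invented.

```shell
set -eu   # stop at the first error

repo="$(mktemp -d)"    # placeholder project folder
cd "$repo"
git init --quiet

# A few illustrative ignore rules, one pattern per line
cat > .gitignore <<'EOF'
# RStudio user-specific files
.Rproj.user/
# R session files
.Rhistory
.RData
# Log files
*.log
EOF

# Demo files: one that should be tracked, two that should be ignored
mkdir -p .Rproj.user
touch .Rproj.user/state analysis.R run.log

# Lists only .gitignore and analysis.R as untracked; ignored files do not appear
git status --short

# Ask Git directly which of these paths match an ignore rule
ignored="$(git check-ignore .Rproj.user/state run.log analysis.R)"
echo "Ignored paths:"
echo "$ignored"
```

The .gitignore file itself is a regular file in the repository, so it is committed like any other.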


Git Large File Storage (Git LFS)

What if you need to track a file larger than 100 MB?

GitHub does not allow pushing files over 100 MB. If you have such a file, don’t try to push it directly.

Instead:

  • Use Git Large File Storage (Git LFS)
  • A separate tool that integrates with Git
  • Designed specifically for large files (e.g., datasets, media)
  • Note: Git LFS storage on GitHub often comes with usage limits and may require a paid plan
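Before committing, it can help to check whether any file in the project is over the limit. The sketch below assumes a find command that accepts the M size suffix (GNU and BSD versions do); the folder and file names are invented. The Git LFS commands appear only as comments because they require the separate git-lfs tool to be installed.

```shell
set -eu   # stop at the first error

project="$(mktemp -d)"     # placeholder for your project folder
cd "$project"

# Demo files: a small script and an empty (sparse) 101 MiB file
echo 'x <- 1' > analysis.R
dd if=/dev/zero of=big-data.bin bs=1048576 count=0 seek=101 2>/dev/null

# List files above 100 MiB, skipping Git's own .git folder
too_big="$(find . -type f -size +100M ! -path './.git/*')"
echo "Files too large for a regular push:"
echo "$too_big"

# With the git-lfs tool installed, such a file could be handed to Git LFS, e.g.:
#   git lfs install
#   git lfs track "*.bin"
#   git add .gitattributes big-data.bin
```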

Git Conflicts

A Git conflict happens when Git doesn’t know which version of a file to keep.

Git gets confused because the same file was changed in two places.

Git conflicts are more common in shared repositories. They shouldn’t occur when you’re working alone in your own repository (as you will in this course), but they can still happen…

When Do Conflicts Happen?

Collaborative Work
- You and a teammate edit the same file (or even the same line)
- You push changes without pulling the latest version from GitHub first
- Tip: Always run git pull before git push


Working Solo
- You make changes locally and also edit the same file directly on GitHub
- Git can’t tell which version to keep
- Tip: Always work in your local repository and use the workflow from the Recap slide to push to GitHub

What Causes a Conflict?

A Git conflict may happen when Git finds two competing versions of the same file: one in your local repo (Git, on your computer) vs. one in the remote repo (GitHub, online).

The local and remote repos are expected to match. If they don’t, Git tries to merge them automatically.

Two Possible Outcomes

No conflict:
If the changes are in different parts of the file, Git merges automatically and no action is needed.

Conflict:
If the same part of the file was changed in both versions, Git cannot decide what to keep.
→ You must manually review and resolve the conflict.

What a Git Conflict Looks Like

When a conflict happens, Git marks the file with special lines to show the two versions:

<<<<<<< HEAD
This is your version (from your local repo)
=======
This is the other version (from GitHub or your teammate)
>>>>>>> main

To solve it, you need to use Git with the terminal (ask us for help if that occurs!)
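For the curious, here is a sketch of how a conflict arises and how it can be resolved in the terminal. It makes several assumptions: a local branch named teammate simulates the other version (in the course, the other version would arrive from GitHub when you pull), and the file and values are invented.

```shell
set -eu   # stop at the first error

repo="$(mktemp -d)"    # placeholder project folder
cd "$repo"
git init --quiet
git config user.name  "Demo User"          # placeholder author info
git config user.email "demo@example.com"

echo "income <- 100" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"
git branch -M main

# The other version: a teammate's change, simulated on a separate branch
git checkout --quiet -b teammate
echo "income <- 300" > analysis.R
git commit --quiet -am "Teammate sets income to 300"

# Your version: a different change to the same line
git checkout --quiet main
echo "income <- 200" > analysis.R
git commit --quiet -am "Set income to 200"

# Merging now conflicts, so git merge exits with an error status
git merge teammate || echo "Merge stopped: conflict in analysis.R"

# The file now contains both versions between the conflict markers
conflicted="$(cat analysis.R)"
echo "$conflicted"

# Resolve: write the version you want to keep (no markers), then stage and commit
echo "income <- 300" > analysis.R
git add analysis.R
git commit --quiet -m "Resolve conflict: keep teammate's value"
```

Resolving always comes down to the same three moves: edit the file so that only the wanted content remains, stage it, and commit.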

Avoiding (Most) Git Conflicts

Rather than solving conflicts, we want to avoid them as much as possible with good practices:

  • Make changes locally, then stage and commit (early and often)
  • Push regularly to keep your GitHub repo in sync
  • Always pull before you push to avoid conflicts

Key Sequence to Use:

git pull → make changes → git add + git commit → git push

Burn it All Down

Recap: What We Learned Today

  • Became familiar with Git & GitHub
  • Learned how to use Git & GitHub within RStudio

Reminders

  • Homework 1 is now open and due next week. Check the Course Schedule on our website for the exact due date.
  • If you haven’t completed these yet, please do so ASAP, as you won’t be able to access and complete Homework 1 otherwise:
    • Register a GitHub account and submit your username (see Lecture 1)
    • Set up your computer (see today’s lecture)

To print these slides as a PDF

Click on the icon in the bottom-right corner > Tools > PDF Export Mode > Print as a PDF