HW02: Exploring Data

Overview

Due: Thursday, July 17th by 11:59 PM (Chicago Time)

In the first assignment, you demonstrated knowledge of your software setup, Git/GitHub via RStudio, and R Markdown.

The goal of this second assignment is to practice transforming and exploring data with dplyr and ggplot2.

Accessing and Cloning Your hw02 Repository

Important

This will work only if you have submitted your GitHub username (see Lecture 1) and accepted the email invitation to join our GitHub organization (check the email linked to your GitHub account). Make sure you’ve completed the setup under Software Setup and confirmed everything is working properly.

Accessing the repo:

  1. Go to this link to accept the invitation and create your private hw02 repository on GitHub. Your repository will be named hw02-<USERNAME> and will be ready within a few seconds. If this does not work, see the Important note above.

  2. Once the repository has been created, click the link provided to access it.

Cloning the repo into your Workbench:

  1. Log into Workbench and start a new project:
    File > New Project > Version Control > Git

  2. In the Repository URL field:

    • Paste the URL of your GitHub repository. To find the URL, go to your hw02 repo on GitHub, click the green Code button, and copy the SSH or HTTPS link (use the method you configured during setup: Workbench users must select SSH).
    • The project name should auto-fill. If not, type hw02-<USERNAME> manually.
  3. Choose where to save the project directory locally.
    Tip: Use a dedicated folder such as homeworks to keep things organized and avoid moving folders after cloning them.

  4. Check the box “Open in new session” (optional but recommended for keeping projects separate and organized).

  5. Click Create Project. This will:

    • Create a new folder on your machine
    • Initialize a Git repository linked to GitHub
    • Open an RStudio Project in a new session

General Homework Workflow

See Homework 1 for details. The workflow is the same for all assignments (e.g., accept the repo, edit the files, save, then Pull – Stage – Commit – Push).

Assignment Description

The mass shooting dataset

The United States experiences far more mass shooting events than any other developed country in the world. Policymakers, politicians, the media, activists, and the general public acknowledge the widespread prevalence of these events. However, effective policies to prevent such incidents should be grounded in empirical data.

In July 2012, in the aftermath of a mass shooting in a movie theater in Aurora, Colorado, Mother Jones published a report on mass shootings in the United States since 1982. Importantly, they released the underlying dataset as an open-source database for anyone interested in studying and understanding the topic.

This dataset is already installed on Workbench.

Answer the questions

Your repository includes specific and open-ended questions. Answer all of them.

Note
  • The questions—especially the open-ended ones—are designed to help you think like a university student and future researcher. They won’t walk you through every step but encourage you to apply what you’ve learned and work independently.

  • Plan to complete this over multiple sessions. Start early, and remember to save, commit, and push your work regularly.

  • Do not use AI tools to generate R code from scratch. You may use them to debug (after trying on your own) or to look up how functions work (check course materials first). Only submit code you fully understand.

Formatting Guide

Formatting graphs

While you are practicing Exploratory Data Analysis, your final graphs should be appropriate for sharing with outsiders. That means your graphs should have:

  • A title
  • Labels on the axes (type ?labs in your Console for details)

Consider adopting your own color scales, taking control of your legends (if any), playing around with themes, and generally customizing your graphs to improve their visual appeal and clarity.
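
As a rough illustration of these formatting points, the sketch below builds a bar chart with a title, axis labels, a custom fill color, and a non-default theme. It is only one possible styling, not a required template. The small location_counts data frame is typed in by hand from the location counts shown in the Formatting tables section of this guide; the object names location_counts and p are placeholders chosen for this example.

```r
library(ggplot2)

# Counts copied by hand from the location_type table shown in this guide.
location_counts <- data.frame(
  location_type = c("Airport", "Military", "Other", "Religious", "School", "Workplace"),
  n = c(1, 6, 49, 6, 18, 45)
)

# Bar chart with a title, axis labels, a custom fill, and a non-default theme.
# reorder() sorts the bars by count; coord_flip() makes the labels easier to read.
p <- ggplot(location_counts, aes(x = reorder(location_type, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Mass shootings in the United States by location",
    x = "Location type",
    y = "Number of incidents"
  ) +
  theme_minimal()

p
```

Because labs() names each axis by its aesthetic, the x label still describes the location type after coord_flip() moves it to the vertical axis.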

Formatting tables

When presenting tabular data (using dplyr::summarize()), consider using the kable() function from the knitr package to format the table for the final document. Here is how you use this function.

The code below displays a basic table counting mass shootings by location type:

# count mass shootings by location type
count(mass_shootings, location_type)
# A tibble: 6 × 2
  location_type     n
  <chr>         <int>
1 Airport           1
2 Military          6
3 Other            49
4 Religious         6
5 School           18
6 Workplace        45

This is the same code but we added kable() to format the table, add a caption, and label the columns. Observe what it does:

count(mass_shootings, location_type) %>%
  kable(
    caption = "Mass shootings in the United States by location",
    col.names = c("Location", "Number of incidents")
  )

Mass shootings in the United States by location

Location    Number of incidents
Airport     1
Military    6
Other       49
Religious   6
School      18
Workplace   45

Run ?kable in your Console for additional options.
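
The same pattern applies to tables produced with summarize(). The sketch below is a minimal example that uses the built-in mtcars dataset purely as a stand-in (it is not part of this assignment) and shows the digits argument of kable(), which rounds numeric columns. The names mpg_by_cyl and mpg_table are placeholders chosen for this example.

```r
library(dplyr)
library(knitr)

# mtcars ships with R; it is used here only as a stand-in dataset.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    cars = n(),            # number of rows (cars) in each group
    mean_mpg = mean(mpg)   # average miles per gallon in each group
  )

# Format the summary: caption, readable column names, one decimal place.
mpg_table <- kable(
  mpg_by_cyl,
  caption = "Fuel economy by number of cylinders (mtcars)",
  col.names = c("Cylinders", "Number of cars", "Mean miles per gallon"),
  digits = 1
)

mpg_table
```

Rounding in kable() rather than inside summarize() keeps the full-precision values available for any later calculations.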

Submit the Assignment

To submit your assignment on Canvas, follow these steps:

  1. Before the deadline, push the final version of your work to your GitHub repository. Tip: don’t wait until the end to make your first commit. Stage, commit, and push your work regularly as you go.

    Your GitHub repository should include (make sure to stage-commit-push all these files):

    • mass-shootings.Rmd: main file you will work on
    • mass-shootings.md: you will generate this file from the .Rmd by simply knitting it, like you did for HW01 (do not modify the output format)
    • mass-shootings_files/: this folder contains all the graphs that you generated in your .Rmd
  2. When you’re ready to submit:

    • Copy the URL of your GitHub repository. It will look something like: https://github.com/csp-summer25/hw02-yourusername
    • Submit that URL on Canvas under HW02. Do not upload any files to Canvas. We only need the repository link.
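
Before copying the URL, it may help to confirm that the three required items exist in your project folder. The sketch below is a hypothetical helper, not part of the course materials: the function name missing_submission_files and its project_dir argument are made up for this example. It only checks your local folder, so you still need to stage, commit, and push the files for them to appear on GitHub.

```r
# Hypothetical helper (not part of the course materials): returns the names of
# required submission items that are missing from a project directory.
missing_submission_files <- function(project_dir = ".") {
  required_files <- c("mass-shootings.Rmd", "mass-shootings.md")
  required_dirs <- c("mass-shootings_files")

  # Keep only the names whose paths do not exist in project_dir.
  missing_files <- required_files[!file.exists(file.path(project_dir, required_files))]
  missing_dirs <- required_dirs[!dir.exists(file.path(project_dir, required_dirs))]

  c(missing_files, missing_dirs)
}

# Run from the Console with the hw02 project open; an empty result means
# all three items are present locally.
missing_submission_files()
```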

Assessment

All homework assignments are evaluated using a rubric (for more details, see the Assessment section of the Syllabus). Below are further guidelines for this specific homework to help you assess your work before submitting it.

In the past, “Excellent” or “Very Good” work included submissions that:

  • Completed all parts of the assignment correctly and accurately.
  • Included code that:
    • Ran without errors
    • Followed good style (clean, readable, well-organized)
    • Was well-documented but not over-commented
    • Used appropriate complexity and matched the prompt and course content
    • Demonstrated understanding of key concepts and functions
  • Included graphs and tables that were:
    • Well-chosen for the variable types
    • Clearly labeled, visually polished, and customized beyond defaults
    • Interpreted in a clear and concise analysis
  • Showed strong use of:
    • Required packages, showing curiosity to use them beyond the basics
    • R Markdown syntax (e.g., no rendering errors, appropriate use of syntax, etc.)
  • Followed good GitHub practices:
    • Multiple informative commits showing progress over time
    • All required files present and displaying correctly, including rendered graphs