Working with data

Overview

  • Originally presented: Day 2, June 2, 2026
  • Lead: Elizabeth Prom-Wormley
  • Topics: assumptions, QC, imputation, PLINK and R basics, phenotype distributions, and transformations

Additional Reference Reading

  • Anderson et al 2010 Nat Protoc; Data quality control in genetic case-control association studies
  • Winkler et al 2014 Nat Protoc; Quality control and conduct of genome-wide association meta-analyses
  • Uffelmann et al 2021 Nature Reviews Methods Primers; Genome-wide association studies (a strong overall summary with useful tables, boxes, and a list of references on specific topics)

Lectures

This lecture series can be viewed as a .

Quality control


Imputation


PLINK 101

This series of lectures can be viewed as a .

Some students enjoy using Swirl, an interactive R package, to learn R.

Students are also welcome to walk through a 4-part introduction with videos and the accompanying scripts and data. This series of videos is appropriate for learners who have had little or no prior exposure to R and who want to prepare to participate in the hands-on activities throughout the workshop. By the end of the videos, learners will be able to produce the basic summary statistics for phenotypic data that are typically computed before running a GWAS.

R Basics: Downloading and installing R and RStudio (optional)

Installing R and RStudio is NOT necessary to complete the workshop exercises.


Finding, opening, and reviewing files in R


Data management in R


Graphics and basic statistics in R

Practicals