introduction_to_gwas

Material for the Course “Introduction to genome-wide association studies (GWAS)”

Instructors: Filippo Biscarini, Oscar Gonzalez-Recio, Christian Werner

This course introduces students, researchers and professionals to the steps needed to build an analysis pipeline for Genome-Wide Association Studies (GWAS). It describes each step involved in a typical GWAS; these steps are then assembled into a reusable and reproducible bioinformatics pipeline.

Each day the course will start at 14:00 and end at 20:00 (CET). As a general rule, we’ll have a longer break (30 minutes) at 16:00 and two shorter breaks (10-15 minutes) later in the day, timed flexibly depending on the sessions.

Day 1

Lecture 0 General Introduction / Overview of the Course [Filippo, Oscar, Christian]
- General Introduction
- GWAS Workflow (short)
Lecture 1 GWAS Overview: Case Studies / Examples from Literature [Oscar]
- GWAS Overview
Lecture 2 Introduction to GWAS: Linkage Disequilibrium and Linear Regression [Oscar]
- Introduction to GWAS
Lab 1 (Demonstration) GWAS: Basic Models (Linear and Logistic Regression) [Oscar]
- R code. Exercise on Simple Linear Regression
- Rmarkdown Code. Exercise on Simple Logistic Regression
Lab 2 - Description of Datasets [Christian]
- Description of Datasets
Course Manual
GWAS Workflow

Day 2

Lecture 3 The Multiple Testing Issue [Oscar]
- Multiple Testing
- R code. Exercise on multiple testing correction
Lecture 4 Statistical Power, Population Stratification and Experimental Design [Oscar]
- Statistical Power and Population Stratification
- R code. Exercise on statistical power
Lecture 5 Initial Data Analysis, Exploratory Data Analysis and Data Pre-Processing [Christian]
- Brief Genotyping overview
- IDA, EDA & Data Pre-Processing
Lab 3 GWAS: a first simple exercise for you! [Christian, Filippo]

Day 3

Lab 4 Data filtering and mean/median imputation in R [Filippo]
Lab 5 GWAS: The Stand-Alone Script(s) for the Full Model [Filippo]
Lecture 6 KNN Imputation
- KNN Imputation
Lab 6 (Demonstration) KNN Imputation [Filippo] [OPTIONAL]
- knni_illustration.Rmd
- [data_for_KNNI_illustration]
- knni_tidymodels.R
- [02_knni.sh] [support script]
- [hamming.R] [support script]
- [knni.R] [support script]
Lecture 7 Working in the shell [Christian]
- Linux and the Shell [OPTIONAL]
- Common Data Types and Formats
Lecture 8 Imputation of Missing Genotypes [Christian]
- Imputation
Lab 7 Imputation of Missing Genotypes using Beagle [Christian]

Day 4

Lecture 9 Brief Intermission:
- R code PCA & Population Structure
- Imputed rice genotypes
Lab 8 Revising the Steps involved in GWAS [Filippo]
Lab 9 Introducing the Exercise [Filippo]
- Collaborative Exercise
Collaborative Exercise: let’s build our own GWAS workflow on new data. Pig (Sus scrofa) data. [Filippo, Oscar, Christian]
- Part 1: Individual/Group Break-Out Sessions to give it a try independently
- Part 2: Whole-Group Revision of the Exercise: step-by-step (1.get_data; 2.filter; 3.imputation; 4.GWAS)
- exercise tips
Bonus exercise [Optional] (Parus major data)

Day 5

Lecture 10 A light Touch on Post-GWAS Analysis: Inferring Functionality [Oscar]
- slides
- R code. Exercise on R and FUMA
Lecture 11 GWAS Model Extensions and Applications: [Filippo, Christian, Oscar]
- 12.1 GWAS Model Extensions_Dominance_and_other_genotype_Codifications
- 12.2 GWAS Model Extensions_Polyploids
  - [R code GWASpoly (vignette)]
- 12.3 GWAS Model Extensions_Trait_Types: categorical, longitudinal
- 12.4 GWAS Model Extensions Multi-Trait Multi-Locus models & software
- 12.5 A bioinformatic pipeline for GWAS
  - slides
- 12.6 Additional software for GWAS
  - gemma
  - regenie
- ROH-based and Resampling Methods as alternative approaches
- Applications of GWAS: Mendelian Randomization
Final Quiz on what we learned about GWAS! [Filippo, Oscar, Christian]
Conclusions and Wrap-Up Discussion on GWAS [Filippo, Oscar, Christian]

Organization of the Code for the practical Sessions

the GWAS workflow in R
preparatory_steps: download and prepare the data
preprocessing: filter the data
imputation: imputing missing genotypes
gwas: run the GWAS models
power_and_significance: designing GWAS experiments
steps: identifying the individual steps involved in a GWAS study
pipeline: assembling the individual steps into a bioinformatics pipeline for GWAS
collaborative exercise: trying out what we learnt on new data
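
As a rough orientation, the preprocessing, imputation and gwas steps listed above can be sketched on toy data as follows. This is a minimal illustration only, not the course code (which is written in R): the data, the function names (`filter_snps`, `impute_mean`, `regress`, `run_gwas`) and the thresholds are all made up for this example, the p-value uses a normal approximation rather than the t distribution, and the significance threshold is a plain Bonferroni correction.

```python
import math
from statistics import mean

# Toy data (hypothetical): rows = samples, columns = SNPs,
# genotypes coded 0/1/2, None = missing call.
genotypes = [
    [0, 0, 2, 0],
    [1, 0, 2, None],
    [2, 0, 0, None],
    [0, 0, 1, None],
    [1, 0, None, 1],
    [2, 0, 0, None],
]
phenotype = [1.0, 2.1, 2.9, 1.2, 2.0, 3.1]


def column(matrix, j):
    """Return SNP j as a list across samples."""
    return [row[j] for row in matrix]


def call_rate(snp):
    """Fraction of non-missing genotypes."""
    return sum(g is not None for g in snp) / len(snp)


def maf(snp):
    """Minor allele frequency from the observed genotypes."""
    observed = [g for g in snp if g is not None]
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq)


def filter_snps(matrix, min_call_rate=0.8, min_maf=0.05):
    """Preprocessing: indices of SNPs passing call-rate and MAF filters."""
    return [
        j for j in range(len(matrix[0]))
        if call_rate(column(matrix, j)) >= min_call_rate
        and maf(column(matrix, j)) >= min_maf
    ]


def impute_mean(snp):
    """Imputation: replace missing genotypes with the SNP mean."""
    m = mean(g for g in snp if g is not None)
    return [m if g is None else g for g in snp]


def regress(x, y):
    """GWAS: simple linear regression of phenotype on one SNP.

    Returns the slope and an approximate two-sided p-value
    (normal approximation, crude for small samples).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p_value = math.erfc(abs(slope / se) / math.sqrt(2))
    return slope, p_value


def run_gwas(matrix, y, alpha=0.05):
    """Chain the steps: filter -> impute -> test each SNP -> Bonferroni."""
    kept = filter_snps(matrix)
    threshold = alpha / len(kept)  # Bonferroni-corrected threshold
    results = {}
    for j in kept:
        slope, p = regress(impute_mean(column(matrix, j)), y)
        results[j] = (slope, p, p < threshold)
    return results, threshold


results, threshold = run_gwas(genotypes, phenotype)
for j, (slope, p, significant) in results.items():
    print(f"SNP {j}: slope={slope:.3f} p={p:.3g} significant={significant}")
```

In this toy example the second SNP is dropped for being monomorphic and the fourth for its low call rate, so only two SNPs are tested. A real analysis would also account for population structure and relatedness, which this sketch ignores.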