introduction_to_gwas
Material for the Course “Introduction to genome-wide association studies (GWAS)”
Instructors: Filippo Biscarini, Oscar Gonzalez-Recio, Christian Werner
This course will introduce students, researchers and professionals to the steps needed to build an analysis pipeline for Genome-Wide Association Studies (GWAS). The course will describe all the steps involved in a typical GWAS, which will then be combined into a reusable and reproducible bioinformatics pipeline.
Each day the course will start at 14:00 and end at 20:00 (CET).
As a general rule, we’ll have a longer break (30 minutes) at 16:00 and two shorter breaks (10-15 minutes) later in the day, timed flexibly around the sessions.
Day 1
- Lecture 0 General Introduction / Overview of the Course [Filippo, Oscar, Christian]
- Lecture 1 GWAS Overview: Case Studies / Examples from the Literature [Oscar]
- Lecture 2 Introduction to GWAS: Linkage Disequilibrium and Linear Regression [Oscar]
- Lab 1 (Demonstration) GWAS: Basic Models (Linear and Logistic Regression) [Oscar]
- Lab 2 Tidyverse Introduction [Christian]
- Lab 3 Description of Datasets [Christian]
- Course Manual
- GWAS Workflow
Day 2
- Lecture 3 The Multiple Testing Issue [Oscar]
- Lecture 4 Statistical Power, Population Stratification and Experimental Design [Oscar]
- Lecture 5 Initial Data Analysis, Exploratory Data Analysis and Data Pre-Processing [Christian]
- Lab 3 GWAS: a first simple exercise for you! [Christian, Filippo]
Day 3
- Lab 4 Data filtering and mean/median imputation in R [Filippo]
- [filter_genotype_data.R]
- [mean_imputation.R]
- [median_imputation.R]
- Lab 5 GWAS: The Stand-Alone Script(s) for the Full Model [Filippo]
- [gwas_rrblup.R]
- [gwas_statgengwas.R]
- [gwas_sommer.R]
- Lecture 6 KNN Imputation
- Lab 6 (Demonstration) KNN Imputation (KNNI) [Filippo]
- [knni_illustration.Rmd]
- [data_for_KNNI_illustration]
- [knni_tidymodels.R]
- [02_knni.sh] [support script]
- [hamming.R] [support script]
- [knni.R] [support script]
- Lecture 7 Data Types & Formats
- [Common Data Types and Formats]
- Lab 7 EDA & IDA and Pre-Processing with PLINK [Christian]
- [Linux and the Shell]
- [Unix Cheatsheet]
- Lecture 8 Imputation of Missing Genotypes [Christian]
- Lab 8 Imputation of Missing Genotypes using Beagle [Christian]
Day 4
- Lecture 9 Brief Intermission:
- [R code PCA & Population Structure]
- [Imputed rice genotypes]
- Lab 9 Revising the Steps involved in GWAS [Filippo]
- [slides]
- [1.get_data.sh]
- [2.step_filtering.sh]
- [3.step_imputation.sh]
- [4.gwas.sh]
- Lab 10 Introducing the Exercise [Filippo]
- Collaborative Exercise: let’s build our own GWAS workflow on new data [Filippo, Oscar, Christian]
- Part 1: Individual/Group Break-Out Sessions to give it a try independently
- Part 2: Whole-Group Revision of the Exercise
- Lecture 10 Bioinformatics Pipelines: a super-elementary Introduction [Filippo]
- [A bioinformatics pipeline for GWAS]
- Lab 11 Building a Pipeline with Snakemake [Filippo]
- Lab 12 The GWAS pipeline for Continuous Phenotypes [Filippo]
- Plug-In for Mean or KNN Imputation
- The GWAS pipeline for Binary Phenotypes (Guided Exercise) [Filippo]
Day 5
- Lecture 11 A Light Touch on Post-GWAS Analysis: Inferring Functionality [Oscar]
- [slides]
- [R code: Exercise in R and FUMA]
- Lecture 12 GWAS Model Extensions: [Filippo]
- [12.1 GWAS Model Extensions_Dominance_and_other_genotype_Codifications]
- [12.2 GWAS Model Extensions_Polyploids]
- [12.3 GWAS Model Extensions_Trait_Types]
- [12.4 GWAS Model Extensions_Multi-Trait-Locus, software]
- [R code GWASpoly (vignette)]
- [R code GWAS for categorical Traits]
- [R code GWAS for categorical Traits - Examples]
- [R code GWAS for longitudinal Traits]
- [R code GWAS for multi-trait and multi-locus Models]
- Final Quiz on what we learned about GWAS! [Filippo, Oscar, Christian]
- Conclusions and Wrap-Up Discussion on GWAS [Filippo, Oscar, Christian]
Organization of the Code for the practical Sessions
- the GWAS workflow in R
- preparatory_steps: download and prepare the data
- preprocessing: filter the data
- imputation: imputing missing genotypes
- gwas: run the GWAS models
- power_and_significance: designing GWAS experiments
- steps: identifying the individual steps involved in a GWAS study
- pipeline: assembling the individual steps into a bioinformatics pipeline for GWAS
- collaborative exercise: trying out what we learnt on new data