Plink2 tutorial

By Ruzhang Zhao & Jingning Zhang, Department of Biostatistics, Johns Hopkins University

 

This document is a short tutorial on how to use the plink2 tool.


Tool Usage

  1. For local usage: go to the folder containing plink2 and run it as ./plink2
  2. For JHPCE usage: the plink2 tool path is /dcl01/chatterj/data/tools/plink2
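
A minimal sketch of the two setups above. PLINK2 is a hypothetical variable name, and the commented JHPCE line assumes the listed path points at the binary itself rather than at a folder containing it.

```shell
# Local usage: run from the folder that contains the plink2 binary.
PLINK2="${PLINK2:-./plink2}"
# JHPCE usage (assumption: the path is the binary itself):
# PLINK2=/dcl01/chatterj/data/tools/plink2

if [ -f "$PLINK2" ] && [ -x "$PLINK2" ]; then
    "$PLINK2" --version        # print the plink2 build in use
else
    echo "plink2 not found at $PLINK2"
fi
```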

 


Example files download

Example file

 


Files: bed vs pgen

File differences:

  1. plink 1 binary files (.bed .fam .bim)

    --bfile to read .bed .bim .fam files.

  2. plink 2 binary files (.pgen .psam .pvar)

    --pfile to read .pgen .psam .pvar files (--bpfile instead reads a .pgen file together with .bim and .fam files).

  3. Convert bed files to pgen files
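
A conversion along these lines might look as follows. This is a sketch, not the tutorial's original command: the prefixes example and example_pgen are hypothetical, and the command is only printed unless a plink2 binary is found at $PLINK2.

```shell
PLINK2="${PLINK2:-./plink2}"   # path to the plink2 binary (assumption)
IN=example                     # hypothetical prefix of example.bed/.bim/.fam
OUT=example_pgen               # hypothetical output prefix

# --bfile reads the plink 1 fileset; --make-pgen writes
# $OUT.pgen, $OUT.pvar and $OUT.psam.
set -- --bfile "$IN" --make-pgen --out "$OUT"
CMD="$PLINK2 $*"
echo "$CMD"

# Run the command only when the binary is actually present.
if [ -f "$PLINK2" ] && [ -x "$PLINK2" ]; then
    "$PLINK2" "$@"
fi
```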

 


General Filtering Criteria

  1. --maf 0.01 (filter out variants with minor allele frequency below 0.01)
  2. --hwe 1e-08 (filter out variants with a Hardy-Weinberg equilibrium test p-value below 1e-08)
  3. --mach-r2-filter 0.8 2 (filter out variants whose MaCH imputation quality metric lies outside [0.8, 2])
  4. --keep-males (keep only male samples; plink 1.9 called this flag --filter-males)

Example of applying the criteria to get filtered files
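
A hedged sketch of such a filtering run, using only the --maf and --hwe thresholds from the list above; the other criteria can be appended to the same command. The prefixes example_pgen and example_filtered are hypothetical, and the command is only printed unless a plink2 binary is found at $PLINK2.

```shell
PLINK2="${PLINK2:-./plink2}"   # path to the plink2 binary (assumption)
IN=example_pgen                # hypothetical prefix of .pgen/.pvar/.psam
OUT=example_filtered           # hypothetical output prefix

# Apply the MAF and HWE filters, then write the surviving
# variants to a new pgen fileset.
set -- --pfile "$IN" \
       --maf 0.01 \
       --hwe 1e-08 \
       --make-pgen \
       --out "$OUT"
CMD="$PLINK2 $*"
echo "$CMD"

# Run the command only when the binary is actually present.
if [ -f "$PLINK2" ] && [ -x "$PLINK2" ]; then
    "$PLINK2" "$@"
fi
```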

More filtering criteria can be found in the plink2 manual.

 


GWAS Example

The example runs a GWAS for two phenotypes, Glucose and HBA1C.

  1. --extract : keep only the SNPs listed in the given file.

  2. --keep : keep only the samples listed in the given file.

  3. --snps-only : keep only SNP variants; variants with multi-character allele codes (e.g. indels) are excluded.

  4. --pheno : phenotype file; its rows must correspond to the sample IDs.

  5. --covar : covariate file for adjustment, e.g. genetic principal components to adjust for population stratification.

  6. --rm-dup exclude-all : exclude all duplicated SNPs.

  7. --glm : basic association analysis on quantitative and/or case/control phenotypes (run ./plink2 --help glm for more information, or refer to the plink2 manual).

    hide-covar : do not report results for the covariates.

    --covar-variance-standardize : standardize the covariates to zero mean and unit variance; a covariate on a large scale, such as age, can be problematic without standardization.

    (Note that there is no "--" before hide-covar but there is "--" before covar-variance-standardize: hide-covar is a modifier of --glm, while --covar-variance-standardize is a flag of its own.)

  8. Example code for performing GWAS with the example files

  9. The format of the data: snps_test.txt (the SNPs considered), eid_test.eid (the sample IDs considered), pheno_test.txt (sample phenotypes), covar_test.txt (sample covariates).
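
The flags in the list above can be combined into one command, sketched below. The four input file names come from item 9. The genotype prefix example_pgen and the output prefix gwas_test are hypothetical, and the command is only printed unless a plink2 binary is found at $PLINK2.

```shell
PLINK2="${PLINK2:-./plink2}"   # path to the plink2 binary (assumption)
IN=example_pgen                # hypothetical prefix of .pgen/.pvar/.psam
OUT=gwas_test                  # hypothetical output prefix

# Restrict to the listed SNPs and samples, drop duplicated SNPs,
# then run --glm with standardized covariates, hiding covariate rows.
set -- --pfile "$IN" \
       --extract snps_test.txt \
       --keep eid_test.eid \
       --snps-only \
       --rm-dup exclude-all \
       --pheno pheno_test.txt \
       --covar covar_test.txt \
       --covar-variance-standardize \
       --glm hide-covar \
       --out "$OUT"
CMD="$PLINK2 $*"
echo "$CMD"

# Run the command only when the binary is actually present.
if [ -f "$PLINK2" ] && [ -x "$PLINK2" ]; then
    "$PLINK2" "$@"
fi
```

For quantitative phenotypes, plink2 writes one result file per phenotype column, named after the output prefix and the phenotype (for example gwas_test.<phenotype>.glm.linear).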

 


Extract part of the big genotype data (used for fine mapping)

Sometimes we need to focus on a certain region of SNPs. For example, we may want to capture the region around the SNP rs4475691 with a 1 kb window.

In detail,

--snp <variant ID> --window <total window size, in kb>

--snp specifies a single variant to load by name. If it's combined with --window, all variants with physical position no more than half the specified kb distance (decimal permitted) from the named variant are loaded as well.
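
As an illustration of the rs4475691 example, a sketch under assumed file names: the prefixes example_pgen and region_rs4475691 are hypothetical, and the command is only printed unless a plink2 binary is found at $PLINK2. Because --window gives the total window size, a value of 1 keeps variants within 0.5 kb on either side of the named SNP.

```shell
PLINK2="${PLINK2:-./plink2}"   # path to the plink2 binary (assumption)
IN=example_pgen                # hypothetical prefix of .pgen/.pvar/.psam
OUT=region_rs4475691           # hypothetical output prefix
SNP=rs4475691                  # variant at the center of the region
WINDOW_KB=1                    # total window size in kb

# Keep the named variant plus its neighbours inside the window
# and write them to a smaller pgen fileset.
set -- --pfile "$IN" \
       --snp "$SNP" \
       --window "$WINDOW_KB" \
       --make-pgen \
       --out "$OUT"
CMD="$PLINK2 $*"
echo "$CMD"

# Run the command only when the binary is actually present.
if [ -f "$PLINK2" ] && [ -x "$PLINK2" ]; then
    "$PLINK2" "$@"
fi
```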

 


 

The score file prs_score.txt can be found in the Example file folder. Its first three columns are the SNP ID (e.g. rsID), the effect allele, and the alternative allele. The score columns are the columns that hold the scores; prs_score.txt has 3 of them, each representing a different score.
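
A scoring run over such a file might be sketched as below. This is an assumption-laden illustration, not the original command: it takes the SNP ID from column 1 and the effect allele from column 2, and treats columns 4-6 as the three score columns. The prefixes example_pgen and prs_test are hypothetical, and the command is only printed unless a plink2 binary is found at $PLINK2.

```shell
PLINK2="${PLINK2:-./plink2}"   # path to the plink2 binary (assumption)
IN=example_pgen                # hypothetical prefix of .pgen/.pvar/.psam
OUT=prs_test                   # hypothetical output prefix

# --score <file> 1 2 : variant ID in column 1, effect allele in column 2.
# --score-col-nums 4-6 : the three score columns (assumed positions).
# If prs_score.txt starts with a header line, add the 'header'
# modifier after the column numbers.
set -- --pfile "$IN" \
       --score prs_score.txt 1 2 \
       --score-col-nums 4-6 \
       --out "$OUT"
CMD="$PLINK2 $*"
echo "$CMD"

# Run the command only when the binary is actually present.
if [ -f "$PLINK2" ] && [ -x "$PLINK2" ]; then
    "$PLINK2" "$@"
fi
```

The per-sample scores are written to a file with the .sscore extension (here prs_test.sscore).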