Plink2 tutorial

By Ruzhang Zhao & Jingning Zhang, Department of Biostatistics, Johns Hopkins University

This document is a brief tutorial on how to use the plink2 tool.

Tool Usage

For local usage: go to the folder containing plink2 and run it as ./plink2
For JHPCE usage: the plink2 executable is located at /dcl01/chatterj/data/tools/plink2

Example files download

Example file

Files: bed vs pgen

File differences:

plink 1 binary files (.bed .bim .fam)
--bfile reads a .bed/.bim/.fam fileset.
plink 2 binary files (.pgen .pvar .psam)
--pfile reads a .pgen/.pvar/.psam fileset; --bpfile reads a .pgen file together with plink 1 .bim/.fam files.
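Because each flag expects a specific trio of files sharing one prefix, it can help to check which fileset a prefix actually has before choosing --bfile, --pfile or --bpfile. The helper below, fileset_type, is a hypothetical sketch (not part of plink2); it only tests for uncompressed files with the standard extensions and does not look for variants such as .pvar.zst.

```shell
# Hypothetical helper: report which plink fileset exists for a given prefix.
#   pfile  -> .pgen .pvar .psam  (read with --pfile)
#   bpfile -> .pgen .bim  .fam   (read with --bpfile)
#   bfile  -> .bed  .bim  .fam   (read with --bfile)
# Only plain (uncompressed) extensions are checked.
fileset_type() {
  local prefix="$1"
  if [ -f "$prefix.pgen" ] && [ -f "$prefix.pvar" ] && [ -f "$prefix.psam" ]; then
    echo "pfile"
  elif [ -f "$prefix.pgen" ] && [ -f "$prefix.bim" ] && [ -f "$prefix.fam" ]; then
    echo "bpfile"
  elif [ -f "$prefix.bed" ] && [ -f "$prefix.bim" ] && [ -f "$prefix.fam" ]; then
    echo "bfile"
  else
    echo "incomplete"
  fi
}
```

For example, fileset_type test should print bfile for the example data, and fileset_type test_pfile should print pfile after the conversion step.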

Convert bed files to pgen files


./plink2 --bfile test --make-pgen --out test_pfile

General Filtering Criteria

--maf 0.01 (exclude variants with minor allele frequency below 0.01)
--hwe 1e-08 (exclude variants whose Hardy-Weinberg equilibrium exact test p-value is below 1e-08)
--mach-r2-filter 0.8 2 (exclude variants whose MaCH imputation quality metric r² is outside [0.8, 2])
--filter-males (keep only male samples)
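To make the --maf threshold concrete, the sketch below computes the minor allele frequency of one biallelic SNP from its genotype counts: the ALT allele frequency is (het + 2 × hom_alt) / (2N), and the MAF is the smaller of that value and one minus it. The function name maf_from_counts and its three count arguments are hypothetical; plink2 computes this internally, and missing genotypes are simply not counted here.

```shell
# Hypothetical helper: MAF of a biallelic SNP from genotype counts.
# Arguments: hom_ref_count het_count hom_alt_count (missing genotypes excluded).
maf_from_counts() {
  awk -v aa="$1" -v ab="$2" -v bb="$3" 'BEGIN {
    n = 2 * (aa + ab + bb)                  # observed alleles, 2 per sample
    if (n == 0) { print "NA"; exit }
    p_alt = (ab + 2 * bb) / n               # ALT allele frequency
    maf = (p_alt < 0.5) ? p_alt : 1 - p_alt # minor = less frequent allele
    printf "%.4f\n", maf
  }'
}
```

For example, maf_from_counts 990 10 0 prints 0.0050, so such a SNP would be removed by --maf 0.01.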

Example: apply these criteria and write the filtered files


./plink2 --bfile test --maf 0.01 --hwe 1e-08 --mach-r2-filter 0.8 2 --make-bed --out test_filter

For more filtering options, refer to the plink2 manual.

GWAS Example

This example runs a GWAS on two phenotypes, glucose and HbA1c.

--extract : keep only the variants listed in the file (all others are excluded).
--keep : keep only the samples listed in the file (all others are excluded).
--snps-only : keep only SNPs, excluding other variant types such as indels.
--pheno : phenotype file; its sample IDs must match those in the genotype files.
--covar : covariate file used for adjustment, e.g. genetic principal components to adjust for population stratification.
--rm-dup exclude-all : exclude every variant whose ID is duplicated.
--glm : basic association analysis for quantitative and/or case/control phenotypes. (Run ./plink2 --help glm or refer to the plink2 manual for more information.)
hide-covar : do not report the covariate terms in the output.
--covar-variance-standardize : standardize each covariate to mean 0 and variance 1; for example, a covariate such as age, on a very different scale from the others, can cause numerical problems without standardization.
(Note that there is no "--" before hide-covar but there is "--" before covar-variance-standardize: hide-covar is a modifier of --glm, while --covar-variance-standardize is a separate plink2 flag.)
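Before using --rm-dup exclude-all, it can be useful to see which variant IDs are duplicated. The sketch below is a hypothetical helper (list_dup_ids) that counts IDs in column 2 of a plink 1 .bim file and prints every ID seen more than once; it is only a rough preview of what --rm-dup acts on and does not special-case missing IDs such as ".".

```shell
# Hypothetical helper: print variant IDs that appear more than once in a .bim file.
# .bim column 2 holds the variant ID.
list_dup_ids() {
  awk '{ count[$2]++ } END { for (id in count) if (count[id] > 1) print id }' "$1" | sort
}
```

For example, list_dup_ids test.bim lists the duplicated IDs in the example data, if there are any.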

Example code for performing a GWAS with the example files


./plink2 --bfile test --extract snps_test.txt --keep eid_test.eid --snps-only --pheno pheno_test.txt --covar covar_test.txt --glm hide-covar --covar-variance-standardize --out GWAS_example

The input files used above are snps_test.txt (the SNPs to analyze), eid_test.eid (the sample IDs to keep), pheno_test.txt (sample phenotypes), and covar_test.txt (sample covariates).
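plink2 writes one --glm result file per phenotype, named <out>.<phenotype name>.glm.linear for quantitative traits such as glucose. The hypothetical helper significant_hits below is a minimal sketch for pulling out hits below a p-value cutoff (default 5e-8, the conventional genome-wide significance threshold). It assumes the file has a header row with a column named P and locates that column by name, because the exact set of columns differs between plink2 versions; rows with P equal to NA are skipped.

```shell
# Hypothetical helper: print the header plus rows of a plink2 --glm output
# file whose P value is below a threshold (default 5e-8).
# Assumes a header row containing a column named "P"; NA p-values are skipped.
significant_hits() {
  local file="$1" thresh="${2:-5e-8}"
  awk -v t="$thresh" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "P") p = i; print; next }
    p && $p != "NA" && ($p + 0) < (t + 0)
  ' "$file"
}
```

For example, significant_hits GWAS_example.Glucose.glm.linear would list the genome-wide significant SNPs; the "Glucose" part of that file name is a guess and depends on the column names in pheno_test.txt.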

Extract a region of a large genotype dataset (e.g. for fine mapping)

Sometimes we need to focus on a specific region of SNPs. For example, we may want to capture the region around SNP rs4475691 using a 1 kb window (500 bp on each side of the SNP).


./plink2 --bfile test --snp rs4475691 --window 1 --snps-only --make-bed --out region_example

In detail:

--snp <variant ID> --window <total window size, in kb>

--snp specifies a single variant to load by name. If it's combined with --window, all variants with physical position no more than half the specified kb distance (decimal permitted) from the named variant are loaded as well.
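Since the window is centred on the named variant, --window 1 covers 500 bp on each side. The hypothetical helper snp_window_range below is a sketch that looks a variant up in a plink 1 .bim file (column 1 = chromosome, column 2 = ID, column 4 = base-pair position) and prints the chromosome and base-pair range that the window spans, clamping the lower bound at 0.

```shell
# Hypothetical helper: print "chr from_bp to_bp" for --snp <ID> --window <kb>.
# Arguments: bim_file variant_id window_kb (total window, decimals allowed).
snp_window_range() {
  local bim="$1" snp="$2" kb="$3"
  awk -v id="$snp" -v kb="$kb" '
    $2 == id {
      half = kb * 1000 / 2                  # half-window in base pairs
      from = $4 - half; if (from < 0) from = 0
      printf "%s %d %d\n", $1, from, $4 + half
      exit
    }' "$bim"
}
```

For example, snp_window_range test.bim rs4475691 1 prints the region covered by the command above. If you prefer an explicit range, the printed values can be passed to plink2's --chr, --from-bp and --to-bp flags instead of --snp/--window.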

Polygenic Risk Score (PRS) computation with plink2


./plink2 --bfile <bedfile prefix> --score <scorefile> cols=+scoresums,-scoreavgs --score-col-nums <score columns> --out <output prefix>


./plink2 --bfile test --score prs_score.txt cols=+scoresums,-scoreavgs --score-col-nums 4-6 --out prs_out

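where prs_score.txt can be found in the Example file folder. --score-col-nums gives the positions of the score columns in the score file: prs_score.txt has three score columns (columns 4-6), each representing a different score. The first three columns of prs_score.txt are the SNP ID (e.g. rsID), the effect allele, and the alternative allele. By default, plink2 reads the variant ID from column 1 and the scored allele from column 2 of the score file, which matches this layout. cols=+scoresums,-scoreavgs reports score sums instead of the default score averages.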
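A score sum is the weighted sum, over SNPs, of the effect-allele dosage times that SNP's weight. The sketch below computes this for a single sample from two hypothetical whitespace-delimited files: a weights file (SNP ID, effect allele, weight) and a dosage file (SNP ID, effect-allele dosage 0/1/2). It ignores allele matching, strand flips and missing genotypes, all of which plink2 handles, so it only illustrates the arithmetic behind the scoresums column in the simplest case.

```shell
# Hypothetical helper: toy PRS sum for ONE sample.
# weights file: SNP  effect_allele  weight
# dosage file : SNP  effect_allele_dosage (0, 1 or 2)
# SNPs missing from the weights file are ignored; no allele matching is done.
score_sum() {
  awk 'NR == FNR { w[$1] = $3; next }       # first file: store weights by SNP ID
       ($1 in w) { s += w[$1] * $2 }        # second file: add weight * dosage
       END { printf "%g\n", s + 0 }' "$1" "$2"
}
```

For example, with weights 0.2, -0.1 and 0.5 for three SNPs and dosages 2, 1 and 0, score_sum prints 0.3.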