By Ruzhang Zhao, Department of Biostatistics, Johns Hopkins University
This document is a short tutorial on how to use the plink2 tool.
plink 1 binary files (.bed .fam .bim)
Use --bfile to read .bed, .bim, and .fam files.
plink 2 binary files (.pgen .psam .pvar)
Use --pfile to read .pgen, .psam, and .pvar files. Use --bpfile when the .pgen file is paired with plink 1 style .bim and .fam files instead.
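As a rough sketch of how the input flags relate to the files on disk, the helper below prints the flag that matches a given file prefix. The function name pick_input_flag and the prefix "example" are assumptions made for illustration; they are not part of plink2.

```shell
# pick_input_flag PREFIX
# Print the plink2 input flag that matches the files found for PREFIX.
pick_input_flag() {
  prefix="$1"
  if [ -f "$prefix.pgen" ] && [ -f "$prefix.psam" ] && [ -f "$prefix.pvar" ]; then
    echo "--pfile"    # plink 2 binary file set
  elif [ -f "$prefix.pgen" ] && [ -f "$prefix.bim" ] && [ -f "$prefix.fam" ]; then
    echo "--bpfile"   # .pgen together with plink 1 .bim and .fam
  elif [ -f "$prefix.bed" ] && [ -f "$prefix.bim" ] && [ -f "$prefix.fam" ]; then
    echo "--bfile"    # plink 1 binary file set
  else
    echo "no complete file set found for prefix '$prefix'" >&2
    return 1
  fi
}

# "example" is an assumed prefix; replace it with your own.
if INPUT_FLAG=$(pick_input_flag example 2>/dev/null); then
  echo "example can be read with: ./plink2 $INPUT_FLAG example"
else
  echo "no plink file set with prefix 'example' in this directory"
fi
```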
Convert bed files to pgen files
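A minimal sketch of the conversion, assuming the plink2 binary sits in the current directory and the plink 1 files share the prefix "example" (both are assumptions; adjust them to your setup). The --make-pgen flag writes a new .pgen/.psam/.pvar file set under the prefix given to --out.

```shell
# "example" is an assumed prefix for example.bed, example.bim, example.fam.
PREFIX="example"

# --make-pgen writes $PREFIX.pgen, $PREFIX.psam and $PREFIX.pvar.
CONVERT_CMD="./plink2 --bfile $PREFIX --make-pgen --out $PREFIX"
echo "$CONVERT_CMD"

# Run the conversion only if the plink2 binary and the input file are present.
if [ -x ./plink2 ] && [ -f "$PREFIX.bed" ]; then
  $CONVERT_CMD
fi
```

After the conversion, the new files can be read with --pfile and the same prefix.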
Example of applying filtering criteria to get filtered files
For more filtering criteria, refer to the plink2 manual.
The example is run for glucose and HbA1c.
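One possible filtering command is sketched here. It combines --extract, --keep, --snps-only and --rm-dup exclude-all and writes the result as a new pgen file set. The input prefix "example" and the output prefix "example_filtered" are assumptions; snps_test.txt and eid_test.eid are the example files used in this tutorial.

```shell
IN_PREFIX="example"            # assumed input prefix (.pgen/.psam/.pvar)
OUT_PREFIX="example_filtered"  # assumed output prefix

FILTER_CMD="./plink2 --pfile $IN_PREFIX"
FILTER_CMD="$FILTER_CMD --extract snps_test.txt"   # keep only the listed SNPs
FILTER_CMD="$FILTER_CMD --keep eid_test.eid"       # keep only the listed samples
FILTER_CMD="$FILTER_CMD --snps-only"               # drop non-SNP variants
FILTER_CMD="$FILTER_CMD --rm-dup exclude-all"      # drop variants with duplicated IDs
FILTER_CMD="$FILTER_CMD --make-pgen --out $OUT_PREFIX"
echo "$FILTER_CMD"

# Run only if the plink2 binary and the input files are present.
if [ -x ./plink2 ] && [ -f "$IN_PREFIX.pgen" ] && [ -f snps_test.txt ] && [ -f eid_test.eid ]; then
  $FILTER_CMD
fi
```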
--extract : exclude SNPs not listed.
--keep : exclude samples not listed.
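--snps-only : keep only SNPs from the bed or pgen files; variants with multi-character allele codes (such as indels) are excluded.
--pheno : phenotype file; its sample IDs need to correspond to the sample IDs in the genotype files.
--covar : covariate file, used for covariate adjustment. A common use is to adjust for population stratification with genetic principal components.
--rm-dup exclude-all : exclude all variants that share a duplicated ID.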
--glm : basic association analysis on quantitative and/or case/control phenotypes. (Use ./plink2 --help glm for more information, or refer to the plink2 manual.)
hide-covar : a modifier of --glm that omits the covariate results from the report.
--covar-variance-standardize : standardizes the covariates. For example, a covariate such as age can be problematic without standardization.
(Note that there is no "--" before hide-covar but there is one before covar-variance-standardize. hide-covar is a modifier that follows --glm, while --covar-variance-standardize is a separate flag of its own.)
Example code for performing GWAS using the example files
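A sketch of a GWAS run that puts the flags described above together. The input prefix "example" and the output prefix "gwas_test" are assumptions; the four input text files are the example files named in this tutorial. With more than one phenotype column in the phenotype file, plink2 analyzes each phenotype in turn.

```shell
IN_PREFIX="example"     # assumed input prefix (.pgen/.psam/.pvar)
OUT_PREFIX="gwas_test"  # assumed output prefix

GWAS_CMD="./plink2 --pfile $IN_PREFIX"
GWAS_CMD="$GWAS_CMD --extract snps_test.txt"          # SNPs to consider
GWAS_CMD="$GWAS_CMD --keep eid_test.eid"              # samples to consider
GWAS_CMD="$GWAS_CMD --snps-only --rm-dup exclude-all" # SNPs only, no duplicated IDs
GWAS_CMD="$GWAS_CMD --pheno pheno_test.txt"           # sample phenotypes
GWAS_CMD="$GWAS_CMD --covar covar_test.txt"           # sample covariates
GWAS_CMD="$GWAS_CMD --covar-variance-standardize"     # standardize the covariates
GWAS_CMD="$GWAS_CMD --glm hide-covar"                 # association test, covariate rows hidden
GWAS_CMD="$GWAS_CMD --out $OUT_PREFIX"
echo "$GWAS_CMD"

# Run only if the plink2 binary and the main input files are present.
if [ -x ./plink2 ] && [ -f "$IN_PREFIX.pgen" ] && [ -f pheno_test.txt ] && [ -f covar_test.txt ]; then
  $GWAS_CMD
fi
```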
The format of the data
snps_test.txt (the considered SNPs),
eid_test.eid (the considered sample id),
pheno_test.txt (sample phenotype),
covar_test.txt (sample covariates).
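To give a feel for the layout of these four files, the sketch below writes small made-up versions into a temporary directory. Every ID, value and column name here is a placeholder invented for illustration, not data from the real example files, and the "#FID IID" header layout is one layout plink2 is assumed to accept; check the plink2 file format documentation against your own files.

```shell
# Write hypothetical demo files into a temporary directory so that
# no real file is overwritten.
DEMO_DIR=$(mktemp -d)

# SNP list for --extract: one variant ID per line.
cat > "$DEMO_DIR/snps_test.txt" <<'EOF'
rs4475691
rs_placeholder_2
EOF

# Sample list for --keep: header line, then one sample per line.
cat > "$DEMO_DIR/eid_test.eid" <<'EOF'
#FID IID
1000001 1000001
1000002 1000002
EOF

# Phenotype file for --pheno: sample IDs followed by one column per phenotype.
cat > "$DEMO_DIR/pheno_test.txt" <<'EOF'
#FID IID Glucose HBA1C
1000001 1000001 5.1 35.2
1000002 1000002 4.8 33.9
EOF

# Covariate file for --covar: sample IDs followed by one column per covariate.
cat > "$DEMO_DIR/covar_test.txt" <<'EOF'
#FID IID age PC1 PC2
1000001 1000001 54 0.012 -0.004
1000002 1000002 61 -0.007 0.010
EOF

ls "$DEMO_DIR"
```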
Sometimes we need to focus on a certain region of SNPs. For example, we may want to capture the region around the SNP rs4475691 with a 1 kb window.
--snp specifies a single variant to load by name. If it's combined with --window, all variants with physical position no more than half the specified kb distance (decimal permitted) from the named variant are loaded as well.
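A sketch of the region extraction for rs4475691. With --window 1, variants within 0.5 kb on either side of the named SNP are loaded. The input prefix "example" and the output prefix "rs4475691_window" are assumptions, and --make-pgen is used here only to save the selected region as a new file set.

```shell
IN_PREFIX="example"            # assumed input prefix (.pgen/.psam/.pvar)
OUT_PREFIX="rs4475691_window"  # assumed output prefix
WINDOW_KB=1                    # total window width in kb (0.5 kb per side)

WINDOW_CMD="./plink2 --pfile $IN_PREFIX --snp rs4475691 --window $WINDOW_KB --make-pgen --out $OUT_PREFIX"
echo "$WINDOW_CMD"

# Run only if the plink2 binary and the input file are present.
if [ -x ./plink2 ] && [ -f "$IN_PREFIX.pgen" ]; then
  $WINDOW_CMD
fi
```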