By Ruzhang Zhao & Jingning Zhang, Department of Biostatistics, Johns Hopkins University
This document is a short tutorial on how to use the plink2 tool.
./plink2
/dcl01/chatterj/data/tools/plink2
File differences:
plink 1 binary files (.bed .fam .bim)
--bfile to read .bed .bim .fam files.
plink 2 binary files (.pgen .psam .pvar)
--pfile to read .pgen .psam .pvar files (--bpfile instead reads a .pgen file together with plink 1 .bim and .fam files).
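Before choosing between --bfile and --pfile, it can help to confirm that all three files sharing a prefix are present. The following is a minimal sketch; the function name has_fileset is hypothetical and not part of plink2, and the prefix test is the one used in the examples of this tutorial.

```shell
# has_fileset PREFIX KIND: succeed only if every file of the set exists.
# KIND is "bfile" (.bed .bim .fam) or "pfile" (.pgen .pvar .psam).
# (Hypothetical helper for illustration; not a plink2 feature.)
has_fileset() {
    prefix=$1
    case $2 in
        bfile) exts="bed bim fam" ;;
        pfile) exts="pgen pvar psam" ;;
        *) echo "unknown kind: $2" >&2; return 2 ;;
    esac
    for ext in $exts; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Usage: report which plink2 input flag matches the files found for "test".
if has_fileset test bfile; then
    echo "use: --bfile test"
elif has_fileset test pfile; then
    echo "use: --pfile test"
fi
```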
Convert bed files to pgen files
./plink2 --bfile test --make-pgen --out test_pfile
Example of applying filtering criteria to get filtered files
./plink2 --bfile test --maf 0.01 --hwe 1e-08 --mach-r2-filter 0.8 2 --make-bed --out test_filter
More filtering criteria can be found in the plink2 manual.
The GWAS example below is run for two phenotypes, Glucose and HBA1C. It uses the following options:
--extract : exclude SNPs not listed.
--keep : exclude samples not listed.
--snps-only : exclude all variants with one or more multi-character allele codes, so that only the SNPs in the bed files or pgen files are used.
--pheno : phenotype file; its sample IDs need to correspond to the sample IDs in the genotype data.
--covar : covariate file for adjustment, for example genetic principal components to adjust for population stratification.
--rm-dup exclude-all : exclude all SNPs that share a duplicated ID.
--glm : basic association analysis on quantitative and/or case/control phenotypes (run ./plink2 --help glm for more information, or refer to the plink2 manual).
hide-covar : do not report the results for the covariates.
--covar-variance-standardize : standardize the covariates to mean 0 and variance 1; for example, age can be problematic without standardization.
(Note that there is no "--" before hide-covar but there is "--" before covar-variance-standardize. hide-covar is a modifier of --glm, whereas --covar-variance-standardize is a separate flag following ./plink2.)
Example code for performing a GWAS with the example files
./plink2 --extract snps_test.txt --keep eid_test.eid --bfile test --snps-only --pheno pheno_test.txt --covar covar_test.txt --glm hide-covar --covar-variance-standardize --out GWAS_example
The input files are snps_test.txt (the considered SNPs), eid_test.eid (the considered sample IDs), pheno_test.txt (sample phenotypes), and covar_test.txt (sample covariates).
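The example files themselves are not reproduced here. As a rough sketch of layouts that plink2 accepts for these options, the following writes toy versions into a hypothetical toy_inputs directory. All variant IDs, sample IDs and values are invented, and so are the covariate names age, PC1 and PC2. Only the phenotype names Glucose and HBA1C come from this tutorial.

```shell
# Toy input files; every ID and value below is made up for illustration.
mkdir -p toy_inputs

# --extract: one variant ID per line.
printf '%s\n' rs0000001 rs0000002 rs0000003 > toy_inputs/snps_test.txt

# --keep: sample IDs; the "#FID IID" header line names the two columns.
printf '%s\t%s\n' '#FID' IID \
    1001 1001 \
    1002 1002 \
    1003 1003 > toy_inputs/eid_test.eid

# --pheno: sample IDs followed by one column per phenotype.
printf '%s\t%s\t%s\t%s\n' '#FID' IID Glucose HBA1C \
    1001 1001 5.1 35.2 \
    1002 1002 4.8 33.9 \
    1003 1003 6.0 41.5 > toy_inputs/pheno_test.txt

# --covar: sample IDs followed by one column per covariate.
printf '%s\t%s\t%s\t%s\t%s\n' '#FID' IID age PC1 PC2 \
    1001 1001 54 0.012 -0.004 \
    1002 1002 61 -0.008 0.015 \
    1003 1003 47 0.003 0.001 > toy_inputs/covar_test.txt

# Show the phenotype file to check the layout.
cat toy_inputs/pheno_test.txt
```

With a phenotype file holding several phenotype columns, as here, --glm writes a separate result file for each phenotype.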
Sometimes we need to focus on a certain region of SNPs. For example, to capture the region around the SNP rs4475691 with a 1 kb window:
./plink2 --bfile test --snp rs4475691 --window 1 --snps-only --make-bed --out region_example
In detail,
--snp <variant ID> --window <total window size, in kb>
--snp specifies a single variant to load by name. If it's combined with --window, all variants with physical position no more than half the specified kb distance (decimal permitted) from the named variant are loaded as well.
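To make the "half the window" rule concrete, the sketch below applies the same selection by hand to a toy .bim file with awk. It only illustrates the arithmetic and is not how plink2 implements it. The file toy.bim, the neighbouring IDs rsA to rsD, and all positions (including the one given to rs4475691) are invented.

```shell
# Toy .bim file: chromosome, variant ID, cM, bp position, allele 1, allele 2.
# Positions and neighbouring IDs are invented for illustration.
cat > toy.bim <<'EOF'
1	rsA	0	9400	A	G
1	rsB	0	9600	C	T
1	rs4475691	0	10000	C	T
1	rsC	0	10500	G	A
1	rsD	0	10600	T	C
EOF

snp=rs4475691
window_kb=1   # total window size in kb, as passed to --window

# Pass 1 records the chromosome and position of the named variant.
# Pass 2 prints variants on that chromosome within window_kb/2 kb of it
# (window_kb * 500 bp on each side, bounds included).
awk -v snp="$snp" -v kb="$window_kb" '
    NR == FNR { if ($2 == snp) { chr = $1; pos = $4 }; next }
    $1 == chr && $4 >= pos - kb * 500 && $4 <= pos + kb * 500 { print $2 }
' toy.bim toy.bim > in_window.txt

cat in_window.txt
# prints:
# rsB
# rs4475691
# rsC
```

With --window 1 the selection reaches 500 bp on each side of the named variant, so rsC at exactly 500 bp away is kept while rsA and rsD at 600 bp are dropped.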
plink2 --bfile 'bedfile' --score 'scorefile' cols=+scoresums,-scoreavgs --score-col-nums 'scorecolumns' --out 'outfile'
./plink2 --bfile test --score prs_score.txt cols=+scoresums,-scoreavgs --score-col-nums 4-6 --out prs_out
where prs_score.txt can be found in the Example file folder. The score columns are the columns of the score file that hold the scores. For example, prs_score.txt has 3 score columns (columns 4 to 6) representing different scores. The first three columns of prs_score.txt are the SNP ID (e.g. rsID), the effect allele and the alternative allele.
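As an illustration of this six-column layout, the sketch below writes a toy score file and checks its shape. The file name toy_prs_score.txt and all IDs, alleles and weights are invented, and the real prs_score.txt may differ in detail. The toy file has no header line, on the assumption that the command is run as written, without the header modifier of --score.

```shell
# Toy score file: SNP ID, effect allele, alternative allele, then three
# columns of weights. All values are invented; there is no header line.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    rs0000001 A G 0.12 0.10 0.00 \
    rs0000002 T C -0.05 -0.04 -0.02 \
    rs0000003 G A 0.30 0.00 0.25 > toy_prs_score.txt

# Columns 4-6 are the ones selected by --score-col-nums 4-6.
# Sanity check: every line has 6 fields and numeric weights in fields 4-6.
awk -F'\t' '
    NF != 6 { bad++ }
    { for (i = 4; i <= 6; i++) if ($i !~ /^-?[0-9]*\.?[0-9]+$/) bad++ }
    END { print (bad ? "malformed" : "ok"); exit (bad ? 1 : 0) }
' toy_prs_score.txt
```

A check like this catches a shifted or missing column before the file is passed to --score, where a wrong --score-col-nums range would otherwise read the wrong columns.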