W-test: dataset adaptive association test for main and interaction effects in GWAS data
Guide - C++ - Linux
Guide for C++ software

Download the software (wtest.exe) and the test files (genotype.txt, phenotype.txt) to a local folder.

Step 1. Double-click the executable file. A console window with the main menu will appear. Type “1”, and then press ENTER to read in the dataset.

Step 2. Read in data. Type the input genotype file name, for example “genotype.txt”, and press ENTER. Then type the input phenotype file name, “phenotype.txt”, and press ENTER.

Step 3. The program will start calculating the parameters h and f. To use the default setting, type “n” (no) and press ENTER. To set the parameters manually, type “y” and press ENTER; the program will then prompt you for the number of bootstrap times (B) and the number of variables (p).

Application note:
- The program uses all the subjects in the input genotype file, together with permuted phenotypes, to estimate h and f.
- The default setting estimates h and f by bootstrapping 200 times using 50 SNPs (variables). For pairwise interaction effects, 50 SNPs give 1225 pairs.
- For main effect calculation, it is suggested to set p = 1000 and the bootstrap times (B) to a value greater than or equal to 200.
- The criteria for choosing these parameters are that the total number of subsets (singletons or pairs) is around 1000, the number of subjects is above 1000, and the bootstrap times (B) > 200.

Step 4A. After loading the datasets and estimating h and f, the program automatically returns to the main menu with 3 options. For main effect calculation, type “2” (Option 2) and press ENTER. The program will ask you for an output p-value threshold. All markers with a p-value smaller than the threshold will be output. If the threshold is set to 1, the results for all markers will be output. In this example, the threshold is set to 0.05, so all markers with an evaluated p-value less than 0.05 are output. Then type an output file name, for example “E1.txt”. The results will be stored in your folder under this file name. The h and f used are printed on screen for each k.

The output file has the format:

SNP    k  W        p.value
SNP46  3  75.5813  3.87025e-017
SNP32  3  33.0002  6.82485e-008
SNP17  3  18.1688  0.000113423
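
As a rough illustration, a result file in this layout can be read with a few lines of Python. The sketch below is not part of the W-test software: the helper names are hypothetical and whitespace-separated columns are assumed. It also compares the listed p-values with the Chi-squared upper tail under the assumption of exactly 2 degrees of freedom, for which the tail has the closed form exp(-W/2). An f estimated by the program that is not an integer would need a general Chi-squared routine (for example scipy.stats.chi2.sf, if SciPy is available) instead.

```python
import math

# Sample rows copied from the output listing above (assumed whitespace-separated).
SAMPLE = """SNP k W p.value
SNP46 3 75.5813 3.87025e-017
SNP32 3 33.0002 6.82485e-008
SNP17 3 18.1688 0.000113423
"""

def parse_main_effects(text):
    """Parse main-effect output text into a list of dicts (hypothetical helper)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if header != ["SNP", "k", "W", "p.value"]:
        raise ValueError("unexpected column names: %r" % header)
    return [
        {"snp": r[0], "k": int(r[1]), "W": float(r[2]), "p": float(r[3])}
        for r in rows
    ]

def chi2_upper_tail_df2(w):
    """Upper tail of a Chi-squared distribution with exactly 2 degrees of freedom."""
    return math.exp(-w / 2.0)

records = parse_main_effects(SAMPLE)
for rec in records:
    approx = chi2_upper_tail_df2(rec["W"])
    print(f'{rec["snp"]}: reported p = {rec["p"]:.4g}, exp(-W/2) = {approx:.4g}')
```

To read a real result file, the same function can be given the file contents, e.g. parse_main_effects(open("E1.txt").read()).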

The 1st column is the SNP name from your input file. The 2nd column is the number of non-empty categories (k). The 3rd column is the W-value, and the 4th column is the p-value of W, calculated from the Chi-squared distribution with f degrees of freedom. In this example, f = 1.97, close to the theoretical value (k-1).

Step 4B. For pairwise interaction calculation, type “3” and press ENTER in the main menu (Option 3). There are 3 choices for the pairwise calculation; select one of them by typing “1”, “2”, or “3”, and then press ENTER.

Application note:
- Choice 1: calculate pairwise interaction exhaustively using all markers in the input data. The maximum allowable dimension of the data depends on the computing power and memory of your computer. For a common desktop PC, the suggested dimension is N x P < 2E9, where N is the number of subjects and P is the number of SNPs. For data this large, a smaller output p-value threshold, say 0.0001, can be chosen to avoid an oversized output file. At the exploration stage, it is suggested to estimate the computing time and output file size with Choice 2 before invoking the exhaustive calculation.

Option 3.1: Calculate pairwise interaction exhaustively using all input markers

Type the output file name, e.g. “E2.txt”, and press ENTER. The program will print the h and f used on screen, as well as the total time used for the calculation.
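
The size guideline in the application note can be checked before starting an exhaustive run. The Python sketch below is illustrative only and is not part of the W-test software: the function names and the study sizes are hypothetical, the 2E9 limit is the suggestion quoted in this guide for a common desktop PC, and the pair count uses the standard P(P-1)/2 formula, which reproduces the 1225 pairs quoted for 50 SNPs.

```python
import math

SUGGESTED_LIMIT = 2e9  # N x P guideline quoted in this guide for a desktop PC

def n_pairs(n_snps):
    """Number of unordered SNP pairs among n_snps markers: P * (P - 1) / 2."""
    return math.comb(n_snps, 2)

def within_suggested_dimension(n_subjects, n_snps, limit=SUGGESTED_LIMIT):
    """True when N x P stays below the suggested limit."""
    return n_subjects * n_snps < limit

# 50 SNPs, as in the default setting of Step 3.
print(n_pairs(50))

# Hypothetical study sizes, chosen only to show both outcomes.
print(within_suggested_dimension(5000, 300000))  # N x P = 1.5e9
print(within_suggested_dimension(5000, 500000))  # N x P = 2.5e9
```

Because the number of pairs grows quadratically with P, the output file can become very large even when N x P is within the guideline, which is why a strict p-value threshold is suggested for exhaustive runs.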

The output file “E2.txt” has the format:

              Pair effect         1st SNP main effect   2nd SNP main effect
SNP1   SNP2   W      k  p.pair    W.snp1  k1  p.snp1    W.snp2  k2  p.snp2
SNP39  SNP46  153.5  9  3.6E-29   5.1     2   0.08      75.6    2   3.87E-17
SNP46  SNP47  119.7  9  3.7E-22   75.5    2   3.8E-17   3.5249  2   0.171624
…

The first 2 columns are the names of the SNPs in the pair; columns 3-5 are the pair’s W-value, k, and p-value. Columns 6-8 are the main effect information [W, k, and p-value] of the 1st SNP. Columns 9-11 are the [W, k, and p-value] of the 2nd SNP in the pair.

Option 3.2: Calculate pairwise interaction exhaustively from main effect results

This choice is for users who want to calculate interactions within a selected list of main effect markers, instead of an exhaustive calculation over all data. It is often applied after a pre-screening step, for example when the user wants to calculate pairwise effects only for the SNPs whose main effect p-values are less than 0.05. The user can then supply this list, in the same file format as the main effect output (Step 4A), and perform the pairwise calculation.

Example: The previous “E1.txt” contains the markers with p-value < 0.05. There are three SNPs in this output file. We want to calculate the pairwise interactions within this file.

The program will output the 3 possible pairwise combinations in the designated output file “E2a.txt”:

              Pair effect         1st SNP main effect   2nd SNP main effect
SNP1   SNP2   W      k  p.pair    W.snp1  k1  p.snp1    W.snp2  k2  p.snp2
SNP46  SNP32  108.9  9  6.2E-20   75.58   2   3.8E-17   33.0    2   6.8E-08
SNP32  SNP17  108.6  9  7.4E-20   33.0    2   6.8E-08   18.1    2   1.1E-04
SNP46  SNP17  100.4  9  3.4E-18   75.58   2   3.8E-17   18.17   2   1.1E-04
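
The three rows in “E2a.txt” correspond to all unordered pairs of the three SNPs that passed the main effect threshold. A minimal Python sketch of that enumeration, using only the standard library, is given here for illustration; the variable names are hypothetical, and the order of the pairs may differ from the order in which the program writes them.

```python
from itertools import combinations

# SNPs listed in the main-effect output "E1.txt" (p-value < 0.05).
selected = ["SNP46", "SNP32", "SNP17"]

# All unordered pairs among the selected SNPs: n * (n - 1) / 2 of them.
pairs = list(combinations(selected, 2))

for snp1, snp2 in pairs:
    print(snp1, snp2)

print(len(pairs), "pairs in total")
```

With n pre-screened SNPs this yields n(n-1)/2 pairs, so a stricter main effect threshold reduces the pairwise workload quadratically.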

Option 3.3: Calculate pairwise interaction on given pairs of markers

This option allows the user to calculate the pairwise interaction for any pairs listed in a batch file, in the following format. Example file: pairs.to.calculate.txt

SNP1   SNP2
SNP39  SNP4
SNP46  SNP47

Note that the first row holds the COLUMN NAMES, not SNP names.
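
A pair list in this layout can be generated with a short script. The following Python sketch is only an illustration: the helper name is hypothetical, and the single space used as the column separator is an assumption, since this guide does not state which delimiter the program expects.

```python
import os
import tempfile

def write_pair_file(path, pairs, header=("SNP1", "SNP2")):
    """Write a header row followed by one SNP pair per line (hypothetical helper)."""
    with open(path, "w") as fh:
        fh.write(" ".join(header) + "\n")  # first row: column names, not SNP names
        for snp_a, snp_b in pairs:
            fh.write(f"{snp_a} {snp_b}\n")

# Pairs from the example file in this guide.
example_pairs = [("SNP39", "SNP4"), ("SNP46", "SNP47")]

# Written to a temporary folder here so the sketch has no side effects;
# in practice the file would go in the folder that holds the program (assumed).
out_path = os.path.join(tempfile.mkdtemp(), "pairs.to.calculate.txt")
write_pair_file(out_path, example_pairs)

with open(out_path) as fh:
    written = fh.read().splitlines()
print(written)
```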

Type the input and output file names in the command window.

The output file contains:

SNP1   SNP2   W      k  p.pair    W.snp1  k1  p.snp1    W.snp2  k2  p.snp2
SNP46  SNP47  119.7  9  3.7E-22   75.5    2   3.8E-17   3.5249  2   0.1716
SNP39  SNP4   11.42  9  0.179     5.09    2   0.0786    1.2469  2   0.5360

---- End of the C program tutorial ----------

If you find any problems with the program, please send a message to: maggiew@cuhk.edu.hk