W-test: dataset adaptive association test for main and interaction effects in GWAS data

  JC School of Public Health and Primary Care (JC SPHPC)  |  Faculty of Medicine |  The Chinese University of Hong Kong (CUHK)  |  Hong Kong SAR

 

 

 


 

 

Guide for C++ software

 

Download the software (wtest.exe) and the test files (genotype.txt, phenotype.txt) into a local folder.

 

Step 1. Double-click the executable file. A command window with the main menu will appear. Type “1”, then press ENTER to read in the dataset.

 

  

  

 

Step 2. Read in data. Type the genotype file name, for example “genotype.txt”, and press ENTER. Then type the phenotype file name, “phenotype.txt”, and press ENTER.

 

  

 

   

 

Step 3. The program will then estimate the parameters h and f. To use the default settings, type “n” (no) and press ENTER. To set them manually, type “y” and press ENTER; the program will then prompt you for the number of bootstrap replicates (B) and the number of variables (p).

 

Application note: 

- The program uses all subjects in the input genotype file, together with permuted phenotypes, to estimate h and f.

- By default, h and f are estimated from 200 bootstrap replicates using 50 SNPs (variables). For pairwise interaction effects, 50 SNPs give 50 x 49 / 2 = 1225 pairs.

- For main effect calculation, the suggested settings are p = 1000 and bootstrap replicates (B) greater than or equal to 200.

- As a rule of thumb, choose these parameters so that the total number of subsets (singletons or pairs) is around 1000, the number of subjects is above 1000, and the number of bootstrap replicates (B) > 200.
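
The subset counts in these notes follow from simple combinatorics: p SNPs give p singletons for main effects and p(p - 1)/2 unordered pairs for interactions. The Python sketch below is not part of the W-test software; the helper name `n_subsets` is hypothetical and it only reproduces the arithmetic.

```python
from math import comb

def n_subsets(p: int, order: int) -> int:
    """Number of SNP subsets of a given order (1 = singletons, 2 = pairs)."""
    return comb(p, order)

# Default setting for pairwise interaction: 50 SNPs
print(n_subsets(50, 2))      # 1225 pairs

# Suggested main-effect setting: p = 1000 singletons
print(n_subsets(1000, 1))    # 1000

# SNP counts whose number of pairs is close to 1000
print(n_subsets(45, 2), n_subsets(46, 2))   # 990 1035
```

By this count, about 45 SNPs already give roughly 1000 pairs for the interaction setting, while a main-effect run needs p = 1000 to reach the same number of subsets.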

 

Step 4A. After loading the datasets and estimating h and f, the program automatically returns to the main menu with 3 options. For main effect calculation, type “2” (Option 2) and press ENTER. The program will ask for an output p-value threshold; all markers with a p-value smaller than the threshold will be output. If the threshold is 1, results for all markers are output. In this example the threshold is 0.05, so all markers with a p-value less than 0.05 are output.

 

Then type an output file name, for example “E1.txt”. The results will be saved under this name in the same folder. The h and f used for each k are printed on screen.

 

     

 

 

The output file has the format:

 

SNP    k    W            p.value

SNP46  3    75.5813     3.87025e-017

SNP32  3    33.0002     6.82485e-008

SNP17  3    18.1688     0.000113423

 

The 1st column is the SNP name from your input file. The 2nd column is k, the number of non-empty categories. The 3rd column is the W-value, and the 4th column is the p-value of W, calculated from the chi-squared distribution with f degrees of freedom.
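
For downstream work it can be handy to load this output into a script. The Python sketch below assumes the file is whitespace-separated with a single header row, as in the listing above; the function name `parse_main_effects` is hypothetical and not part of the W-test software.

```python
# Sample rows copied from the E1.txt listing above
sample = """\
SNP    k    W            p.value
SNP46  3    75.5813     3.87025e-017
SNP32  3    33.0002     6.82485e-008
SNP17  3    18.1688     0.000113423
"""

def parse_main_effects(text):
    """Parse whitespace-separated main-effect output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # the first non-empty line is the header
        snp, k, w, p = line.split()
        rows.append({"SNP": snp, "k": int(k), "W": float(w), "p.value": float(p)})
    return rows

rows = parse_main_effects(sample)
# Marker with the smallest p-value
top = min(rows, key=lambda r: r["p.value"])
print(top["SNP"], top["W"])
```

To read a real output file, pass `open("E1.txt").read()` instead of `sample`.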

 

  In this example, f = 1.97, which is close to the theoretical value k - 1 = 2.
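
As a rough check on the p-value column, recall that a chi-squared variable with exactly 2 degrees of freedom has the closed-form upper-tail probability exp(-W/2). The Python sketch below uses that simplification with the theoretical value k - 1 = 2; the program itself uses the estimated f, which requires the general chi-squared tail function, so small differences are possible. The helper name `chi2_sf_2df` is hypothetical.

```python
import math

def chi2_sf_2df(w: float) -> float:
    """Upper-tail probability of a chi-squared variable with exactly 2 df."""
    return math.exp(-w / 2.0)

# (W, listed p-value) pairs copied from the E1.txt listing above
listed = [
    (75.5813, 3.87025e-17),
    (33.0002, 6.82485e-08),
    (18.1688, 0.000113423),
]

for w, p_listed in listed:
    p = chi2_sf_2df(w)
    print(f"W = {w}: closed form {p:.5e}, listed {p_listed:.5e}")
```

For the three rows shown, the closed form agrees closely with the listed p-values.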

 

Step 4B. For pairwise interaction calculation, type “3” in the main menu (Option 3) and press ENTER. There are three ways to run the pairwise calculation; select one by typing “1”, “2”, or “3”, then press ENTER.

 

 

 

Application note: 

- Choice 1: calculate pairwise interactions exhaustively using all markers in the input data. The maximum data dimension depends on the computing power and memory of your computer. For a common desktop PC, the suggested limit is N x P < 2E9, where N is the number of subjects and P is the number of SNPs. For data this large, choose a smaller output p-value threshold, say 0.0001, to avoid an oversized output file. At the exploration stage, it is advisable to estimate the computing time and output file size with Choice 2 before invoking the exhaustive calculation.
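
Before starting an exhaustive run, the suggested bound and the number of pairs can be checked with a few lines of arithmetic. The Python sketch below only encodes the N x P < 2E9 guideline from the note above; the function names are hypothetical and the example sizes are illustrative, not measurements.

```python
LIMIT = 2e9  # suggested N x P bound for a common desktop PC (from the note above)

def within_suggested_limit(n_subjects: int, n_snps: int) -> bool:
    """True if the dataset dimension N x P is below the suggested bound."""
    return n_subjects * n_snps < LIMIT

def n_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs evaluated by an exhaustive search."""
    return n_snps * (n_snps - 1) // 2

print(within_suggested_limit(2000, 500_000))   # True: 1E9 < 2E9
print(within_suggested_limit(5000, 400_000))   # False: exactly 2E9
print(n_pairs(500_000))                        # 124999750000
```

The pair count grows quadratically with the number of SNPs, which is why a strict output p-value threshold keeps the result file manageable.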

 

  Option 3.1: Calculate pairwise interaction exhaustively using all input markers

  Type an output file name, e.g. “E2.txt”, and press ENTER.

  The program prints the h and f used on screen, together with the total time used for the calculation.

 

 

The output file “E2.txt” has the format:

 

              Pair effect        1st SNP main effect  2nd SNP main effect
SNP1   SNP2   W      k  p.pair   W.snp1  k1  p.snp1   W.snp2  k2  p.snp2
SNP39  SNP46  153.5  9  3.6E-29  5.1     2   0.08     75.6    2   3.87E-17
SNP46  SNP47  119.7  9  3.7E-22  75.5    2   3.8E-17  3.5249  2   0.171624

 

 

The first two columns are the names of the SNPs in the pair. Columns 3-5 are the pair’s W-value, k, and p-value.

Columns 6-8 are the main effect results [W, k, and p-value] for the 1st SNP in the pair.

Columns 9-11 are the main effect results [W, k, and p-value] for the 2nd SNP in the pair.

 

 

Option 3.2: Calculate pairwise interaction exhaustively from main effect results.

 

This choice is for users who want to calculate interactions within a selected list of main effect markers rather than exhaustively over all data. It is often used after a pre-screening step, for example when pairwise effects are wanted only for SNPs whose main effect p-values are less than 0.05. The user supplies this list in the same file format as the main effect output (Step 4A), and the program performs the pairwise calculation on it.

 

Example:

 

The earlier output file “E1.txt” contains the markers with p-value < 0.05, three SNPs in total. We now calculate the pairwise interactions among the SNPs in this file.

 

 

 

The program will output the 3 possible pairwise combinations in the designated output file “E2a.txt”:

 

              Pair effect        1st SNP main effect  2nd SNP main effect
SNP1   SNP2   W      k  p.pair   W.snp1  k1  p.snp1   W.snp2  k2  p.snp2
SNP46  SNP32  108.9  9  6.2E-20  75.58   2   3.8E-17  33.0    2   6.8E-08
SNP32  SNP17  108.6  9  7.4E-20  33.0    2   6.8E-08  18.1    2   1.1E-04
SNP46  SNP17  100.4  9  3.4E-18  75.58   2   3.8E-17  18.17   2   1.1E-04
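
The three rows above are simply all unordered pairs that can be formed from the three SNPs in “E1.txt”. The Python sketch below enumerates them; it illustrates the counting only, and the order in which the program writes the pairs may differ.

```python
from itertools import combinations

# SNPs listed in E1.txt (main effect p-value < 0.05)
selected = ["SNP46", "SNP32", "SNP17"]

# All unordered pairs within the selected list
pairs = list(combinations(selected, 2))
for a, b in pairs:
    print(a, b)

print(len(pairs))  # 3
```

With m pre-screened SNPs the number of pairs is m(m - 1)/2, so screening on main effects first keeps the pairwise calculation small.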

 

   Option 3.3: Calculate pairwise interaction on given pairs of markers

 

 This option lets the user calculate the pairwise interaction for any pairs listed in a batch file. The file must have the following format (example file: pairs.to.calculate.txt):

 

SNP1  SNP2

SNP39 SNP4

SNP46 SNP47

 

   Note that the first row contains the COLUMN NAMES, not SNP names.
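
A batch file in this layout can be generated from a list of pairs. The Python sketch below assumes a two-space delimiter, copied from the example above; the delimiter the program actually requires is not stated here, and the function name `format_pairs_file` is hypothetical.

```python
def format_pairs_file(pairs, header=("SNP1", "SNP2")):
    """Return batch-file text: one header row, then one SNP pair per row."""
    lines = ["  ".join(header)]                 # first row: column names
    lines += [f"{a}  {b}" for a, b in pairs]    # following rows: SNP pairs
    return "\n".join(lines) + "\n"

text = format_pairs_file([("SNP39", "SNP4"), ("SNP46", "SNP47")])
print(text, end="")

# To save the file for use with Option 3.3:
# with open("pairs.to.calculate.txt", "w") as fh:
#     fh.write(text)
```

Keeping the header row in the generated text prevents the first SNP pair from being read as column names.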

 

   Type the input and output file names in the command window.

 

 

 

 The output file contains:

 

SNP1   SNP2   W      k  p.pair   W.snp1  k1  p.snp1   W.snp2  k2  p.snp2
SNP46  SNP47  119.7  9  3.7E-22  75.5    2   3.8E-17  3.5249  2   0.1716
SNP39  SNP4   11.42  9  0.179    5.09    2   0.0786   1.2469  2   0.5360

 

 

 

---- End of the C++ program tutorial ----------

 

 

 

  If you find any problems with the program, please send a message to: maggiew@cuhk.edu.hk

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
