P-Value Combination

Many statistical hypothesis testing methods produce p-values. A p-value is the probability, assuming that the relevant null hypothesis is true, of observing a test statistic at least as extreme as the one computed from the data. The null hypothesis usually represents no association or relationship, so smaller p-values represent stronger evidence against the null. This scale of evidence rests on a probabilistic form of modus tollens, or argument by contradiction: if the null were true, data this extreme would be unlikely. P-values are often misinterpreted as the probability that the null hypothesis is true or as the probability of a false positive.
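To make the definition concrete, the following minimal sketch computes a two-sided p-value for a z statistic, assuming the statistic follows a standard normal distribution under the null hypothesis. The function name `two_sided_p_value` is hypothetical and is only an illustration, not part of JMP Life Sciences. "At least as extreme" here means |Z| >= |z|, which is 2(1 - Phi(|z|)), or equivalently erfc(|z| / sqrt(2)).

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic that is standard normal under the null.

    P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


print(two_sided_p_value(1.96))  # approximately 0.05
print(two_sided_p_value(0.0))   # 1.0: the observed statistic is as unremarkable as possible
```

A larger |z| gives a smaller p-value, matching the idea that smaller p-values represent stronger evidence against the null.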

The P-Value Combination process combines p-values from multiple units (such as SNPs or genes) into a single p-value for each group (such as gene or pathway, respectively) that comprises those units. For genome-wide association studies (GWAS), this process can be performed twice: once to combine SNP p-values into a single gene p-value, then a second time to combine gene p-values into a pathway p-value.
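The source does not state which combination method the process uses. As an illustration only, the sketch below uses Fisher's method, a standard technique for combining independent p-values, and applies it in two stages (SNPs to genes, then genes to a pathway). The function name `fisher_combine`, the gene names, and the p-values are hypothetical. Fisher's method assumes the p-values are independent under the null, which often does not hold for SNPs in linkage disequilibrium, so treat this as a sketch of the idea rather than the product's algorithm.

```python
import math
from typing import Iterable


def fisher_combine(p_values: Iterable[float]) -> float:
    """Combine independent p-values with Fisher's method.

    Under the joint null, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom, so the combined p-value is
    P(chi2_{2k} >= X). For even degrees of freedom this tail probability
    has the closed form exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    ps = list(p_values)
    if not ps:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in ps):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in ps)  # X / 2
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for j in range(1, len(ps)):
        term *= half_x / j  # builds (X/2)^j / j! incrementally
        total += term
    return math.exp(-half_x) * total


# Hypothetical two-stage use: SNP p-values -> gene p-values -> pathway p-value.
snps_by_gene = {"GENE_A": [0.01, 0.20, 0.65], "GENE_B": [0.40, 0.08]}
gene_p = {gene: fisher_combine(ps) for gene, ps in snps_by_gene.items()}
pathway_p = fisher_combine(gene_p.values())
print(gene_p, pathway_p)
```

With a single p-value, Fisher's method returns that p-value unchanged, which is a quick sanity check on the closed-form tail.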

What do I need?

One Input Data Set, containing the p-values for an arbitrary set of features (for example, genes, probesets, exons, or markers), is required for this process. A second, optional data set is the Annotation Data Set, which contains information such as gene identity or chromosomal location for each feature. An annotation data set is required only if the input data set does not already contain the relevant annotation information.

For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.

Output/Results

Running the P-Value Combination process generates one output data set with two columns. The first column lists the name of each group (for example, each gene or pathway). The second column lists the combined p-value for that group, computed from the p-values of the units (for example, SNPs or genes) that it contains.
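The following minimal sketch shows how an input data set of marker p-values and an annotation data set might be joined to produce a two-column result with one row per group. It assumes both data sets can be represented as dictionaries keyed by marker ID; the marker IDs, gene names, and function names are hypothetical. For the combination step it swaps in Tippett's minimum-p method, 1 - (1 - min p)^k, purely as an illustration; it is not necessarily what JMP Life Sciences uses, and it also assumes independent p-values.

```python
import math
from collections import defaultdict


def tippett_combine(p_values):
    """Minimum-p (Tippett) combination for k independent p-values: 1 - (1 - min p)^k."""
    k = len(p_values)
    p_min = min(p_values)
    if p_min >= 1.0:
        return 1.0
    # expm1/log1p keep precision when p_min is very small
    return -math.expm1(k * math.log1p(-p_min))


def combine_by_group(marker_p, marker_to_group):
    """Return (group, combined p) rows, one per group, sorted by group name.

    Markers with no entry in the annotation mapping are skipped.
    """
    grouped = defaultdict(list)
    for marker, p in marker_p.items():
        group = marker_to_group.get(marker)
        if group is not None:
            grouped[group].append(p)
    return [(group, tippett_combine(ps)) for group, ps in sorted(grouped.items())]


# Hypothetical input and annotation data; rs4 has no annotation and is skipped.
marker_p = {"rs1": 0.01, "rs2": 0.5, "rs3": 0.2, "rs4": 0.03}
annotation = {"rs1": "GENE_A", "rs2": "GENE_A", "rs3": "GENE_B"}
rows = combine_by_group(marker_p, annotation)
for group, p in rows:
    print(group, p)
```

Each returned row corresponds to one row of the two-column output: the group name and its combined p-value.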