Process Description

One-Way ANOVA

The analysis of large genomics data sets using mixed models or classic ANOVA can be computationally intensive, often taking hours to complete. Frequently, what is needed is a rapid assessment of the data, in which the presence of specific patterns or relationships can be observed. Once such patterns are discerned, the data set can be filtered and a more complete, though more computationally intensive, analysis can be designed appropriately.

The One-Way ANOVA process provides a rapid means for such an initial assessment of the data. It differs from the standard ANOVA process in a number of important ways. A comparison of the capabilities of both processes is shown in the table below.

Feature	ANOVA	One-Way ANOVA
SAS Analytical Process Used	PROC MIXED	SAS DATA Step
Model	Complex	One-Way Classification, Optionally Blocked
Ability to distinguish effects of variables independently and in combination	Yes	No
Speed	Moderate	Fast

Instead of looking at all possible effects and their interactions separately, One-Way ANOVA considers all of the combinations of different effects as distinct groups. Because of this, and unlike the standard ANOVA process, it does not and cannot examine each variable and combination of variables independently of each other. However, because its scope is more limited, One-Way ANOVA can handle very large data sets quickly and efficiently, identifying preliminary items of interest that can be further defined in subsequent and more thorough analyses.
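The grouping idea described above can be illustrated with a minimal Python sketch. This is not the SAS DATA step code used by the process; it only shows, under simplifying assumptions (no blocking, at least two observations per group, nonzero within-group variation), how the levels of several variables are collapsed into one classification and tested with a single F statistic. The function names, variable names, and expression values are hypothetical.

```python
from collections import defaultdict


def combine_levels(*factors):
    """Join the levels of several factors into one group label per observation."""
    return ["*".join(str(level) for level in levels) for levels in zip(*factors)]


def one_way_f(values, labels):
    """Return (F, df_between, df_within) for a one-way classification."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for one gene across eight arrays.
treatment = ["drug", "drug", "drug", "drug", "ctrl", "ctrl", "ctrl", "ctrl"]
time = ["1h", "1h", "4h", "4h", "1h", "1h", "4h", "4h"]
expression = [5.1, 4.9, 7.0, 7.2, 5.0, 5.2, 5.1, 4.9]

# Two variables with two levels each become four distinct groups.
labels = combine_levels(treatment, time)
f_stat, df_between, df_within = one_way_f(expression, labels)
print(sorted(set(labels)), f_stat, df_between, df_within)
```

A large F value here indicates only that at least one of the four groups differs from the others. It does not say whether the difference is due to treatment, time, or their interaction, which is the limitation noted in the comparison table.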

What do I need?

Two data sets are required for this process.

The first, the Input Data Set, contains all of the numeric data to be analyzed. In most cases, normalized data, from which global effects such as dye and chip-to-chip variation have been removed, are recommended. This data set must be a tall data set.

The second data set is the Experimental Design Data Set (EDDS). This required data set describes how the experiment was performed, providing information about the columns in the input data set. Note that one column in the EDDS must be named ColumnName, and the values contained in this column must exactly match the column names in the input data set. Two other columns in this data set, Array and Experiment, correspond to an index variable and the one-way experimental variable, respectively.
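The exact-match requirement for ColumnName can be checked before running the process. The following Python sketch is one possible pre-check, not part of JMP Genomics. It assumes the input column names and the EDDS rows have already been read into plain Python structures, and that the comparison is case-sensitive; the function name and sample values are hypothetical.

```python
def unmatched_column_names(input_columns, design_rows):
    """Return EDDS ColumnName values that are not column names in the input data."""
    known = set(input_columns)
    return [row["ColumnName"] for row in design_rows if row["ColumnName"] not in known]


# Hypothetical input data set columns: one identifier column plus four arrays.
input_columns = ["ProbeID", "Array1", "Array2", "Array3", "Array4"]

# Hypothetical EDDS rows; the last ColumnName differs from the input only in case.
design_rows = [
    {"ColumnName": "Array1", "Array": 1, "Experiment": "control"},
    {"ColumnName": "Array2", "Array": 2, "Experiment": "control"},
    {"ColumnName": "Array3", "Array": 3, "Experiment": "treated"},
    {"ColumnName": "array4", "Array": 4, "Experiment": "treated"},
]

unmatched = unmatched_column_names(input_columns, design_rows)
print(unmatched)
```

Any name returned by this check should be corrected in the EDDS or in the input data set before the analysis is run.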

An Annotation Data Set can also be specified. This data set contains information, such as gene identity, accession numbers, chromosomal location, and so on, for each of the rows in the input data set. This data set is also in the tall format, where each row corresponds to a different gene.

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

The output generated by this process is summarized in a tabbed report. Refer to the One-Way ANOVA output documentation for detailed descriptions and guides to interpreting your results.