Process Description

Distribution Analysis

The Distribution Analysis process displays univariate distribution results for variables in a SAS data set. Optionally, it also computes and overlays density estimates for those variables. Examining the results helps you assess the quality of your data and evaluate how effectively different quality control and normalization methods prepare your data for analysis.
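
As a rough illustration of what a univariate summary with an overlaid density estimate involves, the following Python sketch summarizes one numeric column and evaluates a Gaussian kernel density estimate on a grid. It is not the JMP Genomics implementation. The function names, the Gaussian kernel, the normal-reference bandwidth rule, and the sample values are all assumptions chosen for illustration.

```python
import math
import statistics


def summarize(values):
    """Return basic univariate statistics for one numeric column."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid point.

    Assumption: when no bandwidth is given, the normal-reference rule of
    thumb h = 1.06 * std * n^(-1/5) is used.
    """
    n = len(values)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


# Made-up intensities for a single hypothetical array column.
column = [7.9, 8.1, 8.4, 8.6, 9.0, 9.3, 9.7, 10.2]
stats = summarize(column)

# Evaluation grid from 6.0 to 12.0 in steps of 0.1.
grid = [6.0 + 0.1 * i for i in range(61)]
density = gaussian_kde(column, grid)
```

Plotting density against grid on top of a histogram of column gives the kind of overlay described above. A different kernel or bandwidth would change the smoothness of the curve but not the summary statistics.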

What do I need?

One Input Data Set is required for this process. This data set contains all of the numeric data to be analyzed. It must be in the wide format, where each column corresponds to a separate experimental condition or array and each row corresponds to one measured feature (for example, a gene or probe).

The drosophilaaging_norm.sas7bdat data set is a normalized data set derived from the Drosophila Aging Experiment described in Drosophila Aging Experimental Data, and serves as an example. It has 49 columns and 100 rows.

An Experimental Design Data Set (EDDS) is optional. This data set describes how the experiment was performed and provides information about the columns in the input data set. Note that one column in the EDDS must be named ColumnName, and the values in this column must exactly match the column names in the input data set.
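
The matching requirement can be checked before running the process. The following Python sketch is one way to do so, assuming the column names of the input data set and the ColumnName values of the EDDS have already been read into lists. The function name and the example names are hypothetical, and treating "exactly" as case-sensitive string equality is an assumption.

```python
def check_column_names(input_columns, edds_column_names):
    """Return EDDS ColumnName values that have no matching input column.

    Assumption: matching is exact string equality, so differences in
    case or surrounding whitespace count as mismatches.
    """
    available = set(input_columns)
    return [name for name in edds_column_names if name not in available]


# Hypothetical column names, used only to illustrate the check.
input_columns = ["gene_id", "array_01", "array_02", "array_03"]
edds_column_names = ["array_01", "array_02", "Array_03"]

missing = check_column_names(input_columns, edds_column_names)
print(missing)  # ['Array_03'] because the case differs
```

An empty result means every ColumnName value has a matching column in the input data set.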

The drosophilaaging_exp.sas7bdat EDDS serves as an example. Note that the ColumnName column lists the column names in the input data set. Each value in this column is a concatenation of the values in several other columns that detail the experimental design.
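
To make the idea of a concatenated ColumnName concrete, the following Python sketch joins several design fields into a single label. The field names (line, sex, age), the values, and the underscore separator are hypothetical and are not taken from drosophilaaging_exp.sas7bdat.

```python
def build_column_name(row, fields, sep="_"):
    """Join the chosen design fields of one EDDS row into a single label."""
    return sep.join(str(row[field]) for field in fields)


# Hypothetical design rows; the real EDDS columns may differ.
edds_rows = [
    {"line": "A", "sex": "F", "age": 1},
    {"line": "A", "sex": "M", "age": 6},
]
for row in edds_rows:
    row["ColumnName"] = build_column_name(row, ["line", "sex", "age"])

print([row["ColumnName"] for row in edds_rows])  # ['A_F_1', 'A_M_6']
```

Whatever convention is used, the resulting labels must be unique and must match the input data set's column names exactly.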

The drosophilaaging_norm.sas7bdat and drosophilaaging_exp.sas7bdat data sets are included in the Sample Data folder that comes with JMP Genomics.

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

Refer to the Distribution Analysis output documentation for detailed descriptions of the output of this process.