The Control Set Normalization process normalizes data by subtracting control sample measurements or arrays, or their mean or median value, from experimental samples. The subtraction can be done either within subject (for example, with replicate arrays for each unique subject) or across one or more sets of experimental subjects and control subjects.
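The subtraction described above can be illustrated with a minimal Python sketch. It is not the process's actual implementation. The function name control_normalize, the center argument, and the column names ctrl_1, ctrl_2, trt_1, and trt_2 are hypothetical, and the sketch assumes one row per probe with the control columns summarized by their mean or median within each row.

```python
from statistics import mean, median

def control_normalize(rows, control_cols, experimental_cols, center="mean"):
    """Subtract each row's control center from its experimental values.

    Illustrative only: all names here are assumptions, not part of
    the Control Set Normalization process itself.
    """
    if center not in ("mean", "median"):
        raise ValueError("center must be 'mean' or 'median'")
    summarize = mean if center == "mean" else median
    normalized = []
    for row in rows:
        # Summarize the control measurements for this probe (row).
        baseline = summarize([row[c] for c in control_cols])
        # Subtract that baseline from every experimental column.
        normalized.append({c: row[c] - baseline for c in experimental_cols})
    return normalized

# Two hypothetical probes, each with two control and two treated arrays.
rows = [
    {"ctrl_1": 2.0, "ctrl_2": 4.0, "trt_1": 5.0, "trt_2": 7.0},
    {"ctrl_1": 1.0, "ctrl_2": 1.0, "trt_1": 0.5, "trt_2": 3.0},
]
result = control_normalize(rows, ["ctrl_1", "ctrl_2"], ["trt_1", "trt_2"])
```

With center="median" the same call subtracts the median of the control columns, which is less sensitive to a single outlying control array.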
Two data sets are required for this process. The first, the Input Data Set, contains all of the numeric data to be analyzed. The drosophilaaging.sas7bdat data set, from the Drosophila aging experiment of Jin et al. (2001) that is described in Drosophila Aging Experimental Data, serves as an example and is shown below. It has 48 data columns and 100 rows. Note that this is a tall data set: each probe corresponds to one row, whereas each column corresponds to a separate experimental condition.
The second data set is the Experimental Design Data Set (EDDS). The drosophilaaging_exp.sas7bdat EDDS serves as an example and is shown below. This required data set describes how the experiment was performed and provides information about the columns in the input data set. Note that one column in the EDDS must be named ColumnName, and the values in this column must exactly match the column names in the input data set.
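The exact-match requirement on ColumnName can be checked before running the process. The following Python sketch is one possible check, not part of the software. The function name unmatched_column_names and the sample column names array_01 and array_02 are hypothetical, and the comparison is assumed to be sensitive to both case and whitespace.

```python
def unmatched_column_names(input_columns, edds_rows):
    """Return EDDS ColumnName values with no exact match in the input data set.

    Illustrative only: the comparison is a plain string equality test,
    so differences in case or stray whitespace count as mismatches.
    """
    available = set(input_columns)
    return [row["ColumnName"] for row in edds_rows
            if row["ColumnName"] not in available]

# Hypothetical input columns and EDDS rows; the second EDDS value
# carries a trailing space and so does not match exactly.
input_columns = ["array_01", "array_02"]
edds_rows = [
    {"ColumnName": "array_01", "treatment": "control"},
    {"ColumnName": "array_02 ", "treatment": "treated"},
]
problems = unmatched_column_names(input_columns, edds_rows)
```

An empty list means every ColumnName value in the EDDS has a matching column in the input data set.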
Refer to the Control Set Normalization output documentation for detailed descriptions of the output of this process.