The first data set, the
Input Data Set
, contains all of the numeric data to be normalized.
The
lungcancer_2.sas7bdat
data set, shown below, represents supplementary data collected subsequent to the processing of the initial
lungcancer_1.sas7bdat
data set using the
Batch Normalization
process. A total of 24 arrays, representing animals treated or not treated (Control) with various dosages of different potentially anti-cancer agents, were assessed for gene
expression
using 5000 different
probesets
. Data were collected from two additional studies.
Note that this is a
tall
data set; each
probe
corresponds to one row whereas each column corresponds to a separate experimental condition.
The second data set is the
Experimental Design Data Set (EDDS)
. This required data set describes how the experiment was performed and provides information about the columns in the input data set. Note that one column in the EDDS must be named
ColumnName
, and the values contained in this column must exactly match the column names in the input data set. The
exp_design_2.sas7bdat
experimental design data set is shown below.
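The ColumnName requirement above can be checked before running the process. The following is a minimal sketch in plain Python, not part of the software itself. The function name check_column_names, the ProbeID identifier column, the Treatment column, and the array names are all hypothetical, and the case-sensitive comparison is an assumption based on the "exactly match" wording.

```python
def check_column_names(input_columns, edds_rows, id_columns=("ProbeID",)):
    """Compare EDDS ColumnName values with the input data set's column names.

    input_columns: column names of the input data set (hypothetical list).
    edds_rows: EDDS rows as dictionaries, each with a "ColumnName" key.
    id_columns: non-data columns to ignore (assumed here to be a probe ID).
    """
    data_columns = [c for c in input_columns if c not in id_columns]
    edds_names = [row["ColumnName"] for row in edds_rows]

    # EDDS values with no exact (case-sensitive) match in the input data set.
    # These violate the requirement described in the text.
    unmatched_edds = [n for n in edds_names if n not in data_columns]

    # Input columns not described by the EDDS. Reported for information only;
    # whether this is an error is an assumption the text does not settle.
    undescribed_input = [c for c in data_columns if c not in edds_names]

    return unmatched_edds, undescribed_input


# Hypothetical example: the third EDDS entry differs only in letter case.
input_columns = ["ProbeID", "array_01", "array_02", "array_03"]
edds_rows = [
    {"ColumnName": "array_01", "Treatment": "Control"},
    {"ColumnName": "array_02", "Treatment": "Control"},
    {"ColumnName": "Array_03", "Treatment": "Treated"},
]

unmatched_edds, undescribed_input = check_column_names(input_columns, edds_rows)
print("EDDS names without a match:", unmatched_edds)
print("Input columns not in EDDS:", undescribed_input)
```

In this example the check flags Array_03 in the EDDS and array_03 in the input data set, which points to a mismatch in letter case that is easy to miss by eye.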
The third required data set is the
batch profile input data set
. This data set provides the parameters estimated when the initial data set was processed. These parameters are used to normalize the subsequent data and to estimate a new batch profile. The
lungcancer_1_bns.sas7bdat
batch profile data set is shown below.
The
lungcancer_2.sas7bdat
data set and
exp_design_2.sas7bdat
EDDS are contained in the
SampleData\Microarray\Affymetrix Lung Cancer
directory. The
lungcancer_1_bns.sas7bdat
batch profile data set was generated when the initial
lungcancer_1.sas7bdat
data set was processed as described in
Batch Normalization
.
Refer to the
Batch Scoring
output documentation for detailed descriptions of the output of this process.