Batch Scoring

The Batch Scoring process normalizes input data for batch effects, using a specified batch profile data set. This process is often used to normalize data from subsequent studies against a profile established from one or more prior studies.

Important: Before running Batch Scoring, you must run the Batch Normalization process to generate the Batch Profile data set that this process requires.

What do I need?

Three data sets are required for this process.

The first data set, the Input Data Set, contains all of the numeric data to be normalized. The lungcancer_2.sas7bdat data set, shown below, represents supplementary data collected subsequent to the processing of the initial lungcancer_1.sas7bdat data set using the Batch Normalization process. It contains a total of 24 arrays, representing animals either treated or not treated (Control) with various dosages of different potential anti-cancer agents and assessed for gene expression using 5000 different probesets. The data were collected from two additional studies. Note that this is a tall data set: each row corresponds to one probe, and each column corresponds to a separate experimental condition.

The second data set is the Experimental Design Data Set (EDDS). This required data set tells how the experiment was performed, providing information about the columns in the input data set. Note that one column in the EDDS must be named ColumnName, and the values contained in this column must exactly match the column names in the input data set. The exp_design_2.sas7bdat experimental design data set is shown below.
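The exact-match requirement on ColumnName can be checked before the process is run. The following is a minimal sketch in Python, not part of JMP Life Sciences software. The function name, the column names, and the near-match hint are all hypothetical and are used for illustration only. It assumes that the column names of the input data set and the values of the ColumnName column have already been read into lists.

```python
def find_unmatched_column_names(edds_column_names, input_columns):
    """Return EDDS ColumnName values with no exact match among the input columns.

    Each result is an (edds_value, near_match) pair. near_match is an input
    column that differs only by case or surrounding whitespace, or None if
    no such column exists.
    """
    exact = set(input_columns)
    # Map a normalized form of each input column name back to the original name.
    normalized = {name.strip().lower(): name for name in input_columns}
    unmatched = []
    for value in edds_column_names:
        if value not in exact:
            unmatched.append((value, normalized.get(value.strip().lower())))
    return unmatched


# Hypothetical names, for illustration only.
input_columns = ["ProbeID", "Array_01", "Array_02", "Array_03"]
edds_column_names = ["Array_01", "array_02 ", "Array_04"]

for value, near in find_unmatched_column_names(edds_column_names, input_columns):
    hint = f" (did you mean {near!r}?)" if near else ""
    print(f"No exact match for {value!r}{hint}")
```

The check runs in one direction only, because the input data set can contain columns, such as a probe identifier, that are not experimental conditions and are therefore not listed in the EDDS.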

The third required data set is the Batch Profile data set. It provides the parameters used for the initial data set. These parameters are used to normalize the subsequent data and to estimate a new batch profile. The lungcancer_1_bns.sas7bdat batch profile data set is shown below.

The lungcancer_2.sas7bdat data set and exp_design_2.sas7bdat EDDS are contained in the SampleData\Microarray\Affymetrix Lung Cancer directory. The lungcancer_1_bns.sas7bdat batch profile data set was generated when the initial lungcancer_1.sas7bdat data set was processed as described in Batch Normalization.

For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.

Output/Results

Refer to the Batch Scoring output documentation for detailed descriptions of the output of this process.