The Gene Set Scoring process transforms a tall data set with genes as rows into a tall data set with pathways or categories as rows. The output data set contains a relevance score for each pathway or category, computed from individual deviations from a reference value for each gene. This data set can then be used as input to processes from the Quality Control, Pattern Discovery, Row-by-Row Modeling, and Predictive Modeling menus to perform category-based inference.
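The following minimal Python sketch shows the row transformation from genes to pathways under simple assumptions. It takes each gene's reference value to be its mean across samples, and it scores a pathway in each sample as the average deviation of its member genes. Both of these choices are hypothetical, as are the gene names, pathway names, and the `score_gene_sets` helper. This is not the scoring method JMP Genomics uses.

```python
from statistics import mean

# Hypothetical tall input: one row per gene, one value per sample.
expression = {
    "GENE_A": [5.0, 7.0, 6.0],
    "GENE_B": [2.0, 2.0, 5.0],
    "GENE_C": [9.0, 8.0, 10.0],
    "GENE_D": [1.0, 3.0, 2.0],
}

# Hypothetical gene sets (pathways or categories).
gene_sets = {
    "PATHWAY_1": ["GENE_A", "GENE_B"],
    "PATHWAY_2": ["GENE_C", "GENE_D", "GENE_X"],  # GENE_X is not in the data
}


def score_gene_sets(expression, gene_sets):
    """Return a tall table with one row per gene set and one score per sample."""
    # Deviation of each value from the gene's reference value.
    deviations = {}
    for gene, values in expression.items():
        ref = mean(values)  # assumed reference value: the gene's own mean
        deviations[gene] = [v - ref for v in values]

    n_samples = len(next(iter(expression.values())))
    scores = {}
    for name, members in gene_sets.items():
        present = [g for g in members if g in deviations]
        if not present:
            continue  # no measured genes, so no output row for this set
        # Assumed score: average deviation of member genes in each sample.
        scores[name] = [
            mean(deviations[g][j] for g in present) for j in range(n_samples)
        ]
    return scores


print(score_gene_sets(expression, gene_sets))
# {'PATHWAY_1': [-1.0, 0.0, 1.0], 'PATHWAY_2': [-0.5, 0.0, 0.5]}
```

The sketch ignores genes that are listed in a set but missing from the input (such as `GENE_X`). A set with no measured genes produces no output row.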
A third data set, the Annotation Data Set, is optional unless the input data set contains no annotation data. This data set contains information about each marker, such as gene identity or chromosomal location. The u95a_trim.sas7bdat annotation data set gives the identity of each gene targeted by the probesets, along with the biological processes associated with it. A portion of this data set is illustrated below. This data set is a tall data set; each row corresponds to a different marker.
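The next sketch is a hypothetical illustration of how a tall annotation data set can define categories. It groups markers by an annotation column whose cells may list several terms. The column names `probeset`, `gene`, and `process` are assumptions, as are the semicolon delimiter and the `build_gene_sets` helper. The rows are invented and do not come from u95a_trim.sas7bdat.

```python
from collections import defaultdict

# Hypothetical tall annotation rows: one row per marker (probeset).
# Column names and the ";" delimiter are assumptions for illustration only.
annotation = [
    {"probeset": "PROBE_1", "gene": "GENE_A", "process": "apoptosis; cell cycle"},
    {"probeset": "PROBE_2", "gene": "GENE_B", "process": "cell cycle"},
    {"probeset": "PROBE_3", "gene": "GENE_C", "process": ""},  # unannotated
]


def build_gene_sets(rows, key="process", sep=";"):
    """Map each annotation term to the list of markers annotated with it."""
    sets = defaultdict(list)
    for row in rows:
        for term in row[key].split(sep):
            term = term.strip()
            if term:  # skip empty annotation cells
                sets[term].append(row["probeset"])
    return dict(sets)


print(build_gene_sets(annotation))
# {'apoptosis': ['PROBE_1'], 'cell cycle': ['PROBE_1', 'PROBE_2']}
```

A marker can belong to more than one category, and a marker with no annotation belongs to none. This is why annotation coverage affects which categories can be scored.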
The affylatin_norm_probeset.sas7bdat input data set, affylatin_exp.sas7bdat EDDS, and u95a_trim.sas7bdat annotation data set are included in the JMP Genomics Sample Data folder.
Refer to the Gene Set Scoring output documentation for detailed descriptions of the output of this process.