The Gene Set Scoring process transforms a tall data set with genes as rows into a tall data set with pathways or categories as rows. The output data set consists of relevance scores for each pathway or category, computed based on individual deviations from a reference value for each gene. This data set can then be used as input to processes from the Quality Control, Pattern Discovery, Row-by-Row Modeling, and Predictive Modeling menus in order to perform category-based inference.
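The transformation described above can be sketched in Python under stated assumptions. Here the reference value is taken to be each gene's mean across samples, and a category's score is the mean of its member genes' deviations. Both choices, along with the function name `score_gene_sets` and the toy gene and pathway names, are illustrative assumptions and not necessarily what JMP Genomics computes.

```python
from collections import defaultdict
from statistics import mean


def score_gene_sets(expression, annotation):
    """Collapse gene-level rows into category-level rows (hypothetical helper).

    expression: {gene: [value per sample]}   (tall data set, genes as rows)
    annotation: {gene: category}             (gene-to-pathway mapping)
    Returns {category: [score per sample]}   (tall data set, categories as rows)
    """
    deviations_by_category = defaultdict(list)
    for gene, values in expression.items():
        category = annotation.get(gene)
        if category is None:
            continue  # genes without annotation do not contribute
        # Assumption: the reference value is the gene's mean across samples.
        reference = mean(values)
        deviations_by_category[category].append([v - reference for v in values])
    # Assumption: a category score is the mean deviation of its member genes,
    # computed separately for each sample (column).
    return {
        category: [mean(column) for column in zip(*deviations)]
        for category, deviations in deviations_by_category.items()
    }


# Toy data: three genes, three samples, two categories.
expression = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 2.0, 5.0],
    "geneC": [4.0, 4.0, 4.0],
}
annotation = {"geneA": "pathway1", "geneB": "pathway1", "geneC": "pathway2"}
scores = score_gene_sets(expression, annotation)
```

With this toy input, the three gene rows collapse into two category rows, each with one score per sample, which is the shape the downstream menus would consume.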
A third data set, the Annotation Data Set, is optional unless the input data set contains no annotation data. This data set contains information, such as gene identity or chromosomal location, for each of the markers. The u95a_trim.sas7bdat annotation data set lists the identities and biological processes of the genes targeted by the probesets. A portion of this data set is illustrated below. This data set is a tall data set; each row corresponds to a different marker.
The affylatin_norm_probeset.sas7bdat input data set, affylatin_exp.sas7bdat EDDS, and u95a_trim.sas7bdat annotation data set are included in the JMP Genomics Sample Data folder.
Refer to the Gene Set Scoring output documentation for detailed descriptions of the output of this process.