Process Description

Merge Gene Sets

The Merge Gene Sets process merges a gene set (.gmt) file downloaded from MSigDB or GSEA (Subramanian, Tamayo, et al. 2005) with a data set that contains gene or probeset IDs, whichever type of identifier the gene set file uses. That data set can be either the input data set itself or an annotation data set, which is then merged with the input data set.
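Conceptually, the merge matches each row's identifier against the members of every gene set. The Python sketch below illustrates that idea on a small in-memory data set. It is not how JMP Genomics implements the process: the `Gene_Symbol` column name, the one-indicator-column-per-gene-set layout, and the sample rows and set names are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def merge_gene_sets(rows: List[Dict[str, str]],
                    gene_sets: Dict[str, Set[str]],
                    id_column: str = "Gene_Symbol") -> List[Dict[str, object]]:
    """Return copies of rows with one 0/1 indicator column per gene set.

    Hypothetical layout (an assumption, not the JMP Genomics output format):
    a row gets 1 in a gene set's column when the identifier in id_column is
    a member of that gene set, otherwise 0. Matching is case-sensitive.
    """
    merged: List[Dict[str, object]] = []
    for row in rows:
        out: Dict[str, object] = dict(row)  # copy so the caller's rows are not modified
        gene_id = row.get(id_column)        # None if the row has no identifier
        for set_name, members in gene_sets.items():
            out[set_name] = 1 if gene_id in members else 0
        merged.append(out)
    return merged


if __name__ == "__main__":
    # Hypothetical rows: one per probeset, each annotated with a gene symbol.
    rows = [
        {"Probe_Set_ID": "p1", "Gene_Symbol": "TP53"},
        {"Probe_Set_ID": "p2", "Gene_Symbol": "MYC"},
        {"Probe_Set_ID": "p3", "Gene_Symbol": "GAPDH"},
    ]
    # Hypothetical gene sets keyed by set name.
    gene_sets = {"SET_A": {"TP53", "MYC"}, "SET_B": {"MYC"}}
    for merged_row in merge_gene_sets(rows, gene_sets):
        print(merged_row)
```

If the gene set file used probeset IDs instead of gene symbols, the same matching would simply run against a probeset ID column (for example, `id_column="Probe_Set_ID"`).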

Note: This process is considered experimental.

What do I need?

Either two or three input files are required to successfully run the Merge Gene Sets process:

• The Input Data Set, which contains either gene or probeset IDs. If the input data set does not contain these variables, the IDs can be merged in from an Annotation Data Set, using the Merge process, before the merge with the gene set file. The affylatin_norm_gene_symbol.sas7bdat input data set used in this example was generated by merging the affylatin_norm_probeset.sas7bdat data set, which contains normalized expression data from the Affymetrix Latin Square Data, with the u95a_trim.sas7bdat annotation data set, which identifies names and biological processes for the genes targeted by the probesets. The affylatin_norm_gene_symbol.sas7bdat input data set contains the Probe_Set_ID column and the 59 sample columns from the affylatin_norm_probeset.sas7bdat data set, plus the annotation data, including gene symbols, from the u95a_trim.sas7bdat annotation data set. Each row represents one probeset, and each sample column lists intensity data for the 100 genes corresponding to the probesets. The affylatin_norm_gene_symbol.sas7bdat input data set is shown below:

Note: If the input data set does not contain annotation information, a separate Annotation Data Set is required. This data set contains information, such as gene identity or chromosomal location, for each marker.

• A GSEA gene set file. This required, tab-delimited .gmt file provides a list of relevant gene sets. Gene set files can be downloaded from http://www.broadinstitute.org/gsea/msigdb/.
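In the standard GMT layout described in the GSEA documentation, each line holds one gene set: the set name, a description field (often a URL), and then the member gene identifiers, all separated by tabs. The Python sketch below reads that layout into a dictionary, which can help when inspecting a downloaded file before running the process. The function name `parse_gmt` and the sample set names are hypothetical; this is an illustrative reader, not part of JMP Genomics.

```python
import io
from typing import Dict, List, TextIO


def parse_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Parse GMT text into {gene_set_name: [gene_id, ...]}.

    Each non-blank line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    The description field is skipped, empty fields (e.g. from a trailing tab)
    are dropped, and a repeated set name overwrites the earlier entry.
    """
    gene_sets: Dict[str, List[str]] = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected at least a name and a description")
        gene_sets[fields[0]] = [gene for gene in fields[2:] if gene]
    return gene_sets


if __name__ == "__main__":
    # Hypothetical two-set sample in GMT layout.
    sample = (
        "SET_A\thttp://example.org/SET_A\tTP53\tMYC\n"
        "SET_B\tna\tMYC\n"
    )
    print(parse_gmt(io.StringIO(sample)))
```

To read a downloaded file such as msigdb.v3.1.symbols.gmt, pass an open text handle instead, for example `with open("msigdb.v3.1.symbols.gmt", encoding="utf-8") as f: sets = parse_gmt(f)`.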

The affylatin_norm_probeset.sas7bdat input data set and the u95a_trim.sas7bdat annotation data set, which were used to generate the affylatin_norm_gene_symbol.sas7bdat input data set, are included in the JMP Genomics Sample Data folder. The msigdb.v3.1.symbols.gmt gene set file, shown below, represents the complete MSigDB database (5452 gene sets). It was downloaded from the MSigDB website and saved to an MSigDB folder created in the JMP Genomics Sample Data folder.

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

Refer to the Merge Gene Sets output documentation for detailed descriptions of the output generated by this process and guides to interpreting your results.