Process Description

Variable Gene Selection

The Variable Gene Selection process selects a subset of genes that exhibit high cell-to-cell variation in a Single-Cell RNA-Seq data set. It provides two methods for calculating the variability of each gene: Dispersion and Variance-stabilizing transformation (VST).
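To make the idea of dispersion-based selection concrete, here is a minimal sketch in Python. It assumes dispersion is the variance-to-mean ratio of a gene's counts across cells and that the top-ranked genes are kept; the process itself may define and bin dispersion differently. The function names, gene names, and toy counts are hypothetical and are not part of the product.

```python
from statistics import mean, pvariance


def dispersion(counts):
    """Variance-to-mean ratio of one gene's counts across cells (assumed definition)."""
    m = mean(counts)
    # A gene with no detected transcripts has no meaningful dispersion.
    return pvariance(counts) / m if m > 0 else 0.0


def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes by dispersion and return the n_top highest.

    matrix: list of rows, one row per cell and one column per gene (wide layout).
    """
    columns = list(zip(*matrix))  # one tuple of counts per gene
    scored = [(dispersion(col), name) for col, name in zip(columns, gene_names)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n_top]]


# Toy data: 4 cells x 3 hypothetical genes.
cells = [
    [0, 5, 2],
    [0, 5, 2],
    [10, 5, 3],
    [10, 5, 1],
]
genes = ["GENE_A", "GENE_B", "GENE_C"]
selected = top_variable_genes(cells, genes, n_top=2)
print(selected)
```

In this toy example GENE_B has identical counts in every cell, so its dispersion is zero and it is the gene left out of the selection.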

What do I need?

One data set is required.

An Input Data Set that contains all of the numeric data to be analyzed. The PMBC_dense.sas7bdat data set serves as an example, and is partially shown below. It has 32739 columns and 2700 rows. Note that this is a wide data set: each row represents a single cell, identified by a bar code, and each column represents a gene and holds the number of transcript copies counted for that gene in each cell.

Output/Results

Three data sets are generated by this process: raw UMI count data, log-normalized data, and std-standardized data. Refer to the Variable Gene Selection output documentation for descriptions of these data sets.
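As a rough illustration of how the three kinds of output relate to one another, the following Python sketch derives log-normalized and standardized values from a small raw count matrix. It assumes a common convention: each cell's counts are scaled to a total of 10,000 and passed through log(1 + x), and each gene is then centered and divided by its standard deviation. The scale factor, the formulas, and the function names are assumptions made for this example, not the documented behavior of the process.

```python
import math
from statistics import mean, pstdev


def log_normalize(matrix, scale_factor=10_000):
    """Scale each cell (row) to a common total, then apply log(1 + x)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Cells with no counts at all are left as zeros.
        normalized.append(
            [math.log1p(x / total * scale_factor) if total else 0.0 for x in row]
        )
    return normalized


def standardize(matrix):
    """Center each gene (column) at 0 and divide by its standard deviation."""
    scaled_columns = []
    for col in zip(*matrix):
        m, s = mean(col), pstdev(col)
        # Genes with constant values are set to 0 to avoid dividing by zero.
        scaled_columns.append([(x - m) / s if s > 0 else 0.0 for x in col])
    # Transpose back so that rows are cells again.
    return [list(row) for row in zip(*scaled_columns)]


# Toy raw UMI counts: 2 cells x 2 genes.
raw = [
    [1, 3],
    [2, 2],
]
log_norm = log_normalize(raw)
std = standardize(log_norm)
```

Both functions keep the wide layout of the input, with one row per cell and one column per gene.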