Process Description

Transpose Tall to Wide

Most of the processes in JMP Genomics assume that the input SAS data set has a particular data structure. JMP Genomics distinguishes between tall and wide SAS data sets. A tall SAS data set has samples as columns, and molecular entity (such as marker, gene, clone, protein, or metabolite) as rows. A wide SAS data set is the transpose of a tall data set, having the samples as rows and molecular entity as columns. When specifying the input SAS data set for a process, it is important to know the required form. Most of the processes associated with genetic analyses require a wide structure, whereas most of those for microarray and proteomics analyses use a tall structure.

The Transpose Tall to Wide process transposes a SAS data set in tall format, along with an accompanying experimental design table, into a data set in wide format in which the rows and columns are transposed.

Caution and Tip: Wide data sets can become unwieldy when there are thousands of columns. In this case, it can be more efficient to first use other analytical processes to reduce the number of rows in a tall input data set. For example, use Statistics for Columns to reduce the number of rows in the original data set by computing a summary statistic, K-Means Clustering to cluster the rows into similar groups, or Merge/DATA Step to extract a subset of the rows.

What do I need?

Two data sets are required to transpose tall to wide:

The tall Input Data Set to be transposed, such as the affylatin_norm.sas7bdat data set (found in the LifeSciences\Sample Data\Microarray\Affymetrix Latin Square directory included with JMP Genomics, and described in Affymetrix Latin Square Data) shown below. This is a tall data set with individual arrays listed in 59 data columns and probesets listed in the 1604 data rows.

The Experimental Design Data Set (EDDS). This required data set tells how the experiment was performed, providing information about the columns of the primary experimental data. The affylatin_exp.sas7bdat EDDS (found in the LifeSciences\Sample Data\Microarray\Affymetrix Latin Square directory included with JMP Genomics, and described in Affymetrix Latin Square Data) is shown below.

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

The wide SAS data set generated by this process is listed in a Results dialog that is displayed in a new window.

8      Click to view the output wide data set.

The affylatin_norm_wid.sas7bdat output wide data set is shown below.

The data has been transposed; there are now 1604 data columns and 59 data rows. In addition, the _wid suffix has been appended to the transposed filename.