PCA for Population Stratification

Running this process using the GeneticMarkerExample sample setting generates the tabbed Results window shown below. Refer to the PCA for Population Stratification process description for more information. Output from the process is organized into tabs. Each tab contains one or more plots, data panels, data filters, and so on, that facilitate your analysis.

•

PCA 2D Row Scores: The PCA 2D Row Scores tab shows a 2-D scatterplot matrix of correlations between principal components and other visualizations of the PCs, with points/lines colored and/or labeled by the Color and Label Variables, respectively. This tab is generated only when the Display principal components plots check box has been checked.

•

PCA 3D Row Scores: The PCA 3D Row Scores tab shows a 3-D scatterplot of three of the principal components, with points colored and/or labeled by the Color and Label Variables, respectively. This tab is generated only when the Display principal components plots check box has been checked.

•

Scree Plot: This tab displays a plot of the eigenvalue for the ith component versus i to show the proportion of variation explained by the principal components. This tab is generated only when the Display principal components plots check box has been checked.

•

Summary Chart: When there are multiple annotation groups (chromosomes or genes, for example), this tab displays the number of significant markers in each annotation group for each test. Separate bar charts are shown for each BY group when any BY variables are specified. This tab is open by default.

•

Manhattan Plot: When there are multiple annotation groups (chromosomes or genes, for example), this tab displays a scatter plot of the p-values across all annotation groups.

•

Annotation Group Results: When there are multiple annotation groups (chromosomes or genes, for example), a separate Results tab with an overlay plot of p-value by chromosome location is created for each annotation group. If the Calculate trend odds ratios check box was checked, this tab also contains a Volcano Plot of p-value by log odds ratio for all markers.

•

All P-Value Plots: When there are multiple annotation groups (chromosomes or genes, for example), the All P-Value Plots tab shows all the p-value plots from the Annotation Group Results tabs in a single display.

Note: When an annotation group variable is not specified or there is only one annotation group, the tab is named P-Value Plot and contains an overlay plot of p-value by chromosome location for all markers.

•

All Trends Odds Ratio Plots: If the Calculate trend odds ratios check box was checked and there are multiple annotation groups (chromosomes or genes, for example), this tab shows all the odds ratio volcano plots.

•

Volcano Plot(s): This tab displays a scatter plot of p-value by the Estimate of Minor Allele Genotype Effect for all markers, colored by Annotation Group, when the trend test is performed. When the Output genotype LS means and diffs box is checked, this tab includes scatter plots of p-value by the LS diffs between genotypes 0 and 1, and genotypes 0 and 2.

•

SAS Output: This tab contains text-based output directly from SAS/STAT PROC PRINCOMP and provides detailed statistics on the principal components analysis. Refer to the documentation for SAS PROC PRINCOMP for more information.
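The Scree Plot tab described above plots each eigenvalue against its component index. As a minimal sketch of how those eigenvalues relate to the proportion of variation explained, the following Python snippet divides each eigenvalue by the sum of all eigenvalues. The function name and the eigenvalues are hypothetical illustrations, not output from this process or from PROC PRINCOMP.

```python
def variance_explained(eigenvalues):
    """Return (proportions, cumulative proportions) for a list of eigenvalues."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    proportions = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Hypothetical eigenvalues, sorted from largest to smallest (assumed for illustration).
example_eigenvalues = [4.0, 2.0, 1.0, 1.0]
proportions, cumulative = variance_explained(example_eigenvalues)

for i, (ev, p, c) in enumerate(
    zip(example_eigenvalues, proportions, cumulative), start=1
):
    print(f"PC{i}: eigenvalue={ev:.2f} proportion={p:.3f} cumulative={c:.3f}")
```

A sharp drop followed by a flat tail in these proportions is the "elbow" that a scree plot is typically used to spot.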

•

Create Subset Genotype and Annotation Data Sets: Select points from the p-value plots and click Create Subset Genotype and Annotation Data Sets to open the Subset and Reorder Genetic Data process to create the subset data sets.

•

Plot Trait by Genotype: Select markers from the p-value plots and click Plot Trait by Genotype to view each marker's genotype distribution for each of the Trait Variables values.

•

View Venn Diagram of Significant Markers by Trait for the Test Below: Click either Genotype or Trait to view a Venn diagram showing significant association between markers and multiple traits as determined by the specific association test.

•

PCA Data Set: This data set contains the eigenvectors for each of the principal components. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pca. Click Open to view the data set.

•

EigenCorr Data Set: This data set contains the correlation statistics between each principal component and trait variable and is generated when the Perform EigenCorr to select PCs check box is checked. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pce. Click Open to view the data set.

•

Merged Data Set: When the Create merged PCA output data set check box is checked, this data set contains the columns from the PCA output data set merged with the input data set. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pcm.

•

Trend Parameter Estimate Data Set: This data set contains the estimates and test statistics for the fixed effects included in each regression model testing for association, including the numeric marker genotype treated as a continuous variable, and is generated when the Trend test is performed. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pet. Click Open to view the data set.

•

P-value Data Set: This data set contains all the columns from the annotation data set, plus the test statistics and p-values from the tests performed. This data set can be used as the annotation data set for subsequent processes to accumulate results from multiple processes into a single data set. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _sta. Click Open to view the data set.

•

Click Reopen Dialog to reopen the completed process dialog used to generate this output.

•

Click Create Report to generate a PDF- or RTF-formatted report containing the plots and charts of selected tabs.

•

Click Close All to close all graphics windows and underlying data sets associated with the output.
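The output data sets listed above share one naming rule: the Output File Prefix (or the input data set name when no prefix is given) followed by a fixed suffix. The short Python sketch below illustrates that rule only. The function name and the example input name are hypothetical, and the sketch assumes the name is a plain concatenation, so it does not account for any truncation or validation the software may apply.

```python
# Suffixes as listed in the output data set descriptions above.
SUFFIXES = {
    "PCA": "_pca",
    "EigenCorr": "_pce",
    "Merged": "_pcm",
    "Trend Parameter Estimate": "_pet",
    "P-value": "_sta",
}


def output_data_set_names(input_name, prefix=None):
    """Map each output data set label to its expected name.

    Uses the prefix when one is supplied; otherwise falls back to the
    input data set name, mirroring the rule described in the text.
    """
    base = prefix if prefix else input_name
    return {label: base + suffix for label, suffix in SUFFIXES.items()}


# Hypothetical input data set name, used only for illustration.
names = output_data_set_names("my_genotypes")
for label, name in names.items():
    print(f"{label}: {name}")
```

Calling `output_data_set_names("my_genotypes", prefix="run1")` would instead yield names beginning with `run1`, such as `run1_pca`.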