The Population Measures process creates a symmetric matrix of dissimilarities between specified groups within a study.
Genetic distance computation is based on differences in group allele frequencies, calculated using PROC ALLELE for each marker variable. Groups are determined by the input values of the specified population variable. Multiple genetic distance matrices can be output for separate sets of marker variables if an annotation By Group variable is specified.
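As a rough illustration of how a symmetric dissimilarity matrix can be built from per-group allele frequencies, the sketch below uses Nei's standard genetic distance. This is an assumption made for the example only: the text above does not say which distance measure the process applies. The group names, allele frequencies, and function names are likewise hypothetical.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard distance between two groups (assumed measure).

    Each argument is a list with one dict per locus, mapping
    allele -> frequency within the group.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        # No shared alleles at any locus: the distance is unbounded.
        return math.inf
    # Dividing each sum by the number of loci would cancel in this ratio.
    return -math.log(jxy / math.sqrt(jx * jy))

def distance_matrix(group_freqs):
    """Return (sorted group names, symmetric matrix with a zero diagonal)."""
    names = sorted(group_freqs)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = nei_standard_distance(group_freqs[names[i]],
                                      group_freqs[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix

# Hypothetical frequencies for three groups at two biallelic markers.
example_freqs = {
    "pop_a": [{"A": 0.9, "G": 0.1}, {"C": 0.5, "T": 0.5}],
    "pop_b": [{"A": 0.2, "G": 0.8}, {"C": 0.4, "T": 0.6}],
    "pop_c": [{"A": 0.9, "G": 0.1}, {"C": 0.5, "T": 0.5}],
}
names, matrix = distance_matrix(example_freqs)
```

Groups with identical allele frequencies (pop_a and pop_c here) get a distance of essentially zero, and each off-diagonal entry appears twice, which is what makes the matrix symmetric.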
This process also produces Wright's F statistics (Wright, 1951), which measure the degree of relatedness between different types of allele pairs. Cockerham (1969, 1973) defines these same quantities in an analysis of variance (ANOVA) framework. For a population hierarchy defined by the Population Variable, the measures computed as Pop Theta, Within Pop f, and Overall F correspond to Wright's F_ST and, when HWE is not assumed, F_IS and F_IT. These measures are reported for each individual locus, and a weighted average over all loci is reported as an overall estimate. The estimates of these parameters are calculated using an ANOVA structure along with a method-of-moments approach.
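To make the relationship between the three measures concrete, the sketch below derives them from per-locus variance components. It assumes the standard variance-component form commonly attributed to Weir and Cockerham, with hypothetical components a (among populations), b (among individuals within populations), and c (between alleles within individuals). The text above does not give the exact estimators, so the formulas, the ratio-of-sums weighting across loci, and the numbers are all illustrative assumptions.

```python
def f_statistics(a, b, c):
    """Map variance components to the three measures (assumed form)."""
    total = a + b + c
    return {
        "pop_theta": a / total,        # corresponds to F_ST
        "within_pop_f": b / (b + c),   # corresponds to F_IS
        "overall_F": (a + b) / total,  # corresponds to F_IT
    }

def combine_over_loci(components):
    """Weighted average over loci, taken here as a ratio of summed components."""
    a = sum(comp[0] for comp in components)
    b = sum(comp[1] for comp in components)
    c = sum(comp[2] for comp in components)
    return f_statistics(a, b, c)

# Hypothetical (a, b, c) variance components for two loci.
locus_components = [(0.02, 0.01, 0.17), (0.01, 0.00, 0.19)]

per_locus = [f_statistics(*comp) for comp in locus_components]
overall = combine_over_loci(locus_components)
```

Under these definitions the three measures satisfy Wright's identity (1 - F_IT) = (1 - F_IS)(1 - F_ST), both for each locus and for the combined estimate.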
A second, optional data set that can be used in this process is the Annotation Data Set. This data set contains information, such as gene identity or chromosomal location, for each of the markers.
A Heat Map and Dendrogram showing the hierarchical clustering of the genetic distances calculated in this process are shown below.
The hapmap_subset_pmd.sas7bdat matrix (shown below), used to generate this heat map, is suitable for further analyses. Use the action button to load the output matrix directly into the Multidimensional Scaling process.
Finally, two output data sets (not shown), listing individual and overall F-statistics, are generated. The results across the region spanned by the markers are displayed in the Overlay Plot shown below.