The Population Measures process creates a symmetric matrix of dissimilarities between specified groups within a study. Genetic distance computation is based on differences in group allele frequencies calculated using PROC ALLELE for each marker variable. Groups are determined by the input values of the specified population variable. Multiple genetic distance matrices can be output for separate sets of marker variables if an annotation By Group variable is specified.
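
As a rough illustration of how a symmetric dissimilarity matrix can be built from group allele frequencies, the following Python sketch uses Nei's standard genetic distance. The choice of distance measure, the group names, and the frequency values are all assumptions made for this example; the text above does not state which measure the process applies, and in the process itself the frequencies come from PROC ALLELE.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two groups.

    freqs_x, freqs_y: lists with one dict per marker, mapping
    allele label -> allele frequency in that group.
    """
    jxy = jx = jy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        alleles = set(px) | set(py)
        # Sum of products of matching allele frequencies across groups.
        jxy += sum(px.get(a, 0.0) * py.get(a, 0.0) for a in alleles)
        # Within-group homozygosity terms.
        jx += sum(p * p for p in px.values())
        jy += sum(p * p for p in py.values())
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)

def distance_matrix(group_freqs):
    """Return (group names, symmetric matrix with a zero diagonal)."""
    names = sorted(group_freqs)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = nei_standard_distance(group_freqs[names[i]],
                                      group_freqs[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix

# Hypothetical frequencies for three groups at two biallelic markers.
example_freqs = {
    "groupA": [{"A": 0.9, "a": 0.1}, {"B": 0.5, "b": 0.5}],
    "groupB": [{"A": 0.6, "a": 0.4}, {"B": 0.5, "b": 0.5}],
    "groupC": [{"A": 0.2, "a": 0.8}, {"B": 0.3, "b": 0.7}],
}
names, matrix = distance_matrix(example_freqs)
for name, row in zip(names, matrix):
    print(name, [round(d, 4) for d in row])
```

Only the upper triangle is computed and then mirrored, which is what makes the result symmetric by construction.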
This process also produces Wright's F statistics (Wright, 1951), which measure the degree of relatedness between different types of allele pairs. Cockerham (1969, 1973) defines these same quantities in an analysis of variance (ANOVA) framework. For a population hierarchy defined by the Population Variable, the measures computed as Pop Theta, Within Pop f, and Overall F correspond to Wright's F_ST and, when Hardy-Weinberg equilibrium (HWE) is not assumed, F_IS and F_IT, respectively. Estimates are reported for each individual locus, and a weighted average over all loci is reported as an overall estimate. The estimates of these parameters are calculated using an ANOVA structure along with a method-of-moments approach.
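
To make the relationship among the three measures concrete, here is a minimal Python sketch. It assumes the per-locus variance components are already available in the form used by Weir and Cockerham (1984): `a` (between populations), `b` (between individuals within populations), and `c` (between alleles within individuals). The component values are hypothetical, and combining loci as a ratio of summed components is an assumption about the weighting, not a statement of what this process computes internally.

```python
def f_statistics(components):
    """Compute per-locus and overall F statistics.

    components: list of (a, b, c) variance-component tuples, one per locus.
    Returns (per_locus, overall), each using the keys
    "theta" (F_ST), "f" (F_IS), and "F" (F_IT).
    """
    def ratios(a, b, c):
        total = a + b + c
        return {
            "theta": a / total,      # Pop Theta, F_ST
            "f": b / (b + c),        # Within Pop f, F_IS
            "F": (a + b) / total,    # Overall F, F_IT
        }

    per_locus = [ratios(a, b, c) for a, b, c in components]
    # Ratio of sums: equivalent to averaging the per-locus ratios
    # with each locus weighted by its own denominator.
    sum_a = sum(a for a, _, _ in components)
    sum_b = sum(b for _, b, _ in components)
    sum_c = sum(c for _, _, c in components)
    overall = ratios(sum_a, sum_b, sum_c)
    return per_locus, overall

# Hypothetical variance components for two loci.
example_components = [(0.02, 0.01, 0.17), (0.01, 0.00, 0.19)]
per_locus, overall = f_statistics(example_components)
for i, est in enumerate(per_locus, start=1):
    print("locus", i, {k: round(v, 4) for k, v in est.items()})
print("overall", {k: round(v, 4) for k, v in overall.items()})
```

Under these definitions the estimates satisfy (1 - F) = (1 - f)(1 - theta), which is a convenient sanity check on any reported set of values.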
A second, optional data set that can be used in this process is the Annotation Data Set. This data set contains information, such as gene identity or chromosomal location, for each of the markers.
A Heat Map and Dendrogram showing the hierarchical clustering of the genetic distances calculated in this process is shown below.
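
For readers unfamiliar with how a dendrogram is derived from a distance matrix, the following Python sketch performs agglomerative clustering with average linkage. The linkage method, the group labels, and the distances are assumptions chosen for illustration; the text above does not specify which clustering method the process uses.

```python
def average_linkage(names, matrix):
    """Agglomerative clustering with average linkage.

    names: group labels; matrix: symmetric distances, same order.
    Returns a list of merges (members_a, members_b, height),
    in the order in which the clusters were joined.
    """
    index = {name: i for i, name in enumerate(names)}
    clusters = {i: (name,) for i, name in enumerate(names)}
    next_id = len(names)
    merges = []

    def dist(c1, c2):
        # Mean of all pairwise distances between the two clusters.
        total = sum(matrix[index[x]][index[y]] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, i in enumerate(ids):
            for j in ids[pos + 1:]:
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Replace the two closest clusters with their union.
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges

# Hypothetical distances among four groups.
labels = ["pop1", "pop2", "pop3", "pop4"]
distances = [
    [0.00, 0.10, 0.40, 0.50],
    [0.10, 0.00, 0.45, 0.55],
    [0.40, 0.45, 0.00, 0.05],
    [0.50, 0.55, 0.05, 0.00],
]
merges = average_linkage(labels, distances)
for left, right, height in merges:
    print(left, "+", right, "at", round(height, 4))
```

Each merge height corresponds to the level at which two branches join in a dendrogram, so the closest groups are joined lowest in the tree.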
The hapmap_subset_pmd.sas7bdat matrix (shown below), used to generate this heat map, is suitable for further analyses. Use the action button to load the output matrix directly into the Multidimensional Scaling process.
Finally, two output data sets (not shown), listing individual and overall F-statistics, are generated. The results across the region spanned by the markers are displayed in the Overlay Plot shown below.