The K Matrix Compression process clusters a symmetric relationship matrix or estimated kinship matrix in order to reduce the size of the random effect that accounts for relatedness in the Q-K Mixed Model analytical process. The compressed matrix can be used to define the covariance structure in a downstream mixed model for testing SNP-trait association. Compression can be done interactively through the JMP Clustering Platform, where you decide cluster membership by choosing a cutoff level in the hierarchical clustering based on visual inspection. Alternatively, clustering via PROC CLUSTER can be automated to produce a compressed K matrix based on a specified number of clusters. The optimized compression method scans through varying levels of compression to find the level that optimizes the fit of the mixed model to a specified trait (with the SNP effect excluded). This algorithm is described in Zhang et al. (Nature Genetics, 2010).
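As a rough illustration of what compressing a K matrix means once cluster membership is known, the following Python sketch collapses an individual-level kinship matrix to a cluster-level one. It is not the code this process runs. The function name compress_kinship, the toy matrix, and the cluster labels are hypothetical, and the rule of averaging kinship over all pairs of individuals from two clusters (including self pairs within a cluster) is an assumption made for the example.

```python
def compress_kinship(K, labels):
    """Collapse an individual-level kinship matrix to a cluster-level matrix.

    Assumption for this sketch: the kinship between two clusters is the
    plain average of K[i][j] over all members i of one cluster and all
    members j of the other (self pairs included when both are the same).
    """
    groups = sorted(set(labels))
    # Map each cluster label to the row/column indices of its members.
    members = {g: [i for i, lab in enumerate(labels) if lab == g] for g in groups}
    compressed = []
    for g in groups:
        row = []
        for h in groups:
            pairs = [K[i][j] for i in members[g] for j in members[h]]
            row.append(sum(pairs) / len(pairs))
        compressed.append(row)
    return groups, compressed


# Toy 4 x 4 kinship matrix: individuals 0-1 and 2-3 are closely related.
K = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
]
labels = [0, 0, 1, 1]  # hypothetical result of cutting a dendrogram at 2 clusters

groups, Kc = compress_kinship(K, labels)
for row in Kc:
    print([round(v, 3) for v in row])
```

With two clusters, the random effect shrinks from four levels to two. Other summary rules (for example the median or maximum of the pairwise values) could be substituted for the average in the same loop.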
Warning: In contrast to the Q-K Mixed Model process, which requires the square root of the K matrix to be used in the model, the K Matrix Compression process requires the K matrix before taking the square root. This process computes the square root of the compressed K matrix (via Singular Value Decomposition) so that the columns are appropriately formatted for input as random effect variables in the Q-K Mixed Model process.
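The square-root step can be sketched with NumPy as follows. This is a minimal illustration, not the routine this process uses: the function name sqrt_via_svd and the 2 x 2 example matrix are hypothetical, and the sketch assumes the compressed K matrix is symmetric and positive semidefinite, so that its singular value decomposition coincides with its eigendecomposition.

```python
import numpy as np


def sqrt_via_svd(K):
    """Return a symmetric matrix R with R @ R approximately equal to K.

    Assumes K is symmetric positive semidefinite. In that case K = U S U^T,
    and the square root is obtained by taking square roots of the singular
    values.
    """
    K = np.asarray(K, dtype=float)
    U, s, Vt = np.linalg.svd(K)
    return U @ np.diag(np.sqrt(s)) @ Vt


# Hypothetical compressed K matrix for two clusters.
Kc = np.array([[0.9, 0.1],
               [0.1, 0.8]])

R = sqrt_via_svd(Kc)
print(np.allclose(R @ R, Kc))  # checks that R really is a square root of Kc
```

Each column of R would then play the role of one random effect variable. If the matrix has negative eigenvalues, this shortcut no longer yields a true square root and the matrix would need to be adjusted first.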
Note: To run optimized compression, a trait variable must be specified from the SNP Input Data Set tab. If the SNP data, phenotype data, and K matrix are all in the Input K matrix data set, a separate SNP input data set is not required. However, the K matrix input data set should then be specified again as the SNP Input Data Set.