Imputed SNP (Tall Format) Input Engine


The Imputed SNP (Tall Format) Input Engine imports a set of files created by a SNP imputation program, such as IMPUTE (Marchini et al., 2007) or BEAGLE (Browning and Browning, 2009). This process outputs three different SAS genotype data sets that can be used for subsequent analyses.

• The output genotype threshold data set is a wide data set listing the most likely genotype for each marker. For markers where no genotype's probability meets the specified threshold, a missing value symbol (.) is listed instead.

• The output genotype probabilities data set lists genotype probabilities in a stacked format.

• The output annotation data set lists the map position and alleles for each of the SNP markers.
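The thresholding rule behind the first data set can be illustrated with a minimal Python sketch. This is not the process's own code: the function name, the default threshold of 0.9, the genotype labels, and the use of "greater than or equal to" for the comparison are all assumptions made for illustration.

```python
def call_genotype(probs, threshold=0.9, labels=("A/A", "A/B", "B/B")):
    """Return the most likely genotype label, or "." if none meets the threshold.

    probs     : the three genotype probabilities for one individual at one marker
    threshold : hypothetical cutoff; the real value is set in the process dialog
    labels    : hypothetical labels for the three genotype classes
    """
    # Index of the largest probability (ties resolve to the first one found).
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Assumption: a probability equal to the threshold counts as meeting it.
    return labels[best] if probs[best] >= threshold else "."


confident = call_genotype((0.95, 0.04, 0.01))  # one class clearly dominates
uncertain = call_genotype((0.50, 0.40, 0.10))  # no class reaches the cutoff
```

Under these assumptions, the first call yields a genotype label and the second yields the missing value symbol.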

Consult the Imputed SNP Import Tutorial (Genomics > Import > Other Genetics > Imputed SNP Import Tutorial) for help on what options to use for your particular files. You should also refer to Data Sets Used in JMP Genomics Processes for information about data set formats.

What do I need?

At least one genotype probability file is required for this process. This file must be in the tall format, where SNPs are in rows and each individual is represented by a set of genotype probability columns. With the options provided, files from different imputation programs can be imported and analyzed.
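To make the tall layout concrete, the following Python sketch splits one row of such a file into its marker columns and per-individual probability sets. It assumes the IMPUTE-style layout of five leading columns (SNP ID, rs ID, position, and two alleles) followed by three probabilities per individual; other programs may use a different number of leading columns, and the function and parameter names here are hypothetical.

```python
def parse_tall_row(line, n_leading=5, n_probs=3):
    """Split one tall-format row into marker fields and per-individual tuples.

    n_leading : assumed number of marker columns before the probabilities
    n_probs   : assumed number of probability columns per individual
    """
    fields = line.split()
    marker = fields[:n_leading]
    values = [float(x) for x in fields[n_leading:]]
    if len(values) % n_probs != 0:
        raise ValueError("probability columns do not divide evenly by individual")
    # Group consecutive values so that each tuple belongs to one individual.
    per_individual = [
        tuple(values[i:i + n_probs]) for i in range(0, len(values), n_probs)
    ]
    return marker, per_individual


# Made-up row with two individuals; not taken from the example.gen file.
marker, per_individual = parse_tall_row("SNP1 rs1 1000 A G 1 0 0 0.1 0.8 0.1")
```

Here `marker` holds the five marker fields and `per_individual` holds one tuple of three probabilities for each of the two individuals.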

A second, optional file is the sample file. This text file contains information about the samples in the genotype probability file(s). It must be space-delimited, with column names in the first row and data beginning on the third row. The rows of samples must be in the same order as the columns of samples in the genotype file(s). During the input process, columns from this file are merged with the genotype columns.
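The sample file layout described above can be read with a short Python sketch like the one below. It assumes only what is stated here: space-delimited text, column names in the first row, and data starting on the third row (the second row is skipped without interpretation). The function name and the made-up column names and values are hypothetical.

```python
def read_sample_rows(lines):
    """Return one dict per sample, keyed by the column names in the first row."""
    rows = [ln.split() for ln in lines if ln.strip()]
    header = rows[0]
    data = rows[2:]  # data begin on the third row; the second row is skipped
    return [dict(zip(header, row)) for row in data]


# Made-up content; not taken from the example.sample file.
sample_lines = [
    "ID_1 ID_2 missing",
    "0 0 0",
    "s1 s1 0.00",
    "s2 s2 0.01",
]
samples = read_sample_rows(sample_lines)

# Because samples are matched to genotype columns by order, a useful sanity
# check is that the sample count equals the number of individuals in each row
# of the genotype probability file (2 in this made-up case).
n_individuals_in_genotype_file = 2
counts_match = len(samples) == n_individuals_in_genotype_file
```

A mismatch between the two counts would suggest that the sample file does not correspond to the genotype probability file, or that the ordering assumption cannot hold.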

The following example uses the example.gen and example.sample files included in the Sample Data folder, which are example files from the IMPUTE program. They are provided courtesy of Jonathan Marchini at the University of Oxford.

For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.

Output/Results

The output data sets generated by this process are listed in a Results window. Refer to the Imputed SNP (Tall Format) Input Engine output documentation for detailed descriptions.