Process Description
Affymetrix SNP CHP Input Engine
Genotype data gathered from Affymetrix microarrays is typically collected and stored in raw data files formatted as manufacturer-specific .CHP files. Before the information contained in these files can be analyzed using JMP Genomics, it must be extracted and organized into two SAS data sets:
• | one wide data set containing the genotypes for each of the markers in each of the individuals in the study, and |
• | an Experimental Design Data Set (EDDS), that contains information about the experimental design. |
An optional third data set containing the sorted annotation information can also be generated.
The Affymetrix SNP CHP Input Engine enables you to import data and other information contained in Affymetrix .chp files into these SAS data sets.
What do I need?
Before you can successfully import the raw data into SAS data sets that can be used for analysis in JMP Genomics, you must locate and gather different sources of information:
• | The folder containing the raw data files. These .chp files, each corresponding to an individual microarray, contain the hybridization intensities and specific information about the format of the chip. |
• | The Experimental Design File (EDF) for the experiment. The EDF lists specific information about the design of the experiment. The EDF is typically a text file or Excel spread sheet and must be created before the data can be imported. |
• | One or more specific library files, available for download from Affymetrix, that contain information used to associate individual data points extracted from the .chp files with corresponding probesets, might be required for importing older .chp files that were formatted before the introduction of the AGCC format used by the Affymetrix Expression Console. |
• | An Annotation Data Set. This data set provides annotation information, such as gene names, function, physical location, and association, for each of the markers used in the analysis. This data set must have a variable named Probe_Set_ID that is used to correctly order the SNPs in the output data set. |
Tip: The appropriate annotation file can be downloaded from the Affymetrix website using the NetAffx Download Engine process found under the Genomics > Import > Affymetrix menu. Once you have downloaded the appropriate annotation .csv file, select Genomics > Import > Affymetrix > Annotation CSV to import this file into a SAS data set.
The following example uses a subset of the SNPv6 demo data set provided by Affymetrix. The compressed files were downloaded from the Affymetrix website, unzipped and saved to a new folder named SNP-CHP, located in the Sample Data folder that is included with JMP Genomics. Included .chp files are listed below.
Additional files include the GenomeWideSNP_6.cdf CDF and the genomewidesnp_6_na23_annot.sas7bdat annotation file.
The first step in importing the data contained in the .chp files was to generate an Experimental Design File (EDF), using information contained in the .arr files. This action involved the parsing of the .arr files using the Affymetrix ARR File Parser process. The resulting EDF.jmp file was opened in JMP and the required ColumnName column was generated using the Create ColumnName process. The values in this column were generated by concatenating the values in the FileName and File columns. Finally, the modified EDF file was saved in the SNP-CHP folder as a .sas7bdat file.
For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.
Output/Results
The output data sets generated by this process are listed in a Results window. Refer to the Affymetrix SNP CHP Input Engine output documentation for detailed descriptions.