An Annotation Data Set contains biological or chemical information and properties about genes, SNPs, probes, probesets, or peptides. This annotation information comes from various online bioinformatics resources, including government agencies, academic organizations, and commercial entities. It is used to create a custom Annotation Data Set for your analysis.
The structure of an Annotation Data Set and the information that it provides can vary depending on the nature of the experiment, the source of the data, and the application that generated it. The entries below list information commonly contained in an Annotation Data Set. Keep in mind that different providers might name annotation information differently.
• Accession number - a unique identifier given to a biological polymer sequence (such as DNA or a protein) when it is submitted to a sequence database (GenBank, EMBL, DDBJ)
• Gene ID - a unique identifier assigned to a gene record in Entrez Gene. It is an integer and is species specific. For genomes that were represented in LocusLink, the Gene ID is the same as the Locus ID.
• SNP ID - a unique identifier assigned to a single nucleotide polymorphism (SNP) when it is submitted to the SNP database. Also known as an 'rs' ID.
For genetics, each row in the Annotation Data Set represents a marker or SNP used in the analysis, with variables typically containing the following information: a name or identifier for each marker, the chromosome or candidate gene on which it is located, its location (in kilobases or centiMorgans, for example), and an accession number that can be used to retrieve more information about the locus from a publicly available online database. This data set can be specified on the Annotation tab found on most of the process dialogs, where the columns can be assigned to various roles:
• Annotation Label Variable - the name or ID variable that is used to label markers in the output
• Annotation Group Variable - the variable, such as chromosome, that can be used to group the analyses and output
• Annotation Location Variable - the variable containing marker locations to be used to accurately represent distances between markers in p-value plots
• Accession Number Variable - the variable containing, for example, a GenBank accession number or dbSNP reference sequence ID, to be used to create buttons on p-value plots that provide direct access to the Web site for the selected marker from the appropriate online database
This tab also allows conditional inclusion of markers in your analysis based on particular values of variables from the Annotation Data Set. The criteria can be entered in the Filter to Include Variables field in accordance with SAS syntax for WHERE statements.
For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.
The method used to specify this data set can vary depending on whether JMP is connected to SAS on your local machine or connected to SAS on a server. Refer to the Specifying Folders, Files, and Data Sets documentation for detailed information.
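As a rough illustration of how the variable roles and an inclusion filter fit together, the following Python sketch models a small genetics Annotation Data Set as a list of rows. It is not how JMP Life Sciences implements anything. The column names (Marker, Chromosome, Location, Accession), the marker values, and the role keys are all hypothetical. The filter mimics a WHERE-style condition such as Chromosome = 7 and Location between 1000 and 5000, where BETWEEN is inclusive of both endpoints.

```python
from collections import defaultdict

# Hypothetical annotation rows: one row per marker.
# All names, locations (assumed to be in kilobases), and accession values
# are placeholders invented for this illustration.
annotation = [
    {"Marker": "snp_a", "Chromosome": 7, "Location": 1200.0, "Accession": "ACC_0001"},
    {"Marker": "snp_b", "Chromosome": 7, "Location": 4800.0, "Accession": "ACC_0002"},
    {"Marker": "snp_c", "Chromosome": 7, "Location": 9100.0, "Accession": "ACC_0003"},
    {"Marker": "snp_d", "Chromosome": 12, "Location": 2500.0, "Accession": "ACC_0004"},
]

# Hypothetical mapping of roles to columns, mirroring the roles listed above.
roles = {
    "label": "Marker",         # Annotation Label Variable
    "group": "Chromosome",     # Annotation Group Variable
    "location": "Location",    # Annotation Location Variable
    "accession": "Accession",  # Accession Number Variable
}


def include(row):
    """Python stand-in for the WHERE-style condition
    Chromosome = 7 and Location between 1000 and 5000 (inclusive bounds)."""
    return row["Chromosome"] == 7 and 1000 <= row["Location"] <= 5000


# Keep only the markers that satisfy the inclusion condition.
kept = [row for row in annotation if include(row)]

# Group the retained marker labels by the grouping variable,
# ordered by location within each group.
groups = defaultdict(list)
for row in sorted(kept, key=lambda r: r[roles["location"]]):
    groups[row[roles["group"]]].append(row[roles["label"]])

for group, labels in groups.items():
    print(group, labels)
```

Markers that fail the condition are simply left out of the grouped result, which mirrors the idea of conditional inclusion described above. The actual expression that you enter in the Filter to Include Variables field must use the variable names that exist in your own Annotation Data Set.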
Click to open the data set in JMP for inspection.