Experimental Design File

One of the first data sets you encounter as you begin to use JMP Genomics is the experimental design file (EDF). The EDF provides JMP Genomics with important information about how an experiment is carried out. It defines experimental variables such as treatment conditions and covariates and provides the basis for organizing and analyzing your data.

An EDF is required by many of the input engines for the construction of a SAS data set from the raw data files. In fact, the name and location of the EDF is frequently the first parameter that must be defined. An EDF also serves as a precursor to another file known as the experimental design data set (EDDS) that is typically required for analysis of tall SAS data sets.

An additional advantage of using an EDF is that all of the experimental variables are collected in one table that can be reused or modified as needed. An EDF is an excellent way to consolidate, store, and share the critical factors in an experiment, rather than attaching them to the raw data one by one later or encoding them in the names of the raw data files. Because an EDF records the experimental factors of your genomics experiment, consider constructing one during the initial planning of your experiments.

EDFs for JMP Genomics must conform to the following conventions:

One column must have header name Array, Chip, or Spectrum. A second column is optional; if present, it must be named Channel or Dye. Taken together, the entries in these columns must uniquely identify the rows of the file. The Create Array Index process helps you generate the Array column.

One column must have header name File or FileName. The entries in this column must contain the names of the raw data files that are associated with each row. The Check File Names process helps you check the accuracy of the filenames.

One column must have header name ColumnName¹. The entries in this column must correspond to the names of the SAS variables in the tall data set that is to be associated with this experimental design. SAS variable names have certain restrictions. The Create ColumnName process helps you generate this column.

When your raw data files have more than one column to be read as raw data, you must include a column named Intensity. The entries in this column list the names of the columns in the raw data files that are to be read.

When your raw data files have a column of background signal that you want to subtract from the raw data in the specified Intensity column, include a column named Background. The entries in this column contain the names of the background columns in the raw data files.

When you want to input extra columns for data that are shared across all the raw files, such as coordinates of molecular entities on arrays, you can include columns named _X_varname in your EDF, where varname is the name to be assigned to these columns in the tall data set to be created. The entries in these columns contain the names of the columns in the raw data files that correspond to the extra data to be read.

The file must be in one of the following formats: tab-delimited text with a .txt extension, comma-delimited text with a .csv extension, Microsoft Excel with an .xls extension, or a SAS data set with a .sas7bdat extension.
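The column conventions above can be checked mechanically before a file is handed to an input engine. The following Python sketch is not part of JMP Genomics; the function name check_edf is hypothetical, header matching is assumed to be case-sensitive, and the name test assumes the standard SAS rule of 1 to 32 letters, digits, or underscores that do not start with a digit. It does not handle the SampleName exception or the optional Intensity, Background, and _X_varname columns.

```python
import csv
import io
import re

# Assumed SAS naming rule: 1-32 characters, starting with a letter or
# underscore, followed only by letters, digits, or underscores.
SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")

ARRAY_HEADERS = ("Array", "Chip", "Spectrum")
CHANNEL_HEADERS = ("Channel", "Dye")
FILE_HEADERS = ("File", "FileName")


def check_edf(text, delimiter="\t"):
    """Return a list of problems found in delimited EDF text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    header = reader.fieldnames or []
    problems = []

    # Locate the identifying columns by their permitted header names.
    array_col = next((h for h in ARRAY_HEADERS if h in header), None)
    channel_col = next((h for h in CHANNEL_HEADERS if h in header), None)
    file_col = next((h for h in FILE_HEADERS if h in header), None)

    if array_col is None:
        problems.append("missing Array, Chip, or Spectrum column")
    if file_col is None:
        problems.append("missing File or FileName column")
    if "ColumnName" not in header:
        problems.append("missing ColumnName column")

    # Array (plus Channel or Dye, when present) must uniquely identify each row.
    if array_col is not None:
        seen = set()
        for i, row in enumerate(rows, start=1):
            key = (row[array_col], row[channel_col] if channel_col else None)
            if key in seen:
                problems.append(f"row {i}: duplicate identifier {key}")
            seen.add(key)

    # ColumnName entries become SAS variable names, so test each one.
    if "ColumnName" in header:
        for i, row in enumerate(rows, start=1):
            name = row["ColumnName"] or ""
            if not SAS_NAME.fullmatch(name):
                problems.append(f"row {i}: invalid SAS name {name!r}")

    return problems


# Hypothetical two-row design; the second ColumnName starts with a digit.
sample = (
    "Array\tFile\tColumnName\tTreatment\n"
    "1\tchip1.txt\tctrl_1\tcontrol\n"
    "2\tchip2.txt\t2_drug\tdrug\n"
)
print(check_edf(sample))
```

An empty list means only that none of these particular checks failed; the Check File Names process remains the way to confirm that the listed raw data files exist.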

EDFs can be built in a variety of ways. The simplest method assumes you have a file in which individual raw data files are identified along with the experimental conditions under which they were generated.

Typical experimental conditions can include variables such as treatment, dosage, time, cell line, animal, sex, age, and so on. Such a file can be created using JMP's DOE capabilities. The file can then be read into JMP and modified so that it functions as an EDF.

If the design information is spread across separate tables, you can use JMP's Tables > Join command to merge the tables to create the design file. If you do not have such a design file, JMP Genomics includes tools, such as Create Design File Template, that you can use to create a new EDF. Alternatively, the Getting Started Wizard (click Genomics > Import > Getting Started > Getting Started Wizard) automatically creates an EDF.
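To illustrate what merging separate design tables involves, here is a minimal Python sketch of an inner join on a shared identifier column. It is not JMP's Join command, only an illustration of the idea; the function name join_tables and the column names in the example are hypothetical.

```python
def join_tables(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    # Index the right-hand table by its key value for quick lookup.
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            # Combine the columns of both tables into one design row.
            joined.append({**row, **match})
    return joined


# One table maps arrays to raw data files, the other to conditions.
files = [
    {"Array": "1", "File": "chip1.txt"},
    {"Array": "2", "File": "chip2.txt"},
]
conditions = [
    {"Array": "1", "Treatment": "control"},
    {"Array": "2", "Treatment": "drug"},
]
for row in join_tables(files, conditions, "Array"):
    print(row)
```

Rows without a match in the second table are dropped in this sketch, so comparing the row counts before and after the join is a quick way to spot arrays that lack design information.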

An EDF is normally saved in a format other than SAS, such as a comma-separated values (.csv) file, tab-delimited text (.txt) file, or Microsoft Excel (.xls) spreadsheet. A typical JMP data file (.jmp) does not work as an EDF. If you create a new design from scratch, you might need to add one or more columns to the table in order for it to be a valid EDF. You can then use the JMP File > Save command to save it as a text file or as an Excel spreadsheet.
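If the design table is assembled outside JMP, the tab-delimited text layout can be produced with a few lines of Python. This is a sketch under assumptions rather than a JMP Genomics feature: the function name edf_to_tab_delimited, the file name, and the example row are hypothetical.

```python
import csv
import io


def edf_to_tab_delimited(rows, columns):
    """Render design rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"Array": "1", "File": "chip1.txt", "ColumnName": "ctrl_1", "Treatment": "control"},
]
text = edf_to_tab_delimited(rows, ["Array", "File", "ColumnName", "Treatment"])
print(text)
```

Writing the returned text to a file with a .txt extension, for example with open("design.txt", "w", newline="").write(text), gives a file in the tab-delimited layout described above.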

For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.

The method used to specify the name and location of the EDF can vary depending on whether JMP is connected to SAS on your local machine or to SAS on a server. Refer to the Specifying Folders, Files, and Data Sets documentation for detailed information.

¹ There is one exception to the ColumnName requirement: the EDF for the SAM Input Engine, BAM Input Engine, Eland Input Engine, and Affymetrix Tiling CEL Input Engine import processes can contain a column named SampleName in addition to or in place of ColumnName. The SampleName column is used for combining data from individual samples that are spread across multiple files.