This data set represents a small subset of the Drosophila aging experiment data from (Jin, Riley et al. 2001). The experiment consisted of 24 two-color cDNA microarrays, 6 for each experimental combination of 2 lines (Oregon and Samarkand), 2 sexes (Female and Male), and 2 ages (1 week and 6 weeks). The Cy3 and Cy5 dyes were flipped for two of the 6 replicates for each genotype and sex combination. The design is a split-plot, with Age and Dye as subplot factors, and Line and Sex as whole-plot factors. A total of 4256 clones were spotted on the arrays, but this example uses a subset containing 100 randomly selected genes from the original data set.
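The array layout described above can be enumerated to check the counts. This is a minimal sketch, not code from the original study. It assumes that each array carries both ages of a single line and sex, one per dye channel, which is what treating Age and Dye as subplot factors suggests. The replicate numbering, the choice of which replicates are dye-flipped, and the dye assigned to each age on an unflipped array are also assumptions made for illustration.

```python
from collections import Counter
from itertools import product

LINES = ["Oregon", "Samarkand"]
SEXES = ["Female", "Male"]
AGES = ["1 week", "6 weeks"]
REPLICATES = range(1, 7)   # 6 arrays per line-by-sex combination
FLIPPED = {5, 6}           # assumption: which 2 of the 6 replicates are dye-flipped


def build_design():
    """Return one record per array channel (two channels per two-color array)."""
    records = []
    array_id = 0
    for line, sex, rep in product(LINES, SEXES, REPLICATES):
        array_id += 1
        # Assumption: unflipped arrays label 1 week with Cy3 and 6 weeks with Cy5.
        dyes = ["Cy5", "Cy3"] if rep in FLIPPED else ["Cy3", "Cy5"]
        for age, dye in zip(AGES, dyes):
            records.append(
                {"array": array_id, "line": line, "sex": sex, "age": age, "dye": dye}
            )
    return records


design = build_design()
n_arrays = len({r["array"] for r in design})
per_cell = Counter((r["line"], r["sex"], r["age"]) for r in design)

print(n_arrays)                # 24
print(set(per_cell.values()))  # {6}
```

Under these assumptions the 24 arrays yield 48 channel-level observations, 6 for each of the 8 line, sex, and age combinations.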
Each experiment was repeated in triplicate using Affymetrix chips cut from different wafers. The last four digits of the wafer numbers are 1521, 1532, and 2353. Chip c from wafer 2353 was defective and is therefore not included in the data set. As a result, 20 .cel files were generated for each of wafers 1521 and 1532, and 19 .cel files were generated for wafer 2353. Each group contains a pool of non-specific RNA as well as a set of 14 distinct human transcripts spiked in at known concentrations of 0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 pM.
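The spike-in concentrations listed above are zero plus a doubling series, which makes them convenient to work with on a log2 scale. The following sketch only reproduces the numbers stated in the text; the variable and function names are illustrative and are not part of the data set.

```python
import math

# Spike-in concentrations in pM, as listed in the text: zero plus a doubling series.
concentrations = [0.0] + [2.0 ** k for k in range(-2, 11)]

# .cel files per wafer, as stated in the text (one chip from wafer 2353 was excluded).
cel_files = {"1521": 20, "1532": 20, "2353": 19}


def log2_concentration(c):
    """Return log2(c), or None for the zero concentration, where the log is undefined."""
    return math.log2(c) if c > 0 else None


print(len(concentrations))      # 14
print(sum(cel_files.values()))  # 59
print([log2_concentration(c) for c in concentrations[:4]])  # [None, -2.0, -1.0, 0.0]
```

The zero concentration has no finite logarithm, so it is returned as None here and would need separate handling in any log-scale analysis.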
These data are computer-simulated and are in wide form. The 1000 rows correspond to individuals, and the 130 columns contain family, genotype, and phenotype information on these individuals. The disease column contains the binary trait of primary interest, where 1 indicates individuals affected with the disease and 0 indicates unaffected individuals. There are also four quantitative traits and sixty markers, each with two possible alleles (designated 1 and 2), for each individual. The marker data occur in pairs: columns ma1 and ma2 hold the genotype at the first marker, columns ma3 and ma4 hold the genotype at the second marker, and so on. The analyses performed on this data set aim to locate the gene or genes that affect susceptibility to this disease.
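The paired-column layout described above can be collapsed into one genotype per marker. This is a minimal sketch that assumes the allele columns continue the ma1, ma2, ma3, ma4 naming pattern through ma120 for the sixty markers. The function name and the example row are hypothetical, and sorting the two alleles assumes that their order within a pair carries no meaning.

```python
def pair_marker_columns(row, n_markers=60, prefix="ma"):
    """Group allele columns into per-marker genotypes.

    Marker k (1-based) is read from columns ma(2k-1) and ma(2k).
    """
    genotypes = []
    for k in range(1, n_markers + 1):
        first = row[f"{prefix}{2 * k - 1}"]
        second = row[f"{prefix}{2 * k}"]
        # Assumption: allele order is not meaningful, so 2/1 is stored as (1, 2).
        genotypes.append(tuple(sorted((first, second))))
    return genotypes


# Hypothetical individual with made-up alleles, for illustration only.
example_row = {f"ma{i}": (2 if i % 3 == 0 else 1) for i in range(1, 121)}

genotypes = pair_marker_columns(example_row)
print(len(genotypes))  # 60
print(genotypes[:3])   # [(1, 1), (1, 2), (1, 2)]
```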
Accompanying this data set is a map data set that provides information about the 60 markers, which are spread across two hypothetical candidate gene regions. The variable that indicates the candidate gene on which each marker resides can be used to group analyses, and the Location variable is useful for accurately displaying distances in base pairs between markers along the x-axis of plots containing various association p-values.
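As an illustration of how the map data set might be used, the sketch below groups markers by candidate gene and computes the base-pair gaps between adjacent markers. Only the Location variable is named in the text; the field names "Marker" and "Gene" and all of the values shown are hypothetical.

```python
from collections import defaultdict

# Hypothetical map records; "Marker", "Gene", and all values are assumed for illustration.
marker_map = [
    {"Marker": "m1", "Gene": 1, "Location": 1200},
    {"Marker": "m2", "Gene": 1, "Location": 4500},
    {"Marker": "m3", "Gene": 2, "Location": 300},
    {"Marker": "m4", "Gene": 2, "Location": 2100},
    {"Marker": "m5", "Gene": 1, "Location": 3000},
]


def spacing_by_gene(records):
    """Return, per gene, the base-pair gaps between adjacent markers sorted by Location."""
    by_gene = defaultdict(list)
    for rec in records:
        by_gene[rec["Gene"]].append(rec["Location"])
    gaps = {}
    for gene, locations in by_gene.items():
        locations.sort()
        gaps[gene] = [b - a for a, b in zip(locations, locations[1:])]
    return gaps


print(spacing_by_gene(marker_map))  # {1: [1800, 1500], 2: [1800]}
```

Plotting p-values against Location within each gene group, rather than against marker order, keeps the spacing on the x-axis proportional to these gaps.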
Two hundred families, each containing an affected sib-pair and the siblings' parents, were genotyped at 20 markers from a single chromosome in simulated data provided by Gonçalo Abecasis at the University of Michigan Center for Statistical Genetics. MERLIN was used to estimate identical-by-descent (IBD) allele-sharing probabilities at these markers for all pairs of related individuals. The 400 offspring were also measured for a quantitative trait of interest.
These data came from a study of the effects of nicardipine on patients suffering from recent aneurysmal subarachnoid hemorrhages (Haley et al. 1993a, 1993b). A total of 906 patients were included in this randomized, double-blind, placebo-controlled study; 449 patients received nicardipine and 457 received the placebo. Patients in each group were balanced with regard to prognostic factors for overall outcome. Nicardipine and the placebo were delivered continuously at 0.15 mg/kg/hr for up to 14 days, and patients were followed for up to 120 days following administration of the drugs. Results are formatted according to the CDISC Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM).
This data set was obtained by surface-enhanced laser desorption/ionization (SELDI). This method allows an investigator to detect and resolve multiple proteins bound to protein chip arrays (Merchant and Weinberger 2000). This approach was used by Qu et al. (2002) to discriminate prostate cancer from non-prostate cancer patients. The promise of this approach is that a panel of multiple biomarkers can be used to distinguish important phenotypes such as cancer status. However, great care must be taken to pre-process and analyze the data appropriately to ensure generalizability of results.