Next-Gen Sequencing
Several processes are available for generating or binning counts, inferring gene structure, and creating and importing variant call format (VCF) files from next-generation sequencing data.
Count SAS Data Generation
The first three processes import a set of files and generate count data, which is combined into SAS data sets containing chromosome, location, and sequence identity with respect to a reference sequence.
| Process | Input file format | Input file extension |
| | Sequence Alignment Map (SAM) | .sam |
| | Compressed Binary Sequence Alignment Map (BAM) | .bam |
| | Eland | .txt |
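To make the idea of count data concrete, here is a minimal Python sketch of how aligned reads in a SAM text file could be tallied by chromosome and start position. It is not the implementation used by the import processes; the function name `count_read_starts` and the sample reads are hypothetical. It relies only on the standard SAM column layout (QNAME, FLAG, RNAME, POS, ...), and it ignores everything past the first four columns.

```python
from collections import Counter


def count_read_starts(sam_lines):
    """Count aligned reads per (chromosome, 1-based start position).

    Hypothetical helper for illustration only; it looks at the FLAG,
    RNAME, and POS columns and ignores CIGAR, sequence, and qualities.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM alignment line has at least 11 mandatory fields
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])
        if flag & 0x4 or chrom == "*":
            continue  # flag bit 0x4 marks an unmapped read
        counts[(chrom, pos)] += 1
    return counts


# Made-up alignment lines, including one header and one unmapped read.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr2\t7\t60\t4M\t*\t0\t0\tTTTT\tIIII",
]
counts = count_read_starts(example)
```

Each key in `counts` corresponds to one row of a tall count table with chromosome and location columns.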
Binning and Summarization
The following two processes are used for additional condensation and summarization of next-generation sequencing data.
| Process | Choose this process for... |
| | Binning intensities or read counts stored in rows of a tall SAS data set. Tip: This can be useful to reduce the number of rows in a large data set in preparation for downstream plotting and modeling. |
| | Summarizing position-level intensity data into exon and intron bins as defined by an isoform definition file in UCSC format. Tip: Output from a process such as SAM Input Engine can be used as input for this process. |
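The binning idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the process's actual algorithm: it assumes fixed-width bins, 1-based positions, and summation as the summary statistic. The function name `bin_counts` and the sample rows are hypothetical.

```python
from collections import defaultdict


def bin_counts(rows, bin_width):
    """Sum counts into fixed-width position bins.

    rows: iterable of (chromosome, 1-based position, count) tuples,
    that is, one tuple per row of a tall data set.
    Returns a dict keyed by (chromosome, bin start position).
    """
    binned = defaultdict(int)
    for chrom, pos, count in rows:
        # Map the position to the first position of its bin (1-based).
        bin_start = ((pos - 1) // bin_width) * bin_width + 1
        binned[(chrom, bin_start)] += count
    return dict(binned)


# Four made-up rows collapse into three bins of width 100.
rows = [("chr1", 1, 2), ("chr1", 50, 3), ("chr1", 101, 1), ("chr2", 250, 4)]
binned = bin_counts(rows, 100)
```

Here the two `chr1` rows at positions 1 and 50 fall into the same bin, which is how binning reduces the row count before plotting or modeling.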
VCF File and SAS Data Set Generation from Other Sources
The remaining processes focus on the detection of single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), also known as deletion-insertion polymorphisms (DIPs), and generate VCF files or SAS data sets.
| Process | Choose this process for... |
| | Generating variant call format (VCF) files from SNPs/INDELs called (using SAMtools/BCFtools) from BAM files |
| | Importing CLC bio SNP or DIP Detection Table .csv files into SAS data set(s) |
| | Importing Complete Genomics files into SAS data set(s) |
| | Importing variant call format (VCF) files into SAS data set(s) |
| | Importing 10x Genomics Single-Cell RNA Sequencing data into SAS data set(s) |
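As a rough illustration of what importing a VCF file into tabular rows involves, the Python sketch below reads the first five fixed VCF columns (CHROM, POS, ID, REF, ALT) and emits one record per alternate allele. It is not the import process itself. The function name `parse_vcf`, the sample lines, and the length-based SNP/INDEL labeling are assumptions made for illustration; the labeling in particular is a simplification that lumps multi-nucleotide and symbolic alleles under "OTHER".

```python
def parse_vcf(lines):
    """Turn VCF data lines into a list of dicts, one per ALT allele.

    Hypothetical helper for illustration; only the first five fixed
    columns are read, and variant typing is based on allele lengths.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines (##), and the header (#CHROM)
        chrom, pos, variant_id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):
            if len(ref) == 1 and len(allele) == 1:
                kind = "SNP"
            elif len(ref) != len(allele):
                kind = "INDEL"
            else:
                kind = "OTHER"  # simplification: e.g. multi-nucleotide changes
            records.append({
                "chrom": chrom,
                "pos": int(pos),
                "id": variant_id,
                "ref": ref,
                "alt": allele,
                "type": kind,
            })
    return records


# Made-up VCF content: one SNP and one site with two INDEL alleles.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\trs1\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA,ATT\t40\tPASS\t.",
]
records = parse_vcf(vcf_lines)
```

Splitting multi-allelic sites into one record per allele keeps each output row self-describing, at the cost of repeating the chromosome and position.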
See Import for other subcategories.