```{r setup, echo=FALSE}
library(LearnBioconductor)
stopifnot(BiocInstaller::biocVersion() == "3.0")
```

```{r style, echo=FALSE, results='asis'}
BiocStyle::markdown()
knitr::opts_chunk$set(tidy=FALSE)
```

# Common Sequence Analysis Work Flows

Martin Morgan, Sonali Arora<br/>
October 28, 2014

## RNA-seq

See the [lecture notes](b02.1_rnaseq.html) and [lab](b02.1_rnaseqlab.html).

RNA-seq differential expression of known _genes_

- Simplest scenario
- Experimental design: simple, replicated; track covariates and be aware of batch effects.
- Sequencing: moderate length and number of reads; single- or paired-end (paired-end reads are not essential here).
- Alignment: basic splice-aware aligner, e.g., _Bowtie2_, _STAR_. Viable _Bioconductor_ approaches: `r Biocpkg("Rsubread")`, `r Biocpkg("Rbowtie")` (especially via the `r Biocpkg("QuasR")` package).
- Reduction: `GenomicRanges::summarizeOverlaps()` or external tools, using gene models from `TxDb.*` packages or GFF / GTF files. End result: a matrix of counts.
- Analysis: `r Biocpkg("DESeq2")`, `r Biocpkg("edgeR")`, and additional software.

RNA-seq differential expression of known _transcripts_

- Popular non-_R_ work flow: _Bowtie2_, _tophat_, _cufflinks_, _cuffdiff_.
- _Bioconductor_ options
    - `r Biocpkg("DEXSeq")`: differential _exon_ use.
    - `Rsubread::subjunc()` for aligning without requiring known gene models.
    - `r Biocpkg("cummeRbund")`: working with _cufflinks_ output.

Single-cell expression

- `r Biocpkg("monocle")`

## ChIP-seq

See my recent [slides](//www.andersvercelli.com/help/course-materials/2014/CSAMA2014/4_Thursday/lectures/ChIPSeq_slides.pdf) outlining ChIP-seq and relevant _Bioconductor_ software.

- Experimental design / wet lab: it is important to enrich genomic DNA effectively via ChIP; otherwise signal peaks are hard to distinguish from background.
- Sequencing: a moderate length and number of single-end reads is adequate.
- Alignment: basic aligners are sufficient.
- Reduction
    - External software; many tools depending on application, e.g., _MACS_.
    - Product: BED and / or WIG files of called peaks.
- Analysis & comprehension
    - `r Biocpkg("ChIPQC")` for quality control.
    - `r Biocpkg("rtracklayer")` to input BED and WIG files to standard _Bioconductor_ data structures.
    - `r Biocpkg("ChIPpeakAnno")`, `r Biocpkg("ChIPXpress")` for annotating peaks in relation to genes.
    - `r Biocpkg("DiffBind")` to assess differential representation of peaks in a designed experiment.
    - `r Biocpkg("AnnotationHub")` for accessing (some) consortium-level summary data.

## Copy Number

See the [Copy Number Workflow](./B02.2.3_CopyNumber.html) document.

## Variants

See Michael Lawrence's tutorial on variant calling with [VariantTools](//www.andersvercelli.com/help/course-materials/2014/BioC2014/Lawrence_Tutorial.pdf) and Val Obenchain's workflow on manipulation and annotation of called variants with [VariantAnnotation](//www.andersvercelli.com/help/workflows/variants/).

- Sequencing: requires high-quality reads with high per-nucleotide depth of coverage, which favors longer, paired-end sequencing.
- Alignment: requires effective aligners; _BWA_, _GMAP_, ...
    - `r Biocpkg("gmapR")` wraps the GMAP aligner in _R_.
- Reduction: typically to VCF files summarizing variants and / or population-level variation. _GATK_ and other non-_R_ tools are commonly used.
    - `r Biocpkg("VariantTools")` includes facilities for calling variants.
    - `r Biocpkg("h5vc")` targets a different intermediate step: summarize base counts at each position in the genome, then use this as a starting point for calling variants, evaluating false positives, etc.
- Analysis & comprehension
    - `r Biocpkg("VariantAnnotation")`, `r Biocpkg("ensemblVEP")` for querying / inputting VCF files, and for annotation of variants ("is this a coding variant?", etc.).
    - `r Biocpkg("SomaticSignatures")` for working with somatic signatures of single-nucleotide variants.

## Epigenomics

See the short [introduction](//www.andersvercelli.com/help/course-materials/2014/Epigenomics/MethylationArrays.html) and [lab](//www.andersvercelli.com/help/course-materials/2014/Epigenomics/MethylationArrays-lab.html) centered around Illumina 450k methylation arrays and the `r Biocpkg("minfi")` package.

- Analysis & comprehension: `r Biocpkg("bsseq")`, `r Biocpkg("BiSeq")` for processing and analysis; `r Biocpkg("bumphunter")` as a basic tool for identifying CpG features.

## Microbiome

- Experimental design: typically population-level surveys with a moderate number (tens to hundreds) of samples.
- Wet lab & sequencing: often target phylogenetically informative genes, requiring longer (overlapping) paired-end reads.
  Many existing studies used 454 technology, which has a different sequencing error model than Illumina (e.g., homopolymers are a common error, instead of trailing nucleotide quality deterioration).
- Reduction: pre-processing (e.g., knitting together overlapping paired-end reads) and taxonomic classification / placement in third-party software, e.g., _QIIME_, _pplacer_. End result: a count table summarizing the representation of distinct taxa in each sample.
    - `r Biocpkg("rRDP")` provides an _R_ / _Bioconductor_ interface to the RDP classifier.
- Analysis: _R_ / _Bioconductor_ and many insights from microarray / RNA-seq analysis are well suited to count tables, but common pipelines have re- or dis-invented the wheel.
    - `r Biocpkg("phyloseq")` provides very nice tools for general analysis.
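
To make the "count table" end product of the reduction step concrete, here is a minimal sketch in base _R_. It is not taken from any of the packages above: the taxon names, sample names and counts are invented for illustration, and the two summaries shown (relative abundance and Shannon diversity) are only examples of what one might compute first. Dedicated packages such as `r Biocpkg("phyloseq")` are the better choice for real analyses.

```r
# Hypothetical toy count table: rows are taxa, columns are samples.
# All names and numbers here are invented for illustration.
counts <- matrix(
    c(120,  30,   0,
       45,  60,  15,
        5,  10, 185),
    nrow = 3, byrow = TRUE,
    dimnames = list(
        taxon  = c("TaxonA", "TaxonB", "TaxonC"),
        sample = c("S1", "S2", "S3")
    )
)

# Total reads per sample (library size) differs between samples, so
# convert counts to within-sample relative abundances before comparing.
lib_size  <- colSums(counts)
rel_abund <- sweep(counts, 2, lib_size, "/")

# Shannon diversity per sample: H = -sum(p * log(p)),
# dropping zero proportions so that 0 * log(0) is treated as 0.
shannon <- apply(rel_abund, 2, function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
})

print(round(rel_abund, 3))
print(round(shannon, 3))
```

As with RNA-seq count matrices, the differing library sizes are the reason raw counts are not compared directly across samples.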