---
title: "Workshop: W3 - RNASeq"
output:
  BiocStyle::html_document:
    toc: true
vignette: >
  % \VignetteIndexEntry{Workshop: W2 - RNASeq}
  % \VignetteEngine{knitr::rmarkdown}
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width = 100, max.print = 1000)
knitr::opts_chunk$set(
    eval = as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache = as.logical(Sys.getenv("KNITR_CACHE", "TRUE")))
```

```{r setup, echo = FALSE, message = FALSE, warning = FALSE}
suppressPackageStartupMessages({
    library(airway)
    library(DESeq2)
})
```

Author: Martin Morgan (mtmorgan@fredhutch.org)

Date: 7 September 2015

Back to [Workshop Outline](开发人员会议 - 工作坊.html)

The material in this document requires _R_ version 3.2 and _Bioconductor_ version 3.1

```{r configure-test}
stopifnot(
    getRversion() >= '3.2' && getRversion() < '3.3',
    BiocInstaller::biocVersion() >= "3.1"
)
```

# Statistical analysis of differential expression -- `DESeq2`

1. Experimental design
2. Wet-lab preparation
3. High-throughput sequencing
4. Alignment
    - Whole genome, or
    - Transcriptome
5. Summary
    - Count reads overlapping regions of interest: `GenomicAlignments::summarizeOverlaps()`
6. **Statistical analysis**
    - [DESeq2][], [edgeR][]
7. Comprehension

More extensive material

- [edgeR][] and [limma][] vignettes.
- [DESeq2][] vignette.
- [airway][] vignette for the alignment and summary stages.
- [RNA-Seq workflow](http://bioconductor.org/help/workflows/rnaseqGene/) providing a more extended analysis of the airway data set.

# Challenges and solutions

Starting point

- Matrix of _counts_ of reads overlapping each region of interest
- Counts provide statistical information -- larger counts indicate greater confidence in the measurement. Normalized measures (e.g., RPKM) ignore this information and therefore lose statistical power.

Normalization

- Differences in read counts per sample for purely technical reasons
- Simple scaling by total read count inadequate -- induces correlations with low-count reads
- General solution: scale by a more robust measure of size, e.g., log geometric mean, quantile, ...

Error model

- Poisson 'shot' noise of reads sampled from a genome. E.g., longer genes receive more aligned reads compared to shorter genes with identical expression.
- Additional biological variation due to differences between genes and individuals
- Common modeling assumptions: _negative binomial_ variance
- Dispersion parameter

Limited sample size

- A handful of samples in each treatment
- Many 1000's of statistical tests
- Challenge -- limited statistical power
- Solution -- borrow information
    - Estimate variance as weighted average of _per gene_ variance, and _average variance_ of all genes
    - Per-gene variances are estimated precisely, though with some loss of accuracy
    - Example of _moderated_ test statistic

Multiple testing

- Need to adjust for multiple comparisons
- Reducing number of tests enhances statistical power
- Filter genes to exclude from testing using _a priori_ criteria
    - Not biologically interesting
    - Not statistically interesting _under the null_, e.g., insufficient counts across samples

# Work flow

## Data representation

Three types of information

- A `matrix` of counts of reads overlapping regions of interest
- A `data.frame` summarizing samples used in the analysis
- `GenomicRanges` describing the regions of interest

`SummarizedExperiment` coordinates this information

- Coordinated management of three data resources
- Easy integration with other _Bioconductor_ software

![](our_figures/SE_Description.png)

```{r airway}
library("airway")
data(airway)
airway

## main components of SummarizedExperiment
head(assay(airway))
colData(airway)
rowRanges(airway)

## e.g., coordinated subset to include dex 'trt' samples
airway[, airway$dex == "trt"]

## e.g., keep only rows with non-zero counts
airway <- airway[rowSums(assay(airway)) != 0, ]
```

## DESeq2 work flow

1. Add experimental design information to the `SummarizedExperiment`

    ```{r DESeqDataSet}
    library(DESeq2)
    dds <- DESeqDataSet(airway, design = ~ cell + dex)
    ```

2. Perform the essential work flow steps

    ```{r DESeq-workflow}
    dds <- DESeq(dds)
    dds
    ```

3. Extract results

    ```{r DESeq-result}
    res <- results(dds)
    res
    ```

[DESeq2]: //www.andersvercelli.com/packages/DESeq2
[limma]: //www.andersvercelli.com/packages/limma
[edgeR]: //www.andersvercelli.com/packages/edgeR
[airway]: //www.andersvercelli.com/packages/airway
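
The normalization idea discussed earlier (simple total-count scaling versus a more robust measure of size based on the log geometric mean) can be illustrated with a small base-_R_ sketch. The counts below are invented for illustration, and the median-of-ratios calculation is a simplified stand-in for this family of methods, not the exact code of any package; `counts`, `log_geo_mean`, and `size_factor` are names chosen here.

```{r size-factor-sketch}
## Toy counts (made up): 4 genes (rows) x 3 samples (columns).
## s2 is s1 sequenced twice as deeply; s3 equals s1 except for one
## very highly expressed gene (gene4).
counts <- matrix(
    c( 10,  20,   10,
      100, 200,  100,
       50, 100,   50,
       20,  40, 2000),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

## Simple scaling by total read count: s3 appears much 'deeper' than s1,
## but only because of gene4
colSums(counts) / mean(colSums(counts))

## Median-of-ratios: compare each sample to the per-gene geometric mean
## (computed on the log scale) and take the median ratio across genes.
## Genes with a zero count in any sample would need to be excluded first.
log_geo_mean <- rowMeans(log(counts))
size_factor <- apply(counts, 2, function(x) exp(median(log(x) - log_geo_mean)))
size_factor

## Dividing each column by its size factor puts samples on a common scale
sweep(counts, 2, size_factor, "/")
```

In this toy example the median-based size factor for `s2` is twice that of `s1`, while `s1` and `s3` receive the same size factor because the median is not driven by the single extreme gene.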