---
title: "Workshop: W3 - RNASeq"
output:
  BiocStyle::html_document:
    toc: true
vignette: >
  % \VignetteIndexEntry{Workshop: W2 - RNASeq}
  % \VignetteEngine{knitr::rmarkdown}
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width = 100, max.print = 1000)
knitr::opts_chunk$set(
    eval = as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache = as.logical(Sys.getenv("KNITR_CACHE", "TRUE")))
```

```{r setup, echo = FALSE, message = FALSE, warning = FALSE}
suppressPackageStartupMessages({
    library(airway)
    library(DESeq2)
})
```

Author: Martin Morgan (mtmorgan@fredhutch.org)
Date: 7 September 2015
Back to [Workshop Outline](Developer-Meeting-Workshop.html)

The material in this document requires _R_ version 3.2 and _Bioconductor_ version 3.1.

```{r configure-test}
stopifnot(
    getRversion() >= '3.2' && getRversion() < '3.3',
    BiocInstaller::biocVersion() >= "3.1"
)
```

# Statistical analysis of differential expression -- `DESeq2`

1. Experimental design
2. Wet-lab preparation
3. High-throughput sequencing
4. Alignment
     - Whole-genome, or
     - Transcriptome
5. Summary
     - Count reads overlapping regions of interest: `GenomicAlignments::summarizeOverlaps()`
6. **Statistical analysis**
     - [DESeq2][], [edgeR][]
7. Comprehension

More extensive material

- [edgeR][] and [limma][] vignettes.
- [DESeq2][] vignette.
- [airway][] vignette for the alignment and summarization stages.
- [RNA-Seq workflow](http://bioconductor.org/help/workflows/rnaseqGene/), which provides a more extended analysis of the airway data set.

# Challenges and solutions

Starting point

- Matrix of _counts_ of reads overlapping each region of interest
- Counts provide statistical information -- larger counts indicate greater confidence in the observation. Standardized measures (e.g., RPKM) ignore this information and therefore lose statistical power.

Normalization

- Read counts differ between samples for purely technical reasons
- Simple scaling by total read count is inadequate -- it induces correlations with low-count reads
- General solution: scale by a more robust measure of size, e.g., log geometric mean, quantile, ...

Error model

- Poisson 'shot' noise of reads sampled from a genome. E.g., longer genes receive more aligned reads than shorter genes with identical expression.
- Additional biological variation due to differences between genes and individuals
- Common modeling assumption: _negative binomial_ variance
    - Dispersion parameter

Limited sample size

- A handful of samples in each treatment
- Many 1000's of statistical tests
- Challenge -- limited statistical power
- Solution -- borrow information
    - Estimate variance as a weighted average of the _per gene_ variance and the _average variance_ of all genes
    - The resulting per-gene variances are estimated more precisely, though with some loss of accuracy
    - Example of a _moderated_ test statistic

Multiple testing

- Need to adjust for multiple comparisons
- Reducing the number of tests enhances statistical power
- Filter genes to exclude from testing using _a priori_ criteria
    - Not biologically interesting
    - Not statistically interesting _under the null_, e.g., insufficient counts across samples

# Work flow

## Data representation

Three types of information

- A `matrix` of counts of reads overlapping regions of interest
- A `data.frame` summarizing samples used in the analysis
- `GenomicRanges` describing the regions of interest

`SummarizedExperiment` coordinates this information

- Coordinated management of three data resources
- Easy integration with other _Bioconductor_ software

![](our_figures/SE_Description.png)

```{r airway}
library("airway")
data(airway)
airway

## main components of SummarizedExperiment
head(assay(airway))
colData(airway)
rowRanges(airway)

## e.g., coordinated subset to include dex 'trt' samples
airway[, airway$dex == "trt"]

## e.g., keep only rows with non-zero counts
airway <- airway[rowSums(assay(airway)) != 0, ]
```

## DESeq2 work flow

1. Add experimental design information to the `SummarizedExperiment`

    ```{r DESeqDataSet}
    library(DESeq2)
    dds <- DESeqDataSet(airway, design = ~ cell + dex)
    ```

2. Perform the essential work flow steps

    ```{r DESeq-workflow}
    dds <- DESeq(dds)
    dds
    ```

3. Extract results

    ```{r DESeq-result}
    res <- results(dds)
    res
    ```

[DESeq2]: //www.andersvercelli.com/packages/DESeq2
[limma]: //www.andersvercelli.com/packages/limma
[edgeR]: //www.andersvercelli.com/packages/edgeR
[airway]: //www.andersvercelli.com/packages/airway
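
The normalization point made earlier (scale each sample by a robust measure of size based on the log geometric mean, rather than by total read count) can be illustrated with a small base-_R_ sketch. This is only an illustration of the median-of-ratios idea under stated assumptions, not the code that `DESeq()` runs: the toy `counts` matrix and the helper `medianOfRatios()` are hypothetical names introduced here.

```{r size-factor-sketch}
## Toy count matrix (hypothetical data): 4 genes x 3 samples.
## Sample s2 was "sequenced" at twice the depth of s1, and s3 at half.
counts <- matrix(
    c( 10,  20,  5,
      100, 200, 50,
       40,  80, 20,
        8,  16,  4),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

## Hypothetical helper: median-of-ratios size factors.
medianOfRatios <- function(counts) {
    logCounts <- log(counts)
    ## log geometric mean of each gene across samples (the reference sample)
    logGeoMeans <- rowMeans(logCounts)
    ## genes with a zero count in any sample have a -Inf reference; skip them
    keep <- is.finite(logGeoMeans)
    ## per sample: median log-ratio to the reference, back on the count scale
    apply(logCounts[keep, , drop = FALSE], 2, function(lc) {
        exp(median(lc - logGeoMeans[keep]))
    })
}

sf <- medianOfRatios(counts)
sf  ## 1, 2 and 0.5, up to floating point error

## Dividing each column by its size factor removes the depth difference
normalized <- sweep(counts, 2, sf, "/")
normalized
```

Because the median is taken over genes, a few very highly expressed genes have little influence on the size factor, which is the sense in which this measure is more robust than the total read count. In the work flow above, size factors are estimated as part of `DESeq()` and can be inspected afterwards with `sizeFactors(dds)`.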