```{r setup, echo=FALSE}
library(UseBioconductor)
stopifnot(BiocInstaller::biocVersion() == "3.1")
```

```{r style, echo=FALSE, results='asis'}
BiocStyle::markdown()
knitr::opts_chunk$set(tidy=FALSE)
```

# Introduction to _Bioconductor_

Martin Morgan, Hervé Pagès
February 4, 2015

## Background: _R_

- Vectors: `logical()`, `integer()`, `numeric()`, `character()`, ...
- `matrix()`, `array()`
- `list()`, `data.frame()`, ...
- Statistical concepts: `NA`, `factor()`, `~` formula, ...

S3 classes

- Informal class system; a `list()` with a `class()` attribute; linear class hierarchy, single dispatch.
- Generic `foo` (body: `UseMethod()`) and methods `foo.A`
- Help: `?foo`, `?foo.A`
- Discovery: `methods()`, `methods(class=<...>)`
- Example

    ```{r S3}
    x <- rnorm(1000)
    y <- x + rnorm(1000, .5)
    df <- data.frame(x=x, y=y)
    fit <- lm(y ~ x, df)
    class(fit)
    methods(class=class(fit))
    methods(anova)
    plot(y ~ x, df)
    abline(fit, col="red", lwd=2)
    ```

S4 classes

- Formal classes via `setClass()`, multiple inheritance, multiple dispatch
- Generic `foo` and associated methods (`showMethods("foo")`)
- Help: `?foo`, `method?"foo,A"`, `class?A`
- Discovery: `showMethods("foo")`, `showMethods(classes="A", where=search())`
- Example

    ```{r S4}
    suppressPackageStartupMessages({
        library(IRanges)
    })
    start <- as.integer(runif(1000, 1, 1e4))
    width <- as.integer(runif(length(start), 50, 100))
    ir <- IRanges(start, width=width)
    coverage(ir)
    findOverlaps(ir)
    showMethods("coverage")
    ```

    ```{r showMethods, eval=FALSE}
    showMethods(classes=class(ir), where=search())
    ```

Notes

- Package authors are at liberty to document classes and methods as they see fit, e.g., all methods on the same page as their class
- Methods are defined independently of class, so available methods can depend on loaded packages, e.g., compare to previous

    ```{r S4-methods}
    suppressPackageStartupMessages({
        library(GenomicRanges)
    })
    showMethods("coverage")
    ```

## Principles

1. Statistical
    - Volume, technology, experimental design
2. Extensive
    - Software, annotation
    - Core and community contributions
    - Leading edge
3. Interoperable
    - Common data structures, e.g., `GRanges`
4. Reproducible
    - Integrated data containers, e.g., `SummarizedExperiment`
    - Vignettes & "old school" scripts
5. Accessible -- affordable, transparent, usable
    - `example(findOverlaps)`
    - `browseVignettes("IRanges")`

## Infrastructure

Sequences

- `DNAString` / `DNAStringSet`

    ```{r}
    suppressPackageStartupMessages({
        library(Biostrings)
    })
    data(phiX174Phage)
    m <- consensusMatrix(phiX174Phage)[1:4,]
    polymorphic <- colSums(m > 0) > 1
    endoapply(phiX174Phage, `[`, polymorphic)
    ```

Genomic Ranges

- `GRanges`

    ![GRanges](our_figures/GRanges.png)

- `GRangesList`

    ![GRangesList](our_figures/GRangesList.png)

Integrating sample, range and assay data

- `SummarizedExperiment`

    ![SummarizedExperiment](our_figures/SummarizedExperiment.png)

## Key packages

`r Biocpkg("Biostrings")` -- Sequences

- `r Biocpkg("BSgenome")` -- Whole-genome
- `r Biocpkg("ShortRead")` -- Short read / fastq

`r Biocpkg("GenomicRanges")` -- Ranges

- Builds on `r Biocpkg("IRanges")`; currently includes `SummarizedExperiment`
- `r Biocpkg("GenomicAlignments")` -- aligned reads
- `r Biocpkg("GenomicFeatures")` -- feature-based annotation

`r Biocpkg("BiocParallel")` -- Parallel processing

- `r Biocpkg("GenomicFiles")` -- Collections of 'genomic' (e.g., BAM, BED, WIG, ...) files

## Work flows

![SequencingEcosystem](our_figures/SequencingEcosystem.png)

[biocViews](//www.andersvercelli.com/packages/release/BiocViews.html#___Software) for discovery.

RNA-seq

- Genes -- `r Biocpkg("edgeR")`, `r Biocpkg("DESeq2")`
- Transcripts -- `r Biocpkg("DEXSeq")`, `r Biocpkg("BitSeq")`, `r Biocpkg("SGSeq")`

ChIP-seq

- QC -- `r Biocpkg("ChIPQC")`
- Differential binding -- `r Biocpkg("DiffBind")`, `r Biocpkg("csaw")`
- Annotation -- `r Biocpkg("ChIPseeker")`

Variants

- Calling -- `r Biocpkg("VariantTools")`, `r Biocpkg("h5vc")`, `r Biocpkg("Rariant")`
- Manipulation and annotation -- `r Biocpkg("VariantAnnotation")`, `r Biocpkg("ensemblVEP")`, `r Biocpkg("VariantFiltering")`

Copy number

- 45 packages tagged with "CopyNumberVariation" in [biocViews](//www.andersvercelli.com/packages/devel/BiocViews.html#___CopyNumberVariation); also terms "DNASeq", "ExomeSeq", "WholeGenome"
- Represent duplicated regions as genomic ranges; integrates very easily in _Bioconductor_ annotation work flows.

Methylation

- Bump hunting -- `r Biocpkg("minfi")`
- Visualization -- `r Biocpkg("epivizR")` (much more than epigenomics!)

Expression and other arrays

- Pre-processing -- `r Biocpkg("oligo")`
- Differential representation -- `r Biocpkg("limma")`
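
The copy number item above suggests representing duplicated regions as genomic ranges so that they can be combined with annotation. The following is a minimal base-_R_ sketch of that idea, not the implementation used by any package listed here. The data frames `dups` and `genes`, their coordinates, and all identifiers are hypothetical, and intervals are assumed to be closed and on a single chromosome.

```r
# Hypothetical duplicated regions and gene annotations on one chromosome,
# stored as closed integer intervals [start, end]. All values are made up.
dups <- data.frame(
    id    = c("dup1", "dup2", "dup3"),
    start = c(100L, 500L, 900L),
    end   = c(250L, 650L, 950L),
    stringsAsFactors = FALSE
)
genes <- data.frame(
    id    = c("geneA", "geneB", "geneC"),
    start = c(200L, 300L, 600L),
    end   = c(280L, 400L, 700L),
    stringsAsFactors = FALSE
)

# Two closed intervals overlap when each starts no later than the other ends.
# outer() evaluates this condition for every (duplication, gene) pair.
hits <- outer(dups$start, genes$end, "<=") &
        outer(dups$end, genes$start, ">=")
dimnames(hits) <- list(dups$id, genes$id)

# Report the overlapping pairs as a two-column table.
idx <- which(hits, arr.ind = TRUE)
overlaps <- data.frame(
    duplication = dups$id[idx[, "row"]],
    gene        = genes$id[idx[, "col"]],
    stringsAsFactors = FALSE
)
overlaps
```

The all-pairs comparison is quadratic in the number of ranges, so it is only suitable for small illustrations. With regions and annotations held in `GRanges` objects, `findOverlaps()` plays the role of the `outer()` step and scales to genome-sized data.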