%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{3. Methylation Arrays - Lab}

# Methylation Arrays - Lab

Epigenomics 2014
Author: Martin Morgan (mtmorgan@fhcrc.org)

```{r, echo=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```

```{r, echo=FALSE}
suppressPackageStartupMessages({
    require(minfi)
    require(minfiData)
})
```

This case study is a quick tour of the [minfi](//www.andersvercelli.com/packages/release/bioc/html/minfi.html) Bioconductor package. The main goal is to become familiar with the use of Bioconductor objects and methods; for more background on the analysis of Illumina methylation arrays, see the [vignette](//www.andersvercelli.com/packages/release/bioc/vignettes/epivizr/inst/doc/minfi.pdf) that accompanies the minfi package.

Start by attaching the `minfi` and `minfiData` packages. Use `browseVignettes("minfi")` to access vignettes with additional background information.

```{r, eval=FALSE}
library(minfi)
library(minfiData)
```

The first step in any workflow is to read in the data. A sample data set is available at the following location.

```{r}
baseDir <- system.file("extdata", package = "minfiData")
baseDir
dir(baseDir)
dir(file.path(baseDir, "5723646052"))
```

Of course your own data would be at another location, and you might enter the path (with 'tab completion') instead of using `system.file()`. A typical organization is that each 'slide' (containing 12 arrays) is stored in a separate directory. The top-level directory contains a `.csv` file describing the samples; inside each slide directory are IDAT files representing the output of the Illumina scanner.

Next read in the sample sheet, and then the raw probe-level data. Take a moment to use R to explore the sample sheet. Read the 'man' pages for `read.450k.sheet` and `read.450k.exp` to see what options are available.

```{r}
## 'pData'
targets <- read.450k.sheet(baseDir)
head(targets)
## 'raw' probe-level data
RGset <- read.450k.exp(base = baseDir, targets = targets)
```

As a basic quality assessment, visualize the distribution of beta values across each array, coloring the density functions by sample group. Are there any concerns about the data?

```{r}
## Basic QA -- comparable densities across samples?
densityPlot(RGset, sampGroups = RGset$Sample_Group,
            main = "Beta", xlab = "Beta")
```

A _technical artifact_ is that probe intensities differ depending on their sequence composition, so it is necessary to perform a 'background correction'.
Also, the distribution of probe intensities differs between samples as a consequence of sample preparation steps, e.g., slightly different initial amounts of DNA from one sample compared to another. Basic steps in microarray analysis (many of these steps are shared by other high-throughput assays) are therefore _background correction_ and between-array _normalization_. Use the `preprocessIllumina()` command to perform these steps. Are there other normalization strategies available in this package?

```{r}
## background correction and normalization
## like Illumina Genome Studio (other approaches available)
MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE,
                                normalize = "controls", reference = 2)
```

Once data are background-corrected and normalized, it is possible to compare the vector of methylation values of each sample. Use the `mdsPlot()` function to visualize the multidimensional relationship between samples in reduced dimensions. Use arguments of `mdsPlot()` to name and highlight the different sample groups.

```{r}
## How similar are the samples to one another?
mdsPlot(MSet.norm, numPositions = 1000,
        sampGroups = MSet.norm$Sample_Group,
        sampNames = MSet.norm$Sample_Name)
```

Take a portion of the data (the first 100,000 probes), retrieve the logit-transformed beta values, and then use `dmpFinder()` to identify CpGs where methylation status is associated with sample group. From the help page, references, and your own knowledge, any ideas about `shrinkVar`?

```{r}
## Identify probes with methylation status differing between groups
mset <- MSet.norm[1:100000,]
## logit(beta)
M <- getM(mset, type = "beta", betaThreshold = 0.001)
dmp <- dmpFinder(M, pheno = mset$Sample_Group, type = "categorical")
head(dmp)
```

Visualize differential methylation using `plotCpg()`.

```{r}
plotCpg(mset, cpg = rownames(dmp)[1], pheno = mset$Sample_Group)
```

Probes interrogate genomic locations. Use `mapToGenome()` to translate the probe identifiers to genomic coordinates.
This transforms `mset` into an object of class `SummarizedExperiment`. A `SummarizedExperiment` is similar to an expression set, but with `GRanges` to annotate rows, rather than a `data.frame`. Use `rowData()` to extract the `GRanges` from the `SummarizedExperiment`; explore it. Add the annotations about differentially methylated probes to the row data of the `SummarizedExperiment`.

```{r}
## Genomic locations
mset <- mset[rownames(dmp),]
mse <- mapToGenome(mset)                # 'SummarizedExperiment'
rowData(mse)
## Attach the dmpFinder() results as metadata columns on the row ranges
mcols(rowData(mse)) <- cbind(mcols(rowData(mse)), dmp)
```
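
The logit-transformed beta values retrieved earlier with `getM(mset, type = "beta", betaThreshold = 0.001)` can be illustrated in base R. The following is a minimal sketch, assuming that the M value is the base-2 logit of a beta value clamped to the interval `[betaThreshold, 1 - betaThreshold]`. The intensities are made up, and the names `meth`, `unmeth`, `betaClamped`, and `logit2` are chosen for illustration; this is not presented as minfi's own implementation.

```r
## Hypothetical methylated / unmethylated intensities for four probes
meth   <- c(9000, 50, 4000, 0)
unmeth <- c(1000, 9950, 4000, 8000)

## Beta value: fraction of total signal from the methylated channel
beta <- meth / (meth + unmeth)

## Clamp beta away from 0 and 1 so that the logit stays finite
## (assumption: this is the role of 'betaThreshold')
betaThreshold <- 0.001
betaClamped <- pmin(pmax(beta, betaThreshold), 1 - betaThreshold)

## M value: base-2 logit of the clamped beta value
logit2 <- function(x) log2(x) - log2(1 - x)
M <- logit2(betaClamped)

## Compare the two scales side by side
data.frame(meth, unmeth, beta, M)
```

Without the clamp, the fourth probe (beta of exactly 0) would give an M value of `-Inf`, which is why a small threshold is useful before fitting models on the M-value scale.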