---
title: "BioC2017 MultiAssayExperiment Lab"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Coordinated Analysis of Multi-Assay Experiments}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    number_sections: no
    toc: yes
    toc_depth: 4
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(cache = TRUE)
```

The examples below use `miniACC`, the adrenocortical carcinoma (ACC) dataset provided with the package. This dataset provides five assays on 92 patients, although not every assay was performed for every patient:

1. **RNASeq2GeneNorm**: gene mRNA abundance by RNA-seq
2. **gistict**: GISTIC genomic copy number by gene
3. **RPPAArray**: protein abundance by Reverse Phase Protein Array
4. **Mutations**: non-silent somatic mutations by gene
5. **miRNASeqGene**: microRNA abundance by microRNA-seq

```{r}
suppressPackageStartupMessages({
    library(MultiAssayExperiment)
    library(S4Vectors)
})
data(miniACC)
miniACC
```

# Component slots

## colData - information on biological units

This slot is a `DataFrame` describing the characteristics of biological units, for example clinical data for patients. In the prepared datasets from [The Cancer Genome Atlas][], each row is one patient and each column is a clinical, pathological, subtype, or other variable. The `$` function provides a shortcut for accessing or setting `colData` columns.

```{r}
colData(miniACC)[1:4, 1:4]
table(miniACC$race)
```

*Key points:*

* One row per patient
* Each row maps to zero or more observations in each experiment in the `ExperimentList`, below.

## ExperimentList - experiment data

A base `list` or `ExperimentList` object containing the experimental datasets for the set of samples collected. This gets converted into a class `ExperimentList` during construction.

```{r}
experiments(miniACC)
```

*Key points:*

* One matrix-like dataset per list element (although they do not even need to be matrix-like, see for example the `RaggedExperiment` package)
* One matrix column per assayed specimen. Each matrix column must correspond to exactly one row of `colData`: in other words, you must know which patient or cell line the observation came from. However, multiple columns can come from the same patient, or there can be no data for that patient.
* Matrix rows correspond to variables, e.g. genes or genomic ranges
* `ExperimentList` elements can be genomic range-based (e.g. `SummarizedExperiment::RangedSummarizedExperiment-class` or `RaggedExperiment::RaggedExperiment-class`) or ID-based data (e.g. `SummarizedExperiment::SummarizedExperiment-class`, `Biobase::eSet-class`, `base::matrix-class`, `DelayedArray::DelayedArray-class`, and derived classes)
* Any data class can be included in the `ExperimentList`, as long as it supports single-bracket subsetting (`[`), `dimnames`, and `dim`.
Most data classes defined in Bioconductor meet these requirements.

## sampleMap - relationship graph

`sampleMap` is a graph representation of the relationship between biological units and experimental results. In simple cases where the column names of `ExperimentList` data matrices match the row names of `colData`, the user won't need to specify or think about a sample map, because it can be created automatically by the `MultiAssayExperiment` constructor. `sampleMap` is a simple three-column `DataFrame`:

1. `assay` column: the name of the assay, found in the names of the `ExperimentList`
2. `primary` column: identifiers of patients or biological units, found in the row names of `colData`
3. `colname` column: identifiers of assay results, found in the column names of `ExperimentList` elements

Helper functions are available for creating a map from a list. See `?listToMap`.

```{r}
sampleMap(miniACC)
```

*Key points:*

* relates experimental observations (`colnames`) to `colData`
* permits experiment-specific sample naming, missing observations, and replicate observations
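To make the three-column layout concrete, here is a minimal base-R sketch of a sample map. It uses a plain `data.frame` rather than a `DataFrame`, and all assay names and identifiers (`rna`, `cn`, `P1`, ...) are made up for illustration; they are not taken from `miniACC`.

```r
# Toy sample map (hypothetical identifiers, not miniACC)
smap <- data.frame(
  assay   = c("rna", "rna", "rna", "cn", "cn"),
  primary = c("P1", "P1", "P2", "P1", "P3"),
  colname = c("P1-rna-a", "P1-rna-b", "P2-rna", "P1-cn", "P3-cn"),
  stringsAsFactors = FALSE
)

# Observations per patient and assay: 0 = missing, 2 or more = replicates
n_obs <- table(smap$primary, smap$assay)
n_obs

# Assay columns that belong to patient P1, grouped by assay
is_p1 <- smap$primary == "P1"
p1_cols <- split(smap$colname[is_p1], smap$assay[is_p1])
p1_cols
```

In this toy map, patient `P1` has replicate RNA columns, `P2` has no copy number data, and `P3` has no RNA data, which are the three situations the key points above describe.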


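## metadata

Metadata can be used to keep additional information about patients, assays performed on individuals or on the entire cohort, or features such as genes, proteins, and genomic ranges. There are many options available for storing metadata. First, `MultiAssayExperiment` has its own metadata for describing the entire experiment:

```{r}
metadata(miniACC)
```

Additionally, the `DataFrame` class used by `sampleMap` and `colData`, as well as the `ExperimentList` class, similarly support metadata. Finally, many experimental data objects that can be used in the `ExperimentList` support metadata. These provide flexible options to users and to developers of derived classes.

# Subsetting

## Single bracket `[`

In the pseudocode below, the subsetting operations act on the following indices:

1. _i_ experimental data rows
2. _j_ the primary names or the column names (entered as a `list` or `List`)
3. _k_ assay

```
multiassayexperiment[i = rownames, j = primary or colnames, k = assay]
```

Subsetting operations always return another `MultiAssayExperiment`. For example, the following will return any rows named "MAPK14" or "IGFBP2", and remove any assays where no rows match:

```{r, results='hide'}
miniACC[c("MAPK14", "IGFBP2"), , ]
```

The following will keep only patients of pathological stage iv, and all their associated assays:

```{r, results='hide'}
miniACC[, miniACC$pathologic_stage == "stage iv", ]
```

And the following will keep only the RNA-seq dataset, and only patients for which this assay is available:

```{r, results='hide'}
miniACC[, , "RNASeq2GeneNorm"]
```

### Subsetting by genomic ranges

If any `ExperimentList` objects have features represented by genomic ranges (e.g. `RangedSummarizedExperiment`, `RaggedExperiment`), then a `GRanges` object in the first subsetting position will subset these objects as in `GenomicRanges::findOverlaps()`.

## Double bracket `[[`

The "double bracket" method (`[[`) is a convenience function for extracting a single element of the `MultiAssayExperiment` `ExperimentList`. It avoids the use of `experiments(mae)[[1L]]`. For example, both of the following extract the `ExpressionSet` object containing RNA-seq data:

```{r}
miniACC[[1L]]  #or equivalently, miniACC[["RNASeq2GeneNorm"]]
```

## Patients with complete data

`complete.cases()` shows which patients have complete data for all assays:

```{r}
summary(complete.cases(miniACC))
```

The above logical vector could be used for patient subsetting.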
More simply, `intersectColumns()` will select complete cases and rearrange each `ExperimentList` element so its columns correspond exactly to rows of `colData` in the same order:

```{r}
accmatched = intersectColumns(miniACC)
```

Note that the column names of the assays in `accmatched` are not the same because of assay-specific identifiers, but they have been automatically rearranged to correspond to the same patients. In these TCGA assays, the first three `-` delimited positions correspond to the patient, i.e. the first patient is *TCGA-OR-A5J2*:

```{r}
colnames(accmatched)
```

## Row names that are common across assays

`intersectRows()` keeps only rows that are common to each assay, and aligns them in identical order. For example, to keep only genes where data are available for RNA-seq, GISTIC copy number, and somatic mutations:

```{r}
accmatched2 <- intersectRows(miniACC[, , c("RNASeq2GeneNorm", "gistict", "Mutations")])
rownames(accmatched2)
```
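The two ideas in this part (patient identifiers embedded in assay column names, and row names shared across assays) can be sketched in base R. This is only an illustration of the logic, not how `intersectColumns()` or `intersectRows()` are implemented; the barcodes beyond the *TCGA-OR-A5J2* prefix and the per-assay gene lists are hypothetical.

```r
# Hypothetical TCGA-style column names; the first three "-" delimited
# fields identify the patient
barcodes <- c("TCGA-OR-A5J2-01A-11R", "TCGA-OR-A5J2-01A-11D", "TCGA-OR-A5J3-01A-11R")
fields <- strsplit(barcodes, "-", fixed = TRUE)
patient <- vapply(fields, function(f) paste(f[1:3], collapse = "-"), character(1))
patient

# Hypothetical row names of three assays; keep only the shared ones
assay_rows <- list(
  rna = c("TP53", "EZH2", "MAPK14"),
  cn  = c("EZH2", "TP53", "IGFBP2"),
  mut = c("TP53", "CTNNB1", "EZH2")
)
common <- Reduce(intersect, assay_rows)
common
```

`Reduce(intersect, ...)` keeps the order of the first element, which mirrors the idea of aligning rows in one identical order across assays.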


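# Extraction

## assay and assays

The `assay` and `assays` methods follow the `SummarizedExperiment` convention. The `assay` (singular) method will extract the first element of the `ExperimentList` and will return a `matrix`.

```{r}
class(assay(miniACC))
```

The `assays` (plural) method will return a `SimpleList` of the data, with each element being a `matrix`.

```{r}
assays(miniACC)
```

*Key point:*

* Whereas `[[` returns an assay as its original class, `assay()` and `assays()` convert the assay data to matrix form.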


# Summary of slots and accessors

Slots in the `MultiAssayExperiment` can be accessed or set using their accessor functions:

| Slot | Accessor |
|------|----------|
| `ExperimentList` | `experiments()` |
| `colData` | `colData()` and `$` * |
| `sampleMap` | `sampleMap()` |
| `metadata` | `metadata()` |

__*__ The `$` operator on a `MultiAssayExperiment` returns a single column of the `colData`.

# Transformation / reshaping

The `longFormat` or `wideFormat` functions will "reshape" and combine experiments with each other and with `colData` into one `DataFrame`. These functions provide compatibility with most of the common R/Bioconductor functions for regression, machine learning, and visualization.

## `longFormat`

In _long_ format a single column provides all assay results, with additional optional `colData` columns whose values are repeated as necessary. Here *assay* is the name of the ExperimentList element, *primary* is the patient identifier (rowname of colData), *rowname* is the assay rowname (in this case genes), *colname* is the assay-specific identifier (column name), *value* is the numeric measurement (gene expression, copy number, presence of a non-silent mutation, etc.), and following these are the *vital_status* and *days_to_death* colData columns that have been added:

```{r}
longFormat(miniACC[c("TP53", "CTNNB1"), , ],
           colDataCols = c("vital_status", "days_to_death"))
```

## `wideFormat`

In _wide_ format, each feature from each assay goes in a separate column, with one row per primary identifier (patient). Here, each variable becomes a new column:

```{r}
wideFormat(miniACC[c("TP53", "CTNNB1"), , ],
           colDataCols = c("vital_status", "days_to_death"))
```

# MultiAssayExperiment class construction and concatenation

## MultiAssayExperiment constructor function

The `MultiAssayExperiment` constructor function can take three main arguments:

1. `experiments` - An `ExperimentList` or `list` of data
2. `colData` - A `DataFrame` describing the patients (or cell lines, or other biological units)
3. `sampleMap` - A `DataFrame` of `assay`, `primary`, and `colname` identifiers

The miniACC object can be reconstructed as follows (an optional `metadata` argument is passed here as well):

```{r}
MultiAssayExperiment(experiments=experiments(miniACC),
                     colData=colData(miniACC),
                     sampleMap=sampleMap(miniACC),
                     metadata=metadata(miniACC))
```

## `prepMultiAssay` - Constructor function helper

The `prepMultiAssay` function allows the user to diagnose typical problems when creating a `MultiAssayExperiment` object. See `?prepMultiAssay` for more details.

## `c` - concatenate to MultiAssayExperiment

The `c` function allows the user to concatenate an additional experiment to an existing `MultiAssayExperiment`. The optional `sampleMap` argument allows concatenating an assay whose column names do not match the row names of `colData`. For convenience, the _mapFrom_ argument allows the user to map from a particular experiment **provided** that the **order** of the colnames is the **same**. A `warning` will be issued to make the user aware of this assumption. For example, to concatenate a matrix of log2-transformed RNA-seq results:

```{r}
miniACC2 <- c(miniACC, log2rnaseq = log2(assays(miniACC)$RNASeq2GeneNorm), mapFrom=1L)
experiments(miniACC2)
```
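The ordering assumption behind `mapFrom` can be illustrated with a small base-R sketch. This is not the package's implementation; it only shows why copying the map rows of an existing assay is valid when the new assay's columns come in the same order. All identifiers (`P1`, `P1-01`, ...) are hypothetical.

```r
# Map rows of an existing assay (hypothetical identifiers)
old_map <- data.frame(
  assay   = "RNASeq2GeneNorm",
  primary = c("P1", "P2", "P3"),
  colname = c("P1-01", "P2-01", "P3-01"),
  stringsAsFactors = FALSE
)

# Column names of a matrix derived from that assay, e.g. by log2()
new_cols <- c("P1-01", "P2-01", "P3-01")

# Reusing the existing rows is only safe if the column order agrees
stopifnot(identical(new_cols, old_map$colname))

new_map <- old_map
new_map$assay <- "log2rnaseq"
full_map <- rbind(old_map, new_map)
full_map
```

If the derived matrix had its columns reordered, the check would fail, which is the situation the warning issued by `c()` is meant to flag.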


## How many samples have data for each combination of assays?

**Solution**

The built-in `upsetSamples` creates an "upset" Venn diagram to answer this question:

```{r}
upsetSamples(miniACC)
```

In this dataset only 43 samples have all 5 assays, 32 are missing reverse-phase protein (RPPAArray), 2 are missing Mutations, 1 is missing gistict, 12 have only Mutations and gistict, etc.

## Kaplan-Meier plot stratified by pathology_N_stage

Create a Kaplan-Meier plot, using pathology_N_stage as a stratifying variable.

**Solution**

The colData provides the clinical data needed for analyses such as a Kaplan-Meier plot of overall survival stratified by nodal stage.

```{r}
suppressPackageStartupMessages({
    library(survival)
    library(survminer)
})
Surv(miniACC$days_to_death, miniACC$vital_status)
```

And remove any patients missing overall survival information:

```{r}
miniACCsurv <- miniACC[, complete.cases(miniACC$days_to_death, miniACC$vital_status), ]
```

```{r}
fit <- survfit(Surv(days_to_death, vital_status) ~ pathology_N_stage, data = colData(miniACCsurv))
ggsurvplot(fit, data = colData(miniACCsurv), risk.table = TRUE)
```

## Multivariate Cox regression including RNA-seq, copy number, and pathology

Choose the *EZH2* gene as an example.

```{r}
wideacc = wideFormat(miniACC["EZH2", , ],
    colDataCols=c("vital_status", "days_to_death", "pathology_N_stage"))
wideacc$y = Surv(wideacc$days_to_death, wideacc$vital_status)
head(wideacc)
```

```{r}
coxph(Surv(days_to_death, vital_status) ~ gistict_EZH2 + log2(RNASeq2GeneNorm_EZH2) + pathology_N_stage,
      data=wideacc)
```

We see that *EZH2* expression is significantly associated with overall survival (p < 0.001), but *EZH2* copy number and nodal status are not. This analysis could easily be extended to the whole genome for discovery of prognostic features by repeated univariate regressions over columns, penalized multivariate regression, etc.

For further detail, see the main MultiAssayExperiment vignette.
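As a reminder of what the Kaplan-Meier curves in this section estimate, here is a hand computation of the product-limit estimator in base R. It is a sketch on made-up follow-up times (not `miniACC` data) and is no substitute for `survfit()`: at each event time the running survival is multiplied by one minus the fraction of at-risk patients who died.

```r
# Toy follow-up data (hypothetical): time in days, status 1 = death, 0 = censored
time   <- c(2, 3, 3, 5, 8)
status <- c(1, 1, 0, 1, 0)

# Distinct times at which at least one death occurred
event_times <- sort(unique(time[status == 1]))

# Patients still under observation just before each event time
at_risk <- vapply(event_times, function(t) sum(time >= t), numeric(1))
# Deaths observed at each event time
deaths <- vapply(event_times, function(t) sum(time == t & status == 1), numeric(1))

# Product-limit (Kaplan-Meier) estimate of survival after each event time
surv <- cumprod(1 - deaths / at_risk)
data.frame(time = event_times, at_risk = at_risk, deaths = deaths, surv = surv)
```

Censored patients (status 0) never count as deaths but do leave the risk set, which is why the last step drops from 0.6 to 0.3 with only two patients still at risk.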


For all genes where there is both recurrent copy number (gistict assay) and RNA-seq, calculate the correlation between log2(RNAseq + 1) and copy number. Create a histogram of these correlations. Compare this with the histogram of correlations between all *unmatched* gene - copy number pairs.

**Solution**

First, narrow down `miniACC` to only the assays needed:

```{r}
subacc <- miniACC[, , c("RNASeq2GeneNorm", "gistict")]
```

Align the rows and columns, keeping only samples with both assays available:

```{r}
subacc <- intersectColumns(subacc)
subacc <- intersectRows(subacc)
```

Create a list of numeric matrices:

```{r}
subacc.list <- assays(subacc)
```

Log-transform the RNA-seq assay:

```{r}
subacc.list[[1]] <- log2(subacc.list[[1]] + 1)
```

Transpose both, so genes are in columns:

```{r}
subacc.list <- lapply(subacc.list, t)
```

Calculate the correlation between columns in the first matrix and columns in the second matrix:

```{r}
corres <- cor(subacc.list[[1]], subacc.list[[2]])
```

And finally, create the histograms:

```{r}
hist(diag(corres))
hist(corres[upper.tri(corres)])
```

For the gene with the highest correlation to copy number, make a box plot of log2 expression against copy number.

**Solution**

First, identify the gene with the highest correlation between expression and copy number:

```{r}
which.max(diag(corres))
```

You could now make the plot by taking the EIF4E columns from each element of the *list* `subacc.list` that was extracted from the *MultiAssayExperiment* `subacc`, but let's do it by subsetting and extracting from the *MultiAssayExperiment*:

```{r}
df <- wideFormat(subacc["EIF4E", , ])
head(df)
```

```{r}
boxplot(RNASeq2GeneNorm_EIF4E ~ gistict_EIF4E,
        data=df, varwidth=TRUE,
        xlab="GISTIC Relative Copy Number Call",
        ylab="RNA-seq counts")
```
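The matched versus unmatched comparison relies on how `cor()` behaves with two matrices: entry `[i, j]` is the correlation between column `i` of the first and column `j` of the second, so the diagonal holds the matched pairs. A small simulated sketch (random numbers and hypothetical gene names `g1` to `g3`, not `miniACC` data):

```r
set.seed(1)
n <- 20  # simulated samples

# Simulated copy number, and expression that tracks it with a little noise
cn <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("g1", "g2", "g3")))
expr <- cn + rnorm(n * 3, sd = 0.1)

# Cross-correlation matrix: rows are expression genes, columns are copy number genes
cc <- cor(expr, cn)

matched   <- diag(cc)                  # same gene in both assays
unmatched <- cc[row(cc) != col(cc)]    # every off-diagonal pair

round(matched, 2)
round(unmatched, 2)
```

Because this cross-correlation matrix is not symmetric, `row(cc) != col(cc)` picks up all off-diagonal pairs, whereas `upper.tri()` keeps only the half above the diagonal; either is adequate for a histogram of background correlations.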


## Identifying correlated principal components

Perform principal components analysis of each of the five assays, using the samples available for each assay and log-transforming the RNA-seq data first. Using the first 10 components, calculate the Pearson correlation between all scores and plot these correlations as a heatmap to identify correlated components across assays.

**Solution**

First, define a helper function to simplify the PCAs:

```{r}
getLoadings <- function(x, dolog=FALSE, center=TRUE, scale.=TRUE){
  if(dolog){
    x <- log2(x + 1)
  }
  pc = prcomp(x, center=center, scale.=scale.)
  return(t(pc$rotation[, 1:10]))
}
```

Although it would be possible to do the following with a loop, the different data types require different options for PCA (e.g. mutations are a 0/1 matrix with 1 meaning there is a somatic mutation, and gistict varies between -2 for homozygous loss and 2 for a genome doubling, so it does not make sense to scale and center either of them). So it is just as well to do the following one by one, concatenating each PCA result to the MultiAssayExperiment:

```{r}
miniACC2 <- intersectColumns(miniACC)
miniACC2 <- c(miniACC2, rnaseqPCA=getLoadings(assays(miniACC2)[[1]], dolog=TRUE), mapFrom=1L)
miniACC2 <- c(miniACC2, gistictPCA=getLoadings(assays(miniACC2)[[2]], center=FALSE, scale.=FALSE), mapFrom=2L)
miniACC2 <- c(miniACC2, proteinPCA=getLoadings(assays(miniACC2)[[3]]), mapFrom=3L)
miniACC2 <- c(miniACC2, mutationsPCA=getLoadings(assays(miniACC2)[[4]], center=FALSE, scale.=FALSE), mapFrom=4L)
miniACC2 <- c(miniACC2, miRNAPCA=getLoadings(assays(miniACC2)[[5]]), mapFrom=5L)
```

Now subset to keep *only* the PCA results:

```{r}
miniACC2 <- miniACC2[, , 6:10]
experiments(miniACC2)
```

Note that it would be equally easy (and maybe better) to do PCA on all samples available for each assay, then do `intersectColumns` at this point instead.

Now, the steps for calculating the correlations and plotting a heatmap:

* Use *wideFormat* to paste these together, which has the nice property of adding assay names to the column names.
* The first column always contains the sample identifier, so remove it.
* Coerce to a matrix.
* Calculate the correlations, and take the absolute value (since signs of principal components are arbitrary).
* Set the diagonals to NA (each variable has a correlation of 1 to itself).
```{r}
df <- wideFormat(miniACC2)[, -1]
mycors <- cor(as.matrix(df))
mycors <- abs(mycors)
diag(mycors) <- NA
```

To simplify the heatmap, show only components that have at least one correlation greater than 0.5.

```{r}
has.high.cor <- apply(mycors, 2, max, na.rm=TRUE) > 0.5
mycors <- mycors[has.high.cor, has.high.cor]
pheatmap::pheatmap(mycors)
```

The highest correlation present is between PC2 of the RNA-seq assay and PC1 of the protein assay.
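Two details of this solution are easy to check on simulated data: with a features-by-samples input, the transposed `prcomp()` rotation has one column per sample, so it can be concatenated as a new assay; and because the sign of a principal component is arbitrary, correlations are compared in absolute value. The matrix below is random, and the names `gene1`... and `S1`... are hypothetical.

```r
set.seed(42)

# Simulated assay: 50 features (rows) by 6 samples (columns)
x <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(paste0("gene", 1:50), paste0("S", 1:6)))

pc <- prcomp(x, center = TRUE, scale. = TRUE)

# rotation has one row per input column (sample); transpose so that
# components are rows and samples are columns, like an assay
comp <- t(pc$rotation[, 1:3])
dim(comp)
colnames(comp)

# Flipping the sign of a component flips the sign of its correlations,
# but not their absolute value
pc1 <- comp["PC1", ]
pc2 <- comp["PC2", ]
c(cor(pc1, pc2), cor(-pc1, pc2))
abs(cor(pc1, pc2)) - abs(cor(-pc1, pc2))
```

Only the first three components are kept here because the toy matrix has just six samples; the exercise above keeps ten.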


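## Annotating with ranges

Convert all the `ExperimentList` elements in `miniACC` to `SummarizedExperiment` objects. Then use `rowRanges` to annotate these objects with genomic ranges. For the microRNA assay, annotate instead with the genomic coordinates of predicted targets.

**Solution**

First, make a new object and convert its experiments to SummarizedExperiment objects:

```{r}
suppressPackageStartupMessages(library(SummarizedExperiment))
seACC <- miniACC
experiments(seACC)
seACC[[1]] <- SummarizedExperiment(exprs(seACC[[1]]))
seACC[[3]] <- SummarizedExperiment(exprs(seACC[[3]]))
seACC[[4]] <- SummarizedExperiment(seACC[[4]])
seACC[[5]] <- SummarizedExperiment(exprs(seACC[[5]]))
experiments(seACC)
```

The following helper function looks up genomic ranges for a vector of identifiers (gene symbols by default) and returns them as a GRangesList, which can be used to replace the rowRanges of the SummarizedExperiment objects:

```{r}
getrr <- function(identifiers, EnsDbFilterFunc="SymbolFilter"){
  suppressPackageStartupMessages({
    library(AnnotationFilter)
    library(EnsDb.Hsapiens.v86)
  })
  FUN <- get(EnsDbFilterFunc)
  edb <- EnsDb.Hsapiens.v86
  afl <- AnnotationFilterList(FUN(identifiers),
                              SeqNameFilter(c(1:21, "X", "Y")),
                              TxBiotypeFilter("protein_coding"))
  gr <- genes(edb, filter=afl)
  grl <- split(gr, factor(identifiers))
  grl <- grl[match(identifiers, names(grl))]
  stopifnot(identical(names(grl), identifiers))
  return(grl)
}
```

For example:

```{r}
getrr(rownames(seACC)[[1]])
```

Use this to set the rowRanges of experiments 1-4 with these GRangesList objects:

```{r}
for (i in 1:4){
  rowRanges(seACC[[i]]) <- getrr(rownames(seACC[[i]]))
}
```

Note that the class of experiments 1-4 is now `RangedSummarizedExperiment`:

```{r}
experiments(seACC)
```

With ranged objects in the MultiAssayExperiment, you can then subset by ranges. For example, select all genes on chromosome 1 for the four *RangedSummarizedExperiment* objects:

```{r}
seACC[GRanges(seqnames="1:1-1e9"), , 1:4]
```

*Note about microRNA*: You can set ranges for the microRNA assay according to the genomic location of those microRNA, or the locations of their predicted targets, but we don't do it here. Assigning genomic ranges of microRNA targets would be an easy way to subset them according to their targets.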
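Conceptually, subsetting by a range keeps the features whose coordinates overlap the query. The base-R sketch below spells out that overlap test on a plain `data.frame`; it is only an illustration of the idea, not how `GenomicRanges::findOverlaps()` is implemented, and the gene symbols and coordinates are made up.

```r
# Hypothetical gene coordinates (not real annotation)
genes <- data.frame(
  symbol = c("geneA", "geneB", "geneC"),
  chr    = c("1", "1", "2"),
  start  = c(100, 5000, 100),
  end    = c(200, 6000, 300),
  stringsAsFactors = FALSE
)

# Query range: chromosome 1, positions 1 to 1000
query <- list(chr = "1", start = 1, end = 1000)

# Two closed intervals on the same chromosome overlap when each one
# starts no later than the other ends
hit <- genes$chr == query$chr &
  genes$start <= query$end &
  genes$end >= query$start

genes$symbol[hit]
```

Widening the query to cover the whole chromosome, as in the chromosome 1 example above, would select every feature annotated on that chromosome.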


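## Shiny app

The *MAEview.R* function defined in this workshop opens a shiny app on a TCGA-like object to identify and visualize relationships between RNA-seq expression, GISTIC copy number peaks, and microRNA expression. For a specified gene, you can view a boxplot of expression vs. copy number, and use *limma* to identify microRNA associated with expression of that gene.

```{r, eval=FALSE}
MultiAssayExperimentWorkshop::MAEview(miniACC)
```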


# Session info

```{r}
sessionInfo()
```

[The Cancer Genome Atlas]: https://cancergenome.nih.gov/