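---
title: "Integrative analysis workshop with TCGAbiolinks and ELMER - Obtaining data"
author: "Tiago Chedraoui Silva, Simon Coetzee, Dennis Hazelett, Ben Berman, Houtan Noushmehr"
date: "`r Sys.Date()`"
output:
  html_document:
    self_contained: true
    number_sections: no
    theme: andry
    highlight: tango
    mathjax: null
    toc: true
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Integrative analysis workshop with TCGAbiolinks and ELMER - Obtaining data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE, results = "hide", message = FALSE, warning = FALSE}
devtools::load_all(".")
```

# Introduction

In this section, we will learn how to search and download DNA methylation (epigenetic) and gene expression (transcriptional) data from the newly created [NCI Genomic Data Commons (GDC) portal](https://portal.gdc.cancer.gov/) and prepare them into a SummarizedExperiment object.

The figure below highlights the part of the workflow covered in this section.

![Part of the workflow covered in this section](figures/Workflow_Tgcabiolinks.png)

# Downloading data

## Loading required libraries

```{r libs, eval = TRUE, message = FALSE, warning = FALSE}
library(TCGAbiolinks)
library(SummarizedExperiment)
library(DT)
library(dplyr) # provides the %>% pipe used in the chunks below
```

## Gene expression

```{r tcgabiolinks-exp, eval = FALSE}
# Search GDC for HTSeq FPKM-UQ gene expression of two TCGA-LUSC tumor samples
query.exp <- GDCquery(project = "TCGA-LUSC",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - FPKM-UQ",
                      barcode = c("TCGA-34-5231-01", "TCGA-77-7138-01"))
# Download the files returned by the query
GDCdownload(query.exp)
# Build a SummarizedExperiment and save it to an .rda file
exp <- GDCprepare(query = query.exp,
                  save = TRUE,
                  save.filename = "exp_LUSC.rda",
                  summarizedExperiment = TRUE)
```

```{r tcgabiolinks-exp-obj, eval = TRUE}
exp
# Sample metadata (clinical and subtype information)
colData(exp) %>% as.data.frame %>% datatable(options = list(scrollX = TRUE), rownames = TRUE)
# First five genes of the expression matrix
assay(exp)[1:5, ] %>% datatable(options = list(scrollX = TRUE), rownames = TRUE)
# Genomic coordinates of the genes
rowRanges(exp)
```

## DNA methylation

The Bioconductor package [TCGAbiolinks](http://bioconductor.org/packages/TCGAbiolinks/) [@TCGAbiolinks] is used to search, download and prepare data from the [NCI Genomic Data Commons (GDC) portal](https://portal.gdc.cancer.gov/). In this example, we will download DNA methylation data (Infinium HumanMethylation450 platform) for two TCGA-LUSC (TCGA Lung Squamous Cell Carcinoma) samples. The `GDCquery` function searches the GDC database for the information required to download the data. This information is used by the `GDCdownload` function, which requests the files from GDC; the files are delivered compressed into a single 76 MB tar.gz file.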
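Both queries in this section select samples through the `barcode` argument. A TCGA barcode is a series of dash-separated fields (project, tissue source site, participant, sample type and vial, portion and analyte, plate, center), so the gene expression barcodes `TCGA-34-5231-01` and `TCGA-77-7138-01` are sample-level prefixes of the aliquot-level methylation barcodes, and both data types come from the same two tumor samples. The helper below, `parse_tcga_barcode`, is hypothetical and not part of TCGAbiolinks: it is a minimal base R sketch that assumes the standard TCGA barcode layout and only splits strings, without checking any code against the GDC.

```r
# Hypothetical helper, not part of TCGAbiolinks: splits TCGA barcodes into
# their dash-separated fields, assuming the standard TCGA barcode layout
# TCGA-<TSS>-<participant>-<sample><vial>-<portion><analyte>-<plate>-<center>.
# Shorter barcodes are allowed; missing fields are returned as NA.
parse_tcga_barcode <- function(barcodes) {
  fields <- strsplit(barcodes, "-", fixed = TRUE)

  # i-th field of every barcode, or NA when the barcode is too short
  get_field <- function(i) {
    vapply(fields,
           function(f) if (length(f) >= i) f[i] else NA_character_,
           character(1))
  }

  # substr() that returns NA instead of "" when the characters are absent
  sub_or_na <- function(x, start, stop) {
    out <- substr(x, start, stop)
    out[!is.na(out) & out == ""] <- NA_character_
    out
  }

  sample_vial     <- get_field(4)  # e.g. "01A"
  portion_analyte <- get_field(5)  # e.g. "21D"
  sample_type     <- sub_or_na(sample_vial, 1, 2)

  data.frame(
    barcode     = barcodes,
    patient     = paste(get_field(1), get_field(2), get_field(3), sep = "-"),
    tss         = get_field(2),
    sample_type = sample_type,
    # TCGA sample type codes 01-09 are tumor, 10-19 normal, 20-29 control
    is_tumor    = as.integer(sample_type) < 10,
    vial        = sub_or_na(sample_vial, 3, 3),
    portion     = sub_or_na(portion_analyte, 1, 2),
    analyte     = sub_or_na(portion_analyte, 3, 3),
    plate       = get_field(6),
    center      = get_field(7),
    stringsAsFactors = FALSE
  )
}

barcodes <- c("TCGA-34-5231-01",              # gene expression query
              "TCGA-34-5231-01A-21D-1818-05") # DNA methylation query
parse_tcga_barcode(barcodes)[, c("patient", "sample_type", "vial", "analyte")]
```

Both rows report the same patient (`TCGA-34-5231`) with sample type `01` (primary solid tumor); only the longer methylation barcode carries the vial (`A`) and the analyte (`D`, DNA). Returning `NA` for absent fields, rather than failing, keeps partial barcodes such as the ones in the gene expression query usable.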
After the download is completed, `GDCdownload` uncompresses the tar.gz file and moves its files to a folder. The default is `GDCdata/(Project)/(source)/(data.category)/(data.type)`; in our example, it will be `GDCdata/TCGA-LUSC/harmonized/DNA_Methylation/Methylation_Beta_Value/`.

![Data saved after GDCdownload is executed](figures/folder_structure.png)

Finally, `GDCprepare` transforms the downloaded data into a [SummarizedExperiment](https://bioconductor.org/packages/SummarizedExperiment/) object [@huber2015orchestrating] or a data frame. If `summarizedExperiment` is set to `TRUE`, TCGAbiolinks adds clinical information and molecular subtype information to the object. The subtypes are those defined in The Cancer Genome Atlas (TCGA) Research Network reports (the full list of papers can be found in the [TCGAquery\_subtype section](https://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/tcgaBiolinks.html#tcgaquery_subtype-working-with-molecular-subtypes-data.) of the TCGAbiolinks vignette).

```{r tcgabiolinks-met, eval = FALSE}
# Search GDC for 450K DNA methylation data of two TCGA-LUSC tumor aliquots
query.met <- GDCquery(project = "TCGA-LUSC",
                      data.category = "DNA Methylation",
                      platform = "Illumina Human Methylation 450",
                      barcode = c("TCGA-34-5231-01A-21D-1818-05",
                                  "TCGA-77-7138-01A-41D-2043-05"))
# Download the files returned by the query
GDCdownload(query.met)
# Build a SummarizedExperiment and save it to an .rda file
met <- GDCprepare(query = query.met,
                  save = TRUE,
                  save.filename = "DNAmethylation_LUSC.rda",
                  summarizedExperiment = TRUE)
```

The object created is a `SummarizedExperiment`:

```{r tcgabiolinks-met-obj, eval = TRUE}
met
# Sample metadata (clinical and subtype information)
colData(met) %>% as.data.frame %>% datatable(options = list(scrollX = TRUE), rownames = TRUE)
# Beta values of the first five probes
assay(met)[1:5, ] %>% datatable(options = list(scrollX = TRUE), rownames = TRUE)
# Genomic coordinates of the probes
rowRanges(met)
```

# Session Info

```{r sessioninfo, eval = TRUE}
sessionInfo()
```

# Bibliography