output:
  BiocStyle::html_document:
    toc: true
    highlight: haddock
    css: style.css
---

```{r include=FALSE}
library(BiocStyle)
knitr::opts_chunk$set(eval = FALSE)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

There are currently three ways to find the data you need on the CGC:

- Easiest: use our powerful and pretty GUI, the "Data Browser", to explore data interactively on the platform. Read the tutorial [here](http://docs.cancergenomicscloud.org/docs/the-data-browser).
- Most advanced: advanced users can query the metadata directly with SPARQL. Read the [tutorial](http://docs.cancergenomicscloud.org/docs/query-tcga-metadata-programmatically#section-example-queries).
- Sweetest: use our Dataset API by building a query list in R (coming soon).

# Quickstart

## Graphical Data Browser

Please read the tutorial [here](http://docs.cancergenomicscloud.org/docs/the-data-browser).

## SPARQL examples

The Seven Bridges SPARQL console is available at [https://opensparql.sbgenomics.com](https://opensparql.sbgenomics.com). Please read these tutorials first:

- [Query TCGA metadata programmatically](http://docs.cancergenomicscloud.org/docs/query-tcga-metadata-programmatically#section-example-queries)
- [Example TCGA metadata queries in SPARQL](http://docs.cancergenomicscloud.org/docs/sample-sparql-queries)

Here is an example. You will need the R package `SPARQL`. The query finds lung adenocarcinoma (mixed subtype) cases that are alive, have more than 550 days to last follow-up, and received chemotherapy. It then returns the WXS files from their primary tumor samples.

```{r, eval = FALSE}
library(SPARQL)

endpoint = "https://opensparql.sbgenomics.com/bigdata/namespace/tcga_metadata_kb/sparql"

query = "
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix tcga: <https://www.sbgenomics.com/ontologies/2014/11/tcga#>

select distinct ?case ?sample ?file_name ?path ?xs_label ?subtype_label
where
{
  ?case a tcga:Case .
  ?case tcga:hasDiseaseType ?disease_type .
  ?disease_type rdfs:label 'Lung Adenocarcinoma' .

  ?case tcga:hasHistologicalDiagnosis ?hd .
  ?hd rdfs:label 'Lung Adenocarcinoma Mixed Subtype' .

  ?case tcga:hasFollowUp ?follow_up .
  ?follow_up tcga:hasDaysToLastFollowUp ?days_to_last_follow_up .
  filter(?days_to_last_follow_up > 550)

  ?follow_up tcga:hasVitalStatus ?vital_status .
  ?vital_status rdfs:label ?vital_status_label .
  filter(?vital_status_label = 'Alive')

  ?case tcga:hasDrugTherapy ?drug_therapy .
  ?drug_therapy tcga:hasPharmaceuticalTherapyType ?pt_type .
  ?pt_type rdfs:label ?pt_type_label .
  filter(?pt_type_label = 'Chemotherapy')

  ?case tcga:hasSample ?sample .
  ?sample tcga:hasSampleType ?st .
  ?st rdfs:label ?st_label .
  filter(?st_label = 'Primary Tumor')

  ?sample tcga:hasFile ?file .
  ?file rdfs:label ?file_name .
  ?file tcga:hasStoragePath ?path .

  ?file tcga:hasExperimentalStrategy ?xs .
  ?xs rdfs:label ?xs_label .
  filter(?xs_label = 'WXS')

  ?file tcga:hasDataSubtype ?subtype .
  ?subtype rdfs:label ?subtype_label
}
"

# send the query to the endpoint; the matches come back as a data frame in qd$results
qd <- SPARQL(endpoint, query)
df <- qd$results
head(df)
```

You can use the CGC API to access the TCGA files found with SPARQL queries. To get files that have download links, you need to include the SPARQL variable `?path` in your query.

```{r}
## api(api_url=base,auth_token=auth_token,path='action/files/get_ids', method='POST',query=None,data=filelist)
# extract the storage paths returned by the SPARQL query
df.path <- df[, "path"]
df.path
```

You can copy these files directly to a project. If the files are controlled access, make sure that the project supports TCGA controlled-access data and that you logged in through eRA Commons.

```{r}
library(sevenbridges)
a = Auth(platform = "cgc", username = "tengfei")
## get file IDs from storage paths (only works on the CGC platform)
ids = a$get_id_from_path(df.path)
## copy the files by ID to a project with controlled access
(p = a$project(id = "tengfei/control-test"))
a$copyFile(ids, p$id)
```

Have fun with more examples in this [tutorial](http://docs.cancergenomicscloud.org/docs/query-tcga-metadata-programmatically#section-example-queries).

## Dataset API examples (not released)