---
title: "AnnotationHub Recipes"
author: "Marc Carlson, Sonali Arora"
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('BiocAnnotRes2015')`"
abstract: >
  AnnotationHub Recipes
vignette: >
  %\VignetteIndexEntry{AnnotationHub Recipes}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
output:
  BiocStyle::html_document:
    toc: true
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```

```{r, echo = FALSE, results = "hide"}
library(knitr)
opts_chunk$set(error = FALSE)
```

Authors: Marc RJ Carlson and Sonali Arora

Keywords: Annotation, Next-Generation Sequencing, R, Bioconductor

## Overview of the process

If you are reading this, it is (hopefully) because you intend to write some code that will allow online resources to be processed into R objects that are made available via the AnnotationHub package. To do this you will have to carry out four basic steps (outlined below). These steps will have you write two functions and then call a third function that does some automatic setup for you. The first function contains instructions on how to process data stored online into metadata that describes your new R resources for the AnnotationHub. The second function describes how to take these online resources and transform them into an R object that is useful to end users.

## Setup

It should go without saying that this vignette is intended for users who are comfortable with R. To follow the instructions in this vignette, you will need to install the AnnotationHubData package. This package is not meant to be used by most people; it is really only intended to be a support package, so it is not exposed via biocLite(). To get it, you will need to use svn to check it out from the following location:

```{}
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/AnnotationHubData
```

Once you have checked it out, use R CMD INSTALL to install the package from source.

## Introducing AnnotationHubMetadata Objects

The AnnotationHubData package is a companion to the AnnotationHub package. It provides a place to store the code that processes online resources into R objects suitable for access through the AnnotationHub package. Before you can understand the requirements for this package, you first need to learn about the objects that act, behind the scenes, as intermediaries between the hub and its web-based repository. That means that you need to know about AnnotationHubMetadata objects. These objects store the metadata that describes an online resource.
If you want to see a set of online resources added to the repository and maintained, you will need to become familiar with the AnnotationHubMetadata constructor. For each online resource that you want to process into the AnnotationHub, you must be able to construct an AnnotationHubMetadata object that describes it in detail and that specifies where the recipe function lives.

The steps involved in writing a recipe that adds files to AnnotationHub can be summarized briefly as:

- Write a function that takes the metadata about the resources and processes it into AnnotationHubMetadata objects.
- Optional step: Write an additional function specifying how the files need to be pre-processed. The data from these files is transformed into an R object that is useful to end users.
- Optional step: Write a function specifying how the files need to be post-processed once downloaded to a user's local cache.

## Step 1: Writing your AnnotationHubMetadata generating function

The example function in this step, `makeinparanoid8ToAHMs`, takes files from the latest release of inparanoid and processes them into AnnotationHubMetadata objects using Map.

The first function you need to provide is one that processes some online resources into AnnotationHubMetadata objects. This function MUST return a list of AnnotationHubMetadata objects. It can rely on other helper functions that you define, but ultimately it (and its helpers) must contain all of the instructions needed to find resources and process them into AnnotationHubMetadata objects.

The call to Map is the important part of the example function, because that is where the series of AnnotationHubMetadata objects is created. Everything before it just calls out to helper functions that process the metadata so that it can be passed to the AnnotationHubMetadata constructor using Map.
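
The way Map pairs per-resource values with shared ones can be seen in isolation with a minimal base R sketch. It assumes a hypothetical stand-in constructor, `makeRecord`, and made-up metadata values; it does not use AnnotationHubData or the real AnnotationHubMetadata constructor.

```r
## Hypothetical stand-in for the AnnotationHubMetadata constructor: it only
## collects its arguments into a list, so this runs without AnnotationHubData.
makeRecord <- function(Title, Species, DataProvider, RDataClass) {
    list(Title = Title, Species = Species,
         DataProvider = DataProvider, RDataClass = RDataClass)
}

## Made-up per-resource metadata: one element per resource.
meta <- list(
    title   = c("Example resource A", "Example resource B"),
    species = c("Homo sapiens", "Mus musculus")
)

## Map() walks the vectors in parallel and calls makeRecord() once per
## resource; everything in MoreArgs is passed unchanged to every call.
records <- Map(makeRecord,
               Title = meta$title,
               Species = meta$species,
               MoreArgs = list(DataProvider = "Inparanoid",
                               RDataClass = "SQLiteFile"))

length(records)            ## one record per resource
records[[2]]$Species       ## per-resource value
records[[2]]$RDataClass    ## shared value taken from MoreArgs
```

Fields that differ between resources are passed as vectors of equal length, while fields that are the same for every resource go into `MoreArgs`.
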
Notice how one of the fields specified by the example function below is the Recipe, which indicates both the name and the location of the recipe function. We expect most people will want to submit their recipe to the same package as their metadata processing function.

```{r, exampleInpProcessing}
makeinparanoid8ToAHMs <- function(currentMetadata){
    baseUrl <- 'http://inparanoid.sbc.su.se/download/current/Orthologs_other_formats'
    ## Make list of metadata in a helper function
    meta <- .inparanoidMetadataFromUrl(baseUrl)
    ## then make AnnotationHubMetadata objects.
    Map(AnnotationHubMetadata,
        ## fields that differ per resource: one element per resource
        Description=meta$description,
        Genome=meta$genome,
        SourceFile=meta$sourceFile,
        SourceUrl=meta$sourceUrl,
        SourceVersion=meta$sourceVersion,
        Species=meta$species,
        TaxonomyId=meta$taxonomyId,
        Title=meta$title,
        RDataPath=meta$rDataPath,
        ## fields that are identical for every resource
        MoreArgs=list(
          Coordinate_1_based = TRUE,
          DataProvider = baseUrl,
          Maintainer = "Marc Carlson",  ## should also carry the maintainer's e-mail address
          RDataClass = "SQLiteFile",
          RDataDateAdded = Sys.time(),
          RDataVersion = "0.0.1",
          Recipe = "AnnotationHubData:::inparanoid8ToDbsRecipe",
          Tags = c("Inparanoid", "Gene", "Homology", "Annotation")))
}
```

Before we move on to the second step, here is a list of the arguments that the AnnotationHubMetadata constructor expects, together with the class of each:

```{}
AnnotationHubRoot: 'character(1)' Absolute path to the directory structure
    containing resources to be added to AnnotationHub
SourceUrl: 'character()' URL where the resource(s) can be found
SourceType: 'character()' Indicates what kind of resource was initially
    processed. The preference is to name the type of resource if it is a
    single file type, and to name where the resource came from if it is a
    compound resource. Typical answers would be 'BED', 'FASTA' or
    'Inparanoid' etc.
SourceLastModifiedDate: 'POSIXct()' The date when the source was last
    modified. Leaving this blank should allow the values to be retrieved
    for you (if your SourceUrl is valid).
SourceMd5: 'character()' md5 hash of the original file
SourceSize: 'numeric(1)' Size of the original file in bytes
DataProvider: 'character(1)' Where did this resource come from?
Title: 'character(1)' Title of the resource
Description: 'character(1)' Description of the resource
Species: 'character(1)' Species name
TaxonomyId: 'character(1)' NCBI code
Genome: 'character(1)' Name of genome build
Tags: 'character()' Free-form tags
Recipe: 'character(1)' Name of the recipe function
RDataClass: 'character(1)' Class of the derived object (e.g. 'GRanges')
RDataDateAdded: 'POSIXct()' Date added to AnnotationHub. Used to determine
    snapshots.
RDataPath: 'character(1)' File path to the serialized form
Maintainer: 'character(1)' Maintainer name and email address
Coordinate_1_based: 'logical(1)' Do coordinates start with 1 or 0?
DispatchClass: 'character(1)' String used to indicate which code should be
    called by the client when the resource is downloaded. This is often the
    same as the RDataClass, but it is allowed to be a different value so
    that the client can do something different internally if required.
Location_Prefix: 'character(1)' Added for resources where only the metadata
    is stored and the resource itself comes from a third-party web site.
    The location prefix gives the base path that the resource comes from;
    the default value points to our own site.
Notes: 'character()' Notes about the resource
```

## Step 2: Writing your recipe function

The second kind of function you need to write is called a recipe function. It must always take a single AnnotationHubMetadata object as its argument. The job of a recipe function is to use the metadata in the AnnotationHubMetadata object to produce an R object or data file that can later be retrieved from the AnnotationHub service.

Below is a recipe function that calls some helper functions to generate an inparanoid database object from the metadata stored in its AnnotationHubMetadata object.

```{r, exampleRecipe}
inparanoid8ToDbsRecipe <- function(ahm){
    require(AnnotationForge)
    ## location of the source files described by the metadata
    inputFiles <- metadata(ahm)$SourceFile
    ## build the database in a temporary directory, then load it
    dbname <- makeInpDb(dir=file.path(inputFiles,""),
                        dataDir=tempdir())
    db <- loadDb(file=dbname)
    ## save the database where the hub expects to find it
    outputPath <- file.path(metadata(ahm)$AnnotationHubRoot,
                            metadata(ahm)$RDataPath)
    saveDb(db, file=outputPath)
    ## return the path of the generated file
    outputFile(ahm)
}
```

## Note for step 1 and step 2

While writing these functions, take care with a couple of fields.

Case 1 - If the file just needs to be downloaded and is only post-processed in the user's local cache, then

1) SourceUrls = Location_Prefix + RDataPath
2) Recipe = NA_character_

Example -

```{}
SourceUrls="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToRn5.over.chain.gz",
RDataPath="goldenPath/hg38/liftOver/hg38ToRn5.over.chain.gz",
Location_Prefix = "http://hgdownload.cse.ucsc.edu/",
```

Case 2 - If the recipe needs to retrieve a file from an external website, pre-process it, store the pre-processed file at our amazon location and always serve the pre-processed file (not the original file) to the user, then

1) SourceUrls should merely document the original location of the untouched file
2) Location_Prefix + RDataPath should be equal to the file path on the amazon machine where all pre-processed files are stored.
3) Recipe = a helper function that tells us how to pre-process the original file

Example -

```{}
SourceUrls="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToRn5.over.chain.gz",
Location_Prefix = "http://s3.amazonaws.com/annotationhub/",
RDataPath="chainfile/dummy.Rda"
```

If this seems confusing, note that in both of these cases Location_Prefix + RDataPath needs to reflect the location that the resource will actually come from when the client is in use.

## Step 3: Function for Post-processing a File in User's cache.

One can post-process the file when it is instantiated into AnnotationHub from the user's cache. An example would be a BED file that is downloaded to the user's cache and that we want AnnotationHub to read as a `GRanges` using `rtracklayer::import`. Along with your recipe, you would then write a class to be included inside AnnotationHub, as shown below:

```{r eval=FALSE}
setClass("BEDFileResource", contains="AnnotationHubResource")

setMethod(".get1", "BEDFileResource",
    function(x, ...)
{
    .require("rtracklayer")
    yy <- .hub(x)
    ## wrap the cached file so that import() treats it as BED
    dat <- rtracklayer::BEDFile(cache(yy))
    rtracklayer::import(dat, format="bed", genome=yy$genome, ...)
})
```

If you need to do this with a set of files that you are crafting a recipe for, you will need to coordinate with us so that we can patch the appropriate supporting code into the client. Alternatively, you can set the RDataClass to an existing value (one that we already have a method for).

## Step 4: Test your functions and then contact us when they work

At this point you should make sure that your AnnotationHubMetadata generating function produces a list of AnnotationHubMetadata objects, and that your recipe returns the path to a file that was generated in the way you expect. Once that is the case, contact us about running your recipe so that your data can actually be put into the hub.

## Session Information

```{r, SessionInfo, echo=FALSE}
sessionInfo()
```