ASpli: gbCounts error
jbono (@jbono-7682)
United States

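I'm trying to run gbCounts in ASpli. It works through some of the samples, but then I get an error message that I don't know how to resolve. I'd appreciate any insight. Here is the code I used: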

> Dmel.6.38.TxDb = makeTxDbFromGFF(
+ file="dmel-all-r6.38.gtf",
+ format="gtf")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
2: In makeTxDbFromGRanges(gr, metadata = metadata) :
  The following transcripts were dropped because their exon ranks could not be inferred (either because the exons are not on the same chromosome/strand or because they are not separated by introns): FBtr0084079, FBtr0084080, FBtr0084081, FBtr0084082, FBtr0084083, FBtr0084084, FBtr0084085, FBtr0307759, FBtr0307769, FBtr0307769, FBtr0307760
3: In .reject_transcripts(bad_tx, because) :
  The following transcripts were rejected because they have CDSs that cannot be mapped to an exon: FBtr0100857, FBtr0100863, FBtr0100863, FBtr0100863, FBtr0100863, FBtr01008500, FBtr0100863, FBtr0433500, FBtr0433501
> saveDb(Dmel.6.38.TxDb, file="dmel.6.38.txdb.sqlite")
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: dmel-all-r6.38.gtf
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 35367
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-03-18 12:37:55 -0600 (Thu, 18 Mar 2021)
# GenomicFeatures version at creation time: 1.42.2
# RSQLite version at creation time: 2.2.4
# DBSCHEMAVERSION: 1.2
> features <- binGenome( Dmel.6.38.TxDb )
* Number of extracted Genes = 17869
* Number of extracted Exon Bins = 80637
* Number of extracted intron bins = 72288
* Number of extracted trascripts = 35367
* Number of extracted junctions = 60431
* Number of AS bins (not include external) = 9547
* Number of AS bins (include external) = 9557
* Classified as:
    ES bins = 2427 (25%)
    IR bins = 1257 (13%)
    Alt5'ss bins = 1497 (16%)
    Alt3'ss bins = 1622 (17%)
    Multiple AS bins = 2744 (29%)
    classified as:
        ES bins = 531 (19%)
        IR bins = 492 (18%)
        Alt5'ss bins = 885 (32%)
        Alt3'ss bins = 725 (26%)
> targets=read.csv("Targets.csv")
> getConditions(targets)
[1] "Mutant_F_1D" "Mutant_M_1D" "Mutant_F_28D" "Mutant_M_28D" "Control_F_1D" "Control_M_1D" "Control_F_28D"
[8] "Control_M_28D"
> gbcounts <- gbCounts( features = features,
+ targets = targets,
+ minReadLength = 100, maxISize = 50000,
+ strandMode=0)
Summarizing Mutant_F_1D_1
ETA: 53 min
Summarizing Mutant_F_1D_2
ETA: 49 min
Summarizing Mutant_F_1D_3
ETA: 46 min
Summarizing Mutant_M_1D_1
ETA: 43 min
Summarizing Mutant_M_1D_2
ETA: 41 min
Summarizing Mutant_M_1D_3
ETA: 38 min
Summarizing Mutant_F_28D_1
[1] 7
Error in .subset(x, j) : only 0's may be mixed with negative subscripts
In addition: Warning message:
In colnames(counts@junction.counts)[9:ncol(counts@junction.counts)] <- rownames(targets) :
  number of items to replace is not a multiple of replacement length

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.42.2 GenomicRanges_1.42.0 GenomeInfoDb_1.26.4 ASpli_2.0.0 AnnotationDbi_1.52.0
[6] IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0 BiocGenerics_0.36.0 edgeR_3.32.1
[11] limma_3.46.0

loaded via a namespace (and not attached):
[1] colorspace_2.0-0 ellipsis_0.3.1 biovizBase_1.38.0 htmlTable_2.1.0
[5] XVector_0.30.0 base64enc_0.1-3 dichromat_2.0-0 rstudioapi_0.13
[9] DT_0.17 bit64_4.0.5 fansi_0.4.2 xml2_1.3.2
[13] splines_4.0.4 cachem_1.0.4 knitr_1.31 Formula_1.2-4
[17] Rsamtools_2.6.0 cluster_2.1.1 dbplyr_2.1.0 png_0.1-7
[21] BiocManager_1.30.10 compiler_4.0.4 httr_1.4.2 backports_1.2.1
[25] lazyeval_0.2.2 assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0
[29] htmltools_0.5.1.1 prettyunits_1.1.1 tools_4.0.4 igraph_1.2.6
[33] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.4 dplyr_1.0.5
[37] rappdirs_0.3.3 tinytex_0.30 Rcpp_1.0.6 vctrs_0.3.6
[41] Biostrings_2.58.0 rtracklayer_1.50.0 xfun_0.22 stringr_1.4.0
[45] lifecycle_1.0.0 ensembldb_2.14.0 XML_3.99-0.6 zlibbioc_1.36.0
[49] MASS_7.3-53.1 scales_1.1.1 BiocStyle_2.18.1 BSgenome_1.58.0
[53] VariantAnnotation_1.36.0 ProtGenerics_1.22.0 hms_1.0.0 MatrixGenerics_1.2.1
[57] SummarizedExperiment_1.20.0 AnnotationFilter_1.14.0 RColorBrewer_1.1-2 yaml_2.2.1
[61] curl_4.3 memoise_2.0.0 gridExtra_2.3 ggplot2_3.3.3
[65] UpSetR_1.4.0 biomaRt_2.46.3 rpart_4.1-15 latticeExtra_0.6-29
[69] stringi_1.5.3 RSQLite_2.2.4 checkmate_2.0.0 BiocParallel_1.24.1
[73] rlang_0.4.10 pkgconfig_2.0.3 matrixStats_0.58.0 bitops_1.0-6
[77] evaluate_0.14 lattice_0.20-41 purrr_0.3.4 htmlwidgets_1.5.3
[81] GenomicAlignments_1.26.0 bit_4.0.4 tidyselect_1.1.0 plyr_1.8.6
[85] magrittr_2.0.1 R6_2.5.0 generics_0.1.0 Hmisc_4.5-0
[89] DelayedArray_0.16.2 DBI_1.1.1 pillar_1.5.1 foreign_0.8-81
[93] survival_3.2-10 RCurl_1.98-1.3 nnet_7.3-15 tibble_3.1.0
[97] crayon_1.4.1 utf8_1.2.1 BiocFileCache_1.14.0 rmarkdown_2.7
[101] jpeg_0.1-8.1 progress_1.2.2 locfit_1.5-9.4 grid_4.0.4
[105] data.table_1.14.0 blob_1.2.1 digest_0.6.27 tidyr_1.1.3
[109] openssl_1.4.3 munsell_0.5.0 Gviz_1.34.1 askpass_1.1
@b6a1dc8b
Buenos Aires

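Hi jbono,

The second and third warnings report some odd annotations for the genes 'FBgn0002781', 'FBgn0013680', 'FBgn0013675', 'FBgn0262952' and 'FBgn0013684'. Could you try removing them from the GTF file and re-running the analysis?

You can easily create an auxiliary foo.gtf file without the offending genes using grep:

grep -iv 'FBgn0002781\|FBgn0013680\|FBgn0013675\|FBgn0262952\|FBgn0013684' dmel-all-r6.38.gtf > foo.gtf

Then carry on using 'foo.gtf' instead of the original file.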
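
Before rebuilding the TxDb it may be worth confirming that the filter removed what it was meant to. The following is only a sketch on a made-up three-line GTF: the FBgn0000001 record, the coordinates, and the variable names are placeholders, and the filter is written with `grep -E` (an extended-regex equivalent of the command above).

```shell
# Toy check of the gene filter. The file contents are invented for illustration;
# only the five gene IDs in the pattern come from the post above.
workdir=$(mktemp -d)
toy_gtf="$workdir/toy.gtf"
filtered_gtf="$workdir/foo.gtf"

# Three tab-separated GTF-like records: two offending genes and one placeholder gene.
printf '3R\tFlyBase\texon\t100\t200\t.\t+\t.\tgene_id "FBgn0002781";\n' > "$toy_gtf"
printf '2L\tFlyBase\texon\t300\t400\t.\t-\t.\tgene_id "FBgn0000001";\n' >> "$toy_gtf"
printf 'X\tFlyBase\texon\t500\t600\t.\t+\t.\tgene_id "FBgn0013680";\n' >> "$toy_gtf"

# Same filter as above, written as an extended regular expression (-E).
pattern='FBgn0002781|FBgn0013680|FBgn0013675|FBgn0262952|FBgn0013684'
grep -E -iv "$pattern" "$toy_gtf" > "$filtered_gtf"

# Count lines still mentioning an offending gene (grep -c prints 0 when none match).
remaining=$(grep -E -ic "$pattern" "$filtered_gtf")
# Count the records that survived the filter.
kept=$(grep -c '' "$filtered_gtf")

echo "offending lines left: $remaining"
echo "records kept: $kept"
```

On the real annotation, the same two counts can be taken against dmel-all-r6.38.gtf and foo.gtf to see how many lines were dropped.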

Best,
Ariel


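Hi Ariel,

Thanks for the suggestion! I removed those genes. Some of the warning messages from makeTxDbFromGFF went away, but when running gbCounts I still get the same error message: Error in .subset(x, j) : only 0's may be mixed with negative subscripts. In addition: Warning message: In colnames(counts@junction.counts)[9:ncol(counts@junction.counts)] <- rownames(targets) : number of items to replace is not a multiple of replacement length.

I've listed the code below: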

> Dmel.6.38.TxDb.edited = makeTxDbFromGFF(
+ file="dmel-all-r6.gt.edited.gtf",
+ format="gtf")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
2: In for (i in seq_along(sname)) { :
  closing unused connection 3 (dmel-all-r6.38.edited.gtf)
> saveDb(Dmel.6.38.TxDb.edited, file="dmel.6.38.txdb.edited.sqlite")
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: dmel-all-r6.gt.edited.gtf
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 35345
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-03-31 10:57:52 -0600 (Wed, 31 Mar 2021)
# GenomicFeatures version at creation time: 1.42.2
# RSQLite version at creation time: 2.2.4
# DBSCHEMAVERSION: 1.2
> features <- binGenome( Dmel.6.38.TxDb.edited )
* Number of extracted Genes = 17868
* Number of extracted Exon Bins = 80610
* Number of extracted intron bins = 72241
* Number of extracted trascripts = 35345
* Number of extracted junctions = 60405
* Number of AS bins (not include external) = 9544
* Number of AS bins (include external) = 9554
* Classified as:
    ES bins = 2427 (25%)
    IR bins = 1267 (13%)
    Alt5'ss bins = 1497 (16%)
    Alt3'ss bins = 1612 (17%)
    Multiple AS bins = 2741 (29%)
    classified as:
        ES bins = 530 (19%)
        IR bins = 491 (18%)
        Alt5'ss bins = 885 (32%)
        Alt3'ss bins = 724 (26%)
> targets=read.csv("Targets.csv")
> getConditions(targets)
[1] "Mutant_F_1D" "Mutant_M_1D" "Mutant_F_28D" "Mutant_M_28D" "Control_F_1D" "Control_M_1D" "Control_F_28D"
[8] "Control_M_28D"
> gbcounts <- gbCounts( features = features,
+ targets = targets,
+ minReadLength = 100, maxISize = 50000,
+ strandMode=0)
Summarizing Mutant_F_1D_1
ETA: 53 min
Summarizing Mutant_F_1D_2
ETA: 50 min
Summarizing Mutant_F_1D_3
ETA: 46 min
Summarizing Mutant_M_1D_1
ETA: 44 min
Summarizing Mutant_M_1D_2
ETA: 40 min
Summarizing Mutant_M_1D_3
ETA: 38 min
Summarizing Mutant_F_28D_1
[1] 7
Error in .subset(x, j) : only 0's may be mixed with negative subscripts
In addition: Warning message:
In colnames(counts@junction.counts)[9:ncol(counts@junction.counts)] <- rownames(targets) :
  number of items to replace is not a multiple of replacement length

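Hi Jeremy, it seems there may be a problem with the junction information in the Mutant_F_28D_1 BAM file. Does this file differ from the others from a QC point of view? Could you share it with me so I can see what is going on with its contents? Ariel.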


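Hi Ariel,

I also regenerated the BAM file, just in case, but I still get the error. I don't think it differs much from the others from a QC point of view, and the file has worked fine in other workflows. Unless there is a way to share it here, I can send you the file by email. Thanks for your help!

Jeremy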


Hi Jeremy,

I couldn't reproduce the reported error with the BAM you sent me. Could you run the code I used and tell me how it goes?

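library(GenomicFeatures)
library(ASpli)
dme <- makeTxDbFromGFF(file='/data1/genomedata/dme/foo.gtf')
features <- binGenome(dme)
target <- data.frame(row.names='C28F1_1', bam='C28F1_1.fq.gz.subjunc.sorted.bam', f1='C28F1')
gb <- gbCounts(features, target, minReadLength = 100, maxISize = 50000, strandMode = 0)
# Summarizing C28F1_1
# Read summarization by gene completed
# Read summarization by bin completed
# Read summarization by ei1 region completed
# Read summarization by ie2 region completed
# Junction summarization completed
gb
# Object of class ASpliCounts
# Gene counts: 17868 genes analysed. Access using countsg(object)
# Gene RD: 17868 genes analysed. Access using rdsg(object)
# Bin counts: 143297 bins analysed. Access using countsb(object)
# Bin RD: 143297 bins analysed. Access using rdsb(object)
# Junction counts: 59250 junctions analysed. Access using countsj(object)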

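Hi Ariel,

That ran fine for me too (see below). Just to be sure, I also re-ran the original analysis and got the same error. I then tried the test below with the next BAM file in the sequence, in case that was the one actually causing the problem, but that ran fine as well.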

> dme <- makeTxDbFromGFF(file="dmel-all-r6.gt.edited.gtf")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
> features_test <- binGenome(dme)
* Number of extracted Genes = 17868
* Number of extracted Exon Bins = 80610
* Number of extracted intron bins = 72241
* Number of extracted trascripts = 35345
* Number of extracted junctions = 60405
* Number of AS bins (not include external) = 9544
* Number of AS bins (include external) = 9554
* Classified as:
    ES bins = 2427 (25%)
    IR bins = 1267 (13%)
    Alt5'ss bins = 1497 (16%)
    Alt3'ss bins = 1612 (17%)
    Multiple AS bins = 2741 (29%)
    classified as:
        ES bins = 530 (19%)
        IR bins = 491 (18%)
        Alt5'ss bins = 885 (32%)
        Alt3'ss bins = 724 (26%)
> targets_test <- data.frame(row.names='C28F1_1',
+ bam='C28F1_1.fq.gz.subjunc.sorted.bam',
+ f1='C28F1')
> gb_test <- gbCounts(features_test, targets_test, minReadLength = 100, maxISize = 50000,
+ strandMode = 0)
Summarizing C28F1_1
Read summarization by gene completed
Read summarization by bin completed
Read summarization by ei1 region completed
Read summarization by ie2 region completed
Junction summarization completed
[1] 1
> gb_test
Object of class ASpliCounts
Gene counts: 17868 genes analysed. Access using countsg(object)
Gene RD: 17868 genes analysed. Access using rdsg(object)
Bin counts: 143297 bins analysed. Access using countsb(object)
Bin RD: 143297 bins analysed. Access using rdsb(object)
Junction counts: 59250 junctions analysed. Access using countsj(object)

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] tximport_1.18.0 IsoformSwitchAnalyzeR_1.12.0 ggplot2_3.3.3 DEXSeq_1.36.0
[5] RColorBrewer_1.1-2 DESeq2_1.30.1 BiocParallel_1.24.1 GenomicFeatures_1.42.3
[9] GenomicAlignments_1.26.0 Rsamtools_2.6.0 Biostrings_2.58.0 XVector_0.30.0
[13] SummarizedExperiment_1.20.0 MatrixGenerics_1.2.1 matrixStats_0.58.0 GenomicRanges_1.42.0
[17] GenomeInfoDb_1.26.4 GenomeInfoDbData_1.2.4 ASpli_2.0.0 AnnotationDbi_1.52.0
[21] IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0 BiocGenerics_0.36.0
[25] Rsubread_2.4.3 edgeR_3.32.1 limma_3.46.0

loaded via a namespace (and not attached):
[1] backports_1.2.1 AnnotationHub_2.22.0 Hmisc_4.5-0
[4] DRIMSeq_1.18.0 BiocFileCache_1.14.0 plyr_1.8.6
[7] igraph_1.2.6 lazyeval_0.2.2 tximeta_1.8.4
[10] splines_4.0.4 digest_0.6.27 ensembldb_2.14.0
[13] htmltools_0.5.1.1 fansi_0.4.2 magrittr_2.0.1
[16] checkmate_2.0.0 memoise_2.0.0 BSgenome_1.58.0
[19] cluster_2.1.1 readr_1.4.0 annotate_1.68.0
[22] askpass_1.1 prettyunits_1.1.1 jpeg_0.1-8.1
[25] colorspace_2.0-0 blob_1.2.1 rappdirs_0.3.3
[28] xfun_0.22 dplyr_1.0.5 jsonlite_1.7.2
[31] crayon_1.4.1 RCurl_1.98-1.3 genefilter_1.72.1
[34] survival_3.2-10 VariantAnnotation_1.36.0 glue_1.4.2
[37] gtable_0.3.0 zlibbioc_1.36.0 UpSetR_1.4.0
[40] DelayedArray_0.16.3 scales_1.1.1 futile.options_1.0.1
[43] DBI_1.1.1 Rcpp_1.0.6 xtable_1.8-4
[46] progress_1.2.2 htmlTable_2.1.0 foreign_0.8-81
[49] bit_4.0.4 Formula_1.2-4 DT_0.17
[52] htmlwidgets_1.5.3 httr_1.4.2 ellipsis_0.3.1
[55] pkgconfig_2.0.3 XML_3.99-0.6 Gviz_1.34.1
[58] nnet_7.3-15 dbplyr_2.1.0 locfit_1.5-9.4
[61] utf8_1.2.1 later_1.1.0.1 tidyselect_1.1.0
[64] rlang_0.4.10 reshape2_1.4.4 BiocVersion_3.12.0
[67] munsell_0.5.0 tools_4.0.4 cachem_1.0.4
[70] generics_0.1.0 RSQLite_2.2.5 evaluate_0.14
[73] stringr_1.4.0 fastmap_1.1.0 yaml_2.2.1
[76] knitr_1.31 bit64_4.0.5 purrr_0.3.4
[79] AnnotationFilter_1.14.0 mime_0.10 formatR_1.8
[82] xml2_1.3.2 biomaRt_2.46.3 BiocStyle_2.18.1
[85] compiler_4.0.4 rstudioapi_0.13 interactiveDisplayBase_1.28.0
[88] curl_4.3 png_0.1-7 tibble_3.1.0
[91] statmod_1.4.35 geneplotter_1.68.0 stringi_1.5.3
[94] futile.logger_1.4.3 lattice_0.20-41 ProtGenerics_1.22.0
[97] Matrix_1.3-2 vctrs_0.3.7 pillar_1.5.1
[100] lifecycle_1.0.0 BiocManager_1.30.12 data.table_1.14.0
[103] bitops_1.0-6 httpuv_1.5.5 rtracklayer_1.50.0
[106] R6_2.5.0 latticeExtra_0.6-29 hwriter_1.3.2
[109] promises_1.2.0.1 gridExtra_2.3 lambda.r_1.2.4
[112] dichromat_2.0-0 MASS_7.3-53.1 assertthat_0.2.1
[115] openssl_1.4.3 withr_2.4.1 hms_1.0.0
[118] VennDiagram_1.6.20 grid_4.0.4 rpart_4.1-15
[121] tidyr_1.1.3 rmarkdown_2.7 biovizBase_1.38.0
[124] shiny_1.6.0 base64enc_0.1-3 tinytex_0.31

Could you send me your Targets.csv file?


Sorry, I forgot to reply here to say that I sent the file by email. Please let me know if you didn't receive it. Thanks!


Hi Jeremy. We got it and are working on the problem. Thanks for your patience. A


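Hi Jeremy,

Thanks for your patience. We were finally able to reproduce the error. It was caused by a single junction from an rDNA alignment in the problematic BAM (rDNA.11456.11467), which ASpli did not parse correctly.

We will push a new version of ASpli to the Bioc devel branch today [https://bioconductor.org/packages/devel/bioc/html/ASpli.html]. This version (2.1.2) handles the dataset correctly and should be available for installation in a few days.

In the meantime, a quick-and-dirty workaround is to keep only the standard-chromosome data in your BAMs. To do so, you can use samtools:

samtools view -b input.bam 2L 2R 3L 3R 4 X Y > output.bam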
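
To make clear what that filter keeps, here is a toy illustration on SAM text rather than on a real BAM. It is not the samtools command itself: it is an awk sketch that passes header lines through and keeps any record whose reference name (column 3) is in the chromosome list, which mirrors the effect on the alignments. The reads, positions, sequence lengths, and variable names are all invented.

```shell
# Toy SAM file: header plus three single-end records (all values invented).
workdir=$(mktemp -d)
toy_sam="$workdir/toy.sam"
filtered_sam="$workdir/filtered.sam"

printf '@HD\tVN:1.6\tSO:coordinate\n' > "$toy_sam"
printf '@SQ\tSN:2L\tLN:1000\n' >> "$toy_sam"
printf '@SQ\tSN:X\tLN:1000\n' >> "$toy_sam"
printf '@SQ\tSN:rDNA\tLN:1000\n' >> "$toy_sam"
printf 'read1\t0\t2L\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n' >> "$toy_sam"
printf 'read2\t0\tX\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n' >> "$toy_sam"
printf 'read3\t0\trDNA\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n' >> "$toy_sam"

# Standard chromosomes to keep, as in the samtools command above.
keep='2L 2R 3L 3R 4 X Y'

awk -v keep="$keep" '
  BEGIN {
    FS = "\t"
    n = split(keep, names, " ")
    for (i = 1; i <= n; i++) wanted[names[i]] = 1
  }
  /^@/ { print; next }        # header lines pass through unchanged
  ($3 in wanted) { print }    # keep records aligned to a listed chromosome
' "$toy_sam" > "$filtered_sam"

# Count surviving alignment records, and how many of them are on rDNA.
kept_records=$(awk '!/^@/ { n++ } END { print n + 0 }' "$filtered_sam")
rdna_records=$(awk -F '\t' '!/^@/ && $3 == "rDNA" { n++ } END { print n + 0 }' "$filtered_sam")

echo "alignment records kept: $kept_records"
echo "rDNA records left: $rdna_records"
```

On real data, the region form of `samtools view` needs a coordinate-sorted, indexed BAM, so `samtools index input.bam` may have to be run first.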

Regards,

Ariel
