---
title: "Lab 7a: Machine learning exercises"
output:
  BiocStyle::html_document:
    toc: true
    number_sections: false
vignette: >
  %\VignetteIndexEntry{Lab 7a: Machine learning exercises}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```

```{r setup, results='hide'}
library(knitr)
opts_chunk$set(cache = TRUE, error = FALSE)
```

# Exploratory hierarchical clustering using shiny

## R source program

`dfHclust.R` is a basic function available in the GitHub repository for Lab 7. Source it into your R session, then verify that it works with the call

```
data(mtcars)
dfHclust(mtcars, labels = rownames(mtcars))
```

If it fails, install and attach any missing libraries and do whatever else it needs to work.

Interrupt the shiny session to proceed.

## Application to tissue discrimination

Set up the input so that the `tissuesGeneExpression` data can be passed to `dfHclust`.

```{r gett}
library(tissuesGeneExpression)
data(tissuesGeneExpression)
df = data.frame(t(e))
no = which(tab$SubType == "normal")
df = df[no,]
tisslabel = tab$Tissue[no]
```

Use `dfHclust(df[,1:50], tisslabel)` as a check. Interrupt the shiny session to proceed.

## Exercises

### Symbol mapping

Map the column names of `df` to gene symbols. Use `hgu133a.db`. Remove columns with missing symbols, and rename the remaining columns with the symbols.

```{r mapit}
library(hgu133a.db)
# data.frame() prepends "X" to probe IDs that start with a digit; strip it before lookup
nids = mapIds(hgu133a.db, keys = sub("^X", "", colnames(df)),
   keytype = "PROBEID", column = "SYMBOL")
# drop probes that have no symbol
bad = which(is.na(nids))
if (length(bad) > 0) {
  df = df[, -bad]
  nids = nids[-bad]
}
# rename whether or not any columns were dropped
colnames(df) = nids
dim(df)
```

Interrupt the shiny session and use the new `df` as input. Note that the clustering is based on three genes by default. Other default choices for the clustering are

- the object:object distance used
- the agglomeration algorithm
- the height at which the tree is cut to define clusters

Shift the view to the `silhouette` plot. With the default settings, the average silhouette value for five clusters is 0.35. Increase the `height for cut` value to 8. How many clusters are declared, and what is the average silhouette value?

Add the gene CCL5 to the feature set used for clustering. Now how many clusters are declared?

Interrupt the shiny session to proceed.

### Alphabetizing the selection options

Modify `df` so that the column names are in alphabetical order. Use `dfHclust(df[,1:50], tisslabel)` for the new ordering.
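
As a rough illustration of the two ideas in play here, the sketch below reorders the columns of a data frame alphabetically and then computes an average silhouette value by hand. It uses the built-in `mtcars` data as a stand-in for `df`. The choice of three features, Euclidean distance, complete linkage, and a cut into three clusters are all assumptions made for the example, not necessarily the settings `dfHclust` uses.

```r
library(cluster)  # provides silhouette(); a recommended package shipped with R

# Toy stand-in for the lab's df: the built-in mtcars data.
toy <- mtcars

# Put the columns in alphabetical order by name.
toy <- toy[, order(colnames(toy))]

# Cluster on the first three columns (an arbitrary choice for illustration).
feats <- scale(toy[, 1:3])
d <- dist(feats)                       # Euclidean distance by default
hc <- hclust(d, method = "complete")   # agglomeration algorithm
cl <- cutree(hc, k = 3)                # cut the tree into three clusters

# Per-object silhouette widths and their average.
sil <- silhouette(cl, d)
avg_sil <- mean(sil[, "sil_width"])
avg_sil
```

Replacing `method = "complete"` with `method = "ward.D2"` in the `hclust` call shows how the agglomeration algorithm alone can change the average silhouette value.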
What is the average silhouette value for the default choices of `dfHclust` settings? Change the clustering method to `ward.D2`. What is the new average silhouette value?

Interrupt the shiny session to proceed.

### Clustering with gene sets

We have used arbitrary selections of genes for these illustrations. Consider the idea that steady-state expression patterns of genes involved in splicing are important for tissue differentiation. We can obtain a list of relevant genes on the hgu133a array as follows.

```
# use GO.db
      GOID       TERM
22097 GO:0045292 mRNA cis splicing, via spliceosome
```

```{r}
splg = select(hgu133a.db, keys = "GO:0045292", keytype = "GO", columns = "SYMBOL")
tokeep = intersect(splg$SYMBOL, colnames(df))
dfsp = df[, tokeep]
```

You should have 7 genes after these operations. Use `dfsp` with `dfHclust`, selecting all genes for clustering. Does the appearance of the clustering tree improve when the spliceosome-annotated genes are used for clustering?

# NMF with Drosophila expression patterns

## drosmap package

Install and attach the `drosmap` package. This is a simple repackaging of code and data provided by [BDGP](http://insitu.fruitfly.org/insitu-pp/prinpatcode.zip).

```{r doinst,eval=FALSE}
library(BiocInstaller)
biocLite("vjcitn/drosmap")
library(drosmap)
```

## Expression patterns

Data are available on spatially recorded gene expression patterns derived from blastocyst samples. We'll display some examples.

```{r lkexp, fig=TRUE}
library(drosmap)
data(expressionPatterns)
data(template)
imageBatchDisplay(expressionPatterns[,1:12],
   nrow=3, ncol=4, template=template[,-1])
```

## Exercises

### Comparing non-negative matrix factorizations of the expression pattern matrix

For convenience, we'll reduce the data matrix to 701 unique genes.

```{r dou}
data(uniqueGenes)
uex = expressionPatterns[,uniqueGenes]
```

We'll begin with a factorization using a basis of rank 10.

```{r don1,cache=TRUE}
set.seed(123)
library(NMF)
m10 = nmf(uex, rank=10)
m10
```

The authors of the [Wu et al. 2016 PNAS paper](http://www.pnas.org/content/113/16/4290.full) justify a rank 21 basis.

```{r don2,cache=TRUE}
set.seed(123)
library(NMF)
m21 = nmf(uex, rank=21)
```

To visualize the clustering of the expression patterns with the rank 10 basis, use

```{r lk10,fig=TRUE,fig.height=3.8}
imageBatchDisplay(basis(m10), nrow=4, ncol=3, template=template[,-1])
```

The 'predicted' matrix with the rank 10 basis is

```{r lkpr}
PM10 = basis(m10) %*% coef(m10)
```

Compare the faithfulness of the rank 10 and rank 21 approximations.
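
One possible way to quantify faithfulness is the relative Frobenius-norm error between the data matrix and the product of the basis and coefficient matrices. The sketch below is only an illustration of that measure. The helper `rel_frob_error` is a hypothetical name introduced here, and the toy matrices stand in for the lab's objects so that the block runs with base R alone.

```r
# Relative Frobenius reconstruction error of a factorization V ~ W %*% H.
# rel_frob_error is a hypothetical helper, not part of NMF or drosmap.
rel_frob_error <- function(V, W, H) {
  stopifnot(nrow(V) == nrow(W), ncol(W) == nrow(H), ncol(V) == ncol(H))
  norm(V - W %*% H, type = "F") / norm(V, type = "F")
}

# Toy check: a non-negative matrix built from a known rank-3 factorization.
set.seed(1)
W <- matrix(runif(20 * 3), nrow = 20)
H <- matrix(runif(3 * 8), nrow = 3)
V <- W %*% H

# Using all three factors reproduces V; dropping one factor does not.
err_exact <- rel_frob_error(V, W, H)
err_trunc <- rel_frob_error(V, W[, 1:2, drop = FALSE], H[1:2, , drop = FALSE])
c(exact = err_exact, truncated = err_trunc)
```

With the lab's objects, the analogous calls would be `rel_frob_error(as.matrix(uex), basis(m10), coef(m10))` and the same with `m21`, assuming `uex` converts cleanly to a numeric matrix. A higher-rank basis will generally fit at least as well, so the size of the improvement is the quantity of interest rather than its sign.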

### Comparison to a cell fate schematic

Produce the display of the `m21` basis with `imageBatchDisplay` and check that the constituents are similar to those shown as principal patterns below (from the Wu et al. paper).

```{r lippp,fig=TRUE,echo=FALSE}
library(png)   # readPNG
library(grid)  # grid.raster
im = readPNG("fullFateMap.png")
grid.raster(im)
```

Can any of the patterns found with the rank 10 basis be mapped to key anatomical components of the blastocyst fate schematic?