---
title: "Data Analysis"
output: html_document
---

Lab #5 Differential expression

1.) Obtain the GEO rat ketogenic brain data set (rat_KD.txt). [rat_KD.txt](http://tinyurl.com/data-uruguay)

2.) Load it into R using the read.table() function and the header=TRUE argument.

```{r}
# Read the tab-delimited file; the first row holds the column names.
rat = read.table("rat_KD.txt", sep = "\t", header = TRUE)
# Use the gene names in the first column as row names, then drop that column.
dimnames(rat)[[1]] <- rat[, 1]
rat = rat[, -1]
```

2b.) Set the gene names as row names and remove the first column.

3.) Use the t-test function to calculate the difference per gene between the control diet and ketogenic diet classes. (Hint: use the colnames() function to determine where one class ends and the other begins.)

```{r}
# Inspect the column names to see where the control samples end
# and the ketogenic samples begin (columns 1:6 vs. 7:11).
colnames(rat)
# Try the t-test on the first two genes.
t.test(rat[1, 1:6], rat[1, 7:11])
y <- t.test(rat[2, 1:6], rat[2, 7:11])
dim(rat)
# Run a t-test on one row (gene) and return its p-value.
ttestRat <- function(df, grp1, grp2) {
  x = as.numeric(df[grp1])
  y = as.numeric(df[grp2])
  results = t.test(x, y)
  # Use the full element name: `results$p` is ambiguous between
  # `p.value` and `parameter`, so partial matching returns NULL.
  results$p.value
}
rawpvalue = apply(rat, 1, ttestRat, grp1 = c(1:6), grp2 = c(7:11))
```

4.) Plot a histogram of the p-values.

```{r}
hist(rawpvalue)
```

5.) Log2 the data and calculate the mean for each gene per group. Then calculate the fold change between the groups (control vs. ketogenic diet). Hint: log2(ratio)

```{r}
# Transform the data to the log2 scale.
rat = log2(rat)
# Calculate the mean of each gene in the control group.
control = apply(rat[, 1:6], 1, mean)
# Calculate the mean of each gene in the test (ketogenic) group.
test = apply(rat[, 7:11], 1, mean)
# Confirm that both results are numeric vectors.
class(control)
class(test)
# Because the data are already log2 transformed, the difference between
# the group means is the log2 fold change, i.e. log2(control / test).
foldchange <- control - test
```

6.) Plot a histogram of the fold change values.

```{r}
hist(foldchange, xlab = "log2 Fold Change (Control vs Test)")
```

7.) Transform the p-value (-1*log10(p-value)) and create a volcano plot using ggplot2.

```{r}
library(ggplot2)
# Combine fold change and p-value into one data frame, keeping the probe names.
results = cbind(foldchange, rawpvalue)
results = as.data.frame(results)
results$probename <- rownames(results)
# Volcano plot: log2 fold change on x, -log10(p-value) on y.
volcano = ggplot(data = results, aes(x = foldchange, y = -1*log10(rawpvalue)))
volcano + geom_point()
```
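
The per-gene t-test (step 3) and the log2 fold change (step 5) can be sanity-checked on a small simulated matrix where the answer is known in advance. What follows is a minimal sketch and not part of the lab data: the matrix `toy`, its gene and sample names, the helper `ttestRow`, the 4x effect on `gene1`, and the cutoffs (p < 0.05, |log2 fold change| > 1) are all assumptions chosen for illustration.

```r
# Hypothetical toy data: 5 genes x 11 samples.
# Columns 1:6 play the role of control, 7:11 the ketogenic diet.
set.seed(1)
toy <- matrix(rnorm(5 * 11, mean = 100, sd = 5), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("s", 1:11)))
# Assumed effect: gene1 is four times higher in the ketogenic columns.
toy["gene1", 7:11] <- toy["gene1", 7:11] * 4

# Per-row t-test returning the p-value, same idea as ttestRat in step 3.
ttestRow <- function(row, grp1, grp2) {
  t.test(as.numeric(row[grp1]), as.numeric(row[grp2]))$p.value
}
toy_p <- apply(toy, 1, ttestRow, grp1 = 1:6, grp2 = 7:11)

# log2 fold change as in step 5: difference of group means on the log2 scale.
toy_log <- log2(toy)
toy_fc <- rowMeans(toy_log[, 1:6]) - rowMeans(toy_log[, 7:11])

# Flag genes that pass both illustrative cutoffs.
toy_res <- data.frame(foldchange = toy_fc, rawpvalue = toy_p)
toy_res$flag <- toy_res$rawpvalue < 0.05 & abs(toy_res$foldchange) > 1
toy_res
```

Only `gene1` should be flagged, with a fold change near -2 (control minus test on the log2 scale). On the real data, a column like `flag` could be mapped to `colour` inside `aes()` to highlight those points in the volcano plot.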