```{r setup, echo=FALSE}
library(LearnBioconductor)
stopifnot(BiocInstaller::biocVersion() == "3.1")
```

```{r style, echo=FALSE, results='asis'}
BiocStyle::markdown()
knitr::opts_chunk$set(tidy=FALSE)
```

# Introduction to R

Martin Morgan

## R

Language and environment for statistical computing and graphics

- Full-featured programming language
- Interactive and *interpreted*: convenient and forgiving
- Coherent, extensive documentation
- Statistical, e.g., `factor()`, `NA`
- Extensible: CRAN, Bioconductor, GitHub, ...

Vector, class, object

- Efficient _vectorized_ calculations on 'atomic' vectors `logical`,
  `integer`, `numeric`, `complex`, `character`

Function, generic, method

- Functions transform inputs to outputs, perhaps with side effects,
  e.g., `rnorm(1000)`
  - Arguments are matched first by name, then by position
  - Functions may define (some) arguments to have default values
- _Generic_ functions dispatch to specific _methods_ based on the class of
  the argument(s), e.g., `print()`.
- Methods are functions that implement specific generics, e.g.,
  `print.factor`; methods are invoked _indirectly_, via the generic.

Introspection

- General properties, e.g., `class()`, `str()`
- Class-specific properties, e.g., `dim()`

Help

- `?print.data.frame`: help on the print method for objects of class
  data.frame.

Example

```{r}
x <- rnorm(1000)                     # atomic vectors
y <- x + rnorm(1000, sd=.5)
df <- data.frame(x=x, y=y)           # object of class 'data.frame'
plot(y ~ x, df)                      # generic plot, method plot.formula
fit <- lm(y ~ x, df)                 # object of class 'lm'
methods(class=class(fit))            # introspection
```

## Lab

### 1. _R_ data manipulation

This exercise serves as a refresher / tutorial on basic input and
manipulation of data.

Input a file that contains ALL (acute lymphoblastic leukemia) patient
information

```{r echo=TRUE, eval=FALSE}
fname <- file.choose()   ## "ALLphenoData.tsv"
stopifnot(file.exists(fname))
pdata <- read.delim(fname)
```

```{r echo=FALSE}
fname <- system.file("extdata", "ALLphenoData.tsv", package="LearnBioconductor")
stopifnot(file.exists(fname))
pdata <- read.delim(fname)
```

Check out the help page `?read.delim` for input options, and explore
basic properties of the object you've created, for instance...

```{r ALL-properties}
class(pdata)
colnames(pdata)
dim(pdata)
head(pdata)
summary(pdata$sex)
summary(pdata$cyto.normal)
```

Remind yourselves about various ways to subset and access columns of a
data.frame

```{r ALL-subset}
pdata[1:5, 3:4]
pdata[1:5, ]
head(pdata[, 3:5])
tail(pdata[, 3:5], 3)
head(pdata$age)
head(pdata$sex)
head(pdata[pdata$age > 21,])
```

It seems from below that there are 17 females over 40 in the data set,
but when subsetting `pdata` to contain just those individuals, 19 rows
are selected. Why? What can we do to correct this?

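As a hint for the question above, here is a minimal sketch on made-up data. The data.frame `toy` and its values are hypothetical and are not taken from `pdata`. It illustrates three general _R_ behaviours that are probably relevant here: a comparison involving `NA` yields `NA`, `table()` drops `NA` by default, and an `NA` in a logical row index returns a row of `NA`s rather than being skipped.

```{r NA-index-sketch}
## Toy data (hypothetical, not the ALL phenotype data)
toy <- data.frame(sex = c("F", "F", "M", NA, "F"),
                  age = c(45, NA, 50, 60, 30))

idx <- toy$sex == "F" & toy$age > 40   # NA wherever the answer is unknown
idx
table(idx)                             # table() drops NA by default
table(idx, useNA = "ifany")            # make the NA entries visible
nrow(toy[idx, ])                       # each NA in the index adds an all-NA row
nrow(toy[which(idx), ])                # which() keeps only the TRUE positions
nrow(toy[idx & !is.na(idx), ])         # same result without which()
```

Either `which(idx)` or `idx & !is.na(idx)` could be applied to the `pdata` index in the exercise chunk that follows; which one reads better is a matter of taste.
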
```{r ALL-subset-NA}
idx <- pdata$sex == "F" & pdata$age > 40
table(idx)
dim(pdata[idx,])
```

Use the `mol.biol` column to subset the data to contain just
individuals with 'BCR/ABL' or 'NEG', e.g.,

```{r ALL-BCR/ABL-subset}
bcrabl <- pdata[pdata$mol.biol %in% c("BCR/ABL", "NEG"),]
```

The `mol.biol` column is a factor, and retains all levels even after
subsetting. How might you drop the unused factor levels?

```{r ALL-BCR/ABL-drop-unused}
bcrabl$mol.biol <- factor(bcrabl$mol.biol)
```

The `BT` column is a factor describing B- and T-cell subtypes

```{r ALL-BT}
levels(bcrabl$BT)
```

How might one collapse B1, B2, ... to a single type B, and likewise
for T1, T2, ..., so there are only two subtypes, B and T?

```{r ALL-BT-recode}
table(bcrabl$BT)
levels(bcrabl$BT) <- substring(levels(bcrabl$BT), 1, 1)
table(bcrabl$BT)
```

Use `xtabs()` (cross-tabulation) to count the number of samples with
B- and T-cell types in each of the BCR/ABL and NEG groups

```{r ALL-BCR/ABL-BT}
xtabs(~ BT + mol.biol, bcrabl)
```

Use `aggregate()` to calculate the average age of males and females in
the BCR/ABL and NEG treatment groups.

```{r ALL-aggregate}
aggregate(age ~ mol.biol + sex, bcrabl, mean)
```

Use `t.test()` to compare the age of individuals in the BCR/ABL versus
NEG groups; visualize the results using `boxplot()`. In both cases,
use the `formula` interface. Consult the help page `?t.test` and re-do
the test assuming that the variance of ages in the two groups is
identical. What parts of the test output change?

```{r ALL-age}
t.test(age ~ mol.biol, bcrabl)
boxplot(age ~ mol.biol, bcrabl)
```

## Resources

- [StackOverflow](http://stackoverflow.com/questions/tagged/r) for
  _R_ programming questions; also the R-help mailing list.

Publications (General _R_)

[R]: http://r-project.org