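Instructions for downloading methylation data:

When selecting data, make sure to select only solid-tissue tumor samples, whose barcodes end in 01 (TCGA-..-....-01), and solid-tissue normal samples, whose barcodes end in 11 (TCGA-..-....-11). For each cancer type, once the correct settings are selected, click the "Build Archive" button and then download the data. The settings used on the Data Matrix page and the specific TCGA samples selected are shown in the screenshots below: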
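As a quick check on a downloaded sample list, the sample-type code can be read from the fourth field of each barcode. The following is a minimal sketch in base R; the helper name `sample_type_code` and the example barcodes are made up for illustration and are not part of the workshop code.

```r
# Hypothetical helper: return the two-digit sample-type code found at the
# start of the fourth dash-separated field of a TCGA barcode.
sample_type_code <- function(barcode) {
  fields <- strsplit(barcode, split = "-", fixed = TRUE)
  vapply(fields, function(x) substr(x[4], 1, 2), character(1))
}

# Made-up barcodes, for illustration only.
barcodes <- c("TCGA-AA-0001-01A", "TCGA-AA-0001-11A", "TCGA-BB-0002-06A")
codes <- sample_type_code(barcodes)

# Keep solid-tissue tumor (01) and solid-tissue normal (11) samples only.
keep <- codes %in% c("01", "11")
selected <- barcodes[keep]
print(selected)
```

Any barcode whose code is neither 01 nor 11 is dropped, which mirrors the selection rule described above.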

[Screenshots: BRCA normal, filter settings; BRCA normal, selected samples; BRCA tumor, selected samples; COAD normal, filter settings]

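After unzipping the files sent to the email address you provided, save each cancer type (BRCA, COAD, LUSC) in a separate folder under a new directory, shown below as "datadir". You will need to update the file paths.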
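One way to confirm the folder layout before running the loading code is sketched here. The root location and the folder names `breast`, `colon`, and `lung` are assumptions for illustration; the demonstration creates the folders under a temporary directory so that it runs anywhere.

```r
# Assumed layout: one folder per cancer type under a common root.
# Replace `root` with the directory where the archives were extracted.
root <- file.path(tempdir(), "methylation_files")  # placeholder location
cancer_dirs <- file.path(root, c("breast", "colon", "lung"))
names(cancer_dirs) <- c("BRCA", "COAD", "LUSC")

# Create the folders for this demonstration; with real data they already exist.
for (d in cancer_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Confirm that every expected folder is present.
present <- setNames(dir.exists(cancer_dirs), names(cancer_dirs))
print(present)
```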

Load breast tumor and breast normal data:

    library(minfi)

    datadir <- "/Users/morgan/Documents/methylation_files/breast"
    clinicalDir <- file.path(datadir, "Clinical/Biotab")

    # read sample data and keep primary tumors and solid-tissue normals
    sample_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_sample_brca.txt"), sep = "\t", stringsAsFactors = FALSE)
    keep <- sample_tab$sample_type %in% c("Primary Tumor", "Solid Tissue Normal")
    sample_tab <- sample_tab[keep, ]

    # patient barcode = first three fields of the sample barcode
    patient_id <- unique(sapply(strsplit(sample_tab$bcr_sample_barcode, split = "-"), function(x) paste(x[1:3], collapse = "-")))

    tumor_sample_id <- sample_tab$bcr_sample_uuid[sample_tab$sample_type == "Primary Tumor"]
    normal_sample_id <- sample_tab$bcr_sample_uuid[sample_tab$sample_type == "Solid Tissue Normal"]

    # read tumor data
    tumor_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_tumor_sample_brca.txt"), sep = "\t", stringsAsFactors = FALSE)
    tab <- merge(sample_tab, tumor_tab, by = "bcr_sample_uuid", suffixes = c(".sample", ".tumor"), all.x = TRUE)

    # read normal data
    normal_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_normal_control_brca.txt"), sep = "\t", stringsAsFactors = FALSE)
    tab <- merge(tab, normal_tab, by = "bcr_sample_uuid", suffixes = c(".tumor", ".normal"), all.x = TRUE)

    # take the patient barcode from the tumor table, falling back to the normal table
    tab$bcr_patient_barcode <- tab$bcr_patient_barcode.tumor
    ii <- is.na(tab$bcr_patient_barcode)
    tab$bcr_patient_barcode[ii] <- tab$bcr_patient_barcode.normal[ii]

    # read patient data
    patient_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_clinical_patient_brca.txt"), sep = "\t", stringsAsFactors = FALSE)
    names(patient_tab) <- paste("patient", names(patient_tab), sep = ".")
    tmp <- merge(tab, patient_tab, by.x = "bcr_patient_barcode", by.y = "patient.bcr_patient_barcode", all.x = TRUE, suffixes = c(".sample", ".patient"))
    tab <- tmp

    # read meth metadata
    methMetaDir <- file.path(datadir, "METADATA/JHU_USC__HumanMethylation450")
    methMeta_tab <- read.delim(file.path(methMetaDir, "jhu-usc.edu_BRCA.HumanMethylation450.1.9.0.sdrf.txt"), sep = "\t", stringsAsFactors = FALSE)

    # match samples to their IDAT files through the four-field sample barcode
    sample_barcode <- sapply(strsplit(methMeta_tab$Comment..TCGA.Barcode., split = "-"), function(x) paste(x[1:4], collapse = "-"))
    m <- match(tab$bcr_sample_barcode, sample_barcode)
    tab$Basename <- gsub("_Grn\\.idat", "", methMeta_tab$Array.Data.File[m])
    tab <- tab[!is.na(tab$Basename), ]

    basedir <- file.path(datadir, "DNA_Methylation/JHU_USC__HumanMethylation450/Level_1")
    tab$Basename <- file.path(basedir, tab$Basename)
    # existence check only; missing files are filtered out after the merge
    keep <- file.exists(paste(tab$Basename, "_Grn.idat", sep = ""))
    breast_targets <- tab

    # remove intermediate tables and directory variables
    objs <- grep("tab", ls(), value = TRUE)
    rm(list = objs)
    objs <- grep("dir", ls(), value = TRUE, ignore.case = TRUE)
    rm(list = objs)

    nms <- names(breast_targets)
    targets.breast <- breast_targets[nms]

    targets.breast$Status <- factor(ifelse(targets.breast$sample_type == "Primary Tumor", "cancer", "normal"), levels = c("normal", "cancer"))
    targets.breast$Tissue <- tolower(targets.breast$patient.tumor_tissue_site)
    targets.breast$Sex <- targets.breast$patient.gender
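The step that links each sample to its array file relies on trimming the longer barcode in the metadata table to its first four fields and then calling `match()`. A self-contained sketch on toy data follows; the data frame `sdrf`, its column names, and all barcodes and file names are made up for illustration.

```r
# Toy stand-ins for the sample table and the metadata table; values are made up.
tab <- data.frame(
  bcr_sample_barcode = c("TCGA-AA-0001-01A", "TCGA-AA-0001-11A", "TCGA-BB-0002-01A"),
  stringsAsFactors = FALSE
)
sdrf <- data.frame(
  barcode = c("TCGA-AA-0001-11A-11D-0000-05", "TCGA-AA-0001-01A-11D-0000-05"),
  array_file = c("1111111111_R01C01_Grn.idat", "2222222222_R02C01_Grn.idat"),
  stringsAsFactors = FALSE
)

# Trim the long barcode to its first four fields (the sample barcode).
sample_barcode <- sapply(strsplit(sdrf$barcode, split = "-"),
                         function(x) paste(x[1:4], collapse = "-"))

# For each row of tab, match() returns the metadata row of the same sample, or NA.
m <- match(tab$bcr_sample_barcode, sample_barcode)
tab$Basename <- gsub("_Grn\\.idat", "", sdrf$array_file[m])

# Samples without an array file have NA here and are dropped.
tab <- tab[!is.na(tab$Basename), ]
print(tab)
```

The third toy sample has no entry in the metadata table, so it is removed, leaving two rows.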

Load colon tumor and colon normal data:

    datadir <- "/Users/morgan/Documents/methylation_files/colon"
    clinicalDir <- file.path(datadir, "Clinical/Biotab")

    # read sample data and keep primary tumors and solid-tissue normals
    sample_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_sample_coad.txt"), sep = "\t", stringsAsFactors = FALSE)
    keep <- sample_tab$sample_type %in% c("Primary Tumor", "Solid Tissue Normal")
    sample_tab <- sample_tab[keep, ]

    patient_id <- unique(sapply(strsplit(sample_tab$bcr_sample_barcode, split = "-"), function(x) paste(x[1:3], collapse = "-")))

    tumor_sample_id <- sample_tab$bcr_sample_uuid[sample_tab$sample_type == "Primary Tumor"]
    normal_sample_id <- sample_tab$bcr_sample_uuid[sample_tab$sample_type == "Solid Tissue Normal"]

    # read tumor data
    tumor_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_tumor_sample_coad.txt"), sep = "\t", stringsAsFactors = FALSE)
    tab <- merge(sample_tab, tumor_tab, by = "bcr_sample_uuid", suffixes = c(".sample", ".tumor"), all.x = TRUE)

    # read normal data
    normal_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_normal_control_coad.txt"), sep = "\t", stringsAsFactors = FALSE)
    tab <- merge(tab, normal_tab, by = "bcr_sample_uuid", suffixes = c(".tumor", ".normal"), all.x = TRUE)

    tab$bcr_patient_barcode <- tab$bcr_patient_barcode.tumor
    ii <- is.na(tab$bcr_patient_barcode)
    tab$bcr_patient_barcode[ii] <- tab$bcr_patient_barcode.normal[ii]

    # read patient data
    patient_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_clinical_patient_coad.txt"), sep = "\t", stringsAsFactors = FALSE)
    names(patient_tab) <- paste("patient", names(patient_tab), sep = ".")
    tmp <- merge(tab, patient_tab, by.x = "bcr_patient_barcode", by.y = "patient.bcr_patient_barcode", all.x = TRUE, suffixes = c(".sample", ".patient"))
    tab <- tmp

    # read meth metadata
    methMetaDir <- file.path(datadir, "METADATA/JHU_USC__HumanMethylation450")
    methMeta_tab <- read.delim(file.path(methMetaDir, "jhu-usc.edu_COAD.HumanMethylation450.1.9.0.sdrf.txt"), sep = "\t", stringsAsFactors = FALSE)

    sample_barcode <- sapply(strsplit(methMeta_tab$Comment..TCGA.Barcode., split = "-"), function(x) paste(x[1:4], collapse = "-"))
    m <- match(tab$bcr_sample_barcode, sample_barcode)
    tab$Basename <- gsub("_Grn\\.idat", "", methMeta_tab$Array.Data.File[m])
    tab <- tab[!is.na(tab$Basename), ]

    basedir <- file.path(datadir, "DNA_Methylation/JHU_USC__HumanMethylation450/Level_1")
    tab$Basename <- file.path(basedir, tab$Basename)
    # existence check only; missing files are filtered out after the merge
    keep <- file.exists(paste(tab$Basename, "_Grn.idat", sep = ""))
    colon_targets <- tab

    # remove intermediate tables and directory variables
    objs <- grep("tab", ls(), value = TRUE)
    rm(list = objs)
    objs <- grep("dir", ls(), value = TRUE, ignore.case = TRUE)
    rm(list = objs)

    nms <- names(colon_targets)
    targets.colon <- colon_targets[nms]

    targets.colon$Status <- factor(ifelse(targets.colon$sample_type == "Primary Tumor", "cancer", "normal"), levels = c("normal", "cancer"))
    targets.colon$Tissue <- tolower(targets.colon$patient.tumor_tissue_site)
    targets.colon$Sex <- targets.colon$patient.gender
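The per-tissue loading blocks differ only in the data directory, the cancer-type suffix of the file names, and the version number in the metadata file name. The sketch below collects those differing pieces in one place. `tcga_inputs` is a hypothetical helper, not part of the workshop code, and the folder and file-name patterns are assumed to follow the layout used in this section.

```r
# Hypothetical helper: build the input paths for one cancer type.
# `code` is the lowercase cancer-type code, `sdrf_version` the version string
# that appears in the metadata (SDRF) file name.
tcga_inputs <- function(datadir, code, sdrf_version) {
  clinical <- file.path(datadir, "Clinical", "Biotab")
  meta <- file.path(datadir, "METADATA", "JHU_USC__HumanMethylation450")
  prefix <- "nationwidechildrens.org_"
  list(
    sample   = file.path(clinical, paste0(prefix, "biospecimen_sample_", code, ".txt")),
    tumor    = file.path(clinical, paste0(prefix, "biospecimen_tumor_sample_", code, ".txt")),
    normal   = file.path(clinical, paste0(prefix, "biospecimen_normal_control_", code, ".txt")),
    patient  = file.path(clinical, paste0(prefix, "clinical_patient_", code, ".txt")),
    sdrf     = file.path(meta, paste0("jhu-usc.edu_", toupper(code),
                                      ".HumanMethylation450.", sdrf_version, ".sdrf.txt")),
    idat_dir = file.path(datadir, "DNA_Methylation",
                         "JHU_USC__HumanMethylation450", "Level_1")
  )
}

# Example with a placeholder directory; no files are read here.
inputs <- tcga_inputs("methylation_files/colon", "coad", "1.9.0")
print(basename(unlist(inputs)))
```

Because the helper only builds path strings, it is worth checking each path with `file.exists()` against the extracted archive before reading anything.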

Load lung tumor and lung normal data:

    datadir <- "/Users/morgan/Documents/methylation_files/lung"
    clinicalDir <- file.path(datadir, "Clinical/Biotab")

    # read sample data and keep primary tumors and solid-tissue normals
    sample_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_sample_lusc.txt"), sep = "\t", stringsAsFactors = FALSE)
    keep <- sample_tab$sample_type %in% c("Primary Tumor", "Solid Tissue Normal")
    sample_tab <- sample_tab[keep, ]

    patient_id <- unique(sapply(strsplit(sample_tab$bcr_sample_barcode, split = "-"), function(x) paste(x[1:3], collapse = "-")))

    tumor_sample_id <- sample_tab$bcr_sample_uuid[sample_tab$sample_type == "Primary Tumor"]
    normal_sample_id <- sample_tab$bcr_sample_uuid[sample_tab$sample_type == "Solid Tissue Normal"]

    # read tumor data
    tumor_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_tumor_sample_lusc.txt"), sep = "\t", stringsAsFactors = FALSE)
    tab <- merge(sample_tab, tumor_tab, by = "bcr_sample_uuid", suffixes = c(".sample", ".tumor"), all.x = TRUE)

    # read normal data
    normal_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_biospecimen_normal_control_lusc.txt"), sep = "\t", stringsAsFactors = FALSE)
    tab <- merge(tab, normal_tab, by = "bcr_sample_uuid", suffixes = c(".tumor", ".normal"), all.x = TRUE)

    tab$bcr_patient_barcode <- tab$bcr_patient_barcode.tumor
    ii <- is.na(tab$bcr_patient_barcode)
    tab$bcr_patient_barcode[ii] <- tab$bcr_patient_barcode.normal[ii]

    # read patient data
    patient_tab <- read.delim(file.path(clinicalDir, "nationwidechildrens.org_clinical_patient_lusc.txt"), sep = "\t", stringsAsFactors = FALSE)
    names(patient_tab) <- paste("patient", names(patient_tab), sep = ".")
    tmp <- merge(tab, patient_tab, by.x = "bcr_patient_barcode", by.y = "patient.bcr_patient_barcode", all.x = TRUE, suffixes = c(".sample", ".patient"))
    tab <- tmp

    # read meth metadata
    methMetaDir <- file.path(datadir, "METADATA/JHU_USC__HumanMethylation450")
    methMeta_tab <- read.delim(file.path(methMetaDir, "jhu-usc.edu_LUSC.HumanMethylation450.1.7.0.sdrf.txt"), sep = "\t", stringsAsFactors = FALSE)

    sample_barcode <- sapply(strsplit(methMeta_tab$Comment..TCGA.Barcode., split = "-"), function(x) paste(x[1:4], collapse = "-"))
    m <- match(tab$bcr_sample_barcode, sample_barcode)
    tab$Basename <- gsub("_Grn\\.idat", "", methMeta_tab$Array.Data.File[m])
    tab <- tab[!is.na(tab$Basename), ]

    basedir <- file.path(datadir, "DNA_Methylation/JHU_USC__HumanMethylation450/Level_1")
    tab$Basename <- file.path(basedir, tab$Basename)
    # existence check only; missing files are filtered out after the merge
    keep <- file.exists(paste(tab$Basename, "_Grn.idat", sep = ""))
    lung_targets <- tab

    # remove intermediate tables and directory variables
    objs <- grep("tab", ls(), value = TRUE)
    rm(list = objs)
    objs <- grep("dir", ls(), value = TRUE, ignore.case = TRUE)
    rm(list = objs)

    nms <- names(lung_targets)
    targets.lung <- lung_targets[nms]

    targets.lung$Status <- factor(ifelse(targets.lung$sample_type == "Primary Tumor", "cancer", "normal"), levels = c("normal", "cancer"))
    targets.lung$Tissue <- tolower(targets.lung$patient.tumor_tissue_site)
    targets.lung$Sex <- targets.lung$patient.gender

    # keep only the three targets tables in the workspace
    rm(list = ls()[!(ls() %in% c("targets.breast", "targets.colon", "targets.lung"))])
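Each block ends by coding the sample type as a two-level factor. A small sketch of that step on made-up sample types shows why the level order is given explicitly: the first level listed becomes the reference level under R's default treatment contrasts, so listing "normal" first makes later comparisons read as cancer versus normal.

```r
# Made-up sample types, for illustration only.
sample_type <- c("Primary Tumor", "Solid Tissue Normal", "Primary Tumor")

# Recode to cancer/normal and fix the level order with "normal" first.
status <- factor(ifelse(sample_type == "Primary Tumor", "cancer", "normal"),
                 levels = c("normal", "cancer"))

print(levels(status))      # level order as given, not alphabetical
print(as.integer(status))  # underlying integer codes follow that order
print(table(status))
```

Without the `levels` argument, `factor()` would sort the levels alphabetically and "cancer" would become the reference level.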

Merge the targets tables and read in the methylation data.

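    merged <- merge(targets.breast, targets.colon, all = TRUE)
    targets <- merge(merged, targets.lung, all = TRUE)

    # keep only samples whose IDAT files are present on disk
    targets <- targets[which(file.exists(paste0(targets$Basename, "_Grn.idat"))), ]

    # Windows only; has no effect in R 4.2.0 and later
    memory.limit(size = 10000)

    rg.set <- read.metharray(targets$Basename, verbose = TRUE)
    # newer minfi releases may require DataFrame(targets) here
    pData(rg.set) <- targets

    table(targets$Tissue, targets$Status)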
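Calling `merge()` with `all = TRUE` and no `by` argument joins on every column name the two tables share and keeps all rows from both, filling columns that exist in only one table with `NA`. A toy sketch of that behaviour, followed by the same cross-tabulation, is given here; the data frames and their values are made up.

```r
# Made-up targets tables; `extra_a` exists only in the first one.
targets.a <- data.frame(Basename = c("a1", "a2"), Tissue = "breast",
                        Status = c("cancer", "normal"), extra_a = 1:2,
                        stringsAsFactors = FALSE)
targets.b <- data.frame(Basename = c("b1", "b2", "b3"), Tissue = "colon",
                        Status = c("cancer", "cancer", "normal"),
                        stringsAsFactors = FALSE)

# Full outer join on the shared columns (Basename, Tissue, Status).
merged <- merge(targets.a, targets.b, all = TRUE)
print(merged)

# Sample counts per tissue and status.
counts <- table(merged$Tissue, merged$Status)
print(counts)
```

Since no `Basename` is shared between the toy tables, the result simply stacks the rows, which is the intended effect when combining targets from different tissues.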

Data input	Processing function	Output	Use in analysis
Raw data (IDAT files)	read.450k.exp()	RGChannelSet	output of reading in the data
RGChannelSet	preprocessIllumina()	MethylSet	dmpFinder method
MethylSet	mapToGenome()	GenomicMethylSet	blockFinder method
GenomicMethylSet	ratioConvert()	GenomicRatioSet	bumphunter method

Acquisition and preprocessing of workshop data

Hector Corrada Bravo and Morgan Walter

2016-06-22
