Updated: 2023-12-01 11:10:16
I have two datasets. One, called domain (d), has general information about a gene; the other is a table called mutation (m). Both tables have a column called Gene.name, which I'll use as the lookup key. The two datasets do not have the same number of columns or rows.
I want to go through all the data in the mutation file and check whether the value in column Gene.name also exists in the domain file. If it does, I want to check whether the value in column Mutation is between the columns "Start" and "End" (it can be equal to Start or End). If it is, I want to print it out to a new table with the merged columns: Gene.name, Mutation, and the domain information. If it doesn't exist, ignore it.
So this is what I have so far:
d<-read.table("domains.txt")
d
Gene.name Domain Start End
ABCF1 low_complexity_region 2 13
DKK1 low_complexity_region 25 39
ABCF1 AAA 328 532
F2 coiled_coil_region 499 558
m<-read.table("mutations.tx")
m
Gene.name Mutation
ABCF1 10
DKK1 21
ABCF1 335
xyz 15
F2 499
newfile<-m[, list(new=findInterval(d(c(d$Start, d$End)),by'=Gene.Name']
My code isn't working, and after reading a lot of different questions and answers I'm even more confused. Any help would be great.
I'd like my final data to look like this:
Gene.name Mutation Domain
DKK1 21 low_complexity_region
ABCF1 335 AAA
F2 499 coiled_coil_region
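A merge and subset should get you there. Note, though, that your intended result doesn't match your description of what you want: DKK1's mutation at 21 falls outside its domain range of 25 to 39, while ABCF1's mutation at 10 falls inside 2 to 13, so by your rule the first should be dropped and the second kept.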
result <- merge(d,m,by="Gene.name")
result[with(result,Mutation >= Start & Mutation <= End),]
# Gene.name Domain Start End Mutation
#1 ABCF1 low_complexity_region 2 13 10
#4 ABCF1 AAA 328 532 335
#6 F2 coiled_coil_region 499 558 499
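For anyone who wants to try this without the input files, here is a minimal self-contained sketch of the same merge-and-subset approach. It rebuilds the two tables inline from the values shown in the question (so it makes no assumption about how domains.txt is formatted), and it additionally trims the output to the three columns the question asked for. The names `inside` and `hits` are just illustrative, not from the original posts.

```r
# Rebuild the two example tables inline (values copied from the question).
d <- data.frame(
  Gene.name = c("ABCF1", "DKK1", "ABCF1", "F2"),
  Domain    = c("low_complexity_region", "low_complexity_region",
                "AAA", "coiled_coil_region"),
  Start     = c(2, 25, 328, 499),
  End       = c(13, 39, 532, 558),
  stringsAsFactors = FALSE
)
m <- data.frame(
  Gene.name = c("ABCF1", "DKK1", "ABCF1", "xyz", "F2"),
  Mutation  = c(10, 21, 335, 15, 499),
  stringsAsFactors = FALSE
)

# Inner join on Gene.name: every mutation is paired with every domain of
# the same gene, and genes missing from d (here "xyz") drop out.
result <- merge(d, m, by = "Gene.name")

# Keep only rows whose mutation lies within [Start, End], endpoints included.
inside <- result$Mutation >= result$Start & result$Mutation <= result$End

# Select the columns requested in the question (hypothetical output name).
hits <- result[inside, c("Gene.name", "Mutation", "Domain")]
rownames(hits) <- NULL
hits
```

Because merge() pairs each mutation with every domain of its gene before filtering, this can get memory-hungry on large tables; for the small example here that is not a concern.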