且构网


Count occurrences of a substring in a column in R

Updated: 2022-12-10 17:37:56


I would like to count the occurrences of a string in a column, per group. In this case the string is usually a substring of a longer value in a character column.

I have some data, e.g.:

ID   String              village
1    fd_sec, ht_rm,      A
2    NA, ht_rm           A
3    fd_sec,             B
4    san, ht_rm,         C

The code that I began with is obviously incorrect, and I have not been able to find out how I could use the grep function on a column and group by village:

impacts <- se %>%  group_by(village) %>%
summarise(c_NA = round(sum(sub$en41_1 ==  "NA")),
          c_ht_rm = round(sum(sub$en41_1 ==  "ht_rm")),
          c_san = round(sum(sub$en41_1 ==  "san")),
          c_fd_sec = round(sum(sub$en41_1 ==  "fd_sec")))

Ideally my output would be:

village  fd_sec  NA  ht_rm  san
A        1       1   2 
B        1
C                    1      1

Thank you in advance

You can also use cSplit() from my "splitstackshape" package. Since this package also loads "data.table", you can then just use dcast() to tabulate the result.

Example:

library(splitstackshape)
cSplit(mydf, "String", direction = "long")[, dcast(.SD, village ~ String)]
# Using 'village' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
#    village fd_sec ht_rm san NA
# 1:       A      1     2   0  1
# 2:       B      1     0   0  0
# 3:       C      0     1   1  0
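
If installing an extra package is not an option, the same split-then-tabulate idea can be sketched in base R. This is only an illustration, not the answerer's method: the data frame below is rebuilt by hand from the question's sample, the name mydf is taken from the answer, the intermediate names parts, long and counts are made up here, and the NA entry is assumed to be the literal text "NA".

```r
# Sample data from the question, rebuilt by hand (assumption: "NA" is literal text).
mydf <- data.frame(
  ID = 1:4,
  String = c("fd_sec, ht_rm,", "NA, ht_rm", "fd_sec,", "san, ht_rm,"),
  village = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Split each String on commas and trim the surrounding whitespace.
parts <- lapply(strsplit(mydf$String, ",", fixed = TRUE), trimws)
# Drop any empty pieces (for example, ones caused by stray commas).
parts <- lapply(parts, function(p) p[nzchar(p)])

# Long format: one row per (village, token) pair.
long <- data.frame(
  village = rep(mydf$village, lengths(parts)),
  String = unlist(parts),
  stringsAsFactors = FALSE
)

# Cross-tabulate villages against tokens to get the counts per group.
counts <- table(long$village, long$String)
print(counts)
```

The counts should match the dcast() result above, although the column order can differ because table() sorts its levels according to the current locale.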