Updated: 2022-12-20 11:31:56
I have (sometimes incomplete) data on addresses that looks like this:
data <- c("1600 Pennsylvania Avenue, Washington DC",
",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
I need to remove the first and/or last character if either one of them is a comma.
So far, I have:
for(i in 1:length(data)){
  lastchar <- nchar(data[i])
  sec2last <- nchar(data[i]) - 1
  if(regexpr(",",data[i])[1] == 1){
    data[i] <- substr(data[i],2, lastchar)
  }
  if(regexpr(",",data[i])[1] == nchar(data[i])){
    data[i] <- substr(data[i],1, sec2last)
  }
}
data
which works for the first character, but not the last character. How can I modify the second if
statement or otherwise accomplish my goal?
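The second check misses for two reasons. First, regexpr() returns the position of the first comma in the string, which equals nchar(data[i]) only when there is no earlier comma. Second, sec2last is computed before the leading comma is stripped, so it is stale by the time it is used. A minimal sketch of one possible fix follows, comparing the first and last characters directly with substr(). The helper name strip_edge_commas is hypothetical, and the sketch assumes the vector contains no NA values.

```r
# Hypothetical helper: drop one leading and/or one trailing comma per string.
# Assumes x is a character vector without NA values.
strip_edge_commas <- function(x) {
  for (i in seq_along(x)) {
    n <- nchar(x[i])
    # Look at the first character itself instead of searching for a comma.
    if (n > 0 && substr(x[i], 1, 1) == ",") {
      x[i] <- substr(x[i], 2, n)
    }
    # Recompute the length: the string may just have been shortened.
    n <- nchar(x[i])
    # Look at the last character itself.
    if (n > 0 && substr(x[i], n, n) == ",") {
      x[i] <- substr(x[i], 1, n - 1)
    }
  }
  x
}

data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
strip_edge_commas(data)
```

Recomputing the length after the first step is the important part; the rest mirrors the structure of the original loop.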
You could try the code below, which removes a comma at the start or at the end of each string:
> data <- c("1600 Pennsylvania Avenue, Washington DC",
+ ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
> gsub("(?<=^),|,(?=$)", "", data, perl=TRUE)
[1] "1600 Pennsylvania Avenue, Washington DC"
[2] "Siem Reap,FC"
[3] "11 Wall Street, New York, NY"
[4] "Addis Ababa,FC"
Pattern explanation:
(?<=^),
(?<=...) is a positive lookbehind. Here it asserts that what precedes the comma must be the start of the string, ^. So this branch matches a comma at the start.
|
The alternation (logical OR) operator, used to combine two regexes.
,(?=$)
(?=...) is a positive lookahead. Here it asserts that what follows the comma must be the end of the string, $. So this branch matches a comma at the end.
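Since ^ and $ are themselves zero-width assertions, the lookarounds are not strictly required. A plain anchored pattern should give the same result on this data without perl=TRUE. The sketch below compares the two; the variable names with_lookaround and anchors_only are only illustrative.

```r
data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")

# Pattern from the answer: lookbehind and lookahead, which need perl = TRUE.
with_lookaround <- gsub("(?<=^),|,(?=$)", "", data, perl = TRUE)

# Anchors only: ^ and $ match positions, so no lookaround is needed.
anchors_only <- gsub("^,|,$", "", data)

# Check that both approaches agree on this data.
identical(with_lookaround, anchors_only)
```

Both patterns remove at most one comma from each end. To strip runs of leading or trailing commas, the anchored form can be written as "^,+|,+$".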