且构网

Sharing notes on programming and software development

lubridate returns NA when converting midnight timestamps: how to fill in the missing time

Updated: 2023-01-22 12:41:06

If the '00:00:00' is missing from the original data to begin with, you can use grep() to find those values and paste '00:00:00' onto them before calling ymd_hms() or mdy_hm().

First case, where the date/time format is 'YYYY-mm-dd HH:MM:SS':

library(data.table)  # provides fread()
library(lubridate)   # provides ymd_hms() and mdy_hm()

#Before
test <- fread("time,  btc_price
2017-08-28 23:57:00, 4439.8163
2017-08-28 23:58:00, 4440.2363
2017-08-28 23:58:00, 4440.2363
2017-08-28 23:59:00, 4439.3313
2017-08-29         , 4439.6588
2017-08-29 00:01:00, 4440.3050")

# Rows whose time column is a bare date (no HH:MM:SS part)
idx <- grep("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", test$time)
test$time[idx] <- paste(test$time[idx], "00:00:00")

#After
print(test)

                  time btc_price
1: 2017-08-28 23:57:00  4439.816
2: 2017-08-28 23:58:00  4440.236
3: 2017-08-28 23:58:00  4440.236
4: 2017-08-28 23:59:00  4439.331
5: 2017-08-29 00:00:00  4439.659
6: 2017-08-29 00:01:00  4440.305

#Now you can use ymd_hms(as.character(test$time)) as usual.
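
Putting the first case together, the find-and-pad step followed by parsing might look like the sketch below. It assumes lubridate is installed; `time_raw` is an illustrative vector, not the original data.

```r
library(lubridate)

# Illustrative input: the middle value lost its midnight time component
time_raw <- c("2017-08-28 23:59:00", "2017-08-29", "2017-08-29 00:01:00")

# Find values that are a bare date and append the missing midnight time
date_only <- grep("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", time_raw)
time_raw[date_only] <- paste(time_raw[date_only], "00:00:00")

# Every value now has the full 'YYYY-mm-dd HH:MM:SS' shape
parsed <- ymd_hms(time_raw)
print(parsed)
```

Storing the grep() result once keeps the pattern in a single place, so the left and right sides of the assignment cannot drift apart.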

Second case, where the date/time format is 'm/dd/yy HH:MM':

#Step 1 is to find/replace:
test <- fread("time,  btc_price
8/28/17 23:57, 4439.8163
8/28/17 23:57, 4440.2363
8/28/17 23:57, 4440.2363
8/28/17 23:57, 4439.3313
8/28/17      , 4439.6588
8/29/17 00:01, 4440.3050")

# Rows whose time column is a bare date; allow one- or two-digit month and day
idx <- grep("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$", test$time)
test$time[idx] <- paste(test$time[idx], "00:00")

print(test)
            time btc_price
1: 8/28/17 23:57  4439.816
2: 8/28/17 23:57  4440.236
3: 8/28/17 23:57  4440.236
4: 8/28/17 23:57  4439.331
5: 8/28/17 00:00  4439.659
6: 8/29/17 00:01  4440.305

#Step 2 is to switch from mdy_hms() to mdy_hm(): the data has no seconds, so leave off the 's'.
#Ex. before:
mdy_hms(as.character("8/28/17 16:19"))
[1] NA
Warning message:
All formats failed to parse. No formats found. 

#After
test <- c("8/28/17 16:19","8/28/17 00:00")
mdy_hm(as.character(test))
[1] "2017-08-28 16:19:00 UTC" "2017-08-28 00:00:00 UTC"
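
As an alternative to padding the strings by hand, lubridate's parsers accept a `truncated` argument that lets trailing components be absent. This is a different technique from the find-and-paste approach above, shown here only as a sketch; `time_raw` is an illustrative vector.

```r
library(lubridate)

# Illustrative input: the middle value has no HH:MM part
time_raw <- c("8/28/17 23:57", "8/29/17", "8/29/17 00:01")

# truncated = 2 allows up to two trailing components (minutes, hours)
# to be missing; missing components are treated as zero
parsed <- mdy_hm(time_raw, truncated = 2)
print(parsed)
```

This avoids modifying the raw column, at the cost of also accepting values such as "8/29/17 12" that you might prefer to flag as malformed.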

In general, it's also good practice to keep thousands separators out of numbers in data you read into R, so 4,439.3313 should be written as 4439.3313. Otherwise, when the data is comma-delimited, R may treat that comma as a separator between columns.
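
If the source file cannot be changed and the numbers do carry thousands separators, one option is to make sure those fields are quoted and strip the commas after reading. The sketch below uses base R only; `csv_text` and `prices` are illustrative names, and the quoted input is an assumption about how such a file would need to be written.

```r
# Illustrative CSV text: the price field is quoted, so its comma is not a delimiter
csv_text <- 'time,btc_price
2017-08-28 23:59:00,"4,439.3313"
2017-08-29 00:00:00,"4,439.6588"'

prices <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The column is read as character because of the embedded comma;
# remove the separators, then convert to numeric
prices$btc_price <- as.numeric(gsub(",", "", prices$btc_price, fixed = TRUE))
print(prices)
```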