Adding date columns between 2 dates in a Pandas DataFrame
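
The answer below works through an IPython session. For a self-contained starting point, here is a minimal sketch of the assumed input; the raw end dates are not shown in the answer, so the values used here are hypothetical, chosen to fall past the 2019-02-01 cut-off:

import numpy as np
import pandas as pd

# Hypothetical input: start dates taken from the answer's output; end dates are
# assumed to lie beyond the reporting window so the clipping step below matters.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'start_date': pd.to_datetime(['2017-06-01', '2018-10-01', '2015-01-01', '2017-11-01']),
    'end_date': pd.to_datetime(['2020-06-30', '2019-12-31', '2019-05-15', '2021-01-31']),
})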

First I'd cut off the dud dates, to normalize end_date (to ensure it's within the time range):

In [11]: df.end_date = df.end_date.where(df.end_date < '2019-02-01', pd.Timestamp('2019-01-31')) + pd.offsets.MonthBegin()

In [12]: df
Out[12]:
   id start_date   end_date
0   1 2017-06-01 2019-02-01
1   2 2018-10-01 2019-02-01
2   3 2015-01-01 2019-02-01
3   4 2017-11-01 2019-02-01

Note: you'll need to do the same trick for start_date if there are dates prior to 2012.
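
That clamp isn't shown in the answer; a minimal sketch of it, assuming the reporting window starts at 2012-01-01, would be:

# Floor any start_date earlier than the window at its first month (2012-01-01).
df.start_date = df.start_date.where(df.start_date >= '2012-01-01', pd.Timestamp('2012-01-01'))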

I'd create the resulting DataFrame from a date range for the columns and then fill it in, putting a 1 at each row's start month and a -1 at its end month:

In [13]: m = pd.date_range('2012-01-01', '2019-02-01', freq='MS')

In [14]: res = pd.DataFrame(0., columns=m, index=df.index)

In [15]: res.update(pd.DataFrame(np.diag(np.ones(len(df))), df.index, df.start_date).groupby(axis=1, level=0).sum())

In [16]: res.update(-pd.DataFrame(np.diag(np.ones(len(df))), df.index, df.end_date).groupby(axis=1, level=0).sum())

The groupby sum is required if multiple rows start or end in the same month.
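
To illustrate (this snippet is not part of the original answer): when two rows share a month, the diagonal DataFrame ends up with duplicate column labels, and the groupby-sum collapses them into a single column that update can align on:

# Two rows starting in the same month produce duplicate '2018-10-01' columns;
# summing over level 0 merges them into one column holding [1.0, 1.0].
dup = pd.DataFrame(np.diag(np.ones(2)), columns=pd.to_datetime(['2018-10-01', '2018-10-01']))
dup.groupby(axis=1, level=0).sum()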

# -1 and NaN were really placeholders for zero
In [17]: res = res.replace(0, np.nan).ffill(axis=1).replace([np.nan, -1], 0)

In [18]: res
Out[18]:
   2012-01-01  2012-02-01  2012-03-01  2012-04-01  2012-05-01     ...      2018-09-01  2018-10-01  2018-11-01  2018-12-01  2019-01-01
0         0.0         0.0         0.0         0.0         0.0     ...             1.0         1.0         1.0         1.0         1.0
1         0.0         0.0         0.0         0.0         0.0     ...             0.0         1.0         1.0         1.0         1.0
2         0.0         0.0         0.0         0.0         0.0     ...             1.0         1.0         1.0         1.0         1.0
3         0.0         0.0         0.0         0.0         0.0     ...             1.0         1.0         1.0         1.0         1.0
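
As a quick sanity check (a hypothetical follow-up, not in the original answer), each row should flip from 0 to 1 exactly at its start month:

# Row 0 starts 2017-06-01, so the month before is 0.0 and the start month is 1.0.
res.loc[0, pd.Timestamp('2017-05-01')]
res.loc[0, pd.Timestamp('2017-06-01')]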