Updated: 2023-11-18 23:08:58
First I'd cut off the dud dates to normalize end_date, so that every value falls within the time range:
In [11]: df.end_date = df.end_date.where(df.end_date < '2019-02-01', pd.Timestamp('2019-01-31')) + pd.offsets.MonthBegin()
In [12]: df
Out[12]:
   id start_date   end_date
0   1 2017-06-01 2019-02-01
1   2 2018-10-01 2019-02-01
2   3 2015-01-01 2019-02-01
3   4 2017-11-01 2019-02-01
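To see what the `where(...) + pd.offsets.MonthBegin()` step does on its own, here is a minimal sketch on made-up dates (the sample values and the names `end`, `cutoff`, `clipped` and `rolled` are assumptions for illustration, not part of the data above). Every date is rolled forward to the first day of the following month, including a date that already sits on a month start, so the stored end_date behaves like an exclusive upper bound.

```python
import pandas as pd

# Hypothetical sample end dates, chosen only for illustration.
end = pd.Series(pd.to_datetime(["2018-03-15", "2018-03-01", "2025-12-31"]))

# Replace anything on or after the cutoff with the last day inside the window...
cutoff = pd.Timestamp("2019-02-01")
clipped = end.where(end < cutoff, pd.Timestamp("2019-01-31"))

# ...then roll every date forward to the next month start.
rolled = clipped + pd.offsets.MonthBegin()
print(rolled.tolist())
```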
Note: you'll need to do the same trick for start_date if there are dates prior to 2012.
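A minimal sketch of that start_date trick, assuming the window begins at 2012-01-01 and that start dates already fall on the first of a month (the sample frame and the name `window_start` are hypothetical):

```python
import pandas as pd

# Hypothetical frame with one start_date before the 2012 window.
df = pd.DataFrame({
    "id": [1, 2],
    "start_date": pd.to_datetime(["2010-05-01", "2017-06-01"]),
    "end_date": pd.to_datetime(["2019-02-01", "2019-02-01"]),
})

# Assumed first month of the window.
window_start = pd.Timestamp("2012-01-01")

# Keep start dates inside the window; move earlier ones up to its first month.
df["start_date"] = df["start_date"].where(df["start_date"] >= window_start, window_start)
print(df)
```

Unlike the end_date step, no `MonthBegin` offset is added here, because the start month itself should be counted.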
I'd create the resulting DataFrame with the monthly date range as its columns and then fill it in (with 1 at each row's start month and -1 at its end month):
In [13]: m = pd.date_range('2012-01-01', '2019-02-01', freq='MS')
In [14]: res = pd.DataFrame(0., columns=m, index=df.index)
In [15]: res.update(pd.DataFrame(np.diag(np.ones(len(df))), df.index, df.start_date).groupby(axis=1, level=0).sum())
In [16]: res.update(-pd.DataFrame(np.diag(np.ones(len(df))), df.index, df.end_date).groupby(axis=1, level=0).sum())
The groupby sum is required if multiple rows start or end in the same month: their dates then produce duplicate column labels, and the sum collapses those into a single column.
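The duplicate-label situation can be reproduced on a tiny example. This is a sketch with made-up start dates (the names `start`, `marks` and `collapsed` are hypothetical). It groups on the transposed frame rather than passing `axis=1`, which recent pandas releases deprecate; the result should be equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical start dates: rows 0 and 2 start in the same month.
start = pd.to_datetime(["2017-06-01", "2018-10-01", "2017-06-01"])

# One indicator column per row, labelled by that row's start month.
# The label 2017-06-01 therefore appears twice.
marks = pd.DataFrame(np.eye(len(start)), index=range(len(start)), columns=start)

# Sum columns that share a label: transpose, group on the index, transpose back.
collapsed = marks.T.groupby(level=0).sum().T
print(collapsed)
```

After the sum there is one column per distinct month, which is the shape `res.update` needs in order to align on column labels.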
# -1 and NaN were really placeholders for zero
In [17]: res = res.replace(0, np.nan).ffill(axis=1).replace([np.nan, -1], 0)
In [18]: res
Out[18]:
   2012-01-01  2012-02-01  2012-03-01  2012-04-01  2012-05-01  ...  2018-09-01  2018-10-01  2018-11-01  2018-12-01  2019-01-01
0         0.0         0.0         0.0         0.0         0.0  ...         1.0         1.0         1.0         1.0         1.0
1         0.0         0.0         0.0         0.0         0.0  ...         0.0         1.0         1.0         1.0         1.0
2         0.0         0.0         0.0         0.0         0.0  ...         1.0         1.0         1.0         1.0         1.0
3         0.0         0.0         0.0         0.0         0.0  ...         1.0         1.0         1.0         1.0         1.0
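The replace/ffill/replace chain from In [17] is easier to follow on a single hand-made row. The sketch below uses assumed values (1 marks the start month, -1 the end month; the names `row` and `filled` are hypothetical) and the same three calls:

```python
import numpy as np
import pandas as pd

# Hypothetical row: starts in column 1, ends in column 4.
row = pd.DataFrame([[0., 1., 0., 0., -1., 0.]])

filled = (
    row.replace(0, np.nan)      # zeros become gaps:     [NaN, 1, NaN, NaN, -1, NaN]
       .ffill(axis=1)           # carry markers forward: [NaN, 1, 1, 1, -1, -1]
       .replace([np.nan, -1], 0)  # gaps and end markers go back to 0
)
print(filled.values.tolist())
```

The 1 is carried forward until the -1 marker takes over, so the end month itself and everything after it end up as 0.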