Updated: 2023-12-06 13:48:04
When using mean on df1, it calculates over each column by default and produces a pd.Series.
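As a small illustration of that default, here is a sketch using a made-up two-column frame (the real df1 is not shown in this answer, so the values below are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for df1; the original data is not part of this answer.
df1 = pd.DataFrame({"c1": [1.0, 3.0], "c2": [2.0, 6.0]})

# mean() reduces down the rows (axis=0) by default: one value per column.
m = df1.mean()

print(type(m).__name__)  # Series
print(list(m.index))     # ['c1', 'c2']
```

The index of the resulting Series holds the column names, which is what makes the label alignment in the next step work.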
When adding a pd.Series to a pd.DataFrame, it aligns the index of the pd.Series with the columns of the pd.DataFrame and broadcasts along the index of the pd.DataFrame by default.
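A minimal sketch of that alignment, with invented labels and values; the Series is deliberately built in a different order from the columns to show that matching is by label, not by position:

```python
import pandas as pd

# Hypothetical frame and Series, for illustration only.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]})
s = pd.Series({"b": 100.0, "a": 0.5})  # labels in a different order than df.columns

# Each Series value is matched to the column with the same label,
# then added to every row of that column.
result = df + s

print(result["a"].tolist())  # [1.5, 2.5]
print(result["b"].tolist())  # [110.0, 120.0]
```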
The only tricky bit is handling the Date column.
Option 1
m = df1.mean()
df2.loc[:, m.index] += m
df2
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
If I know that 'Date' is always in the first column, I can:
df2.iloc[:, 1:] += df1.mean()
df2
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
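For readers who want to run Option 1 end to end, here is a self-contained sketch. The two frames are invented stand-ins (the question's data is not reproduced here), and df1 is assumed to contain only numeric columns; if it also held a non-numeric column, something like df1.mean(numeric_only=True) would probably be needed.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df1 and df2.
df1 = pd.DataFrame({"c1": [1.0, 3.0], "c2": [2.0, 6.0]})
df2 = pd.DataFrame({
    "Date": ["2017-09-12", "2017-09-13"],
    "c1": [0.5, 1.5],
    "c2": [1.0, 2.0],
})

m = df1.mean()  # c1 -> 2.0, c2 -> 4.0

# Select only the columns that appear in the mean, so 'Date' is left alone.
df2.loc[:, m.index] += m

print(df2["c1"].tolist())    # [2.5, 3.5]
print(df2["c2"].tolist())    # [5.0, 6.0]
print(df2["Date"].tolist())  # ['2017-09-12', '2017-09-13']
```

Selecting by m.index does not depend on where 'Date' sits, whereas the iloc[:, 1:] variant relies on it being the first column.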
Option 2
Notice that I use the append=True parameter in set_index just in case there are things in the index you don't want to mess up.
df2.set_index('Date', append=True).add(df1.mean()).reset_index('Date')
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
If you don't care about the index, you can shorten this to:
df2.set_index('Date').add(df1.mean()).reset_index()
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
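To see what append=True buys you, the sketch below runs both variants of Option 2 on invented data with a non-default index (the labels "x" and "y" are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical stand-ins; df2 gets a custom index so the difference is visible.
df1 = pd.DataFrame({"c1": [1.0, 3.0], "c2": [2.0, 6.0]})
df2 = pd.DataFrame(
    {"Date": ["2017-09-12", "2017-09-13"], "c1": [0.5, 1.5], "c2": [1.0, 2.0]},
    index=["x", "y"],
)

# append=True keeps the existing index as a level next to 'Date',
# so it survives the round trip.
kept = df2.set_index("Date", append=True).add(df1.mean()).reset_index("Date")

# Without append, 'Date' replaces the old index and reset_index() renumbers the rows.
short = df2.set_index("Date").add(df1.mean()).reset_index()

print(list(kept.index))   # ['x', 'y']
print(list(short.index))  # [0, 1]
print(kept["c1"].tolist())  # [2.5, 3.5]
```

Unlike Option 1, neither variant modifies df2 in place; each returns a new frame.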