且构网


Adding a new column to an existing DataFrame in Python pandas

Updated: 2022-03-29 07:22:26

Use the index of the original df1 when creating the Series:

# p is pandas and np is numpy (import pandas as p; import numpy as np)
df1['e'] = p.Series(np.random.randn(sLength), index=df1.index)
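To see why the index argument matters, here is a minimal sketch, assuming a small made-up frame whose row labels are 6 and 8; the column names e_aligned and e_default are hypothetical, and pandas is imported under the conventional alias pd. On assignment pandas aligns a Series to the DataFrame by label, so a Series carrying the default 0..n-1 index finds no matching rows and the column ends up as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-default row labels (6 and 8).
df1 = pd.DataFrame({"a": [-0.27, 0.92], "b": [-0.03, 0.85]}, index=[6, 8])
values = np.random.randn(len(df1))

# Series built on df1's index: labels match, so every row gets a value.
df1["e_aligned"] = pd.Series(values, index=df1.index)

# Series with the default RangeIndex (0, 1): no label matches 6 or 8,
# so alignment fills the new column with NaN.
df1["e_default"] = pd.Series(values)

print(df1)
```

If the frame already has the default 0..n-1 index, both forms give the same result, which is why the difference is easy to miss.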

Edit 2015
Some users reported getting the SettingWithCopyWarning with this code.
However, the code still runs fine with pandas 0.16.1, the current version at the time of this edit.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> p.version.short_version
'0.16.1'

The SettingWithCopyWarning aims to flag a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did something wrong (it can trigger false positives), but since 0.13.0 it lets you know that there are more appropriate methods for the same purpose. If you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead

>>> df1.loc[:,'f'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>> 
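The warning typically shows up when the frame being modified was itself derived from another frame by slicing or filtering. Below is a minimal sketch, assuming a made-up parent frame df and a filtered subset of it; taking an explicit .copy() is one common way to make the intent unambiguous before adding the column with .loc. Whether the warning is emitted without the copy depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame; df1 is a filtered subset of it.
df = pd.DataFrame({"a": [-1.0, 0.5, 2.0], "b": [3.0, 4.0, 5.0]})

# An explicit copy makes df1 independent of df, so later assignments
# are unambiguous and do not touch the parent.
df1 = df[df["a"] > 0].copy()

# Add the new column through .loc, aligned on df1's own index.
df1.loc[:, "f"] = pd.Series(np.random.randn(len(df1)), index=df1.index)

print(df1)
print(df.columns.tolist())  # columns of the parent frame
```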

In fact, this is currently the more efficient method, as described in the pandas docs.

Edit 2017

As indicated in the comments and by @Alexander, the best method currently for adding the values of a Series as a new column of a DataFrame may be to use assign:

df1 = df1.assign(e=p.Series(np.random.randn(sLength)).values)
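A short sketch of how assign behaves, assuming a made-up frame with row labels 6 and 8: it returns a new DataFrame and leaves the original untouched, and passing .values hands over a plain NumPy array, so the data is placed by position and no index alignment takes place.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-default row labels.
df1 = pd.DataFrame({"a": [-0.27, 0.92]}, index=[6, 8])
s = pd.Series(np.random.randn(len(df1)))  # default index 0, 1

# .values strips the Series index, so the data is assigned positionally.
df2 = df1.assign(e=s.values)

print(df2)
print(df1.columns.tolist())  # assign did not add 'e' to df1
```

Rebinding the result to the same name, as in the line above, gives the same effect as an in-place update while keeping the call chainable.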