且构网

Create a new column to compare rows in a Pandas DataFrame

Updated: 2023-12-01 10:14:16

I am looking to create a new column in a dataframe based on the values seen in the next 2 rows. Specifically, if any values in the next 2 rows are below 4, then I want the new value in the current row to be 0 (and if all values in the next 2 rows are above 4 then I want the new value in the current row to be 1).

>>> import pandas
>>> df = pandas.DataFrame({"A": [5,6,7,8,2]})
>>> df
   A
0  5
1  6
2  7
3  8
4  2
>>> desired_result = pandas.DataFrame({"A": [5,6,7,8,2], "new": [1,1,0,0,0]})
>>> desired_result
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0

In "desired_result", the first value is 1 because the next two values (6 and 7) are both above 4, and the same logic applies to the second row. In the third row the new value becomes 0 because, looking ahead to the next two rows (8, 2), we see that 2 is below 4.

I have been trying to use the apply function but I cannot figure out how to pass along the next 2 row values as inputs.

I have found lots of help on this site about comparing across columns, but cannot figure out how to "look ahead" like I described.

Thanks for the help!

You can set the new value to one and then use loc together with shift and lt (less than) to set the appropriate values to zero.

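import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df['new'] = 1

# Set 'new' to zero wherever either of the next two values is below 4.
df.loc[(df.A.shift(-1).lt(4)) | (df.A.shift(-2).lt(4)), 'new'] = 0

# The last value does not have any future observations and should be set to zero.
# Assigning through df.loc avoids chained assignment, which may not write back to df.
df.loc[df.index[-1], 'new'] = 0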

>>> df
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0
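
To see why the mask works, it may help to look at the shifted values side by side. The sketch below is only for inspection; the column names next_1 and next_2 are made up for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})

# Hypothetical helper columns: the value one row ahead and two rows ahead.
peek = df.assign(next_1=df.A.shift(-1), next_2=df.A.shift(-2))
print(peek)

# Comparisons against NaN evaluate to False, so rows that run past the end
# of the frame are not flagged by the mask itself. That is why the last row
# has to be set to zero in a separate step.
mask = peek.next_1.lt(4) | peek.next_2.lt(4)
print(mask.tolist())
```

Only the third and fourth rows are flagged by the mask, which matches the zeros in the result before the last row is handled separately.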

To expand to the next 8 rows instead of 2:

nrows = 8
df.loc[eval(" | ".join("df.A.shift(-{0}).lt(4)".format(n) 
                       for n in range(1, nrows + 1))), 'new'] = 0
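
If you would rather not build a string for eval, the same mask can probably be assembled by OR-ing the shifted comparisons directly. This is a minimal sketch, not the original answer's method; the function name lookahead_flag and its threshold parameter are assumptions introduced here for illustration, and it assumes a non-empty Series and nrows of at least 1.

```python
from functools import reduce
import operator

import pandas as pd


def lookahead_flag(values, nrows, threshold=4):
    """Return 1 where none of the next `nrows` values is below `threshold`, else 0.

    Hypothetical helper: assumes `values` is a non-empty Series and nrows >= 1.
    """
    # One boolean mask per look-ahead distance; NaN comparisons yield False.
    masks = (values.shift(-n).lt(threshold) for n in range(1, nrows + 1))
    any_below = reduce(operator.or_, masks)
    flag = (~any_below).astype(int)
    # The last row has no future observations, so it is set to zero,
    # following the convention used in the answer above.
    flag.iloc[-1] = 0
    return flag


df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df["new"] = lookahead_flag(df.A, nrows=2)
print(df)
```

With nrows=2 this should reproduce the result above. With nrows=8 on this five-row example every row ends up 0, because the final value 2 is within reach of every earlier row.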