且构网

How to calculate a new column based on the values of other columns in pandas (Python)

Updated: 2022-12-10 08:00:05

Let's say my data frame contains these data:

>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
                       'b':['1','2','2','1','2','2']})
>>> df
    a       b
0  l1       1
1  l2       2
2  l1       2
3  l2       1
4  l1       2
5  l2       2

l1 should correspond to 1 whereas l2 should correspond to 2. I'd like to create a new column 'c' such that, for each row, c = 1 if a = l1 and b = 1 (or a = l2 and b = 2). If a = l1 and b = 2 (or a = l2 and b = 1) then c = 0.

The resulting data frame should look like this:

    a       b   c
0  l1       1   1
1  l2       2   1
2  l1       2   0
3  l2       1   0
4  l1       2   0
5  l2       2   1

My data frame is very large so I'm really looking for the most efficient way to do this using pandas.

import numpy
import pandas as pd
df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                   'b': numpy.random.choice(['1', '2'], 1000000)})

A fast solution, assuming each column holds only two distinct values:

%timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)

10 loops, best of 3: 178 ms per loop
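
The comparison works because, with exactly two labels per column, the masks `a == 'l1'` and `b == '1'` are both True or both False precisely on the matching rows. A minimal sketch on the six-row frame from the question; the names `sample`, `is_l1` and `is_1` are introduced here for illustration only:

```python
import pandas as pd

# Six-row sample frame taken from the question.
sample = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                       'b': ['1', '2', '2', '1', '2', '2']})

# Two boolean masks. They agree on a row exactly when the pair
# is (l1, 1) or (l2, 2), which is the condition for c = 1.
is_l1 = sample.a == 'l1'
is_1 = sample.b == '1'

# Elementwise equality of the masks, cast from bool to 0/1.
sample['c'] = (is_l1 == is_1).astype(int)
print(sample)
```

If a third label ever appears in either column, this shortcut would silently mark unrelated pairs as matches, so it is only safe under the two-value assumption.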

@Viktor Kerkes:

%timeit df['c'] = (df.a.str[-1] == df.b).astype(int)

1 loops, best of 3: 412 ms per loop

@user1470788:

%timeit df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)

1 loops, best of 3: 363 ms per loop

@herrfz:

%timeit df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)

1 loops, best of 3: 387 ms per loop
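
The four timed expressions are meant to produce the same column. As a sanity check, the sketch below compares them on a small random frame and also shows a dictionary-based variant that should extend to more than two labels. The `pairs` mapping, the `candidates` dictionary and the use of `numpy.random.default_rng` are assumptions made for this example, not part of the original answers, and no timings are claimed for the extra variant.

```python
import numpy as np
import pandas as pd

# Small random frame with the same shape of data as the benchmark.
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.choice(['l1', 'l2'], 1000),
                   'b': rng.choice(['1', '2'], 1000)})

# Hypothetical lookup table: which value of b belongs to each label in a.
pairs = {'l1': '1', 'l2': '2'}

candidates = {
    # Mask equality (two-value assumption).
    'mask_eq': ((df.a == 'l1') == (df.b == '1')).astype(int),
    # Last character of a compared with b.
    'str_last': (df.a.str[-1] == df.b).astype(int),
    # Explicit boolean conditions.
    'explicit': (((df['a'] == 'l1') & (df['b'] == '1')) |
                 ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int),
    # Python-level slicing via apply.
    'apply': (df.a.apply(lambda x: x[1:]) == df.b).astype(int),
    # Dictionary lookup; add more entries to `pairs` for more labels.
    'mapped': (df.a.map(pairs) == df.b).astype(int),
}

# Every variant should agree with the first one on every row.
reference = candidates['mask_eq']
all_agree = all((col == reference).all() for col in candidates.values())
print(all_agree)
```

The string-based variants depend on the label ending in the matching digit, whereas the mapping makes the correspondence explicit, which may be preferable when labels do not follow that naming pattern.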