Creating a new NumPy array based on a condition

Updated: 2021-08-16 02:21:08

With a focus on performance, a few approaches can be added. One is to build the boolean array of valid elements and convert it to an int datatype with the .astype() method. Another is to use np.where, which selects between 0 and 1 based on the same boolean array. So essentially there are two conversion methods: one that relies on efficient datatype conversion and one that uses selection. The boolean array itself can also be obtained in two ways: with plain comparisons combined by &, or with np.logical_and. Two ways to get the boolean array times two ways to convert it to an int array gives the four implementations listed below -

out1 = ((aa>0.5) & (bb>0.5)).astype(int)
out2 = np.logical_and(aa>0.5, bb>0.5).astype(int)
out3 = np.where((aa>0.5) & (bb>0.5),1,0)
out4 = np.where(np.logical_and(aa>0.5, bb>0.5), 1, 0)
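
As a quick sanity check, the four variants can be run on a tiny input and compared. This is only a sketch: the 2x2 arrays and the names mask and all_equal are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Hypothetical 2x2 inputs, small enough to verify by eye.
aa = np.array([[0.9, 0.2], [0.7, 0.6]])
bb = np.array([[0.8, 0.9], [0.1, 0.6]])

# Boolean array: True where both aa and bb exceed 0.5.
mask = (aa > 0.5) & (bb > 0.5)

out1 = mask.astype(int)                                    # comparison + astype
out2 = np.logical_and(aa > 0.5, bb > 0.5).astype(int)      # logical_and + astype
out3 = np.where(mask, 1, 0)                                # comparison + where
out4 = np.where(np.logical_and(aa > 0.5, bb > 0.5), 1, 0)  # logical_and + where

print(out1)

# All four approaches should agree element-wise.
all_equal = all(np.array_equal(out1, other) for other in (out2, out3, out4))
print(all_equal)
```

The parentheses around each comparison matter in the & form, because & binds more tightly than > in Python.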

You can also experiment with lower-precision datatypes, which does no harm because the values are only ever 0 and 1. The benefit should be a noticeable speedup, since each output element then takes 1 byte instead of the larger default int, so less memory has to be written. The candidates are int8 and uint8, given either as strings ('int8', 'uint8') or as NumPy types (np.int8, np.uint8). The variants of the earlier approaches using these datatypes would be -

out5 = ((aa>0.5) & (bb>0.5)).astype('int8')
out6 = np.logical_and(aa>0.5, bb>0.5).astype('int8')
out7 = ((aa>0.5) & (bb>0.5)).astype('uint8')
out8 = np.logical_and(aa>0.5, bb>0.5).astype('uint8')

out9 = ((aa>0.5) & (bb>0.5)).astype(np.int8)
out10 = np.logical_and(aa>0.5, bb>0.5).astype(np.int8)
out11 = ((aa>0.5) & (bb>0.5)).astype(np.uint8)
out12 = np.logical_and(aa>0.5, bb>0.5).astype(np.uint8)
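
To make the memory argument concrete, the sketch below compares the size of the output for the default int against the 8-bit types. The seeded generator and the variable names are assumptions added for illustration; the size of the default int depends on the platform and NumPy version, so only the 8-bit figures are fixed.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the run is reproducible
aa = rng.random((1000, 1000))
bb = rng.random((1000, 1000))

mask = (aa > 0.5) & (bb > 0.5)

out_int = mask.astype(int)       # platform default integer
out_i8 = mask.astype(np.int8)    # NumPy type object
out_u8 = mask.astype('uint8')    # string spelling of the dtype

# nbytes is the size of the array data in bytes.
for out in (out_int, out_i8, out_u8):
    print(out.dtype, out.itemsize, out.nbytes)

# The values are identical regardless of the integer width chosen.
same_values = np.array_equal(out_int, out_i8) and np.array_equal(out_int, out_u8)

# The string and the type object name the same dtype.
same_dtype = np.dtype('int8') == np.dtype(np.int8)
print(same_values, same_dtype)
```

Since 'int8' and np.int8 resolve to the same dtype, any timing difference between those two spellings should be noise.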

Runtime test (since this post focuses on performance) -

In [17]: # Input arrays
    ...: aa = np.random.rand(1000,1000)
    ...: bb = np.random.rand(1000,1000)
    ...: 

In [18]: %timeit ((aa>0.5) & (bb>0.5)).astype(int)
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype(int)
    ...: %timeit np.where((aa>0.5) & (bb>0.5),1,0)
    ...: %timeit np.where(np.logical_and(aa>0.5, bb>0.5), 1, 0)
    ...: 
100 loops, best of 3: 9.13 ms per loop
100 loops, best of 3: 9.16 ms per loop
100 loops, best of 3: 10.4 ms per loop
100 loops, best of 3: 10.4 ms per loop

In [19]: %timeit ((aa>0.5) & (bb>0.5)).astype('int8')
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype('int8')
    ...: %timeit ((aa>0.5) & (bb>0.5)).astype('uint8')
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype('uint8')
    ...: 
    ...: %timeit ((aa>0.5) & (bb>0.5)).astype(np.int8)
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype(np.int8)
    ...: %timeit ((aa>0.5) & (bb>0.5)).astype(np.uint8)
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype(np.uint8)
    ...: 
100 loops, best of 3: 5.6 ms per loop
100 loops, best of 3: 5.61 ms per loop
100 loops, best of 3: 5.63 ms per loop
100 loops, best of 3: 5.63 ms per loop
100 loops, best of 3: 5.62 ms per loop
100 loops, best of 3: 5.62 ms per loop
100 loops, best of 3: 5.62 ms per loop
100 loops, best of 3: 5.61 ms per loop

In [20]: %timeit 1 * ((aa > 0.5) & (bb > 0.5)) #@BPL's vectorized soln
100 loops, best of 3: 10.2 ms per loop
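
%timeit is an IPython magic, so the session above will not run in a plain Python script. A rough equivalent using the standard library's timeit module might look like the sketch below; the candidates and results dictionaries and the repeat counts are choices made here for illustration, and the numbers it prints will differ from the ones above depending on hardware and NumPy version.

```python
import timeit

import numpy as np

# Same input shape as in the session above.
aa = np.random.rand(1000, 1000)
bb = np.random.rand(1000, 1000)

# A representative subset of the variants timed above.
candidates = {
    "astype(int)": lambda: ((aa > 0.5) & (bb > 0.5)).astype(int),
    "np.where": lambda: np.where((aa > 0.5) & (bb > 0.5), 1, 0),
    "astype(np.uint8)": lambda: ((aa > 0.5) & (bb > 0.5)).astype(np.uint8),
    "1 * mask": lambda: 1 * ((aa > 0.5) & (bb > 0.5)),
}

calls = 5  # calls per timing run
results = {}
for name, fn in candidates.items():
    # Take the best of 3 runs and convert to milliseconds per call.
    best = min(timeit.repeat(fn, repeat=3, number=calls)) / calls
    results[name] = best * 1e3
    print(f"{name:>18}: {results[name]:.2f} ms per loop")
```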