Updated: 2023-11-25 11:09:22
You can use np.where to set your desired value based on a boolean condition:
In [18]:
DF_test['value'] = np.where(DF_test['value'] > threshold, 1,0)
DF_test
Out[18]:
  c1 c2  value
0  a  p      0
1  b  q      0
2  c  r      1
3  d  s      1
4  e  t      0
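For a self-contained run of the step above, here is a minimal sketch. It assumes the 'value' column already holds floats. Only 0.12 appears in the original data; the other values and the threshold of 0.5 are made up for illustration, chosen so that the result matches the output shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only 0.12 comes from the original post, the rest is invented.
DF_test = pd.DataFrame({
    "c1": ["a", "b", "c", "d", "e"],
    "c2": ["p", "q", "r", "s", "t"],
    "value": [0.12, 0.30, 0.75, 0.90, 0.05],
})
threshold = 0.5  # assumed cutoff, not given in the original

# np.where(condition, x, y) picks x where the condition is True, else y
DF_test["value"] = np.where(DF_test["value"] > threshold, 1, 0)
print(DF_test)
```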
Note that because your data is a heterogeneous NumPy array, the 'value' column contains strings rather than floats:
In [58]:
DF_test.iloc[0]['value']
Out[58]:
'0.12'
So you'll need to convert the dtype to float first: DF_test['value'] = DF_test['value'].astype(float)
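To see where the strings come from, here is a small sketch. It assumes the frame was built from a mixed np.array, as the note above suggests; the sample rows are hypothetical apart from the value 0.12.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so mixing letters and numbers
# turns every element into a string. The second row is invented.
data = np.array([["a", "p", 0.12], ["b", "q", 0.75]])
DF_test = pd.DataFrame(data, columns=["c1", "c2", "value"])
print(type(DF_test.iloc[0]["value"]))  # a str, not a float

# Convert before comparing against a numeric threshold
DF_test["value"] = DF_test["value"].astype(float)
print(DF_test["value"].dtype)
```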
You can compare the timings:
In [16]:
%timeit np.where(DF_test['value'] > threshold, 1,0)
1000 loops, best of 3: 297 µs per loop
In [17]:
%%timeit
DF_naive = pd.DataFrame()
for i in range(DF_test.shape[0]):
    # Get first 2 columns
    first2cols = list(DF_test.ix[i][:-1])
    # Check if value is greater than threshold
    binary_value = [int((bool(float(DF_test.ix[i][-1]) > threshold)))]
    # Create series object
    SR_row = pd.Series(first2cols + binary_value, name=i)
    # Add to empty dataframe container
    DF_naive = DF_naive.append(SR_row)
10 loops, best of 3: 39.3 ms per loop
The np.where version is over 100x faster. Admittedly, your code is doing a lot of unnecessary work, but you get the point.
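The loop timed above relies on .ix and DataFrame.append, both of which have been removed from recent pandas releases. Here is a rough sketch for repeating the comparison with the standard timeit module instead of the IPython magics. The data, the threshold, and the helper names vectorized and naive are all assumptions, and the loop is simplified to compute only the binary column, so the absolute numbers will differ from those quoted above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data and cutoff, for illustration only
rng = np.random.default_rng(0)
DF_test = pd.DataFrame({
    "c1": list("abcde") * 20,
    "c2": list("pqrst") * 20,
    "value": rng.random(100),
})
threshold = 0.5


def vectorized(df):
    # One pass over the whole column
    return np.where(df["value"] > threshold, 1, 0)


def naive(df):
    # Row-by-row check, using .iloc in place of the removed .ix
    out = []
    for i in range(df.shape[0]):
        out.append(int(df.iloc[i, -1] > threshold))
    return np.array(out)


t_fast = timeit.timeit(lambda: vectorized(DF_test), number=20)
t_slow = timeit.timeit(lambda: naive(DF_test), number=20)
print(f"np.where: {t_fast:.4f}s, loop: {t_slow:.4f}s")
```

Both helpers return the same array, so the only difference being measured is how the comparison is carried out.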