Python: numpy/pandas change values on condition

Updated: 2022-10-17 23:15:55

I would like to know if there is a faster and more "pythonic" way of doing the following, e.g. using some built-in methods. Given a pandas DataFrame or numpy array of floats, if a value is less than or equal to 0.5, I need to calculate its reciprocal, multiply it by -1, and replace the old value with the newly calculated one. "Transform" is probably a bad choice of words, so please tell me if you have a better or more accurate description.

Thank you for your help and support!!

Data:

import numpy as np
import pandas as pd
dicti = {"A" : np.arange(0.0, 3, 0.1), 
         "B" : np.arange(0, 30, 1),
         "C" : list("ELVISLIVES")*3}
df = pd.DataFrame(dicti)

my function:

def transform_colname(df, colname):
    series = df[colname]    
    newval_list = []
    for val in series:
        if val <= 0.5:
            newval = (1/val)*-1
            newval_list.append(newval)
        else:
            newval_list.append(val)
    df[colname] = newval_list
    return df

function call:

transform_colname(df, colname="A")

**--> I'm summing up the results here, since comments don't allow posting code (or I don't know how to do it).**

Thank you all for your fast and great answers!!

using ipython "%timeit" with "real" data:

my function: 10 loops, best of 3: 24.1 ms per loop

from jojo:

def transform_colname_v2(df, colname):
    series = df[colname]        
    df[colname] = np.where(series <= 0.5, 1/series*-1, series)
    return df

100 loops, best of 3: 2.76 ms per loop

from FooBar:

def transform_colname_v3(df, colname):
    df.loc[df[colname] <= 0.5, colname]  = - 1 / df[colname][df[colname] <= 0.5]
    return df

100 loops, best of 3: 3.32 ms per loop

from dmvianna:

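def transform_colname_v4(df, colname):
    # Series.where keeps values where the condition is True and replaces the rest,
    # so the condition must be the inverse of "<= 0.5".
    df[colname] = df[colname].where(df[colname] > 0.5, -1 / df[colname])
    return df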

100 loops, best of 3: 3.7 ms per loop
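
The timings above come from IPython's %timeit on the asker's "real" data, which is not included in the post. For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard-library timeit module on synthetic data. The data size, the random seed, and the helper names with_loop and with_np_where are assumptions made for illustration, and the absolute numbers will differ from those quoted above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the "real" data (assumption: 10,000 floats, no zeros).
rng = np.random.default_rng(0)
base = pd.DataFrame({"A": rng.uniform(0.01, 3.0, size=10_000)})


def with_loop(df, colname):
    # Same element-by-element logic as the original function,
    # written as a list comprehension.
    df[colname] = [-1 / v if v <= 0.5 else v for v in df[colname]]
    return df


def with_np_where(df, colname):
    # Vectorized version based on np.where.
    series = df[colname]
    df[colname] = np.where(series <= 0.5, -1 / series, series)
    return df


for func in (with_loop, with_np_where):
    # Each call gets a fresh copy because the functions modify df in place;
    # the cost of the copy is therefore included in the measurement.
    seconds = timeit.timeit(lambda: func(base.copy(), "A"), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```

Because every function here overwrites the column it is given, timing it repeatedly on the same DataFrame would measure already-transformed data after the first run, which is why the sketch copies the frame for each call.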

Please tell/show me if you would implement your code in a different way!

One final question (answered): how could FooBar's and dmvianna's versions be made "generic"? I had to hard-code the column name in the function, since using a variable for it didn't work. Please explain this last point! --> Thanks, jojo: ".loc" isn't the right way, but a simple df[colname] is sufficient. I changed the functions above to be more "generic" (and also changed ">" to "<=" and updated the timings).

Thank you very much!!

If we are talking about arrays:

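import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
# Boolean indexing selects the elements <= 0.5; then take the negative reciprocal.
print(-1 / a[a <= 0.5])

This will, however, return only the transformed values (those less than or equal to 0.5); the values above 0.5 are missing from the result.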
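
If the goal is to overwrite the matching elements while keeping the array at its original length, the same boolean mask can also be used on the left-hand side of an assignment. A minimal sketch; the variable name mask is just for illustration:

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)

# Boolean mask marking the elements to transform.
mask = a <= 0.5

# Assign through the mask: only the selected positions are overwritten,
# so the array keeps its original length and the other values stay put.
a[mask] = -1 / a[mask]

print(a)
```

This modifies a in place, so work on a copy (a.copy()) if the original values are still needed.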

Alternatively use np.where:

import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
# Where a <= 0.5, take the negative reciprocal; elsewhere keep the original value.
print(np.where(a <= 0.5, -1 / a, a))
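
One thing to keep in mind: np.where computes both candidate arrays in full before choosing between them, so the division runs for every element. With a zero in the input (the sample column "A" in the question starts at 0.0) NumPy emits a divide-by-zero RuntimeWarning and produces -inf at that position. The question does not say how zeros should be treated, so the following is only a sketch under the assumption that zeros should be left unchanged; it uses the out= and where= arguments of np.divide so that the division is only carried out where the mask is True.

```python
import numpy as np

a = np.array([0.0, 0.1, 0.5, 0.6, 2.0], dtype=float)

# Transform only values that are <= 0.5 and non-zero
# (assumption: zeros are left as they are).
mask = (a <= 0.5) & (a != 0)

# Start from a copy and let np.divide write only where the mask is True;
# all other positions keep the value copied from a.
result = a.copy()
np.divide(-1.0, a, out=result, where=mask)

print(result)
```

If -inf is an acceptable result for zeros, the plain np.where call can be kept and the warning silenced locally with np.errstate(divide="ignore") instead.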

Talking about pandas DataFrame:

As in @dmvianna's answer (so give some credit to him ;) ), adapting it to pd.DataFrame:

df.a = df.a.where(df.a > 0.5, (1 / df.a) * (-1))
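
Note that the condition here is "> 0.5" rather than "<= 0.5": Series.where keeps the values where the condition is True and replaces the others. pandas also offers Series.mask, which works the other way round, so the condition can be written the way the question states it. A small sketch with a made-up DataFrame (the column name a follows the line above; the values are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"a": [0.1, 0.25, 0.5, 0.6, 2.0]})

# where: keep values where the condition is True, replace the rest.
via_where = df["a"].where(df["a"] > 0.5, -1 / df["a"])

# mask: replace values where the condition is True, keep the rest.
via_mask = df["a"].mask(df["a"] <= 0.5, -1 / df["a"])

# Both spellings give the same Series.
print(via_where.equals(via_mask))
print(via_mask.tolist())
```

Which of the two to use is mostly a matter of which condition reads more naturally for the task at hand.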