
Mapping ranges of values in a pandas DataFrame

Updated: 2023-02-26 17:34:07

There are a few alternatives.

You can construct a list of boundaries and then use specialist library functions such as pd.cut or np.digitize. This is described in @EdChum's solution, and also in this answer.
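As a rough illustration of the boundary-list approach, here is a minimal sketch using pd.cut and np.digitize. The sample values, the bin edges, and the column names b_cut and b_digitize are assumptions chosen for this example; they are not taken from @EdChum's solution.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, fixed so the result is reproducible
df = pd.DataFrame({'a': [1, 7, 5, 3, 4, 9]})

# Assumed bin edges: (0, 3] -> 1, (3, 7] -> 2, (7, 10] -> 3
bins = [0, 3, 7, 10]
labels = [1, 2, 3]

# pd.cut returns a categorical column; cast it to int for plain integer codes
df['b_cut'] = pd.cut(df['a'], bins=bins, labels=labels).astype(int)

# np.digitize with right=True returns i such that bins[i-1] < x <= bins[i]
df['b_digitize'] = np.digitize(df['a'].to_numpy(), bins, right=True)
```

Both columns should hold the same codes here, because the right-closed intervals of pd.cut match right=True in np.digitize. Values outside the outer edges behave differently: pd.cut yields NaN, while np.digitize returns 0 or len(bins).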

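import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(1, 10, 10), columns=['a'])

# NumPy via np.select: one Boolean condition per range (between is inclusive at both ends)
criteria = [df['a'].between(1, 3), df['a'].between(4, 7), df['a'].between(8, 10)]
values = [1, 2, 3]

# Each row takes the value of its first matching condition; 0 where none matches
df['b'] = np.select(criteria, values, 0)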

The elements of criteria are Boolean Series, so for lists of values you can use df['a'].isin([1, 3]), etc.
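To make the isin variant concrete, here is a small sketch that maps explicit lists of values instead of ranges. The sample data and the target values 10 and 20 are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'a': [1, 2, 3, 5, 8]})

# Each condition is a Boolean Series; isin matches an explicit list of values
criteria = [df['a'].isin([1, 3]), df['a'].isin([2, 8])]
values = [10, 20]

# Rows that match no condition receive the default value, 0
df['b'] = np.select(criteria, values, default=0)
```

Since np.select uses the first condition that matches, the order of criteria matters if the value lists overlap.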

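# Pure Python via dict + next: map each range to a value; 0 is the fallback when no range contains x
d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}

df['c'] = df['a'].apply(lambda x: next((v for k, v in d.items() if x in k), 0))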

print(df)

   a  b  c
0  1  1  1
1  7  2  2
2  5  2  2
3  1  1  1
4  3  1  1
5  5  2  2
6  4  2  2
7  4  2  2
8  9  3  3
9  3  1  1