且构网

Efficiently writing to multiple adjacent columns in pandas

Updated: 2023-11-18 21:25:52

We are adding the ability to index directly into the underlying data, even in a multi-dtype frame. This is in master now and will be in 0.17.0. You can do the same in versions before 0.17.0, but it requires more manipulation of the internals.

In [1]: from pandas import DataFrame
   ...: df = DataFrame({'A': range(5), 'B': range(6, 11), 'C': 'foo'})

In [2]: df.dtypes
Out[2]: 
A     int64
B     int64
C    object
dtype: object

The copy=False flag is new. as_blocks gives you a dict of dtype -> block, where each block is a sub-frame holding the columns of that dtype. With copy=False the blocks are views rather than copies.

In [3]: b = df.as_blocks(copy=False)

In [4]: b
Out[4]: 
{'int64':    A   B
 0  0   6
 1  1   7
 2  2   8
 3  3   9
 4  4  10, 'object':      C
 0  foo
 1  foo
 2  foo
 3  foo
 4  foo}
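For readers on a pandas version without as_blocks (as far as I know it was deprecated in 0.21.0 and removed in 1.0), the dtype -> sub-frame grouping itself can be approximated with public API only. This is a minimal sketch, not the answer's method: the helper name split_by_dtype is hypothetical, and the sub-frames it returns should be assumed to be copies, so it reproduces the shape of the result above but not the copy=False view behaviour.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above, with the integer dtype made explicit.
df = pd.DataFrame({
    'A': np.arange(5, dtype='int64'),
    'B': np.arange(6, 11, dtype='int64'),
    'C': 'foo',
})


def split_by_dtype(frame):
    """Hypothetical helper: map each dtype name to a sub-frame of its columns."""
    columns_by_dtype = {}
    for column, dtype in frame.dtypes.items():
        columns_by_dtype.setdefault(str(dtype), []).append(column)
    # Selecting a list of columns should be treated as producing a copy.
    return {name: frame[cols] for name, cols in columns_by_dtype.items()}


blocks = split_by_dtype(df)
print(blocks['int64'])
```

The name of the key for column C depends on the pandas version (object in the session above), so the sketch only relies on the 'int64' key.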

Here is the underlying numpy array of the int64 block.

In [5]: b['int64'].values
Out[5]: 
array([[ 0,  6],
       [ 1,  7],
       [ 2,  8],
       [ 3,  9],
       [ 4, 10]])

This is the id of the corresponding array in the original frame.

In [7]: id(df._data.blocks[0].values)
Out[7]: 4429267232

Here is the id of the base of our view. The two ids are the same, so the view refers to the array held by the original frame.

In [8]: id(b['int64'].values.base)
Out[8]: 4429267232
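The comparison above goes through .base because .values on the block is itself a view (pandas keeps a 2-D block with one row per column, so .values is the transpose of the stored array). The same relationship can be sketched in plain NumPy; the array named stored below is a stand-in for the block's internal array, not something taken from pandas.

```python
import numpy as np

# Stand-in for the internally stored block: one row per column (A, then B).
stored = np.array([[0, 1, 2, 3, 4],
                   [6, 7, 8, 9, 10]], dtype='int64')

# Transposing returns a view; no data is copied.
values = stored.T

# The view's .base is the array that owns the memory.
same_object = values.base is stored
shares = np.shares_memory(values, stored)
print(same_object, shares)
```

np.shares_memory is usually a more robust check than comparing id() values, since it does not depend on which intermediate view object you happen to hold.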

Now you can access the frame and modify it with pandas setting operations. You can also access the numpy array directly via .values, which is now a view into the original data.

Modifications incur no speed penalty, because no copies are made as long as you don't change the dtype of the data itself (e.g. don't try to put a string here; it will work, but the view will be lost).

In [9]: b['int64'].loc[0,'A'] = -1

In [11]: b['int64'].values[0,1] = -2

Since we have a view, both modifications change the underlying data of the original frame.

In [12]: df
Out[12]: 
   A   B    C
0 -1  -2  foo
1  1   7  foo
2  2   8  foo
3  3   9  foo
4  4  10  foo
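The write-through behaviour and the dtype caveat can both be illustrated with NumPy alone. This is a sketch under the assumption that a NumPy view behaves like the view exposed by .values above; the variable names are illustrative only.

```python
import numpy as np

original = np.array([[0, 6],
                     [1, 7],
                     [2, 8]], dtype='int64')

# Basic slicing returns a view, so writes go straight to the original.
view = original[:, :]
view[0, 0] = -1
view[0, 1] = -2

# Changing the dtype allocates a new array; the link to the original is gone.
converted = view.astype(object)
converted[1, 0] = 'a string'

print(original)
```

After the astype call, the string lands only in converted, while original still holds the integer 1 at that position.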

Note that if you modify the shape of the data (e.g. by adding a column), the views will be lost.
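As an analogy only (this is not a description of what pandas does internally when a column is added), NumPy shows why a change of shape tends to break views: a fixed-size array cannot grow in place, so the wider result lives in newly allocated memory and old views keep pointing at the old buffer.

```python
import numpy as np

original = np.array([[0, 6],
                     [1, 7],
                     [2, 8]], dtype='int64')
view = original[:, :]

# "Adding a column" builds a new, wider array instead of growing the old one.
extra = np.zeros(len(original), dtype=original.dtype)
wider = np.column_stack([original, extra])

# The old view still writes to the old buffer, not to the wider array.
view[0, 0] = -1

print(wider[0, 0], original[0, 0])
```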