且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

的Python:转排序,重复值的单个数组到数组的数组?

更新时间:2022-10-19 14:05:58

itertools.groupby 使得这个简单

 导入和itertoolswanted_array = [列表(GRP)为_,GRP在itertools.groupby(my_first_array)

由于没有的功能,它只是收益取值由相同值的运行组,所以你列表化每一个列表中的COM prehension;十分简单。你可以把它看作是一个基本上在Python的API做了GNU工具包项目的工作, uniq的和相关业务。

在CPython中(参考间preTER), GROUPBY 用C实现的,它懒洋洋地和线性操作;数据必须已经出现在运行匹配的功能,所以整理可能会使它太昂贵,但对于已经被排序数据,如你有,没有什么会更有效

请注意:如果输入可能值相同,但不同的对象,它可能是有意义的内存原因改变列表(GRP)为_,GRP [K] * LEN(列表(GRP))为K,GRP 。在最终结果前者将保留原来的(可能值但未同一性重复)对象,后者将复制从各组的第一个对象,而不是,减少每个组的最终成本的低成本的N 单个对象的引用,而不是 N 1 之间的引用>和 N 的对象。

I have a sorted array with some repeated values. How can this array be turned into an array of arrays with the subarrays grouped by value (see below)? In actuality, my_first_array has ~8 million entries, so the solution would preferably be as time efficient as possible.

my_first_array = [1,1,1,3,5,5,9,9,9,9,9,10,23,23]

wanted_array = [ [1,1,1], [3], [5,5], [9,9,9,9,9], [10], [23,23] ]

itertools.groupby makes this trivial:

import itertools

wanted_array = [list(grp) for _, grp in itertools.groupby(my_first_array)]

With no key function, it just yields groups consisting of runs of identical values, so you list-ify each one in a list comprehension; easy-peasy. You can think of it as basically a within-Python API for doing the work of the GNU toolkit program, uniq, and related operations.

In CPython (the reference interpreter), groupby is implemented in C, and it operates lazily and linearly; the data must already appear in runs matching the key function, so sorting might make it too expensive, but for already sorted data like you have, there is nothing that will be more efficient.

Note: If the inputs might be value identical, but different objects, it may make sense for memory reasons to change list(grp) for _, grp to [k] * len(list(grp)) for k, grp. The former would retain the original (possibly value but not identity duplicate) objects in the final result, the latter would replicate the first object from each group instead, reducing the final cost per group to the cost of N references to a single object, instead of N references to between 1 and N objects.