且构网


Removing duplicate columns and rows from a NumPy 2D array

Updated: 2022-10-19 20:10:12


I'm using a 2D array to store longitude/latitude pairs. At one point I have to merge two of these 2D arrays and then remove any duplicate entries. I've been searching for a function similar to numpy.unique, but I've had no luck. Every implementation I've thought of looks very unoptimized. For example, I've tried converting the array to a list of tuples, removing duplicates with set, and then converting back to an array:

coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))

Are there any existing solutions, so I do not reinvent the wheel?

To make it clear, I'm looking for:

>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1], [2, 3], [5, 4]])

BTW, I wanted to use just a list of tuples for it, but the lists were so big that they consumed my 4 GB of RAM plus 4 GB of swap (numpy arrays are more memory efficient).

Here's one idea. It'll take a little bit of work but could be quite fast. I'll give you the 1d case and let you figure out how to extend it to 2d. The following function finds the unique elements of a 1d array:

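import numpy as np

def unique(a):
    a = np.sort(a)        # identical values become adjacent
    b = np.diff(a)        # zero wherever a value repeats its predecessor
    b = np.r_[1, b]       # always keep the first element
    return a[b != 0]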
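
To see what each step does, here is a small worked example. It is only a sketch: the name unique_1d and the sample values are made up for illustration, and the function body repeats the steps above.

```python
import numpy as np

def unique_1d(a):
    # Hypothetical name; same steps as the function above.
    a = np.sort(a)
    b = np.r_[1, np.diff(a)]
    return a[b != 0]

values = np.array([3, 1, 2, 3, 1])

# Intermediate arrays, computed separately so they can be inspected.
sorted_values = np.sort(values)             # [1 1 2 3 3]
steps = np.r_[1, np.diff(sorted_values)]    # [1 0 1 1 0]

result = unique_1d(values)
print(result)                               # [1 2 3]
```

A zero in steps marks an element equal to the one before it, so only positions with a nonzero entry survive the final mask.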

Now, to extend it to 2d you need to change two things. First, you will need to figure out how to do the sort yourself; the important thing about the sort is that identical entries end up next to each other. Second, you'll need to do something like (b != 0).any(axis), because you want to compare whole rows/columns: a row is new if any of its elements differs from the previous row. Let me know if that's enough to get you started.

Updated: with some help from doug, I think this should work for the 2d case.

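import numpy as np

def unique(a):
    order = np.lexsort(a.T)            # sort rows so identical rows are adjacent
    a = a[order]
    diff = np.diff(a, axis=0)          # row-to-row differences
    ui = np.ones(len(a), dtype=bool)   # first row is always kept
    ui[1:] = (diff != 0).any(axis=1)   # keep rows that differ from the previous row
    return a[ui]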
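
As a quick check, the sketch below runs the same approach on the array from the question, under the name unique_rows that the question uses. The column example (array b) is made up for illustration. It also compares the result with np.unique(a, axis=0), which should be available in NumPy 1.13 and later; the row order of the two results can differ in general, so the comparison is done on sets of rows.

```python
import numpy as np

def unique_rows(a):
    # Same approach as above, under the name used in the question.
    order = np.lexsort(a.T)              # last column is the primary sort key
    a = a[order]
    changed = (np.diff(a, axis=0) != 0).any(axis=1)
    keep = np.ones(len(a), dtype=bool)   # first row is always kept
    keep[1:] = changed
    return a[keep]

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
rows = unique_rows(a)
print(rows)

# Duplicate columns: transpose, deduplicate rows, transpose back.
b = np.array([[1, 2, 1],
              [3, 4, 3]])
cols = unique_rows(b.T).T
print(cols)

# Cross-check against NumPy's own row-wise unique.
builtin = np.unique(a, axis=0)
same_rows = set(map(tuple, rows)) == set(map(tuple, builtin))
print(same_rows)
```

If the installed NumPy supports the axis argument, np.unique(a, axis=0) is probably the simpler choice; the hand-written version mainly shows how the sort-and-compare idea works.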