pandas: unique values across multiple columns

Updated: 2023-02-16 15:38:34

One way is to select the columns and pass them to np.unique:

>>> np.unique(df[['Col1', 'Col2']])
array(['Bill', 'Bob', 'Joe', 'Mary', 'Steve'], dtype=object)
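The DataFrame behind this output is not shown above, so here is a minimal self-contained sketch of the same call. The sample data is an assumption, chosen only so that it reproduces the values printed above, and `to_numpy()` is used in place of passing the DataFrame directly so the result does not depend on the pandas/NumPy version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption; the original DataFrame is not shown).
df = pd.DataFrame({
    "Col1": ["Bob", "Joe", "Bill", "Mary", "Joe"],
    "Col2": ["Joe", "Steve", "Bob", "Bob", "Steve"],
    "Col3": np.random.random(5),
})

# Select the two columns and convert them to a 2D NumPy array.
# np.unique flattens its input and returns the unique values in sorted order.
unique_sorted = np.unique(df[["Col1", "Col2"]].to_numpy())
print(unique_sorted)
```

Because np.unique sorts, the names come back alphabetically rather than in the order they appear in the DataFrame.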

Note that some versions of pandas/NumPy may require you to pass the underlying values of the columns explicitly, using the .values attribute:

np.unique(df[['Col1', 'Col2']].values)

A faster way is to use pd.unique. This function uses a hashtable-based algorithm instead of NumPy's sort-based algorithm. You will need to pass a 1D array using ravel():

>>> pd.unique(df[['Col1', 'Col2']].values.ravel())
array(['Bob', 'Joe', 'Steve', 'Bill', 'Mary'], dtype=object)
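As a self-contained sketch of the pd.unique approach: the sample data below is again an assumption, picked to match the output shown above. It illustrates that ravel() flattens the 2D array row by row, and that pd.unique returns values in order of first appearance rather than sorted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption; the original DataFrame is not shown).
df = pd.DataFrame({
    "Col1": ["Bob", "Joe", "Bill", "Mary", "Joe"],
    "Col2": ["Joe", "Steve", "Bob", "Bob", "Steve"],
})

# Flatten the 2D array to 1D; the default order reads row by row:
# Bob, Joe, Joe, Steve, Bill, Bob, ...
flat = df[["Col1", "Col2"]].to_numpy().ravel()

# pd.unique keeps the first occurrence of each value, in encounter order.
unique_in_order = pd.unique(flat)
print(unique_in_order)
```

If a sorted result is needed, sorting the (usually small) array of unique values afterwards is typically cheaper than sorting the full input.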

The difference in speed is significant for larger DataFrames:

>>> df1 = pd.concat([df]*100000) # DataFrame with 500000 rows
>>> %timeit np.unique(df1[['Col1', 'Col2']].values)
1 loops, best of 3: 619 ms per loop

>>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel())
10 loops, best of 3: 49.9 ms per loop
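The %timeit lines above only work in IPython. A rough equivalent using the standard-library timeit module might look like the sketch below; the sample data and the smaller repeat count are assumptions made to keep it quick, and the absolute numbers will vary by machine and library version.

```python
import timeit

import pandas as pd
import numpy as np

# Hypothetical sample data (an assumption; the original DataFrame is not shown).
df = pd.DataFrame({
    "Col1": ["Bob", "Joe", "Bill", "Mary", "Joe"],
    "Col2": ["Joe", "Steve", "Bob", "Bob", "Steve"],
})

# Repeat the 5-row frame to get a larger one (50,000 rows here).
big = pd.concat([df] * 10_000, ignore_index=True)

def with_numpy():
    # Sort-based uniqueness over the flattened values.
    return np.unique(big[["Col1", "Col2"]].to_numpy())

def with_pandas():
    # Hash-table-based uniqueness; needs a 1D input.
    return pd.unique(big[["Col1", "Col2"]].to_numpy().ravel())

runs = 5
t_np = timeit.timeit(with_numpy, number=runs)
t_pd = timeit.timeit(with_pandas, number=runs)
print(f"np.unique: {t_np / runs * 1e3:.1f} ms per call")
print(f"pd.unique: {t_pd / runs * 1e3:.1f} ms per call")
```

Both functions return the same set of names; only the order of the result and the running time differ.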
