且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

根据三列将一个Pandas数据框中的行与另一个数据框中的行进行匹配

更新时间:2023-08-28 16:33:34

可能有更有效的方法,但是这里的方法没有那些慢的for循环:

There is probably an even more efficient way to do this, but here is a method without those slow for loops:

import pandas as pd

dfB = pd.DataFrame({'X':[1,2,3],'Y':[1,2,3], 'Time':[10,20,30]})
dfA = pd.DataFrame({'X':[1,1,2,2,2,3],'Y':[1,1,2,2,2,3], 'ONSET_TIME':[5,7,9,16,22,28],'COLOR': ['Red','Blue','Blue','red','Green','Orange']})

#create one single table
mergeDf = pd.merge(dfA, dfB, left_on = ['X','Y'], right_on = ['X','Y'])
#remove rows where time is less than onset time
filteredDf = mergeDf[mergeDf['ONSET_TIME'] < mergeDf['Time']]
#take min time (closest to onset time)
groupedDf = filteredDf.groupby(['X','Y']).max()

print filteredDf

 COLOR  ONSET_TIME  X  Y  Time
0     Red           5  1  1    10
1    Blue           7  1  1    10
2    Blue           9  2  2    20
3     red          16  2  2    20
5  Orange          28  3  3    30


print groupedDf

COLOR  ONSET_TIME  Time
X Y                          
1 1     Red           7    10
2 2     red          16    20
3 3  Orange          28    30

基本思想是合并两个表,以便将时间一起放在一个表中.然后,我筛选了最大的记录(最接近dfB上的时间).如果您对此有疑问,请告诉我.

The basic idea is to merge the two tables so you have the times together in one table. Then I filtered on the recs that are the largest (closest to the time on your dfB). Let me know if you have questions about this.