且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用动态键加入pyspark数据框

更新时间:2023-11-18 20:03:28

不能那样使用.表示法,但是可以将timePeriodgetItem(方括号)运算符一起使用.

You can not use the . notation like that, but you can use timePeriod with the getItem (square brackets) operator.

由于captureRate DataFrame中的相应列略有不同,因此请创建一个新变量:

Since the corresponding columns in the captureRate DataFrame are slightly different, create a new variable:

# turns "year_mon" into "yr_mon" and "year_qtr" into "yr_qtr"
timePeriodCapture = timePeriod.replace("year", "yr")  

capturedPatients = PatientCounts.join(
    captureRate, 
    on=PatientCounts[timePeriod] == captureRate[timePeriodCapture]
    how="left_outer"
)

或者,如果联接列始终位于相同位置,则可以通过按索引访问列来创建联接条件:

Alternatively if the join columns are always in the same positions, you can create a join condition by accessing the columns by index:

capturedPatients = PatientCounts.join(
    captureRate, 
    on=PatientCounts[0] == captureRate[1], 
    how="left_outer"
)

查看更多: