PySpark: Concatenating function-generated columns into a new DataFrame

Updated: 2022-12-12 09:36:09

Use select.

To make the code a little more compact, first collect the columns to diff in a list:

diff_columns = [c for c in df.columns if c != 'index']

Next, select the index and iterate over diff_columns to compute each new column. Use .alias() to rename the resulting columns:

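# Assumption: func and w are not defined in this excerpt; they are taken to be
# pyspark.sql.functions and a window ordered by the index column.
from pyspark.sql import Window, functions as func
w = Window.orderBy('index')
df_diff = df.select(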
    'index',
    *[(func.log(func.col(c)) - func.log(func.lag(func.col(c)).over(w))).alias(c + "_diff")
      for c in diff_columns]
)
df_diff.show()
#+-----+------------------+-------------------+-------------------+
#|index|         col1_diff|          col2_diff|          col3_diff|
#+-----+------------------+-------------------+-------------------+
#|    1|              null|               null|               null|
#|    2| 0.693147180559945| 0.6931471805599454| 0.6931471805599454|
#|    3|0.4054651081081646|0.40546510810816416|0.40546510810816416|
#+-----+------------------+-------------------+-------------------+
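
To see what each _diff column holds without a Spark session, the same calculation can be sketched in plain Python. This is only an illustration: the input rows below are hypothetical (the original DataFrame is not shown here), and the log_diff helper is a made-up name that mimics lag() over a window ordered by index by remembering the previous row.

```python
import math

# Hypothetical sample data: the original input DataFrame is not shown,
# so these rows are assumed for illustration only.
rows = [
    {"index": 1, "col1": 1.0, "col2": 2.0, "col3": 3.0},
    {"index": 2, "col1": 2.0, "col2": 4.0, "col3": 6.0},
    {"index": 3, "col1": 3.0, "col2": 6.0, "col3": 9.0},
]

def log_diff(rows, key="index"):
    """Return log(current) - log(previous) for every non-key column."""
    ordered = sorted(rows, key=lambda r: r[key])
    diff_columns = [c for c in ordered[0] if c != key]
    result = []
    previous = None  # stands in for lag(): the first row has no predecessor
    for row in ordered:
        out = {key: row[key]}
        for c in diff_columns:
            out[c + "_diff"] = (
                None if previous is None
                else math.log(row[c]) - math.log(previous[c])
            )
        result.append(out)
        previous = row
    return result

diffs = log_diff(rows)
for d in diffs:
    print(d)
```

As in the Spark output, the first row has no previous value, so every _diff entry for it is None (null in Spark).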
