且构网

Replacing newline (\n) characters in a CSV file - Spark Scala

Updated: 2023-12-03 19:16:52

If you can use Spark SQL 1.5 or higher, you may consider using the built-in column functions. If you don't know (or don't have) the column names, you can address the column by position, as in the following snippet:

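import org.apache.spark.sql.functions._

// toDF() needs the SQL implicits in scope (e.g. import sqlContext.implicits._)
val df = test.toDF()

// Replace each \r or \n in the fifth column (index 4) with "|"
val newDF = df.withColumn(df.columns(4), regexp_replace(col(df.columns(4)), "[\\r\\n]", "|"))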

If you know the name of the column, you can replace df.columns(4) with its name in both occurrences.
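One thing worth checking about the pattern "[\\r\\n]": it matches a single character at a time, so a Windows-style line break (\r\n) ends up as two "|" characters. The sketch below shows this on plain strings, with no Spark involved. It assumes that regexp_replace follows Java regex semantics, as String.replaceAll does. The object and method names are made up for illustration, and the "[\\r\\n]+" variant is a suggestion, not part of the original answer.

```scala
object NewlineRegexDemo {
  // Same pattern as in the snippet above: every single \r or \n becomes "|".
  def replaceEach(s: String): String = s.replaceAll("[\\r\\n]", "|")

  // Variant: a whole run of line-break characters becomes a single "|".
  def replaceRuns(s: String): String = s.replaceAll("[\\r\\n]+", "|")

  def main(args: Array[String]): Unit = {
    val field = "first line\r\nsecond line\nthird line"
    println(replaceEach(field)) // first line||second line|third line
    println(replaceRuns(field)) // first line|second line|third line
  }
}
```

If one separator per line break is what you want, passing "[\\r\\n]+" to regexp_replace should behave like replaceRuns here. Note that it also collapses blank lines (\n\n) into a single "|", which may or may not be what you need.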

I hope that helps. Cheers.