且构网


Moving a Spark DataFrame from Python to Scala with Zeppelin

Updated: 2023-09-19 15:23:58

You can put the internal Java object, not the Python wrapper, into the ZeppelinContext:

%pyspark

df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
# share the underlying Java DataFrame, not the Python wrapper
z.put("df", df._jdf)

and then make sure you use the correct type:

val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
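Once cast, the handle behaves like any Scala-side DataFrame. A quick sanity check, continuing from the cast above (a sketch; the schema comment assumes the example data shown earlier):

%spark
df.printSchema()
// root
//  |-- k: long (nullable = true)
//  |-- v: string (nullable = true)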

but it is better to register a temporary table:

%pyspark

# registerTempTable in Spark 1.x
df.createTempView("df")

and use SQLContext.table to read it:

// sqlContext.table in Spark 1.x
val df = spark.table("df")

// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]

To convert in the opposite direction, see Zeppelin: Scala Dataframe to python.
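The opposite direction follows the same pattern: put the Scala DataFrame into the ZeppelinContext, then rebuild a Python wrapper around the retrieved Java object. A minimal sketch (the name scalaDF is hypothetical, and it assumes Spark 2.x where sqlContext is predefined in the %pyspark interpreter):

%spark
val scalaDF = sc.parallelize(Seq((1, "foo"), (2, "bar"))).toDF("k", "v")
z.put("scalaDF", scalaDF)

%pyspark
from pyspark.sql import DataFrame

# wrap the Java object retrieved from the ZeppelinContext back into a PySpark DataFrame
python_df = DataFrame(z.get("scalaDF"), sqlContext)

Here, too, registering a temporary view on the Scala side and reading it with spark.table on the Python side avoids touching internal objects at all.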