How to get the path of an uploaded file

Updated: 2023-11-29 19:59:40

The local path to a file distributed with the SparkFiles mechanism (the --files argument or the SparkContext.addFile method) can be obtained with SparkFiles.get:

org.apache.spark.SparkFiles.get(fileName)
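Here is a minimal sketch of the round trip in local mode. It distributes a file with addFile and then resolves its local copy inside tasks. The object name SparkFilesGetExample, the helper executorPaths and the local[2] master are illustrative assumptions and are not part of the original answer. Note that SparkFiles.get takes the file name, not the original path.

```scala
import java.io.File
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

// Hypothetical example object; names are illustrative only.
object SparkFilesGetExample {
  // Distributes `localPath` with addFile, then resolves the file's local
  // copy inside tasks with SparkFiles.get (one path per partition).
  def executorPaths(sc: SparkContext, localPath: String): Array[String] = {
    sc.addFile(localPath) // same effect as shipping the file with --files
    // SparkFiles.get expects the bare file name, not the original path
    val fileName = new File(localPath).getName
    sc.parallelize(1 to 2, numSlices = 2)
      .map(_ => SparkFiles.get(fileName)) // runs on the executors
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, with the file path passed as the first argument
    val conf = new SparkConf().setAppName("sparkfiles-get").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try executorPaths(sc, args(0)).foreach(println)
    finally sc.stop()
  }
}
```

On a real cluster, each printed path points into the SparkFiles root directory of the executor that ran the task.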

You can also get the path to the root directory that holds all distributed files using SparkFiles.getRootDirectory:

org.apache.spark.SparkFiles.getRootDirectory()

You can combine these with standard I/O utilities to read the files.
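Here is a minimal sketch of that combination. It reads a distributed file with scala.io.Source and lists the root directory with java.io.File. It assumes the file was already added with addFile or --files and is UTF-8 text. The object ReadDistributedFile and its helpers are hypothetical names. The object extends Serializable so that the closure passed to map can be shipped to executors safely.

```scala
import java.io.File
import scala.io.Source
import org.apache.spark.{SparkContext, SparkFiles}

// Hypothetical helper object; Serializable so task closures can reference it.
object ReadDistributedFile extends Serializable {
  // Reads all lines of a distributed file from its local copy.
  def readLocalCopy(fileName: String): List[String] = {
    val source = Source.fromFile(SparkFiles.get(fileName), "UTF-8")
    try source.getLines().toList
    finally source.close()
  }

  // Lists the names of regular files in the SparkFiles root directory.
  def listRoot(): List[String] =
    Option(new File(SparkFiles.getRootDirectory()).listFiles())
      .map(_.filter(_.isFile).map(_.getName).toList)
      .getOrElse(Nil)

  // Reads the file inside tasks, so each executor uses its own local copy.
  def readOnExecutors(sc: SparkContext, fileName: String): Array[List[String]] =
    sc.parallelize(1 to 2, numSlices = 2)
      .map(_ => readLocalCopy(fileName))
      .collect()
}
```

Because addFile also fetches the file on the driver, readLocalCopy can be called there as well as inside tasks.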

How can I read the file Configuration.properties before the SparkContext has been initialized?

SparkFiles are distributed by the driver and cannot be accessed before the context has been initialized. To be distributed in the first place, they have to be accessible from the driver node. So this part of the question depends solely on what type of storage you use to expose the file to the driver node.
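If the storage is simply the driver's local file system, no Spark API is needed at all. Plain JVM I/O can load the file before any SparkContext exists. Here is a minimal sketch under that assumption, for example when the driver runs on the machine that holds the file, as in client deploy mode. The object DriverConfig and its load helper are hypothetical names. For HDFS, S3 or a classpath resource, the input stream would have to come from that storage's API instead.

```scala
import java.io.{FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.Properties

// Hypothetical helper: loads a .properties file from the driver's local disk.
object DriverConfig {
  def load(path: String): Properties = {
    val props = new Properties()
    val reader = new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8)
    try props.load(reader) // plain JVM I/O, no SparkContext required
    finally reader.close()
    props
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the file path is the first argument, defaulting to the CWD
    val props = load(args.headOption.getOrElse("Configuration.properties"))
    props.stringPropertyNames().forEach(k => println(s"$k=${props.getProperty(k)}"))
  }
}
```

The loaded values could then be copied into a SparkConf (for example with SparkConf.set) before the SparkContext is constructed.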