Move a file from one folder to another on HDFS in Scala / Spark

Updated: 2022-12-03 21:45:30

Try the following Scala code.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val hadoopConf = new Configuration()
val hdfs = FileSystem.get(hadoopConf)

// srcFilePath and destFilePath are HDFS path strings defined elsewhere
val srcPath = new Path(srcFilePath)
val destPath = new Path(destFilePath)

// rename moves the file within HDFS
// (copyFromLocalFile would copy a file from the local filesystem instead)
hdfs.rename(srcPath, destPath)
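Note that FileSystem.rename returns a Boolean rather than throwing in many failure cases (for example, when the destination's parent directory does not exist), so it is worth checking the return value.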

You should also check that Spark has the HADOOP_CONF_DIR variable set in the conf/spark-env.sh file. This makes sure that Spark can find the Hadoop configuration settings.
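If the code runs inside a Spark application, you can also reuse the Hadoop configuration that Spark has already loaded instead of constructing a new one. A minimal sketch, assuming a SparkSession is available (the application name and both paths below are hypothetical examples):

import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.{FileSystem, Path}

val spark = SparkSession.builder().appName("hdfs-move").getOrCreate()

// Reuse the configuration Spark loaded (it honours HADOOP_CONF_DIR)
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Move the file within HDFS; example paths only
fs.rename(new Path("/data/in/file.txt"), new Path("/data/archive/file.txt"))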

The dependencies for the build.sbt file:

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"
libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"

OR

You can use IOUtils from Apache Commons to copy data from an InputStream to an OutputStream:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

import org.apache.commons.io.IOUtils

val hadoopConf = new Configuration()
val fs = FileSystem.get(hadoopConf)

// Create an output stream to the destination HDFS file
val outFileStream = fs.create(new Path("hdfs://<namenode>:<port>/output_path"))

// Open an input stream from the source HDFS file
val inStream = fs.open(new Path("hdfs://<namenode>:<port>/input_path"))

IOUtils.copy(inStream, outFileStream)

// Close both streams
inStream.close()
outFileStream.close()
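Since the goal is a move rather than a copy, the source file should be deleted once the stream copy has succeeded. A short sketch, reusing the fs handle and the placeholder paths from above:

// Delete the source file after a successful copy (false = non-recursive)
fs.delete(new Path("hdfs://<namenode>:<port>/input_path"), false)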