How to rename S3 files, not HDFS, in Spark Scala

Updated: 2023-11-22 22:06:10

You can use the normal HDFS APIs, something like this (typed in, not tested):

import org.apache.hadoop.fs.{FileSystem, Path}

val src = new Path("s3a://bucket/data/src")
val dest = new Path("s3a://bucket/data/dest")
val conf = sc.hadoopConfiguration   // assuming sc = spark context
val fs: FileSystem = src.getFileSystem(conf)
fs.rename(src, dest)                // returns false if the rename fails
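
One thing to watch: rename reports most failures by returning false rather than throwing, so a silent no-op is easy to miss. A minimal sketch of a louder wrapper (renameOrFail is a hypothetical helper, not part of the Hadoop API):

import java.io.IOException
import org.apache.hadoop.fs.{FileSystem, Path}

// Throw instead of silently returning false when the rename fails.
def renameOrFail(fs: FileSystem, src: Path, dest: Path): Unit =
  if (!fs.rename(src, dest))
    throw new IOException(s"rename failed: $src -> $dest")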

The way the S3A client fakes a rename is a copy + delete of every file, so the time it takes is proportional to the number of files and the amount of data. And S3 throttles you: if you try to do this in parallel, it will potentially slow you down. Don't be surprised if it takes "a while".
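
If the throttling bites, one workaround is to issue the renames one object at a time instead of in parallel. A sketch, reusing the placeholder bucket/paths from above:

import org.apache.hadoop.fs.Path

val srcDir = new Path("s3a://bucket/data/src")
val destDir = new Path("s3a://bucket/data/dest")
val fs = srcDir.getFileSystem(sc.hadoopConfiguration)

// listStatus is non-recursive; each rename is a copy + delete on S3,
// issued sequentially here so requests are not fired in parallel.
fs.listStatus(srcDir).foreach { status =>
  fs.rename(status.getPath, new Path(destDir, status.getPath.getName))
}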

You also get billed per COPY call, at $0.005 per 1,000 calls, so it will cost you ~$5 to try. Test on a small directory until you are sure everything is working.
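
As a back-of-the-envelope check before running over a big prefix (assuming one COPY request per object; the $0.005-per-1,000 rate is the standard S3 COPY price and varies by region):

// Estimated COPY cost in USD for renaming objectCount objects.
def estimateCopyCostUsd(objectCount: Long, ratePer1000Usd: Double = 0.005): Double =
  objectCount / 1000.0 * ratePer1000Usd

// e.g. estimateCopyCostUsd(1000000L) == 5.0: renaming a million
// objects costs about $5.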