How to store a Spark DataFrame as CSV in Azure Blob Storage

Updated: 2023-01-06 18:51:12

It turns out that well before the job failed, there was an internal error:

Caused by: java.lang.NoSuchMethodError: com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(Ljava/net/URI;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/blob/BlobRequestOptions;Lcom/microsoft/azure/storage/OperationContext;)Ljava/lang/String;
    at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2372)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.restoreKey(NativeAzureFileSystem.java:918)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:819)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
    at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
    at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
    at com.univocity.parsers.common.AbstractWriter.close(AbstractWriter.java:876)
    ... 18 more

What's happening is that, after creating a temp file with the actual data, it tries to move the file to the location given by the user, using CloudBlob.startCopyFromBlob. As usual, the Microsoft people broke this by renaming the method to CloudBlob.startCopy.
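One way to confirm which variant of the method is actually on the classpath is to list it via reflection. This is only a minimal sketch: the `ClasspathCheck` object and its `methodsStartingWith` helper are hypothetical names made up for illustration, and the only thing taken from the stack trace is the class name `com.microsoft.azure.storage.blob.CloudBlob`.

```scala
object ClasspathCheck {
  // Hypothetical helper: public method names on `className` that start with `prefix`.
  // Returns an empty Seq if the class is not on the classpath.
  def methodsStartingWith(className: String, prefix: String): Seq[String] =
    try {
      Class.forName(className)
        .getMethods
        .map(_.getName)
        .filter(_.startsWith(prefix))
        .distinct
        .sorted
        .toSeq
    } catch {
      case _: ClassNotFoundException => Seq.empty
    }

  def main(args: Array[String]): Unit = {
    // Class name copied from the stack trace above.
    val found = methodsStartingWith("com.microsoft.azure.storage.blob.CloudBlob", "startCopy")
    if (found.isEmpty) println("CloudBlob not found, or it has no startCopy* methods")
    else found.foreach(println)
  }
}
```

If this is run with the same classpath as the Spark job and prints `startCopy` but not `startCopyFromBlob`, the azure-storage jar is newer than what the loaded hadoop-azure code expects.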

I'm using "org.apache.hadoop" % "hadoop-azure" % "3.2.1", which is the most recent release of "hadoop-azure", and it seems to have stuck with the older startCopyFromBlob, so I need to use an old azure-storage version that still has this method, probably 2.x.x.
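In sbt, the pinning might look roughly like the fragment below. It is a sketch under assumptions: the `azureStorageVersion` value is a placeholder carried over from the "2.x.x" guess above, not a tested version, and `com.microsoft.azure` % `azure-storage` is assumed to be the artifact that provides `CloudBlob`.

```scala
// build.sbt (sketch)

// Placeholder from the text above: substitute a real 2.x release
// that still provides CloudBlob.startCopyFromBlob.
val azureStorageVersion = "2.x.x"

libraryDependencies ++= Seq(
  "org.apache.hadoop"   % "hadoop-azure"  % "3.2.1",
  "com.microsoft.azure" % "azure-storage" % azureStorageVersion
)

// Force the pinned version even if another dependency pulls in
// a newer azure-storage transitively.
dependencyOverrides += "com.microsoft.azure" % "azure-storage" % azureStorageVersion
```

The override matters because a newer azure-storage arriving transitively would otherwise win dependency resolution and bring back the NoSuchMethodError.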

See https://github.com/Azure/azure-storage-java/issues/113