Reading the last modified date of an Azure Blob in Databricks

Updated: 2023-01-29 19:51:29

Generally, there are two ways to read the last-modified time of an Azure Blob, as described below.

  1. Read it directly through the Azure Storage REST API or the Azure Storage SDK for Java. The Azure Blob Storage REST API has two operations, Get Blob and Get Blob Properties, that return the Last-Modified property in the response headers. So you can call either operation from Scala and parse the response header, or simply use the Azure Storage SDK for Java from Scala to do the same.

Here is my sample code in Java for getting the Last-Modified property of a blob.

import java.util.Date;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlob;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;

// Connection string template; the account name and key are filled in below.
String storageConnectionStringTemplate = "DefaultEndpointsProtocol=https;" +
        "AccountName=%s;" +
        "AccountKey=%s";
String accountName = "<your storage account name for HDInsight>";
String accountKey = "<your storage account key for HDInsight>";
String containerName = "<container name for HDFS>";
String blobName = "<blob name>";
String storageConnectionString = String.format(storageConnectionStringTemplate, accountName, accountKey);
CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConnectionString);
CloudBlobClient client = storageAccount.createCloudBlobClient();
CloudBlobContainer container = client.getContainerReference(containerName);
// getBlobReferenceFromServer calls the service, so the blob's properties are already populated.
CloudBlob blob = container.getBlobReferenceFromServer(blobName);
Date lastModifiedDate = blob.getProperties().getLastModified();

Because Hadoop Azure is based on Azure Storage SDK for Java 8.0.0 rather than the newer version 10.0, my sample code above differs from the official Azure Blob Storage tutorial for Java.

If you want to get the Last-Modified property of a container, you can use the REST API Get Container Properties or the Java code Date lastModifiedDate = container.getProperties().getLastModified();. As far as I know, a reference created with getContainerReference does not contact the service, so you will likely need to call container.downloadAttributes() first to populate the properties.
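For the REST route in option 1, the following is a minimal sketch rather than a definitive implementation. It sends a HEAD request (the Get Blob Properties operation) and parses the Last-Modified response header. The class name BlobLastModified and both method names are made up for illustration, and the sketch assumes the blob URL already carries a SAS token, so it does not build a Shared Key Authorization header.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class BlobLastModified {

    // Parses an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT" (RFC 1123 format).
    public static Instant parseLastModified(String headerValue) {
        return ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
    }

    // Sends a HEAD request (Get Blob Properties) and reads the Last-Modified header.
    // Assumption: blobUrlWithSas is a full blob URL that includes a valid SAS token.
    public static Instant fetchLastModified(String blobUrlWithSas) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(blobUrlWithSas).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status: " + status);
            }
            String header = conn.getHeaderField("Last-Modified");
            if (header == null) {
                throw new IOException("No Last-Modified header in the response");
            }
            return parseLastModified(header);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demonstration of the header parsing step.
        System.out.println(parseLastModified("Wed, 21 Oct 2015 07:28:00 GMT"));
        // Pass a blob URL with a SAS token as the first argument to query a real blob.
        if (args.length > 0) {
            System.out.println(fetchLastModified(args[0]));
        }
    }
}
```

Since Scala runs on the JVM, the same two methods can be called from a Scala notebook cell without changes.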

  2. Use the Hadoop Azure Java API for the wasb:// protocol.

import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileStatus;

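Configuration conf = new Configuration();
// Uses the default file system from the configuration (wasb:// on HDInsight).
FileSystem hdfs = FileSystem.get(conf);
Path f = new Path("<blob path on HDFS>");
FileStatus fileStatus = hdfs.getFileStatus(f);
// getModificationTime() is defined on FileStatus, not Path; it returns milliseconds since the epoch.
long lastModifiedTime = fileStatus.getModificationTime();
Date lastModifiedDate = new Date(lastModifiedTime);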
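The modification time above is a plain long holding milliseconds since the epoch. If you prefer java.time over java.util.Date, a small sketch like the following may help. The class ModificationTimeDemo, its method names, and the sample value in main are hypothetical and only stand in for the result of fileStatus.getModificationTime().

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class ModificationTimeDemo {

    // Formats epoch milliseconds as an ISO-8601 timestamp in UTC.
    public static String toIsoUtc(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Returns true if the modification time is strictly after the given checkpoint.
    public static boolean modifiedSince(long epochMillis, Instant checkpoint) {
        return Instant.ofEpochMilli(epochMillis).isAfter(checkpoint);
    }

    public static void main(String[] args) {
        // Hypothetical value standing in for fileStatus.getModificationTime().
        long lastModifiedTime = 1_600_000_000_000L;
        System.out.println(toIsoUtc(lastModifiedTime));
        // Checkpoint chosen arbitrarily for the demonstration.
        Instant checkpoint = Instant.parse("2020-01-01T00:00:00Z");
        System.out.println(modifiedSince(lastModifiedTime, checkpoint));
    }
}
```

Comparing against a stored checkpoint like this is one simple way to decide whether a blob needs to be reprocessed.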