且构网

How to get Azure Data Factory to loop through the files in a folder

Updated: 2021-07-29 08:34:16

I would like to loop through months and days.

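  • To do this, you can pass two parameters from the pipeline to the activity so that the path can be built dynamically from them. ADF V2 allows you to pass parameters.
  • Let's go through the process step by step:

    1. Create a pipeline and define two parameters on it, one for the month and one for the day (named month and day in the pipeline JSON further down).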

    Note: These parameters can also be passed in from the output of other activities if needed. Reference: Parameters in ADF

    2. Create two datasets.

    2.1 Sink dataset: Blob Storage in this example. Link it to your Linked Service and provide the container name (make sure the container exists). If needed, the container name can also be passed in as a parameter.

    2.2 Source dataset: Blob Storage again in this example, or whichever store you need. Link it to your Linked Service and provide the container name (make sure the container exists). If needed, the container name can also be passed in as a parameter.

    Notes:
    (a) The folder path determines where the data is copied to. If the container does not exist, the activity will create it for you, and if the file already exists it is overwritten by default.

    (b) Pass parameters to the dataset if you want to build the output path dynamically. Here I have created two dataset parameters named monthcopy and datacopy.

    3. Create a Copy activity in the pipeline.

    Wildcard Folder Path:

    @{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}

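    This evaluates to yyyy/month-passed/day-passed/*, where yyyy is the year of yesterday's UTC date (utcnow() minus one day), month-passed and day-passed are the pipeline parameters, and the trailing * matches any folder one level below the day folder.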
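
To make the wildcard folder path easier to check before running the pipeline, the expression can be mimicked outside ADF. The following is only a rough Python sketch, not ADF code: the function name wildcard_folder_path and the optional now argument are hypothetical, and the zero-padded month and day values are an assumption about how the folders are named.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def wildcard_folder_path(month: str, day: str, now: Optional[datetime] = None) -> str:
    """Mimic the ADF expression: yesterday's UTC year + '/' + month + '/' + day + '/*'."""
    if now is None:
        now = datetime.now(timezone.utc)   # stands in for utcnow()
    yesterday = now - timedelta(days=1)    # stands in for adddays(utcnow(), -1)
    year = yesterday.strftime("%Y")        # stands in for formatDateTime(..., 'yyyy')
    return f"{year}/{month}/{day}/*"


if __name__ == "__main__":
    fixed = datetime(2021, 7, 29, 8, 34, tzinfo=timezone.utc)
    print(wildcard_folder_path("07", "28", fixed))     # 2021/07/28/*

    # Just after midnight on 1 January the year still comes from the previous day.
    new_year = datetime(2022, 1, 1, 0, 30, tzinfo=timezone.utc)
    print(wildcard_folder_path("12", "31", new_year))  # 2021/12/31/*
```

The second call shows why the year is taken from yesterday's date: a run shortly after the new year still targets the previous year's folders. Month and day are not derived from the date at all; they come only from the parameters.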

JSON template for the pipeline:

{
    "name": "pipeline2",
    "properties": {
        "activities": [
            {
                "name": "Copy Data1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true,
                            "wildcardFolderPath": {
                                "value": "@{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}",
                                "type": "Expression"
                            },
                            "wildcardFileName": "*.csv",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".csv"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DelimitedText2",
                        "type": "DatasetReference",
                        "parameters": {
                            "monthcopy": {
                                "value": "@pipeline().parameters.month",
                                "type": "Expression"
                            },
                            "datacopy": {
                                "value": "@pipeline().parameters.day",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "month": {
                "type": "string"
            },
            "day": {
                "type": "string"
            }
        },
        "annotations": []
    }
}
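
Because the pipeline above takes a single month and a single day per run, looping through several days means starting it once per (month, day) pair. The sketch below only generates those parameter sets in Python; the function name month_day_params and the zero-padded two-digit format are assumptions, chosen to match a folder layout such as 2021/07/28. How each pair is handed to the pipeline (a trigger, a ForEach activity, or an SDK call) is deliberately left open.

```python
from datetime import date, timedelta


def month_day_params(start: date, end: date) -> list[dict[str, str]]:
    """Return one {"month", "day"} parameter set per calendar day in [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    params = []
    current = start
    while current <= end:
        # Assumption: folders use zero-padded numbers, e.g. 07 and 05.
        params.append({"month": f"{current.month:02d}", "day": f"{current.day:02d}"})
        current += timedelta(days=1)
    return params


if __name__ == "__main__":
    # Each printed dict matches the "month" and "day" string parameters of the pipeline.
    for run_parameters in month_day_params(date(2021, 7, 30), date(2021, 8, 2)):
        print(run_parameters)
```

Note that the year is not part of these parameter sets: the pipeline derives it from the run date, so a range that crosses a year boundary would need extra care.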

JSON template for the source dataset (DelimitedText1, referenced in the pipeline's inputs):

{
    "name": "DelimitedText1",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "corpdata"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "quoteChar": "\""
        },
        "schema": []
    }
}

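JSON template for the sink dataset (DelimitedText2, referenced in the pipeline's outputs and parameterized with monthcopy and datacopy):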

{
    "name": "DelimitedText2",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "monthcopy": {
                "type": "string"
            },
            "datacopy": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": {
                    "value": "@concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),dataset().monthcopy,'/',dataset().datacopy)",
                    "type": "Expression"
                },
                "container": "copycorpdata"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "quoteChar": "\""
        },
        "schema": []
    }
}
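
The folderPath expression in DelimitedText2 can be mimicked the same way to see where the copied files land. This is a rough Python sketch rather than ADF code; the function name sink_folder_path and the now argument are hypothetical. It follows the expression exactly as written, which places no '/' between the year and monthcopy.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sink_folder_path(monthcopy: str, datacopy: str, now: Optional[datetime] = None) -> str:
    """Mimic the dataset expression: concat(yyyy, monthcopy, '/', datacopy)."""
    if now is None:
        now = datetime.now(timezone.utc)            # stands in for utcnow()
    year = (now - timedelta(days=1)).strftime("%Y")  # year of yesterday's UTC date
    # No '/' is inserted between year and monthcopy, exactly as in the expression.
    return f"{year}{monthcopy}/{datacopy}"


if __name__ == "__main__":
    fixed = datetime(2021, 7, 29, 8, 34, tzinfo=timezone.utc)
    print(sink_folder_path("07", "28", fixed))  # 202107/28
```

With month 07 and day 28 the output folder is therefore 202107/28 inside the copycorpdata container, not 2021/07/28. If the sink is meant to mirror the source layout, adding a '/' literal after the year argument of concat would presumably give that result.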