
How do I control data failures in an Azure Data Factory pipeline?

Updated: 2023-02-04 21:22:06

I think you've hit a fairly common limitation of ADF. The datasets you define in JSON tell ADF the structure of the data, but that is all: as an orchestration tool, ADF can't transform or manipulate the data as part of activity processing.

To answer your question directly: yes, it's certainly possible. But you need to break out the C# and use ADF's extensibility functionality to deal with the bad rows before passing the data to its final destination.

I suggest you extend your data factory with a custom activity in which you build some lower-level cleaning processes to divert the bad rows as described.

This is an approach we often take, because not all data is perfect (I wish) and plain ETL or ELT doesn't work. I prefer the acronym ECLT, where the 'C' stands for clean (or cleanse, prepare, etc.). This certainly applies to ADF, because the service doesn't have its own compute or an SSIS-style data flow engine.

So...

In terms of how to do this: first, I recommend you check out this blog post on creating ADF custom activities. Link:

https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/

Then, within your C# class that implements IDotNetActivity, do something like the below.

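    public IDictionary<string, string> Execute(
        IEnumerable<LinkedService> linkedServices,
        IEnumerable<Dataset> datasets,
        Activity activity,
        IActivityLogger logger)
    {
        // etc: resolve YourSource and YourDestination (placeholders)
        // from the linked services and datasets passed in

        using (StreamReader vReader = new StreamReader(YourSource))
        using (StreamWriter vWriter = new StreamWriter(YourDestination))
        {
            while (!vReader.EndOfStream)
            {
                // read one row at a time so the loop advances
                string vLine = vReader.ReadLine();

                // data transform logic, if bad row etc
                vWriter.WriteLine(vLine);
            }
        }

        // Execute must return a dictionary; an empty one is fine here
        return new Dictionary<string, string>();
    }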

You get the idea. Build your own SSIS data flow!
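To make the "if bad row" step a little more concrete, here is a minimal sketch of one way to divert bad rows to a separate output. Everything in it is an assumption for illustration and not part of the ADF SDK: the `RowCleaner` class, the `Split` method, and the rule that a row is "bad" when it has the wrong number of delimited fields or an empty field. Your own definition of a bad row will likely differ.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the ADF SDK.
public static class RowCleaner
{
    // Copies well-formed rows to 'clean' and diverts the rest to 'rejected'.
    // Assumed rule: a row is well-formed when it has exactly expectedColumns
    // delimited fields and none of them is empty or whitespace.
    // Returns (cleanCount, rejectedCount).
    public static Tuple<int, int> Split(
        TextReader source,
        TextWriter clean,
        TextWriter rejected,
        int expectedColumns,
        char delimiter = ',')
    {
        int cleanCount = 0;
        int rejectedCount = 0;
        string line;

        while ((line = source.ReadLine()) != null)
        {
            // Naive split: does not handle quoted fields containing the delimiter.
            string[] fields = line.Split(delimiter);
            bool isBad = fields.Length != expectedColumns
                || Array.Exists(fields, f => string.IsNullOrWhiteSpace(f));

            if (isBad)
            {
                rejected.WriteLine(line);
                rejectedCount++;
            }
            else
            {
                clean.WriteLine(line);
                cleanCount++;
            }
        }

        return Tuple.Create(cleanCount, rejectedCount);
    }
}
```

Inside `Execute` you could call something like `RowCleaner.Split(vReader, vWriter, vRejectWriter, 3)`, where `vRejectWriter` is a second `StreamWriter` (also hypothetical) pointing at wherever you want the rejected rows to land. Keeping the rejects in their own file means the clean output stays safe for downstream activities while the bad rows remain available for inspection.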

Then write out your clean rows as an output dataset, which can be the input for your next ADF activity, either across multiple pipelines or as chained activities within a single pipeline.

This is the only way to get ADF to deal with your bad data in the current service offerings.

Hope this helps.