Updated: 2023-01-11 16:29:37
I don't see any overhead issue with the UDF invocation.
Ref: http://datafu.incubator.apache.org/docs/datafu/guide/set-operations.html has an example of using the SetDifference method.
As per the API (http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/sets/SetDifference.html), the SetDifference method takes bags as input and emits the difference between them.
N.B. The input bags have to be sorted.
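A minimal sketch of how that could look, adapted from the pattern in the DataFu docs linked above. The jar path, the input location, and the schema (a relation A with two bags of ints, B1 and B2, each with a field named val) are assumptions for illustration, so adjust them to your data:

```pig
-- Assumed jar name/path; point this at your DataFu jar.
REGISTER datafu-1.2.0.jar;
DEFINE SetDifference datafu.pig.sets.SetDifference();

-- Assumed input: each row holds two bags of ints, B1 and B2.
A = LOAD 'input' AS (B1:bag{T:tuple(val:int)}, B2:bag{T:tuple(val:int)});

-- Sort both bags inside a nested FOREACH, then hand them to the UDF.
-- The result holds the elements of B1 that do not appear in B2.
C = FOREACH A {
  sorted_b1 = ORDER B1 BY val;
  sorted_b2 = ORDER B2 BY val;
  GENERATE SetDifference(sorted_b1, sorted_b2);
}
```

Both bags are sorted and passed to the UDF within the same FOREACH, so each row of A is handled in one pass.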
In the example snippet you shared, I don't see the need for the following lines:
F1 = foreach A generate B1;
F2 = foreach A generate B2;