Hadoop Pig UDF invocation issue

Updated: 2023-01-11 16:29:37

I don't see any overhead issue with the UDF invocation.

Reference: the DataFu guide at http://datafu.incubator.apache.org/docs/datafu/guide/set-operations.html has an example of using the SetDifference method.

According to the API documentation (http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/sets/SetDifference.html), SetDifference takes two or more bags as input and emits the elements of the first bag that do not appear in the others.

N.B. The input bags must be sorted.

In the example snippet you shared, I don't see the need for the following lines:

F1 = foreach A generate B1;
F2 = foreach A generate B2;
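
As a rough sketch of the alternative, SetDifference can be applied to the two bags of A inside a nested FOREACH, sorting each bag first, so no intermediate relations are needed. The jar path, the input path, the schema of A (two bag fields B1 and B2 whose tuples hold a single int field val), and the alias C are assumptions made for illustration, not details taken from the original snippet.

```pig
-- Assumption: the DataFu jar is on hand; this path is hypothetical.
REGISTER datafu-1.2.0.jar;
DEFINE SetDifference datafu.pig.sets.SetDifference();

-- Assumption: A has two bag fields, B1 and B2, each holding single-int tuples.
A = LOAD 'input' AS (B1:bag{T:tuple(val:int)}, B2:bag{T:tuple(val:int)});

C = FOREACH A {
    -- SetDifference requires sorted input bags.
    sorted_b1 = ORDER B1 BY val;
    sorted_b2 = ORDER B2 BY val;
    -- Emits the elements of B1 that are not in B2.
    GENERATE SetDifference(sorted_b1, sorted_b2);
}
```

Under these assumptions, a row with B1 = {(1),(2),(3)} and B2 = {(2)} would yield the bag {(1),(3)}.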