且构网


Check whether a bag contains a given element?

Updated: 2023-11-26 08:43:52

In Apache Pig you can use statements nested inside FOREACH (see Pig Latin Basics in the documentation). Here is an example from the documentation, where A is a bag in B:

X = FOREACH B {
        S = FILTER A BY 'xyz';  -- 'xyz' stands in for a boolean condition, e.g. $0 == 'xyz'
        GENERATE COUNT (S.$0);
}

Instead of COUNT, you can use IsEmpty together with the ?: (bincond) operator:

X = FOREACH B {
        S = FILTER A BY 'xyz';
        GENERATE (IsEmpty(S.$0) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present, B;
}

Or, to keep only the records whose bag contains the data:

X = FOREACH B {
        S = FILTER A BY 'xyz';
        GENERATE B, S;
}
F = FILTER X BY not IsEmpty(S);
R = FOREACH F GENERATE B;

This avoids a costly self-join, since every extra join is an extra MapReduce job.
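For context, the snippets above can be put together into one self-contained script. This is only a sketch under assumptions: the input file `data.txt`, the relation name `raw`, and the field names `id`, `item`, and `matches` are hypothetical and do not come from the original answer. Here the bag is produced by a GROUP, so it takes the name of the grouped relation.

```pig
-- Hypothetical input: tab-separated rows of (id, item) in 'data.txt'.
raw = LOAD 'data.txt' AS (id:chararray, item:chararray);

-- After grouping, each record holds a bag named raw: (group, raw:{(id, item)}).
B = GROUP raw BY id;

-- Nested FOREACH: filter the bag per group, then flag whether anything matched.
X = FOREACH B {
        S = FILTER raw BY item == 'xyz';
        GENERATE group AS id,
                 (IsEmpty(S) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present,
                 S AS matches;
}

-- Keep only the groups whose bag contains 'xyz'.
F = FILTER X BY NOT IsEmpty(matches);
R = FOREACH F GENERATE id;
DUMP R;
```

The filtering happens inside the nested block while each group is being processed, so the check does not require joining the relation with itself.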