Updated: 2022-05-22 22:58:08
Your second approach is on the right track. Unfortunately, you'll need a UDF to convert a tuple to a bag; as far as I know, there is no built-in function that does this. Writing one is simple, however.
You won't want to group on the fixed fields, but rather on the key-value pairs themselves. So you only need to keep the tuple of key-value pairs; you can completely ignore the fixed fields.
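To make the "keep only the key-value pairs" step concrete, here is a minimal plain-Java sketch. It assumes a hypothetical record layout that is not given in the question: whitespace-separated tokens, where the first two tokens are fixed fields and every later token is a key=value pair. The class name KeepKeyValuePairs and the constant FIXED_FIELDS are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class KeepKeyValuePairs {
    // Hypothetical layout (an assumption): the first FIXED_FIELDS tokens are
    // fixed fields, and every token after them is a key=value pair.
    static final int FIXED_FIELDS = 2;

    // Returns only the key-value tokens of one record. The fixed fields are
    // dropped because the grouping is done on the pairs themselves.
    static List<String> keyValuePairs(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length <= FIXED_FIELDS) {
            return List.of(); // no key-value pairs on this line
        }
        return Arrays.asList(Arrays.copyOfRange(tokens, FIXED_FIELDS, tokens.length));
    }

    public static void main(String[] args) {
        System.out.println(keyValuePairs("2012-01-01 user1 color=red size=L"));
    }
}
```

For the sample line this prints [color=red, size=L]. In Pig the equivalent is simply projecting away the fixed fields before the grouping.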
The UDF is pretty simple. In Java, you can do something like this in the exec(Tuple input) method of an EvalFunc<DataBag> subclass:
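// Imports: org.apache.pig.data.BagFactory, DataBag, Tuple, TupleFactory
if (input == null || input.size() == 0 || input.get(0) == null) {
    return null; // nothing to convert
}
DataBag b = BagFactory.getInstance().newDefaultBag();
Tuple t = (Tuple) input.get(0); // e.g. the tuple produced by STRSPLIT
for (int i = 0; i < t.size(); i++) {
    // Wrap each field in its own one-field tuple and add it to the bag
    Object o = t.get(i);
    Tuple e = TupleFactory.getInstance().newTuple(o);
    b.add(e);
}
return b;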
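The shape change the UDF performs, a tuple (a, b) becoming a bag {(a), (b)}, can be checked outside Pig with the plain-Java model below. It is only a stand-in and does not use the Pig API: a List models a tuple, and a List of one-element Lists models the bag. The names TupleToBagModel and tupleToBag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleToBagModel {
    // Plain-Java stand-in for the UDF (not the Pig API): the input List models
    // a Pig tuple, and the result models a bag of one-field tuples.
    static List<List<Object>> tupleToBag(List<Object> tuple) {
        List<List<Object>> bag = new ArrayList<>();
        for (Object field : tuple) {
            // Each field becomes its own one-field "tuple", as in the loop above.
            // ArrayList is used instead of List.of because fields may be null.
            List<Object> single = new ArrayList<>();
            single.add(field);
            bag.add(single);
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(tupleToBag(List.of("color=red", "size=L")));
    }
}
```

This prints [[color=red], [size=L]]: one entry per original field, which is what FLATTEN later expands into separate rows.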
Once you have that, turn the tuple from STRSPLIT into a bag, flatten it, and then do the grouping and counting.
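To show what the flatten, group, and count steps compute, here is a plain-Java sketch using streams. It is an illustration of the logic, not Pig code. It assumes each inner List already holds one record's key-value pairs (the UDF's output), and the class and method names are hypothetical. flatMap plays the role of FLATTEN, and groupingBy with counting plays the role of GROUP ... BY followed by COUNT.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FlattenGroupCount {
    // Each inner list models one record's bag of key-value pairs.
    static Map<String, Long> countPairs(List<List<String>> bags) {
        return bags.stream()
                // FLATTEN: one row per key-value pair instead of one per record
                .flatMap(List::stream)
                // GROUP BY the pair itself, then COUNT each group (sorted by key)
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<List<String>> bags = List.of(
                List.of("color=red", "size=L"),
                List.of("color=red", "size=M"),
                List.of("color=blue", "size=L"));
        System.out.println(countPairs(bags));
    }
}
```

With this sample data, color=red and size=L each get a count of 2, and color=blue and size=M each get 1. The fixed fields never appear, because only the pairs are grouped.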