Updated: 2021-10-29 23:29:06
Your second approach is on the right track. Unfortunately, you'll need a UDF to convert a tuple to a bag, and as far as I know there is no built-in function that does this. It's a simple matter to write one, however.
You won't want to group on the fixed fields, but rather on the key-value pairs themselves. So you only need to keep the tuple of key-value pairs; you can completely ignore the fixed fields.
The UDF is pretty simple. In Java, you can do something like this in your exec method:
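// Bag that will hold one single-field tuple per field of the input tuple.
DataBag b = new DefaultDataBag();
// The first argument to the UDF is the tuple to convert (e.g. the STRSPLIT result).
Tuple t = (Tuple) input.get(0);
for (int i = 0; i < t.size(); i++) {
    Object o = t.get(i);
    // Wrap the field in its own one-field tuple so it can be added to the bag.
    Tuple e = TupleFactory.getInstance().newTuple(o);
    b.add(e);
}
return b;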
Once you have that, turn the tuple from STRSPLIT into a bag, flatten it, and then do the grouping and counting.
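To make the data flow concrete, here is a rough plain-Java sketch of what the split, tuple-to-bag, flatten, group and count steps compute together. It does not use the Pig API at all. The class name `KeyValueCount`, the helper names, the `;` delimiter and the `key:value` sample records are all assumptions made for illustration, since the original input format isn't shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone illustration; none of these names come from Pig.
public class KeyValueCount {

    // Stand-in for the tuple-to-bag UDF: each field of the "tuple" becomes
    // its own single-element row, mirroring the one-field tuples in the bag.
    static List<List<String>> tupleToBag(String[] tuple) {
        List<List<String>> bag = new ArrayList<>();
        for (String field : tuple) {
            bag.add(List.of(field));
        }
        return bag;
    }

    // Split each record into key-value pairs, convert to a bag, flatten the
    // bag into individual rows, then group by pair and count occurrences.
    static Map<String, Integer> countPairs(List<String> records, String delimiter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            String[] tuple = record.split(delimiter);      // plays the role of STRSPLIT
            for (List<String> row : tupleToBag(tuple)) {   // iterating rows = flattening
                counts.merge(row.get(0), 1, Integer::sum); // group by pair and count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data: only the key-value part of each record is kept.
        List<String> records = List.of("color:red;size:M", "color:red;size:L", "size:M");
        System.out.println(countPairs(records, ";"));
    }
}
```

With the assumed sample records, `color:red` and `size:M` are each counted twice and `size:L` once. In Pig the same result would come from the script itself rather than from hand-written loops; the sketch is only meant to show why the fixed fields can be dropped before grouping.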