且构网


Merging two rows in Pig

Updated: 2023-11-22 15:51:40

This problem is hard to solve with native Pig alone. One option is to download the datafu-1.2.0.jar library and try the approach below.

input.txt

ABC,DEF,,
,,GHI,JKL
MNO,PQR,,
,,STU,VWX

PigScript:

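-- Register DataFu and define its BagSplit UDF.
REGISTER /tmp/datafu-1.2.0.jar;
DEFINE BagSplit datafu.pig.bags.BagSplit();

-- Load the rows; PigStorage reads empty fields as nulls.
A = LOAD 'input.txt' USING PigStorage(',') AS (f1,f2,f3,f4);
-- Collect all rows into one bag, then split it into bags of 2 rows each.
B = GROUP A ALL;
C = FOREACH B GENERATE FLATTEN(BagSplit(2,$1)) AS mybag;
-- Join each bag's fields with '_', remove the run of four null placeholders,
-- and split what is left back into 4 fields.
D = FOREACH C GENERATE FLATTEN(STRSPLIT(REPLACE(BagToString(mybag),'_null_null_null_null',''),'_',4));
-- Reorder the fields into the final column order.
E = FOREACH D GENERATE $2,$3,$0,$1;
DUMP E;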

Output:

(MNO,PQR,STU,VWX)
(ABC,DEF,GHI,JKL)

Note: Based on the input format above, I am assuming that the last two columns of the 1st row are null and the first two columns of the 2nd row are null, and likewise for the 3rd and 4th rows.
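To make that assumption concrete, here is a minimal sketch in plain Python (not Pig, and not part of the original answer) of the intended transformation: rows are taken two at a time, and each column keeps whichever of the two values is non-null. The helper names parse_line and merge_pairs are hypothetical and exist only for this illustration.

```python
from typing import List, Optional

Row = List[Optional[str]]


def parse_line(line: str) -> Row:
    """Split a comma-separated line; empty fields become None (like Pig nulls)."""
    return [field if field else None for field in line.split(",")]


def merge_pairs(rows: List[Row]) -> List[Row]:
    """Merge rows 1+2, 3+4, ... by keeping the non-null value in each column."""
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows")
    merged = []
    # rows[0::2] are the 1st, 3rd, ... rows; rows[1::2] are the 2nd, 4th, ...
    for first, second in zip(rows[0::2], rows[1::2]):
        merged.append([a if a is not None else b for a, b in zip(first, second)])
    return merged


lines = ["ABC,DEF,,", ",,GHI,JKL", "MNO,PQR,,", ",,STU,VWX"]
result = merge_pairs([parse_line(line) for line in lines])
for row in result:
    print(row)
```

This sketch keeps the merged rows in input order and relies on the same assumption as the note: within each pair, the two rows fill complementary columns. If both rows of a pair had a value in the same column, the first row's value would win.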