How do I load a file on a Hadoop cluster using Apache Pig?

Updated: 2023-01-11 16:55:11

My suggestion:

Create a folder in HDFS: hadoop fs -mkdir /pigdata

Load the file into the created HDFS folder: hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata

(Alternatively, you can do this from the Grunt shell: grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata)
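If the two upload steps above need to be repeated for several files, they could be scripted. The following is only a minimal sketch: the helper name build_upload_commands is hypothetical, and it assumes the hadoop command is on the PATH of the machine where the script runs. It just assembles the same two commands shown above and prints them; actually running them is left as an optional, commented-out step.

```python
import shlex


def build_upload_commands(local_file, hdfs_dir):
    """Return the two 'hadoop fs' commands from the steps above as argv lists.

    build_upload_commands is a hypothetical helper name, not part of Hadoop or Pig.
    """
    return [
        ["hadoop", "fs", "-mkdir", hdfs_dir],            # create the HDFS folder
        ["hadoop", "fs", "-put", local_file, hdfs_dir],  # copy the local file into it
    ]


commands = build_upload_commands(
    "/opt/pig/tutorial/data/excite-small.log", "/pigdata"
)

for argv in commands:
    # Print each command in a copy-and-paste friendly form.
    print(shlex.join(argv))

# To execute instead of printing (assumes 'hadoop' is installed and configured):
# import subprocess
# for argv in commands:
#     subprocess.run(argv, check=True)
```

Keeping each command as a list of arguments rather than one string avoids shell quoting problems if a path contains spaces.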

Execute the Pig Latin script:

   grunt> set debug on

   grunt> set job.name 'first-p2-job'

   grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS 
              (user:chararray, time:long, query:chararray); 
   grunt> grpd = GROUP log BY user; 
   grunt> cntd = FOREACH grpd GENERATE group, COUNT(log); 
   grunt> STORE cntd INTO 'output';

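The output files are written to the relative path 'output', which Pig resolves against the current HDFS working directory (by default the user's home directory, e.g. /user/<username>/output). To have them stored in hdfs://hostname:54310/pigdata/output, either run cd /pigdata in the Grunt shell first or pass that full path to STORE.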
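To sanity-check what the GROUP and COUNT steps in the script compute, the same logic can be reproduced locally on a few lines of the log. This is a rough sketch in plain Python, not Pig: the function name count_rows_per_user and the sample rows are made up for illustration, and it assumes the log is tab-separated with the three fields named in the LOAD statement (user, time, query). Lines without exactly three fields are skipped here, a simplification that may differ from how Pig treats malformed rows.

```python
from collections import Counter


def count_rows_per_user(lines):
    """Count log rows per user, roughly mirroring
    GROUP log BY user followed by COUNT(log).

    count_rows_per_user is a hypothetical helper, not part of Pig.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            # Simplification: ignore rows that do not match (user, time, query).
            continue
        user, _time, _query = fields
        counts[user] += 1
    return dict(counts)


# Made-up sample rows in the assumed tab-separated layout.
sample = [
    "A1\t970916105432\tyahoo chat\n",
    "B2\t970916001949\tpig latin\n",
    "A1\t970916105500\thadoop\n",
]

print(count_rows_per_user(sample))
```

Each key in the result corresponds to the group field of one output tuple, and each value to its COUNT.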
