Updated: 2021-09-20 00:14:01
After quite a few wrong turns, I found that, at least on the Elastic MapReduce implementation of Hadoop, Pig seems to ignore the CLASSPATH environment variable. Instead, I found that I could control the class path using the HADOOP_CLASSPATH variable.
Once I realized that, it was fairly easy to get things set up to use Python UDFs:
sudo apt-get install jython -y -qq
export HADOOP_CLASSPATH=/usr/share/java/jython.jar:/usr/share/maven-repo/org/antlr/antlr-runtime/3.2/antlr-runtime-3.2.jar
sudo mkdir /usr/share/java/cachedir/
sudo chmod a+rw /usr/share/java/cachedir
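
Since a wrong jar path in HADOOP_CLASSPATH tends to fail silently until Pig tries to load Jython, it can help to check the entries before running a script. The following is only a sketch: `check_classpath` is a hypothetical helper name, not part of Hadoop or Pig, and the paths it checks are whatever the variable holds on your machine.

```shell
# check_classpath is a hypothetical helper, not a Hadoop or Pig command.
# It reports every colon-separated entry that does not exist on disk.
# The body runs in a subshell so the IFS and globbing changes do not leak.
check_classpath() (
    missing=0
    set -f        # do not expand wildcards while splitting
    IFS=:
    for entry in $1; do
        [ -z "$entry" ] && continue
        if [ ! -e "$entry" ]; then
            echo "missing: $entry" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# Check the current value (an unset variable is treated as empty).
check_classpath "${HADOOP_CLASSPATH:-}" \
    || echo "fix HADOOP_CLASSPATH before running pig" >&2

# The Jython cache directory created above should be writable as well.
[ -w /usr/share/java/cachedir ] \
    || echo "cachedir is not writable: /usr/share/java/cachedir" >&2
```

The function exits with a non-zero status when any entry is missing, so it can also guard a wrapper script that launches Pig.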
I should point out that this seems to directly contradict other advice I found while searching for solutions to this problem:
The path to the .py file used in the register statement may be relative or absolute; it doesn't seem to matter.
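
To make the register statement concrete, here is a minimal sketch that writes a tiny Jython UDF and a Pig script that registers it. The file names (`udfs.py`, `demo.pig`), the `shout` function, the `udfs` namespace and the `words.txt` input are all made up for illustration; only the `REGISTER ... USING jython AS ...` form and the `@outputSchema` decorator come from Pig itself.

```shell
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)

# A minimal Jython UDF. Pig provides the outputSchema decorator when the
# file is registered "USING jython", so the file does not import it.
cat > "$workdir/udfs.py" <<'EOF'
@outputSchema("word:chararray")
def shout(word):
    return word.upper() if word is not None else None
EOF

# A Pig script that registers the UDF file by absolute path. Writing
# REGISTER 'udfs.py' instead (a relative path) and starting pig from
# $workdir would be the relative variant of the same statement.
# words.txt is a hypothetical one-word-per-line input file.
cat > "$workdir/demo.pig" <<EOF
REGISTER '$workdir/udfs.py' USING jython AS udfs;
words = LOAD 'words.txt' AS (word:chararray);
loud = FOREACH words GENERATE udfs.shout(word);
DUMP loud;
EOF

echo "run with: pig -x local $workdir/demo.pig"
```

With HADOOP_CLASSPATH exported as shown earlier, running the generated script is one way to confirm that Pig can find Jython.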