且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用Shell脚本将HiveQL查询的结果输出到CSV?

更新时间:2023-02-04 21:56:45

您可以在Shell脚本中运行和监视并行作业:

You can run and monitor parallel jobs in a shell script:

#!/bin/bash

#Run parallel processes and wait for their completion

#Add loop here or add more calls
hive -e "SELECT * FROM db.table1;" | tr "\t" "," > example1.csv &
hive -e "SELECT * FROM db.table2;" | tr "\t" "," > example2.csv &
hive -e "SELECT * FROM db.table3;" | tr "\t" "," > example3.csv &

#Note the ampersand in above commands says to create parallel process
#You can wrap hive call in a function an do some logging in it, etc
#And call a function as parallel process in the same way
#Modify this script to fit your needs

#Now wait for all processes to complete

#Failed processes count
FAILED=0

for job in `jobs -p`
do
   echo "job=$job"
   wait $job || let "FAILED+=1"
done   

#Final status check
if [ "$FAILED" != "0" ]; then
    echo "Execution FAILED!  ($FAILED)"
    #Do something here, log or send messege, etc
    exit 1
fi

#Normal exit
#Do something else here
exit 0

还有其他方法(使用XARGS,GNU并行)在shell中运行并行进程,并在其中运行大量资源.另请阅读 https://www.slashroot.in/how-run- multiple-commands-parallel-linux https ://thoughtsimproved.wordpress.com/2015/05/18/parellel-processing-in-bash/

There are other ways (using XARGS, GNU parallel) to run parallel processes in shell and a lot of resources on it. Read also https://www.slashroot.in/how-run-multiple-commands-parallel-linux and https://thoughtsimproved.wordpress.com/2015/05/18/parellel-processing-in-bash/