
Adding a JDBC driver to Spark on EMR

Updated: 2021-10-23 18:48:10

I had the same problem. What ended up working for me was using the --driver-class-path parameter with spark-submit.

The main thing is to add the entire Spark class path to --driver-class-path.

These were my steps:

  1. I got the default driver class path by reading the value of the "spark.driver.extraClassPath" property from the Spark History Server, under "Environment".
  2. Copied the MySQL JAR file to each node in the EMR cluster (this can be scripted; see the sketch after this list).
  3. Put the MySQL JAR path at the front of the --driver-class-path argument to the spark-submit command, and appended the value of "spark.driver.extraClassPath" to it.
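
A rough sketch of step 2, assuming the JAR lives under /home/hadoop/jars/ on the master and that you can SSH to the other nodes (the host names here are placeholders, not from the original answer):

# Copy the MySQL driver JAR to every node in the cluster.
# Replace the host list with the private DNS names of your own nodes.
for host in ip-10-0-0-1.ec2.internal ip-10-0-0-2.ec2.internal; do
  scp /home/hadoop/jars/mysql-connector-java-5.1.35.jar hadoop@"$host":/home/hadoop/jars/
done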

My driver class path ended up looking like this:

--driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
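
For reference, a full spark-submit invocation built this way might look like the following sketch (the main class and application JAR are placeholders of mine, not values from the original command):

spark-submit \
  --class com.example.MySparkApp \
  --driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/* \
  my-spark-app.jar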

This worked on EMR 4.1 with Java and Spark 1.5.0. I had already added the MySQL JAR as a dependency in the Maven pom.xml.
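
Once the driver JAR is on the driver class path, reading over JDBC works as usual. Here is a minimal sketch against the Spark 1.5 Java API; the connection URL, table, and credentials are placeholders of mine, not values from the original answer:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class JdbcReadExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JdbcReadExample");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Placeholder connection details -- substitute your own host,
        // database, table, and credentials.
        Map<String, String> options = new HashMap<String, String>();
        options.put("url", "jdbc:mysql://your-db-host:3306/your_db?user=your_user&password=your_password");
        options.put("dbtable", "your_table");
        options.put("driver", "com.mysql.jdbc.Driver");

        // Works only if the MySQL driver JAR is on the driver class path,
        // e.g. via --driver-class-path as described above.
        DataFrame df = sqlContext.read().format("jdbc").options(options).load();
        df.show();

        sc.stop();
    }
}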

You may also want to look at this answer, as it seems like a cleaner solution. I haven't tried it myself.