Spark SQL RDD loads in pyspark but not in spark-submit: "JDBCRDD: closed connection"

Updated: 2022-06-18 03:51:31

When using spark-submit you should supply the jar to the executors.

As described in the Spark 2.1 JDBC documentation:

To get started you will need to include the JDBC driver for your particular database on the Spark classpath. For example, to connect to postgres from the Spark Shell you would run the following command:

bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

Note: The same applies to the spark-submit command.
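
For example, assuming the same PostgreSQL driver jar, a spark-submit invocation for a PySpark script would pass the jar both to the driver classpath and to the executors (app.py is a placeholder name for your script):

bin/spark-submit --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar app.py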

Troubleshooting

The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
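
For reference, a minimal PySpark sketch of the kind of JDBC read affected; the URL, table name, and credentials are hypothetical. Without the driver jar supplied as above, a read like this can fail on the executors even when it works in an interactive pyspark session:

from pyspark.sql import SparkSession

# Build (or reuse) the session; spark-submit supplies the master and config.
spark = SparkSession.builder.appName("jdbc-example").getOrCreate()

# Read a table over JDBC. The PostgreSQL driver class must be visible
# on both the driver and all executor classpaths for this to succeed.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")
      .option("dbtable", "mytable")
      .option("user", "myuser")
      .option("password", "mypassword")
      .option("driver", "org.postgresql.Driver")
      .load())

df.show()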