Custom Spark cannot find the Hive database when running on YARN

Updated: 2023-11-17 21:50:40

Hive jars need to be on Spark's classpath for Hive support to be enabled. If the Hive jars are not present on the classpath, the catalog implementation used is in-memory.

In spark-shell we can confirm this by executing:

sc.getConf.get("spark.sql.catalogImplementation") 

This will return in-memory.

The reason is the enableHiveSupport() method on the SparkSession builder, which sets the catalog implementation to hive only when the Hive classes can actually be loaded:

    def enableHiveSupport(): Builder = synchronized {
      if (hiveClassesArePresent) {
        config(CATALOG_IMPLEMENTATION.key, "hive")
      } else {
        throw new IllegalArgumentException(
          "Unable to instantiate SparkSession with Hive support because " +
            "Hive classes are not found.")
      }
    }
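This behavior is easy to confirm when building a session by hand. A minimal sketch (the master and app name below are placeholders, not taken from the question):

    import org.apache.spark.sql.SparkSession

    // enableHiveSupport() throws IllegalArgumentException right here
    // if the Hive classes are not found on the classpath.
    val spark = SparkSession.builder()
      .master("local[*]")              // placeholder; use your actual master
      .appName("hive-support-check")   // placeholder app name
      .enableHiveSupport()
      .getOrCreate()

    // Prints "hive" when Hive support was enabled successfully.
    println(spark.conf.get("spark.sql.catalogImplementation"))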

The check itself, also in SparkSession.scala:

  private[spark] def hiveClassesArePresent: Boolean = {
    try {
      Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
      Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }
  }
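The same probe can be reproduced from a plain spark-shell with the public Class.forName API (Utils.classForName is Spark-internal). A minimal sketch that only probes HiveConf:

    // Mirrors hiveClassesArePresent using Class.forName; the real check
    // also probes Spark's Hive session state builder class.
    def hiveClassesOnClasspath: Boolean =
      try {
        Class.forName("org.apache.hadoop.hive.conf.HiveConf")
        true
      } catch {
        case _: ClassNotFoundException | _: NoClassDefFoundError => false
      }

    println(hiveClassesOnClasspath) // false when the Hive jars are missing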

If the classes are not present, Hive support is not enabled; the checks above run in SparkSession.scala as part of Spark shell initialization.

In the code pasted as part of the question, SPARK_DIST_CLASSPATH is populated only with the Hadoop classpath; the paths to the Hive jars are missing.
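A quick way to confirm this from the shell is to list which classpath entries, if any, look like Hive jars; the fix is then to also append the Hive jar directory (commonly $HIVE_HOME/lib/*, though the exact path depends on the installation) to SPARK_DIST_CLASSPATH. A minimal sketch, assuming it is run inside spark-shell:

    // Lists driver classpath entries that look like Hive jars; an empty
    // result means the Hive jars never made it into SPARK_DIST_CLASSPATH.
    System.getProperty("java.class.path")
      .split(java.io.File.pathSeparator)
      .filter(_.toLowerCase.contains("hive"))
      .foreach(println)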