Custom Spark cannot find the Hive database when running on YARN

Updated: 2023-11-17 21:50:40

Hive jars need to be on Spark's classpath for Hive support to be enabled. If the Hive jars are not present on the classpath, the catalog implementation used is in-memory.

In spark-shell we can confirm this by executing:

sc.getConf.get("spark.sql.catalogImplementation") 

This will return in-memory.

The reason is the enableHiveSupport() method on the SparkSession builder, which sets the catalog implementation to hive only when the Hive classes can actually be loaded:

    def enableHiveSupport(): Builder = synchronized {
      if (hiveClassesArePresent) {
        config(CATALOG_IMPLEMENTATION.key, "hive")
      } else {
        throw new IllegalArgumentException(
          "Unable to instantiate SparkSession with Hive support because " +
            "Hive classes are not found.")
      }
    }
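This behavior is easy to confirm when building a session by hand. A minimal sketch (the master and app name below are placeholders, not taken from the question):

    import org.apache.spark.sql.SparkSession

    // enableHiveSupport() throws IllegalArgumentException right here
    // if the Hive classes are not found on the classpath.
    val spark = SparkSession.builder()
      .master("local[*]")              // placeholder; use your actual master
      .appName("hive-support-check")   // placeholder app name
      .enableHiveSupport()
      .getOrCreate()

    // Prints "hive" when Hive support was enabled successfully.
    println(spark.conf.get("spark.sql.catalogImplementation"))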

The check itself, also in SparkSession.scala:

  private[spark] def hiveClassesArePresent: Boolean = {
    try {
      Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
      Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }
  }
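The same probe can be reproduced from a plain spark-shell with the public Class.forName API (Utils.classForName is Spark-internal). A minimal sketch that only probes HiveConf:

    // Mirrors hiveClassesArePresent using Class.forName; the real check
    // also probes Spark's Hive session state builder class.
    def hiveClassesOnClasspath: Boolean =
      try {
        Class.forName("org.apache.hadoop.hive.conf.HiveConf")
        true
      } catch {
        case _: ClassNotFoundException | _: NoClassDefFoundError => false
      }

    println(hiveClassesOnClasspath) // false when the Hive jars are missing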

If the classes are not present, Hive support is not enabled; the checks above run in SparkSession.scala as part of Spark shell initialization.

In the code pasted as part of the question, SPARK_DIST_CLASSPATH is populated only with the Hadoop classpath; the paths to the Hive jars are missing.
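A quick way to confirm this from the shell is to list which classpath entries, if any, look like Hive jars; the fix is then to also append the Hive jar directory (commonly $HIVE_HOME/lib/*, though the exact path depends on the installation) to SPARK_DIST_CLASSPATH. A minimal sketch, assuming it is run inside spark-shell:

    // Lists driver classpath entries that look like Hive jars; an empty
    // result means the Hive jars never made it into SPARK_DIST_CLASSPATH.
    System.getProperty("java.class.path")
      .split(java.io.File.pathSeparator)
      .filter(_.toLowerCase.contains("hive"))
      .foreach(println)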