Updated: 2023-11-17 21:50:40
Hive jars need to be on Spark's classpath for Hive support to be enabled.
If the Hive jars are not present on the classpath, the catalog implementation
used is in-memory.
In spark-shell we can confirm this by executing
sc.getConf.get("spark.sql.catalogImplementation")
which returns in-memory.
def enableHiveSupport(): Builder = synchronized {
  if (hiveClassesArePresent) {
    config(CATALOG_IMPLEMENTATION.key, "hive")
  } else {
    throw new IllegalArgumentException(
      "Unable to instantiate SparkSession with Hive support because " +
        "Hive classes are not found.")
  }
}

private[spark] def hiveClassesArePresent: Boolean = {
  try {
    // HIVE_SESSION_STATE_BUILDER_CLASS_NAME is
    // "org.apache.spark.sql.hive.HiveSessionStateBuilder"
    Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
    Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}
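The detection pattern Spark uses here can be reproduced outside Spark. Below is a minimal Java sketch of the same idea, assuming nothing beyond the JDK; `HiveClassCheck` and `classesPresent` are names invented for this illustration, and only `org.apache.hadoop.hive.conf.HiveConf` comes from the snippet above:

```java
// Minimal sketch of the class-presence check that enableHiveSupport relies on.
public class HiveClassCheck {

    // Returns true only if every named class can be loaded from the classpath.
    static boolean classesPresent(String... classNames) {
        try {
            for (String name : classNames) {
                Class.forName(name);
            }
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always loadable.
        System.out.println(classesPresent("java.util.ArrayList"));
        // HiveConf is loadable only if the Hive jars are on the classpath.
        System.out.println(classesPresent("org.apache.hadoop.hive.conf.HiveConf"));
    }
}
```

Like Spark's version, this swallows both ClassNotFoundException and NoClassDefFoundError, since a class can be found but still fail to link when its own dependencies are missing.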
If the classes are not present, Hive support is not enabled. The checks above run as part of spark-shell initialization (see the linked SparkSession code).
In the code pasted as part of the question, SPARK_DIST_CLASSPATH
is populated only with the Hadoop classpath; the paths to the Hive jars are missing.
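One way to fix this is to append the Hive jar directory when building SPARK_DIST_CLASSPATH. A sketch of such a config fragment follows; the `/opt/hadoop` and `/opt/hive` locations are assumptions, so substitute the paths from your own installation:

```shell
# Hadoop classpath as before; the fallback path is only an example.
HADOOP_CP="$(hadoop classpath 2>/dev/null || echo '/opt/hadoop/share/hadoop/common/*')"
# Assumed location of the Hive jars -- adjust to your installation.
HIVE_LIB="/opt/hive/lib/*"

# Append the Hive jars so HiveConf and friends are found at startup.
export SPARK_DIST_CLASSPATH="${HADOOP_CP}:${HIVE_LIB}"
```

After restarting spark-shell with this in place, `sc.getConf.get("spark.sql.catalogImplementation")` should report `hive` instead of `in-memory`.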