
Spark Java: Unable to change driver memory

Updated: 2023-11-22 23:50:52


So, I have a Spark standalone cluster with 16 worker nodes and one master node. I start the cluster with the "sh start-all.sh" command from the master node, in the spark_home/conf folder. The master node has 32 GB RAM and 14 VCPUs, while each worker node has 16 GB RAM and 8 VCPUs. I also have a Spring application which, when it starts (with java -jar app.jar), initializes the Spark context. The spark-env.sh file is:

export SPARK_MASTER_HOST='192.168.100.17'
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=14000mb 
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_OPTS='-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=172800 -Dspark.worker.cleanup.appDataTtl=172800'

I do not have anything in spark-defaults.conf, and the code that initializes the Spark context programmatically is:

@Bean
public SparkSession sparksession() {
    SparkSession sp = SparkSession
            .builder()
            .master("spark://....")
            .config("spark.cassandra.connection.host", "192.168.100......")
            .appName("biomet")
            .config("spark.driver.memory", "20g")
            .config("spark.driver.maxResultSize", "10g")
            .config("spark.sql.shuffle.partitions", 48)
            .config("spark.executor.memory", "7g")
            .config("spark.sql.pivotMaxValues", "50000")
            .config("spark.sql.caseSensitive", true)
            .config("spark.executor.extraClassPath", "/home/ubuntu/spark-2.4.3-bin-hadoop2.7/jars/guava-16.0.1.jar")
            .config("spark.hadoop.fs.s3a.access.key", "...")
            .config("spark.hadoop.fs.s3a.secret.key", "...")
            .getOrCreate();
    return sp;
}

After all this, the Environment tab of the Spark UI shows spark.driver.maxResultSize 10g and spark.driver.memory 20g, BUT the Executors tab shows 0.0 B / 4.3 GB for the driver's storage memory.

(FYI: I used to have spark.driver.memory at 10g (programmatically set), and the Executors tab said 4.3 GB, but now it seems I cannot change it. And even when I had it at 10g, wasn't it supposed to give me more than 4.3 GB?!)

How can I change the driver memory? I tried setting it in spark-defaults.conf, but nothing changed. Even if I do not set the driver memory at all (or set it to less than 4.3 GB), the Executors tab still says 4.3 GB.

I suspect that you're running your application in client mode; per the documentation:

Maximum heap size settings can be set with spark.driver.memory in the cluster mode and through the --driver-memory command line option in the client mode. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point.

In the current case, the Spark job is submitted from the application, so the application itself is the driver, and its memory is regulated the usual way for Java applications: via -Xmx and the other JVM heap options.
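As an aside, this also explains the mysterious 4.3 GB figure. Here is a minimal sketch (my illustration, not part of the original answer) of how Spark 2.x's unified memory manager derives the "Storage Memory" number shown in the Executors tab from the JVM heap: it subtracts a fixed 300 MB reserve and multiplies by spark.memory.fraction (default 0.6). The 8 GB heap below is an assumed value, roughly the JVM's default of a quarter of the machine's 32 GB RAM:

```java
// Sketch of Spark 2.x unified-memory sizing (defaults assumed).
public class DriverMemoryEstimate {

    // Mirrors (heap - reservedSystemMemory) * spark.memory.fraction
    static long unifiedMemory(long heapBytes) {
        long reserved = 300L * 1024 * 1024; // fixed 300 MB reserved system memory
        double fraction = 0.6;              // spark.memory.fraction default
        return (long) ((heapBytes - reserved) * fraction);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // assumed default heap: ~1/4 of 32 GB RAM
        System.out.println(unifiedMemory(heap) / (1024 * 1024) + " MB"); // prints 4735 MB
    }
}
```

This estimate (~4.6 GB) lands close to the observed 4.3 GB; in practice Runtime.getRuntime().maxMemory() is somewhat below the nominal heap size, so the UI shows a slightly smaller number. To actually raise the driver memory in this setup, start the application with an explicit heap, e.g. java -Xmx20g -jar app.jar, instead of setting spark.driver.memory in the SparkSession builder.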