Resource allocation configurations for Spark on Yarn

Env:

Spark 1.3.1
Hadoop 2.5.1
Spark on Yarn mode

Goal:

This article explains the resource allocation configurations for Spark on Yarn with examples.
It talks about the 2 modes: yarn-client and yarn-cluster.

Solution:

Spark can request 2 resources in YARN which are CPU and memory.

Note: Spark configurations for resource allocation are set in spark-defaults.conf with the name like spark.xx.xx. Some of the them have a corresponding flag for client tools spark-submit/spark-shell/pyspark, with the name like --xx-xx.
In the following forms, if the configuration has a corresponding flag for client tools, we put the flag after the configurations in parenthesis"()". For example:

spark.driver.cores 
(--driver-cores)

Now let's start.

1. yarn-client VS yarn-cluster mode

Per Spark documentation:

In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.

2. Application Master(AM)

a. yarn-client

Configuration	Description	Default Value
spark.yarn.am.cores	Number of CPU cores used by AM	1
spark.yarn.am.memory	JAVA heap size of AM	512m
spark.yarn.am.memoryOverhead	The amount of off heap memory (in megabytes) to be allocated in AM.	AM memory * 0.07, with minimum of 384

Take below settings for example:

[root@h1 conf]# cat spark-defaults.conf |grep am
spark.yarn.am.cores     4
spark.yarn.am.memory 777m

By default, spark.yarn.am.memoryOverhead is AM memory * 0.07, with minimum of 384.
It means, if we set spark.yarn.am.memory to 777M, the actual AM container size would be 2G. Because 777+Max(384, 777 * 0.07) = 777+384 = 1161, and the default yarn.scheduler.minimum-allocation-mb=1024, so 2GB container will be allocated to AM.
As a result, a (2G, 4 Cores) AM container with JAVA heap size -Xmx777M is allocated:

Assigned container container_1432752481069_0129_01_000001 of capacity <memory:2048, vCores:4, disks:0.0>

b. yarn-cluster

Since in yarn-cluster mode, the Spark driver is inside YARN AM, below driver related configurations also control the resource allocation for AM.

Configuration	Description	Default Value
spark.driver.cores (--driver-cores)	Number of CPU cores used by AM	1
spark.driver.memory (--driver-memory)	JAVA heap size of AM	512m
spark.yarn.driver.memoryOverhead	The amount of off heap memory (in megabytes) to be allocated in AM.	AM memory * 0.07, with minimum of 384

Take below settings for example:

MASTER=yarn-cluster /opt/mapr/spark/spark-1.3.1/bin/spark-submit --class org.apache.spark.examples.SparkPi  \
--driver-memory 1665m \
--driver-cores 2 \
/opt/mapr/spark/spark-1.3.1/lib/spark-examples*.jar 1000

Because 1665+Max(384,1665*0.07)=1665+384=2049 > 2048(2G), so a 3G container will be allocated to AM.
As a result, a (3G, 2 Cores) AM container with JAVA heap size -Xmx1665M is allocated:

Assigned container container_1432752481069_0135_02_000001 of capacity <memory:3072, vCores:2, disks:0.0>

3. Containers for Spark executors

For Spark executors' resource, yarn-client and yarn-cluster modes use the same configurations.

Configuration	Description	Default Value
spark.executor.instances (--num-executors)	The number of executors.	2
spark.executor.cores (--executor-cores)	Number of CPU cores used by each executor.	1
spark.executor.memory (--executor-memory)	JAVA heap size of each executor.	512m
spark.yarn.executor.memoryOverhead	The amount of off heap memory (in megabytes) to be allocated per executor.	executorMemory * 0.07, with minimum of 384

Out of the box in spark-defaults.conf, spark.executor.memory is set to 2g.
Spark will start 2 (3G, 1 Core) executor containers with JAVA heap size -Xmx2048M:

Assigned container container_1432752481069_0140_01_000002 of capacity <memory:3072, vCores:1, disks:0.0>
Assigned container container_1432752481069_0140_01_000003 of capacity <memory:3072, vCores:1, disks:0.0>

However 1 core per executor means only 1 task can be running at any time for one executor.
In the case of broadcast join, the memory can be shared by multiple running tasks in the same executor if we increase the number of cores per executor.

Note: if dynamic resource allocation is enabled by setting spark.dynamicAllocation.enabled to true, Spark can scale the number of executors registered with this application up and down based on the workload. In this case, you do not need to specify spark.executor.instances manually.

Key takeaways:

Spark driver resource related configurations also control the YARN application master resource in yarn-cluster mode.
Be aware of the max(7%, 384m) overhead off heap memory when calculating the memory for executors.
The number of CPU cores per executor controls the number of concurrent tasks per executor.

Wednesday, June 10, 2015