Tuesday, March 9, 2021

Understanding RAPIDS Accelerator For Apache Spark parameter -- spark.rapids.memory.pinnedPool.size

Goal:

 This article explains the RAPIDS Accelerator For Apache Spark parameter -- spark.rapids.memory.pinnedPool.size.

Env:

Spark 3.0.2

RAPIDS Accelerator For Apache Spark 0.4

Solution:

1. Concept

As per the RAPIDS Accelerator For Apache Spark tuning guide, spark.rapids.memory.pinnedPool.size sets the size of the pinned memory pool. Pinned memory refers to memory pages that the OS keeps in system RAM and will not relocate or swap to disk.

So this pool lives in host (CPU) memory, not in device (GPU) memory.

It is also not part of the Spark executor's Java heap.

The reason why pinned memory matters is explained in the NVIDIA blog post How to Optimize Data Transfers in CUDA C/C++:

Host (CPU) data allocations are pageable by default. 
The GPU cannot access data directly from pageable host memory,
so when a data transfer from pageable host memory to device memory is invoked,
the CUDA driver must first allocate a temporary page-locked, or “pinned”, host array,
copy the host data to the pinned array, and then transfer the data from the pinned array to device memory, as illustrated below.
...
As you can see in the figure, pinned memory is used as a staging area for transfers from the device to the host.
We can avoid the cost of the transfer between pageable and pinned host arrays by directly allocating our host arrays in pinned memory.

In short, allocating a pinned memory pool helps improve data transfer performance between the host (CPU) and the device (GPU).
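
For reference, here is a minimal sketch of how this parameter can be set when launching spark-shell with the RAPIDS Accelerator (the values and master URL are placeholders, and other required settings such as the RAPIDS/cuDF jars and GPU resource configs are omitted):

spark-shell --master spark://<master-host>:7077 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.memory=10g \
  --conf spark.rapids.memory.pinnedPool.size=8g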

2. Test

Since the pinned pool is outside the Java heap, in my tests it shows up in the "SHR" (shared memory size) column of the top output.
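
One way to observe this is to find the JVM process and watch its SHR column in top (a sketch; <PID> is a placeholder for the actual process id):

jps -l           # list JVM processes to find the spark-shell / executor PID
top -p <PID>     # the SHR column shows the shared memory size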

In the tests below, I simply launch a spark-shell with different settings for spark.executor.memory and spark.rapids.memory.pinnedPool.size and run nothing else.

This was done on a Spark standalone cluster (no YARN):

1. 
spark.executor.memory 10g
spark.rapids.memory.pinnedPool.size 8g


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31714 xxxxx 20 0 0.183t 0.010t 8.484g S 0.0 8.1 0:20.06 /usr/lib/jvm/java-11-openjdk-amd64//bin/java -cp /home/xxxxx/spark/myspark/conf/:/home/xxxxx/spark/myspark/jars/* -Xmx10240M

2.
spark.executor.memory 10g
spark.rapids.memory.pinnedPool.size 2g

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32296 xxxxx 20 0 0.183t 4.188g 2.484g S 0.0 3.3 0:19.14 /usr/lib/jvm/java-11-openjdk-amd64//bin/java -cp /home/xxxxx/spark/myspark/conf/:/home/xxxxx/spark/myspark/jars/* -Xmx10240M

3.
spark.executor.memory 10g
spark.rapids.memory.pinnedPool.size 12g

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
528 xxxxx 20 0 0.183t 0.014t 0.012t S 6.3 11.3 0:20.95 /usr/lib/jvm/java-11-openjdk-amd64//bin/java -cp /home/xxxxx/spark/myspark/conf/:/home/xxxxx/spark/myspark/jars/* -Xmx10240M

So the pinned memory is accounted for in both RES and SHR.

This raises a question: on a Spark on YARN cluster, will the NodeManager's physical memory check kill the container if the pinned pool size is larger than the executor memory size?

The answer is yes.

With the settings below on a Spark on YARN cluster:

spark.executor.memory 4g
spark.rapids.memory.pinnedPool.size 8g

YARN allocates a 5G container due to the default spark.executor.memoryOverhead setting, as I explained in a previous blog post.
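
As a rough sanity check (assuming Spark 3.x's default overhead factor of 10% and a 1 GB YARN minimum allocation increment, which may differ in other setups):

container request = spark.executor.memory + max(spark.executor.memory * 0.10, 384MB)
                  = 4096MB + 410MB
                  ≈ 4.4G, which YARN rounds up to 5G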

Right after launching, I can see the NodeManager's physical memory check killed this YARN container:

ERROR YarnScheduler: Lost executor 4 on nodeX: Container killed by YARN for exceeding physical memory limits. 7.7 GB of 5 GB physical memory used. Consider boosting spark.executor.memoryOverhead.

3. Key takeaways

As the tuning guide points out, on a Spark on YARN cluster we need to account for the extra pinned memory by setting spark.executor.memoryOverhead large enough.

So spark.executor.memoryOverhead should be greater than spark.rapids.memory.pinnedPool.size + spark.memory.offHeap.size + the default memory overhead, which is max(spark.executor.memory * 0.10, 384MB).
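
For example, with the failing configuration above (spark.executor.memory=4g, spark.rapids.memory.pinnedPool.size=8g, no off-heap memory), the overhead needs to exceed 8g + 0 + max(4g * 0.10, 384MB) ≈ 8.4g, so a setting like the one below should keep the container within its YARN limit (9g is just an illustrative choice, not a tested recommendation):

spark.executor.memory                4g
spark.rapids.memory.pinnedPool.size  8g
spark.executor.memoryOverhead        9g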

