Sunday, April 4, 2021

How to enable GpuKryoRegistrator on RAPIDS Accelerator for Spark


This article shares the steps to enable GpuKryoRegistrator on RAPIDS Accelerator for Spark.


Spark 3.1.1

RAPIDS Accelerator for Apache Spark 0.4.1


As mentioned in the Spark Tuning doc, Spark provides two serialization libraries:

  • Java serialization: By default, Spark serializes objects using Java’s ObjectOutputStream framework, and can work with any class you create that implements java.io.Serializable. You can also control the performance of your serialization more closely by extending java.io.Externalizable. Java serialization is flexible but often quite slow, and leads to large serialized formats for many classes.
  • Kryo serialization: Spark can also use the Kryo library (version 4) to serialize objects more quickly. Kryo is significantly faster and more compact than Java serialization (often as much as 10x), but does not support all Serializable types and requires you to register the classes you’ll use in the program in advance for best performance.

The RAPIDS Accelerator also provides a class named com.nvidia.spark.rapids.GpuKryoRegistrator, which registers the following classes from org.apache.spark.sql.rapids.execution.GpuBroadcastExchangeExec with Kryo:

  • SerializeConcatHostBuffersDeserializeBatch
  • SerializeBatchDeserializeHostBuffer 

How to enable?

Set the following two parameters (e.g., in spark-defaults.conf):

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator com.nvidia.spark.rapids.GpuKryoRegistrator
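
The same two settings can also be passed per job rather than cluster-wide. Below is a minimal Python sketch, assuming the job is launched with spark-submit and its --conf key=value flag. The helper name build_submit_args and the application file my_job.py are hypothetical, and the RAPIDS Accelerator jars and plugin settings are assumed to be configured elsewhere.

```python
import shlex

# The two settings from this article.
KRYO_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "com.nvidia.spark.rapids.GpuKryoRegistrator",
}


def build_submit_args(app, conf):
    """Return a spark-submit argument list with one --conf flag per setting.

    Hypothetical helper for illustration; it only assembles the command line.
    """
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder application name.
    print(shlex.join(build_submit_args("my_job.py", KRYO_CONF)))
```

The resulting list can be handed to subprocess.run as-is, or printed and pasted into a shell.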

Common Issues

A common issue with Kryo serialization is buffer overflow.

For example, Q7 of TPC-DS/NDS may fail with:

Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 636
at java.base/$BlockDataOutputStream.write(
at java.base/
at java.base/
at java.base/
at ai.rapids.cudf.JCudfSerialization$DataOutputStreamWriter.copyDataFrom(
at ai.rapids.cudf.JCudfSerialization$DataWriter.copyDataFrom(
at ai.rapids.cudf.JCudfSerialization.copySlicedAndPad(
at ai.rapids.cudf.JCudfSerialization.copySlicedOffsets(
at ai.rapids.cudf.JCudfSerialization.writeSliced(
at ai.rapids.cudf.JCudfSerialization.writeSliced(
at ai.rapids.cudf.JCudfSerialization.writeToStream(
at org.apache.spark.sql.rapids.execution.SerializeBatchDeserializeHostBuffer.writeObject(GpuBroadcastExchangeExec.scala:153)
at jdk.internal.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(
at java.base/java.lang.reflect.Method.invoke(
at java.base/
at java.base/
at java.base/
at java.base/
at java.base/
at com.esotericsoftware.kryo.serializers.JavaSerializer.write(
... 9 more

The fix is to increase spark.kryoserializer.buffer.max from its default of 64m to a larger value, say 512m:

spark.kryoserializer.buffer.max 512m
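
Spark requires spark.kryoserializer.buffer.max to be less than 2048m, so there is a ceiling on how far this value can be raised. The following Python sketch checks a candidate value before it goes into the configuration. The helper names are hypothetical, and only the k, m, and g suffixes are handled, which is a simplification of the size formats Spark accepts.

```python
# Size of each supported suffix, expressed in KiB.
_UNIT_IN_KIB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def size_to_mib(value):
    """Convert a size string such as '512m' to MiB.

    A bare number is treated as MiB, the default unit for this setting.
    Hypothetical helper: supports only the k, m, and g suffixes.
    """
    text = value.strip().lower()
    if not text:
        raise ValueError("empty size string")
    if text[-1].isdigit():
        return float(text)
    if text[-1] not in _UNIT_IN_KIB:
        raise ValueError(f"unsupported size suffix in {value!r}")
    return int(text[:-1]) * _UNIT_IN_KIB[text[-1]] / 1024


def is_valid_kryo_buffer_max(value):
    """Return True if the value is positive and below the 2048 MiB limit."""
    return 0 < size_to_mib(value) < 2048


if __name__ == "__main__":
    for candidate in ("64m", "512m", "2g"):
        print(candidate, is_valid_kryo_buffer_max(candidate))
```

If a value close to the limit still overflows, the serialized broadcast batch is too large for this setting to fix, and the cause has to be addressed another way.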



