Goal:
This article explains how to access Azure Open Dataset from Spark.
Env:
spark-3.1.1-bin-hadoop2.7
This article researches how Spark calculates Decimal precision and scale in GPU or CPU mode.
In this post we test Addition, Subtraction, Multiplication, Division, Modulo and Union.
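As a rough illustration of what such a test compares against, here is a minimal Python sketch of the result-type rules for two operands Decimal(p1, s1) and Decimal(p2, s2), as we understand them from the comments in Spark's DecimalPrecision rule. The function name `result_type` is hypothetical, and the sketch deliberately ignores the adjustment Spark applies when the resulting precision exceeds the maximum of 38, so treat it as an assumption to verify against your Spark version rather than as the actual implementation.

```python
from typing import Tuple


def result_type(op: str, p1: int, s1: int, p2: int, s2: int) -> Tuple[int, int]:
    """Return (precision, scale) of `Decimal(p1, s1) <op> Decimal(p2, s2)`.

    Hypothetical helper: it does NOT cap precision at 38 or adjust the scale
    the way Spark does when the result would overflow the maximum precision.
    """
    if op in ("add", "subtract"):
        scale = max(s1, s2)
        # Widest integral part, plus the scale, plus one carry digit.
        precision = max(p1 - s1, p2 - s2) + scale + 1
    elif op == "multiply":
        precision, scale = p1 + p2 + 1, s1 + s2
    elif op == "divide":
        scale = max(6, s1 + p2 + 1)
        precision = p1 - s1 + s2 + scale
    elif op == "modulo":
        scale = max(s1, s2)
        precision = min(p1 - s1, p2 - s2) + scale
    elif op == "union":
        scale = max(s1, s2)
        precision = max(p1 - s1, p2 - s2) + scale
    else:
        raise ValueError(f"unsupported operation: {op}")
    return precision, scale


if __name__ == "__main__":
    for op in ("add", "subtract", "multiply", "divide", "modulo", "union"):
        print(op, result_type(op, 10, 2, 5, 3))
```

Comparing these expected types with the schema Spark reports in each mode is one way to spot differences between CPU and GPU results.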
This article shares the steps to run a Spark job with Rapids Accelerator using Spark Operator in a Kubernetes cluster.
This article discusses the compatibility of Rapids Accelerator for Spark with Parquet writing, in particular with parameters such as spark.sql.legacy.parquet.datetimeRebaseModeInWrite.
This article digs into the different types of SparkListenerEvent in the Spark event log, with some examples.
Understanding them helps us know how to parse a Spark event log.
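To make the parsing idea concrete, here is a minimal Python sketch that counts events by type. It assumes an uncompressed event log in which each line is one JSON object whose "Event" field holds the event name (for example SparkListenerTaskEnd). The function name `count_events` and the sample lines are hypothetical and exist only for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_events(lines: Iterable[str]) -> Counter:
    """Count Spark event log entries by the value of their "Event" field."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        counts[event.get("Event", "<unknown>")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample lines; real events carry many more fields.
    sample = [
        '{"Event": "SparkListenerJobStart", "Job ID": 0}',
        '{"Event": "SparkListenerTaskEnd", "Stage ID": 0}',
        '{"Event": "SparkListenerTaskEnd", "Stage ID": 0}',
        '{"Event": "SparkListenerJobEnd", "Job ID": 0}',
    ]
    for name, n in count_events(sample).most_common():
        print(name, n)
```

For a real log, pass an open file handle instead of the sample list, since iterating over a file yields one line at a time.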
This article shows how to use the latest version of Rapids Accelerator for Spark on EMR.
Currently the latest EMR release, 6.2, only ships with Rapids Accelerator 0.2.0 and the cuDF 0.15 jar.
However, as of today, the latest Rapids Accelerator is 0.4.1 with the cuDF 0.18 jar.
Note: These are NOT official steps for enabling Rapids Accelerator for Spark on EMR, just some technical research.
This article explains how to use NVIDIA Nsight Systems to profile a Spark on K8s job with Rapids Accelerator.
This is a follow-up to the post "How to use NVIDIA Nsight Systems to profile a Spark job on Rapids Accelerator".
This article explains how to use NVIDIA Nsight Systems to profile a Spark job on Rapids Accelerator.
This article shares the steps to enable GpuKryoRegistrator on RAPIDS Accelerator for Spark.