Thursday, September 23, 2021

How to access Azure Open Dataset from Spark

Goal:

This article explains how to access Azure Open Dataset from Spark.

Env:

spark-3.1.1-bin-hadoop2.7

Monday, May 3, 2021

Understand Decimal precision and scale calculation in Spark using GPU or CPU mode

Goal:

This article researches how Spark calculates the Decimal precision and scale in GPU or CPU mode.

Basically we will test Addition/Subtraction/Multiplication/Division/Modulo/Union in this post.
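
As a rough illustration of the operations listed above, here is a minimal Python sketch of the result-type rules described in the comments of Spark's DecimalPrecision rule. It only computes the "raw" precision and scale, before Spark caps them at the maximum precision of 38 (and possibly reduces the scale). The function name decimal_result_type and the operation labels are made up for this example and are not part of any Spark API.

```python
from typing import Tuple


def decimal_result_type(op: str, p1: int, s1: int, p2: int, s2: int) -> Tuple[int, int]:
    """Return the raw (precision, scale) for `Decimal(p1, s1) op Decimal(p2, s2)`.

    Assumption: this ignores the adjustment Spark applies when the raw
    precision exceeds 38, so it is only valid for small input types.
    """
    int_digits = max(p1 - s1, p2 - s2)  # digits to the left of the decimal point
    max_scale = max(s1, s2)
    if op in ("add", "subtract"):
        # One extra digit for a possible carry.
        return max_scale + int_digits + 1, max_scale
    if op == "multiply":
        return p1 + p2 + 1, s1 + s2
    if op == "divide":
        scale = max(6, s1 + p2 + 1)
        return p1 - s1 + s2 + scale, scale
    if op == "modulo":
        return min(p1 - s1, p2 - s2) + max_scale, max_scale
    if op == "union":
        return max_scale + int_digits, max_scale
    raise ValueError(f"unsupported operation: {op}")


if __name__ == "__main__":
    # Decimal(10,2) combined with Decimal(5,3)
    for op in ("add", "subtract", "multiply", "divide", "modulo", "union"):
        print(op, decimal_result_type(op, 10, 2, 5, 3))
```

For example, Decimal(10,2) + Decimal(5,3) gives a raw result type of Decimal(12,3) under these rules, and Decimal(10,2) * Decimal(5,3) gives Decimal(16,5).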

Thursday, April 29, 2021

Tuesday, April 27, 2021

Rapids Accelerator compatibility related to spark.sql.legacy.parquet.datetimeRebaseModeInWrite

Goal:

This article discusses the compatibility of Rapids Accelerator for Spark when writing parquet files, with respect to parameters such as spark.sql.legacy.parquet.datetimeRebaseModeInWrite.

Tuesday, April 20, 2021

Spark Code -- Dig into SparkListenerEvent

Goal:

This article digs into different types of SparkListenerEvent in Spark event log with some examples. 

Understanding this can help us know how to parse the Spark event log.
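
As a starting point for parsing, here is a minimal Python sketch. It relies on the fact that a Spark event log is a text file with one JSON object per line, and that each object has an "Event" field holding the event type (for example SparkListenerJobStart). The function name count_event_types and the sample lines are made up for illustration; real events carry many more fields than shown here.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Count how many times each SparkListenerEvent type appears."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Every event log entry names its type in the "Event" field.
        counts[event.get("Event", "<unknown>")] += 1
    return counts


# Heavily trimmed, hypothetical sample lines in the event log format.
sample = [
    '{"Event":"SparkListenerApplicationStart","App Name":"demo"}',
    '{"Event":"SparkListenerJobStart","Job ID":0}',
    '{"Event":"SparkListenerJobEnd","Job ID":0}',
    '{"Event":"SparkListenerJobStart","Job ID":1}',
]

if __name__ == "__main__":
    for event_type, n in count_event_types(sample).most_common():
        print(event_type, n)
```

To run it against a real (uncompressed) event log, pass an open file object instead of the sample list, e.g. count_event_types(open(path)), where path is the location of your own event log file.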

How to use latest version of Rapids Accelerator for Spark on EMR

Goal:

This article shows how to use the latest version of Rapids Accelerator for Spark on EMR.

Currently the latest EMR release, 6.2, only ships with Rapids Accelerator 0.2.0 and the cuDF 0.15 jar.

However, as of today, the latest Rapids Accelerator is 0.4.1, which uses the cuDF 0.18 jar.

Note: These are NOT the official steps for enabling Rapids + Spark on EMR, just some technical research.

Thursday, April 8, 2021

Sunday, April 4, 2021
