Friday, November 4, 2016

How to troubleshoot Hive local task in a separate JVM process

Goal:

This article shows how to troubleshoot a Hive local task by running it in a separate JVM process.
This is especially useful when troubleshooting a Map Join's local task, which we may want to run in its own JVM process instead of inside the Hive CLI process.

Friday, October 7, 2016

How to use Drill to parse ResourceManager REST API results

Goal:

The ResourceManager REST API provides detailed information about YARN applications, YARN cluster metrics, etc.
This article provides a simple demo of how to use Drill to query the results of the REST API.
One example use case is to show the largest YARN applications that are currently running.

Friday, September 30, 2016

Thursday, September 15, 2016

Using Spark job to upload files to AWS S3 with Server Side Encryption enabled

Goal:

This article shows example Java code for using a Spark job to upload files to AWS S3 with Server Side Encryption enabled.

Wednesday, September 14, 2016

Job fails with "AWS authentication requires a valid Date or x-amz-date header"

Symptom:

The job fails with the error "AWS authentication requires a valid Date or x-amz-date header".
For example:
# java UploadObjectSingleOperation
Uploading a new object to S3 from a file

Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
Error Message:    AWS authentication requires a valid Date or x-amz-date header (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 6XXXXXXXXXXXXXX)
HTTP Status Code: 403
AWS Error Code:   AccessDenied
Error Type:       Client
Request ID:       6XXXXXXXXXXXXXX
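
The error above says that S3 could not accept the Date (or x-amz-date) header sent with the request. As a minimal sketch, assuming the header is expected in the RFC 1123 style, the Java code below shows what a well-formed value looks like and how to check a candidate value. The class name DateHeaderSketch and its methods are hypothetical helpers for illustration only; they are not part of the AWS SDK and they do not diagnose the root cause of the failure.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration; not part of the AWS SDK.
public class DateHeaderSketch {

    // Format a timestamp in the RFC 1123 style, normalized to GMT.
    static String toDateHeader(ZonedDateTime when) {
        return DateTimeFormatter.RFC_1123_DATE_TIME
                .format(when.withZoneSameInstant(ZoneOffset.UTC));
    }

    // Return true if the given string parses as an RFC 1123 date.
    static boolean isValidDateHeader(String value) {
        try {
            DateTimeFormatter.RFC_1123_DATE_TIME.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String header = toDateHeader(ZonedDateTime.now());
        System.out.println("Date: " + header);
        System.out.println("Parses as RFC 1123: " + isValidDateHeader(header));
    }
}
```

Comparing the value printed here with the Date header that the failing client actually sends (for example, from the client's debug logging) can help confirm whether the header is malformed.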

Tuesday, September 13, 2016

Thursday, September 8, 2016

How to use Cgroups with YARN to limit the CPU utilization

Goal:

YARN parameters like mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores cannot enforce a hard limit on CPU utilization.
This article explains how to configure YARN to use Control Groups (Cgroups) when you want to limit and monitor the CPU resources that are available to YARN containers on a node.

Friday, September 2, 2016

How to override the Hive compression algorithm set in Hive

Goal:

Hive users may set different custom values for "mapred.output.compression.codec" (same as "mapreduce.output.fileoutputformat.compress.codec") in the Hive CLI, Beeline, hive-site.xml or even Hive script files. There could be thousands of such Hive script files.
This article explains how to override "mapred.output.compression.codec" globally without modifying each script file one by one.

Quickstart for SQL workbench connecting to Apache Drill cluster

Goal:

This is a quick start for connecting SQL Workbench to an Apache Drill cluster.

Tuesday, August 30, 2016
