Thursday, September 15, 2016

Using Spark job to upload files to AWS S3 with Server Side Encryption enabled

Goal:

This article shows example Java code for using a Spark job to upload files to AWS S3 with Server-Side Encryption (SSE) enabled.
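The article's own Java upload code is not reproduced here. As a related sketch, and not necessarily the article's approach, a Spark job that writes through Hadoop's S3A connector can ask S3 to encrypt every object it writes by setting the S3A property `fs.s3a.server-side-encryption-algorithm` to `AES256` (SSE-S3). Spark copies any `spark.hadoop.*` property into the job's Hadoop configuration. The class and method names below are hypothetical, and the property assumes a Hadoop version whose S3A connector supports SSE.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class S3aSseConfig {
    // Hadoop S3A property asking S3 to encrypt each written object server-side.
    static final String SSE_KEY = "fs.s3a.server-side-encryption-algorithm";

    /** Spark properties forwarding the S3A SSE setting to Hadoop via the spark.hadoop. prefix. */
    static Map<String, String> sseSparkConf(String algorithm) {
        // This sketch only covers SSE-S3, whose algorithm name is AES256.
        if (!"AES256".equals(algorithm)) {
            throw new IllegalArgumentException("This sketch only handles SSE-S3 (AES256)");
        }
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.hadoop." + SSE_KEY, algorithm);
        return conf;
    }

    public static void main(String[] args) {
        // Print the settings as spark-submit --conf flags.
        for (Map.Entry<String, String> e : sseSparkConf("AES256").entrySet()) {
            System.out.println("--conf " + e.getKey() + "=" + e.getValue());
        }
    }
}
```

With this setting, files written to `s3a://` paths should carry the `x-amz-server-side-encryption: AES256` request header without changes to the job's write logic.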

Wednesday, September 14, 2016

Job fails with "AWS authentication requires a valid Date or x-amz-date header"

Symptom:

Job fails with error "AWS authentication requires a valid Date or x-amz-date header".
For example:
# java UploadObjectSingleOperation
Uploading a new object to S3 from a file

Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
Error Message:    AWS authentication requires a valid Date or x-amz-date header (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 6XXXXXXXXXXXXXX)
HTTP Status Code: 403
AWS Error Code:   AccessDenied
Error Type:       Client
Request ID:       6XXXXXXXXXXXXXX
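When capturing the failing request's headers, it can help to compare them against correctly formatted values. The sketch below (hypothetical class name, Java 8+ `java.time` only) prints the two formats S3 accepts: `x-amz-date` in ISO 8601 basic format in UTC, as used by Signature Version 4, and `Date` in RFC 1123 format. It does not diagnose the root cause by itself.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class AmzDateHeader {
    // x-amz-date format: ISO 8601 basic, always UTC, e.g. 20160914T083000Z
    static final DateTimeFormatter AMZ_DATE =
        DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'");

    static String amzDate(ZonedDateTime t) {
        return t.withZoneSameInstant(ZoneOffset.UTC).format(AMZ_DATE);
    }

    // Date header format: RFC 1123, e.g. Wed, 14 Sep 2016 08:30:00 GMT
    static String httpDate(ZonedDateTime t) {
        return t.withZoneSameInstant(ZoneOffset.UTC)
                .format(DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now(ZoneOffset.UTC);
        System.out.println("x-amz-date: " + amzDate(now));
        System.out.println("Date:       " + httpDate(now));
    }
}
```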

Tuesday, September 13, 2016

Thursday, September 8, 2016

How to use Cgroups with YARN to limit the CPU utilization

Goal:

YARN parameters such as mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores cannot impose a hard limit on CPU utilization.
This article explains how to configure YARN to use Control Groups (Cgroups) to limit and monitor the CPU resources available to YARN container processes on a node.
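As a sketch of what such a configuration typically involves (verify the property names against your Hadoop version's yarn-site.xml documentation), the class below prints the yarn-site.xml properties commonly used to enable CPU Cgroups through the LinuxContainerExecutor. It also estimates the per-container CPU cap in strict mode, assuming each container receives its vcore share of the physical CPU that YARN may use. The exact CFS period and quota the NodeManager writes may differ by version. The LinuxContainerExecutor also needs its `container-executor` binary and configuration file set up, which this sketch does not cover.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class YarnCgroupsCpu {
    // yarn-site.xml properties for CPU Cgroups (check names/values for your version).
    static Map<String, String> cgroupsProperties(int cpuLimitPercent) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("yarn.nodemanager.container-executor.class",
              "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
        p.put("yarn.nodemanager.linux-container-executor.resources-handler.class",
              "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler");
        p.put("yarn.nodemanager.linux-container-executor.cgroups.hierarchy", "/hadoop-yarn");
        // Cap each container at its vcore share instead of only weighting it.
        p.put("yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage", "true");
        // Share of the node's physical CPU that all YARN containers together may use.
        p.put("yarn.nodemanager.resource.percentage-physical-cpu-limit",
              String.valueOf(cpuLimitPercent));
        return p;
    }

    /** Approximate CPU cap, in cores, for one container under strict mode (assumed formula). */
    static double containerCpuCap(int containerVcores, int nodeVcores,
                                  int physicalCores, int cpuLimitPercent) {
        double yarnCores = physicalCores * cpuLimitPercent / 100.0;
        return containerVcores * yarnCores / nodeVcores;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : cgroupsProperties(80).entrySet()) {
            System.out.printf("<property><name>%s</name><value>%s</value></property>%n",
                              e.getKey(), e.getValue());
        }
        // Hypothetical node: 16 physical cores, 16 vcores, 80% limit; 2-vcore container.
        System.out.println("cap = " + containerCpuCap(2, 16, 16, 80) + " cores");
    }
}
```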

Friday, September 2, 2016

How to override the compression codec set in Hive

Goal:

Hive users may set their own values of "mapred.output.compression.codec" (equivalent to "mapreduce.output.fileoutputformat.compress.codec") in the Hive CLI, Beeline, hive-site.xml, or even in Hive script files. There could be thousands of such Hive script files.
This article explains how to override "mapred.output.compression.codec" globally without modifying each script file one by one.
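The override itself is not shown here. To see which of those script files actually set the codec, and to which value, a small scanner like the hypothetical one below can help. It matches `set` statements for either property name, case-insensitively, and returns the codec values in order of appearance. It is a regex-based sketch, so statements split across lines or built dynamically will be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodecSetFinder {
    // Matches "set <codec property> = <value>" at the start of a line, for either
    // the old or the new property name; the value ends at ';' or whitespace.
    private static final Pattern SET_CODEC = Pattern.compile(
        "(?im)^\\s*set\\s+(mapred\\.output\\.compression\\.codec"
        + "|mapreduce\\.output\\.fileoutputformat\\.compress\\.codec)"
        + "\\s*=\\s*([^;\\s]+)");

    /** Returns the codec values set in the script, in order of appearance. */
    static List<String> codecsSet(String script) {
        List<String> codecs = new ArrayList<>();
        Matcher m = SET_CODEC.matcher(script);
        while (m.find()) {
            codecs.add(m.group(2));
        }
        return codecs;
    }

    public static void main(String[] args) {
        String script = "SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;\n"
            + "INSERT OVERWRITE TABLE t SELECT * FROM s;\n";
        System.out.println(codecsSet(script));
    }
}
```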

Quickstart for connecting SQL Workbench to an Apache Drill cluster

Goal:

This is a quick start for connecting SQL Workbench to an Apache Drill cluster.
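A SQL client like SQL Workbench connects to Drill through the Drill JDBC driver (`org.apache.drill.jdbc.Driver`), using either a ZooKeeper URL (`jdbc:drill:zk=<quorum>/<drill directory>/<cluster ID>`) or a direct Drillbit URL (`jdbc:drill:drillbit=<host>:<port>`). The sketch below builds both URL forms; the host names, directory, and cluster ID are hypothetical placeholders. With `--connect` and the Drill JDBC jar on the classpath, it runs a test query against `sys.version`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class DrillJdbcUrl {
    /** ZooKeeper-based URL: lets the driver discover Drillbits registered in ZooKeeper. */
    static String zkUrl(List<String> zkHosts, String drillDir, String clusterId) {
        return "jdbc:drill:zk=" + String.join(",", zkHosts) + "/" + drillDir + "/" + clusterId;
    }

    /** Direct URL to a single Drillbit (31010 is Drill's default user port). */
    static String directUrl(String host, int port) {
        return "jdbc:drill:drillbit=" + host + ":" + port;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper hosts and default-style directory/cluster ID.
        String url = zkUrl(Arrays.asList("zk1:2181", "zk2:2181"), "drill", "drillbits1");
        System.out.println(url);
        if (args.length > 0 && args[0].equals("--connect")) {
            Class.forName("org.apache.drill.jdbc.Driver"); // requires the Drill JDBC jar
            try (Connection c = DriverManager.getConnection(url);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT version FROM sys.version")) {
                while (rs.next()) {
                    System.out.println("Drill version: " + rs.getString(1));
                }
            }
        }
    }
}
```

The same URL string and driver class name are what a GUI client's driver/connection profile typically asks for.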

Tuesday, August 30, 2016

Friday, July 29, 2016

Partition pruning does not happen for queries with long IN lists (20+ values)

Symptom:

Partition pruning does not happen for a query whose IN list contains more than 20 values.
As a result, the query may run out of memory and fail.
For example:
SELECT something FROM dfs.`sometable` WHERE dir0 IN ('a', 'b', 'c', ..., 'z');
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
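This summary does not state the fix. One workaround worth testing, not taken from this article, is to keep each IN list below the 20-value threshold by splitting it into several IN lists joined with OR. Whether the planner then prunes partitions depends on the Drill version, so check the plan with `EXPLAIN PLAN FOR` before relying on it. The hypothetical helper below does the splitting and quotes each value as a SQL string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class InListSplitter {
    /** Builds "(col IN (...) OR col IN (...))" with at most maxPerList values per IN list. */
    static String splitInList(String column, List<String> values, int maxPerList) {
        if (maxPerList < 1 || values.isEmpty()) {
            throw new IllegalArgumentException("need maxPerList >= 1 and at least one value");
        }
        List<String> groups = new ArrayList<>();
        for (int i = 0; i < values.size(); i += maxPerList) {
            List<String> chunk = values.subList(i, Math.min(i + maxPerList, values.size()));
            // Quote each value, doubling embedded single quotes.
            String literals = chunk.stream()
                .map(v -> "'" + v.replace("'", "''") + "'")
                .collect(Collectors.joining(", "));
            groups.add(column + " IN (" + literals + ")");
        }
        return "(" + String.join(" OR ", groups) + ")";
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            dirs.add(String.valueOf(c));
        }
        System.out.println("SELECT something FROM dfs.`sometable` WHERE "
            + splitInList("dir0", dirs, 19) + ";");
    }
}
```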

Tuesday, July 12, 2016

Memory allocation for Oozie Launcher job

Goal:

This article explains the memory configuration parameters for the Oozie launcher job.
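In Oozie, properties prefixed with `oozie.launcher.` in an action's `<configuration>` apply to the launcher job rather than to the job the action starts. The sketch below generates such properties for the launcher's map container and its ApplicationMaster. The heap-to-container ratio (0.8 in the example) is a common rule of thumb, not a value from this article, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OozieLauncherMemory {
    /** Launcher memory properties; the heap (-Xmx) is kept below the container size. */
    static Map<String, String> launcherMemory(int mapContainerMb, int amContainerMb,
                                              double heapRatio) {
        if (heapRatio <= 0 || heapRatio >= 1) {
            throw new IllegalArgumentException("heapRatio must be in (0, 1)");
        }
        Map<String, String> p = new LinkedHashMap<>();
        p.put("oozie.launcher.mapreduce.map.memory.mb", String.valueOf(mapContainerMb));
        p.put("oozie.launcher.mapreduce.map.java.opts",
              "-Xmx" + (int) (mapContainerMb * heapRatio) + "m");
        p.put("oozie.launcher.yarn.app.mapreduce.am.resource.mb", String.valueOf(amContainerMb));
        return p;
    }

    public static void main(String[] args) {
        // Print as <property> elements for the action's <configuration> block.
        for (Map.Entry<String, String> e : launcherMemory(2048, 1024, 0.8).entrySet()) {
            System.out.printf("<property><name>%s</name><value>%s</value></property>%n",
                              e.getKey(), e.getValue());
        }
    }
}
```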

Monday, July 11, 2016

How to troubleshoot Oozie Hive action

Goal:

This article explains, for newcomers, a methodology for troubleshooting the Oozie Hive action.
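A common first step is to read the launcher job's logs, which usually contain the Hive client output. Assuming the action's external ID (as listed by `oozie job -info <workflow-id>`) is a MapReduce job ID of the form `job_<clusterTimestamp>_<sequence>`, the matching YARN application ID shares the same suffix. The hypothetical helper below converts between the two so the logs can be fetched with `yarn logs -applicationId`.

```java
public class OozieLauncherIds {
    /** Converts a MapReduce job ID (job_<ts>_<seq>) to its YARN application ID. */
    static String toApplicationId(String jobId) {
        if (!jobId.matches("job_\\d+_\\d+")) {
            throw new IllegalArgumentException("Not a MapReduce job ID: " + jobId);
        }
        return "application_" + jobId.substring("job_".length());
    }

    public static void main(String[] args) {
        // Hypothetical external ID; pass the real one as the first argument.
        String jobId = args.length > 0 ? args[0] : "job_1468000000000_0042";
        System.out.println("yarn logs -applicationId " + toApplicationId(jobId));
    }
}
```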
