Friday, July 29, 2016

Partition pruning is not happening for queries with large IN lists (20+ values)

Symptom:

Partition pruning does not happen for a query whose IN list contains more than 20 values.
As a result, the query may fail with an out-of-memory error.
For example:
SELECT something FROM dfs.`sometable` WHERE dir0 IN ('a', 'b', 'c', ..., 'z');
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Tuesday, July 12, 2016

Memory allocation for Oozie Launcher job

Goal:

This article explains the configuration parameters that control memory allocation for the Oozie Launcher job.

Monday, July 11, 2016

How to troubleshoot Oozie Hive action

Goal:

This article walks newcomers through a methodology for troubleshooting an Oozie Hive action.

Thursday, July 7, 2016

Drill SQL types to Parquet logical types

Goal:

This article shows how SQL data types map to Parquet logical types when Drill is used to create a Parquet file.

Friday, July 1, 2016

Understanding Drill's znodes

Goal:

The znodes that Apache Drill keeps in ZooKeeper are the brain of a Drill cluster.
They hold its most important cluster-level information.
This article walks through the information stored in that brain.

Wednesday, June 29, 2016

How to troubleshoot YARN job failures

Goal:

This article walks newcomers through a methodology for troubleshooting YARN job failures.

Tuesday, June 28, 2016

Drill query fails with error "Unable to allocate buffer of size 256 due to memory limit. Current allocation: 268435456"

Symptom:

A Drill query fails with an error message like the following:
[<SQL_ID>:foreman] WARN  o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune partition.
org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 256 due to memory limit. Current allocation: 268435456
This error always appears in the drillbit.log of the foreman node, during the planning phase.
In the example above, the failure occurs while partition pruning is performed during planning.
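For scale, the numbers in the message work out as shown below. The current allocation is exactly 2^28 bytes (256 MiB), while the failed request is only 256 bytes, so the allocator had already reached its cap before that request arrived. As an assumption worth verifying for your Drill version, 268435456 is also the documented default of Drill's planner.memory_limit option, which bounds the direct memory a query may use during planning.

```latex
268435456\ \text{bytes} = 2^{28}\ \text{bytes} = 256 \times 1024 \times 1024\ \text{bytes} = 256\ \text{MiB}
```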

Friday, June 3, 2016

Drillbit Health Check script

Goal:

If a drillbit hangs, queries may get stuck on it. This script helps you quickly health-check each drillbit.
It connects to each drillbit using JDBC and runs a simple query.
If the script gets stuck when connecting to a particular drillbit, troubleshoot that drillbit by collecting its jstack output and drillbit.log.
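The original script is not reproduced here. As a hedged illustration of the same idea, the following is a minimal Java sketch. It assumes that the Drill JDBC driver jar is on the classpath and that drillbits listen on the default user port 31010. The class name DrillbitHealthCheck, the 30-second timeout, and the probe query SELECT * FROM sys.version are illustrative choices, not taken from the article. Each drillbit is contacted directly with a jdbc:drill:drillbit=host:port URL, bypassing ZooKeeper, and a probe that has not finished when its deadline passes is reported as hung.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DrillbitHealthCheck {

    // Runs one probe with a deadline. Returns "OK" if it succeeds in time,
    // "HUNG" if it has not finished when the deadline passes, "ERROR" otherwise.
    static String check(Callable<Boolean> probe, long timeoutSeconds) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a JDBC call stuck on a hung drillbit must not block JVM exit
            return t;
        });
        try {
            Future<Boolean> result = pool.submit(probe);
            return Boolean.TRUE.equals(result.get(timeoutSeconds, TimeUnit.SECONDS)) ? "OK" : "ERROR";
        } catch (TimeoutException e) {
            return "HUNG";
        } catch (Exception e) {
            return "ERROR"; // connection refused, SQL error, missing driver, interruption
        } finally {
            pool.shutdownNow(); // interrupts the worker; a blocked socket read may ignore this
        }
    }

    // Probe for a single drillbit: connect directly to it (not through ZooKeeper)
    // and run a trivial query. hostPort is e.g. "node1:31010" (hypothetical host).
    static Callable<Boolean> jdbcProbe(String hostPort) {
        return () -> {
            String url = "jdbc:drill:drillbit=" + hostPort;
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM sys.version")) {
                return rs.next();
            }
        };
    }

    public static void main(String[] args) {
        // Each argument is one drillbit's host:port; 30 s is an arbitrary example timeout.
        for (String hostPort : args) {
            System.out.println(hostPort + " " + check(jdbcProbe(hostPort), 30));
        }
    }
}
```

A typical invocation might be java -cp drill-jdbc-all-<version>.jar:. DrillbitHealthCheck node1:31010 node2:31010 (hypothetical host names). Any drillbit reported as HUNG is the one to investigate with jstack and its drillbit.log. The probe runs on a daemon thread because a JDBC call blocked on a hung drillbit cannot be reliably cancelled. A daemon thread at least does not keep the JVM from exiting once the other drillbits have been checked.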

Wednesday, May 4, 2016

Drill Direct Scan optimization for count(*) queries on Parquet files

Goal:

When running count(*) queries on Parquet files, the physical plan may show something like the following:
Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@331f5e71...
This article explains what "org.apache.drill.exec.store.pojo.PojoRecordReader" means.

Wednesday, April 27, 2016

MapR Stream Workshop 4: Cold Backup and Restore

Theory:

mapr exportstream and mapr importstream are used together: the first exports data from MapR streams into binary sequence files, and the second imports the data from those sequence files into other MapR streams.
Together, the two tools can therefore be used for cold backup and restore.

Afterwards, mapr diffstreams can be used to check the differences between the two streams.
mapr formatresult can then parse the sequence file generated by mapr diffstreams.
