Friday, July 29, 2016

Partition pruning is not happening for query with many in-lists(20+)


Partition pruning is not happening for query with more than 20 in-lists.
As a result, this query may fail with OutOfMemory.
For example:
SELECT something FROM dfs.`sometable` WHERE dir0 IN ('a', 'b','c',... ,'z' );
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.


Drill 1.6r1
Drill 1.8(Not GAed yet as this blog was written)

Root Cause:

Once the # of in-lists is larger than 20(including), this optimization will kick in to convert it to a join:
As a result, the partition pruning can not happen.
This means the query has to scan the full table including all partitions or all sub-directories.
If this table is too large or has too many partitions, this query may use up all the memory allocated to this drillbit.


Per DRILL-3710, a new parameter is introduced starting from latest Drill 1.6r1 release:
If the # of in-lists size is below this parameter, Drill will not convert those in-lists to a join. As a result, partition pruning will happen.
For example:
alter session set `planner.in_subquery_threshold` = 500;

Note: if the total number of partitions or the in-lists are large, the planner may need more memory than planner.memory_limit(256MB by default).
So if the query runs out of the planner memory, the partition pruning may not happen either.
The evidence for this issue is in the foreman drillbit.log. The details are described in below blog:
If you triggered this issue, please also increase planner.memory_limit to large enough.
For example, set it to 512MB:
alter session set `planner.memory_limit`=536870912;

No comments:

Post a Comment

Popular Posts