Sunday, May 17, 2015

Drill Workshop -- Control Query Parallelization

Env:

Drill 1.0

Theory:

  • planner.slice_target: Minimum number of estimated records to be processed within one fragment before additional parallelization is applied.
    Default: 100000
  • planner.width.max_per_node: Maximum number of threads for any major fragment of a query per node.
    Default: number of active drillbits (typically one per node) * number of cores per node * 0.7
  • planner.width.max_per_query: Maximum number of threads for any major fragment of a query.
    Default: 1000
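
All three options can be changed without restarting Drill, either for the current session or cluster-wide. A minimal sketch (the values are examples only, not recommendations):

alter session set `planner.slice_target` = 300000;    -- affects only the current connection
alter system set `planner.width.max_per_node` = 4;    -- affects all drillbits and persists across sessions

The steps below use ALTER SESSION so the experiments do not disturb other users.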

Goal:

Know how to adjust the parallelization of a given query.

Workshop:

This workshop uses the SQL below as the example:
select a.* from dfs.root.`user/hive/warehouse/passwords_csv` a, dfs.root.`user/hive/warehouse/passwords_csv_2` b
where a.columns[1]=b.columns[1] order by a.columns[3] desc limit 5;
Its SQL plan is in Drill Workshop -- Understanding SQL plan.
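
The physical-plan snippet quoted in step 2 comes from this query's plan; a similar plan with cost estimates can be printed with EXPLAIN (a sketch, reusing the workshop SQL):

explain plan including all attributes for
select a.* from dfs.root.`user/hive/warehouse/passwords_csv` a, dfs.root.`user/hive/warehouse/passwords_csv_2` b
where a.columns[1]=b.columns[1] order by a.columns[3] desc limit 5;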

1. Check the current values of the options.

> select * from sys.options where name in ('planner.slice_target','planner.width.max_per_node','planner.width.max_per_query');
+------------------------------+-------+---------+----------+----------+-------------+-----------+------------+
|             name             | kind  |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
+------------------------------+-------+---------+----------+----------+-------------+-----------+------------+
| planner.slice_target         | LONG  | SYSTEM  | DEFAULT  | 100000   | null        | null      | null       |
| planner.width.max_per_node   | LONG  | SYSTEM  | DEFAULT  | 6        | null        | null      | null       |
| planner.width.max_per_query  | LONG  | SYSTEM  | DEFAULT  | 1000     | null        | null      | null       |
+------------------------------+-------+---------+----------+----------+-------------+-----------+------------+

2. planner.slice_target

For example, the physical plan below shows the statistics for Major Fragment 03 after it receives all the rows from its upstream fragment:
03-08 HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY ITEM, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1537784.0, cumulative cost = {7688920.0 rows, 3.9982384E7 cpu, 0.0 io, 1.2597526528E10 network, 0.0 memory}, id = 5122
The estimated row count at this step is 1537784.0. Since the default planner.slice_target is 100000, the number of slices (or Minor Fragments) of this Major Fragment is:
ceil(1537784.0/100000) = 16.
Now let's increase planner.slice_target to 300000:
alter session set `planner.slice_target`= 300000;
After that, the number of Minor Fragments of Major Fragment 03 becomes:
ceil(1537784.0/300000) = 6.
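
In other words, the Minor Fragment count for this exchange is simply the ceiling of the estimated row count divided by planner.slice_target (subject to the width limits covered in the next two steps). As a quick check, the same arithmetic can be run in Drill itself (a sketch; sys.version is used only to supply a one-row FROM clause):

select ceil(1537784.0/100000) as slices_at_100k, ceil(1537784.0/300000) as slices_at_300k from sys.version;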

3. planner.width.max_per_node

Assume planner.slice_target is the default 100000, planner.width.max_per_node is the default 6, and this is a 3-node cluster. Currently Major Fragment 03 has 16 Minor Fragments for this query.
As we can see, each of the 3 nodes runs about 5~6 Minor Fragments.
Now let's decrease planner.width.max_per_node from 6 to 3, and re-run the SQL.
alter session set `planner.width.max_per_node`= 3;
After that, since each node can only have 3 Minor Fragments for any Major Fragment, Major Fragment 03 can only have 9 Minor Fragments (3 nodes * 3 per node).
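
The session-level override can be double-checked with the same sys.options query from step 1; in the current session the row for this option should now show num_val 3 (a sketch):

select name, kind, type, status, num_val from sys.options where name = 'planner.width.max_per_node';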

4. planner.width.max_per_query

Assume Major Fragment 03 starts with 16 Minor Fragments. Now let's reduce planner.width.max_per_query from 1000 to 15:
alter session set `planner.width.max_per_query`= 15;
Then Major Fragment 03 can only have 15 Minor Fragments in total across the cluster.
Note: If a query is very complex and has many Major and Minor Fragments, it may use up resources such as CPU on all nodes. You may want to use the above 3 options to reduce its parallelization.
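
When you are done experimenting, the three options can simply be set back to the defaults captured in step 1 (a sketch; the planner.width.max_per_node default of 6 is specific to this cluster's hardware):

alter session set `planner.slice_target` = 100000;
alter session set `planner.width.max_per_node` = 6;
alter session set `planner.width.max_per_query` = 1000;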

Reference:

Youtube: Drill Configuration Options (33:08) 
Planning and Execution Options
Planning and Execution Options(on Drill Wiki)
