Drill Workshop -- Spill

Env:

Drill 1.0

Theory:

Hash aggregation and hash join are hash-based operations. Streaming aggregation and merge join are sort-based operations. Both hash-based and sort-based operations consume memory; however, currently, hash-based operations do not spill to disk as needed, but the sort-based operations do.

By default, spill happens in local directory /tmp/drill/spill on each node. (Set by drill.exec.sort.external.spill.directories boot option)

> select * from sys.boot where name in ('drill.exec.sort.external.spill.fs','drill.exec.sort.external.spill.directories');
+------+------+------+--------+---------+------------+----------+-----------+
| name | kind | type | status | num_val | string_val | bool_val | float_val |
+------+------+------+--------+---------+------------+----------+-----------+
| drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
    # jar:file:/opt/mapr/drill/drill-1.0.0/jars/drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar!/drill-module.conf: 145
    "/tmp/drill/spill"
] | null | null |
| drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
+------+------+------+--------+---------+------------+----------+-----------+
2 rows selected (0.112 seconds)

Goal:

Test spill behavior of sort-based and hash-based operations.
Know how to change the spill directory.

Workshop:

1. Sort-based Operation test

Parameter "planner.memory.max_query_memory_per_node" sets the maximum estimate of memory for a query per node. If the estimate is too low, Drill re-plans the query without memory-constrained operators. By default, it is set to 2G per node. To make spilling happens easily, we will decrease it to 20MB in below case.
Create a 400MB CSV hive table named "passwords_csv_middle", and do sorting:

alter session set `planner.memory.max_query_memory_per_node`=20000000;
select count(*) from
(
select columns[5] from dfs.root.`user/hive/warehouse/passwords_csv_middle` order by columns[0], columns[1],columns[2]
);

By monitoring the spill directory, we can find out that the spill directory architecture is:
/tmp/drill/spill/<SQL ID>/<major_fragment_ID>/<minor_fragment_ID>/<operator_ID>/<Spill file and CRC file>
For example:

[root@h1 spill]# ls -altr 2aa86ed9-68e6-3345-ef5d-1a0a9bd06768/major_fragment_*/minor_fragment_*/operator*
2aa86ed9-68e6-3345-ef5d-1a0a9bd06768/major_fragment_2/minor_fragment_5/operator_2:
total 50052
drwxr-xr-x 3 mapr mapr     4096 May 16 18:49 ..
-rw-r--r-- 1 mapr mapr   151248 May 16 18:49 .0.crc
-rw-r--r-- 1 mapr mapr 19358710 May 16 18:49 0
drwxr-xr-x 2 mapr mapr     4096 May 16 18:49 .
-rw-r--r-- 1 mapr mapr   245444 May 16 18:49 .1.crc
-rw-r--r-- 1 mapr mapr 31415694 May 16 18:49 1

drillbit.log will show spilling for that SQL ID:

[root@h1 logs]# grep -i '2aa86ed9-68e6-3345-ef5d-1a0a9bd06768' drillbit.log  |grep -i spill |more
2015-05-16 18:49:16,377 [2aa86ed9-68e6-3345-ef5d-1a0a9bd06768:frag:2:1] INFO  o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to /tmp/drill/spill/2aa86ed9-68e6-3345-ef5d-1a0a9bd06768/major_fr
agment_2/minor_fragment_1/operator_2/0

2. Hash-based Operation test

Let's reduce DRILL_MAX_DIRECT_MEMORY from default 8G to 2G and run a huge hash join.

select count(*) from
(
select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, dfs.root.`user/hive/warehouse/passwords_csv_big` b
where a.columns[1]=b.columns[1]
);

The visualized plan contains "HashJoin":

Drill will not spill and it will fail with below error in drillbit.log:

Caused by: java.lang.OutOfMemoryError: Direct buffer memory

If so, we can disable memory hungry hash join by setting "planner.enable_hashjoin" to false:

alter session set `planner.enable_hashjoin`=false;

The query will probably become sort-based merge join:

3. How to change spill directory.

For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling purposes.
3.1 Create a "spill" directory in "mapred" local volume for each node.

[root@h1 ~]# hadoop fs -ls /var/mapr/local/h1.poc.com/mapred
Found 2 items
drwxr-xr-x   - mapr mapr          0 2015-05-16 19:57 /var/mapr/local/h1.poc.com/mapred/drill
drwxr-xr-x   - mapr mapr          5 2015-05-05 22:41 /var/mapr/local/h1.poc.com/mapred/nodeManager

3.2 Modify drill-override.conf.

$ cat drill-override.conf
drill.exec: {
  cluster-id: "MyCluster-drillbits",
  zk.connect: "h2.poc.com:5181,h3.poc.com:5181,h4.poc.com:5181",
  sort.external.spill.directories: ["/var/mapr/local/h1.poc.com/mapred/drill"],
  sort.external.spill.fs: "maprfs:///"
}

Note: make sure drill.exec.sort.external.spill.directories is a List type instead of a String. Otherwise below error will show up when spilling happens:

Error: SYSTEM ERROR: com.typesafe.config.ConfigException$WrongType: drill-override.conf: 27: 
drill.exec.sort.external.spill.directories has type STRING rather than LIST

3.3 Run above sort-based operations to verify spilling happens in correct directory.

Reference:

Drill Configuration Options Introduction
Drill Planning and Execution Options
Drill Start-up Options

Saturday, May 16, 2015