Tuesday, April 26, 2016

How to use a MapR local volume as the spill directory for Apache Drill

Goal:

How to use a MapR local volume as the spill directory for Apache Drill.

Env:

Apache Drill on MapR

Solution:

By default, each drillbit uses "/tmp" on its local disk as the spill directory.
However, if the local disk is not large enough for a huge query that needs a lot of spill space, another option is to use a MapR local volume as the spill directory.
Of course, the disks backing the MapR local volume should be large enough.

1. Create a MapR local volume for each node.

A MapR local volume is just a volume whose topology restricts it to reside only on its own node.
One example is the local volume that MapR creates on each node for MapReduce jobs.
Here I have 3 nodes whose "hostname -f" outputs are shown below:
v1.poc.com
v2.poc.com
v3.poc.com
Assume that on each node I have created a local volume at the path below:
/tmp/v1.poc.com
/tmp/v2.poc.com
/tmp/v3.poc.com
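
The volume creation itself is not shown here, so the following is only a minimal sketch of what it could look like. It prints one "maprcli volume create" command per node rather than running it. The volume name prefix "drill.spill." and the helper name local_volume_cmd are hypothetical, and the sketch assumes that /tmp already exists in MapR-FS and that a replication factor of 1 is acceptable for spill data. Check the flags against the maprcli documentation for your MapR version before running the printed commands.

```shell
# Dry-run sketch: print the volume-create command for each node instead of running it.
# The volume name prefix "drill.spill." is a hypothetical choice, not from the original post.
local_volume_cmd() {
  host="$1"
  # -localvolumehost pins the volume to the given node; -path must match the
  # spill path that Drill will later build from the node's "hostname -f" output.
  printf 'maprcli volume create -name drill.spill.%s -path /tmp/%s -replication 1 -localvolumehost %s\n' \
    "$host" "$host" "$host"
}

for host in v1.poc.com v2.poc.com v3.poc.com; do
  local_volume_cmd "$host"
done
```

Review the printed commands, then run them once as a user with permission to create volumes.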

2. Add an environment variable to drill-env.sh on all nodes.

export DRILL_LOCALHOST=`hostname -f`

3. Add the spill directory configuration to drill-override.conf on all nodes.

A sample:
drill.exec: {
  cluster-id: "my_cluster_com-drillbits",
  zk.connect: "v1.poc.com:5181,v2.poc.com:5181,v3.poc.com:5181",
  sort.external.spill.directories: ["/tmp/"${DRILL_LOCALHOST}],
  sort.external.spill.fs: "maprfs:///"
}
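
The spill path is built by appending the value of DRILL_LOCALHOST to "/tmp/", so it has to match the local volume path on that node exactly. As a rough sanity check before restarting, the sketch below rebuilds that path for each node and prints the "hadoop fs -ls" command that would confirm the directory exists in MapR-FS. The helper name spill_dir is hypothetical, and the sketch assumes the hadoop client is on the PATH of the node where you run the printed commands.

```shell
# Hypothetical helper: build the spill path the same way the sample
# drill-override.conf does, i.e. "/tmp/" followed by the node's FQDN.
spill_dir() {
  printf '/tmp/%s\n' "$1"
}

# Print (do not run) the listing command for each node's spill directory.
for node in v1.poc.com v2.poc.com v3.poc.com; do
  echo "hadoop fs -ls maprfs://$(spill_dir "$node")"
done
```

If a listing fails for a node, fix the volume path or the hostname mismatch first, since that drillbit would otherwise point at a spill directory that does not exist.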

4. Restart all drillbits

maprcli node services -name drill-bits -action restart -filter csvc=="drill-bits"

5. Check the configurations

> select * from sys.drillbits where `current`=true;
+-------------+------------+---------------+------------+----------+
|  hostname   | user_port  | control_port  | data_port  | current  |
+-------------+------------+---------------+------------+----------+
| v1.poc.com  | 31010      | 31011         | 31012      | true     |
+-------------+------------+---------------+------------+----------+
1 row selected (1.132 seconds)
>  select name,string_val from sys.boot where name in ('drill.exec.sort.external.spill.fs','drill.exec.sort.external.spill.directories');
+---------------------------------------------+-------------------------------------------------------------------------------------------+
|                    name                     |                                        string_val                                         |
+---------------------------------------------+-------------------------------------------------------------------------------------------+
| drill.exec.sort.external.spill.directories  | [
    # merge of drill-override.conf: 27,env var DRILL_LOCALHOST
    "/tmp/v1.poc.com"
]  |
| drill.exec.sort.external.spill.fs           | "maprfs:///"                                                                              |
+---------------------------------------------+-------------------------------------------------------------------------------------------+
2 rows selected (0.991 seconds)

