Env:
Drill 1.0Theory:
Hash aggregation and hash join are hash-based operations. Streaming aggregation and merge join are sort-based operations. Both hash-based and sort-based operations consume memory; however, currently, hash-based operations do not spill to disk as needed, but the sort-based operations do.By default, spill happens in local directory /tmp/drill/spill on each node. (Set by drill.exec.sort.external.spill.directories boot option)
> select * from sys.boot where name in ('drill.exec.sort.external.spill.fs','drill.exec.sort.external.spill.directories');
+------+------+------+--------+---------+------------+----------+-----------+
| name | kind | type | status | num_val | string_val | bool_val | float_val |
+------+------+------+--------+---------+------------+----------+-----------+
| drill.exec.sort.external.spill.directories | STRING | BOOT | BOOT | null | [
# jar:file:/opt/mapr/drill/drill-1.0.0/jars/drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar!/drill-module.conf: 145
"/tmp/drill/spill"
] | null | null |
| drill.exec.sort.external.spill.fs | STRING | BOOT | BOOT | null | "file:///" | null | null |
+------+------+------+--------+---------+------------+----------+-----------+
2 rows selected (0.112 seconds)
Goal:
Test spill behavior of sort-based and hash-based operations.Know how to change the spill directory.
Workshop:
1. Sort-based Operation test
Parameter "planner.memory.max_query_memory_per_node" sets the maximum estimate of memory for a query per node. If the estimate is too low, Drill re-plans the query without memory-constrained operators. By default, it is set to 2G per node. To make spilling happens easily, we will decrease it to 20MB in below case.Create a 400MB CSV hive table named "passwords_csv_middle", and do sorting:
alter session set `planner.memory.max_query_memory_per_node`=20000000; select count(*) from ( select columns[5] from dfs.root.`user/hive/warehouse/passwords_csv_middle` order by columns[0], columns[1],columns[2] );By monitoring the spill directory, we can find out that the spill directory architecture is:
/tmp/drill/spill/<SQL ID>/<major_fragment_ID>/<minor_fragment_ID>/<operator_ID>/<Spill file and CRC file>
For example:
[root@h1 spill]# ls -altr 2aa86ed9-68e6-3345-ef5d-1a0a9bd06768/major_fragment_*/minor_fragment_*/operator* 2aa86ed9-68e6-3345-ef5d-1a0a9bd06768/major_fragment_2/minor_fragment_5/operator_2: total 50052 drwxr-xr-x 3 mapr mapr 4096 May 16 18:49 .. -rw-r--r-- 1 mapr mapr 151248 May 16 18:49 .0.crc -rw-r--r-- 1 mapr mapr 19358710 May 16 18:49 0 drwxr-xr-x 2 mapr mapr 4096 May 16 18:49 . -rw-r--r-- 1 mapr mapr 245444 May 16 18:49 .1.crc -rw-r--r-- 1 mapr mapr 31415694 May 16 18:49 1drillbit.log will show spilling for that SQL ID:
[root@h1 logs]# grep -i '2aa86ed9-68e6-3345-ef5d-1a0a9bd06768' drillbit.log |grep -i spill |more 2015-05-16 18:49:16,377 [2aa86ed9-68e6-3345-ef5d-1a0a9bd06768:frag:2:1] INFO o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to /tmp/drill/spill/2aa86ed9-68e6-3345-ef5d-1a0a9bd06768/major_fr agment_2/minor_fragment_1/operator_2/0
2. Hash-based Operation test
Let's reduce DRILL_MAX_DIRECT_MEMORY from default 8G to 2G and run a huge hash join.select count(*) from ( select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a, dfs.root.`user/hive/warehouse/passwords_csv_big` b where a.columns[1]=b.columns[1] );The visualized plan contains "HashJoin":
Drill will not spill and it will fail with below error in drillbit.log:
Caused by: java.lang.OutOfMemoryError: Direct buffer memoryIf so, we can disable memory hungry hash join by setting "planner.enable_hashjoin" to false:
alter session set `planner.enable_hashjoin`=false;The query will probably become sort-based merge join:
3. How to change spill directory.
For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling purposes.3.1 Create a "spill" directory in "mapred" local volume for each node.
[root@h1 ~]# hadoop fs -ls /var/mapr/local/h1.poc.com/mapred Found 2 items drwxr-xr-x - mapr mapr 0 2015-05-16 19:57 /var/mapr/local/h1.poc.com/mapred/drill drwxr-xr-x - mapr mapr 5 2015-05-05 22:41 /var/mapr/local/h1.poc.com/mapred/nodeManager3.2 Modify drill-override.conf.
$ cat drill-override.conf
drill.exec: {
cluster-id: "MyCluster-drillbits",
zk.connect: "h2.poc.com:5181,h3.poc.com:5181,h4.poc.com:5181",
sort.external.spill.directories: ["/var/mapr/local/h1.poc.com/mapred/drill"],
sort.external.spill.fs: "maprfs:///"
}
Note: make sure drill.exec.sort.external.spill.directories is a List type instead of a String. Otherwise below error will show up when spilling happens:Error: SYSTEM ERROR: com.typesafe.config.ConfigException$WrongType: drill-override.conf: 27: drill.exec.sort.external.spill.directories has type STRING rather than LIST3.3 Run above sort-based operations to verify spilling happens in correct directory.
Reference:
Drill Configuration Options IntroductionDrill Planning and Execution Options
Drill Start-up Options


9A3D75AD26
ReplyDeletekiralık hacker
hacker arıyorum
belek
kadriye
serik
DB4140A395
ReplyDeleteTakipçi Satın Al
Beğeni Hilesi
Film İzleme Siteleri
EF9BC1F1
ReplyDeletebatum esçort
kumluca esçort
esçort bayan kastamonu
İlkadım esçort
selçuk esçort
eskişehir esçort numaraları
osmangazi esçort
bodrum esçort
aksaray esçort