Thursday, November 6, 2014

Drill commands cheat sheet

| Command Part

1. Restart drill
drillbit.sh restart
maprcli node services -name drill-bits -action restart -nodes <node IP addresses separated by a space> 
Or
maprcli node services -name drill-bits -action restart -filter csvc=="drill-bits"
2. Sqlline
sqlline -u jdbc:drill:zk=<zk1host>:<port>,<zk2host>:<port>,<zk3host>:<port>
3. Check SQL IDs in jstack
$ clush -a "jstack \`jps -m|grep -i drill|cut -d' ' -f1\`|grep tid|grep frag|cut -d ':' -f1|sort|uniq -c"|sort
node1:       3 "2939d224-6b1f-3fda-5a42-4b741e5283a6
node2:       3 "2939d224-6b1f-3fda-5a42-4b741e5283a6
node3:       4 "2939d224-6b1f-3fda-5a42-4b741e5283a6
...
4. Check memory consumption of drillbit
$ clush -aB "ps auwwxx|grep -i drill|grep -v grep|grep -v internal_start|awk '{print \$6}'"
---------------
node1
---------------
25761828
---------------
node2
---------------
25417036
---------------
5. Take jstack on all drillbits and collect them
clush -a "jstack \`jps -m|grep -i drill|cut -d' ' -f1\` > /home/mapr/hao/jstack_drillbit.out"
mkdir /home/mapr/hao/jstackall
clush -a --rcopy /home/mapr/hao/jstack_drillbit.out --dest /home/mapr/hao/jstackall

| SQL Part

1. Verify drillbits
select * from sys.drillbits;
2. Schema information
select * from INFORMATION_SCHEMA.SCHEMATA;
3. Explain plan
!set maxwidth 10000
explain plan for <query> ;
explain plan without implementation for <query> ;
explain plan including all attributes for <query>;
4. System options
SELECT * FROM sys.options WHERE type='BOOT';
5. Record output to a file
!record /tmp/abc.txt
!record
6. Enable verbose error mode at system/session level
alter system set `exec.errors.verbose`=true;
7. Store as json file for CTAS.
alter session set `store.format`='json';
8. Enable compiler debug.
ALTER SESSION SET `exec.java_compiler_debug` = true;
9. Decrease store.parquet.vector_fill_check_threshold in case of too large parquet row.
alter session set `store.parquet.vector_fill_check_threshold` = 2; 
10. Get drill commit ID.
select commit_id from `sys`.`version`;
11. UTF16 character set in query text.
select columns[2] from `chinese.csv` where columns[0]= _UTF16'北京';
12. Enable decimal support.
alter session set `planner.enable_decimal_data_type`=true;
13. Disable hash based on a single key. 
alter session set `planner.enable_hash_single_key` = false ;
14. Increase plan memory component.
alter session set `planner.memory_limit` = 1073741824;

| Function Part

1.  Current unix time & Convert unix time to timestamp.
Current unix time:
unix_timestamp()
Convert unix time to timestamp:
select from_unixtime(unix_timestamp()) from `sys`.`version`;
+----------------------+
|        EXPR$0        |
+----------------------+
| 2015-05-29 19:39:43  |
+----------------------+
1 row selected (0.082 seconds)
Or:
select TO_TIMESTAMP(unix_timestamp()) from `sys`.`version`;
+------------------------+
|         EXPR$0         |
+------------------------+
| 2015-05-29 20:30:41.0  |
+------------------------+
1 row selected (0.081 seconds)
2. Flatten -- to generate each row for each element of one array.
$ cat flatten1.json
{"fixed_column":"abc", "list_column":[{"id1":"1","name":"zhu"},{"id1":"2","name":"hao"}]}

0: jdbc:drill:zk=local>  select t.`fixed_column` as fixed_column, flatten(t.`list_column`)  from dfs.tmp.`drilltest/flatten1.json` as t;
+--------------+------------+
| fixed_column |   EXPR$1   |
+--------------+------------+
| abc          | {"id1":"1","name":"zhu"} |
| abc          | {"id1":"2","name":"hao"} |
+--------------+------------+
2 rows selected (0.165 seconds)
3. kvgen -- to convert unions/maps to lists of key-value pairs.
test.json:
{"data":
   {"id":1,"id2":2,"id3":3}
}

select kvgen(tab.data) from `test.json` tab limit 1;
+------------+
|   EXPR$0   |
+------------+
| [{"key":"id","value":1},{"key":"id2","value":2},{"key":"id3","value":3}] |
+------------+
1 row selected (0.153 seconds)

with tmp as (
select kvgen(tab.data)[0] as newdata  from `test.json` tab limit 1
)
select tmp.newdata.`key`,tmp.newdata.`value` from tmp;

+------------+------------+
|   EXPR$0   |   EXPR$1   |
+------------+------------+
| id         | 1          |
+------------+------------+
1 row selected (0.09 seconds)
4. repeated_count and repeated_contains -- for array.
a.json:

{"data": ["apple","orange","abc"]}

select repeated_count(data) from  dfs.tmp.`/drilltest/a.json`;  
+------------+
|   EXPR$0   |
+------------+
| 3          |
+------------+

select repeated_contains(data,'apple') from  dfs.tmp.`/drilltest/a.json`;  
+------------+
|   EXPR$0   |
+------------+
| true       |
+------------+

select repeated_contains(data,'nothing') from  dfs.tmp.`/drilltest/a.json`;  
+------------+
|   EXPR$0   |
+------------+
| false      |
+------------+
5. byte_substr
In hbase shell, create a hbase or maprdb table:
create '/testtable','cf'
put '/testtable','ABCD_9223370655563575807','cf:c','abc'
Then we can get the substring of rowkey in Drill:
select convert_from(byte_substr(row_key,1,4),'UTF8') from dfs.`/testtable`;
+------------+
|   EXPR$0   |
+------------+
| ABCD       |
+------------+
1 row selected (0.293 seconds)
 select convert_from(byte_substr(row_key,6,length(row_key)),'UTF8') from dfs.`/testtable`;
+------------+
|   EXPR$0   |
+------------+
| 9223370655563575807 |
+------------+
1 row selected (0.263 seconds)
6. left and right
> select `left`('blahblah',2) from sys.options limit 1;
+------------+
|   EXPR$0   |
+------------+
| bl         |
+------------+
1 row selected (0.248 seconds)
> select `right`('blahblah',2) from sys.options limit 1;
+------------+
|   EXPR$0   |
+------------+
| ah         |
+------------+
1 row selected (0.201 seconds)

| Configuration Part

1. Memory Leak debug 
drill-override.conf:
debug.error_on_leak: true
2. PStore location
> select * from sys.options where name like  '%store.provider%';
+------------+------------+------------+------------+------------+------------+------------+
|    name    |    kind    |    type    |  num_val   | string_val |  bool_val  | float_val  |
+------------+------------+------------+------------+------------+------------+------------+
| drill.exec.sys.store.provider.hbase.config."hbase.zookeeper.property.clientPort" | LONG       | BOOT       | 2181       | null       | null       | null       |
| drill.exec.sys.store.provider.mongo.connection | null       | BOOT       | null       | "mongodb://localhost:27017/?connectTimeoutMS=60000&maxPoolSize=1000&safe=true" | null       | null       |
| drill.exec.sys.store.provider.hbase.config."hbase.zookeeper.quorum" | null       | BOOT       | null       | "localhost" | null       | null       |
| drill.exec.sys.store.provider.local.write | BOOLEAN    | BOOT       | null       | null       | true       | null       |
| drill.exec.sys.store.provider.hbase.table | null       | BOOT       | null       | "drill_store" | null       | null       |
| drill.exec.sys.store.provider.local.path | null       | BOOT       | null       | "/tmp/drill" | null       | null       |
| drill.exec.sys.store.provider.class | null       | BOOT       | null       | "org.apache.drill.exec.store.sys.zk.ZkPStoreProvider" | null       | null       |
+------------+------------+------------+------------+------------+------------+------------+
Note: Starting from Drill 1.0, boot options are moved to sys.boot.
3. Sort and spilling.
> select * from sys.options where name like  '%exec.sort%';
+------------+------------+------------+------------+------------+------------+------------+
|    name    |    kind    |    type    |  num_val   | string_val |  bool_val  | float_val  |
+------------+------------+------------+------------+------------+------------+------------+
| drill.exec.sort.external.spill.fs | null       | BOOT       | null       | "file:///" | null       | null       |
| drill.exec.sort.purge.threshold | LONG       | BOOT       | 100        | null       | null       | null       |
| drill.exec.sort.external.spill.threshold | LONG       | BOOT       | 200        | null       | null       | null       |
| drill.exec.sort.external.spill.batch.size | LONG       | BOOT       | 4000       | null       | null       | null       |
| drill.exec.sort.external.spill.group.size | LONG       | BOOT       | 100        | null       | null       | null       |
| drill.exec.sort.external.spill.directories | null       | BOOT       | null       | [
    # jar:file:/opt/mapr/drill/drill-0.6.0/jars/drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar!/drill-module.conf: 134
    "/tmp/drill/spill"
] | null       | null       |
| drill.exec.sort.external.batch.size | LONG       | BOOT       | 4000       | null       | null       | null       |
+------------+------------+------------+------------+------------+------------+------------+
Note: Starting from Drill 1.0, boot options are moved to sys.boot. 
4. Planner
> select * from sys.options where name like  '%planner%';
+------------+------------+------------+------------+------------+------------+------------+
|    name    |    kind    |    type    |  num_val   | string_val |  bool_val  | float_val  |
+------------+------------+------------+------------+------------+------------+------------+
| planner.memory.max_query_memory_per_node | LONG       | SYSTEM     | 2048       | null       | null       | null       |
| planner.join.row_count_estimate_factor | DOUBLE     | SYSTEM     | null       | null       | null       | 1.0        |
| planner.affinity_factor | DOUBLE     | SYSTEM     | null       | null       | null       | 1.2        |
| planner.disable_exchanges | BOOLEAN    | SYSTEM     | null       | null       | false      | null       |
| planner.enable_mergejoin | BOOLEAN    | SYSTEM     | null       | null       | true       | null       |
| planner.enable_broadcast_join | BOOLEAN    | SYSTEM     | null       | null       | true       | null       |
| planner.enable_hashagg | BOOLEAN    | SYSTEM     | null       | null       | true       | null       |
| planner.memory.average_field_width | LONG       | SYSTEM     | 8          | null       | null       | null       |
| planner.broadcast_threshold | LONG       | SYSTEM     | 1000000    | null       | null       | null       |
| planner.add_producer_consumer | BOOLEAN    | SYSTEM     | null       | null       | false      | null       |
| planner.producer_consumer_queue_size | LONG       | SYSTEM     | 10         | null       | null       | null       |
| planner.enable_hash_single_key | BOOLEAN    | SYSTEM     | null       | null       | true       | null       |
| planner.memory.hash_agg_table_factor | DOUBLE     | SYSTEM     | null       | null       | null       | 1.1        |
| planner.slice_target | LONG       | SYSTEM     | 100000     | null       | null       | null       |
| planner.width.max_per_node | LONG       | SYSTEM     | 2          | null       | null       | null       |
| planner.memory.enable_memory_estimation | BOOLEAN    | SYSTEM     | null       | null       | false      | null       |
| planner.enable_multiphase_agg | BOOLEAN    | SYSTEM     | null       | null       | true       | null       |
| planner.memory.non_blocking_operators_memory | LONG       | SYSTEM     | 64         | null       | null       | null       |
| planner.enable_streamagg | BOOLEAN    | SYSTEM     | null       | null       | true       | null       |
| planner.memory.hash_join_table_factor | DOUBLE     | SYSTEM     | null       | null       | null       | 1.1        |
| planner.width.max_per_query | LONG       | SYSTEM     | 1000       | null       | null       | null       |
| planner.enable_hashjoin | BOOLEAN    | SYSTEM     | null       | null       | true       | null       |
| planner.producer_consumer_queue_size | LONG       | SESSION    | 100000000000000000 | null       | null       | null       |
| planner.add_producer_consumer | BOOLEAN    | SESSION    | null       | null       | false      | null       |
+------------+------------+------------+------------+------------+------------+------------+
24 rows selected (0.219 seconds)
5. Query scheduling
> select * from sys.options where name like  '%exec.queue%';
+------------+------------+------------+------------+------------+------------+------------+
|    name    |    kind    |    type    |  num_val   | string_val |  bool_val  | float_val  |
+------------+------------+------------+------------+------------+------------+------------+
| exec.queue.timeout_millis | LONG       | SYSTEM     | 300000     | null       | null       | null       |
| exec.queue.small | LONG       | SYSTEM     | 100        | null       | null       | null       |
| exec.queue.threshold | LONG       | SYSTEM     | 30000000   | null       | null       | null       |
| exec.queue.large | LONG       | SYSTEM     | 10         | null       | null       | null       |
| exec.queue.enable | BOOLEAN    | SYSTEM     | null       | null       | false      | null       |
+------------+------------+------------+------------+------------+------------+------------+

==

No comments:

Post a Comment

Popular Posts