Impala SQL tunning tips

This article shows some tips for SQL tunning in Cloudera Impala.

1. Join order is listed in FROM clause.

Compare below 2 plans:

Query: explain select count(*) from passwords a,passwords2 b,passwords3 c where a.col0=b.col0 and b.col0=c.col0
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=4.06GB VCores=2                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.passwords, default.passwords2, default.passwords3                          |
|                                                                                    |
| 09:AGGREGATE [MERGE FINALIZE]                                                      |
| |  output: sum(count(*))                                                           |
| |                                                                                  |
| 08:EXCHANGE [PARTITION=UNPARTITIONED]                                              |
| |                                                                                  |
| 05:AGGREGATE                                                                       |
| |  output: count(*)                                                                |
| |                                                                                  |
| 04:HASH JOIN [INNER JOIN, BROADCAST]                                               |
| |  hash predicates: b.col0 = c.col0                                                |
| |                                                                                  |
| |--07:EXCHANGE [BROADCAST]                                                         |
| |  |                                                                               |
| |  02:SCAN HDFS [default.passwords3 c]                                             |
| |     partitions=1/1 size=10.42KB                                                  |
| |                                                                                  |
| 03:HASH JOIN [INNER JOIN, BROADCAST]                                               |
| |  hash predicates: a.col0 = b.col0                                                |
| |                                                                                  |
| |--06:EXCHANGE [BROADCAST]                                                         |
| |  |                                                                               |
| |  01:SCAN HDFS [default.passwords2 b]                                             |
| |     partitions=1/1 size=2.60KB                                                   |
| |                                                                                  |
| 00:SCAN HDFS [default.passwords a]                                                 |
|    partitions=1/1 size=10.42KB                                                     |
+------------------------------------------------------------------------------------+

Query: explain select count(*) from passwords2 b,passwords a,passwords3 c where a.col0=b.col0 and b.col0=c.col0
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=2.06GB VCores=2                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.passwords, default.passwords2, default.passwords3                          |
|                                                                                    |
| 09:AGGREGATE [MERGE FINALIZE]                                                      |
| |  output: sum(count(*))                                                           |
| |                                                                                  |
| 08:EXCHANGE [PARTITION=UNPARTITIONED]                                              |
| |                                                                                  |
| 05:AGGREGATE                                                                       |
| |  output: count(*)                                                                |
| |                                                                                  |
| 04:HASH JOIN [INNER JOIN, BROADCAST]                                               |
| |  hash predicates: b.col0 = c.col0                                                |
| |                                                                                  |
| |--07:EXCHANGE [BROADCAST]                                                         |
| |  |                                                                               |
| |  02:SCAN HDFS [default.passwords3 c]                                             |
| |     partitions=1/1 size=10.42KB                                                  |
| |                                                                                  |
| 03:HASH JOIN [INNER JOIN, BROADCAST]                                               |
| |  hash predicates: b.col0 = a.col0                                                |
| |                                                                                  |
| |--06:EXCHANGE [BROADCAST]                                                         |
| |  |                                                                               |
| |  01:SCAN HDFS [default.passwords a]                                              |
| |     partitions=1/1 size=10.42KB                                                  |
| |                                                                                  |
| 00:SCAN HDFS [default.passwords2 b]                                                |
|    partitions=1/1 size=2.60KB                                                      |
+------------------------------------------------------------------------------------+

So largest table(or fact table) should be put on most left side.

2. Hash Join types -- Broadcast VS Shuffle.

Broadcast

Default join type. As it name implies, the right table will be copied(broadcasted) to all nodes, left table is read locally.
Best for large table join small table. (For example, fact table join dimension table).

Shuffle

Both left and right tables will be hashed and shuffled based on join column, and each node will get 1/Nth of them.
Best for large table join large table.
We can use hint "[shuffle]" to force a shuffle join.
Compare below 2 plans:

Query: explain select count(*) from passwords a join passwords2 b where a.col0=b.col0
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=2.04GB VCores=2                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.passwords, default.passwords2                                              |
|                                                                                    |
| 06:AGGREGATE [MERGE FINALIZE]                                                      |
| |  output: sum(count(*))                                                           |
| |                                                                                  |
| 05:EXCHANGE [PARTITION=UNPARTITIONED]                                              |
| |                                                                                  |
| 03:AGGREGATE                                                                       |
| |  output: count(*)                                                                |
| |                                                                                  |
| 02:HASH JOIN [INNER JOIN, BROADCAST]                                               |
| |  hash predicates: a.col0 = b.col0                                                |
| |                                                                                  |
| |--04:EXCHANGE [BROADCAST]                                                         |
| |  |                                                                               |
| |  01:SCAN HDFS [default.passwords2 b]                                             |
| |     partitions=1/1 size=2.60KB                                                   |
| |                                                                                  |
| 00:SCAN HDFS [default.passwords a]                                                 |
|    partitions=1/1 size=10.42KB                                                     |
+------------------------------------------------------------------------------------+

Query: explain select count(*) from passwords a join [shuffle] passwords2 b where a.col0=b.col0
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=2.04GB VCores=2                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.passwords, default.passwords2                                              |
|                                                                                    |
| 07:AGGREGATE [MERGE FINALIZE]                                                      |
| |  output: sum(count(*))                                                           |
| |                                                                                  |
| 06:EXCHANGE [PARTITION=UNPARTITIONED]                                              |
| |                                                                                  |
| 03:AGGREGATE                                                                       |
| |  output: count(*)                                                                |
| |                                                                                  |
| 02:HASH JOIN [INNER JOIN, PARTITIONED]                                             |
| |  hash predicates: a.col0 = b.col0                                                |
| |                                                                                  |
| |--05:EXCHANGE [PARTITION=HASH(b.col0)]                                            |
| |  |                                                                               |
| |  01:SCAN HDFS [default.passwords2 b]                                             |
| |     partitions=1/1 size=2.60KB                                                   |
| |                                                                                  |
| 04:EXCHANGE [PARTITION=HASH(a.col0)]                                               |
| |                                                                                  |
| 00:SCAN HDFS [default.passwords a]                                                 |
|    partitions=1/1 size=10.42KB                                                     |
+------------------------------------------------------------------------------------+

Note: Similar to scenarios in Greenplum: 2 tables have the same distribution key, so that local join is done.

3. Partition Pruning

For example:

Query: explain select count(*) from logs where year='2013' and month='07' and day='28'
+------------------------------------------------------------------------+
| Explain String                                                         |
+------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=42.00MB VCores=1               |
|                                                                        |
| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED]                            |
|   03:AGGREGATE [MERGE FINALIZE]                                        |
|   |  output: sum(count(*))                                             |
|   |  hosts=4 per-host-mem=unavailable                                  |
|   |  tuple-ids=1 row-size=8B cardinality=1                             |
|   |                                                                    |
|   02:EXCHANGE [PARTITION=UNPARTITIONED]                                |
|      hosts=4 per-host-mem=unavailable                                  |
|      tuple-ids=1 row-size=8B cardinality=1                             |
|                                                                        |
| F00:PLAN FRAGMENT [PARTITION=RANDOM]                                   |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, PARTITION=UNPARTITIONED] |
|   01:AGGREGATE                                                         |
|   |  output: count(*)                                                  |
|   |  hosts=4 per-host-mem=10.00MB                                      |
|   |  tuple-ids=1 row-size=8B cardinality=1                             |
|   |                                                                    |
|   00:SCAN HDFS [external_partitions.logs, PARTITION=RANDOM]            |
|      partitions=2/4 size=30B                                           |
|      table stats: 4 rows total                                         |
|      column stats: all                                                 |
|      hosts=4 per-host-mem=32.00MB                                      |
|      tuple-ids=0 row-size=45B cardinality=2                            |
+------------------------------------------------------------------------+

Note: Greenplum supports dynamic partition pruning.

4. Impala does not spill to disks.

Impala's pipelines are in memory. Minimum of 128G RAM is suggested for each Impala node.
Greenplum will spill, so recently it introduces workfile limit to control the workfile size.

5. Compression Types

Type:	Snappy	Gzip/Zlib
Speed:	Faster	Slower
CPU:	Less	More
Compression Ratio:	Lower	Higher

Wednesday, July 2, 2014